There are (mostly) two types of concurrency bugs: non-deadlock bugs and deadlock bugs.
Of the non-deadlock bugs, the main types are:
- Atomicity Violation Bugs
- Order Violation Bugs
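As a hedged sketch of an atomicity violation (the names thd_state and proc_info are invented for illustration, loosely modeled on a well-known MySQL bug): a check and the use that depends on it are assumed to run atomically, but nothing enforces it.

#include <stdio.h>
#include <stddef.h>

// Hypothetical shared state, assumed to be set up elsewhere.
struct thd_state { char *proc_info; };
struct thd_state *thd;

// Thread 1: assumes the check and the use run without interruption.
void *thread1(void *arg) {
    if (thd->proc_info) {               // check
        fputs(thd->proc_info, stdout);  // use: may crash if Thread 2 runs in between
    }
    return NULL;
}

// Thread 2: silently invalidates Thread 1's assumption.
void *thread2(void *arg) {
    thd->proc_info = NULL;
    return NULL;
}

The usual fix is to protect both the check/use and the update with the same lock. An order violation is similar in flavor: one thread assumes another has already run (for example, finished some initialization), but nothing in the code enforces that ordering.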
The rest of this section deals with deadlock bugs.
There are four conditions that must all hold for a deadlock to occur:
- Mutual exclusion: Threads claim exclusive control of the resources (e.g., locks) they need.
- Hold-and-wait: Threads hold resources already allocated to them (e.g., locks already acquired) while waiting for additional ones.
- No preemption: Resources cannot be forcibly removed from the threads that hold them.
- Circular wait: There is a circular chain of threads such that each holds one or more resources that are being requested by the next thread in the chain.
To prevent deadlock, you only need to break one of these conditions.
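As a concrete sketch (L1 and L2 stand for two pthread mutex pointers, as in the snippets later in this section), here is the classic way all four conditions line up:

// Thread 1:
pthread_mutex_lock(L1);
pthread_mutex_lock(L2);

// Thread 2:
pthread_mutex_lock(L2);
pthread_mutex_lock(L1);

If Thread 1 grabs L1 and Thread 2 grabs L2 before either grabs its second lock, each holds one lock while waiting for the other (hold-and-wait), neither lock can be taken away (no preemption), and each waits on the other (circular wait): deadlock.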
Circular Wait
Preventing circular wait is perhaps the most common way to avoid these bugs in practice. One way to do it is to impose a total ordering on lock acquisition: if every thread always acquires locks in the same order, a circular wait can never form, and thus no deadlock.
When the system has too many locks for a single total order to be practical, a partial ordering can work instead: agree on an order within each group of locks that tend to be acquired together.
Be careful, though. Suppose you have two locks and a function func(lock1, lock2) that grabs both. It may look as if the locks are always grabbed in the same order, but if one caller invokes func(A, B) while another invokes func(B, A), the two callers acquire the locks in opposite orders, and deadlock is possible.
One way to get around this is to acquire the locks by their memory addresses: always lock them in a fixed address order (say, highest address first), no matter what order the arguments came in.
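A minimal sketch of the address-ordering trick, assuming a hypothetical do_something() that needs both mutexes and that the two pointers are distinct:

#include <pthread.h>

void do_something(pthread_mutex_t *m1, pthread_mutex_t *m2) {
    // Acquire in a fixed (here: highest-address-first) order,
    // regardless of the order the caller passed the locks in.
    if (m1 > m2) {
        pthread_mutex_lock(m1);
        pthread_mutex_lock(m2);
    } else {
        pthread_mutex_lock(m2);
        pthread_mutex_lock(m1);
    }
    // ... work that needs both locks ...
    pthread_mutex_unlock(m1);
    pthread_mutex_unlock(m2);
}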
Hold and Wait
You can break hold-and-wait by acquiring all the locks at once, atomically. How? Introduce an extra "prevention" lock: always grab it before acquiring any of the other locks, and release it once they have all been acquired.
Now even if some other thread acquires the locks in a different order, there is no deadlock, as long as that thread also grabs the prevention lock first.
pthread_mutex_lock(prevention); // begin acquisition
pthread_mutex_lock(L1);
pthread_mutex_lock(L2);
...
pthread_mutex_unlock(prevention); // end
The downside? You need to know, when acquisition begins, exactly which locks will be needed so they can all be grabbed up front. And of course, this solution also seriously limits concurrency, since locks are acquired early rather than when they are actually needed.
No Preemption
Some lock APIs provide something like trylock (e.g., pthread_mutex_trylock()), which grabs the lock if it is available and otherwise immediately returns an error code; you can then back off and try again later.
An example:
top:
    pthread_mutex_lock(L1);
    if (pthread_mutex_trylock(L2) != 0) {
        pthread_mutex_unlock(L1);
        goto top;
    }
It doesn't matter in what order other threads acquire L1 and L2; with this loop there is no deadlock, because a thread that fails to get L2 gives up L1 and starts over rather than holding it and waiting.
You can get a livelock, though: two threads could repeatedly execute this sequence, each grabbing its first lock, failing the trylock, releasing, and retrying, timed so that neither ever makes progress.
One solution to the livelock problem: add a random delay before retrying, making it unlikely that the competing threads stay in lockstep.
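A hedged sketch of the back-off idea, building on the trylock loop above (the roughly one-millisecond bound and the use of rand()/usleep() are arbitrary choices for illustration; real code might use a smarter back-off):

top:
    pthread_mutex_lock(L1);
    if (pthread_mutex_trylock(L2) != 0) {
        pthread_mutex_unlock(L1);
        usleep(rand() % 1000);  // sleep up to ~1 ms before retrying (needs <unistd.h> and <stdlib.h>)
        goto top;
    }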
Keep in mind that trylock doesn't really add preemption; it just lets a thread back out gracefully. If we are going to release the lock and retry, we also have to undo everything we did while holding it (free memory we allocated, roll back partial changes, etc.). The catch is that encapsulation can bury those details inside some API, so you may not know all the steps needed to back out cleanly after a failed attempt.
Mutual Exclusion
Could you just redesign the code so that it needs no mutex at all? In some cases, yes. These are called lock-free (and related wait-free) approaches, and they usually rely on some help from the hardware.
Suppose your hardware provides an atomic CompareAndSwap instruction that checks whether the value at an address matches an expected value and, only if so, swaps in a new value. It returns 1 if the swap succeeded, or 0 if it didn't (i.e., the value was not the expected one).
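In C-like pseudocode, the semantics look roughly like this (the real thing is a single atomic hardware instruction, typically reached via compiler built-ins; the C here only pins down the behavior):

// Executed atomically by the hardware.
int CompareAndSwap(int *address, int expected, int new) {
    if (*address == expected) {
        *address = new;   // value matched: install the new one
        return 1;         // success
    }
    return 0;             // value changed underneath us: failure
}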
Now look at this:
void AtomicIncrement(int *value, int amount) {
    int old;
    do {
        old = *value;  // snapshot the current value
    } while (CompareAndSwap(value, old, old + amount) == 0);  // retry if it changed underneath us
}
This reads the current value at the address pointed to by value, then atomically swaps in the incremented value, but only if the value is still what was just read. If the value changed in the meantime (because another thread updated it), the loop simply tries again until it succeeds.
The big-picture goal: increment the value by a given amount without ever blocking, retrying until the update catches the value unchanged. No lock needed!
Another example: Say you want to insert a node at the head of a list. Here is a lock-free solution:
void insert(int value) {
    node_t *n = malloc(sizeof(node_t));
    assert(n != NULL);
    n->value = value;
    do {
        n->next = head;                                // point the new node at the current head
    } while (CompareAndSwap(&head, n->next, n) == 0);  // swing head to n; retry if head changed
}
Another approach to avoiding deadlock is careful scheduling: simply never run threads that might contend for the same locks at the same time. This is probably only useful in environments like embedded systems, where you know the universe of threads and the locks they need well enough to build such a scheduler.
Look up the Banker’s Algorithm!
One last way to deal with deadlock is "detect and recover": allow deadlocks to happen occasionally, then detect them and recover. A trivial example: if an OS deadlocks, say, once a year, you might simply let it happen and reboot. It's not intellectually satisfying, but it is a perfectly valid engineering solution for many domains.