Concurrency and lock-free basics
Concurrency is the property where multiple computations are executed simultaneously. In the domain of modern game programming, that usually means having multiple threads running on multiple cores, potentially located on multiple physically separated processors. One way to ensure progress in the executing threads is to avoid critical sections and locks on shared resources, and to design your code to use lock-free algorithms.
The definition of the term "lock-free" has changed over time and is now used to denote the property of an algorithm where the suspension of one thread will not block the potential progress of other threads. It is important to note though that lock-free does NOT imply wait-free. A thread can still stall in a lock-free algorithm, but only if at least one other thread has made some progress. In comparison, using a lock on a shared resource will stall all other threads trying to access that resource if the thread holding the lock is suspended during that critical section.
Lock-free algorithms are usually implemented using atomic read-modify-write primitives, such as CAS (compare-and-swap). An atomic instruction is instantaneous, or at least appears that way to the rest of the system, and is usually defined with a succeed-or-fail interface where a failed operation has no visible side effects on the system. In the case of a CAS, a failure occurs when another thread has changed the value, i.e. made progress. A quick pseudo-C-code example of updating a shared resource using an atomic CAS:
var my_shared_resource;
var old_value;
var new_value;

do {
    old_value = my_shared_resource;
    new_value = calculate_something( old_value );
} while( !atomic_cas( &my_shared_resource, old_value, new_value ) );
Basically the thread busy-loops until the atomic operation succeeds. The CAS operation only succeeds and stores the new_value if the current value of the resource still equals the old_value passed in. It's not wait-free, but if the thread stalls (keeps looping), it means that some other thread has succeeded with its operation on the shared resource.
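For reference, here is a minimal sketch of the same busy-loop written against the C11 <stdatomic.h> interface. The shared counter and the calculate_something step are made-up placeholders, and note that atomic_compare_exchange_weak writes the current value back into the expected argument on failure, so no explicit re-read is needed inside the loop:

    #include <stdatomic.h>
    #include <stdint.h>

    static _Atomic uint32_t my_shared_resource;

    /* hypothetical transformation of the old value into the new one */
    static uint32_t calculate_something( uint32_t old_value )
    {
        return old_value + 1;
    }

    void update( void )
    {
        uint32_t old_value = atomic_load( &my_shared_resource );
        uint32_t new_value;
        do
        {
            new_value = calculate_something( old_value );
            /* on failure, old_value is refreshed with the value some other thread stored */
        } while( !atomic_compare_exchange_weak( &my_shared_resource, &old_value, new_value ) );
    }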
A potential pitfall of the CAS operation is known as the ABA problem. If some other thread performs an operation that stores the value B and then restores the value A, the thread performing the CAS will think the value has not been modified and the CAS will succeed. If the value is a bit pattern with some other intrinsic meaning (like a memory address, or a wrapping counter), you could be in serious trouble. The solution in these cases is usually to use a double-length variable, for example a 64-bit value for a 32-bit counter, with the remaining bits used as an "operation counter" which is increased every time an operation is performed. This solves the ABA problem since the restoration of value A will produce a different bit pattern due to the increased counter in the top 32 bits.
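As a rough sketch of that idea (the field layout and the helper name here are just illustrative assumptions), the 32-bit value and the operation counter can be packed into a single 64-bit atomic and updated with one CAS:

    #include <stdatomic.h>
    #include <stdint.h>

    /* low 32 bits: the actual value, high 32 bits: operation counter */
    static _Atomic uint64_t versioned_value;

    void store_value( uint32_t new_value )
    {
        uint64_t old_packed = atomic_load( &versioned_value );
        uint64_t new_packed;
        do
        {
            uint64_t counter = ( old_packed >> 32 ) + 1;
            new_packed = ( counter << 32 ) | new_value;
            /* even if another thread restores the same 32-bit value,
               the counter in the top bits makes the packed word differ */
        } while( !atomic_compare_exchange_weak( &versioned_value, &old_packed, new_packed ) );
    }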
So, now that we have a basic atomic operation, what can we do with it? As an example, let's compare lock-based and lock-free implementations of a LIFO/stack. A quick lock-based version would look like:
var top
var mutex

function push( item )
{
    lock( mutex )
    item->next = top
    top = item
    unlock( mutex )
}

function pop()
{
    lock( mutex )
    var item = top
    top = item->next
    unlock( mutex )
    return item
}
In comparison, a lock-free implementation would look like this:
var top

function push( item )
{
    var old_top
    do
    {
        old_top = top
        item->next = old_top
    } while( !atomic_cas( &top, old_top, item ) )
}

function pop()
{
    var item, next_top
    do
    {
        item = top
        next_top = item->next
    } while( !atomic_cas( &top, item, next_top ) )
    return item
}
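To make the mapping to real primitives concrete, here is a minimal sketch of the same push and pop using C11 atomics. The node type is a made-up placeholder, and as the comment notes, this naive pop is exactly where the ABA problem described above bites in practice once nodes are freed and reused:

    #include <stdatomic.h>
    #include <stddef.h>

    typedef struct node
    {
        struct node* next;
        /* payload goes here */
    } node;

    static node* _Atomic top;

    void push( node* item )
    {
        node* old_top = atomic_load( &top );
        do
        {
            item->next = old_top;
            /* on failure, old_top is refreshed with the current top */
        } while( !atomic_compare_exchange_weak( &top, &old_top, item ) );
    }

    node* pop( void )
    {
        node* item = atomic_load( &top );
        node* next_top;
        do
        {
            if( !item )
                return NULL; /* empty stack */
            next_top = item->next;
            /* NOTE: if item was popped, freed, reused and pushed again by another
               thread, this CAS still succeeds - the ABA problem described above */
        } while( !atomic_compare_exchange_weak( &top, &item, next_top ) );
        return item;
    }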