Multi-copy atomicity
An access is multi-copy atomic (also called store atomicity or write atomicity) when there is a single logical moment at which a store issued by some CPU becomes visible to all other CPUs. The issuing CPU is allowed to see its own store early, e.g. due to store-to-load forwarding, which is why the name multi-copy atomic is a bit misleading.
The consequence of multi-copy atomicity is that stores issued by different CPUs to different addresses are seen by all other CPUs in the same order. Multi-copy atomicity therefore guarantees that some total order over the stores exists. Examples of multi-copy atomic systems are x86 with its TSO (Total Store Order) memory model and ARMv8.
When a single logical moment does not exist, the access is called non-multi-copy atomic. Some causes of accesses becoming non-multi-copy atomic:
- the store buffers are shared with a strict subset of the CPUs
- a CPU that commits a store to the cache doesn't wait for the cache line to be invalidated on all other CPUs
IRIW
The litmus test for multi-copy atomicity is the IRIW litmus test: independent reads of independent writes. The test is shown below (A and B are initially 0):
CPU1: A=1
CPU2: B=1
CPU3: r1=A [LoadLoad] r2=B
CPU4: r3=B [LoadLoad] r4=A
Could it be that r1=1, r2=0, r3=1, r4=0? In other words, could CPU3 and CPU4 see the stores to A and B in different orders? If accesses are multi-copy atomic, this can't happen because stores to different addresses issued by different CPUs need to be seen in the same order. But when accesses are non-multi-copy atomic, this can happen because stores issued by different CPUs to different addresses can be seen in different orders.
C++ atomic
With C++11 a new memory model was introduced, including atomics. For this post, I'll only look at the load/store operations on an atomic and ignore RMW operations like compare_exchange_weak/strong. A load/store provides the following guarantees:
- atomicity: the operation is indivisible and you don't end up with a value that is a combination of multiple reads/writes
- synchronization: depending on the memory_order, it orders surrounding loads/stores
By default all loads/stores use memory_order_seq_cst.
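To make the two guarantees a bit more concrete, here is a minimal sketch. The names kPatternA, kPatternB, count_torn_reads and default_order_demo are made up for this illustration. The first function hammers an atomic with two bit patterns and counts loads that see anything else, and the second shows that omitting the memory_order argument is the same as passing memory_order_seq_cst.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

// Hypothetical bit patterns, chosen so that a mixed (torn) value is easy to spot.
constexpr std::uint64_t kPatternA = 0x0000000000000000ULL;
constexpr std::uint64_t kPatternB = 0xFFFFFFFFFFFFFFFFULL;

// Returns how many loads observed a value that is neither pattern.
// The atomicity guarantee says this is 0, even with memory_order_relaxed.
int count_torn_reads(int iterations) {
    std::atomic<std::uint64_t> value{kPatternA};
    std::atomic<bool> stop{false};

    // Writer thread: alternates between the two patterns until told to stop.
    std::thread writer([&] {
        while (!stop.load(std::memory_order_relaxed)) {
            value.store(kPatternA, std::memory_order_relaxed);
            value.store(kPatternB, std::memory_order_relaxed);
        }
    });

    // Reader (this thread): every load must see exactly one of the patterns.
    int torn = 0;
    for (int i = 0; i < iterations; ++i) {
        const std::uint64_t v = value.load(std::memory_order_relaxed);
        if (v != kPatternA && v != kPatternB) {
            ++torn;
        }
    }

    stop.store(true, std::memory_order_relaxed);
    writer.join();
    return torn;
}

// The memory_order argument is optional and defaults to memory_order_seq_cst.
void default_order_demo() {
    std::atomic<int> x{0};
    x.store(1);                             // implicit memory_order_seq_cst
    x.store(2, std::memory_order_seq_cst);  // same ordering, spelled out
    assert(x.load() == 2);                  // load() also defaults to seq_cst
}
```

Note that the first function only exercises atomicity. It uses memory_order_relaxed on purpose, so it says nothing about how surrounding loads/stores are ordered.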
The problem in my understanding was that I assumed that multi-copy atomicity was part of the atomicity guarantee of the atomic independent of the memory_order. This is incorrect.
Only in the case of memory_order_seq_cst is multi-copy atomicity implied, because sequential consistency requires that some total order over all loads/stores exists and hence a total order over the stores exists. Therefore, if the IRIW example is implemented using memory_order_seq_cst, the observation isn't possible.
But when a memory_order weaker than memory_order_seq_cst is used, multi-copy atomicity is not implied. As a consequence, if the IRIW example is implemented using memory_order_release stores and memory_order_acquire loads, the given observation is possible.
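As a rough sketch, the IRIW test could be written in C++ like this. The names IriwResult, run_iriw_once, is_iriw_outcome and count_iriw_outcomes are made up for this illustration, and the memory orders are template parameters so that the same code covers both the memory_order_seq_cst and the release/acquire variant.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Values observed by the two reader threads in one round of IRIW.
struct IriwResult {
    int r1, r2, r3, r4;
};

// One round of IRIW: two writer threads, two reader threads.
// StoreOrder is used for both stores, LoadOrder for all four loads.
template <std::memory_order StoreOrder, std::memory_order LoadOrder>
IriwResult run_iriw_once() {
    std::atomic<int> a{0};
    std::atomic<int> b{0};
    IriwResult res{0, 0, 0, 0};

    std::thread t1([&] { a.store(1, StoreOrder); });  // CPU1: A=1
    std::thread t2([&] { b.store(1, StoreOrder); });  // CPU2: B=1
    std::thread t3([&] {                              // CPU3: r1=A, r2=B
        res.r1 = a.load(LoadOrder);
        res.r2 = b.load(LoadOrder);
    });
    std::thread t4([&] {                              // CPU4: r3=B, r4=A
        res.r3 = b.load(LoadOrder);
        res.r4 = a.load(LoadOrder);
    });

    // join() makes the readers' writes to res visible to this thread.
    t1.join();
    t2.join();
    t3.join();
    t4.join();
    return res;
}

// True when the readers disagree on the order of the two stores.
inline bool is_iriw_outcome(const IriwResult& r) {
    return r.r1 == 1 && r.r2 == 0 && r.r3 == 1 && r.r4 == 0;
}

// Repeats the test and counts how often the IRIW outcome shows up.
template <std::memory_order StoreOrder, std::memory_order LoadOrder>
int count_iriw_outcomes(int rounds) {
    int hits = 0;
    for (int i = 0; i < rounds; ++i) {
        if (is_iriw_outcome(run_iriw_once<StoreOrder, LoadOrder>())) {
            ++hits;
        }
    }
    return hits;
}
```

With count_iriw_outcomes<std::memory_order_seq_cst, std::memory_order_seq_cst>(n) the result has to be 0. With count_iriw_outcomes<std::memory_order_release, std::memory_order_acquire>(n) a non-zero result is allowed by the C++ memory model, but don't expect to see it easily: on multi-copy atomic hardware it will not show up, and starting four threads per round makes the timing window very small anyway. A count of 0 therefore proves nothing about the release/acquire variant.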
Conclusion
Multi-copy atomicity and atomicity are different concepts and atomicity doesn't imply multi-copy atomicity. They are overloaded terms and it is very easy to get confused.
Update
It seems there are conflicting single-copy atomicity definitions as can be seen in this Twitter discussion. In some academic papers, single-copy atomic is a stricter version of multi-copy atomic whereby even the issuing CPU needs to see the store in the same order. But in the ARM and PowerPC reference manuals, it effectively means that when you load/store some value, you don't end up with a value that is a mix of multiple loads/stores. It doesn't say anything about the order of stores issued by different CPUs to different addresses.
The original version of this post was based on the stricter version, but it seems that the ARM/PowerPC definition is more widely used and therefore I updated the post. I would like to thank Alexey Shipilev and Travis Downs for pointing out the conflict.