Oleg Zabluda's blog
Thursday, September 27, 2012
 
Brief history of Intel's x86 memory model (MM) documentation.

Jun 30, 2005 - Dark ages. Nobody knows anything, as evidenced by expert and guru Paul McKenney's excellent paper in Linux Journal, which lists the x86 MM too conservatively because Intel would neither confirm nor deny anything.
http://www.linuxjournal.com/article/8211?page=0,1
(Since then updated in his excellent book "Is Parallel Programming Hard, And, If So, What Can You Do About It?", section "C.7. MEMORY-BARRIER INSTRUCTIONS FOR SPECIFIC CPUS".)
http://kernel.org/pub/linux/kernel/people/paulmck/perfbook/perfbook.html
The last Intel SDM revision like this was rev-22 (Nov. 2006).

Aug 2007 - Intel released a white paper (later merged into the Intel SDM, including revs. 26–28), and AMD added to its manual, documenting their memory ordering guarantees: causal consistency (CC), aka transitive visibility, i.e. if CPU 0 sees a store by CPU 1, then CPU 0 is guaranteed to see all stores that CPU 1 saw prior to its store. They also stated that using XCHG for loads or stores does guarantee sequential consistency (SC).
http://www.justsoftwaresolutions.co.uk/threading/intel-and-amd-memory-ordering-defined.html
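To make the SC guarantee concrete, here is the classic store-buffering (SB) litmus test written with C++11 seq_cst atomics. This C++ rendering is our own sketch, not taken from the sources above, and the function name `sb_seq_cst_trial` is hypothetical. Mainstream compilers typically implement a seq_cst store on x86 as XCHG (some use MOV followed by MFENCE), and that is what rules out both loads returning 0. Running trials can only fail to observe the forbidden outcome; it cannot prove the guarantee.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Store-buffering (SB) litmus test using seq_cst atomics. On x86, mainstream
// compilers typically emit XCHG (or MOV followed by MFENCE) for a seq_cst
// store, and sequential consistency forbids the outcome r0 == 0 && r1 == 0.
// Returns true if this trial produced that forbidden outcome.
bool sb_seq_cst_trial() {
    std::atomic<int> x{0}, y{0};
    int r0 = -1, r1 = -1;  // each written by one thread, read only after join()
    std::thread t0([&] {
        x.store(1, std::memory_order_seq_cst);
        r0 = y.load(std::memory_order_seq_cst);
    });
    std::thread t1([&] {
        y.store(1, std::memory_order_seq_cst);
        r1 = x.load(std::memory_order_seq_cst);
    });
    t0.join();
    t1.join();
    return r0 == 0 && r1 == 0;
}
```

With `memory_order_relaxed` (or release/acquire) instead of seq_cst, the r0 == r1 == 0 outcome becomes legal, and x86 hardware can produce it through its store buffer.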

But they also "clarified" that putting an MFENCE between an ordinary aligned store and a load does not guarantee SC (and never did), because there is no Total Store Order (TSO). Hilarity ensues. You have to use LOCK XCHG for stores instead, but that does not work for arbitrarily large objects, rendering Java "volatile" unimplementable on 32-bit x86, because Java requires atomic access to 64-bit volatile fields (long and double):
https://blogs.oracle.com/dave/entry/java_memory_model_concerns_on
http://www.justsoftwaresolutions.co.uk/threading/intel-memory-ordering-and-c++-memory-model.html

However, this specification contained bugs (behavior disallowed by x86-CC [1] but observed on actual CPUs) having to do with a CPU reading its own writes early from its own store buffer, before those stores become visible to other CPUs (store forwarding).
http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-745.pdf (2009)
http://www.cl.cam.ac.uk/~pes20/weakmemory/cacm.pdf (2010)

Nov. 2008 - Intel SDM rev-29 and AMD yet again clarified their memory models, which now guarantee TSO but not CC, meaning that MFENCE is not even needed for acquire-release (acq-rel) ordering (?), only for SC. Whew.
https://blogs.oracle.com/dave/entry/x86_platform_memory_model_clarifications
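The TSO model can be explored mechanically. Below is a toy, hypothetical C++ sketch of an x86-TSO abstract machine as we understand it from the Cambridge papers linked above: per-CPU FIFO store buffers, store forwarding for a CPU's own loads, and an MFENCE that waits for the local buffer to drain. It enumerates every interleaving of a tiny program. All names (`all_outcomes`, `sb_allows_both_zero`, `n6_shape_reachable`) are ours, and the n6-shaped program is reconstructed from the store orders given in footnote [1], so check the paper for the exact n6 listing.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <set>
#include <utility>
#include <vector>

// Toy x86-TSO abstract machine (a sketch, not Intel's or AMD's specification):
// each CPU owns a FIFO store buffer; a load first takes the newest buffered
// store to that location from its own buffer (store forwarding), else reads
// shared memory; MFENCE may only execute once the CPU's own buffer is empty.
enum class Op { Store, Load, Mfence };

struct Instr {
    Op op;
    int loc;  // memory location index (ignored for Mfence)
    int arg;  // value to store (Store) or destination register index (Load)
};

using Program = std::vector<std::vector<Instr>>;  // one instruction list per CPU
using Outcome = std::vector<int>;                 // registers, then final memory

struct State {
    std::vector<int> mem;
    std::vector<std::size_t> pc;
    std::vector<std::deque<std::pair<int, int>>> buf;  // (loc, value), oldest first
    std::vector<int> regs;
};

// Depth-first search over every interleaving of instruction steps and
// store-buffer flushes; records each distinct final state.
void explore(const Program& p, const State& s, std::set<Outcome>& out) {
    bool stepped = false;
    for (std::size_t c = 0; c < p.size(); ++c) {
        if (s.pc[c] < p[c].size()) {
            const Instr& in = p[c][s.pc[c]];
            if (in.op != Op::Mfence || s.buf[c].empty()) {  // MFENCE waits for drain
                State n = s;
                if (in.op == Op::Store) {
                    n.buf[c].push_back({in.loc, in.arg});
                } else if (in.op == Op::Load) {
                    int v = n.mem[in.loc];
                    for (const auto& e : n.buf[c])  // later entries win: newest store
                        if (e.first == in.loc) v = e.second;
                    n.regs[in.arg] = v;
                }
                ++n.pc[c];
                explore(p, n, out);
                stepped = true;
            }
        }
        if (!s.buf[c].empty()) {  // oldest buffered store becomes globally visible
            State n = s;
            n.mem[n.buf[c].front().first] = n.buf[c].front().second;
            n.buf[c].pop_front();
            explore(p, n, out);
            stepped = true;
        }
    }
    if (!stepped) {  // every CPU finished and every buffer drained
        Outcome o = s.regs;
        o.insert(o.end(), s.mem.begin(), s.mem.end());
        out.insert(o);
    }
}

std::set<Outcome> all_outcomes(const Program& p, int nlocs, int nregs) {
    State s{std::vector<int>(nlocs, 0), std::vector<std::size_t>(p.size(), 0),
            std::vector<std::deque<std::pair<int, int>>>(p.size()),
            std::vector<int>(nregs, 0)};
    std::set<Outcome> out;
    explore(p, s, out);
    return out;
}

constexpr int X = 0, Y = 1;  // memory location indices

// Store buffering: CPU0 {x=1; r0=y} || CPU1 {y=1; r1=x}, optionally fenced.
bool sb_allows_both_zero(bool mfence) {
    std::vector<Instr> t0{{Op::Store, X, 1}}, t1{{Op::Store, Y, 1}};
    if (mfence) {
        t0.push_back({Op::Mfence, 0, 0});
        t1.push_back({Op::Mfence, 0, 0});
    }
    t0.push_back({Op::Load, Y, 0});
    t1.push_back({Op::Load, X, 1});
    for (const Outcome& o : all_outcomes({t0, t1}, 2, 2))
        if (o[0] == 0 && o[1] == 0) return true;
    return false;
}

// Footnote-[1] shape: CPU0 {x=1; r0=x; r1=y} || CPU1 {y=2; x=2}.
// Is r0 == 1, r1 == 0 with final x == 1, y == 2 reachable?
bool n6_shape_reachable() {
    Program p{{{Op::Store, X, 1}, {Op::Load, X, 0}, {Op::Load, Y, 1}},
              {{Op::Store, Y, 2}, {Op::Store, X, 2}}};
    return all_outcomes(p, 2, 2).count(Outcome{1, 0, 1, 2}) > 0;
}
```

In this model the SB test can end with r0 == r1 == 0 unless an MFENCE separates each store from the following load, which matches MFENCE being needed only for SC. The n6-shaped program reaches its final state only because CPU0 reads its own x=1 from its store buffer before that store reaches memory. The sketch deliberately omits LOCK'd instructions, which on x86-TSO also drain the store buffer.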

[1] Example n6 shows that proc0 may see causally related stores in the order {x=0; y=0; x=1; y=2; x=2;}, while all other CPUs may see them in the order {x=0; y=0; y=2; x=2; x=1;}, proving that x86-TSO is not CC. Using acquire-release ordering (memory_order_acq_rel) in this example wouldn't change anything (i.e. the outcome is still allowed), because none of the reads on proc0 read any of the values stored by proc1, so no synchronizes-with relations hold.
Note also that TSO holds "except for the CPU that does the storing", which doesn't break acquire-release ordering. C++11 acquire-release (but not seq_cst) ordering still holds with regular stores and loads, because each thread executes in program order anyway, so a CPU reading its own stores in program order conveys no additional ordering information. Also see
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2633.html
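As an illustration of acquire-release holding with regular stores and loads, here is a message-passing (MP) sketch with C++11 release/acquire atomics. It is our own example, not taken from the linked paper, and `mp_release_acquire_trial` is a hypothetical name. On x86, mainstream compilers typically emit plain MOVs for both the release store and the acquire load, with no fence: TSO already keeps the writer's two stores in order and the reader's two loads in order.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Message passing (MP) with C++11 release/acquire. On x86 the release store and
// the acquire load typically compile to plain MOVs with no fence, yet a reader
// that observes flag == 1 is guaranteed to also observe data == 42.
// Returns the payload value the reader saw after observing the flag.
int mp_release_acquire_trial() {
    int data = 0;  // ordinary, non-atomic payload
    std::atomic<int> flag{0};
    int seen = -1;
    std::thread writer([&] {
        data = 42;
        flag.store(1, std::memory_order_release);  // publish
    });
    std::thread reader([&] {
        while (flag.load(std::memory_order_acquire) != 1)
            std::this_thread::yield();  // spin until published
        seen = data;  // race-free: the release store synchronizes-with this acquire
    });
    writer.join();
    reader.join();
    return seen;
}
```

Unlike the SB test, this pattern needs no store-to-load ordering, which is exactly the one reordering TSO permits; that is why no MFENCE is required here.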

Also see examples n5/n4b and the table in Section 2.4 for more differences between x86-CC and x86-TSO, between Intel and AMD, between various revisions of the specs (with more possible bugs), and what is actually observed on real CPUs.

Bonus:
http://bartoszmilewski.com/2008/11/05/who-ordered-memory-fences-on-an-x86/
http://www.linuxjournal.com/article/8211?page=0,1
