Oleg Zabluda's blog
Thursday, September 27, 2012
All aligned locked instructions on Intel Xeon X5650 2.66 Ghz (HP Z600) Westmere-EP run in 19 cc [1] if they don't...
Locked instructions on an Intel Xeon X5650 2.66 GHz (HP Z600, Westmere-EP) run in 19 clock cycles (cc) [1] as long as the operand does not cross a cache line. If it does, they take 4-6K cc. Crossing a page boundary adds no additional penalty, apparently because they take a global lock. I measured MOV, XCHG, INC, ADD, XADD, AND, OR, CMPXCHG, etc. The exceptions were MFENCE, which takes 24 cc, and LFENCE and SFENCE, which take 8-15 cc.
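To check whether a given operand would hit the slow path, it is enough to compare the line index of its first and last byte. A minimal sketch, assuming a 64-byte cache line and a 4096-byte page; the names kCacheLine, kPage and crosses_boundary are made up for this illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed sizes: 64-byte cache line, 4 KiB page. Adjust for other hardware.
constexpr std::size_t kCacheLine = 64;
constexpr std::size_t kPage = 4096;

// True if an access of `size` bytes starting at `addr` touches more than one
// block of `boundary` bytes, i.e. its first and last byte fall in different blocks.
constexpr bool crosses_boundary(std::uintptr_t addr, std::size_t size,
                                std::size_t boundary = kCacheLine) {
    return size != 0 && (addr / boundary) != ((addr + size - 1) / boundary);
}
```

For example, crosses_boundary(60, 8) is true (bytes 60..67 span two lines), while crosses_boundary(56, 8) is false. A naturally aligned operand of up to 8 bytes never crosses a line.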

If you want to know why (and who wouldn't), and how to (maybe) make lemonade out of it, read these:


http://www.drdobbs.com/article/print?articleId=221600290&siteSectionName=parallel ("Locks" section)


[1] Update: Sandy Bridge - 16 cc, Haswell - 12 cc.


Brief history of Intel's x86 memory model (MM) documentation.

Jun 30, 2005 - Dark ages. Nobody knows anything, as evidenced by expert and guru Paul McKenney's excellent paper in Linux Journal, which lists the x86 MM too conservatively because Intel wouldn't confirm or deny anything.
(Since then updated in his excellent book "Is Parallel Programming Hard, And, If So, What Can You Do About It?", section "C.7. MEMORY-BARRIER INSTRUCTIONS FOR SPECIFIC CPUS".)
The last Intel SDM like this was rev-22 (Nov. 2006).

Aug 2007 - Intel released a White Paper (later merged into the Intel SDM, including rev. 26–28), and AMD put documentation of their memory ordering guarantees into the manual: causal consistency (CC), aka transitive visibility, i.e. if CPU 0 sees a store by CPU 1, then CPU 0 is guaranteed to see all stores that CPU 1 saw prior to its store. They said that using XCHG for a load or a store does guarantee sequential consistency (SC),

but "clarified" that putting MFENCE between an ordinary aligned store and load does not (and never did), because there is no Total Store Order (TSO). Hilarity ensues. One has to use LOCK XCHG for stores instead, but it does not work for arbitrarily large objects, rendering Java "volatile" unimplementable on 32 bit, because Java requires 64 bit fields.

However, the specification contained bugs (behavior disallowed by x86-CC [1], but observed on actual CPUs) having to do with a CPU reading its own writes early from its own store buffer, before the stores are visible to other CPUs (store forwarding):
http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-745.pdf (2009)
http://www.cl.cam.ac.uk/~pes20/weakmemory/cacm.pdf (2010)

Nov. 2008 - Intel SDM rev-29 and AMD yet again clarified their memory models, which now guarantee TSO, but not CC, meaning that MFENCE is not even needed for CC acq-rel (?), only for SC. Whew.
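The one case where SC needs more than plain stores and loads is the classic store-buffering pattern: each thread stores to one variable and then loads the other. A minimal C++11 sketch, with a made-up helper name count_both_zero. With memory_order_seq_cst the outcome "both threads read 0" is forbidden, and on x86 compilers commonly get that by emitting XCHG (or MOV followed by MFENCE) for the store. With only acquire/release ordering, the same outcome would be allowed.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Runs the store-buffering litmus test `iterations` times and returns how many
// times both threads read 0. With seq_cst on every access this must be 0.
int count_both_zero(int iterations) {
    assert(iterations >= 0);
    int both_zero = 0;
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> x{0}, y{0};
        int r0 = -1, r1 = -1;  // written by the threads, read after join()
        std::thread t0([&] {
            x.store(1, std::memory_order_seq_cst);
            r0 = y.load(std::memory_order_seq_cst);
        });
        std::thread t1([&] {
            y.store(1, std::memory_order_seq_cst);
            r1 = x.load(std::memory_order_seq_cst);
        });
        t0.join();
        t1.join();
        if (r0 == 0 && r1 == 0) ++both_zero;
    }
    return both_zero;
}
```

Calling count_both_zero(1000) should always return 0. Spawning fresh threads per iteration makes the race window small, so this is an illustration of the guarantee rather than a stress test.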

[1] Example n6 shows that proc0 may see causally-related stores in the order {x=0; y=0; x=1; y=2; x=2;}, while all other CPUs may see them in the order {x=0; y=0; y=2; x=2; x=1;}, proving that x86-TSO is not CC. Using memory_order_acquire and memory_order_release in this example wouldn't change anything (i.e. the outcome is still allowed), because none of the reads on proc0 read any of the values stored by proc1, so no synchronizes-with relations hold.
Note also that the TSO guarantee is "except for the CPU that does the storing", which doesn't inhibit acquire-release. C++11 acquire-release (but not seq_cst) ordering still holds with regular stores and loads, because each thread must execute in program order anyway, so each CPU reading its own stores in program order conveys no additional information.

Also see examples n5/n4b and the table in section 2.4 for more differences between x86-CC and x86-TSO, Intel and AMD, various revisions of the specs (with more possible bugs), and what is actually observed on actual CPUs.
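The n6 outcome is easy to reproduce on paper with a toy model of x86-TSO: shared memory plus one FIFO store buffer per CPU, where a CPU reads its own buffered stores early. The sketch below is a hand-written simulation with one fixed schedule, not a measurement on hardware. The instruction sequence is my reading of n6 from the report linked above (proc0: x=1, read x, read y; proc1: y=2, x=2), and TsoMachine, N6Result and run_n6 are names made up here.

```cpp
#include <cassert>
#include <deque>
#include <map>
#include <utility>

// Toy x86-TSO machine: shared memory plus one FIFO store buffer per CPU.
struct TsoMachine {
    std::map<char, int> memory;                  // unwritten locations read as 0
    std::deque<std::pair<char, int>> buffer[2];  // pending stores, oldest first

    // A store enters the issuing CPU's buffer; the other CPU cannot see it yet.
    void store(int cpu, char loc, int value) { buffer[cpu].emplace_back(loc, value); }

    // A load returns the CPU's own newest buffered store to `loc` (store
    // forwarding), falling back to shared memory.
    int load(int cpu, char loc) const {
        for (auto it = buffer[cpu].rbegin(); it != buffer[cpu].rend(); ++it)
            if (it->first == loc) return it->second;
        auto m = memory.find(loc);
        return m == memory.end() ? 0 : m->second;
    }

    // Drains the oldest buffered store of `cpu` into shared memory.
    void flush_one(int cpu) {
        assert(!buffer[cpu].empty());
        memory[buffer[cpu].front().first] = buffer[cpu].front().second;
        buffer[cpu].pop_front();
    }
};

struct N6Result { int eax, ebx, final_x, final_y; };

// One schedule of n6 (as assumed above) that yields EAX=1, EBX=0, final x=1.
N6Result run_n6() {
    TsoMachine m;
    m.store(0, 'x', 1);        // proc0: x=1 sits in proc0's store buffer
    int eax = m.load(0, 'x');  // proc0 reads its own buffered store
    int ebx = m.load(0, 'y');  // proc0 reads y from memory, still 0
    m.store(1, 'y', 2);        // proc1: y=2
    m.store(1, 'x', 2);        // proc1: x=2
    m.flush_one(1);            // y=2 reaches memory
    m.flush_one(1);            // x=2 reaches memory
    m.flush_one(0);            // proc0's x=1 reaches memory last
    return {eax, ebx, m.load(0, 'x'), m.load(0, 'y')};
}
```

In this schedule proc0 observes x=1 before proc1's stores, while memory (and so every other CPU) sees y=2, x=2, x=1, matching the two orders quoted in [1].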



"Lux Aeterna" (Latin for "Eternal Light") is an awesome composition by Clint Mansell (1963-) and the theme for...
"Lux Aeterna" (Latin for "Eternal Light") is an awesome composition by Clint Mansell (1963-) and the theme for Darren Aronofsky's (1969-) awesome "Requiem for a Dream" (2000),  NC-17, one of the best movies ever made. Performed by Kronos Quartet (1973-), based in San Francisco since 1978.

Requiem For A Dream Lux Aeterna FULL ORCHESTRA
Kronos Quartet - Requiem for a Dream (complete)
Requiem For A Dream - Full Theme Song



Song subtitle covers are an old genre. You can find lots and lots of them on YouTube, for example:

"O Fortuna" Misheard Lyrics

The very first I ever saw (before YouTube) was "две мохнатые бляди" (Russian, roughly "two furry whores"):
две мохнатые бляди (реальный перевод)

My partial use of this genre (contains "шмел денги НЕТ!"):


"O Fortuna" Misheard Lyrics (via Tatiana Goldina [1])
O Fortuna Misheard Lyrics (Animated)

Original Latin Lyrics
Carmina Burana - I. O Fortuna (w/ English subtitles)
Carmina Burana ~ O Fortuna | Carl Orff ~ André Rieu

http://en.wikipedia.org/wiki/Carmina_Burana """
Carmina Burana is a scenic cantata composed by Carl Orff (1895-1982) in 1935 and 1936. It is based on 24 of the poems found in the medieval collection Carmina Burana ("Songs from Beuern"), original text dating mostly from the 11th or 12th century, including some from the 13th century.  Michel Hofmann, then a young law student and Latin and Greek enthusiast, assisted Orff in the selection and organization of 24 of these poems into a libretto, mostly in Latin verse, with a small amount of Middle High German and Old Provençal. The selection covers a wide range of topics, as familiar in the 13th century as they are in the 21st century: the fickleness of fortune and wealth, the ephemeral nature of life, the joy of the return of Spring, and the pleasures and perils of drinking, gluttony, gambling and lust.
Carmina Burana is part of Trionfi, the musical triptych that also includes the cantata Catulli Carmina and Trionfo di Afrodite. The first and last movements are called "Fortuna Imperatrix Mundi" and start with the very well-known "O Fortuna".

[1] https://plus.google.com/103264568525243687606/posts/aWFzxknhcnP

