Oleg Zabluda's blog
Thursday, September 27, 2012
 
All locked instructions on an Intel Xeon X5650 2.66 GHz (HP Z600, Westmere-EP) run in 19 cc [1] as long as the operand does not cross a cache-line boundary. If it does, they take 4-6K cc. Crossing a page boundary adds no additional penalty; apparently they take a global lock. I measured MOV, XCHG, INC, ADD, XADD, AND, OR, CMPXCHG, etc. The exceptions were MFENCE, which takes 24 cc, and LFENCE and SFENCE, which take 8-15 cc.
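To make the boundary condition concrete, here is a minimal sketch in C, not the code used for the measurements above. It assumes a 64-byte cache line, and the names CACHE_LINE_BYTES, crosses_cache_line and cycles_to_ns are made up for illustration. The first helper tells you whether an operand would straddle two cache lines; the second turns cycle counts into nanoseconds at a given clock.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines, as on Westmere-EP and later Intel cores. */
#define CACHE_LINE_BYTES 64u

/* Returns 1 if the operand occupying [addr, addr + size) spans two
 * cache lines, 0 if it fits entirely inside one. */
static int crosses_cache_line(uintptr_t addr, size_t size)
{
    assert(size > 0);
    uintptr_t first_line = addr / CACHE_LINE_BYTES;
    uintptr_t last_line  = (addr + size - 1) / CACHE_LINE_BYTES;
    return first_line != last_line;
}

/* Converts a core cycle count to nanoseconds for a clock given in GHz
 * (1 GHz = 1 cycle per nanosecond). */
static double cycles_to_ns(double cycles, double ghz)
{
    return cycles / ghz;
}
```

For example, an 8-byte operand at offset 56 within a line fits, while the same operand at offset 60 is split. At 2.66 GHz, 19 cc is roughly 7 ns, while 4-6K cc is roughly 1.5-2.3 microseconds.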

If you want to know why (and who wouldn't), and how to (maybe) make lemonade out of it, read these:

http://www.intel.com/content/dam/doc/white-paper/quick-path-interconnect-introduction-paper.pdf
http://www.realworldtech.com/common-system-interface/

https://blogs.oracle.com/dave/entry/qpi_quiescence
http://www.drdobbs.com/article/print?articleId=221600290&siteSectionName=parallel ("Locks" section)

http://en.wikipedia.org/wiki/Intel_QuickPath_Interconnect

[1] Update: Sandy Bridge - 16 cc, Haswell - 12 cc.
https://plus.google.com/+OlegZabluda/posts/RRUbSJV4ift
http://www.drdobbs.com/article/print?articleId=221600290&siteSectionName=parallel
