Intel® 64 Architecture Memory Ordering White Paper
http://www.cs.cmu.edu/~410-f10/doc/Intel_Reordering_318147.pdf
Overview

This document describes Intel 64 memory ordering at the level that is architecturally visible to software.[1] The principles and examples give software writers a clear understanding of the results that different sequences of memory access instructions may produce. This document does not provide memory ordering rules for I/O operations.

This document discusses only software-visible behavior. Hardware may perform any optimization as long as it does not violate any of the visibility principles. (It may even execute a single memory access more than once; if it does, only the final execution is visible to software.) For example, "loads are not reordered" means "loads do not appear to be reordered," not that the hardware is restricted in how it implements loads internally. The principles identified here apply to code as executed by the processor, that is, to code after all compiler optimizations have been performed and the final executable has been generated. This document is subject to future revision.

[1] The term Intel 64 refers to both IA-32 and Intel® 64 processors.
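Because these principles apply only after compilation, the ordering a programmer writes in source code must survive the compiler before the processor's guarantees matter. The following is a minimal sketch in C11, assuming a compiler that provides <stdatomic.h> and POSIX threads; the names payload, ready, and message_passing_demo are hypothetical and are not taken from this document. It uses release/acquire atomics so that neither the compiler nor the hardware can make the flag visible before the data.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical shared state for a message-passing example. */
static int payload;        /* plain data, written before the flag */
static atomic_int ready;   /* flag, published after the data */

/* Writer: store the data, then publish the flag with release ordering.
 * The release store forbids the compiler from moving the payload store
 * after the flag store. On x86-64, GCC and Clang emit a plain MOV for it. */
static void *writer(void *arg) {
    (void)arg;
    payload = 42;
    atomic_store_explicit(&ready, 1, memory_order_release);
    return NULL;
}

/* Reader: spin until the flag is observed with acquire ordering,
 * then read the data. The acquire load also prevents the compiler
 * from hoisting the flag load out of the loop. */
static void *reader(void *arg) {
    int *out = arg;
    while (atomic_load_explicit(&ready, memory_order_acquire) == 0) {
        /* spin until the writer publishes */
    }
    *out = payload;
    return NULL;
}

/* Runs one writer and one reader; returns the payload the reader saw. */
int message_passing_demo(void) {
    int observed = 0;
    pthread_t w, r;

    payload = 0;
    atomic_store(&ready, 0);

    pthread_create(&r, NULL, reader, &observed);
    pthread_create(&w, NULL, writer, NULL);
    pthread_join(w, NULL);
    pthread_join(r, NULL);

    /* Release/acquire synchronization guarantees the reader sees 42. */
    assert(observed == 42);
    return observed;
}
```

Calling message_passing_demo() should return 42 on any conforming C11 implementation. On x86-64 the release store and acquire load typically compile to ordinary MOV instructions with no extra fence, which is consistent with the hardware ordering described here. If the two variables were accessed as plain ints instead, the program would contain a data race, and the compiler could reorder the stores or hoist the flag load out of the loop before the processor's ordering rules ever applied.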