**What is a memory model, anyway?**
In multiprocessor systems, processors generally have one or more layers of memory cache, which improves performance both by speeding access to data (because the data is closer to the processor) and reducing traffic on the shared memory bus (because many memory operations can be satisfied by local caches). Memory caches can improve performance tremendously, but they present a host of new challenges. What, for example, happens when two processors examine the same memory location at the same time? Under what conditions will they see the same value?
At the processor level, a memory model defines necessary and sufficient conditions for knowing that writes to memory by other processors are visible to the current processor, and writes by the current processor are visible to other processors. Some processors exhibit a strong memory model, where all processors see exactly the same value for any given memory location at all times. Other processors exhibit a weaker memory model, where special instructions, called memory barriers, are required to flush or invalidate the local processor cache in order to see writes made by other processors or make writes by this processor visible to others. These memory barriers are usually performed when lock and unlock actions are taken; they are invisible to programmers in a high level language.
It can sometimes be easier to write programs for strong memory models, because of the reduced need for memory barriers. However, even on some of the strongest memory models, memory barriers are often necessary; quite frequently their placement is counterintuitive. Recent trends in processor design have encouraged weaker memory models, because the relaxations they make for cache consistency allow for greater scalability across multiple processors and larger amounts of memory.
The issue of when a write becomes visible to another thread is compounded by the compiler's reordering of code. For example, the compiler might decide that it is more efficient to move a write operation later in the program; as long as this code motion does not change the program's semantics, it is free to do so. If a compiler defers an operation, another thread will not see it until it is performed; this mirrors the effect of caching.
Moreover, writes to memory can be moved earlier in a program; in this case, other threads might see a write before it actually "occurs" in the program. All of this flexibility is by design -- by giving the compiler, runtime, or hardware the flexibility to execute operations in the optimal order, within the bounds of the memory model, we can achieve higher performance.
A simple example of this can be seen in the following code:
class Reordering {
    int x = 0, y = 0;
    public void writer() {
        x = 1;
        y = 2;
    }
    public void reader() {
        int r1 = y;
        int r2 = x;
    }
}
Let's say that this code is executed in two threads concurrently, and the read of y sees the value 2. Because this write came after the write to x, the programmer might assume that the read of x must see the value 1. However, the writes may have been reordered. If this takes place, then the write to y could happen, the reads of both variables could follow, and then the write to x could take place. The result would be that r1 has the value 2, but r2 has the value 0.
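One way to rule out that surprising outcome (this variant is a sketch, not part of the original example) is to declare y volatile. Under the new memory model described below, the write to x then happens-before the volatile write to y, and a read of y that sees 2 is guaranteed to see 1 in x:

```java
// Sketch: a variant of the Reordering example with y made volatile.
// If reader() observes r1 == 2, it is guaranteed to observe r2 == 1.
class ReorderingFixed {
    int x = 0;
    volatile int y = 0;

    public void writer() {
        x = 1;
        y = 2;   // volatile write: "releases" the earlier write to x
    }

    public int[] reader() {
        int r1 = y;   // volatile read: if r1 == 2, r2 must be 1
        int r2 = x;
        return new int[] { r1, r2 };
    }
}
```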
The Java Memory Model describes what behaviors are legal in multithreaded code, and how threads may interact through memory. It describes the relationship between variables in a program and the low-level details of storing and retrieving them to and from memory or registers in a real computer system. It does this in a way that can be implemented correctly using a wide variety of hardware and a wide variety of compiler optimizations.
Java includes several language constructs, including volatile, final, and synchronized, which are intended to help the programmer describe a program's concurrency requirements to the compiler. The Java Memory Model defines the behavior of volatile and synchronized, and, more importantly, ensures that a correctly synchronized Java program runs correctly on all processor architectures.
Do other languages, like C++, have a memory model?
Most other programming languages, such as C and C++, were not designed with direct support for multithreading. The protections that these languages offer against the kinds of reorderings that take place in compilers and architectures are heavily dependent on the guarantees provided by the threading libraries used (such as pthreads), the compiler used, and the platform on which the code is run.
What is JSR 133 about?
Since 1997, several serious flaws have been discovered in the Java Memory Model as defined in Chapter 17 of the Java Language Specification. These flaws allowed for confusing behaviors (such as final fields being observed to change their value) and undermined the compiler's ability to perform common optimizations.
The Java Memory Model was an ambitious undertaking; it was the first time that a programming language specification attempted to incorporate a memory model which could provide consistent semantics for concurrency across a variety of architectures. Unfortunately, defining a memory model which is both consistent and intuitive proved far more difficult than expected. JSR 133 defines a new memory model for the Java language which fixes the flaws of the earlier memory model. In order to do this, the semantics of final and volatile needed to change.
The full semantics are available at http://www.cs.umd.edu/users/pugh/java/memoryModel, but the formal semantics are not for the timid. It is surprising, and sobering, to discover how complicated seemingly simple concepts like synchronization really are. Fortunately, you need not understand the details of the formal semantics -- the goal of JSR 133 was to create a set of formal semantics that provides an intuitive framework for how volatile, synchronized, and final work.
The goals of JSR 133 include:
- Preserving existing safety guarantees, like type-safety, and strengthening others. For example, variable values may not be created "out of thin air": each value for a variable observed by some thread must be a value that can reasonably be placed there by some thread.
- The semantics of correctly synchronized programs should be as simple and intuitive as possible.
- The semantics of incompletely or incorrectly synchronized programs should be defined so that potential security hazards are minimized.
- Programmers should be able to reason confidently about how multithreaded programs interact with memory.
- It should be possible to design correct, high performance JVM implementations across a wide range of popular hardware architectures.
- A new guarantee of initialization safety should be provided. If an object is properly constructed (which means that references to it do not escape during construction), then all threads which see a reference to that object will also see the values for its final fields that were set in the constructor, without the need for synchronization.
- There should be minimal impact on existing code.
What is meant by reordering?
There are a number of cases in which accesses to program variables (object instance fields, class static fields, and array elements) may appear to execute in a different order than was specified by the program. The compiler is free to take liberties with the ordering of instructions in the name of optimization. Processors may execute instructions out of order under certain circumstances. Data may be moved between registers, processor caches, and main memory in different order than specified by the program.
For example, if a thread writes to field a and then to field b, and the value of b does not depend on the value of a, then the compiler is free to reorder these operations, and the cache is free to flush b to main memory before a. There are a number of potential sources of reordering, such as the compiler, the JIT, and the cache.
The compiler, runtime, and hardware are supposed to conspire to create the illusion of as-if-serial semantics, which means that in a single-threaded program, the program should not be able to observe the effects of reorderings. However, reorderings can come into play in incorrectly synchronized multithreaded programs, where one thread is able to observe the effects of other threads, and may be able to detect that variable accesses become visible to other threads in a different order than executed or specified in the program.
Most of the time, one thread doesn't care what the other is doing. But when it does, that's what synchronization is for.
What was wrong with the old memory model?
There were several serious problems with the old memory model. It was difficult to understand, and therefore widely violated. For example, the old model did not, in many cases, allow the kinds of reorderings that took place in every JVM. This confusion about the implications of the old model was what compelled the formation of JSR-133.
One widely held belief, for example, was that if final fields were used, then synchronization between threads was unnecessary to guarantee another thread would see the value of the field. While this is a reasonable assumption and a sensible behavior, and indeed how we would want things to work, under the old memory model, it was simply not true. Nothing in the old memory model treated final fields differently from any other field -- meaning synchronization was the only way to ensure that all threads see the value of a final field that was written by the constructor. As a result, it was possible for a thread to see the default value of the field, and then at some later time see its constructed value. This means, for example, that immutable objects like String can appear to change their value -- a disturbing prospect indeed.
The old memory model allowed for volatile writes to be reordered with nonvolatile reads and writes, which was not consistent with most developers' intuitions about volatile and therefore caused confusion.
Finally, as we shall see, programmers' intuitions about what can occur when their programs are incorrectly synchronized are often mistaken. One of the goals of JSR-133 is to call attention to this fact.
What do you mean by "incorrectly synchronized"?
Incorrectly synchronized code can mean different things to different people. When we talk about incorrectly synchronized code in the context of the Java Memory Model, we mean any code where
- there is a write of a variable by one thread,
- there is a read of the same variable by another thread and
- the write and read are not ordered by synchronization
When these rules are violated, we say we have a data race on that variable. A program with a data race is an incorrectly synchronized program.
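A minimal sketch of such a data race (class and field names are illustrative): two threads write and read the same unsynchronized counter, and increments can be lost because counter++ is a read followed by a write, with no ordering between the threads. The exact final value varies from run to run.

```java
// Sketch of a data race: shared variable, conflicting accesses, no ordering.
class RaceDemo {
    static int counter = 0;   // shared, accessed with no synchronization

    static int race() throws InterruptedException {
        Runnable incr = () -> {
            for (int i = 0; i < 100_000; i++) {
                counter++;    // read-modify-write: updates from the other thread can be lost
            }
        };
        Thread a = new Thread(incr);
        Thread b = new Thread(incr);
        a.start();
        b.start();
        a.join();
        b.join();
        return counter;       // anywhere from 2 up to 200_000
    }
}
```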
What does synchronization do?
Synchronization has several aspects. The most well-understood is mutual exclusion -- only one thread can hold a monitor at once, so synchronizing on a monitor means that once one thread enters a synchronized block protected by a monitor, no other thread can enter a block protected by that monitor until the first thread exits the synchronized block.
But there is more to synchronization than mutual exclusion. Synchronization ensures that memory writes by a thread before or during a synchronized block are made visible in a predictable manner to other threads which synchronize on the same monitor. After we exit a synchronized block, we **release** the monitor, which has the effect of flushing the cache to main memory, so that writes made by this thread can be visible to other threads. Before we can enter a synchronized block, we **acquire** the monitor, which has the effect of invalidating the local processor cache so that variables will be reloaded from main memory. We will then be able to see all of the writes made visible by the previous release.
Discussing this in terms of caches, it may sound as if these issues only affect multiprocessor machines. However, the reordering effects can be easily seen on a single processor. It is not possible, for example, for the compiler to move your code before an acquire or after a release. When we say that acquires and releases act on caches, we are using shorthand for a number of possible effects.
The new memory model semantics create a partial ordering on memory operations (read field, write field, lock, unlock) and other thread operations (start and join), where some actions are said to happen before other operations. When one action happens before another, the first is guaranteed to be ordered before and visible to the second. The rules of this ordering are as follows:
- Each action in a thread happens before every action in that thread that comes later in the program's order.
- An unlock on a monitor happens before every subsequent lock on that same monitor.
- A write to a volatile field happens before every subsequent read of that same volatile.
- A call to start() on a thread happens before any actions in the started thread.
- All actions in a thread happen before any other thread successfully returns from a join() on that thread.
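The last two rules can be sketched as follows (class and field names are illustrative): a write made before start() is visible inside the started thread, and everything the thread did is visible after join() returns.

```java
// Sketch of the start()/join() happens-before rules.
class StartJoinExample {
    static int before = 0;   // written before t.start()
    static int inside = 0;   // written inside the started thread
    static int seen = 0;     // what the started thread observed

    static void demo() throws InterruptedException {
        before = 42;
        Thread t = new Thread(() -> {
            seen = before;   // start() happens-before this action: guaranteed to see 42
            inside = 7;
        });
        t.start();
        t.join();            // all of t's actions happen-before join() returns
        // here, seen is guaranteed to be 42 and inside to be 7, with no other synchronization
    }
}
```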
This means that any memory operations which were visible to a thread before exiting a synchronized block are visible to any thread after it enters a synchronized block protected by the same monitor, since all the memory operations happen before the release, and the release happens before the acquire.
Another implication is that the following pattern, which some people use to force a memory barrier, doesn't work:
synchronized (new Object()) {}
This is actually a no-op, and your compiler can remove it entirely, because the compiler knows that no other thread will synchronize on the same monitor. You have to set up a happens-before relationship for one thread to see the results of another.
Important Note: Note that it is important for both threads to synchronize on the same monitor in order to set up the happens-before relationship properly. It is not the case that everything visible to thread A when it synchronizes on object X becomes visible to thread B after it synchronizes on object Y. The release and acquire have to "match" (i.e., be performed on the same monitor) to have the right semantics. Otherwise, the code has a data race.
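A matching release/acquire pair can be sketched like this (names are illustrative): both methods synchronize on the same lock object, so a get() that runs after a set() is guaranteed to see the written value.

```java
// Sketch: release and acquire on the SAME monitor establish happens-before.
class SharedLockExample {
    private final Object lock = new Object();
    private int value;

    void set(int v) {
        synchronized (lock) {   // acquire lock, write, then release
            value = v;
        }
    }

    int get() {
        synchronized (lock) {   // acquiring the same lock makes the prior write visible
            return value;
        }
    }
}
```

Synchronizing get() on a different object would compile and run, but would establish no happens-before edge with set(), leaving a data race on value.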
How can final fields appear to change their values?
One of the best examples of how final fields' values can be seen to change involves one particular implementation of the String class.
A String can be implemented as an object with three fields -- a character array, an offset into that array, and a length. The rationale for implementing String this way, instead of having only the character array, is that it lets multiple String and StringBuffer objects share the same character array and avoid additional object allocation and copying. So, for example, the method String.substring() can be implemented by creating a new string which shares the same character array with the original String and merely differs in the length and offset fields. For a String, these fields are all final fields.
String s1 = "/usr/tmp";
String s2 = s1.substring(4);
The string s2 will have an offset of 4 and a length of 4. But, under the old model, it was possible for another thread to see the offset as having the default value of 0, and then later see the correct value of 4; it would appear as if the string "/usr" changed to "/tmp".
The original Java Memory Model allowed this behavior; several JVMs have exhibited this behavior. The new Java Memory Model makes this illegal.
How do final fields work under the new JMM?
The values for an object's final fields are set in its constructor. Assuming the object is constructed "correctly", once an object is constructed, the values assigned to the final fields in the constructor will be visible to all other threads without synchronization. In addition, the visible values for any other object or array referenced by those final fields will be at least as up-to-date as the final fields.
What does it mean for an object to be properly constructed? It simply means that no reference to the object being constructed is allowed to "escape" during construction. (See Safe Construction Techniques for examples.) In other words, do not place a reference to the object being constructed anywhere where another thread might be able to see it; do not assign it to a static field, do not register it as a listener with any other object, and so on. These tasks should be done after the constructor completes, not in the constructor.
class FinalFieldExample {
    final int x;
    int y;
    static FinalFieldExample f;

    public FinalFieldExample() {
        x = 3;
        y = 4;
    }

    static void writer() {
        f = new FinalFieldExample();
    }

    static void reader() {
        if (f != null) {
            int i = f.x;
            int j = f.y;
        }
    }
}
The class above is an example of how final fields should be used. A thread executing reader is guaranteed to see the value 3 for f.x, because it is final. It is not guaranteed to see the value 4 for y, because it is not final. If FinalFieldExample's constructor looked like this:
public FinalFieldExample() { // bad!
    x = 3;
    y = 4;
    // bad construction - allowing this to escape
    global.obj = this;
}
then threads that read the reference to this from global.obj are not guaranteed to see 3 for x.
The ability to see the correctly constructed value for the field is nice, but if the field itself is a reference, then you also want your code to see the up-to-date values for the object (or array) to which it points. If your field is a final field, this is also guaranteed. So, you can have a final pointer to an array and not have to worry about other threads seeing the correct values for the array reference, but incorrect values for the contents of the array. Again, by "correct" here, we mean "up to date as of the end of the object's constructor", not "the latest value available".
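This array guarantee can be sketched as follows (class name is illustrative): a final reference to an array whose contents are filled in before the constructor finishes. Any thread that sees a properly published FinalArrayHolder is guaranteed to see those initial contents; mutations made to the array after construction get no such guarantee.

```java
// Sketch: final reference to an array; the contents set in the constructor
// are visible to all threads without synchronization.
class FinalArrayHolder {
    final int[] data;   // final reference; contents frozen as of end of constructor

    FinalArrayHolder() {
        data = new int[] { 1, 2, 3 };   // covered by the final-field guarantee
    }
}
```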
Now, having said all of this, if, after a thread constructs an immutable object (that is, an object that only contains final fields), you want to ensure that it is seen correctly by all of the other threads, you still typically need to use synchronization. There is no other way to ensure, for example, that the reference to the immutable object will be seen by the second thread. The guarantees the program gets from final fields should be carefully tempered with a deep and careful understanding of how concurrency is managed in your code.
There is no defined behavior if you want to use JNI to change final fields.
What does volatile do?
Volatile fields are special fields which are used for communicating state between threads. Each read of a volatile will see the last write to that volatile by any thread; in effect, they are designated by the programmer as fields for which it is never acceptable to see a "stale" value as a result of caching or reordering. The compiler and runtime are prohibited from allocating them in registers. They must also ensure that after they are written, they are flushed out of the cache to main memory, so they can immediately become visible to other threads. Similarly, before a volatile field is read, the cache must be invalidated so that the value in main memory, not the local processor cache, is the one seen. There are also additional restrictions on reordering accesses to volatile variables.
Under the old memory model, accesses to volatile variables could not be reordered with each other, but they could be reordered with nonvolatile variable accesses. This undermined the usefulness of volatile fields as a means of signaling conditions from one thread to another.
Under the new memory model, it is still true that volatile variables cannot be reordered with each other. The difference is that it is now no longer so easy to reorder normal field accesses around them. Writing to a volatile field has the same memory effect as a monitor release, and reading from a volatile field has the same memory effect as a monitor acquire. In effect, because the new memory model places stricter constraints on reordering of volatile field accesses with other field accesses, volatile or not, anything that was visible to thread A when it writes to volatile field f becomes visible to thread B when it reads f.
Here is a simple example of how volatile fields can be used:
class VolatileExample {
    int x = 0;
    volatile boolean v = false;

    public void writer() {
        x = 42;
        v = true;
    }

    public void reader() {
        if (v == true) {
            // uses x - guaranteed to see 42
        }
    }
}
Assume that one thread is calling writer, and another is calling reader. The write to v in writer releases the write to x to memory, and the read of v acquires that value from memory. Thus, if the reader sees the value true for v, it is also guaranteed to see the write of 42 that happened before it. This would not have been true under the old memory model. If v were not volatile, then the compiler could reorder the writes in writer, and reader's read of x might see 0.
Effectively, the semantics of volatile have been strengthened substantially, almost to the level of synchronization. Each read or write of a volatile field acts like "half" a synchronization, for purposes of visibility.
Important Note: Note that it is important for both threads to access the same volatile variable in order to properly set up the happens-before relationship. It is not the case that everything visible to thread A when it writes volatile field f becomes visible to thread B after it reads volatile field g. The release and acquire have to "match" (i.e., be performed on the same volatile field) to have the right semantics.
Does the new memory model fix the "double-checked locking" problem?
The (infamous) double-checked locking idiom (also called the multithreaded singleton pattern) is a trick designed to support lazy initialization while avoiding the overhead of synchronization. In very early JVMs, synchronization was slow, and developers were eager to remove it -- perhaps too eager. The double-checked locking idiom looks like this:
// double-checked-locking - don't do this!
private static Something instance = null;

public Something getInstance() {
    if (instance == null) {
        synchronized (this) {
            if (instance == null)
                instance = new Something();
        }
    }
    return instance;
}
This looks awfully clever -- the synchronization is avoided on the common code path. There's only one problem with it -- it doesn't work. Why not? The most obvious reason is that the writes which initialize instance and the write to the instance field can be reordered by the compiler or the cache, which would have the effect of returning what appears to be a partially constructed Something. The result would be that we read an uninitialized object. There are lots of other reasons why this is wrong, and why algorithmic corrections to it are wrong. There is no way to fix it using the old Java memory model. More in-depth information can be found at Double-checked locking: Clever, but broken and The "Double Checked Locking is broken" declaration.
Many people assumed that the use of the volatile keyword would eliminate the problems that arise when trying to use the double-checked-locking pattern. In JVMs prior to 1.5, volatile would not ensure that it worked (your mileage may vary). Under the new memory model, making the instance field volatile will "fix" the problems with double-checked locking, because then there will be a happens-before relationship between the initialization of the Something by the constructing thread and the return of its value by the thread that reads it.
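A sketch of that volatile fix (class names are illustrative; Something here is just a stand-in with one final field):

```java
// Sketch: double-checked locking made correct under the new JMM via volatile.
class Something {
    final int value = 42;   // set during construction
}

class SafeLazy {
    // volatile is the key change: the write to instance happens-before
    // any read that observes the non-null reference.
    private static volatile Something instance = null;

    static Something getInstance() {
        Something result = instance;   // single volatile read on the fast path
        if (result == null) {
            synchronized (SafeLazy.class) {
                result = instance;
                if (result == null) {
                    instance = result = new Something();
                }
            }
        }
        return result;
    }
}
```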
However, for fans of double-checked locking (and we really hope there are none left), the news is still not good. The whole point of double-checked locking was to avoid the performance overhead of synchronization. Not only has brief synchronization gotten a LOT less expensive since the Java 1.0 days, but under the new memory model, the performance cost of using volatile goes up, almost to the level of the cost of synchronization. So there's still no good reason to use double-checked-locking. *Redacted -- volatiles are cheap on most platforms.*
Instead, use the Initialization On Demand Holder idiom, which is thread-safe and a lot easier to understand:
private static class LazySomethingHolder {
    public static Something something = new Something();
}

public static Something getInstance() {
    return LazySomethingHolder.something;
}
This code is guaranteed to be correct because of the initialization guarantees for static fields; if a field is set in a static initializer, it is guaranteed to be made visible, correctly, to any thread that accesses that class.
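For completeness, here is a self-contained sketch of the holder idiom (Something is again a stand-in class, and the holder field is tightened to static final, a common refinement not shown in the snippet above):

```java
// Sketch: Initialization On Demand Holder idiom, fully assembled.
class Something {
    final int value = 42;
}

class SomethingFactory {
    // The holder class is not initialized until getInstance() first touches it,
    // so construction is lazy; class initialization rules make it thread-safe.
    private static class LazySomethingHolder {
        static final Something something = new Something();
    }

    public static Something getInstance() {
        return LazySomethingHolder.something;
    }
}
```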
What if I'm writing a VM?
You should look at http://gee.cs.oswego.edu/dl/jmm/cookbook.html.
Why should I care?
Why should you care? Concurrency bugs are very difficult to debug. They often don't appear in testing, waiting instead until your program is run under heavy load, and are hard to reproduce and trap. You are much better off spending the extra effort ahead of time to ensure that your program is properly synchronized; while this is not easy, it's a lot easier than trying to debug a badly synchronized application.