Original article: http://mechanitis.blogspot.com/2011/08/dissecting-disruptor-why-its-so-fast.html
My recent slow-down in posting is because I've been trying to write a post explaining memory barriers and their applicability in the Disruptor. The problem is, no matter how much I read and no matter how many times I ask the ever-patient Martin and Mike questions trying to clarify some point, I just don't intuitively grasp the subject. I guess I don't have the deep background knowledge required to fully understand.
So, rather than make an idiot of myself trying to explain something I don't really get, I'm going to try and cover, at an abstract / massive-simplification level, what I do understand in the area. Martin has written a post going into memory barriers in some detail, so hopefully I can get away with skimming the subject.
Disclaimer: any errors in the explanation are completely my own, and no reflection on the implementation of the Disruptor or on the LMAX guys who actually do know about this stuff.
What's the point?
My main aim in this series of blog posts is to explain how the Disruptor works and, to a slightly lesser extent, why. In theory I should be able to provide a bridge between the code and the technical paper by talking about it from the point of view of a developer who might want to use it.
The paper mentioned memory barriers, and I wanted to understand what they were, and how they apply.
What's a Memory Barrier?
It's a CPU instruction. Yes, once again, we're thinking about CPU-level stuff in order to get the performance we need (Martin's famous Mechanical Sympathy). Basically it's an instruction to a) ensure the order in which certain operations are executed and b) influence visibility of some data (which might be the result of executing some instruction).
Compilers and CPUs can re-order instructions, provided the end result is the same, to try and optimise performance. Inserting a memory barrier tells the CPU and the compiler that what happened before that command needs to stay before that command, and what happens after needs to stay after. All similarities to a trip to Vegas are entirely in your own mind.
The other thing a memory barrier does is force an update of the various CPU caches - for example, a write barrier will flush all the data that was written before the barrier out to cache, therefore any other thread that tries to read that data will get the most up-to-date version regardless of which core or which socket it might be executing on.
What's this got to do with Java?
Now I know what you're thinking - this isn't assembler. It's Java.
The magic incantation here is the word volatile (something I felt was never clearly explained in the Java certification). If your field is volatile, the Java Memory Model inserts a write barrier instruction after you write to it, and a read barrier instruction before you read from it.
This means if you write to a volatile field, you know that:
1. Any thread accessing that field after the point at which you wrote to it will get the updated value.
2. Anything you did before you wrote that field is guaranteed to have happened and any updated data values will also be visible, because the memory barrier flushed all earlier writes to the cache.
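The two guarantees in the list above can be seen in a small, self-contained sketch. Nothing here comes from the Disruptor: the class VolatilePublication and its fields are made-up names for illustration, and Thread.onSpinWait() assumes Java 9 or later. One thread writes a plain field and then a volatile flag; the other spins on the flag and then reads the plain field.

```java
// Illustrative sketch only: the class and field names are hypothetical, not Disruptor code.
final class VolatilePublication {
    private int payload;              // plain field, written before the volatile write
    private volatile boolean ready;   // volatile flag that publishes the payload

    void publish(int value) {
        payload = value;   // ordinary write
        ready = true;      // volatile write: everything written before it is visible
                           // to any thread that later reads 'ready' as true
    }

    int awaitPayload() {
        while (!ready) {           // volatile read on every iteration
            Thread.onSpinWait();   // busy-spin hint (Java 9+)
        }
        return payload;            // sees the value written before 'ready = true'
    }

    // Runs one writer and one reader and returns what the reader observed.
    static int demo() {
        VolatilePublication box = new VolatilePublication();
        int[] seen = new int[1];
        Thread reader = new Thread(() -> seen[0] = box.awaitPayload());
        reader.start();
        box.publish(42);
        try {
            reader.join();   // join makes the reader's write to seen[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(demo());   // prints 42
    }
}
```

If ready were not volatile, the reader would be allowed to spin forever or to see a stale payload, which is exactly what the two guarantees rule out.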
The RingBuffer cursor is one of these magic volatile thingies, and it's one of the reasons we can get away with implementing the Disruptor without locking.
A write to the volatile field (cursor) creates a memory barrier which ultimately brings all the caches up to date (or at least invalidates them accordingly). The same applies to a consumer's volatile sequence number. So, if your downstream consumer (C2) sees that an earlier consumer (C1) has reached number 12, when C2 reads entries up to 12 from the ring buffer it will get all updates C1 made to the entries before it updated its sequence number.
Basically everything that happens after C2 gets the updated sequence number (shown in blue above) must occur after everything C1 did to the ring buffer before updating its sequence number (shown in black).
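The C1/C2 relationship described above can be sketched in a few lines. This is a heavy simplification under stated assumptions, not the Disruptor's code: ConsumerChain, the plain long[] standing in for ring buffer entries, and the method names are all hypothetical, and there is no producer or wrapping. C1 updates entries with ordinary writes and then publishes its progress with a single volatile write; C2 only touches entries after it has read that sequence number.

```java
// Illustrative sketch only: a long[] stands in for the ring buffer entries,
// and all names are hypothetical rather than taken from the Disruptor.
final class ConsumerChain {
    private final long[] entries = new long[16];
    private volatile long c1Sequence = -1;   // C1's volatile sequence number

    // C1: update entries 0..upTo with plain writes, then publish progress once.
    void runC1(int upTo) {
        for (int i = 0; i <= upTo; i++) {
            entries[i] = i * 10L;
        }
        c1Sequence = upTo;   // volatile write: the entry updates above come before it
    }

    // C2: wait until C1 has reached 'upTo', then read the entries C1 updated.
    long runC2(int upTo) {
        while (c1Sequence < upTo) {   // volatile read
            Thread.onSpinWait();      // busy-spin hint (Java 9+)
        }
        long sum = 0;
        for (int i = 0; i <= upTo; i++) {
            sum += entries[i];        // sees every update C1 made before its volatile write
        }
        return sum;
    }

    // Runs C2 on its own thread and C1 on the calling thread.
    static long demo() {
        ConsumerChain chain = new ConsumerChain();
        long[] result = new long[1];
        Thread c2 = new Thread(() -> result[0] = chain.runC2(12));
        c2.start();
        chain.runC1(12);
        try {
            c2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return result[0];
    }

    public static void main(String[] args) {
        System.out.println(demo());   // prints 780, i.e. 10 * (0 + 1 + ... + 12)
    }
}
```

No lock is taken anywhere: the only coordination between the two threads is the volatile sequence number.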
Impact on performance
Memory barriers, being another CPU-level instruction, don't have the same cost as locks - the kernel isn't interfering and arbitrating between multiple threads. But nothing comes for free. Memory barriers do have a cost - the compiler/CPU cannot re-order instructions, which could potentially lead to not using the CPU as efficiently as possible, and refreshing the caches obviously has a performance impact. So don't think that using volatile instead of locking will let you get away scot-free.
You'll notice that the Disruptor implementation tries to read from and write to the sequence number as infrequently as possible. Every read or write of a volatile field is a relatively costly operation. However, recognising this also plays in quite nicely with batching behaviour - if you know you shouldn't read from or write to the sequences too frequently, it makes sense to grab a whole batch of Entries and process them before updating the sequence number, both on the Producer and Consumer side. Here's an example from BatchConsumer:
    long nextSequence = sequence + 1;   // one read of the volatile sequence field
    while (running)
    {
        try
        {
            // Ask the barrier which sequence is available (at least nextSequence).
            final long availableSequence = consumerBarrier.waitFor(nextSequence);
            // Work through the whole batch using only the local counter.
            while (nextSequence <= availableSequence)
            {
                entry = consumerBarrier.getEntry(nextSequence);
                handler.onAvailable(entry);
                nextSequence++;
            }
            handler.onEndOfBatch();
            sequence = entry.getSequence();   // one write of the volatile sequence field per batch
        }
        ...
        catch (final Exception ex)
        {
            exceptionHandler.handle(ex, entry);
            sequence = entry.getSequence();   // write of the volatile sequence field
            nextSequence = entry.getSequence() + 1;
        }
    }
(You'll note this is the "old" code and naming conventions. Because this is in line with my previous blog posts, I thought it was slightly less confusing than switching straight to the new conventions.)
In the code above, we use a local variable to increment during our loop over the entries the consumer is processing. This means we read from and write to the volatile sequence field (shown in bold) as infrequently as we can get away with.
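To put a number on "as infrequently as we can get away with", here is a rough single-threaded sketch that counts writes to a volatile sequence field under two strategies. It is not BatchConsumer: the class BatchingSketch, its methods, and the write counter are hypothetical and exist only for the comparison, and the entry handling is left as a comment.

```java
// Illustrative sketch only: counts volatile writes, it does not measure time.
// All names are hypothetical; the real BatchConsumer does far more than this.
final class BatchingSketch {
    private volatile long sequence = -1;   // the consumer's volatile sequence number
    private int volatileWrites;            // bookkeeping for the demo, single-threaded

    private void setSequence(long value) {
        sequence = value;   // volatile write
        volatileWrites++;
    }

    // Naive approach: publish progress after every single entry.
    int processOneByOne(long availableSequence) {
        volatileWrites = 0;
        long next = sequence + 1;          // one volatile read
        while (next <= availableSequence) {
            // ... handle the entry at 'next' here ...
            setSequence(next);             // one volatile write per entry
            next++;
        }
        return volatileWrites;
    }

    // Batching approach: track progress in a local, publish once at the end.
    int processAsBatch(long availableSequence) {
        volatileWrites = 0;
        long next = sequence + 1;          // one volatile read
        while (next <= availableSequence) {
            // ... handle the entry at 'next' here ...
            next++;                        // local variable only
        }
        setSequence(next - 1);             // one volatile write for the whole batch
        return volatileWrites;
    }

    public static void main(String[] args) {
        // Entries 0..99 are available in both runs.
        System.out.println(new BatchingSketch().processOneByOne(99));   // prints 100
        System.out.println(new BatchingSketch().processAsBatch(99));    // prints 1
    }
}
```

The trade-off is that downstream readers of the sequence only see progress at the end of each batch rather than entry by entry.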
In Summary
Memory barriers are CPU instructions that allow you to make certain assumptions about when data will be visible to other processes. In Java, you implement them with the volatile keyword. Using volatile means you don't necessarily have to add locks willy nilly, and it will give you performance improvements over using locks. However you need to think a little more carefully about your design, in particular how frequently you use volatile fields, and how frequently you read and write them.
PS Given that the New World Order in the Disruptor uses totally different naming conventions now to everything I've blogged about so far, I guess the next post is mapping the old world to the new one.