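Preface
Health is the capital of the revolution: I was unwell for two weeks and am feeling better now.
When learning JDK 8 Streams, the Spliterator (splittable iterator) is something you have to take seriously.
Notes: the blue text below is my own translation (please point out any problems). The black text is the source documentation. The red text is my own notes.
Spliterator class source documentation
An object for traversing and partitioning the elements of a data source. The source covered by a spliterator can be an array, a collection, an IO channel, or a generator function.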
An object for traversing and partitioning elements of a source.
The source of elements covered by a Spliterator could be, for example, an array, a Collection, an IO channel, or a generator function.
A spliterator can traverse elements one at a time (tryAdvance()) or sequentially in bulk (forEachRemaining()).
A Spliterator may traverse elements individually (tryAdvance()) or sequentially in bulk (forEachRemaining()).
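The two traversal styles can be tried on any collection. The following is a minimal sketch, not code from the source documentation: `TraversalDemo` and `traverse` are hypothetical names, and an ordinary `List` is assumed as the source. It takes one element with `tryAdvance()`, drains the rest with `forEachRemaining()`, and then shows that an exhausted spliterator reports `false`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Spliterator;

// Hypothetical demo class; not part of the JDK or the source documentation.
class TraversalDemo {
    static List<String> traverse(List<String> source) {
        List<String> log = new ArrayList<>();
        Spliterator<String> sp = source.spliterator();

        // Individual traversal: consumes at most one element, returns false if none is left.
        boolean advanced = sp.tryAdvance(e -> log.add("single:" + e));
        log.add("advanced=" + advanced);

        // Bulk traversal: consumes every remaining element sequentially.
        sp.forEachRemaining(e -> log.add("bulk:" + e));

        // The spliterator is now exhausted, so the action is not invoked again.
        boolean more = sp.tryAdvance(e -> log.add("unexpected:" + e));
        log.add("more=" + more);
        return log;
    }

    public static void main(String[] args) {
        // prints [single:a, advanced=true, bulk:b, bulk:c, more=false]
        System.out.println(traverse(Arrays.asList("a", "b", "c")));
    }
}
```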
A spliterator can also partition off some of its elements (using trySplit()) as another spliterator, for use in possibly parallel operations.
A Spliterator may also partition off some of its elements (using trySplit()) as another Spliterator, to be used in possibly-parallel operations.
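Operations that use a spliterator which cannot be split, or which splits in a highly unbalanced or inefficient way, are unlikely to benefit from parallelism.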
Operations using a Spliterator that cannot split, or does so in a highly imbalanced or inefficient manner, are unlikely to benefit from parallelism.
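To make the effect of `trySplit()` concrete, here is a small sketch (the class `SplitDemo` and its helpers are hypothetical names, and an `ArrayList` is assumed as the source). It splits once and drains both halves. How evenly a given spliterator splits is an implementation detail, so the sketch only relies on the two parts together covering the whole source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Spliterator;

// Hypothetical demo class; not part of the JDK or the source documentation.
class SplitDemo {
    // Collects everything a spliterator still covers; a null spliterator covers nothing.
    static <T> List<T> drain(Spliterator<T> sp) {
        List<T> out = new ArrayList<>();
        if (sp != null) {
            sp.forEachRemaining(out::add);
        }
        return out;
    }

    // Splits once and returns [elements of the split-off part, elements left in the original].
    static <T> List<List<T>> splitOnce(List<T> source) {
        Spliterator<T> rest = new ArrayList<>(source).spliterator();
        Spliterator<T> prefix = rest.trySplit(); // null if the spliterator cannot split
        List<List<T>> parts = new ArrayList<>();
        parts.add(drain(prefix));
        parts.add(drain(rest));
        return parts;
    }

    public static void main(String[] args) {
        List<List<Integer>> parts = splitOnce(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8));
        System.out.println("split off: " + parts.get(0) + ", remaining: " + parts.get(1));
    }
}
```

In a parallel computation the two parts would be handed to different tasks rather than drained one after the other on the same thread.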
Traversal and splitting consume elements; each spliterator is good for only a single bulk computation.
Traversal and splitting exhaust elements; each Spliterator is useful for only a single bulk computation.
A spliterator also reports a set of characteristics (characteristics()) that describe its structure, source, and elements. There are currently eight: ORDERED, DISTINCT, SORTED, SIZED, NONNULL, IMMUTABLE, CONCURRENT, and SUBSIZED.
A Spliterator also reports a set of characteristics() of its structure, source, and elements from among ORDERED, DISTINCT, SORTED, SIZED, NONNULL, IMMUTABLE, CONCURRENT, and SUBSIZED.
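Spliterator clients may use the reported characteristics to control, specialize, or simplify computation. For example, a spliterator for a Collection would report SIZED; a spliterator for a Set would report DISTINCT; a spliterator for a SortedSet would also report SORTED.
A special note on SORTED versus ORDERED: they do not mean the same thing. SORTED means the elements follow a defined sort order (for example, sorted by age); ORDERED means the source has a defined encounter order (for example, an ArrayList).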
These may be employed by Spliterator clients to control, specialize or simplify computation. For example, a Spliterator for a Collection would report SIZED, a Spliterator for a Set would report DISTINCT, and a Spliterator for a SortedSet would also report SORTED.
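Characteristics are represented as a set of bits. Some characteristics additionally constrain method behavior; for example, if ORDERED is reported, traversal methods must conform to their documented ordering.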
Characteristics are reported as a simple unioned bit set. Some characteristics additionally constrain method behavior; for example if ORDERED, traversal methods must conform to their documented ordering.
New characteristics may be defined in the future, so implementors should not assign meanings to values that are not listed.
New characteristics may be defined in the future, so implementors should not assign meanings to unlisted values.
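Because the characteristics are a bit set, they can be tested with `hasCharacteristics()`. The sketch below is an illustration only (`CharacteristicsDemo` and `describe` are hypothetical names); it prints the characteristics reported by spliterators of an `ArrayList`, a `HashSet`, and a `TreeSet`, which also makes the difference between ORDERED and SORTED visible.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Spliterator;
import java.util.StringJoiner;
import java.util.TreeSet;

// Hypothetical demo class; not part of the JDK or the source documentation.
class CharacteristicsDemo {
    // Decodes the bit set into the names of the eight currently defined characteristics.
    static String describe(Spliterator<?> sp) {
        StringJoiner names = new StringJoiner("|");
        if (sp.hasCharacteristics(Spliterator.ORDERED)) names.add("ORDERED");
        if (sp.hasCharacteristics(Spliterator.DISTINCT)) names.add("DISTINCT");
        if (sp.hasCharacteristics(Spliterator.SORTED)) names.add("SORTED");
        if (sp.hasCharacteristics(Spliterator.SIZED)) names.add("SIZED");
        if (sp.hasCharacteristics(Spliterator.NONNULL)) names.add("NONNULL");
        if (sp.hasCharacteristics(Spliterator.IMMUTABLE)) names.add("IMMUTABLE");
        if (sp.hasCharacteristics(Spliterator.CONCURRENT)) names.add("CONCURRENT");
        if (sp.hasCharacteristics(Spliterator.SUBSIZED)) names.add("SUBSIZED");
        return names.toString();
    }

    public static void main(String[] args) {
        System.out.println("ArrayList: " + describe(new ArrayList<>(Arrays.asList(3, 1, 2)).spliterator()));
        System.out.println("HashSet:   " + describe(new HashSet<>(Arrays.asList(3, 1, 2)).spliterator()));
        System.out.println("TreeSet:   " + describe(new TreeSet<>(Arrays.asList(3, 1, 2)).spliterator()));
    }
}
```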
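A spliterator that reports neither IMMUTABLE nor CONCURRENT is expected to have a documented policy covering two things: (1) when the spliterator binds to the element source, and (2) how structural interference with the source is detected after binding.
Structural interference means that the source is structurally modified, for example by adding, replacing, or removing elements.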
A Spliterator that does not report IMMUTABLE or CONCURRENT is expected to have a documented policy concerning: when the spliterator binds to the element source; and detection of structural interference of the element source detected after binding.
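A late-binding spliterator binds to the source at the point of first traversal, first split, or first query for the estimated size, rather than at the time the spliterator is created.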
A late-binding Spliterator binds to the source of elements at the point of first traversal, first split, or first query for estimated size, rather than at the time the Spliterator is created.
A spliterator that is not late-binding binds to the source at construction or when any of its methods is first invoked.
A Spliterator that is not late-binding binds to the source of elements at the point of construction or first invocation of any method.
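Modifications made to the source before binding are reflected when the spliterator is traversed. After binding, a spliterator should throw ConcurrentModificationException, on a best-effort basis, if it detects structural interference. Spliterators that behave this way are called fail-fast.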
Modifications made to the source prior to binding are reflected when the Spliterator is traversed. After binding a Spliterator should, on a best-effort basis, throw ConcurrentModificationException if structural interference is detected. Spliterators that do this are called fail-fast.
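The bulk traversal method (forEachRemaining()) may optimize traversal and check for structural interference only after all elements have been traversed, rather than checking element by element and failing immediately.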
The bulk traversal method (forEachRemaining()) of a Spliterator may optimize traversal and check for structural interference after all elements have been traversed, rather than checking per-element and failing immediately.
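Late binding and fail-fast behavior can be observed with an `ArrayList`, whose spliterator is documented as late-binding and fail-fast. This is a sketch under that assumption (`BindingDemo` and its method names are hypothetical). Since detection is best-effort by specification, the second method shows what typically happens in a single thread rather than a guarantee.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;
import java.util.Spliterator;

// Hypothetical demo class; not part of the JDK or the source documentation.
class BindingDemo {
    // A modification made before binding is reflected in the traversal.
    static int elementsSeenAfterEarlyAdd() {
        List<Integer> source = new ArrayList<>(Arrays.asList(1, 2, 3));
        Spliterator<Integer> sp = source.spliterator(); // created, but not yet bound
        source.add(4);                                  // happens before binding
        List<Integer> seen = new ArrayList<>();
        sp.forEachRemaining(seen::add);                 // binds here, then traverses
        return seen.size();
    }

    // A structural modification after binding is reported on a best-effort basis.
    static boolean interferenceDetected() {
        List<Integer> source = new ArrayList<>(Arrays.asList(1, 2, 3));
        Spliterator<Integer> sp = source.spliterator();
        sp.tryAdvance(e -> { });                        // first traversal binds the spliterator
        source.add(4);                                  // structural interference after binding
        try {
            sp.forEachRemaining(e -> { });              // may check only after the bulk traversal
            return false;
        } catch (ConcurrentModificationException ex) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("elements seen: " + elementsSeenAfterEarlyAdd());
        System.out.println("interference detected: " + interferenceDetected());
    }
}
```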
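A spliterator can provide an estimate of the number of remaining elements via estimateSize(). Ideally, as reflected in the SIZED characteristic, this value corresponds exactly to the number of elements that a successful traversal would encounter.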
Spliterators can provide an estimate of the number of remaining elements via the estimateSize() method. Ideally, as reflected in characteristic SIZED, this value corresponds exactly to the number of elements that would be encountered in a successful traversal.
However, even when it is not exactly known, an estimated value can still be useful to operations performed on the source, for example in deciding whether it is preferable to split further or to traverse the remaining elements sequentially.
However, even when not exactly known, an estimated value may still be useful to operations being performed on the source, such as helping to determine whether it is preferable to split further or traverse the remaining elements sequentially.
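The difference between an exact size and a mere estimate can be seen by comparing a SIZED spliterator with one built from a plain iterator. This is an illustrative sketch (`EstimateDemo` and `sizes` are hypothetical names); it relies on `Spliterators.spliteratorUnknownSize` and on `estimateSize()` returning `Long.MAX_VALUE` when the size is unknown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.Spliterator;
import java.util.Spliterators;

// Hypothetical demo class; not part of the JDK or the source documentation.
class EstimateDemo {
    // Returns {sized before traversal, sized after one element, unsized estimate, unsized exact size}.
    static long[] sizes() {
        Spliterator<Integer> sized = new ArrayList<>(Arrays.asList(1, 2, 3, 4, 5)).spliterator();
        long before = sized.estimateSize();   // exact, because the spliterator reports SIZED
        sized.tryAdvance(e -> { });           // consume one element
        long after = sized.estimateSize();    // one fewer element remains

        Iterator<Integer> it = Arrays.asList(1, 2, 3).iterator();
        Spliterator<Integer> unsized = Spliterators.spliteratorUnknownSize(it, Spliterator.ORDERED);
        // Without SIZED the estimate is Long.MAX_VALUE and the exact size is reported as -1.
        return new long[] { before, after, unsized.estimateSize(), unsized.getExactSizeIfKnown() };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(sizes()));
    }
}
```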
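Despite their obvious usefulness in parallel algorithms, spliterators are not expected to be thread-safe; instead, a parallel algorithm implemented with spliterators should make sure that each spliterator is used by only one thread at a time. This is usually easy to achieve through serial thread confinement, which is often a natural consequence of typical parallel algorithms that work by recursive decomposition.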
Despite their obvious utility in parallel algorithms, spliterators are not expected to be thread-safe; instead, implementations of parallel algorithms using spliterators should ensure that the spliterator is only used by one thread at a time. This is generally easy to attain via serial thread-confinement, which often is a natural consequence of typical parallel algorithms that work by recursive decomposition.
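The thread that calls trySplit() may hand the returned Spliterator over to another thread, which in turn may traverse it or split it further. If two or more threads operate concurrently on the same spliterator, the behavior of splitting and traversal is undefined. If the original thread hands a spliterator to another thread for processing, the handoff is best done before tryAdvance() consumes any elements, because certain guarantees (such as the accuracy of estimateSize() for SIZED spliterators) hold only before traversal has begun.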
A thread calling trySplit() may hand over the returned Spliterator to another thread, which in turn may traverse or further split that Spliterator. The behaviour of splitting and traversal is undefined if two or more threads operate concurrently on the same spliterator. If the original thread hands a spliterator off to another thread for processing, it is best if that handoff occurs before any elements are consumed with tryAdvance(), as certain guarantees (such as the accuracy of estimateSize() for SIZED spliterators) are only valid before traversal has begun.
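Primitive specializations of Spliterator are provided for int, long, and double values. In these subinterfaces, the default implementations of tryAdvance(java.util.function.Consumer) and forEachRemaining(java.util.function.Consumer) box primitive values into instances of their corresponding wrapper classes. Such boxing may undermine any performance advantage gained from using the primitive specializations.
Here, "the subinterfaces of Spliterator" refers to the nested interfaces declared in Spliterator.java: OfDouble, OfInt, OfLong, and OfPrimitive.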
Primitive subtype specializations of Spliterator are provided for int, long, and double values. The subtype default implementations of tryAdvance(java.util.function.Consumer) and forEachRemaining(java.util.function.Consumer) box primitive values to instances of their corresponding wrapper class. Such boxing may undermine any performance advantages gained by using the primitive specializations.
To avoid boxing, use the corresponding primitive-based methods. For example, Spliterator.OfInt.tryAdvance(java.util.function.IntConsumer) and Spliterator.OfInt.forEachRemaining(java.util.function.IntConsumer) should be preferred over Spliterator.OfInt.tryAdvance(java.util.function.Consumer) and Spliterator.OfInt.forEachRemaining(java.util.function.Consumer).
To avoid boxing, the corresponding primitive-based methods should be used. For example, Spliterator.OfInt.tryAdvance(java.util.function.IntConsumer) and Spliterator.OfInt.forEachRemaining(java.util.function.IntConsumer) should be used in preference to Spliterator.OfInt.tryAdvance(java.util.function.Consumer) and Spliterator.OfInt.forEachRemaining(java.util.function.Consumer).
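Traversing primitive values with the boxing-based methods tryAdvance() and forEachRemaining() does not change the order in which the values, converted to boxed values, are encountered.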
Traversal of primitive values using boxing-based methods tryAdvance() and forEachRemaining() does not affect the order in which the values, transformed to boxed values, are encountered.
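The primitive-based overloads can be selected by passing an `IntConsumer` explicitly. The following sketch is illustrative only (`PrimitiveDemo` and `sum` are hypothetical names); it sums an `int[]` through a `Spliterator.OfInt` without boxing. Declaring the consumer as an `IntConsumer` variable also sidesteps the overload ambiguity that an untyped lambda can run into.

```java
import java.util.Arrays;
import java.util.Spliterator;
import java.util.function.IntConsumer;

// Hypothetical demo class; not part of the JDK or the source documentation.
class PrimitiveDemo {
    static long sum(int[] values) {
        Spliterator.OfInt sp = Arrays.spliterator(values);
        long[] total = { 0 };
        // Typed as IntConsumer so that the primitive overloads are chosen and no Integer is created.
        IntConsumer add = v -> total[0] += v;
        sp.tryAdvance(add);        // Spliterator.OfInt.tryAdvance(IntConsumer)
        sp.forEachRemaining(add);  // Spliterator.OfInt.forEachRemaining(IntConsumer)
        return total[0];
    }

    public static void main(String[] args) {
        System.out.println(sum(new int[] { 1, 2, 3, 4 })); // prints 10
    }
}
```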
API note:
API Note:
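Like Iterators, Spliterators are for traversing the elements of a source. The Spliterator API was designed to support efficient parallel traversal in addition to sequential traversal, by supporting decomposition (splitting) as well as single-element iteration. In addition, the protocol for accessing elements through a Spliterator is designed to impose less per-element overhead than Iterator, and to avoid the inherent race involved in having separate hasNext() and next() methods.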
Spliterators, like Iterators, are for traversing the elements of a source. The Spliterator API was designed to support efficient parallel traversal in addition to sequential traversal, by supporting decomposition as well as single-element iteration. In addition, the protocol for accessing elements via a Spliterator is designed to impose smaller per-element overhead than Iterator, and to avoid the inherent race involved in having separate methods for hasNext() and next().
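For mutable sources, arbitrary and non-deterministic behavior may occur if the source is structurally interfered with (elements added, replaced, or removed) between the time the Spliterator binds to its data source and the end of traversal. For example, when the java.util.stream framework is used, such interference will produce arbitrary, non-deterministic results.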
For mutable sources, arbitrary and non-deterministic behavior may occur if the source is structurally interfered with (elements added, replaced, or removed) between the time that the Spliterator binds to its data source and the end of traversal. For example, such interference will produce arbitrary, non-deterministic results when using the java.util.stream framework.
Structural interference with a source can be managed in the following ways (in roughly decreasing order of desirability):
Structural interference of a source can be managed in the following ways (in approximate order of decreasing desirability):
- The source cannot be interfered with structurally.
  For example, an instance of CopyOnWriteArrayList is an immutable source. A Spliterator created from this source reports the IMMUTABLE characteristic.
  The source cannot be structurally interfered with.
  For example, an instance of CopyOnWriteArrayList is an immutable source. A Spliterator created from the source reports a characteristic of IMMUTABLE.
- The source object itself manages concurrent modifications.
  For example, the key set of a ConcurrentHashMap is a concurrent source. A Spliterator created from this source reports the CONCURRENT characteristic.
  The source manages concurrent modifications.
  For example, a key set of a ConcurrentHashMap is a concurrent source. A Spliterator created from the source reports a characteristic of CONCURRENT.
- The mutable source provides a late-binding, fail-fast Spliterator.
  Late binding narrows the window in which interference can affect the computation; fail-fast detects, on a best-effort basis, that structural interference has occurred after traversal began and throws ConcurrentModificationException. For example, ArrayList and many other non-concurrent collection classes in the JDK provide a late-binding, fail-fast spliterator.
  The mutable source provides a late-binding and fail-fast Spliterator.
  Late binding narrows the window during which interference can affect the calculation; fail-fast detects, on a best-effort basis, that structural interference has occurred after traversal has commenced and throws ConcurrentModificationException. For example, ArrayList, and many other non-concurrent Collection classes in the JDK, provide a late-binding, fail-fast spliterator.
- The mutable source provides a Spliterator that is not late-binding but is fail-fast.
  Such a source is more likely to throw ConcurrentModificationException, because the window of potential interference is larger.
  The mutable source provides a non-late-binding but fail-fast Spliterator.
  The source increases the likelihood of throwing ConcurrentModificationException since the window of potential interference is larger.
- The mutable source provides a Spliterator that is late-binding but not fail-fast.
  Because interference is not detected, the source risks arbitrary, non-deterministic behavior once traversal has begun.
  The mutable source provides a late-binding and non-fail-fast Spliterator.
  The source risks arbitrary, non-deterministic behavior after traversal has commenced since interference is not detected.
- The mutable source provides a Spliterator that is neither late-binding nor fail-fast.
  The source increases the risk of arbitrary, non-deterministic behavior, because undetected interference may occur at any time after the Spliterator is constructed.
  The mutable source provides a non-late-binding and non-fail-fast Spliterator.
  The source increases the risk of arbitrary, non-deterministic behavior since non-detected interference may occur after construction.
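The first two policies in the list above can be checked directly from the reported characteristics. The sketch below is an illustration with hypothetical names (`InterferencePolicyDemo`, `snapshotCount`); it assumes the JDK's `CopyOnWriteArrayList`, `ConcurrentHashMap`, and `ArrayList` as sources. It also shows that a `CopyOnWriteArrayList` spliterator traverses a snapshot, so a later `add` is not seen.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Spliterator;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical demo class; not part of the JDK or the source documentation.
class InterferencePolicyDemo {
    // Counts the elements seen by a spliterator created before the list was modified.
    static int snapshotCount() {
        List<String> cow = new CopyOnWriteArrayList<>(Arrays.asList("a", "b"));
        Spliterator<String> sp = cow.spliterator(); // covers a snapshot of the list
        cow.add("c");                               // not visible to the snapshot
        int[] count = { 0 };
        sp.forEachRemaining(e -> count[0]++);
        return count[0];
    }

    public static void main(String[] args) {
        Spliterator<String> cow = new CopyOnWriteArrayList<String>().spliterator();
        Spliterator<String> keys = new ConcurrentHashMap<String, Integer>().keySet().spliterator();
        Spliterator<String> plain = new ArrayList<String>().spliterator();

        System.out.println("CopyOnWriteArrayList IMMUTABLE: " + cow.hasCharacteristics(Spliterator.IMMUTABLE));
        System.out.println("ConcurrentHashMap keys CONCURRENT: " + keys.hasCharacteristics(Spliterator.CONCURRENT));
        System.out.println("ArrayList IMMUTABLE or CONCURRENT: "
                + (plain.hasCharacteristics(Spliterator.IMMUTABLE)
                        || plain.hasCharacteristics(Spliterator.CONCURRENT)));
        System.out.println("elements seen by the snapshot: " + snapshotCount());
    }
}
```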
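Example: here is a class (of no real use beyond illustration) that maintains an array in which the actual data are held at even positions and unrelated tag data are held at odd positions. Its Spliterator ignores the tag data.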
Example. Here is a class (not a very useful one, except for illustration) that maintains an array in which the actual data are held in even locations, and unrelated tag data are held in odd locations. Its Spliterator ignores the tags.
import java.util.Spliterator;
import java.util.function.Consumer;

class TaggedArray<T> {
    private final Object[] elements; // immutable after construction

    TaggedArray(T[] data, Object[] tags) {
        int size = data.length;
        if (tags.length != size) throw new IllegalArgumentException();
        this.elements = new Object[2 * size];
        for (int i = 0, j = 0; i < size; ++i) {
            elements[j++] = data[i];
            elements[j++] = tags[i];
        }
    }

    public Spliterator<T> spliterator() {
        return new TaggedArraySpliterator<>(elements, 0, elements.length);
    }

    static class TaggedArraySpliterator<T> implements Spliterator<T> {
        private final Object[] array;
        private int origin; // current index, advanced on split or traversal
        private final int fence; // one past the greatest index

        TaggedArraySpliterator(Object[] array, int origin, int fence) {
            this.array = array; this.origin = origin; this.fence = fence;
        }

        @SuppressWarnings("unchecked") // even slots only ever hold T values
        public void forEachRemaining(Consumer<? super T> action) {
            for (; origin < fence; origin += 2)
                action.accept((T) array[origin]);
        }

        @SuppressWarnings("unchecked") // even slots only ever hold T values
        public boolean tryAdvance(Consumer<? super T> action) {
            if (origin < fence) {
                action.accept((T) array[origin]);
                origin += 2;
                return true;
            }
            else // cannot advance
                return false;
        }

        public Spliterator<T> trySplit() {
            int lo = origin; // divide range in half
            int mid = ((lo + fence) >>> 1) & ~1; // force midpoint to be even
            if (lo < mid) { // split out left half
                origin = mid; // reset this Spliterator's origin
                return new TaggedArraySpliterator<>(array, lo, mid);
            }
            else // too small to split
                return null;
        }

        public long estimateSize() {
            return (long)((fence - origin) / 2);
        }

        public int characteristics() {
            return ORDERED | SIZED | IMMUTABLE | SUBSIZED;
        }
    }
}
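As an example of how a parallel computation framework such as the java.util.stream package would use a Spliterator in a parallel computation, here is one way to implement an associated parallel forEach. It illustrates the main idiom of splitting off subtasks until the estimated amount of work is small enough to perform sequentially.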
As an example how a parallel computation framework, such as the java.util.stream package, would use Spliterator in a parallel computation, here is one way to implement an associated parallel forEach, that illustrates the primary usage idiom of splitting off subtasks until the estimated amount of work is small enough to perform sequentially.
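Here we assume that the order in which subtasks are processed does not matter; different (forked) tasks may split further and process elements concurrently in an undetermined order. This example uses a CountedCompleter; similar usage applies to other parallel task constructions.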
Here we assume that the order of processing across subtasks doesn't matter; different (forked) tasks may further split and process elements concurrently in undetermined order. This example uses a CountedCompleter; similar usages apply to other parallel task constructions.
// Fragment: these members belong inside an enclosing class and need the following imports:
// java.util.Spliterator, java.util.concurrent.CountedCompleter,
// java.util.concurrent.ForkJoinPool, java.util.function.Consumer

static <T> void parEach(TaggedArray<T> a, Consumer<T> action) {
    Spliterator<T> s = a.spliterator();
    long targetBatchSize = s.estimateSize() / (ForkJoinPool.getCommonPoolParallelism() * 8);
    new ParEach<>(null, s, action, targetBatchSize).invoke();
}

static class ParEach<T> extends CountedCompleter<Void> {
    final Spliterator<T> spliterator;
    final Consumer<T> action;
    final long targetBatchSize;

    ParEach(ParEach<T> parent, Spliterator<T> spliterator,
            Consumer<T> action, long targetBatchSize) {
        super(parent);
        this.spliterator = spliterator; this.action = action;
        this.targetBatchSize = targetBatchSize;
    }

    public void compute() {
        Spliterator<T> sub;
        // keep splitting off and forking subtasks while the remaining work is large
        while (spliterator.estimateSize() > targetBatchSize &&
               (sub = spliterator.trySplit()) != null) {
            addToPendingCount(1);
            new ParEach<>(this, sub, action, targetBatchSize).fork();
        }
        // small enough: process what is left sequentially in this task
        spliterator.forEachRemaining(action);
        propagateCompletion();
    }
}
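For comparison, the java.util.stream package can drive the same split-then-traverse idiom for any Spliterator through `StreamSupport.stream`. The sketch below is not the documentation's `parEach`; it is a hypothetical alternative (`StreamSupportDemo` and `parallelSum` are made-up names) that uses an `ArrayList` spliterator as a stand-in source and a `LongAdder` so the action is safe to call from several threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Spliterator;
import java.util.concurrent.atomic.LongAdder;
import java.util.stream.StreamSupport;

// Hypothetical demo class; not part of the JDK or the source documentation.
class StreamSupportDemo {
    static long parallelSum(List<Integer> source) {
        LongAdder total = new LongAdder();   // thread-safe accumulator for the parallel action
        Spliterator<Integer> sp = source.spliterator();
        // The second argument requests a parallel stream; the framework decides how to split.
        StreamSupport.stream(sp, true).forEach(v -> total.add(v));
        return total.sum();
    }

    public static void main(String[] args) {
        List<Integer> numbers = new ArrayList<>();
        for (int i = 1; i <= 100; i++) {
            numbers.add(i);
        }
        System.out.println(parallelSum(numbers)); // prints 5050
    }
}
```

As in the CountedCompleter version, the order in which elements reach the action is not determined, which is why the action only performs an order-independent accumulation.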
Implementation note:
Implementation Note:
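If the boolean system property org.openjdk.java.util.stream.tripwire is set to true, diagnostic warnings are reported whenever primitive values are boxed while operating on the primitive subtype specializations.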
If the boolean system property org.openjdk.java.util.stream.tripwire is set to true then diagnostic warnings are reported if boxing of primitive values occur when operating on primitive subtype specializations.