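Java 8: the difference between iterating with stream and parallelStream
As most Java developers know, the Java 8 API added a new abstraction called Stream, which lets you process data declaratively. A Stream offers a high-level abstraction for operating on and expressing Java collections, in an intuitive style that resembles querying a database with SQL. Used well, the Stream API can noticeably raise a Java programmer's productivity and helps produce efficient, clean, concise code. In this style, the collection of elements to be processed is treated as a stream. The stream travels through a pipeline and can be processed at each stage of it, for example by filtering, sorting, or aggregating. The elements pass through intermediate operations in the pipeline, and a terminal operation finally produces the result of all the preceding processing.
Looking at the API, we can see that Java 8 gives collections two different methods, stream() and parallelStream(). Both produce a stream, so how do they differ? Let's start with the following code: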
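To make the pipeline idea concrete, here is a minimal sketch of a stream with three intermediate operations and one terminal operation. The class name PipelineSketch and the method squaresOfEvensSorted are made up for this illustration and do not come from the original code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical class name, used only for this illustration.
class PipelineSketch {

    // filter, sorted and map are intermediate operations: they are lazy and
    // only describe the pipeline. collect is the terminal operation that
    // actually pulls the elements through it and produces the result.
    static List<Integer> squaresOfEvensSorted(List<Integer> numbers) {
        return numbers.stream()
                .filter(n -> n % 2 == 0)        // keep the even numbers
                .sorted()                       // sort in natural order
                .map(n -> n * n)                // square each element
                .collect(Collectors.toList());  // terminal operation
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(5, 2, 8, 1, 4);
        System.out.println(squaresOfEvensSorted(numbers)); // prints [4, 16, 64]
    }
}
```

Nothing is computed until collect is called, which is why the terminal operation is described as producing the result of everything before it.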
// Needs java.util.Arrays and java.util.List; place the method inside any class.
public static void main(String[] args) {
    List<Integer> numberList = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9);
    System.out.println("Output:");
    // stream().forEach: sequential iteration
    numberList.stream().forEach(number -> {
        System.out.print(String.format("%d ", number));
    });
    System.out.println(" ");
    // parallelStream().forEach: parallel iteration, no ordering guarantee
    numberList.parallelStream().forEach(number -> {
        System.out.print(String.format("%d ", number));
    });
    System.out.println(" ");
    // parallelStream().forEachOrdered: parallel stream, action applied in encounter order
    numberList.parallelStream().forEachOrdered(number -> {
        System.out.print(String.format("%d ", number));
    });
    System.out.println(" ");
}
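Running this code several times shows that the output order of parallelStream().forEach can differ from run to run, whereas stream().forEach and parallelStream().forEachOrdered print the same result every time, in the order in which the elements are stored in the list.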
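A related point worth a small sketch: the varying order concerns the side effects performed inside forEach. When the result of a parallel stream over a List is gathered with collect(Collectors.toList()), the encounter order of the source is preserved. The class name OrderedCollectSketch and the method doubledInParallel are hypothetical names chosen for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical class name, used only for this illustration.
class OrderedCollectSketch {

    // The map step may run on several threads, but collecting into a List
    // keeps the encounter order of the source list.
    static List<Integer> doubledInParallel(List<Integer> numbers) {
        return numbers.parallelStream()
                .map(n -> n * 2)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> numberList = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9);
        // prints [2, 4, 6, 8, 10, 12, 14, 16, 18]
        System.out.println(doubledInParallel(numberList));
    }
}
```

So if only the final, ordered result matters, collecting it is usually a simpler choice than printing or mutating shared state from inside forEach.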
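So why does parallelStream().forEach print in a different order from the other two? Could it be that parallelStream runs on multiple threads in parallel? To test this guess, we modify the code a little.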
// Needs java.util.Arrays and java.util.List; place the method inside any class.
public static void main(String[] args) {
    System.out.println("Output:");
    List<Integer> numberList = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9);
    // stream().forEach: print the ID of the thread that handles each element
    numberList.stream().forEach(number -> {
        System.out.println(String.format("Stream The Current Thread's ID is %d and output number %d ", Thread.currentThread().getId(), number));
    });
    System.out.println(" ");
    // parallelStream().forEach
    numberList.parallelStream().forEach(number -> {
        System.out.println(String.format("ParallelStream The Current Thread's ID is %d and output number %d ", Thread.currentThread().getId(), number));
    });
    System.out.println(" ");
    // parallelStream().forEachOrdered
    numberList.parallelStream().forEachOrdered(number -> {
        System.out.println(String.format("ParallelStream forEach Ordered The Current Thread's ID is %d and output number %d ", Thread.currentThread().getId(), number));
    });
    System.out.println(" ");
}
The modified code produces the following output:
Output:
Stream The Current Thread's ID is 1 and output number 1
Stream The Current Thread's ID is 1 and output number 2
Stream The Current Thread's ID is 1 and output number 3
Stream The Current Thread's ID is 1 and output number 4
Stream The Current Thread's ID is 1 and output number 5
Stream The Current Thread's ID is 1 and output number 6
Stream The Current Thread's ID is 1 and output number 7
Stream The Current Thread's ID is 1 and output number 8
Stream The Current Thread's ID is 1 and output number 9
ParallelStream The Current Thread's ID is 1 and output number 6
ParallelStream The Current Thread's ID is 19 and output number 9
ParallelStream The Current Thread's ID is 18 and output number 1
ParallelStream The Current Thread's ID is 15 and output number 2
ParallelStream The Current Thread's ID is 17 and output number 4
ParallelStream The Current Thread's ID is 14 and output number 8
ParallelStream The Current Thread's ID is 13 and output number 3
ParallelStream The Current Thread's ID is 16 and output number 7
ParallelStream The Current Thread's ID is 1 and output number 5
ParallelStream forEach Ordered The Current Thread's ID is 15 and output number 1
ParallelStream forEach Ordered The Current Thread's ID is 14 and output number 2
ParallelStream forEach Ordered The Current Thread's ID is 14 and output number 3
ParallelStream forEach Ordered The Current Thread's ID is 14 and output number 4
ParallelStream forEach Ordered The Current Thread's ID is 14 and output number 5
ParallelStream forEach Ordered The Current Thread's ID is 14 and output number 6
ParallelStream forEach Ordered The Current Thread's ID is 14 and output number 7
ParallelStream forEach Ordered The Current Thread's ID is 14 and output number 8
ParallelStream forEach Ordered The Current Thread's ID is 14 and output number 9
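The output above shows that iterating with parallelStream() uses multiple threads: the elements are handled by several different thread IDs, and the calling thread (ID 1) takes part as well. parallelStream().forEachOrdered also runs on worker threads, but it guarantees that the elements are output in order. This confirms our guess that parallelStream is multithreaded. To check whether those threads actually run in parallel, we need to modify the code once more, which gives the following: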
// Needs java.util.Arrays and java.util.List; place the method inside any class.
public static void main(String[] args) throws InterruptedException {
    System.out.println("Output:");
    List<Integer> numberList = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9);
    // plain for loop
    long forBegin = System.currentTimeMillis();
    for (Integer number : numberList) {
        //System.out.println(String.format("For The Current Thread's ID is %d and output number %d ",Thread.currentThread().getId(),number));
        Thread.sleep(1000);
    }
    System.out.println(String.format("For execute time cost %d ms", System.currentTimeMillis() - forBegin));
    System.out.println(" ");
    // stream().forEach
    long streamBegin = System.currentTimeMillis();
    numberList.stream().forEach(number -> {
        //System.out.println(String.format("Stream The Current Thread's ID is %d and output number %d ",Thread.currentThread().getId(),number));
        try {
            Thread.sleep(1000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the interrupt flag
        }
    });
    System.out.println(String.format("Stream execute time cost %d ms", System.currentTimeMillis() - streamBegin));
    System.out.println(" ");
    // parallelStream().forEach
    long parallelStreamBegin = System.currentTimeMillis();
    numberList.parallelStream().forEach(number -> {
        //System.out.println(String.format("ParallelStream The Current Thread's ID is %d and output number %d ",Thread.currentThread().getId(),number));
        try {
            Thread.sleep(1000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the interrupt flag
        }
    });
    System.out.println(String.format("ParallelStream execute time cost %d ms", System.currentTimeMillis() - parallelStreamBegin));
    System.out.println(" ");
    // parallelStream().forEachOrdered
    long parallelStreamForEachOrderBegin = System.currentTimeMillis();
    numberList.parallelStream().forEachOrdered(number -> {
        //System.out.println(String.format("ParallelStream forEachOrdered The Current Thread's ID is %d and output number %d ",Thread.currentThread().getId(),number));
        try {
            Thread.sleep(1000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the interrupt flag
        }
    });
    System.out.println(String.format("ParallelStream forEachOrdered execute time cost %d ms", System.currentTimeMillis() - parallelStreamForEachOrderBegin));
    System.out.println(" ");
}
Here we add a traditional for loop to the comparison. To make the benefit of parallel execution visible, every iteration sleeps for one second. Running the code gives:
Output:
For execute time cost 9032 ms
Stream execute time cost 9079 ms
ParallelStream execute time cost 2011 ms
ParallelStream forEachOrdered execute time cost 9037 ms
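The results show that parallelStream().forEach takes by far the least time, while the other three approaches take almost exactly the same time. We can therefore conclude that our guess was right: parallelStream().forEach runs our code on multiple threads in parallel. Nine one-second tasks finishing in about two seconds is consistent with the eight threads seen in the earlier output, which need two rounds to cover nine elements. parallelStream().forEachOrdered also uses multiple threads, but the ordering constraint means the action is performed on one element at a time, so its total time is close to that of the single-threaded for loop and stream versions. In this example it therefore gains nothing from the extra CPU cores; only intermediate operations placed before forEachOrdered can still be evaluated in parallel. If you are interested, you can print the number of CPU cores with the code below and use jstack to dump the thread stacks and see which threads are doing the work.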
System.out.println("The system has " + Runtime.getRuntime().availableProcessors() + " available processors");
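The following sketch ties the core count to what a parallel stream does, and shows the one place where forEachOrdered can still benefit from several threads: a slow intermediate map step in front of it. It assumes the standard JDK behaviour in which parallel streams run on the common ForkJoinPool. The class OrderedUpstreamSketch and its members (slowSquare, squaresInOrder, mapThreads) are hypothetical names, and the 100 ms sleep is an arbitrary stand-in for real work. The elapsed time and the thread names depend on the machine, so no specific values are claimed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ForkJoinPool;

// Hypothetical class name, used only for this illustration.
class OrderedUpstreamSketch {

    // Names of the threads that executed the map step. A concurrent set is
    // used because several threads may add to it at the same time.
    static final Set<String> mapThreads = ConcurrentHashMap.newKeySet();

    // Simulated slow computation (the 100 ms delay is an arbitrary choice).
    static int slowSquare(int n) {
        mapThreads.add(Thread.currentThread().getName());
        try {
            Thread.sleep(100);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the interrupt flag
        }
        return n * n;
    }

    // The map step may be evaluated on several threads; forEachOrdered then
    // applies its action one element at a time, in encounter order.
    static List<Integer> squaresInOrder(List<Integer> numbers) {
        List<Integer> out = Collections.synchronizedList(new ArrayList<>());
        numbers.parallelStream()
                .map(OrderedUpstreamSketch::slowSquare)
                .forEachOrdered(out::add);
        return out;
    }

    public static void main(String[] args) {
        System.out.println("Available processors: " + Runtime.getRuntime().availableProcessors());
        System.out.println("Common pool parallelism: " + ForkJoinPool.getCommonPoolParallelism());
        long begin = System.currentTimeMillis();
        List<Integer> result = squaresInOrder(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9));
        System.out.println("Result: " + result);
        System.out.println("Elapsed ms: " + (System.currentTimeMillis() - begin));
        System.out.println("Threads used by map: " + mapThreads);
    }
}
```

On a multi-core machine the elapsed time should typically come out below the 900 ms a purely sequential run would need, while the result list stays in the original order.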