GPU operations on shared data do not accumulate

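I took it for granted that, because GPU cores can all access the same memory while processing data, their operations on that data would accumulate.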

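As it turns out, although multiple GPU cores can access the same block of memory, they run independently of one another with no ordering between them, so their unsynchronized updates to that memory do not add up.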

First, the code:

#include <iostream>
#include <thrust/device_vector.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/for_each.h>

using namespace std;

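// Functor invoked once per index by thrust::for_each.
struct testfunc
{
    float* list;  // raw device pointer to the array
    int size;     // number of elements in the array

    __host__ __device__
    void operator()(const int& idx) const {
        // idx is unused: every invocation subtracts 0.1 from every element.
        // Each update is a plain (non-atomic) read-modify-write.
        for (int i = 0; i < size; ++i) {
            list[i] -= 0.1f;
        }
    }
};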


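int main(int argc, char* argv[]){

    // 100 floats in device memory, each initialized to 10
    thrust::device_vector<float> vlist(100, 10.0f);
    testfunc fn;
    fn.size = vlist.size();
    fn.list = vlist.data().get();  // raw pointer usable inside device code

    // Run fn for indices 0..10, i.e. 11 parallel invocations
    thrust::for_each(
        thrust::counting_iterator<int>(0),
        thrust::counting_iterator<int>(0) + 11,
        fn
    );

    // Print the first 10 elements
    for (int i = 0; i < 10; ++i){
        cout << vlist[i] << " ";
    }
    cout << endl;

    return 0;
}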

Here I create an array vlist of 100 elements in GPU memory, with every element set to 10.

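Then I use 11 cores (one thread each), and every one of them subtracts 0.1 from each element of vlist. If the results accumulated, every element of vlist should end up as 10 - 0.1*11 = 8.9.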

But the actual result is: 9.9

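In effect, only one core's result survived... that's parallelism for you~ The likely cause is that list[i] -= 0.1 is really three steps (read, subtract, write back): all 11 threads read the original 10 before any of them writes, so each one stores 9.9.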
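
To make the lost update concrete, here is a minimal host-side C++ sketch. It is not GPU code and not part of the original program: it simply assumes a lockstep model in which all 11 threads load a cell before any of them stores it. The names racy_update, serialized_update, kThreads and kStep are made up for illustration. The second function models what an atomic read-modify-write would guarantee (on the GPU, CUDA's atomicAdd on a float is one such tool): each update completes before the next one begins.

```cpp
// Assumed model, for illustration only: 11 "threads" updating one array cell.
constexpr int kThreads = 11;    // same count as the for_each range in the post
constexpr float kStep = 0.1f;   // amount each thread subtracts

// Lockstep model of the race: every thread reads the old value first,
// then every thread writes back its own (identical) result.
constexpr float racy_update(float cell) {
    float loaded[kThreads] = {};
    for (int t = 0; t < kThreads; ++t) {
        loaded[t] = cell;            // phase 1: all threads read the same value
    }
    for (int t = 0; t < kThreads; ++t) {
        cell = loaded[t] - kStep;    // phase 2: each write overwrites the last
    }
    return cell;
}

// Serialized model: each read-modify-write finishes before the next starts,
// which is the guarantee an atomic update would provide.
constexpr float serialized_update(float cell) {
    for (int t = 0; t < kThreads; ++t) {
        cell = cell - kStep;         // every subtraction sees the previous one
    }
    return cell;
}
```

Under these assumptions, racy_update(10.0f) comes out at roughly 9.9, matching the output above, while serialized_update(10.0f) comes out at roughly 8.9, the value I originally expected. An alternative that avoids the conflict altogether would be to give each thread its own element instead of letting every thread touch the whole array.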

Original post: https://www.cnblogs.com/plwang1990/p/4184237.html