CUDA PROGRAM STRUCTURE
A typical CUDA program structure consists of five main steps:
1. Allocate GPU memory.
2. Copy data from CPU memory to GPU memory.
3. Invoke the CUDA kernel to perform program-specific computation.
4. Copy data back from GPU memory to CPU memory.
5. Free GPU memory.
In the simple program hello.cu, you see only the third step: invoking the kernel. In
the remainder of this book, examples will demonstrate each step in the CUDA program
structure.
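The five steps above can be sketched with a small vector addition. This is a minimal illustration and not one of the book's own listings: the kernel name addVectors, the helper runVectorAdd, the block size of 256 threads, and the input values are all assumptions made for the example. The CUDA runtime calls (cudaMalloc, cudaMemcpy, cudaFree, cudaGetLastError) are the standard ones.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread adds one pair of elements.
__global__ void addVectors(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

// Hypothetical helper: runs the five steps for n elements.
// Returns the number of wrong results, or -1 on an allocation or CUDA error.
int runVectorAdd(int n)
{
    size_t bytes = (size_t)n * sizeof(float);

    // Host (CPU) buffers with known contents: a[i] = i, b[i] = 2i.
    float *hA = (float *)malloc(bytes);
    float *hB = (float *)malloc(bytes);
    float *hC = (float *)malloc(bytes);
    if (!hA || !hB || !hC) { free(hA); free(hB); free(hC); return -1; }
    for (int i = 0; i < n; ++i) { hA[i] = (float)i; hB[i] = 2.0f * (float)i; }

    // Step 1: allocate GPU memory.
    float *dA = NULL, *dB = NULL, *dC = NULL;
    cudaError_t err = cudaMalloc((void **)&dA, bytes);
    if (err == cudaSuccess) err = cudaMalloc((void **)&dB, bytes);
    if (err == cudaSuccess) err = cudaMalloc((void **)&dC, bytes);

    // Step 2: copy data from CPU memory to GPU memory.
    if (err == cudaSuccess) err = cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);

    // Step 3: invoke the kernel with enough blocks to cover n elements.
    if (err == cudaSuccess) {
        int threads = 256;  // assumed block size for this sketch
        int blocks = (n + threads - 1) / threads;
        addVectors<<<blocks, threads>>>(dA, dB, dC, n);
        err = cudaGetLastError();  // reports launch errors
    }

    // Step 4: copy the result back from GPU memory to CPU memory.
    // This cudaMemcpy blocks until the kernel has finished.
    if (err == cudaSuccess) err = cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);

    // Step 5: free GPU memory (cudaFree on a NULL pointer is a no-op).
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);

    // Verify on the CPU: every element should equal 3i.
    int mismatches = -1;
    if (err == cudaSuccess) {
        mismatches = 0;
        for (int i = 0; i < n; ++i)
            if (hC[i] != 3.0f * (float)i) ++mismatches;
    } else {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
    }

    free(hA);
    free(hB);
    free(hC);
    return mismatches;
}

int main(void)
{
    int mismatches = runVectorAdd(1024);
    printf("mismatches: %d\n", mismatches);
    cudaDeviceReset();
    return mismatches == 0 ? 0 : 1;
}
```

Compiling with nvcc and running on a machine with a CUDA-capable GPU should report zero mismatches; on a machine without a usable device, the helper returns -1 and prints the runtime's error string instead.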
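#include <stdio.h>
#include "cuda_runtime.h"

// Kernel: runs on the GPU, once per launched thread.
__global__ void helloFromGPU(void)
{
    printf("Hello World from GPU!\n");
}

int main(void)
{
    // hello from cpu
    printf("Hello World from CPU!\n");

    // Launch the kernel with 10 blocks of 1 thread each.
    helloFromGPU<<<10, 1>>>();

    // Reset the device; this also flushes the GPU printf buffer before exit.
    cudaDeviceReset();
    return 0;
}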