VBO, PBO and FBO

OpenGL Pixel Buffer Object (PBO)

Related Topics: Vertex Buffer Object (VBO), Frame Buffer Object (FBO)
Download: pboUnpack.zip, pboPack.zip

Overview
Creating PBO
Mapping PBO
Example: Streaming Texture Uploads with PBO
Example: Asynchronous Readback with PBO

Overview

OpenGL PBO

The OpenGL ARB_pixel_buffer_object extension is very close to ARB_vertex_buffer_object. It simply extends ARB_vertex_buffer_object so that buffer objects can store not only vertex data but also pixel data. A buffer object that stores pixel data is called a Pixel Buffer Object (PBO). ARB_pixel_buffer_object borrows the entire VBO framework and API, and adds 2 "target" tokens. These tokens help the PBO memory manager (the OpenGL driver) determine the best location for the buffer object: system memory, AGP (shared) memory, or video memory. The target tokens also state which of 2 operations the bound PBO will be used for: GL_PIXEL_PACK_BUFFER_ARB to transfer pixel data to a PBO, or GL_PIXEL_UNPACK_BUFFER_ARB to transfer pixel data from a PBO.

For example, glReadPixels() and glGetTexImage() are "pack" pixel operations, and glDrawPixels(), glTexImage2D() and glTexSubImage2D() are "unpack" operations. When a PBO is bound to GL_PIXEL_PACK_BUFFER_ARB, glReadPixels() reads pixel data from the OpenGL framebuffer and writes (packs) it into the PBO. When a PBO is bound to GL_PIXEL_UNPACK_BUFFER_ARB, glDrawPixels() reads (unpacks) pixel data from the PBO and copies it to the OpenGL framebuffer.

The main advantage of PBOs is fast pixel data transfer to and from the graphics card through DMA (Direct Memory Access), without involving CPU cycles. The other advantage is that the DMA transfer is asynchronous. Let's compare a conventional texture transfer method with using a Pixel Buffer Object. The left side of the following diagram is the conventional way to load texture data from an image source (image file or video stream). The source is first loaded into system memory, and then copied from system memory to an OpenGL texture object with glTexImage2D(). These 2 transfer steps (load and copy) are both performed by the CPU.

Texture loading without PBO

Texture loading with PBO

In contrast, in the right-side diagram the image source can be loaded directly into a PBO, which is controlled by OpenGL. The CPU is still involved in loading the source into the PBO, but not in transferring the pixel data from the PBO to the texture object. Instead, the GPU (OpenGL driver) manages the copy from the PBO to the texture object. This means OpenGL performs a DMA transfer without spending CPU cycles. Further, OpenGL can schedule the DMA transfer asynchronously for later execution. Therefore, glTexImage2D() returns immediately, and the CPU can do other work without waiting for the pixel transfer to finish.

There are 2 major PBO approaches to improve the performance of the pixel data transfer: streaming texture update and asynchronous read-back from the framebuffer.

Creating PBO

As mentioned earlier, Pixel Buffer Object borrows all APIs from Vertex Buffer Object. The only difference is that there are 2 additional tokens for PBOs: GL_PIXEL_PACK_BUFFER_ARB and GL_PIXEL_UNPACK_BUFFER_ARB. GL_PIXEL_PACK_BUFFER_ARB is for transferring pixel data from OpenGL to your application, and GL_PIXEL_UNPACK_BUFFER_ARB is for transferring pixel data from an application to OpenGL. OpenGL refers to these tokens to determine the best memory space for a PBO, for example, video memory for uploading (unpacking) textures, or system memory for reading (packing) the framebuffer. However, these target tokens are only hints; the OpenGL driver decides the appropriate location for you.

Creating a PBO requires 3 steps:

Generate a new buffer object with glGenBuffersARB().
Bind the buffer object with glBindBufferARB().
Copy pixel data to the buffer object with glBufferDataARB().

If you pass a NULL pointer as the source array to glBufferDataARB(), the PBO only allocates memory of the given size without copying any data. The last parameter of glBufferDataARB() is another performance hint that tells OpenGL how the buffer object will be used: GL_STREAM_DRAW_ARB is for streaming texture uploads and GL_STREAM_READ_ARB is for asynchronous framebuffer read-back.
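The size passed to glBufferDataARB() has to cover the whole image. The following is a minimal sketch of how that size might be computed, assuming rows are padded to OpenGL's default pixel-store alignment of 4 bytes (GL_PACK_ALIGNMENT / GL_UNPACK_ALIGNMENT). The helper name pixelBufferSize and its parameters are hypothetical and not part of the original demo.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of bytes needed to hold an image whose rows
// are padded to `alignment` bytes. OpenGL's default row alignment is 4.
std::size_t pixelBufferSize(std::size_t width, std::size_t height,
                            std::size_t bytesPerPixel,
                            std::size_t alignment = 4)
{
    // OpenGL only accepts 1, 2, 4 or 8 as pixel-store alignment values.
    assert(alignment == 1 || alignment == 2 ||
           alignment == 4 || alignment == 8);

    std::size_t rowBytes = width * bytesPerPixel;

    // Round each row up to the next multiple of the alignment.
    std::size_t stride = (rowBytes + alignment - 1) / alignment * alignment;

    return stride * height;
}
```

For a 4-byte-per-pixel format such as GL_BGRA with GL_UNSIGNED_BYTE, every row is already aligned, so the result is simply width × height × 4. The padding only matters for formats such as 3-byte BGR with odd widths.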

Please check VBO for more details.

Mapping PBO

PBO provides a memory mapping mechanism to map the OpenGL controlled buffer object to the client's address space. So, the client can modify a portion of the buffer object or the entire buffer by using glMapBufferARB() and glUnmapBufferARB().

void* glMapBufferARB(GLenum target, GLenum access);
GLboolean glUnmapBufferARB(GLenum target);

glMapBufferARB() returns a pointer to the buffer object on success; otherwise it returns NULL. The target parameter is GL_PIXEL_PACK_BUFFER_ARB or GL_PIXEL_UNPACK_BUFFER_ARB. The second parameter, access, specifies what to do with the mapped buffer: read data from the PBO (GL_READ_ONLY_ARB), write data to the PBO (GL_WRITE_ONLY_ARB), or both (GL_READ_WRITE_ARB).

Note that if the GPU is still working with the buffer object, glMapBufferARB() will not return until the GPU finishes its job with that buffer object. To avoid this stall (wait), call glBufferDataARB() with a NULL pointer right before glMapBufferARB(). OpenGL will then discard the old buffer and allocate new memory for the buffer object.

The buffer object must be unmapped with glUnmapBufferARB() after use. glUnmapBufferARB() returns GL_TRUE on success; otherwise it returns GL_FALSE.

Example: Streaming Texture Uploads

Streaming Texture with PBO

Download the source and binary: pboUnpack.zip.

This demo application uploads (unpack) streaming textures to an OpenGL texture object using PBO. You can switch to the different transfer modes (single PBO, double PBOs and without PBO) by pressing the space key, and compare the performance differences.

In the PBO modes, the texture source is written directly into the mapped pixel buffer every frame. The data is then transferred from the PBO to a texture object using glTexSubImage2D(). With a PBO, OpenGL can perform an asynchronous DMA transfer between the PBO and the texture object, which can significantly increase texture upload performance. If asynchronous DMA transfer is supported, glTexSubImage2D() should return immediately, and the CPU can process other jobs without waiting for the actual texture copy.

Streaming texture uploads with 2 PBOs

To maximize the streaming transfer performance, you may use multiple pixel buffer objects. The diagram shows that 2 PBOs are used simultaneously; glTexSubImage2D() copies the pixel data from a PBO while the texture source is being written to the other PBO.

For frame n, PBO 1 is used for glTexSubImage2D() and PBO 2 receives the new texture source. For frame n+1, the 2 pixel buffers switch roles and continue to update the texture. Because the DMA transfer is asynchronous, the update and the copy can proceed simultaneously: the CPU writes the texture source into one PBO while the GPU copies the texture from the other PBO.

// "index" is used to copy pixels from a PBO to a texture object
// "nextIndex" is used to update pixels in the other PBO
index = (index + 1) % 2;
if(pboMode == 1)                // with 1 PBO
    nextIndex = index;
else if(pboMode == 2)           // with 2 PBOs
    nextIndex = (index + 1) % 2;

// bind the texture and PBO
glBindTexture(GL_TEXTURE_2D, textureId);
glBindBufferARB(GL_PIXEL_UNPACK_BUFFER_ARB, pboIds[index]);

// copy pixels from PBO to texture object
// The last argument is a byte offset into the bound PBO, not a pointer.
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, WIDTH, HEIGHT,
                GL_BGRA, GL_UNSIGNED_BYTE, 0);


// bind PBO to update texture source
glBindBufferARB(GL_PIXEL_UNPACK_BUFFER_ARB, pboIds[nextIndex]);

// Note that glMapBufferARB() can cause a sync issue.
// If the GPU is still working with this buffer, glMapBufferARB() will
// wait (stall) until the GPU finishes its job. To avoid waiting (idle),
// call glBufferDataARB() with a NULL pointer before glMapBufferARB().
// The previous data in the PBO is then discarded and glMapBufferARB()
// returns a newly allocated pointer immediately, even if the GPU is
// still working with the previous data.
glBufferDataARB(GL_PIXEL_UNPACK_BUFFER_ARB, DATA_SIZE, 0, GL_STREAM_DRAW_ARB);

// map the buffer object into client's memory
GLubyte* ptr = (GLubyte*)glMapBufferARB(GL_PIXEL_UNPACK_BUFFER_ARB,
                                        GL_WRITE_ONLY_ARB);
if(ptr)
{
    // update data directly on the mapped buffer
    updatePixels(ptr, DATA_SIZE);
    glUnmapBufferARB(GL_PIXEL_UNPACK_BUFFER_ARB); // release the mapped buffer
}

// it is a good idea to release PBOs by binding ID 0 after use.
// Once 0 is bound, all pixel operations revert to their normal behavior.
glBindBufferARB(GL_PIXEL_UNPACK_BUFFER_ARB, 0);
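
The listing above calls updatePixels() on the mapped pointer, but its body is not shown here. As a rough sketch of what such a function could do, the hypothetical fillTestPattern() below fills a BGRA buffer with a single gray level; the name, signature and pattern are assumptions, and std::uint8_t stands in for GLubyte so the sketch builds without OpenGL headers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the demo's updatePixels(): writes one gray
// level into every BGRA pixel of the destination buffer.
void fillTestPattern(std::uint8_t* dst, std::size_t sizeInBytes,
                     std::uint8_t level)
{
    assert(sizeInBytes % 4 == 0);   // whole BGRA pixels only

    if (!dst)                       // the mapping may have failed
        return;

    for (std::size_t i = 0; i < sizeInBytes; i += 4)
    {
        dst[i + 0] = level;         // blue
        dst[i + 1] = level;         // green
        dst[i + 2] = level;         // red
        dst[i + 3] = 255;           // alpha (opaque)
    }
}
```

The sketch only writes through the pointer and never reads from it, which matches the GL_WRITE_ONLY_ARB access requested when the buffer was mapped.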

Example: Asynchronous Read-back

Asynchronous readback with PBO

Download the source and binary: pboPack.zip.

This demo application reads (pack) the pixel data from the framebuffer (left-side) to a PBO, then, draws it back to the right side of the window after modifying the brightness of the image. You can toggle PBO on/off by pressing the space key, and measure the performance of glReadPixels().

A conventional glReadPixels() blocks the pipeline and waits until all pixel data has been transferred; only then does it return control to the application. In contrast, glReadPixels() with a PBO can schedule an asynchronous DMA transfer and return immediately without stalling. Therefore, the application (CPU) can do other work right away while OpenGL (GPU) transfers the data with DMA.

Asynchronous glReadPixels() with 2 PBOs

This demo uses 2 pixel buffers. At frame n, the application reads the pixel data from the OpenGL framebuffer into PBO 1 using glReadPixels(), and processes the pixel data in PBO 2. The read and the processing can be performed simultaneously, because glReadPixels() to PBO 1 returns immediately and the CPU starts processing the data in PBO 2 without delay. PBO 1 and PBO 2 alternate on every frame.
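
One consequence of this alternating schedule is that the data the CPU processes in a given frame was read back during the previous frame. The following sketch simulates the schedule without any OpenGL calls to make that one-frame latency visible; simulateReadback() is a hypothetical helper written for illustration only.

```cpp
#include <cassert>
#include <vector>

// Simulates the two-PBO read-back schedule.
// contents[i] is the frame number whose pixels were last read into PBO i
// (-1 means the PBO has not received any read-back yet).
// Returns, for each frame, the frame number of the data the CPU processes.
std::vector<int> simulateReadback(int frameCount)
{
    assert(frameCount >= 0);

    int contents[2] = { -1, -1 };
    int index = 0;
    std::vector<int> processed;

    for (int frame = 0; frame < frameCount; ++frame)
    {
        index = (index + 1) % 2;           // PBO that receives glReadPixels()
        int nextIndex = (index + 1) % 2;   // PBO that the CPU maps and processes

        contents[index] = frame;           // read-back of this frame is queued
        processed.push_back(contents[nextIndex]);
    }
    return processed;
}
```

For four frames the simulation yields -1, 0, 1, 2: the first frame has no read-back data to process yet, and every later frame processes the pixels captured one frame earlier.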

// "index" is used to read pixels from the framebuffer to a PBO
// "nextIndex" is used to process pixels in the other PBO
index = (index + 1) % 2;
nextIndex = (index + 1) % 2;

// set the target framebuffer to read
glReadBuffer(GL_FRONT);

// read pixels from framebuffer to PBO
// glReadPixels() should return immediately.
glBindBufferARB(GL_PIXEL_PACK_BUFFER_ARB, pboIds[index]);
glReadPixels(0, 0, WIDTH, HEIGHT, GL_BGRA, GL_UNSIGNED_BYTE, 0);

// map the PBO to process its data by CPU
glBindBufferARB(GL_PIXEL_PACK_BUFFER_ARB, pboIds[nextIndex]);
GLubyte* ptr = (GLubyte*)glMapBufferARB(GL_PIXEL_PACK_BUFFER_ARB,
                                        GL_READ_ONLY_ARB);
if(ptr)
{
    processPixels(ptr, ...);
    glUnmapBufferARB(GL_PIXEL_PACK_BUFFER_ARB);
}

// back to conventional pixel operation
glBindBufferARB(GL_PIXEL_PACK_BUFFER_ARB, 0);
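
The arguments of processPixels() are elided in the listing above. Since the demo is described as modifying the brightness of the image, a brightness step might look like the sketch below. The function adjustBrightness(), its parameters and the clamping behavior are assumptions for illustration, not the demo's actual code; std::uint8_t stands in for GLubyte.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical brightness step: adds `shift` to the B, G and R channels of
// each BGRA pixel, clamps to [0, 255], and copies alpha unchanged.
// `src` would be the read-only mapped PBO; `dst` is a separate client array.
void adjustBrightness(const std::uint8_t* src, std::uint8_t* dst,
                      std::size_t pixelCount, int shift)
{
    assert(src != nullptr && dst != nullptr);

    for (std::size_t i = 0; i < pixelCount * 4; i += 4)
    {
        for (std::size_t c = 0; c < 3; ++c)     // B, G, R channels
        {
            int v = static_cast<int>(src[i + c]) + shift;
            dst[i + c] = static_cast<std::uint8_t>(std::min(255, std::max(0, v)));
        }
        dst[i + 3] = src[i + 3];                // alpha
    }
}
```

Writing the result to a separate array keeps the access to the mapped buffer read-only, consistent with the GL_READ_ONLY_ARB mapping used in the listing.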

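PBO (Pixel Buffer Object) is also a GPU-side extension (ARB_pixel_buffer_object, which builds on ARB_vertex_buffer_object). The buffer here is, of course, a buffer on the GPU side. The PBO extension is similar to VBO, except that it stores pixel data rather than vertex data. PBO borrows the VBO framework and all of its API functions, and adds two "target" tokens:

GL_PIXEL_PACK_BUFFER_ARB: transfers pixel data to a PBO
GL_PIXEL_UNPACK_BUFFER_ARB: transfers pixel data from a PBO

"Pack" and "unpack" here can be read as "transfer to" and "get from", respectively. Both can also be understood simply as a "copy", that is, a transfer of pixel data.

For example, glReadPixels moves data from the framebuffer to memory, which is a "pack"; glDrawPixels moves data from memory to the framebuffer, which is an "unpack"; glGetTexImage moves data from a texture object to memory, which is a "pack"; and glTexImage2D moves data from memory to a texture object, which is an "unpack".

The figure below shows the transfers between a PBO and the framebuffer and texture objects.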

Figure 1: OpenGL PBO

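The benefit of PBO is fast pixel data transfer: it uses DMA (Direct Memory Access), which requires no CPU involvement. Another advantage is that this DMA transfer is asynchronous. The two figures below compare texture transfer with and without a PBO.

Figure 2 shows the conventional way of loading image data from an image source (such as an image file or a video) into a texture object. The pixel data is first stored in system memory, and then copied from system memory to the texture object with glTexImage2D. Both steps are performed by the CPU. In Figure 3, the pixel data is loaded directly into a PBO. This step still requires the CPU, but the transfer from the PBO to the texture object is carried out by the GPU through DMA, without CPU involvement. Moreover, OpenGL can schedule the DMA asynchronously, so the pixel transfer does not have to happen right away. As a result, glTexImage2D in Figure 3 returns immediately, and the CPU can do other work without waiting for the pixel transfer to finish.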

Figure 2: Texture loading without PBO

Figure 3: Texture loading with PBO

GL_PIXEL_PACK_BUFFER_ARB is used to transfer pixel data from OpenGL to the application, and GL_PIXEL_UNPACK_BUFFER_ARB is used to transfer pixel data from the application to OpenGL.

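Creating a PBO

Creating a PBO takes 3 steps:

1. Generate a buffer object with glGenBuffersARB();

2. Bind the buffer object with glBindBufferARB();

3. Copy pixel data to the buffer object with glBufferDataARB().

If a NULL pointer is passed as the source array to glBufferDataARB, the PBO only allocates memory of the given size. Another parameter of glBufferDataARB is a performance hint that indicates how the buffer object will be used: GL_STREAM_DRAW_ARB means streaming upload, and GL_STREAM_READ_ARB means asynchronous framebuffer read-back.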

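Mapping a PBO

PBO provides a mechanism for mapping the OpenGL-controlled buffer object into the client's address space. The client can therefore modify part or all of the buffer object's data using glMapBufferARB() and glUnmapBufferARB().

void* glMapBufferARB(GLenum target, GLenum access);

GLboolean glUnmapBufferARB(GLenum target);

glMapBufferARB returns a pointer to the buffer object. The target parameter is GL_PIXEL_PACK_BUFFER_ARB or GL_PIXEL_UNPACK_BUFFER_ARB. The access parameter specifies the operations allowed on the mapped buffer and can be GL_READ_ONLY_ARB, GL_WRITE_ONLY_ARB or GL_READ_WRITE_ARB, meaning read from the PBO, write to the PBO, or both.

Note: if the GPU is still operating on the buffer object, glMapBufferARB will not return until the GPU has finished with it. To avoid waiting, call glBufferDataARB with a NULL pointer before calling glMapBufferARB; OpenGL then discards the old buffer and allocates new memory for the buffer object.

After the client has finished using the PBO, it should call glUnmapBufferARB to unmap it. glUnmapBufferARB returns GL_TRUE on success and GL_FALSE otherwise.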

____________________________________________________

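Demo

The example program pboUnpack.zip compares different modes of uploading textures to OpenGL:

Using one PBO;
Using two PBOs;
Without PBO;

Press the space key to switch between the modes.

In the PBO modes, the texture source (pixels) for each frame is written directly into the mapped PBO. The pixels in the PBO are then transferred to the texture object by calling glTexSubImage2D. Using a PBO allows an asynchronous DMA transfer between the PBO and the texture object, which can greatly improve pixel transfer performance.
Because glTexSubImage2D returns immediately, the CPU can go on with other work without waiting for the actual pixel transfer.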

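Figure 4: Updating a texture with two PBOs

To maximize pixel transfer performance, multiple PBOs can be used. Figure 4 shows two PBOs in use at the same time: while glTexSubImage2D copies pixel data out of one PBO, new pixel data is written into the other PBO.
At frame n, PBO 1 is used for glTexSubImage2D while PBO 2 receives the new texture source. At frame n+1, the two PBOs swap roles. Because the DMA transfer is asynchronous, updating and copying the pixel data can happen at the same time: the CPU writes the texture source into one PBO while the GPU copies the texture out of the other PBO.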

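The example program pboPack.zip reads (packs) pixel data from the left side of the window into a PBO, changes its brightness, and draws it on the right side of the window. Press the space key to toggle the PBO and observe the performance of glReadPixels.

A conventional glReadPixels blocks the rendering pipeline until all pixel data has been transferred, and only then returns control to the application. In contrast, glReadPixels with a PBO can schedule an asynchronous DMA transfer and return immediately without waiting. The CPU can therefore do other work while OpenGL (GPU) transfers the pixel data.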

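Figure 5: Asynchronous glReadPixels with two PBOs

The example program also uses two PBOs. At frame n, the application reads pixel data from the framebuffer into PBO 1 while processing the pixel data in PBO 2. The read and the processing can proceed at the same time because glReadPixels returns immediately and the CPU processes PBO 2 right away, without delay. On the next frame, PBO 1 and PBO 2 swap roles.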


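Comparison of Render to Texture Implementations in OpenGL

Render to Texture

Ways to implement render to texture:

1. Render to the framebuffer, read the required region into client memory with glReadPixels, then create the texture with glTexImage().
    Drawback: relatively slow.

2. Render to the framebuffer, then create the texture directly from the framebuffer with glCopyTexImage().

3. Render to the framebuffer, then use glCopyTexSubImage() to read the required region from the framebuffer and update part of the texture.

4. Use a pbuffer to render directly to a texture.

   Required extensions:
   WGL_ARB_extensions_string
   WGL_ARB_render_texture
   WGL_ARB_pbuffer
   WGL_ARB_pixel_format

   Drawbacks:
   Only available on Windows.
   Each pbuffer works in a different OpenGL context, which makes management cumbersome.
   Switching between pbuffers is expensive.

   Rendering is performed using the pbuffer's DC and RC as the rendering device and rendering context.

5. Use the framebuffer object (FBO) extension: GL_EXT_framebuffer_object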

Original article: https://www.cnblogs.com/lizhengjin/p/1912961.html