【Game Engine Architecture 10】
1、Full-Screen Antialiasing (FSAA)
Also known as supersampled antialiasing (SSAA).
The scene is rendered into a frame buffer that is larger than the actual screen. Once rendering of the frame is complete, the resulting oversized image is downsampled to the desired resolution. In 4x supersampling, the rendered image is twice as wide and twice as tall as the screen, resulting in a frame buffer that occupies four times the memory. It also requires four times the GPU processing power because the pixel shader must be run four times for each screen pixel.
As you can see, FSAA is an incredibly expensive technique both in terms of memory consumption and GPU cycles. As such, it is rarely used in practice.
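The downsampling step can be sketched as a simple box filter. This is a minimal illustration, assuming a grayscale image stored as a list of rows and a plain 2x2 average; the function name `downsample_2x2` and the sample data are made up for this example.

```python
def downsample_2x2(image):
    """Box-filter a supersampled grayscale image down by 2x along each axis.

    `image` is a list of rows; its width and height must both be even.
    """
    height, width = len(image), len(image[0])
    result = []
    for y in range(0, height, 2):
        row = []
        for x in range(0, width, 2):
            # Average the 2x2 block of supersamples that maps to one screen pixel.
            total = (image[y][x] + image[y][x + 1]
                     + image[y + 1][x] + image[y + 1][x + 1])
            row.append(total / 4.0)
        result.append(row)
    return result


# A 4x4 supersampled render of a hard diagonal edge (1.0 = lit, 0.0 = dark).
supersampled = [
    [1.0, 1.0, 1.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
screen = downsample_2x2(supersampled)
```

The 2x2 screen image ends up with intermediate gray values along the edge, which is the antialiasing effect. Note that all 16 supersamples had to be shaded to produce 4 screen pixels.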
2、Multisampled Antialiasing (MSAA)
MSAA provides visual quality comparable to that of FSAA while consuming a great deal less GPU bandwidth (and the same amount of video RAM).
In short: the result looks about as good as FSAA and costs far less GPU time, but it uses just as much VRAM.
To understand how MSAA works, recall that the process of rasterizing a triangle really boils down to three distinct operations:
1)coverage
2)depth testing
3)pixel shading
In MSAA, the coverage and depth tests are run for N points known as subsamples within each screen pixel. N is typically chosen to be 2, 4, 5, 8 or 16. However, the pixel shader is only run once per screen pixel, no matter how many subsamples we use. This is a large saving because shading is typically a great deal more expensive than coverage and depth testing.
Nx MSAA requires depth, stencil and color buffers that each hold N slots per screen pixel, one per subsample.
When rasterizing a triangle, the coverage and depth tests are run N times for the N subsamples. If at least one of the N tests indicates that the fragment should be drawn, the pixel shader is run once. The color obtained from the pixel shader is then stored only into those slots that correspond to the subsamples that fell inside the triangle.
Nx MSAA therefore performs N coverage tests and N depth tests per pixel.
Once the entire scene has been rendered, the oversized color buffer is downsampled to yield the final screen resolution image. This process involves averaging the color values found in the N subsample slots for each screen pixel. The net result is an antialiased image with a shading cost equal to that of a non-antialiased image.
The shading cost of MSAA is the same as that of non-antialiased rendering.
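The per-pixel bookkeeping of MSAA can be sketched for a single screen pixel as follows. This is only an illustration under stated assumptions: the subsample offsets, the `covers` predicate and the `shade` stand-in are hypothetical, and colors are single floats.

```python
# Hypothetical 4x subsample offsets inside a unit pixel (a rotated-grid pattern).
SUBSAMPLES = [(0.375, 0.125), (0.875, 0.375), (0.125, 0.625), (0.625, 0.875)]

shader_invocations = 0


def shade(color):
    """Stand-in for an expensive pixel shader; counts how often it runs."""
    global shader_invocations
    shader_invocations += 1
    return color


def rasterize_pixel(slots, depths, covers, frag_depth, frag_color):
    """Run coverage and depth tests per subsample, but shade at most once."""
    passing = [i for i, (sx, sy) in enumerate(SUBSAMPLES)
               if covers(sx, sy) and frag_depth < depths[i]]
    if not passing:
        return
    color = shade(frag_color)  # one pixel shader run for the whole pixel
    for i in passing:
        slots[i] = color       # store only into the covered subsample slots
        depths[i] = frag_depth


def resolve(slots):
    """Downsample: average the N subsample slots of one screen pixel."""
    return sum(slots) / len(slots)


slots = [0.0] * len(SUBSAMPLES)
depths = [float("inf")] * len(SUBSAMPLES)
# A white triangle that covers only the left half of the pixel.
rasterize_pixel(slots, depths, lambda x, y: x < 0.5, 0.5, 1.0)
resolved = resolve(slots)
```

Two of the four subsamples are covered, so the resolved pixel is half white, yet the shader stand-in ran only once.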
3、Coverage Sample Antialiasing (CSAA)
Each pixel is split into a 4x4 grid of subpixels, and a 16-bit short is used as a mask to record which subpixels the current triangle covers.
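The 16-bit mask can be sketched as below. This only illustrates the bit-packing idea from the note, not how the hardware actually implements CSAA; the `covers` predicate, the bit layout (row * 4 + col) and the use of subpixel centers are assumptions made for the example.

```python
def coverage_mask(covers):
    """Pack coverage of a 4x4 subpixel grid into a 16-bit integer.

    `covers(x, y)` is a hypothetical predicate telling whether the point
    (x, y), in pixel-local coordinates in [0, 1), lies inside the triangle.
    Bit (row * 4 + col) is set when that subpixel's center is covered.
    """
    mask = 0
    for row in range(4):
        for col in range(4):
            cx, cy = (col + 0.5) / 4.0, (row + 0.5) / 4.0
            if covers(cx, cy):
                mask |= 1 << (row * 4 + col)
    return mask


# A triangle edge that covers the left half of the pixel.
left_half = coverage_mask(lambda x, y: x < 0.5)
covered_fraction = bin(left_half).count("1") / 16.0
```

Counting the set bits gives the fraction of the pixel that the triangle covers, which is what an antialiased blend needs.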
4、Morphological Antialiasing (MLAA)
MLAA focuses its efforts on correcting only those regions of a scene that suffer the most from the effects of aliasing. In MLAA, the scene is rendered at normal size, and then scanned in order to identify stair-stepped patterns. When these patterns are found, they are blurred to reduce the effects of aliasing. Fast approximate antialiasing (FXAA) is an optimized technique developed by Nvidia that is similar to MLAA in its approach.
5、Subpixel Morphological Antialiasing (SMAA)
SMAA combines morphological antialiasing (MLAA and FXAA) techniques with multisampling/supersampling strategies (MSAA, SSAA) to produce more accurate subpixel features.
Like FXAA, it’s an inexpensive technique, but it blurs the final image less than FXAA. For these reasons, it’s arguably the best AA solution available today.
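SMAA is arguably the most advanced antialiasing technique currently available. It costs about as much as FXAA but produces a better-looking result.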
6、Occlusion and Potentially Visible Sets
Even when objects lie entirely within the frustum, they may occlude one another. Removing objects from the visible list that are entirely occluded by other objects is called occlusion culling.
A potentially visible set (PVS) lists, for a given camera vantage point, the scene objects that might be visible from it. Because automated PVS tools are imperfect, they typically provide the user with a mechanism for tweaking the results, either by manually placing vantage points for testing, or by manually specifying a list of regions that should be explicitly included or excluded from a particular region’s PVS.
7、Portals
To render a scene with portals, we start by rendering the region that contains the camera. Then, for each portal in the region, we extend a frustum-like volume consisting of planes extending from the camera’s focal point through each edge of the portal’s bounding polygon. The contents of the neighboring region can be culled to this portal volume in exactly the same way geometry is culled against the camera frustum. This ensures that only the visible geometry in the adjacent regions will be rendered. Figure 11.48 provides an illustration of this technique.
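The portal volume test can be sketched in 2D, where the portal is a line segment and the volume is the wedge between the two rays from the camera through the portal's endpoints. This is a simplified illustration: the function names are made up, and a real 3D version would test against one plane per portal edge and would also reject geometry in front of the portal.

```python
def cross(ax, ay, bx, by):
    """2D cross product; positive when b is counter-clockwise from a."""
    return ax * by - ay * bx


def inside_portal_wedge(camera, portal_a, portal_b, point):
    """Return True if `point` lies in the wedge from `camera` through the portal."""
    cx, cy = camera
    ax, ay = portal_a[0] - cx, portal_a[1] - cy
    bx, by = portal_b[0] - cx, portal_b[1] - cy
    px, py = point[0] - cx, point[1] - cy
    if cross(ax, ay, bx, by) < 0:
        # Reorder so that a -> b is counter-clockwise as seen from the camera.
        ax, ay, bx, by = bx, by, ax, ay
    # The point must be on the inner side of both bounding rays.
    return cross(ax, ay, px, py) >= 0 and cross(px, py, bx, by) >= 0


camera = (0.0, 0.0)
door = ((4.0, -1.0), (4.0, 1.0))  # portal segment in the wall at x = 4
visible = inside_portal_wedge(camera, door[0], door[1], (8.0, 0.0))
culled = inside_portal_wedge(camera, door[0], door[1], (8.0, 5.0))
```

Objects in the neighboring region that fail this test cannot be seen through the portal and can be skipped.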
8、Occlusion Volumes (Antiportals)
If we flip the portal concept on its head, pyramidal volumes can also be used to describe regions of the scene that cannot be seen because they are being occluded by an object. These volumes are known as occlusion volumes or antiportals.
9、Render State
The set of all configurable parameters within the GPU pipeline is known as the hardware state or render state.
10、Geometry Sorting
Clearly we’d like to change render settings as infrequently as possible. The best way to accomplish this is to sort our geometry by material.
Unfortunately, sorting geometry by material can have a detrimental effect on rendering performance because it increases overdraw—a situation in which the same pixel is filled multiple times by multiple overlapping triangles.
In other words, batching geometry by material tends to increase overdraw.
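The benefit of sorting by material can be sketched by counting render state changes for a list of draw calls. The material and mesh names below are invented for the example, and a "state change" is simplified to mean any change of material between consecutive draws.

```python
def count_state_changes(draw_calls):
    """Count how often the material changes between consecutive draw calls."""
    changes = 0
    current = None
    for material, _mesh in draw_calls:
        if material != current:
            changes += 1
            current = material
    return changes


draws = [
    ("brick", "wall_a"), ("glass", "window"), ("brick", "wall_b"),
    ("metal", "door"), ("glass", "skylight"), ("brick", "wall_c"),
]
unsorted_changes = count_state_changes(draws)

# Sorting by the material key groups draws that share render state.
sorted_draws = sorted(draws, key=lambda draw: draw[0])
sorted_changes = count_state_changes(sorted_draws)
```

Sorting halves the number of state changes here, but the resulting order ignores depth entirely, which is where the overdraw problem comes from.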
11、z-Prepass to the Rescue
How can we reconcile the need to sort geometry by material with the conflicting need to render opaque geometry in a front-to-back order? The answer lies in a GPU feature known as z-prepass.
The idea behind z-prepass is to render the scene twice: the first time to generate the contents of the z-buffer as efficiently as possible and the second time to populate the frame buffer with full color information (but this time with no overdraw, thanks to the contents of the z-buffer). The GPU provides a special double-speed rendering mode in which the pixel shaders are disabled, and only the z-buffer is updated.
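The z-prepass solves the overdraw problem caused by batching. The scene is rendered twice: the first pass only fills the depth buffer, and the second pass does the actual shading.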
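The saving can be sketched for a single pixel by counting how many fragments get shaded with and without a prepass. This is a toy model: fragments are represented only by their depths, listed in submission (material-sorted) order, and the function names are hypothetical.

```python
def render_without_prepass(fragment_depths):
    """Depth-tested rendering in submission order; returns shader run count."""
    depth = float("inf")
    shaded = 0
    for frag_depth in fragment_depths:
        if frag_depth < depth:  # passes the z-test, so it gets shaded
            depth = frag_depth
            shaded += 1
    return shaded


def render_with_prepass(fragment_depths):
    """Two passes: depth only first, then shade only the visible fragment."""
    depth = float("inf")
    for frag_depth in fragment_depths:  # pass 1: z-buffer only, no shading
        depth = min(depth, frag_depth)
    shaded = 0
    for frag_depth in fragment_depths:  # pass 2: z-test against final depths
        if frag_depth <= depth:
            shaded += 1
    return shaded


# Depths of the fragments that land on one pixel, in material-sorted order.
fragments = [0.9, 0.7, 0.8, 0.3, 0.5]
```

Without the prepass, three fragments are shaded and two of those results are overwritten. With the prepass, only the nearest fragment is shaded.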
Order-independent transparency (OIT) is a technique that permits transparent geometry to be rendered in an arbitrary order. It works by storing multiple fragments per pixel, sorting each pixel’s fragments and blending them only after the entire scene has been rendered. This technique produces correct results without the need for pre-sorting the geometry, but it comes at a high memory cost because the frame buffer must be large enough to store all of the translucent fragments for each pixel.
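The per-pixel sort-then-blend step can be sketched as follows. This is a minimal illustration assuming single-channel colors, standard alpha blending, and fragments given as (depth, color, alpha) tuples where a larger depth is farther away; the function names are made up.

```python
def blend(fragments, background):
    """Alpha-blend fragments over the background in the order given."""
    color = background
    for _depth, frag_color, alpha in fragments:
        color = frag_color * alpha + color * (1.0 - alpha)
    return color


def resolve_oit(fragments, background):
    """Sort one pixel's stored fragments back to front, then blend them."""
    ordered = sorted(fragments, key=lambda frag: frag[0], reverse=True)
    return blend(ordered, background)


# Two translucent fragments submitted near-first, i.e. in the "wrong" order.
pixel_fragments = [(0.2, 1.0, 0.5), (0.8, 0.0, 0.5)]
naive = blend(pixel_fragments, 0.0)
correct = resolve_oit(pixel_fragments, 0.0)
```

Blending in submission order gives a different (wrong) color, while the sorted resolve gives the same answer regardless of how the fragments were submitted. The cost is that every fragment has to be kept until the end of the frame.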
12、Quadtrees and Octrees
Quadtrees are often used to store renderable primitives such as 1)mesh instances, 2)subregions of terrain geometry or 3)individual triangles of a large static mesh, for the purposes of efficient frustum culling.
The renderable primitives are stored at the leaves of the tree, and we usually aim to achieve a roughly uniform number of primitives within each leaf region.
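A small quadtree along these lines can be sketched as below. This is an illustration under assumptions: primitives are reduced to named points, a leaf splits once it exceeds a hypothetical `capacity`, and an axis-aligned rectangle stands in for the view frustum.

```python
class QuadtreeNode:
    """Square region that splits into four children when a leaf gets too full."""

    def __init__(self, x, y, size, capacity=2):
        self.x, self.y, self.size = x, y, size  # min corner and edge length
        self.capacity = capacity
        self.items = []       # (px, py, name) tuples, stored only in leaves
        self.children = None  # None for a leaf, else a list of four nodes

    def insert(self, px, py, name):
        if self.children is not None:
            self._child_for(px, py).insert(px, py, name)
            return
        self.items.append((px, py, name))
        if len(self.items) > self.capacity and self.size > 1.0:
            self._split()

    def _split(self):
        half = self.size / 2.0
        self.children = [
            QuadtreeNode(self.x + dx * half, self.y + dy * half, half, self.capacity)
            for dy in (0, 1) for dx in (0, 1)
        ]
        items, self.items = self.items, []
        for px, py, name in items:  # push existing items down to the leaves
            self._child_for(px, py).insert(px, py, name)

    def _child_for(self, px, py):
        half = self.size / 2.0
        col = 1 if px >= self.x + half else 0
        row = 1 if py >= self.y + half else 0
        return self.children[row * 2 + col]

    def query(self, min_x, min_y, max_x, max_y):
        """Return names inside the rectangle, skipping subtrees that miss it."""
        if (self.x > max_x or self.x + self.size < min_x or
                self.y > max_y or self.y + self.size < min_y):
            return []  # the whole node is outside: cull the entire subtree
        if self.children is None:
            return [name for px, py, name in self.items
                    if min_x <= px <= max_x and min_y <= py <= max_y]
        found = []
        for child in self.children:
            found.extend(child.query(min_x, min_y, max_x, max_y))
        return found


root = QuadtreeNode(0.0, 0.0, 16.0)
for px, py, name in [(1, 1, "tree"), (2, 3, "rock"), (12, 12, "hut"),
                     (13, 3, "well"), (3, 2, "bush")]:
    root.insert(px, py, name)
near_origin = sorted(root.query(0, 0, 4, 4))
```

The payoff is in `query`: when a node's square misses the query region, everything beneath it is rejected with a single test.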
13、Bounding Sphere Trees
14、BSP Trees
A kd-tree applies the BSP tree concept in k dimensions, with every splitting plane aligned to a coordinate axis.
A BSP tree can also be used to sort triangles into a strictly back-to-front or front-to-back order.
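The back-to-front ordering falls out of a simple recursive traversal. The sketch below is a 2D illustration with a hand-built tree; the `BspNode` class, its fields and the wall labels are all hypothetical, and each node is assumed to hold the geometry lying on its splitting line.

```python
class BspNode:
    def __init__(self, normal, offset, label, front=None, back=None):
        self.normal = normal  # (nx, ny) of the splitting line
        self.offset = offset  # line equation: nx * x + ny * y = offset
        self.label = label    # geometry lying on the splitting line
        self.front = front    # subtree on the side the normal points toward
        self.back = back      # subtree on the opposite side


def back_to_front(node, camera, out):
    """Append labels to `out` so that the farthest geometry comes first."""
    if node is None:
        return out
    side = node.normal[0] * camera[0] + node.normal[1] * camera[1] - node.offset
    if side >= 0:
        # Camera is in front of the splitter: the back subtree is farther away.
        far, near = node.back, node.front
    else:
        far, near = node.front, node.back
    back_to_front(far, camera, out)
    out.append(node.label)
    back_to_front(near, camera, out)
    return out


# Three parallel walls at x = -5, x = 0 and x = 5.
tree = BspNode((1, 0), 0, "wall_x0",
               front=BspNode((1, 0), 5, "wall_x5"),
               back=BspNode((1, 0), -5, "wall_x-5"))
order_from_right = back_to_front(tree, (10, 0), [])
order_from_left = back_to_front(tree, (-10, 0), [])
```

Swapping the roles of `far` and `near` in the traversal would yield a front-to-back order instead.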
15、Image-Based Lighting
16、Heightmaps: Bump, Parallax and Displacement Mapping
In bump mapping, a heightmap is used as a cheap way to generate surface normals. This technique was primarily used in the early days of 3D graphics—nowadays, most game engines store surface normal information explicitly in a normal map, rather than calculating the normals from a heightmap.
Parallax occlusion mapping uses the information in a heightmap to artificially adjust the texture coordinates used when rendering a flat surface, in such a way as to make the surface appear to contain surface details that move semi-correctly as the camera moves.
Displacement mapping (also known as relief mapping) produces real surface details by actually tessellating and then extruding surface polygons, again using a heightmap to determine how much to displace each vertex. This produces the most convincing effect.
17、Specular/Gloss Maps
The specular intensity takes the form Ks(R·V)^a.
Storing Ks in a texture gives a specular map (also called a gloss map or specular mask).
Storing the exponent a in a texture gives a specular power map.
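The two maps simply supply per-texel values for Ks and a in the formula above. A minimal sketch, assuming unit-length R and V, tiny 2x2 "textures" stored as nested lists, and a clamp of R·V to zero (the clamp and all names are assumptions of this example):

```python
def specular_intensity(ks, r, v, power):
    """Evaluate ks * (R . V) ** power for unit vectors r and v."""
    # Clamp to zero so that reflections facing away contribute nothing
    # (an assumption of this sketch, common in practice).
    r_dot_v = max(0.0, sum(a * b for a, b in zip(r, v)))
    return ks * r_dot_v ** power


# Hypothetical 2x2 specular map (Ks) and specular power map (a).
ks_map = [[0.0, 0.5], [1.0, 0.25]]
power_map = [[1.0, 2.0], [8.0, 16.0]]


def shade_texel(row, col, r, v):
    """Look up Ks and a for one texel and evaluate the specular term."""
    return specular_intensity(ks_map[row][col], r, v, power_map[row][col])


head_on = shade_texel(0, 1, (0.0, 0.0, 1.0), (0.0, 0.0, 1.0))
off_axis = shade_texel(1, 0, (0.0, 0.0, 1.0), (0.0, 0.6, 0.8))
```

A higher value in the power map makes the highlight fall off faster as V moves away from R, which reads as a shinier surface.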
18、Environment Mapping
It is generally used to inexpensively render reflections.
The two most common formats are spherical environment maps and cubic environment maps.
Cubic environment maps are generally preferable to spherical environment maps.
19、Shadow Volumes
The GPU can be configured so that rendered geometry updates the values in the stencil buffer in various useful ways.
In other words, rendered geometry can write to the stencil buffer.
1)The scene is first drawn to generate an unshadowed image in the frame buffer, along with an accurate z-buffer. The stencil buffer is cleared so that it contains zeros at every pixel.
2)Each shadow volume is then rendered from the point of view of the camera in such a way that front-facing triangles increase the values in the stencil buffer by one, while back-facing triangles decrease them by one. In areas of the screen where the shadow volume does not appear at all, of course the stencil buffer’s pixels will be left containing zero.
3)Shadows are rendered in a third pass by simply darkening those regions of the screen that contain a nonzero stencil buffer value.
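The stencil counting in these steps can be sketched for a single pixel. This is a toy model with assumptions: the shadow volume is reduced to the depths at which its faces cross the pixel's view ray, and faces hidden behind the visible surface are assumed to fail the z-test against the z-buffer from step 1 and so leave the stencil untouched.

```python
def stencil_value(scene_depth, volume_faces):
    """Count shadow volume faces in front of the visible surface at one pixel.

    `volume_faces` is a list of (depth, facing) pairs, where facing is
    "front" or "back" relative to the camera.
    """
    stencil = 0
    for face_depth, facing in volume_faces:
        if face_depth >= scene_depth:
            continue  # hidden behind the visible surface: fails the z-test
        stencil += 1 if facing == "front" else -1
    return stencil


def is_shadowed(scene_depth, volume_faces):
    """Step 3: a nonzero stencil value marks the pixel as in shadow."""
    return stencil_value(scene_depth, volume_faces) != 0


# One shadow volume whose front face is at depth 2 and back face at depth 6.
volume = [(2.0, "front"), (6.0, "back")]
```

A surface at depth 4 lies inside the volume, so only the front face is counted and the stencil stays at 1. A surface at depth 8 is behind the whole volume, so the increment and decrement cancel out.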
20、Shadow Maps
First, a shadow map texture is generated by rendering the scene from the point of view of the light source and saving off the contents of the depth buffer.
Second, the scene is rendered as usual, and the shadow map is used to determine whether or not each fragment is in shadow.
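The two passes can be sketched with a one-dimensional shadow map. This is an illustration under assumptions: the light is orthographic, each surface is given as a (texel, depth from light) pair, and a small depth `bias` is added to avoid self-shadowing; the bias and all names are choices of this example.

```python
def build_shadow_map(surfaces, width):
    """Pass 1: record, per light-space texel, the depth of the nearest surface."""
    shadow_map = [float("inf")] * width
    for texel, depth in surfaces:
        shadow_map[texel] = min(shadow_map[texel], depth)
    return shadow_map


def in_shadow(shadow_map, texel, depth, bias=0.01):
    """Pass 2: a fragment is shadowed if something nearer the light covers it."""
    return depth > shadow_map[texel] + bias


# A floor at depth 5 across three texels, with a box at depth 2 over texel 1.
surfaces = [(0, 5.0), (1, 2.0), (1, 5.0), (2, 5.0)]
shadow_map = build_shadow_map(surfaces, 3)
floor_under_box = in_shadow(shadow_map, 1, 5.0)
open_floor = in_shadow(shadow_map, 0, 5.0)
box_top = in_shadow(shadow_map, 1, 2.0)
```

The floor beneath the box is farther from the light than the depth stored in the map, so it is in shadow, while the open floor and the top of the box are lit.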
21、Ambient Occlusion
22、Reflections
Environment maps are used to produce general reflections of the surrounding environment on the surfaces of shiny objects.
Direct reflections in flat surfaces like mirrors can be produced by reflecting the camera’s position about the plane of the reflective surface and then rendering the scene from that reflected point of view into a texture. The texture is then applied to the reflective surface in a second pass.
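Reflecting the camera position about a plane is a one-line formula: for a plane with unit normal n and equation n·p = d, the reflected point is p' = p - 2(n·p - d)n. A minimal sketch, with the function name and the example values chosen for illustration:

```python
def reflect_point(point, plane_normal, plane_offset):
    """Reflect `point` about the plane n . p = offset (n must be unit length)."""
    distance = sum(p * n for p, n in zip(point, plane_normal)) - plane_offset
    return tuple(p - 2.0 * distance * n for p, n in zip(point, plane_normal))


camera = (1.0, 3.0, -2.0)
# A mirror lying in the plane y = 0, facing up.
mirrored_camera = reflect_point(camera, (0.0, 1.0, 0.0), 0.0)
```

The scene would then be rendered from `mirrored_camera` into a texture. The view direction needs to be reflected as well, which is not shown here.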
23、Caustics
Caustic effects can be produced by projecting a (possibly animated) texture containing semi-random bright highlights onto the affected surfaces.
24、Subsurface Scattering
When light enters a surface at one point, is scattered beneath the surface, and then reemerges at a different point on the surface, we call this subsurface scattering.
This phenomenon is responsible for the “warm glow” of human skin, wax and marble statues.
25、Precomputed Radiance Transfer (PRT)
26、Deferred Rendering
In deferred rendering, the majority of the lighting calculations are done in screen space, not view space. We efficiently render the scene without worrying about lighting. During this phase, we store all the information we’re going to need to light the pixels in a “deep” frame buffer known as the G-buffer. Once the scene has been fully rendered, we use the information in the G-buffer to perform our lighting and shading calculations.
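The split into a geometry pass and a lighting pass can be sketched as follows. This is a toy model under assumptions: the G-buffer stores only a scalar albedo and a normal per pixel, lights are directional with a (direction toward light, intensity) pair, and shading is plain Lambert diffuse.

```python
def geometry_pass(visible_surfaces):
    """Pass 1: store per-pixel surface attributes; no lighting is computed."""
    return [{"albedo": s["albedo"], "normal": s["normal"]}
            for s in visible_surfaces]


def lighting_pass(gbuffer, lights):
    """Pass 2: light each pixel once, using only what the G-buffer holds."""
    image = []
    for texel in gbuffer:
        total = 0.0
        for direction, intensity in lights:
            n_dot_l = sum(n * l for n, l in zip(texel["normal"], direction))
            total += max(0.0, n_dot_l) * intensity
        image.append(texel["albedo"] * total)
    return image


# The nearest surface at each of two pixels, as found by ordinary rasterization.
visible = [
    {"albedo": 0.5, "normal": (0.0, 1.0, 0.0)},
    {"albedo": 1.0, "normal": (1.0, 0.0, 0.0)},
]
lights = [((0.0, 1.0, 0.0), 1.0), ((1.0, 0.0, 0.0), 0.25)]
gbuffer = geometry_pass(visible)
image = lighting_pass(gbuffer, lights)
```

Lighting cost here scales with the number of screen pixels times the number of lights, independent of how much geometry was drawn in the first pass.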
27、Physically Based Shading
28、Decals
A decal is a relatively small piece of geometry that is overlaid on top of the regular geometry in the scene, allowing the visual appearance of the surface to be modified dynamically. Examples include bullet holes, footprints, scratches and cracks.
29、Sky
On modern game platforms, where pixel shading costs can be high, sky rendering is often done after the rest of the scene has been rendered.
30、Terrain
31、Water
There are lots of different kinds of water, including oceans, pools, rivers, waterfalls, fountains, jets, puddles and damp solid surfaces. Each type of water generally requires some specialized rendering technology.
32、Text and Fonts
A text rendering system needs to be capable of displaying a sequence of character glyphs corresponding to a text string.
A font is often implemented via a texture map known as a glyph atlas. A font description file provides information such as the bounding boxes of each glyph within the texture, and font layout information such as kerning, baseline offsets and so on.
The FreeType library enables a game or other application to read fonts in a wide variety of formats, including TrueType (TTF) and OpenType (OTF), and to render glyphs into in-memory pixmaps at any desired point size. FreeType renders each glyph using its Bezier curve outlines, so it produces very accurate results.
In short, the FreeType library renders glyphs into pixmaps.
Typically a real-time application like a game will use FreeType to prerender the necessary glyphs into an atlas, which is in turn used as a texture map to render glyphs as simple quads every frame. However, by embedding FreeType or a similar library in your engine, it’s possible to render some glyphs into the atlas on the fly, on an as-needed basis. This can be useful when rendering text in a language with a very large number of possible glyphs, like Chinese or Korean.
FreeType can render glyphs into a texture map either offline or at runtime, for use by the text rendering engine.
Glyphs can also be stored as signed distance fields, in which each pixel contains a signed distance from that pixel center to the nearest edge of the glyph. Inside the glyph, the distances are negative; outside the glyph’s outlines, they are positive.
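The sign convention can be sketched with a circular "glyph", for which the signed distance is easy to compute exactly. The circle shape, the linear falloff used to turn a distance into coverage, and the `edge_width` parameter are all assumptions of this example.

```python
import math


def circle_sdf(px, py, cx, cy, radius):
    """Signed distance to a circle outline: negative inside, positive outside."""
    return math.hypot(px - cx, py - cy) - radius


def coverage(distance, edge_width=1.0):
    """Map a signed distance to alpha with a linear falloff across the edge."""
    return min(1.0, max(0.0, 0.5 - distance / edge_width))


inside = circle_sdf(0.0, 0.0, 0.0, 0.0, 4.0)
on_edge = circle_sdf(4.0, 0.0, 0.0, 0.0, 4.0)
outside = circle_sdf(3.0, 4.0, 0.0, 0.0, 4.0)
```

Because the stored value varies smoothly across the outline, the edge can be reconstructed crisply at many sizes from one low-resolution texture, with the falloff width controlling how soft the edge looks.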
33、Gamma Correction
CRT monitors tend to have a nonlinear response to luminance values. Visually, the dark regions of the image would look darker than they should.
The CRT response has a gamma value γ_CRT > 1. To correct for this effect, the colors sent to the CRT display are usually passed through an inverse transformation (i.e., using a gamma value γ_corr < 1). The value of γ_CRT for a typical CRT monitor is 2.2, so the correction value is usually γ_corr = 1/2.2 ≈ 0.455.
One problem that is encountered, however, is that the bitmap images used to represent texture maps are often gamma-corrected themselves.
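The arithmetic can be sketched with a simple power-law model, output = input ^ gamma, applied to values in [0, 1]. The function names are made up for this example, and decoding textures back to linear space before doing lighting math is shown as one common way to handle gamma-corrected bitmaps.

```python
GAMMA_CRT = 2.2
GAMMA_CORR = 1.0 / GAMMA_CRT  # about 0.455


def crt_response(signal):
    """Model of the display: darkens mid-tones because GAMMA_CRT > 1."""
    return signal ** GAMMA_CRT


def gamma_correct(linear):
    """Pre-brighten a linear value so the display's response cancels out."""
    return linear ** GAMMA_CORR


def texture_to_linear(encoded):
    """Undo the gamma correction baked into a texture before lighting math."""
    return encoded ** GAMMA_CRT


uncorrected = crt_response(0.5)               # what the viewer sees untreated
corrected = crt_response(gamma_correct(0.5))  # correction cancels the response
texel = gamma_correct(0.25)                   # a gamma-corrected texture value
linear_texel = texture_to_linear(texel)
```

Without correction a mid-gray of 0.5 is displayed much darker. If a gamma-corrected texture were used in lighting calculations without decoding, and the final image were then corrected again, the texture would effectively be corrected twice.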
34、Full-Screen Post Effects