
Tech Feature: SSAO and Temporal Blur


Screen space ambient occlusion (SSAO) is the standard solution for approximating ambient occlusion in video games. Ambient occlusion is used to represent how exposed each point is to the indirect lighting from the scene. Direct lighting is light emitted from a light source, such as a lamp or a fire. The direct light illuminates objects in the scene, and the light that bounces off those illuminated objects makes up the indirect lighting. Making each object in the scene cast indirect lighting is very expensive. Ambient occlusion is a way to approximate this by using a light source with constant color and information from nearby geometry to determine how dark a part of an object should be. The idea behind SSAO is to get the geometry information from the depth buffer.

There are many publicised algorithms for high quality SSAO. This tech feature will instead focus on improvements that can be made after the SSAO has been generated.



SSAO Algorithm
SOMA uses a fast and straightforward algorithm for generating medium frequency AO. The algorithm runs at half resolution, which greatly increases performance since only a quarter of the pixels need to be shaded. Running at half resolution doesn't reduce the quality by much, since the final result is blurred.

For each pixel on the screen, the shader calculates the position of the pixel in view space and then compares that position with the view space position of nearby pixels. How occluded the pixel gets is based on how close the two points are to each other and whether the nearby point is in front of the surface normal. The occlusion contributions from all nearby pixels are then added together for the final result.

SOMA uses a radius of 1.5m when looking for nearby points that might occlude. Sampling points outside of this 1.5m range is a waste of resources, since they will not contribute to the AO. Our algorithm samples 16 points in a growing circle around the main pixel. The size of the circle is determined by how close the main pixel is to the camera and how large the search radius is. For pixels that are far away from the camera a radius of just a few pixels can be used, but the closer the point gets to the camera the more the circle grows - up to half the screen. Using only 16 samples to cover half a screen of pixels results in a grainy image that flickers when the camera is moving.
Grainy result from the SSAO algorithm
Bilateral Blur
Blurring can be used to remove the grainy look of the SSAO. A blur combines the values of a large number of neighboring pixels; the further away a neighboring pixel is, the less impact it has on the final result. The blur is run in two passes, first in the horizontal direction and then in the vertical direction.

The issue with blurring SSAO this way quickly becomes apparent: AO from different geometry leaks across boundaries, causing a bright halo around objects. Bilateral weighting can be used to fix these leaks. It works by comparing the depth of the main pixel to the depth of the neighboring pixel. If the difference between the two depths is outside of a limit the neighboring pixel is skipped. In SOMA this limit is set to 2cm.
To get a good-looking blur the number of neighboring pixels to sample needs to be large. Getting rid of the grainy artifacts requires over 17x17 pixels to be sampled at full resolution. A sketch of such a depth-aware blur is shown below.
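As an illustration (a minimal sketch, not the actual shader from SOMA), this is roughly what one horizontal pass of such a depth-aware blur can look like in GLSL. The sampler names, the weights and the helper structure are assumptions made for the example; only the 2cm depth limit comes from the text above.

// Hypothetical inputs: half resolution AO and depth (in meters) textures
uniform sampler2D aAO;
uniform sampler2D aDepth;
uniform vec2 avPixelSize;   // size of one texel in UV space

float BilateralBlurHorizontal(vec2 vUV)
{
    const float fDepthLimit = 0.02;  // 2cm, as described above
    // Simple symmetric weights for a 5-tap kernel
    const float afWeight[3] = float[3](0.375, 0.25, 0.0625);

    float fMainDepth = texture(aDepth, vUV).x;

    float fTotalAO = texture(aAO, vUV).x * afWeight[0];
    float fTotalWeight = afWeight[0];

    for(int i = 1; i <= 2; ++i)
    for(int s = -1; s <= 1; s += 2)   // left and right neighbor
    {
        vec2 vSampleUV = vUV + vec2(float(i * s), 0.0) * avPixelSize;
        float fSampleDepth = texture(aDepth, vSampleUV).x;

        // Skip neighbors whose depth differs too much from the main pixel,
        // so AO does not leak between different pieces of geometry
        if(abs(fSampleDepth - fMainDepth) > fDepthLimit) continue;

        fTotalAO += texture(aAO, vSampleUV).x * afWeight[i];
        fTotalWeight += afWeight[i];
    }

    return fTotalAO / fTotalWeight;
}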

Temporal Filtering 
Temporal Filtering is a method for reducing the flickering caused by the low number of samples. The result from the previous frame is blended with the current frame to create smooth transitions. Blending the images directly would lead to a motion-blur-like effect. Temporal Filtering removes this motion blur by reverse reprojecting the view space position of a pixel to the view space position it had the previous frame and then using that to sample the previous result. The SSAO algorithm runs on screen space data but the AO is applied to world geometry. An object that is visible in one frame may not be visible in the next, either because it has moved or because the view has been blocked by another object. When this happens the result from the previous frame has to be discarded. The distance between the two points in world space determines how much of the result from the previous frame should be used.

Explanation of Reverse Reprojection used in Frostbite 2 [2]
Temporal Filtering introduces a new artifact: when dynamic objects move close to static objects they leave a trail of AO behind. Frostbite 2's implementation of Temporal Filtering solves this by disabling the filter for stable surfaces that don't get flickering artifacts. I found another way to remove the trailing while keeping the Temporal Filter for all pixels.


Shows the trailing effect that happens when a dynamic object is moved. The Temporal Blur algorithm is then applied and most of the trailing is removed.

Temporal Blur 

(A) Implementation of Temporal Filtered SSAO (B) Temporal Blur implementation 
I came up with a new way to use Temporal Filtering when trying to remove the trailing artifacts. By combining two passes of cheap blur with Temporal Filtering all flickering and grainy artifacts can be removed without leaving any trailing. 

When the SSAO has been rendered, a cheap 5x5 bilateral blur pass is run on the result. The blurred result from the previous frame is then applied using Temporal Filtering, and another 5x5 bilateral blur is applied to the image. In addition to using geometry data to calculate the blending amount for the Temporal Filtering, the difference in SSAO between the frames is used, which removes all trailing artifacts.

Applying a blur before and after the Temporal Filtering, and using the blurred image from the previous frame, results in a very smooth image that becomes more blurred with each frame; it also removes any flickering. Even a 5x5 blur will make the resulting image look as smooth as a 64x64 blur after a few frames.

Because the image gets so smooth, the upsampling can be moved to after the blur. This makes Temporal Blur faster, since running four 5x5 blur passes in half resolution is cheaper than running two 17x17 passes in full resolution.

Upsampling
All of the previous steps are performed in half resolution. To get the final result it has to be scaled up to full resolution. Simply stretching the half resolution image to twice its size does not look good: near the edges of geometry there will be visible bleeding, and non-occluded objects will get a bright pixel halo around them. This can be solved using the same idea as the bilateral blurring. Normal bilinear filtering is combined with a weight calculated by comparing the depth of the main pixel with the depth values of the four closest half resolution pixels.

Summary 
Combining SSAO with the Temporal Blur algorithm produces high quality results for a large search radius at a low cost. The total cost of the algorithm is 1.1ms (1920x1080, AMD 5870). This is more than twice as fast as a normal SSAO implementation.

SOMA uses high frequency AO baked into the diffuse texture in addition to the medium frequency AO generated by the SSAO.

Temporal Blur could be used to improve many other post effects that need to produce smooth-looking results.

Ambient Occlusion is only one part of the rendering pipeline, and it should be combined with other lighting techniques to give the final look.


References
  1. http://gfx.cs.princeton.edu/pubs/Nehab_2007_ARS/NehEtAl07.pdf 
  2. http://dice.se/wp-content/uploads/GDC12_Stable_SSAO_In_BF3_With_STF.pdf 

 // SSAO Main loop

//Scale the radius based on how close to the camera it is
 float fStepSize = afStepSizeMax * afRadius / vPos.z;
 float fStepSizePart = 0.5 * fStepSize / ((2 + 16.0));    

 for(float d = 0.0; d < 16.0; d+=4.0)
 {
        //////////////
        // Sample four points at the same time

        vec4 vOffset = (d + vec4(2, 3, 4, 5))* fStepSizePart;
        
        //////////////////////
        // Rotate the samples

        vec2 vUV1 = mtxRot * vUV0;
        vUV0 = mtxRot * vUV1;

        vec3 vDelta0 = GetViewPosition(gl_FragCoord.xy + vUV1 * vOffset.x) - vPos;
        vec3 vDelta1 = GetViewPosition(gl_FragCoord.xy - vUV1 * vOffset.y) - vPos;
        vec3 vDelta2 = GetViewPosition(gl_FragCoord.xy + vUV0 * vOffset.z) - vPos;
        vec3 vDelta3 = GetViewPosition(gl_FragCoord.xy - vUV0 * vOffset.w) - vPos;

        vec4 vDistanceSqr = vec4(dot(vDelta0, vDelta0),
                                 dot(vDelta1, vDelta1),
                                 dot(vDelta2, vDelta2),
                                 dot(vDelta3, vDelta3));

        vec4 vInvertedLength = inversesqrt(vDistanceSqr);

        vec4 vFalloff = vec4(1.0) + vDistanceSqr * vInvertedLength * fNegInvRadius;

        vec4 vAngle = vec4(dot(vNormal, vDelta0),
                            dot(vNormal, vDelta1),
                            dot(vNormal, vDelta2),
                            dot(vNormal, vDelta3)) * vInvertedLength;


        ////////////////////
        // Calculates the sum based on the angle to the normal and distance from point

        fAO += dot(max(vec4(0.0), vAngle), max(vec4(0.0), vFalloff));
}

//////////////////////////////////
// Get the final AO by averaging over the number of samples
fAO = max(0.0, 1.0 - fAO / 16.0);

------------------------------------------------------------------------------ 

// Upsample Code
 
vec2 vClosest = floor(gl_FragCoord.xy / 2.0);
vec2 vBilinearWeight = vec2(1.0) - fract(gl_FragCoord.xy / 2.0);

float fTotalAO = 0.0;
float fTotalWeight = 0.0;

for(float x = 0.0; x < 2.0; ++x)
for(float y = 0.0; y < 2.0; ++y)
{
       // Sample depth (stored in meters) and AO for the half resolution pixel
       float fSampleDepth = textureRect(aHalfResDepth, vClosest + vec2(x,y));
       float fSampleAO = textureRect(aHalfResAO, vClosest + vec2(x,y));

       // Calculate bilinear weight
       float fBilinearWeight = (x - vBilinearWeight.x) * (y - vBilinearWeight.y);
       // Calculate upsample weight based on how close the depth is to the main depth
       float fUpsampleWeight = max(0.00001, 0.1 - abs(fSampleDepth - fMainDepth)) * 30.0;

       // Apply weight and add to total sum
       fTotalAO += (fBilinearWeight + fUpsampleWeight) * fSampleAO;
       fTotalWeight += (fBilinearWeight + fUpsampleWeight);
}

// Divide by total sum to get final AO
float fAO = fTotalAO / fTotalWeight;

-------------------------------------------------------------------------------------

// Temporal Blur Code

//////////////////
// Get current frame depth and AO

vec2 vScreenPos = floor(gl_FragCoord.xy) + vec2(0.5);
float fAO = textureRect(aHalfResAO, vScreenPos.xy);
float fMainDepth = textureRect(aHalfResDepth, vScreenPos.xy);   

//////////////////
// Convert to view space position
vec3 vPos = ScreenCoordToViewPos(vScreenPos, fMainDepth);

/////////////////////////
// Convert the current view position to the view position it 
// would represent the last frame and get the screen coords
vPos = (a_mtxPrevFrameView * (a_mtxViewInv * vec4(vPos, 1.0))).xyz;

vec2 vTemporalCoords = ViewPosToScreenCoord(vPos);

       
//////////////
// Get the AO from the last frame

float fPrevFrameAO = textureRect(aPrevFrameAO, vTemporalCoords.xy);

float fPrevFrameDepth = textureRect(aPrevFrameDepth, vTemporalCoords.xy);

/////////////////
// Get the view space position of the temporal coords

vec3 vTemporalPos = ScreenCoordToViewPos(vTemporalCoords.xy, fPrevFrameDepth);
       

///////
// Get weight based on distance to last frame position (removes ghosting artifact)

float fWeight = distance(vTemporalPos, vPos) * 9.0;

////////////////////////////////
// And a weight based on how different the amount of AO is (removes trailing artifact)
// Only works if both fAO and fPrevFrameAO are blurred
fWeight += abs(fPrevFrameAO - fAO) * 5.0;

////////////////
// Clamp to make sure at least 1.0 / FPS of the current frame is blended in

fWeight = clamp(fWeight, afFrameTime, 1.0);
fAO = mix(fPrevFrameAO, fAO, fWeight);
   
------------------------------------------------------------------------------


Tech Feature: Linear-space lighting


Linear-space lighting is the second big change that has been made to the rendering pipeline for HPL3. Working in a linear lighting space is the most important thing to do if you want correct results.
It is an easy and inexpensive technique for improving the image quality. Working in linear space is not something that makes the lighting look better, it just makes it look correct.

(a) Left image shows the scene rendered without gamma correction 
(b) Right image is rendered with gamma correction

Notice how the cloth in the image to the right looks more realistic and how much less plastic the specular reflections are.
Doing math in linear space works just as you are used to. Adding two values returns the sum of those values, and multiplying a value by a constant returns the value multiplied by that constant.

This seems like how you would think it would work, so why isn’t it?

Monitors

Monitors do not behave linearly when converting voltage to light. A monitor follows something closer to a power curve when converting the pixel value, and the shape of this curve is determined by the monitor's gamma exponent. The standard gamma for a monitor is 2.2, which means that a pixel with 100 percent intensity emits 100 percent light, but a pixel with 50 percent intensity only outputs 21 percent light. To get the pixel to emit 50 percent light the intensity has to be 73 percent.
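To see where these numbers come from, the fraction of light emitted is the intensity raised to the power of the gamma exponent: 0.5^2.2 ≈ 0.22, which is the 21 percent above, and 0.73^2.2 ≈ 0.5, which is why an intensity of 73 percent is needed to get 50 percent light.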

The goal is to get the monitor to output linearly so that 50 percent intensity equals 50 percent light emitted.

Gamma correction

Gamma correction is the process of converting one intensity to another intensity which generates the correct amount of light.
The relationship between intensity and light for a monitor can be simplified as a power function called gamma decoding:

light = intensity ^ gamma

To cancel out the effect of gamma decoding the value has to be converted using the inverse of this function. The inverse of a power function is a power function with the reciprocal exponent. This inverse function is called gamma encoding:

intensity = light ^ (1 / gamma)

Applying the gamma encoding to the intensity makes the pixel emit the correct amount of light.

Lighting

Here are two images that use simple Lambertian lighting (N · L).

(a) Lighting performed in gamma space
(b) Lighting performed in linear space
The left image has a very soft falloff which doesn't look realistic. When the angle between the normal and the light direction is 60 degrees the brightness should be 50 percent, since cos(60°) = 0.5. The image on the left is far too dim to match that. Applying a constant brightness boost would make the highlights too bright and would not fix the really dark parts. The correct way to make the monitor display the image correctly is to apply gamma encoding to it.
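As a small illustration of the difference (a sketch with made-up names, not HPL3's actual shader code), this is what a basic Lambert term looks like when all the math is done in linear space and the result is gamma encoded before display:

uniform vec3 avLightDir;     // normalized direction towards the light
uniform vec3 avLightColor;   // light color in linear space

vec3 ShadeLambert(vec3 vNormal, vec3 vAlbedoLinear)
{
    // Lambertian term: at 60 degrees this is cos(60) = 0.5
    float fNDotL = max(dot(normalize(vNormal), avLightDir), 0.0);

    // All lighting math is done in linear space
    vec3 vLinearColor = vAlbedoLinear * avLightColor * fNDotL;

    // Displaying vLinearColor directly would give the dim falloff of the
    // left image; encoding it first makes the monitor output it correctly
    return pow(vLinearColor, vec3(1.0 / 2.2));
}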

 (a) Lighting and texturing in gamma space
(b) Lighting done in linear space with standard texturing
(c) The source texture

Using textures introduces the next big problem with gamma correction. In the left image the color of the texture looks correct but the lighting is too dim. The right image is gamma corrected and the lighting looks correct, but the texture, and the whole image, is washed out and desaturated. The goal is to keep the colors from the texture and combine them with the correct-looking lighting.

Pre-encoded images

Pictures taken with a camera or paintings made in Photoshop are all stored in a gamma encoded format. Since the image is stored encoded, the monitor can display it directly. The gamma decoding of the monitor cancels out the encoding of the image and linear brightness gets displayed. This saves the step of having to encode the image in real time before displaying it.
The second reason for encoding images is based on how humans perceive light. Human vision is more sensitive to differences in shaded areas than in bright areas. Applying gamma encoding expands the dark areas and compresses the highlights, which results in more bits being used for darkness than for brightness. A normal photo would require 12 bits per channel to be saved in linear space compared to the 8 bits used when stored in gamma space. Images are encoded with the sRGB format, which corresponds to a gamma of roughly 2.2.

Images are stored in gamma space but lighting works in linear space, so the images need to be converted to linear space when they are loaded into the shader. If they are not converted correctly there will be artifacts from mixing the two different lighting spaces. The conversion to linear space is done by applying the gamma decoding function to the texture.



      (a) All calculations have been made in gamma space
      (b) Correct texture and lighting: the texture is decoded to linear space, all calculations are done, and the result is encoded to gamma space again

Mixing light spaces

Gamma correction is a term used to describe two different operations, gamma encoding and gamma decoding. When learning about gamma correction it can be confusing because the same word is used for both operations.
Correct results are only achieved if the texture input is decoded and the final color is encoded. If only one of the operations is used the displayed image will look worse than if neither of them is.



     (a) No gamma correction: the lighting looks incorrect but the texture looks correct.
(b) Gamma encoding of the output only: the lighting looks correct but the textures become washed out.
(c) Gamma decoding only: the texture is much darker and the lighting is incorrect.
(d) Gamma decoding of the textures and gamma encoding of the output: both the lighting and the texture look correct.

Implementation

Implementing gamma correction is easy. Converting an image to linear space is done by applying the gamma decoding function. The alpha channel should not be decoded, as it is already stored in linear space.

// Correct but expensive way
vec3 linear_color = pow(texture(encoded_diffuse, uv).rgb, vec3(2.2));
// Cheap approximation using a power of 2 instead
vec3 encoded_color = texture(encoded_diffuse, uv).rgb;
vec3 linear_color = encoded_color * encoded_color;

Any hardware with DirectX 10 or OpenGL 3.0 support can use the sRGB texture format. This format allows the hardware to perform the decoding automatically and return the data as linear. The automatic sRGB conversion is free and has the benefit of being applied before texture filtering.
To use the sRGB format in OpenGL, pass GL_SRGB_EXT instead of GL_RGB to glTexImage2D as the internal format.

After doing all calculations and post-processing, the final color should be corrected by applying gamma encoding with a gamma that matches the gamma of the monitor.

vec3 encoded_output = pow(final_linear_color, vec3(1.0 / monitor_gamma));

For most monitors a gamma of 2.2 works fine. To get the best result the game should let the player select the gamma from a calibration chart.
This value is not the same gamma value that is used to decode the textures. Textures are stored with a gamma of 2.2, but monitors usually have a gamma ranging from 2.0 to 2.5.

When not to use gamma decoding

Not every type of texture is stored gamma encoded, and only the texture types that are encoded should be decoded. A rule of thumb is that if a texture represents some kind of color it is encoded, and if it represents something mathematical it is not encoded (see the sketch after this list).
  • Diffuse, specular and ambient occlusion textures all represent color modulation and need to be decoded on load
  • Normal, displacement and alpha maps don't store a color, so the data they store is already linear
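A minimal sketch of how that rule of thumb can look in a shader; the sampler names are made up for the example and the color decode uses the cheap gamma 2.0 approximation from the implementation section above:

uniform sampler2D aDiffuseMap;   // color data, stored gamma encoded
uniform sampler2D aNormalMap;    // mathematical data, stored linear

void GetSurfaceData(vec2 vUV, out vec3 vAlbedoLinear, out vec3 vNormal, out float fAlpha)
{
    vec4 vDiffuse = texture(aDiffuseMap, vUV);

    // The color channels are gamma encoded and need to be decoded...
    vAlbedoLinear = vDiffuse.rgb * vDiffuse.rgb;
    // ...but the alpha channel is already linear and is used as-is
    fAlpha = vDiffuse.a;

    // Normal maps store vectors, not colors, so no decoding is done;
    // just remap the components from [0, 1] to [-1, 1]
    vNormal = normalize(texture(aNormalMap, vUV).xyz * 2.0 - 1.0);
}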

Summary

Working in linear space and making sure the monitor outputs light linearly is needed to get properly rendered images. It can be complicated to understand why this is needed, but the fix is very simple.
  • When loading a gamma encoded image, apply gamma decoding by raising the color to the power of 2.2; this converts the image to linear space
  • After all calculations and post processing are done (the very last step), apply gamma encoding to the color by raising it to one over the gamma of the monitor

If both of these steps are followed the result will look correct.
