Background of Shaders
One particular facet of modern graphics development that is often a pain, even for AAA games, is shader variants!
If you have bought an AAA game in recent years and wondered what the heck it is doing when it says it is compiling shaders for a long time (up to an hour or more for some recent PC titles on slower machines!), then this blog will explain it a little.
Modern graphics APIs (Vulkan, D3D12, Metal) like to know about everything that has to do with GPU state, up front. A large chunk of the GPU state is provided by so-called shader programs. These shader programs fill in various gaps in the graphics pipeline that used to be provided by fixed-function hardware back in the days of OpenGL 1.x.
As OpenGL (and DirectX) evolved, people wanted to do a wider range of things when processing vertices into colorful pixels on-screen. So, over time, the fixed-function silicon on GPUs has gradually been replaced by more and more general purpose processors. As with CPUs, we now need to tell these processors what to do by writing small (or sometimes not so small) specialized programs called shader programs.
In OpenGL, we would write our shaders in the high-level GLSL language and feed that to the OpenGL driver as a string at runtime. The OpenGL driver would then compile the GLSL to GPU machine code and we could then throw big piles of vertices and other resources like textures at it and marvel at the results — or, more likely, swear a bit and wonder why we are staring at a black window yet again.
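For anyone who has not used the classic API, the runtime flow looked roughly like this (a minimal sketch that assumes an OpenGL context is already current and that vertexSource holds the GLSL text; error handling is omitted):

GLuint shader = glCreateShader(GL_VERTEX_SHADER);
glShaderSource(shader, 1, &vertexSource, NULL);    // hand the GLSL string to the driver
glCompileShader(shader);                           // the driver compiles it to GPU machine code

GLint status = GL_FALSE;
glGetShaderiv(shader, GL_COMPILE_STATUS, &status); // find out whether the driver's compiler accepted it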
The necessity of including a complete GLSL compiler in the graphics driver was a huge burden for each of the GPU vendors. It also led to some strange problems for developers: running code on a new platform meant a different GLSL compiler in the driver, with new and different bugs or shortcomings to hit.
With the advent of modern graphics APIs, there has been a move toward consuming shader code in the form of a bytecode intermediate representation, such as SPIR-V. SPIR-V is still not the final form of executable code required by the GPU silicon but it is much closer to it than GLSL and means the Vulkan drivers no longer need the entire compiler front-end.
Tools such as Nsight and RenderDoc are able to decompile the SPIR-V shader code back to GLSL (or HLSL) to make it easier for you to debug your applications.
The conversion from GLSL (or any other suitable language) to SPIR-V can still happen at runtime if that’s what you need — for example, in dynamic editor tools. However, for constrained applications, we can now compile the GLSL to SPIR-V up front at build time.
That’s nice! We can simply add a few targets to our CMakeLists.txt and go home, right? Well, not quite.
The Need for Shader Variants
You see, shader developers are just as lazy as any other kind of developer and like to reduce the amount of copy/paste coding they have to do. So, we add optional features to our shaders that can be compiled in or out by way of pre-processor #defines, just as with C/C++.
Why is this even needed, though? Well, we don’t always have full control over the data that our application will be fed. Imagine a generic glTF file viewer application. Some models that get loaded will use textures for the materials and include texture coordinates in the model’s vertex data. Other models may just use vertex colors, completely leaving out texture coordinates.
To handle this, our vertex shader’s prologue may look something like this:
layout(location = 0) in vec3 vertexPosition;
layout(location = 1) in vec3 vertexNormal;
#ifdef TEXCOORD_0_ENABLED
layout(location = 2) in vec2 vertexTexCoord;
#endif
layout(location = 0) out vec3 normal;
#ifdef TEXCOORD_0_ENABLED
layout(location = 1) out vec2 texCoord;
#endif
Then, in the main() function, we would have:
void main()
{
#ifdef TEXCOORD_0_ENABLED
    texCoord = vertexTexCoord;
#endif
    normal = normalize((camera.view * entity.model[gl_InstanceIndex] * vec4(vertexNormal, 0.0)).xyz);
    gl_Position = camera.projection * camera.view * entity.model[gl_InstanceIndex] * vec4(vertexPosition, 1.0);
}
The fragment shader would have similar changes to handle the cases with and without texture coordinates.
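For illustration, here is a sketch of what that might look like (the sampler binding and the material.baseColorFactor fallback are made up for this example and are not taken from the real shaders):

layout(location = 0) in vec3 normal;
#ifdef TEXCOORD_0_ENABLED
layout(location = 1) in vec2 texCoord;
layout(set = 1, binding = 0) uniform sampler2D baseColorMap; // illustrative binding
#endif

layout(location = 0) out vec4 fragColor;

void main()
{
#ifdef TEXCOORD_0_ENABLED
    vec4 baseColor = texture(baseColorMap, texCoord);
#else
    vec4 baseColor = material.baseColorFactor; // hypothetical untextured fallback
#endif
    ...
    fragColor = baseColor;
}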
Super, so we have one set of shader source files that can handle both models with textures and models without textures. How do we compile the shaders to get these shader variants?
Just as with C/C++, we have a compiler toolchain, and we invoke the compiler with the various -D options as needed, e.g.:
glslangValidator -V -o material-with-uvs.vert.spirv -DTEXCOORD_0_ENABLED material.vert # With texture coords
glslangValidator -V -o material-without-uvs.vert.spirv material.vert # Without texture coords
Then, within our application, we can load the glTF model, inspect its data to see whether it uses textures, and then load the appropriate SPIR-V compiled shader.
Hooray! The job is done and we can go home now, right? Well, actually, no — the project manager just called to say we also need to handle models that include the alpha cut-off feature and models that don’t include it.
Alpha cut-off is a feature of glTF files by which any pixels determined to have an alpha value less than some specified threshold simply get discarded. This is often used to cut away the transparent parts of quads used to render leaves of plants.
Ok then, let’s simply repeat the process we used for handling the presence or absence of texture coordinates.
The fragment shader implementation of alpha cut-off is trivial:
void main()
{
    vec4 baseColor = ...;
#ifdef ALPHA_CUTOFF_ENABLED
    if (baseColor.a < material.alphaCutoff)
        discard;
#endif
    ...
    fragColor = baseColor;
}
We can then add suitable CMake targets to compile with and without this option.
Of course, there’s a catch. We have a combinatorial explosion of feature combinations. This only gets worse when we add the next optional feature or optional features that have various settings we wish to set at compile time, such as the number of taps used when sampling from a texture to perform a Gaussian blur.
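As a hypothetical illustration of such a value-style option, a blur fragment shader might consume a BLUR_TAPS define like this (the weights, offsets, and sourceTexture names are invented for this sketch):

#ifndef BLUR_TAPS
#define BLUR_TAPS 5 // fallback if a variant is built without the option
#endif
...
vec4 blurred = vec4(0.0);
for (int i = 0; i < BLUR_TAPS; ++i)
    blurred += weights[i] * texture(sourceTexture, texCoord + offsets[i]);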
Clearly, we do not want to have to add several thousand combinations of features as CMake targets by hand! So, what can we do?
Exploring the Problem
Let’s consider the above combination of the texture coordinates and alpha cut-off features. Our table of features and compiler flags looks like this:
                  | Tex Coord Off            | Tex Coord On
Alpha Cut-off Off | (no flags)               | -DTEXCOORD_0_ENABLED
Alpha Cut-off On  | -DALPHA_CUTOFF_ENABLED   | -DTEXCOORD_0_ENABLED -DALPHA_CUTOFF_ENABLED
Adding another option would add another dimension to this table. The above-mentioned blur filter taps option, with, say, 3, 5, 7, or 9 taps, would add a third dimension to the table and increase the number of combinations by another factor of 4, for a total of 16 possible configurations of this one shader program.
Adding just a handful of features, we can see that it would be all too easy to end up with thousands of combinations of compiled shaders from the single set of GLSL files!
How can we solve this in a nice and extensible way?
It is easy enough to have nested loops to iterate over the available options for each of the specified axes of variations. But what if we don’t know all of the axes of variation up front? What if they vary from shader to shader? Not all shaders will care about alpha cut-off or blur filter taps, for example.
We can’t simply hard-wire a set number of nested loops to iterate over the combinations in our CMake files. We need something a bit more flexible and smarter.
Let’s think about the problem in a slightly different way.
To start with, let’s represent a given configuration of our option space by a vector of length N, where N is the number of options. For now, let’s set this to 3, for our options we have discussed:
- Texture Coordinates (Off or On)
- Alpha Cut-off (Off or On)
- Blur filter taps (3, 5, 7, or 9)
That is, we will have a vector like this:
[TexCoords Off, Alpha Cut-off Off, blur taps = 3]
To save some typing, let’s now replace the wordy description of each element with a number representing the index of the option for that axis of variation:
- Texture Coordinates: (0 = Off, 1 = On)
- Alpha Cut-off: (0 = Off, 1 = On)
- Blur filter taps: (0 = 3 taps, 1 = 5 taps, 2 = 7 taps, 3 = 9 taps)
With this scheme in place, our above option set will be:
[0, 0, 0]
And the vector representing texture coordinates on, no alpha cut-off, and 7 blur filter taps option will be:
[1, 0, 2]
How does this help us? Well, it allows us to succinctly represent any combination of options; but it’s even better than that. We can now easily go through the list of all possible combinations in a logical order. We begin by stepping the final element of the vector through all of its possible values. Then, we increment the previous element and repeat, like this:
[0, 0, 0]
[0, 0, 1]
[0, 0, 2]
[0, 0, 3]
[0, 1, 0]
[0, 1, 1]
[0, 1, 2]
[0, 1, 3]
[1, 0, 0]
[1, 0, 1]
[1, 0, 2]
[1, 0, 3]
[1, 1, 0]
[1, 1, 1]
[1, 1, 2]
[1, 1, 3]
Note that the total number of option combinations is just the product of the number of options in each dimension or axis of variation, e.g. 2x2x4 = 16 in this example.
The above sequence is exactly what we would get if we had 3 nested for-loops to iterate over the options at each level. How does this help us?
Well, looking at the above sequence of options vectors, you may well notice the similarity to plain old counting of numbers. For each “decimal place” (element in the vector), starting with the final or least significant digit, we go up through each of the available values. Then, we increment the next least significant digit and repeat.
The only difference from how we are used to counting in decimal (base 10), binary, octal, or hexadecimal is that the base of each digit is potentially different. The base for each digit is simply the number of options available for that axis of variation: the texture coordinates can only be on or off (base 2), and the same goes for alpha cut-off, while the blur taps option has a base of 4 (4 possible values).
We know how many combinations we need in total and we know that each combination can be represented by a vector that acts like a variable-base number. Therefore, if we can find a way to convert from a decimal number to the corresponding combination vector, we are in a good situation, as we will have converted a recursive approach (nested for-loops) into a flat linear approach. All we would need would be something like this pseudo-code:
for i = 0 to combination_count - 1
    option_vector = calculate_option_vector(i)
    output_compiler_options(option_vector)
next i
So how do we do this?
A Solution
To convert a decimal number into a different base system is fairly easy. The process is described well at https://www.tutorialspoint.com/computer_logical_organization/number_system_conversion.htm, where they give an example of converting from decimal to binary.
All we have to do, in our case, is use a base that differs for each digit of our combination vector. Before we show this, however, we need a way to specify the options to consider for each shader. We have done this by way of a simple JSON file, for now. Here is an example applying all three of the above options to the fragment shader, but only the texture coordinates and alpha cut-off options to the vertex shader. This is just for illustration; in reality, the vertex shader has nothing to do with alpha cut-off, and our simple shaders do not do anything with the blur taps option at all:
{
    "options": [
        {
            "name": "hasTexCoords",
            "define": "TEXCOORD_0_ENABLED"
        },
        {
            "name": "enableAlphaCutoff",
            "define": "ALPHA_CUTOFF_ENABLED"
        },
        {
            "name": "taps",
            "define": "BLUR_TAPS",
            "values": [3, 5, 7, 9]
        }
    ],
    "shaders": [
        {
            "filename": "materials.vert",
            "options": [0, 1]
        },
        {
            "filename": "materials.frag",
            "options": [0, 1, 2]
        }
    ]
}
If no explicit values are given for an option in the JSON file, our system defaults to two values for it: not defined (off) and defined (on).
Each input shader file section then specifies which of the options it cares about. So, in this example, the fragment shader considers all 3 options and will have 16 variants compiled, while the vertex shader considers only the first two and will have 4 variants.
In order to generate the possible build combinations, we have written a small Ruby script to implement the necessary logic. Why Ruby? Because I couldn’t face trying to do the necessary math in CMake’s scripting language and Ruby is lovely!
The core of the script that implements the decimal to a variable-base number (combination vector) is pretty simple:
def calculate_digits(bases, index)
digits = Array.new(bases.size, 0)
base_index = digits.size - 1
current_value = index
while current_value != 0
quotient, remainder = current_value.divmod(bases[base_index])
digits[base_index] = remainder
current_value = quotient
base_index -= 1
end
return digits
end
In the above code, the bases argument is a vector representing the base of each digit in the final combination vector. Here, bases = [2, 2, 4]. We then loop over the decimal number, performing the divmod operation at each step to find the value of each digit in our combination vector. When we have reduced the input decimal number to 0, we are done. This is exactly analogous to the decimal-to-binary conversion linked above, but with a variable base at each digit.
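As a quick sanity check, converting index 6 with bases = [2, 2, 4] should give the seventh entry, [0, 1, 2], in the sequence listed earlier, i.e. texture coordinates off, alpha cut-off on, and 7 blur taps:

calculate_digits([2, 2, 4], 6)
# 6.divmod(4) => quotient 1, remainder 2  -> last digit   = 2
# 1.divmod(2) => quotient 0, remainder 1  -> middle digit = 1
# quotient is now 0, so we stop           -> first digit  = 0
# => [0, 1, 2]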
With the resulting combination vector in hand, it is simple for us to then look up the corresponding compiler -D option for that selection and output that into a JSON string. Here is an example of the output of running the Ruby script against the above configuration file:
{
    "variants": [
        {
            "input": "materials.vert",
            "defines": "",
            "output": "materials.vert.spv"
        },
        {
            "input": "materials.vert",
            "defines": "-DALPHA_CUTOFF_ENABLED",
            "output": "materials_alpha_cutoff_enabled.vert.spv"
        },
        {
            "input": "materials.vert",
            "defines": "-DTEXCOORD_0_ENABLED",
            "output": "materials_texcoord_0_enabled.vert.spv"
        },
        {
            "input": "materials.vert",
            "defines": "-DTEXCOORD_0_ENABLED -DALPHA_CUTOFF_ENABLED",
            "output": "materials_texcoord_0_enabled_alpha_cutoff_enabled.vert.spv"
        },
        {
            "input": "materials.frag",
            "defines": "-DBLUR_TAPS=3",
            "output": "materials_blur_taps_3.frag.spv"
        },
        {
            "input": "materials.frag",
            "defines": "-DBLUR_TAPS=5",
            "output": "materials_blur_taps_5.frag.spv"
        },
        {
            "input": "materials.frag",
            "defines": "-DBLUR_TAPS=7",
            "output": "materials_blur_taps_7.frag.spv"
        },
        {
            "input": "materials.frag",
            "defines": "-DBLUR_TAPS=9",
            "output": "materials_blur_taps_9.frag.spv"
        },
        {
            "input": "materials.frag",
            "defines": "-DALPHA_CUTOFF_ENABLED -DBLUR_TAPS=3",
            "output": "materials_alpha_cutoff_enabled_blur_taps_3.frag.spv"
        },
        {
            "input": "materials.frag",
            "defines": "-DALPHA_CUTOFF_ENABLED -DBLUR_TAPS=5",
            "output": "materials_alpha_cutoff_enabled_blur_taps_5.frag.spv"
        },
        {
            "input": "materials.frag",
            "defines": "-DALPHA_CUTOFF_ENABLED -DBLUR_TAPS=7",
            "output": "materials_alpha_cutoff_enabled_blur_taps_7.frag.spv"
        },
        {
            "input": "materials.frag",
            "defines": "-DALPHA_CUTOFF_ENABLED -DBLUR_TAPS=9",
            "output": "materials_alpha_cutoff_enabled_blur_taps_9.frag.spv"
        },
        {
            "input": "materials.frag",
            "defines": "-DTEXCOORD_0_ENABLED -DBLUR_TAPS=3",
            "output": "materials_texcoord_0_enabled_blur_taps_3.frag.spv"
        },
        {
            "input": "materials.frag",
            "defines": "-DTEXCOORD_0_ENABLED -DBLUR_TAPS=5",
            "output": "materials_texcoord_0_enabled_blur_taps_5.frag.spv"
        },
        {
            "input": "materials.frag",
            "defines": "-DTEXCOORD_0_ENABLED -DBLUR_TAPS=7",
            "output": "materials_texcoord_0_enabled_blur_taps_7.frag.spv"
        },
        {
            "input": "materials.frag",
            "defines": "-DTEXCOORD_0_ENABLED -DBLUR_TAPS=9",
            "output": "materials_texcoord_0_enabled_blur_taps_9.frag.spv"
        },
        {
            "input": "materials.frag",
            "defines": "-DTEXCOORD_0_ENABLED -DALPHA_CUTOFF_ENABLED -DBLUR_TAPS=3",
            "output": "materials_texcoord_0_enabled_alpha_cutoff_enabled_blur_taps_3.frag.spv"
        },
        {
            "input": "materials.frag",
            "defines": "-DTEXCOORD_0_ENABLED -DALPHA_CUTOFF_ENABLED -DBLUR_TAPS=5",
            "output": "materials_texcoord_0_enabled_alpha_cutoff_enabled_blur_taps_5.frag.spv"
        },
        {
            "input": "materials.frag",
            "defines": "-DTEXCOORD_0_ENABLED -DALPHA_CUTOFF_ENABLED -DBLUR_TAPS=7",
            "output": "materials_texcoord_0_enabled_alpha_cutoff_enabled_blur_taps_7.frag.spv"
        },
        {
            "input": "materials.frag",
            "defines": "-DTEXCOORD_0_ENABLED -DALPHA_CUTOFF_ENABLED -DBLUR_TAPS=9",
            "output": "materials_texcoord_0_enabled_alpha_cutoff_enabled_blur_taps_9.frag.spv"
        }
    ]
}
If you are interested, this is the full script:
require 'json'
require 'pp'

def expand_options(data)
  # Expand the options so that if no explicit options are specified we default
  # to options where the #define symbol is defined or not
  data[:options].each do |option|
    if !option.has_key?(:values)
      option[:values] = [:nil, :defined]
    end
    option[:count] = option[:values].size
  end
end

def extract_options(data, shader)
  shader_options = Hash.new
  shader_options[:options] = Array.new
  shader[:options].each do |option_index|
    shader_options[:options].push data[:options][option_index]
  end
  # STDERR.puts "Options for shader:"
  # STDERR.puts shader_options
  return shader_options
end

def find_bases(data)
  bases = Array.new(data[:options].size)
  (0..(data[:options].size - 1)).each do |index|
    bases[index] = data[:options][index][:count]
  end
  return bases
end

def calculate_steps(bases)
  step_count = bases[0]
  (1..(bases.size - 1)).each do |index|
    step_count *= bases[index]
  end
  return step_count
end

# Calculate the number for "index" in our variable-bases counting system
def calculate_digits(bases, index)
  digits = Array.new(bases.size, 0)
  base_index = digits.size - 1
  current_value = index
  while current_value != 0
    quotient, remainder = current_value.divmod(bases[base_index])
    digits[base_index] = remainder
    current_value = quotient
    base_index -= 1
  end
  return digits
end

def build_options_string(data, selected_options)
  str = ""
  selected_options.each_with_index do |selected_option, index|
    # Don't add anything if option is disabled
    next if selected_option == :nil
    # If we have the special :defined option, then we add a -D option
    if selected_option == :defined
      str += " -D#{data[:options][index][:define]}"
    else
      str += " -D#{data[:options][index][:define]}=#{selected_option}"
    end
  end
  return str.strip
end

def build_filename(shader, data, selected_options)
  str = File.basename(shader[:filename], File.extname(shader[:filename]))
  selected_options.each_with_index do |selected_option, index|
    # Don't add anything if option is disabled
    next if selected_option == :nil
    # If we have the special :defined option, then we add a section for that option
    if selected_option == :defined
      str += "_#{data[:options][index][:define].downcase}"
    else
      str += "_#{data[:options][index][:define].downcase}_#{selected_option.to_s}"
    end
  end
  str += File.extname(shader[:filename]) + ".spv"
  return str
end

# Load the configuration data and expand default options
if ARGV.size != 1
  puts "No filename specified."
  puts "  Usage: generate_shader_variants.rb <variants_filename>"
  exit(1)
end

variants_filename = ARGV[0]
file = File.read(variants_filename)
data = JSON.parse(file, { symbolize_names: true })
expand_options(data)

# Prepare a hash to output as json at the end
output_data = Hash.new
output_data[:variants] = Array.new

data[:shaders].each do |shader|
  # STDERR.puts "Processing #{shader[:filename]}"

  # Copy over the options referenced by this shader to a local hash that we can operate on
  shader_options = extract_options(data, shader)

  # Create a "digits" array we can use for counting. Each element (digit) in the array
  # will correspond to an option in the loaded data configuration. The values each
  # digit can take are those specified in the "values" array for that option.
  #
  # The number of steps we need to take to count from "0" to the maximum value is the
  # product of the number of options for each "digit" (option).
  bases = find_bases(shader_options)
  # STDERR.puts "Bases = #{bases}"
  step_count = calculate_steps(bases)
  # STDERR.puts "There are #{step_count} combinations of options"

  # Count up through our range of options
  (0..(step_count - 1)).each do |index|
    digits = calculate_digits(bases, index)
    selected_options = Array.new(bases.size)
    (0..(bases.size - 1)).each do |digit_index|
      # Look the option up via the shader's own option list so that a shader
      # referencing a subset of the global options still picks the right one
      settings = shader_options[:options][digit_index]
      setting_index = digits[digit_index]
      selected_options[digit_index] = settings[:values][setting_index]
    end

    # Construct the options to pass to glslangValidator
    defines = build_options_string(shader_options, selected_options)
    output_filename = build_filename(shader, shader_options, selected_options)
    # STDERR.puts "  Step #{index}: #{digits}, selected_options = #{selected_options}, defines = #{defines}, output_filename = #{output_filename}"

    variant = { input: shader[:filename], defines: defines, output: output_filename }
    output_data[:variants].push variant
  end
  # STDERR.puts ""
end

puts output_data.to_json
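To try the script out by hand, it takes the configuration file as its only argument and prints the generated variant list to stdout (the filenames here are just examples):

ruby generate_shader_variants.rb shader_variants.json > shader_variants_output.json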
Integrating into the Build System
Since version 3.19, CMake is able to read and parse JSON documents — a fact that I didn’t know at first. This means that we can quite conveniently ask our build system to execute our Ruby script as an external process at configure time, capture the JSON output as shown above, iterate over the generated combinations, and add a build target for each one.
The cut-down code for doing this is:
function(CompileShaderVariants target variants_filename)
    # Run the helper script to generate json data for all configured shader variants
    execute_process(
        COMMAND ruby ${CMAKE_SOURCE_DIR}/generate_shader_variants.rb ${variants_filename}
        WORKING_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR}
        OUTPUT_VARIABLE SHADER_VARIANTS
        RESULT_VARIABLE SHADER_VARIANT_RESULT
    )
    if(NOT SHADER_VARIANT_RESULT EQUAL "0")
        message(NOTICE ${SHADER_VARIANT_RESULT})
        message(FATAL_ERROR "Failed to generate shader variant build targets for " ${variants_filename})
    endif()

    string(JSON VARIANT_COUNT LENGTH ${SHADER_VARIANTS} variants)
    message(NOTICE "Generating " ${VARIANT_COUNT} " shader variants from " ${variants_filename})

    # Adjust count as loop index goes from 0 to N
    math(EXPR VARIANT_COUNT "${VARIANT_COUNT} - 1")

    foreach(VARIANT_INDEX RANGE ${VARIANT_COUNT})
        string(JSON CURRENT_INPUT_FILENAME GET ${SHADER_VARIANTS} variants ${VARIANT_INDEX} input)
        string(JSON CURRENT_OUTPUT_FILENAME GET ${SHADER_VARIANTS} variants ${VARIANT_INDEX} output)
        string(JSON CURRENT_DEFINES GET ${SHADER_VARIANTS} variants ${VARIANT_INDEX} defines)

        set(SHADER_TARGET_NAME "${target}_${CURRENT_OUTPUT_FILENAME}")
        CompileShader(${SHADER_TARGET_NAME} ${CURRENT_INPUT_FILENAME} ${CURRENT_OUTPUT_FILENAME} ${CURRENT_DEFINES})
    endforeach()
endfunction()
Here, CompileShader() is another helper function that just invokes the glslangValidator GLSL-to-SPIR-V compiler with the specified options.
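The CompileShader() helper itself is not shown in the post; a minimal sketch of what such a function might look like follows (it assumes glslangValidator is on the PATH and uses -V to target Vulkan SPIR-V; the real helper may well differ):

function(CompileShader target input output)
    # Any remaining arguments form the (possibly empty) define string, e.g. "-DFOO -DBAR=3";
    # split it into a proper argument list for the compiler invocation
    separate_arguments(define_args NATIVE_COMMAND "${ARGN}")
    add_custom_command(
        OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/${output}
        COMMAND glslangValidator -V ${define_args} -o ${CMAKE_CURRENT_BINARY_DIR}/${output}
                ${CMAKE_CURRENT_SOURCE_DIR}/${input}
        DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/${input}
        COMMENT "Compiling shader variant ${output}"
        VERBATIM
    )
    add_custom_target(${target} ALL DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/${output})
endfunction()

Taking the defines via ARGN means the call still works for the variant with no defines at all, where CURRENT_DEFINES expands to nothing.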
This nicely takes care of generating all of the required shader variants that will be compiled with correct dependencies on the source GLSL files. To ensure that the targets get updated if the input JSON configuration file changes, we can add the following snippet to the above function:
# Re-run cmake configure step if the variants file changes
set_property(
DIRECTORY
APPEND
PROPERTY CMAKE_CONFIGURE_DEPENDS ${variants_filename}
)
Now, if we edit the JSON configuration file that contains the options, CMake will automatically re-run and generate the targets.
On the C++ runtime side of things, we have some logic to construct the appropriate shader file name for the compiled SPIR-V shader matching the options needed by whatever model we are rendering.
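The exact code is not shown here, but the idea is simply to mirror the naming scheme produced by build_filename in the Ruby script. A hypothetical sketch for the fragment shader options above (not the actual implementation):

#include <string>

// Builds e.g. "materials_texcoord_0_enabled_blur_taps_7.frag.spv" to match the
// variant filenames generated at build time.
std::string shaderVariantFilename(const std::string &baseName,  // e.g. "materials"
                                  const std::string &stage,     // e.g. ".frag"
                                  bool hasTexCoords,
                                  bool alphaCutoffEnabled,
                                  int blurTaps)                 // 3, 5, 7 or 9
{
    std::string name = baseName;
    if (hasTexCoords)
        name += "_texcoord_0_enabled";
    if (alphaCutoffEnabled)
        name += "_alpha_cutoff_enabled";
    name += "_blur_taps_" + std::to_string(blurTaps);
    return name + stage + ".spv";
}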
In the future, we may make this part more reusable by making it read in the same JSON configuration file used to create the shader variants.
Wrapping Up
So, going back to where we started: how does all of this tie into your PC’s spending an hour compiling shaders when we have shown here how to compile them at application build time?
It all goes back to SPIR-V’s just being a bytecode intermediate representation. Before the GPU can execute these shaders, it needs to do a final compilation step to convert the SPIR-V to actual machine code. In a modern graphics API, this is done when we create a so-called “graphics pipeline.” At this point, we have to specify pretty much all GPU state, which then gets baked into a binary blob along with the shader code by the driver. This binary blob is both GPU-vendor and driver-version specific. So, it cannot be built at application build time but, rather, has to be done on the actual machine on which it will execute.
The first time you run such a game or other application, it will often loop through all of the shader variants and compile a graphics pipeline for each one. These then get cached to disk for use on subsequent runs. If you change your GPU or (more likely) the driver version, then this cache might get invalidated and you’d have to sit through this process once again.
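In Vulkan, that on-disk cache is typically backed by a VkPipelineCache object. A rough sketch of the idea (device, previousCacheData, pipelineCreateInfo, and pipeline are assumed to exist; error handling and the file I/O are omitted):

// Create a pipeline cache, optionally seeded with the blob saved on a previous run
VkPipelineCacheCreateInfo cacheInfo = {};
cacheInfo.sType = VK_STRUCTURE_TYPE_PIPELINE_CACHE_CREATE_INFO;
cacheInfo.initialDataSize = previousCacheData.size(); // zero on the very first run
cacheInfo.pInitialData = previousCacheData.data();

VkPipelineCache pipelineCache;
vkCreatePipelineCache(device, &cacheInfo, nullptr, &pipelineCache);

// Pipelines created against this cache can reuse previously compiled machine code
vkCreateGraphicsPipelines(device, pipelineCache, 1, &pipelineCreateInfo, nullptr, &pipeline);

// At shutdown, serialize the (GPU- and driver-specific) blob back out to disk
size_t dataSize = 0;
vkGetPipelineCacheData(device, pipelineCache, &dataSize, nullptr);
std::vector<char> cacheData(dataSize);
vkGetPipelineCacheData(device, pipelineCache, &dataSize, cacheData.data());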
For systems with known hardware and drivers, this whole process can be performed as part of the build step. This is why consoles such as the PlayStation 5 do not have to do this lengthy shader compiling step while we sit and watch.
There is some work going on in Khronos at present, in the shape of VK_EXT_shader_object, to try to get back to a more dynamic-shader-friendly way of doing things, in which the driver takes care of much of this compiling and caching for us. As with all things in computer science, though, it will be a trade-off.
Thank you for reading about what turned out to be a nice little excursion into simplifying a problem by turning a recursive approach into a linear one and learning about converting between numbers with different bases.
If you would like to learn more about modern 3D graphics or get some help on your own projects, then please get in touch.
Sean Harmer is a senior software engineer at KDAB where he heads up our UK office and also leads the 3D R&D team. He has been developing with C++ and Qt since 1998 and is Qt 3D Maintainer and lead developer in the Qt Project. Sean has broad experience and a keen interest in scientific visualization and animation in OpenGL and Qt. He holds a PhD in Astrophysics along with a Masters in Mathematics and Astrophysics.