2015-02-17

Compute shader advice

Recently I've had a lot of fun with OpenGL's compute shaders. With the fun came a lot of pain, because I made some (rookie) mistakes. So I want to share my experience and some advice, just in case you run into trouble too:


  • The first thing you should check is your texture formats! No, really, double-check them; don't repeat my mistakes. In your compute shaders, you can use your images (not textures) with

    glBindImageTexture(unit, textureId, 0, false, 0, GL_WRITE_ONLY, GL30.GL_RGBA16F);

    from OpenGL 4.2, here bound as an output image. Of course you can use GL_READ_ONLY or GL_READ_WRITE if you use the image differently. Also keep in mind that this call binds an image, not a texture, which is why you have to provide the mipmap level you want to attach. I once used the wrong format, namely rgba32f, which my render target attachments didn't have, and it resulted in nonexistent output from my compute shader. Very frustrating, but correct behaviour.
  • Keep in mind that you can use your regular textures via samplers in your compute shaders, too. Simply bind the texture and put a line similar to this in your shader

    layout(binding = 1) uniform sampler2D normalMap;

    That's helpful if you want to access mip levels easily.
  • Since even the OpenGL SuperBible has a typo that doesn't help in understanding the compute shader built-ins, I'll recap them.
    With glDispatchCompute you have to provide three values, which are your group counts. A compute shader pass is executed by a large number of threads (invocations), and choosing clever group counts and sizes will help you process your data. In graphics-related cases, you will mostly need compute shaders to render to a texture target, so it makes sense to lay out your threads in two dimensions.
    Define your group sizes to match your image size: a 320*320 image can be divided into 10*10 groups, or tiles, each containing 32*32 pixels. So you should define your group size as 32, 32, 1. Now you can dispatch 320/group size groups, which is 10 groups, for each of the x and y dimensions.
    In your shader, the built-in gl_WorkGroupSize gives you the group size in every invocation of your shader's main function. To uniquely identify your invocation, you can use gl_GlobalInvocationID. With the setup from this example, it contains the position of the texel this invocation has to write. And that's how you can use compute shaders to manipulate your textures. Additionally, there is gl_WorkGroupID, which identifies the tile/group of the invocation, and gl_LocalInvocationID, which is your pixel's position within its tile.
    Sometimes it is useful to have a flattened identifier, for example if you have a task that requires performing an action just 12 times per work group but has to be done in the compute shader. For that you can use gl_LocalInvocationIndex, the one-dimensional index of the invocation within its group. You can use it in a conditional to limit some code paths, like

    if(gl_LocalInvocationIndex < MAX_ITEMS) { processItem(); }

    For a better understanding, have a look at this post, which has a nice picture and another explanation of the group layout.
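
To make the arithmetic behind the built-ins concrete, here is a minimal sketch in plain Java that reproduces it on the CPU for the 320*320 example above. It needs no OpenGL context, and the class and method names (ComputeIds, groupCount and so on) are hypothetical helpers I made up for illustration; they are not part of LWJGL or OpenGL. The rounding-up in groupCount is my own assumption for image sizes that aren't a multiple of the group size.

```java
// Plain-Java sketch of the compute shader ID arithmetic; no OpenGL context is needed.
// All names here are hypothetical helpers, not part of any OpenGL binding.
final class ComputeIds {

    // Number of work groups needed to cover `size` texels with groups of `localSize`.
    // Rounds up, so sizes that are not a multiple of the group size are still covered.
    static int groupCount(int size, int localSize) {
        return (size + localSize - 1) / localSize;
    }

    // gl_GlobalInvocationID = gl_WorkGroupID * gl_WorkGroupSize + gl_LocalInvocationID,
    // computed per component (x, y, z).
    static int[] globalInvocationId(int[] workGroupId, int[] workGroupSize, int[] localInvocationId) {
        int[] global = new int[3];
        for (int i = 0; i < 3; i++) {
            global[i] = workGroupId[i] * workGroupSize[i] + localInvocationId[i];
        }
        return global;
    }

    // gl_LocalInvocationIndex = z * sizeX * sizeY + y * sizeX + x,
    // i.e. the flattened position of the invocation inside its own group.
    static int localInvocationIndex(int[] workGroupSize, int[] localInvocationId) {
        return localInvocationId[2] * workGroupSize[0] * workGroupSize[1]
             + localInvocationId[1] * workGroupSize[0]
             + localInvocationId[0];
    }

    public static void main(String[] args) {
        int[] groupSize = {32, 32, 1};
        int groupsX = groupCount(320, groupSize[0]);
        int groupsY = groupCount(320, groupSize[1]);
        System.out.println("dispatch " + groupsX + " x " + groupsY + " x 1 groups");

        // Invocation (5, 7) inside tile (3, 2) of the image.
        int[] local = {5, 7, 0};
        int[] texel = globalInvocationId(new int[]{3, 2, 0}, groupSize, local);
        System.out.println("texel " + texel[0] + ", " + texel[1]);
        System.out.println("local index " + localInvocationIndex(groupSize, local));
    }
}
```

For the example, the local invocation (5, 7) in tile (3, 2) ends up at texel (101, 71), and its flattened index inside the tile is 229. If you round the group count up like this for an image that doesn't divide evenly, some invocations fall outside the image, so the shader should check gl_GlobalInvocationID against the image size before writing.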

What else? Compute shaders are awesome! I like how easy it is to invoke them, independently of the graphics pipeline. Use compute shaders!