2015-10-24

Single Pass Omnidirectional Shadow Mapping

Implementation and evaluation

Layered rendering is possible since geometry shaders entered the OpenGL stage. The rough idea of a good omnidirectional shadow mapping (for example for pointlights) is to render the complete shadow map with a single draw pass, in order to reduce the amount of rendercalls, compared to six-pass rendering with traditional rendering to a cubemap. Therefore, the geometry shader emits a vertex for each incoming vertex of the non-culled scene geometry to a face of the cubemap with something called layered rendering. While the idea is well described all over the internet already, every now and then, there is discussion about the efficiency of this method.

My first idea was to evaluate omnidirectional shadow mapping with dual paraboloid shadow maps since it's the easiest way to achieve pointlight shadow: No viewmatrices, no projectionmatrices, two textures per pointlight, let's go. Without view frustum culling, I draw the complete geometry twice and passed a "backside" flag as a uniform variable for the second rendering. No layered rendering, just two depth buffers and two color attachments (can be ommited if no variance shadow mapping is used) for each pointlight (front and back). It's possible to do dpsm with a single-pass rendering in a single texture - I doubt it to be efficient because you would have to handle the depth buffer somehow. Or it will be efficient, but much more complicated.

Rendering a single (two texture) shadowmap for the famous sponza atrium takes ~1.5 ms on my GTX 770.

Layered rendering however, took me a while for the implementation. Because I use array textures in order to be able to use many shadow mapped lights, I had to fight with the strange array indices for cubemap arrays. For everyone who is interested, here's how you create a cubemap array rendertarget and use it to draw your shadow maps:

 framebufferLocation = GL30.glGenFramebuffers();
 // Create rendertarget
 GL30.glBindFramebuffer(GL30.GL_FRAMEBUFFER, framebufferLocation);
 IntBuffer scratchBuffer = BufferUtils.createIntBuffer(colorBufferCount);
 for (int i = 0; i < colorBufferCount; i++) {
      GL32.glFramebufferTexture(GL30.GL_FRAMEBUFFER, GL30.GL_COLOR_ATTACHMENT0 + i, cubeMapArrays.get(i).getTextureID(), 0);
      scratchBuffer.put(i, GL30.GL_COLOR_ATTACHMENT0+i);
 }
 // Use glDrawBuffers(GL_NONE) if you don't need color attachments but depth only
 GL20.glDrawBuffers(scratchBuffer);
 CubeMapArray depthCubeMapArray = new CubeMapArray(AppContext.getInstance().getRenderer(), 1, GL11.GL_LINEAR, GL14.GL_DEPTH_COMPONENT24);
 int depthCubeMapArrayId = depthCubeMapArray.getTextureID();
 GL32.glFramebufferTexture(GL30.GL_FRAMEBUFFER, GL30.GL_DEPTH_ATTACHMENT, depthCubeMapArrayId, 0);

Although some implementations are hidden because of my framework, the idea should be clear. Don't forget to initialize the cubemap array texture correctly or your framebuffer object won't be complete.

Since the geometryshader decides which face to render to, you can pass the pointlight index as a uniform variable. The layer will then be gl_Layer = 6*lightIndex + layer.

The quality is very nice, but unfortunetly, rendering sponza into the cubemap takes ~8.3ms on my GTX 770. The additional color attachment is not responsible for the expensiveness, since I tried to remove it. I'm pretty sure that I didn't do a major mistake in the implementation. It has to be the poor performance of the geometry shader that is responsible for the high amount of time this method takes.

Conclusion

Dual paraboloid shadow map rendering ~1.5 ms
Single pass cubemap shadow map rendering ~ 8.3 ms

I used 512*512 per dspm face or cubemap face. Would be nice to hear anyone alse's experience with omnidirectional shadowmapping.