2019-02-24

Multivolume voxel cone tracing

Voxel cone tracing, but through multiple volumes

Another feature experiment I nearly forgot to show you is voxel cone tracing with multiple voxel volumes. There are different approaches to support large scenes with voxel cone tracing:

* Use a single volume texture and scale it to the scene's bounds. This increases a single voxel's world extents, which leads to coarser lighting and even more light leaking.
* Use sparse octree voxelization with a buffer instead of a volume texture. This is a way more complicated implementation. Additionally, the performance hit for lighting evaluation is quite big, compared to hardware-filtered volume texel fetching.
* Use cascaded voxel cone tracing, a similar approach to cascaded shadow maps. Revoxelization of the whole scene (or what's visible for the player) is quite demanding - implementations that only revoxelize objects on the border of the cascades are way more complex than the traditional, non-cascaded approach. Not using such an approach and revoxelizing everything every frame leads to flickering in the voxelization step, which can't be eliminated complettely due to the "binary" nature" of voxels (or at least I didn't manage to achieve it).

My implementation goes a different way, that doesn't suffer from the above problems, by introducing world space voxel volumes. Instead of a single big one, there are many smaller ones. There are many advantages now:

* Not all voxel volumes have to be updated every frame - one can update the nearest n volumes per frame, depending on the given hardware specs.
* There can be higher resolutions where needed and less resolution where coarse illumination is sufficient.
* Since everything is in world space, no flickering on revoxelization - at least when materials change. For dynamic objects, one still has to do some tricks or use temporal filtering with multiple bounces or sth.
* Theoretically, the voxel data could be precalculated and streamed in.

I put a sized list of VoxelGrid entries into a generic buffer that my avaluation compute shaders can read. My VoxelGrid data structure is as simple as the following.

struct VoxelGrid {
   int albedoGrid;
   int normalGrid;
   int grid;
   int grid2;
   int resolution;
   int resolutionHalf;
   int dummy2;
   int dummy3;
   mat4 projectionMatrix;
   vec3 position;
   float scale;
   uvec2 albedoGridHandle;
   uvec2 normalGridHandle;
   uvec2 gridHandle;
   uvec2 grid2Handle;
 };

As mentioned in an earlier post, my volumes use a kind of deferred rendering to cache geometry properties and onyl recaclulate lighting information when necessary, hence the need for 4 texture attachments - one for albedo, one for normals, and two for multiple bounces of gi.

The resolution (and the helper resolutionHalf) determine the resolution of the volume texture. This is needed, because the size of a volume can be arbitrary, while the resolution is fixed at some time, leading to arbitrary world space sizes for a single voxel.

Besides a little bit of padding, I also save the projection matrix that is used to voxelize objects into this volume. This isn't needed during evaluation, but I wanted to use a single data structure for both steps of the pipeline.

Since the texture ids don't give you much when using multiple volumes any more (you don't want to bind anything anymore...), those can be erased by now. What I use is bindless handles for everything, hence the long texture handles for the said four textures, passed in as uvec2 data types.

Tracing

Now the interesting part. When only a few volumes are used, let's say 5-10 or something, the tracing can easily be implemented as brute force iteration over an array. I don't think more volumes are practical, as each volume needs a lot of memory, and there comes the point where classic sparse voxel octrees are simply more efficient.

When implementing the tracing, I realized, that I want to favour higher resolution volumes when volumes overlap. Besides that, the tracing is quite simple: Take the gbuffer's world space position and trace diffuse or/and specular lighting in as many directions as you like. The sampling diameter increases with distance and determines the mipmap level to sample from.

    vec4 accum = vec4(0.0);
    float alpha = 0;
    float dist = 0;
    vec3 samplePos = origin;// + dir;

    while (dist <= maxDist && alpha < 1.0)
    {
        float minScale = 100000.0;
        int canditateIndex = -1;
        VoxelGrid voxelGrid;
        for(int voxelGridIndex = 0; voxelGridIndex < voxelGridArray.size; voxelGridIndex++) {
            VoxelGrid candidate = voxelGridArray.voxelGrids[voxelGridIndex];
            if(isInsideVoxelGrid(samplePos, candidate) && candidate.scale < minScale) {
                canditateIndex = voxelGridIndex;
                minScale = candidate.scale;
                voxelGrid = candidate;
            }
        }

        float minVoxelDiameter = 0.25f*voxelGrid.scale;
        float minVoxelDiameterInv = 1.0/minVoxelDiameter;
        vec4 ambientLightColor = vec4(0.);
        float diameter = max(minVoxelDiameter, 2 * coneRatio * (1+dist));
        float increment = diameter;

        if(canditateIndex != -1) {
            sampler3D grid;
            // sample grid here
        }

        dist += increment;
        samplePos = origin + dir * dist;
        increment *= 1.25f;
    }
 return vec4(accum.rgb, alpha);

The results are quite nice, with mixed resolutions and sizes of volumes. Here's an example of a transition between a fine and a coarse volume:

coarse and fine voxel volume side by side

Unfortunately, the performance of my tracing is not that good. It's remarkably slower than the single volume tracing, and I'm not too certain why this is the case. My test scene contained 4 volumes and decreased performance below 30 fps on my GTX 1060, to it's not capable of being realtime anymore. And I'm talking about avaluation only - no voxelization done here.

This again leads me to the conclusion, that voxel cone tracing is just too heavy on resources to be practical. I got the idea of using binary voxelization and only use voxels for occlusion and soft shadows, but evaluate global illumination from precomputed volumes. Much cheaper, no synchronization between voxelization threads and in general just a lot cheaper.