Fast GPU color transforms with XNA
Some time ago I came across a nice article, Using Lookup Tables to Accelerate Color Transformations (chapter 24 of GPU Gems 2), and I have wanted to see it at work ever since.
The basic premise is that a 3D texture is used as a lookup table, where the 3D texture coordinates are the red, green and blue values of the input colour. A nice and simple fragment shader then samples this lookup texture and returns the transformed colour. You can bake any number of operations, of any complexity, into the lookup texture as long as each output pixel depends only on the matching input pixel (that is, no neighbours). Gamma correction, contrast, saturation, colorisation, levels, colour keying and so on are all possible, in any combination, at a small fixed cost.
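To make the "single input pixel" condition concrete, here is a minimal sketch of two such operations written as plain functions of one channel value. The class and method names (ColorOps, Gamma, Contrast, Grade) are made up for illustration and are not from the sample code; the formulas are the common textbook forms, not necessarily what any particular image editor uses.

```csharp
using System;

public static class ColorOps
{
    // Clamps a channel value to the displayable range [0, 1].
    public static double Saturate(double v)
    {
        return Math.Max(0.0, Math.Min(1.0, v));
    }

    // Gamma correction: gamma > 1 brightens mid-tones, gamma < 1 darkens them.
    public static double Gamma(double v, double gamma)
    {
        return Math.Pow(Saturate(v), 1.0 / gamma);
    }

    // Contrast around mid-grey: amount > 1 increases contrast.
    public static double Contrast(double v, double amount)
    {
        return Saturate((v - 0.5) * amount + 0.5);
    }

    // Chains both operations. A lookup table would store this result
    // once per entry, so the chain costs nothing extra at draw time.
    public static double Grade(double v, double gamma, double contrast)
    {
        return Contrast(Gamma(v, gamma), contrast);
    }
}
```

However long the chain inside Grade gets, the shader still does a single texture fetch per pixel, which is the whole appeal.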
The basics
Here’s how I generate my 3D lookup texture.

    int size = 32; // edge length of the lookup cube, in texels
    Color[] colors = new Color[size * size * size];
    Texture3D tex = new Texture3D(Device, size, size, size, 1, TextureUsage.Linear, SurfaceFormat.Color);
    for (int r = 0; r < size; r++)
    {
        for (int g = 0; g < size; g++)
        {
            for (int b = 0; b < size; b++)
            {
                // Texel i holds the value i / (size - 1), so the last texel is exactly 1.0.
                // This matches the texel-centre offset applied in the shader.
                Vector3 inCol = new Vector3((float)r / (size - 1), (float)g / (size - 1), (float)b / (size - 1));
                // Manipulate the input color in some way here
                Color col = new Color(inCol);
                // Red runs along x, green along y, blue along z
                colors[r + (g * size) + (b * size * size)] = col;
            }
        }
    }
    tex.SetData<Color>(colors);
Not rocket science, is it? We’ve just created a lookup table that simply maps output = input. Now, to display it. We have a pretty simple effect to do all our work for us. The only calculations it has to perform are the texel offsets explained in the article I linked (the URL is in the shader comment). I’m assuming the reader knows enough about XNA to attach this effect to a SpriteBatch, but even if not, you’ll find that in the sample code.

    half lutSize = 32; // default lookup table size, to be changed by the app

    // Sampler for the texture currently being drawn
    sampler2D tex1 : register(s0);

    // Sampler for the lookup table
    texture3D cubeTex;
    sampler3D cube = sampler_state
    {
        Texture = <cubeTex>;
        // We really want trilinear filtering for this sort of thing
        MinFilter = linear;
        MagFilter = linear;
        MipFilter = linear;
    };

    float4 Fragment(float4 vertexColor : COLOR, float2 UV : TEXCOORD0) : COLOR0
    {
        // Fetch input color
        float4 inCol = tex2D(tex1, UV);

        // Edge offset (see http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter24.html)
        half3 scale = (lutSize - 1.0) / lutSize;
        half3 offset = 1.0 / (2.0 * lutSize);

        // Transform: only the RGB channels index the table
        float4 outCol = tex3D(cube, scale * inCol.rgb + offset);

        // Lerp between input and transformed in RGB space based on input vertex alpha
        return lerp(inCol, outCol, vertexColor.a);
    }

    technique FastTransform
    {
        pass Pass1
        {
            PixelShader = compile ps_2_0 Fragment();
        }
    }
That’s it. After generating a lookup table and setting it on the effect, it’s ready to go. At the moment you’d see no difference, because the lookup table outputs the input colour, but you could try something simple like inCol *= 2 in the generation loop to do a multiply2x.
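The scale and offset in the shader are easy to sanity-check on the CPU. The sketch below repeats the same two lines of arithmetic in C#; LutCoords, ToLookupCoord and TexelCenter are hypothetical names used only for this illustration.

```csharp
public static class LutCoords
{
    // Maps a colour channel in [0, 1] to the texture coordinate the shader samples.
    public static double ToLookupCoord(double channel, int lutSize)
    {
        double scale = (lutSize - 1.0) / lutSize;
        double offset = 1.0 / (2.0 * lutSize);
        return scale * channel + offset;
    }

    // Centre of texel `index` in normalised texture coordinates.
    public static double TexelCenter(int index, int lutSize)
    {
        return (index + 0.5) / lutSize;
    }
}
```

For a 32px table, an input of 0.0 maps to 0.015625, the centre of texel 0, and an input of 1.0 maps to 0.984375, the centre of texel 31. Without the offset, the extremes would sample the texture edges, where the filtering has no neighbour on one side to blend with.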
That’s great, but to actually do something meaningful with this you’ll probably at least want to be able to convert to HSL and back. I’ve spent some time writing a rudimentary library for performing colour transforms; you’ll find it in the sample. Still, to get real use out of this technique you’ll probably want Photoshop-grade effects, and that’s what I had in mind when I started this. I scurried about the net looking for algorithms that reproduce some of the Photoshop adjustments, like gamma correction, exposure and levels. I gave up on that notion pretty quickly. Some are fairly complex algorithms that make the lookup table generation slow, and I plain didn’t want to spend hours reinventing the wheel. There’s a better way.
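For reference, a round trip between RGB and HSL can be sketched as follows. This is the standard textbook conversion, not the library from the sample; the Hsl class and its method names are assumptions made for this example.

```csharp
using System;

public static class Hsl
{
    // RGB (each in [0, 1]) to HSL: h in degrees, s and l in [0, 1].
    public static void FromRgb(double r, double g, double b,
                               out double h, out double s, out double l)
    {
        double max = Math.Max(r, Math.Max(g, b));
        double min = Math.Min(r, Math.Min(g, b));
        double delta = max - min;
        l = (max + min) / 2.0;
        if (delta == 0.0)
        {
            // Grey: hue is undefined, so use 0 by convention.
            h = 0.0;
            s = 0.0;
            return;
        }
        s = delta / (1.0 - Math.Abs(2.0 * l - 1.0));
        if (max == r) h = 60.0 * ((g - b) / delta);
        else if (max == g) h = 60.0 * ((b - r) / delta + 2.0);
        else h = 60.0 * ((r - g) / delta + 4.0);
        if (h < 0.0) h += 360.0;
    }

    // HSL back to RGB, each output channel in [0, 1].
    public static void ToRgb(double h, double s, double l,
                             out double r, out double g, out double b)
    {
        double c = (1.0 - Math.Abs(2.0 * l - 1.0)) * s; // chroma
        double hp = h / 60.0;                           // hue sector, 0..6
        double x = c * (1.0 - Math.Abs(hp % 2.0 - 1.0));
        double m = l - c / 2.0;
        if (hp < 1.0) { r = c; g = x; b = 0.0; }
        else if (hp < 2.0) { r = x; g = c; b = 0.0; }
        else if (hp < 3.0) { r = 0.0; g = c; b = x; }
        else if (hp < 4.0) { r = 0.0; g = x; b = c; }
        else if (hp < 5.0) { r = x; g = 0.0; b = c; }
        else { r = c; g = 0.0; b = x; }
        r += m;
        g += m;
        b += m;
    }
}
```

A saturation tweak would then be: convert each table entry with FromRgb, scale s, clamp it to [0, 1], and convert back with ToRgb before storing the colour.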
Generating the lookup table with image editing software
Use Photoshop for Photoshop-grade filters, of course! It’s quite simple. To start, we need a 2D representation of the identity lookup table. I whipped up a quick routine to do this and to figure out the nicest image size for me.

    public static Texture2D LutToTexture2D(GraphicsDevice Device, Texture3D lut)
    {
        // Calculate closest-to-square proportions for the 2D table.
        // We assume power-of-two sides; other sizes are untested.
        int size = lut.Width;
        int side1 = size * size;
        int side2 = size;
        while (side1 / 2 >= side2 * 2)
        {
            side1 /= 2;
            side2 *= 2;
        }

        // Dump the 3D texture into the 2D texture, texel for texel
        Color[] colors = new Color[size * size * size];
        Texture2D tex = new Texture2D(Device, side1, side2, 1, TextureUsage.Linear, SurfaceFormat.Color);
        lut.GetData<Color>(colors);
        tex.SetData<Color>(colors);
        return tex;
    }
A 64px cube unfolds to a nice 512×512 image like this:

Once you have that, feel free to throw it into your favourite image editor and apply any colour transforms you want to it. For the results to be actually meaningful, I took a screenshot of my game, applied adjustment layers to make it look how I want, and then threw them on the lookup table. You can see my setup in the screenshot here.
Finally we load the modified lookup table back in by reversing the above process (see sample), and voila! Lightning-fast Photoshop-grade filters.
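Because the unfolding routine copies the texel array straight across, the mapping between a table entry and a pixel in the 2D image is plain index arithmetic. The sketch below works on indices only, so it runs without XNA; LutLayout and its methods are hypothetical names, and it assumes the same r + g * size + b * size * size ordering used when the table was generated.

```csharp
public static class LutLayout
{
    // Same loop as the unfolding routine: halve the long side
    // until the image is as close to square as it gets.
    public static void UnfoldedSize(int size, out int width, out int height)
    {
        width = size * size;
        height = size;
        while (width / 2 >= height * 2)
        {
            width /= 2;
            height *= 2;
        }
    }

    // Pixel of the unfolded image that holds table entry (r, g, b).
    public static void ToPixel(int r, int g, int b, int size, out int x, out int y)
    {
        int width, height;
        UnfoldedSize(size, out width, out height);
        int index = r + g * size + b * size * size;
        x = index % width;
        y = index / width;
    }

    // Table entry stored at pixel (x, y) of the unfolded image.
    public static void ToEntry(int x, int y, int size, out int r, out int g, out int b)
    {
        int width, height;
        UnfoldedSize(size, out width, out height);
        int index = x + y * width;
        r = index % size;
        g = (index / size) % size;
        b = index / (size * size);
    }
}
```

Loading the edited image back is then a matter of reading its pixels in row order and handing the same array to the 3D texture, provided the image editor has not resized or re-ordered anything.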
Limitations
This cake is obviously not free. The lookup tables take up video memory. For compatibility reasons I use SurfaceFormat.Color to represent them, which means that one channel (alpha) is wasted, and that doesn’t help either. A 32px cube is negligible (128 KB), a 64px cube takes up 1 MB, a 128px cube takes up 8 MB, and the full 256px deal takes up a whopping 64 MB. What size should you be using? That depends on the complexity of the transformations you’re doing. The higher the frequency of the changes in your function, the higher the table resolution you’ll need. I find that 64 is acceptable for most stuff. 128 may be needed sometimes (once so far) to closely match Photoshop. I find that 8 MB is still acceptable today; just make sure that your effect warrants it.
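The figures above are just the cube volume times the texel size. A tiny sketch, with LutMemory.Bytes as a made-up helper name and 4 bytes per texel assumed for SurfaceFormat.Color:

```csharp
public static class LutMemory
{
    // Size of a cubic lookup table in bytes, ignoring any driver padding.
    public static long Bytes(int size, int bytesPerTexel)
    {
        return (long)size * size * size * bytesPerTexel;
    }
}
```

Doubling the edge length multiplies the cost by eight, which is why the jump from 128 to 256 hurts so much.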
The other limitation is that the procedure is pretty static. If you want to change some settings, you have to rebuild the table. One way around this is to create multiple tables with different settings and interpolate between them. Be careful, though: that costs extra VRAM and shader complexity, and the interpolation may not look nice in RGB space, so make sure it’s really worth it compared with just writing a shader to do the job traditionally.
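As an illustration of what interpolating between tables means, here is a CPU-side sketch that blends the raw channel bytes of two tables. In a shader the equivalent would presumably be a lerp between two tex3D fetches; this version only shows the arithmetic. LutBlend.Lerp is a hypothetical helper, and it assumes both tables have the same size and channel layout.

```csharp
using System;

public static class LutBlend
{
    // Blends two tables channel by channel; t = 0 returns a copy of a, t = 1 a copy of b.
    public static byte[] Lerp(byte[] a, byte[] b, double t)
    {
        if (a.Length != b.Length)
        {
            throw new ArgumentException("Tables must have the same length.");
        }
        t = Math.Max(0.0, Math.Min(1.0, t)); // keep results inside the byte range
        byte[] result = new byte[a.Length];
        for (int i = 0; i < a.Length; i++)
        {
            result[i] = (byte)Math.Round(a[i] + (b[i] - a[i]) * t);
        }
        return result;
    }
}
```

The blend is a straight line in RGB, so two tables that differ mainly in hue will pass through duller colours halfway, which is the "may not look nice" caveat in practice.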
Alternatively you could have a two step system where you execute a complex shader on the lookup table and use that to do the rest of the work. This would be nice where settings change every once in a while, and you have existing shader code to do the task.
So is this terribly useful at the end of the day? For the inept like me, yes! I like the idea of going into Photoshop and making the look I want a lot more than writing fiddly shader code. It’s pretty hard to picture this sort of math without actually seeing it at work.






