May 10, 2005
Fragment Programs: ATI R3xx vs NVidia NV34
ATI and NVidia cards differ in several ways that matter when you're writing fragment programs to run on both.
The ATI cards have a limit of four levels of texture indirection.
A texture indirection is a TEX instruction whose texture coordinates are calculated in the program, rather than being constant or passed in by a glMultiTexCoord call. In practice, a four-level limit means a chain of at most four texture reads, where each read after the first uses coordinates produced by the previous one:
TEX firstValue, fragment.texcoord[0], texture[0]; # First indirection
TEX secondValue, firstValue, texture[0]; # Second indirection
TEX thirdValue, secondValue, texture[0]; # Third indirection
TEX fourthValue, thirdValue, texture[0]; # Fourth indirection
The ARB_fragment_program spec explains in more detail what counts as an indirection.
When we hit the limit on indirections, we'd see a black output. We'd either try to rewrite to avoid the indirections, or break the shader into multiple passes.
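To check a shader against the limit before running it on the card, the counting can be roughly modelled on the CPU. Here's a minimal Python sketch of that idea. It is a simplified model and not the spec's full definition: it assumes every program starts at one level, and that a texture read starts a new level only when its coordinate register was written earlier in the current level. The function name, the tuple layout and the constant are made up for illustration.

```python
# Simplified model of texture indirection counting (an assumption, not the
# full ARB_fragment_program rule set).
TEXTURE_OPS = {"TEX", "TXP", "TXB"}
ATI_R3XX_LIMIT = 4  # the limit quoted in the text above


def count_indirections(instructions):
    """instructions: list of (opcode, destination, coordinate) tuples.

    For non-texture opcodes the third field is ignored; only the
    destination register is tracked.
    """
    levels = 1        # assumed: a program always has at least one level
    written = set()   # registers written so far in the current level
    for opcode, dest, coord in instructions:
        if opcode in TEXTURE_OPS and coord in written:
            levels += 1       # dependent read: start a new level
            written = set()
        written.add(dest)
    return levels


chain = [
    ("TEX", "firstValue", "fragment.texcoord[0]"),
    ("TEX", "secondValue", "firstValue"),
    ("TEX", "thirdValue", "secondValue"),
    ("TEX", "fourthValue", "thirdValue"),
]
print(count_indirections(chain) <= ATI_R3XX_LIMIT)  # True
```

Adding a fifth read that takes fourthValue as its coordinate pushes the count to five in this model, which is the case where we'd expect the black output.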
RCP on zero gives different results.
ATI cards return a very large number when you divide by zero. NVidia cards give a plus or minus infinity (or NaN) result, which infects subsequent calculations.
We preferred ATI's behavior: it wasn't mathematically correct, but it was usually what we wanted. To emulate that on the NV34 we had two approaches:
- Detect the zero denominator before the operation, and then replace the result with something sensible:
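ABS value.x, denominator.x; # value = |denominator|
SUB value.x, 0, value.x; # value = -|denominator|, negative unless the denominator is zero
RCP reciprocal.x, denominator.x;
CMP reciprocal.x, value.x, reciprocal.x, someValueYouWant.x; # Keep the reciprocal if value < 0, otherwise substitute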
- Don't try to detect the zero, just nudge all numbers up a bit. This only works for inputs you know will never be negative (e.g. texture samples), and it loses a little precision, but often that is not significant.
ADD denominator.x, denominator.x, 0.0000001;
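To see how the two approaches behave, here's a small Python emulation of the arithmetic. It's only a sketch: rcp() assumes the infinity behavior described above for NVidia, cmp_select() mirrors the CMP instruction (take the second operand when the first is negative, otherwise the third), and all the function names are invented for this example.

```python
import math


def rcp(x):
    """Emulated RCP, assuming a zero input yields signed infinity."""
    if x == 0.0:
        return math.copysign(math.inf, x)
    return 1.0 / x


def cmp_select(a, b, c):
    """Mirror of CMP: b when a is negative, otherwise c."""
    return b if a < 0.0 else c


def guarded_rcp(denominator, fallback):
    """First approach: detect the zero and substitute a chosen value."""
    value = -abs(denominator)           # ABS, then SUB from 0
    reciprocal = rcp(denominator)       # RCP
    return cmp_select(value, reciprocal, fallback)  # CMP


def nudged_rcp(denominator, epsilon=0.0000001):
    """Second approach: nudge the input up so it is never exactly zero."""
    return rcp(denominator + epsilon)   # ADD, then RCP


print(guarded_rcp(0.0, 1000.0))  # 1000.0
print(guarded_rcp(4.0, 1000.0))  # 0.25
```

The guarded version costs three extra instructions but leaves non-zero inputs untouched, while the nudged version costs one and shifts every input slightly.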
Inexact texture coordinates.
It's been hard to get an exact one-to-one texel-to-pixel mapping on ATI cards. We want to use bilinear filtering in general, but the sampling errors can introduce slight blurring into the image. NVidia cards show much smaller errors. Where possible, we switched to nearest-neighbor sampling. We also considered gridding up each large quad into smaller polys to minimise the errors.
The NV34 doesn't support a floating point pixel format.
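For a high-precision accumulation buffer, you can get 16-bit fixed-point precision by using two 8-bit buffers. Each 16-bit channel is split into an upper and a lower half, and each half is stored in its own 8-bit buffer channel, so one four-channel buffer holds two 16-bit channels.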

Since you can only write to one pbuffer at a time, you need to run two passes. Here's a fragment program to implement the red/green pass:
!!ARBfp1.0
# Emulate a 16 bit accumulation buffer for red and green channels
# author - Pete Warden
ATTRIB tex0 = fragment.texcoord[0];
ATTRIB tex1 = fragment.texcoord[1];
PARAM unpackScale = { 255.0, 1.0, 255.0, 1.0 };
PARAM packScale = { 0.0039216, 1.0, 0.0039216, 1.0 };
TEMP s0, s1, s2;
# Fetch the texel to be added
TEX s0, tex0, texture[0], RECT;
# Fetch the current total from the first 8 bit buffer
TEX s1, tex0, texture[1], RECT;
# Multiply the upper half of each channel by 255
MUL s1, s1, unpackScale;
# Add on the lower half to the upper
ADD s1.r, s1.r, s1.g;
ADD s1.g, s1.b, s1.a;
# Do the accumulation of the current input onto the total
ADD s1, s0, s1;
# Put the lower half of each channel into the result
FRC s2.g, s1.r;
FRC s2.a, s1.g;
# Calculate the rounded upper half of both channels
SUB s2.r, s1.r, s2.g;
SUB s2.b, s1.g, s2.a;
# Compress the channels into a 0-1 range to store in the 8 bit buffer
MUL result.color, s2, packScale;
END
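To check the packing arithmetic without a graphics card, here's a rough Python emulation of one channel of the pass above. It's a sketch under assumptions: quantize8() stands in for the rounding an 8-bit buffer applies on write (round to nearest is assumed), and the function names are invented for this example.

```python
import math

UNPACK_SCALE = 255.0
PACK_SCALE = 0.0039216  # approximately 1/255, as in the fragment program


def quantize8(x):
    """Clamp to 0-1 and round to the nearest 8 bit level (assumed behavior)."""
    x = min(max(x, 0.0), 1.0)
    return round(x * 255.0) / 255.0


def unpack(stored_upper, stored_lower):
    """Rebuild the running total from the two stored 8 bit halves."""
    return stored_upper * UNPACK_SCALE + stored_lower   # MUL, then ADD


def accumulate(stored_upper, stored_lower, sample):
    """Add one sample to the stored total and repack it into two halves."""
    total = unpack(stored_upper, stored_lower) + sample  # ADD
    lower = total - math.floor(total)                    # FRC
    upper = total - lower                                # SUB
    return quantize8(upper * PACK_SCALE), quantize8(lower)


# Accumulate ten samples of 200/255 into an initially empty buffer.
upper, lower = 0.0, 0.0
for _ in range(10):
    upper, lower = accumulate(upper, lower, 200.0 / 255.0)
print(unpack(upper, lower))  # close to 2000/255
```

A single 8-bit channel would have saturated at 1.0 after the second sample, whereas the split representation keeps counting up to a total of about 256.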
The NV34 only passes 4 texture coordinates through to the fragment program.
The solution is to write a vertex program that emulates the fixed-function pipeline, but passes through all 8 texture coordinates, as on ATI:
!!ARBvp1.0
# Passthru vertex program
ATTRIB vertexPosition = vertex.position;
OUTPUT outputPosition = result.position;
DP4 outputPosition.x, state.matrix.mvp.row[0], vertexPosition;
DP4 outputPosition.y, state.matrix.mvp.row[1], vertexPosition;
DP4 outputPosition.z, state.matrix.mvp.row[2], vertexPosition;
DP4 outputPosition.w, state.matrix.mvp.row[3], vertexPosition;
MOV result.color, vertex.color;
MOV result.texcoord[0], vertex.texcoord[0];
MOV result.texcoord[1], vertex.texcoord[1];
MOV result.texcoord[2], vertex.texcoord[2];
MOV result.texcoord[3], vertex.texcoord[3];
MOV result.texcoord[4], vertex.texcoord[4];
MOV result.texcoord[5], vertex.texcoord[5];
MOV result.texcoord[6], vertex.texcoord[6];
MOV result.texcoord[7], vertex.texcoord[7];
END
Posted by petewarden at May 10, 2005 05:03 PM