« Quick and Dirty Vectorization | Main | GPU Optical Flow »

May 10, 2005

Random Numbers in Fragment Programs

It's hard to write a fragment program that will calculate pseudo-random numbers. The usual random number algorithms need two things that we don't have on our GPU's:


State

A seed value is usually stored, and updated every time a random number is generated. There's no way to pass this between pixels, with a fragment program, there's no global storage that can be updated to hold that kind of state. This means every pixel has to calculate its value independently.

Logic Operations

Almost all generators rely on bitwise logic operations such as shifts and exclusive-or's in their implementation. These operations can't be done in the ARBFP instruction set, (though I hear there are some NVidia specific extensions that allow a limited number).

Alternatives

The best way around these problems is to off-load as much of the work as we can to the CPU. We can generate a small texture from a table of random values, and do texture reads to get random values. Since our reads must be based on values that are very non-random, such as linearly interpolated texture coordinates, we may need to combine together several levels of indirection to get random looking results over a large area, if we have a small table of random values.

Ken Perlin's classic approach to generating random values at grid points using a 256 entry table is a good example to look at. However, on ATI cards the multiple texture indirections needed can be a problem, since there's a limit of 5 levels in any program.

Do it all on the CPU

Another alternative is to create a larger 2D texture full of random values. This is perfect if the random values do no need to change from frame to frame, but if it does need to animate, the CPU cost of generating and uploading the texture may be a problem. Depending on what it's being used for, the texture may be smaller than the area it needs to cover, and use tiling. This will result in obvious patterns in many cases however, something which is a constant danger with most of these methods. Careful design and tweaking are needed to minimize such ugliness.

Do it all on the GPU

I did manage to create a fragment program that calculated usable random numbers from scratch, using only floating point. I enclose it below, though it failed in its main purpose. It works as intended on NVidia GPU's, because they have 32 bit floating point internal precision, but the 24 bit precision of ATI chips causes visible problems. Since the main goal of this was to avoid texture indirections on ATI cards, it's not very useful to me, but was fun to write, so I thought I should share:


!!ARBfp1.0
# Based on an algorithm described by Francois Grieu, sci.crypt, 5th February 2004
ATTRIB tex0 = fragment.texcoord[0];
PARAM outputMult = program.local[0];
PARAM bounds = program.local[1];
PARAM seed = program.local[2];
PARAM coordsOffset = { -100, 100, 0, 0 };
PARAM cMult = 0.0001002707309736288;
PARAM aSubtract = 0.2727272727272727;
PARAM coordMult0 = { 0.67676, 0.000058758, 0, 0 };
PARAM coordMult1 = { 0.0000696596, 0.797976, 0, 0 };
PARAM coordMult2 = { 0.587976, 0.0000233443, 0, 0 };

TEMP tableCoord, a, b, c, floorA, seedCoords;

ADD seedCoords, tex0, coordsOffset;

# gFastRngA = (((currentX*multX)/(currentY*multY))+
MUL tableCoord, seedCoords, coordMult0;
RCP tableCoord.y, tableCoord.y;
MUL a.x, tableCoord.x, tableCoord.y;

# (((height-currentY)*multX2)/((width-currentX)*multY2))+
SUB tableCoord, bounds, seedCoords;
MUL tableCoord, tableCoord, coordMult1;
RCP tableCoord.x, tableCoord.x;
MAD a.x, tableCoord.x, tableCoord.y, a.x;

# (((height-currentX)*multX3)/((width-currentY)*multY3)));
SUB tableCoord.x, bounds.y, seedCoords.x;
SUB tableCoord.y, bounds.x, seedCoords.y;
MUL tableCoord, tableCoord, coordMult2;
RCP tableCoord.y, tableCoord.y;
MAD a.x, tableCoord.x, tableCoord.y, a.x;

# gFastRngA = fmod(gFastRngA,1);
FRC a.x, a.x;
ADD a.x, a.x, seed;

MOV c.x, 0;
MOV b.x, 0;

# (gFastRngA += gFastRngC*(1./9973)+(3./11)-floor(gFastRngA))
FRC floorA.x, a.x;
SUB floorA.x, a.x, floorA.x;
SUB floorA.x, aSubtract.x, floorA.x;
ADD floorA.x, floorA.x, a.x;
MAD a.x, c.x, cMult.x, floorA.x;

# (gFastRngB += (gFastRngA *= gFastRngA))
MUL a.x, a.x, a.x;
ADD b.x, b.x, a.x;

# (gFastRngC += (gFastRngB -= floor(gFastRngB)))
FRC b.x, b.x;
ADD c.x, c.x, b.x;

# (gFastRngC -= floor(gFastRngC))
FRC c.x, c.x;

# (gFastRngA += gFastRngC*(1./9973)+(3./11)-floor(gFastRngA))
FRC floorA.x, a.x;
SUB floorA.x, a.x, floorA.x;
SUB floorA.x, aSubtract.x, floorA.x;
ADD floorA.x, floorA.x, a.x;
MAD a.x, c.x, cMult.x, floorA.x;

# (gFastRngB += (gFastRngA *= gFastRngA))
MUL a.x, a.x, a.x;
ADD b.x, b.x, a.x;

# (gFastRngC += (gFastRngB -= floor(gFastRngB)))
FRC b.x, b.x;
ADD c.x, c.x, b.x;

# (gFastRngC -= floor(gFastRngC))
FRC c.x, c.x;

# (gFastRngA += gFastRngC*(1./9973)+(3./11)-floor(gFastRngA))
FRC floorA.x, a.x;
SUB floorA.x, a.x, floorA.x;
SUB floorA.x, aSubtract.x, floorA.x;
ADD floorA.x, floorA.x, a.x;
MAD a.x, c.x, cMult.x, floorA.x;

# (gFastRngB += (gFastRngA *= gFastRngA))
MUL a.x, a.x, a.x;
ADD b.x, b.x, a.x;

# (gFastRngC += (gFastRngB -= floor(gFastRngB)))
FRC b.x, b.x;
ADD c.x, c.x, b.x;

# (gFastRngC -= floor(gFastRngC))
FRC c.x, c.x;

MOV result.color, c.x;
MOV result.color.a, 1;

END

Posted by petewarden at May 10, 2005 05:06 PM

Comments

Thanks Pete, very informative, and particularly I like the all floating point PRNG - that is really an elegant solution. As a (GP)GPU newbie I've only worked with Cg, so I tracked down Francis Grieu's notes on sci.crypt and implemented a Cg version. In case this is useful for anyone (forgive the sub-optimality of my code):

// PRNG on the GPU. Two pseudo random numbers per iteration (from the BA channels).
// Inspired by Pete Warden's fragment program for pseudo-random number generation.
// In turn based on an algorithm described by Francois Grieu, sci.crypt, 5th February 2004
// Alex Campbell. 7th August 2006
#define cMult 0.0001002707309736288
#define aSubtract 0.2727272727272727
float4 randGrieu(float2 coords : WPOS,
uniform samplerRECT ranTexture) : COLOR
{
float4 t = f4texRECT(ranTexture, coords);
float a = t.x + t.z*cMult + aSubtract - floor(t.x);
a *= a;
float b = t.y + a;
b -= floor(b);
float c = t.z + b;
c -= floor(c);
float d = c;
a += c*cMult + aSubtract - floor(a);
a *= a;
b += a;
b -= floor(b);
c += b;
c -= floor(c); return float4(a,b,c,d);
}

Visually this works fine for my chosen application (a probabilistic, spatially explicit implementation of the rock-scissors-paper game). But as I'm ultimately interested in scientific computing it would be good to put it through some random number generation tests sometime. And as you say it would be nice if it worked on ATI cards.

Alex

Posted by: Alex Campbell at August 7, 2006 12:07 AM

Post a comment




Remember Me?