May 18, 2005
Fragment Program Reference
The OpenGL Extensions Guide has a great chapter on the ARB fragment program language, and I recommend buying it, but there aren't many good references online. The most useful is the official spec, but it's designed as an exhaustive guide, not a quick reference for programmers. Here's a rundown of the instruction set, and some tips and tricks.
Here's a cut-out-and-keep table of all instructions, based on table X.5 in the spec:
| Instruction | Inputs | Output | Description | Pseudocode | Notes |
|---|---|---|---|---|---|
| ABS | v | v | absolute value | fabs(arg1) | |
| ADD | v,v | v | add | arg1+arg2 | |
| CMP | v,v,v | v | compare | if (arg1<0) arg2 else arg3 | |
| COS | s | ssss | cosine | cos(arg1) | Synthesised using 5 instructions on ATI R3xx |
| DP3 | v,v | ssss | 3-component dot product | (arg1.x*arg2.x)+(arg1.y*arg2.y)+(arg1.z*arg2.z) | |
| DP4 | v,v | ssss | 4-component dot product | (arg1.x*arg2.x)+(arg1.y*arg2.y)+(arg1.z*arg2.z)+(arg1.w*arg2.w) | Emulated with 4 native instructions on ATI R3xx |
| DPH | v,v | ssss | homogeneous dot product | (arg1.x*arg2.x)+(arg1.y*arg2.y)+(arg1.z*arg2.z)+arg2.w | |
| DST | v,v | v | distance vector | Funky, see spec | |
| EX2 | s | ssss | exponential base 2 | pow(2,arg1) | Using negative args on NVidia gives different results to ATI |
| FLR | v | v | floor | floor(arg1) | |
| FRC | v | v | fraction | arg1-floor(arg1) | |
| KIL | v | v | kill fragment | if (any component of arg1<0) discard fragment | |
| LG2 | s | ssss | logarithm base 2 | log2(arg1) | |
| LIT | v | v | compute light coefficients | Funky, see spec | |
| LRP | v,v,v | v | linear interpolation | (arg2*arg1)+(arg3*(1-arg1)) | Order is lerpValue, end, start |
| MAD | v,v,v | v | multiply and add | (arg1*arg2)+arg3 | Really useful to reduce instruction counts |
| MAX | v,v | v | maximum | if (arg1<arg2) arg2 else arg1 | |
| MIN | v,v | v | minimum | if (arg1>arg2) arg2 else arg1 | |
| MOV | v | v | move | arg1 | |
| MUL | v,v | v | multiply | arg1*arg2 | |
| POW | s,s | ssss | exponentiate | pow(arg1,arg2) | |
| RCP | s | ssss | reciprocal | 1/arg1 | |
| RSQ | s | ssss | reciprocal square root | 1/sqrt(arg1) | |
| SCS | s | ss-- | sine/cosine | result.x=cos(arg1) result.y=sin(arg1) | Argument must be in the range -PI to PI. Synthesised using multiple instructions on ATI R3xx |
| SGE | v,v | v | set on greater than or equal | if (arg1>=arg2) 1.0 else 0.0 | |
| SIN | s | ssss | sine | sin(arg1) | Synthesised using 5 instructions on ATI R3xx |
| SLT | v,v | v | set on less than | if (arg1<arg2) 1.0 else 0.0 | |
| SUB | v,v | v | subtract | arg1-arg2 | |
| SWZ | v | v | extended swizzle | Funky, see spec | Synthesised using multiple instructions on ATI R3xx |
| TEX | v,u,t | v | texture sample | | Texture instructions are almost always the performance bottleneck |
| TXB | v,u,t | v | texture sample with bias | | |
| TXP | v,u,t | v | texture sample with projection | | |
| XPD | v,v | v | cross product | [(arg1.y*arg2.z-arg1.z*arg2.y),(arg1.z*arg2.x-arg1.x*arg2.z),(arg1.x*arg2.y-arg1.y*arg2.x)] | |
Always specify which components you're using when an instruction only needs some of them. The compilers generally aren't smart enough to figure out that you only use the .x component of the result later on, and both vendors' hardware can play clever tricks by executing vector and scalar instructions in parallel.
Try to calculate everything you can using arithmetic instructions rather than doing table lookups from textures. Memory access almost always seems to be the limiting factor on the speed of our fragment programs, so you've got a lot of free instruction slots that can be filled with extra calculations while the hardware's waiting on memory.
ATI cards natively support simple swizzling: either masking out some components in the result register (ADD foo.xy, bob, jim;) or duplicating a single component across the whole register (ADD foo, bob, jim.x;).
Anything more complicated will be emulated using multiple instructions (ADD foo, bob.zyzy, jim; or ADD foo, bob.xxxy, jim;).
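If the difference between a result mask and a source swizzle isn't obvious, this small C emulation may help. It's a sketch of the semantics only, not of how any hardware does it: `reg4`, `component_index`, `swizzle` and `add_masked` are hypothetical names, and the swizzle string is assumed to be either one or four valid component letters.

```c
#include <string.h>

/* Hypothetical register type for this sketch: c[0..3] are x, y, z, w. */
typedef struct { float c[4]; } reg4;

/* Map a component letter to its index, or -1 if it isn't one of xyzw. */
static int component_index(char letter) {
    const char *names = "xyzw";
    const char *hit = letter ? strchr(names, letter) : NULL;
    return hit ? (int)(hit - names) : -1;
}

/* Source swizzle: "x" replicates one component, "zyzy" picks all four.
   Assumes sel has length 1 or 4 and contains only x, y, z or w. */
static reg4 swizzle(reg4 src, const char *sel) {
    reg4 out;
    size_t n = strlen(sel);
    for (int i = 0; i < 4; ++i) {
        char letter = (n == 1) ? sel[0] : sel[i];
        out.c[i] = src.c[component_index(letter)];
    }
    return out;
}

/* ADD with a result mask: only the components named in mask are
   written, the rest of dst keeps its old contents. */
static void add_masked(reg4 *dst, const char *mask, reg4 a, reg4 b) {
    for (const char *m = mask; *m; ++m) {
        int i = component_index(*m);
        dst->c[i] = a.c[i] + b.c[i];
    }
}
```

So `add_masked(&foo, "xy", bob, jim)` corresponds to ADD foo.xy, bob, jim; and `add_masked(&foo, "xyzw", bob, swizzle(jim, "x"))` corresponds to ADD foo, bob, jim.x;.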
On ATI, using GL_TEXTURE_RECTANGLE_EXT textures in TEX instructions (RECT as the target) will generate hidden instructions to convert the coordinates to the 0 to 1 range, from the input range of 0 to
Posted by petewarden at May 18, 2005 04:28 PM