« Random Numbers in Fragment Programs | Main | Emulating Bilinear Filtering »
May 10, 2005
GPU Optical Flow
Optical flow analysis is heavily used by video and film processing apps, to retime images smoothly, apply motion blur as a post-process, and guide the application of other processes. It's traditionally a slow operation, I wanted to see if a GPU version was possible.
Optical flow analyses frames in a sequence, and trys to map how each pixel is moving. All techniques assume that there's some meaningful movement in the sequence to extract; sequences of unrelated images or random noise won't give useful results.
The results you want are easy to visualize, when we see a moving image, we can tell that there are moving objects there, and roughly how fast and in what direction they're moving. The useful way to represent that is as a vector field, an image where each pixel has a velocity instead of a color, showing what direction the pixel in the underlying image is moving.
Determining Optical Flow by Horn & Shunck lays out the classic method for figuring this out. I took it as a starting point for my implementation, since it seemed pretty simple and robust compared to more recent algorithms.
It took me some time to understand the basic ideas behind their algorithm, but once I did, they're remarkably elegant and straightforward. I'll try an informal explanation here to help anyone else whose eyes glaze over when confronted with a journal paper.
Start with the 1D case. You have a 1D wave that looks like this at time 0:

Here's the same wave at frame 1:

Visually, you can see that the line has moved to the left. If you subtract frame 0 from frame 1, you get this line:

This is just the difference between the first and second frames. The crucial step is to multiply this against the slope of the original wave, to get an estimate of the movement. This gives you this wave:

Why does this work? One way that I found to visualize it is to imagine you've got an infinite straight line, that's constantly sloping and sliding to the right over time. For any point that's on the line at the start of time, the line directly below it drops away at a constant speed:

The difference between two frames is like the distance from the point that was on the line, to where the line is now below it. That distance is determined by three variables; the slope of the line, the time that the line's been sliding, and how fast it's been sliding. Since we know the slope, and how long there's been between frames, we can divide those out and be left with the line's speed of movement.
This simple example should also highlight a couple of limitations of optical flow analysis:
- You need a slope or gradient in the image for the analysis to get a grip on. Flat areas of color in the center of objects can't reveal any movement.
- Movements need to be small, compared to the size of the object. The difference is applied to the gradient at each pixel, if that gradient is from a different object than in the first frame, it will give nonsense results. You need a slope that is roughly the same over the distance that the object moves between frames to get accurate results.
To extend the algorithm to two dimensions, do a difference on the two frames, and then multiply dy and dx gradient vectors for the slope of each pixel by the difference to get the optical flow estimate.
Here's some pseudo-code for the basic process:
(1) CurrentDifference = (Frame1[x][y]-Frame0[x][y]);
(2) GradientX = ((Frame1[x+1][y]-Frame1[x-1][y]) + (Frame0[x+1][y]-Frame0[x-1][y])) / 2.0;
(3) GradientY = ((Frame1[x][y+1]-Frame1[x][y-1]) + (Frame0[x][y+1]-Frame0[x][y-1])) / 2.0;
(4) GradientMagnitude = sqrt((GradientX*GradientX)+(GradientY*GradientY));
(5) VelocityX = CurrentDifference*(GradientX/GradientMagnitude);
(6) VelocityY = CurrentDifference*(GradientY/GradientMagnitude);
This is the heart of the technique. To improve the accuracy of the results, Horn and Schunck do this repeatedly, in order to eliminate spurious errors introduced by noise and to propagate movement across areas of flat color.
They use a couple of ways to do this. The algorithm likes long smooth slopes, and gets confused by short steep ones, and short steep ones are more likely to be the result of high-frequency noise than genuine objects in the sequence. To damp down the effect, a term usually called lambda is added to line 4:
(4) GradientMagnitude = sqrt((GradientX*GradientX)+(GradientY*GradientY)+Lambda);
Since the gradient is divided by this magnitude result, large values of lambda mean that steep gradients give lower velocities than gentle slopes. In practice, it's a noise filter, low values give you a very sensitive but noisy result, high values filter out the noise, but miss small objects.
The other method tries to smooth out the result, removing isolated small areas that are moving in a different direction to their neighbors, and also filling in areas with no gradients with movement data from nearby. This is where the iteration comes in; the vector field result is run through a blur kernel, and then the optical flow calculation is re-run, feeding the previous flow vector back in. This is how it looks in the code:
...
(3.1) PreviousVector = PreviousResultBlurred[x][y];
(3.2) PreviousDotGradient = (PreviousVector.x*GradientX)+(PreviousVector.y*GradientY);
...
(5) VelocityX = PreviousVector.x + (CurrentDifference+PreviousDotGradient)*(GradientX/GradientMagnitude);
(6) VelocityY = PreviousVector.y + (CurrentDifference+PreviousDotGradient)*(GradientY/GradientMagnitude);
Visually, the dot product is high when the vector is going in the same direction as the previous result (which, being blurred, represents the average direction in the local neighborhood) and is negative when the vector is going against the local flow. Iterating over this rewards vectors that are in step with their neighbors, and minimises the vectors that are marching to a different drummer:

To do the blurring, I just run a small gaussian kernel over the vector field between each iteration.
Once I understood the algorithm, it was not too hard to convert it over to a multi-pass sequence of fragment programs. In my implementation I take the two frames I'm interested in, and encode the luminance of each pixel into a 16 bit fixed point value, and store this value in two 8 bit channels in a 32 bit pbuffer, packing both images into a single surface.
Inside my iteration loop, I then run a fragment program that contains the gradient calculation and flow estimation algorithm explained in lines (1) to (6), with the core of it looking like this:
# left, right, bottom, top and centre have the luminance of the neighboring pixels of the first frame
# in the x component, and the luminance from the second in z
SUB left, right, left;
DP4 differential.x, left, {0.5, 0.0, 0.5, 0.0};
SUB bottom, top, bottom;
DP4 differential.y, bottom, {0.5, 0.0, 0.5, 0.0};
SUB differential.z, center.z, center.x;
# velocity.x and .y are the previous velocity for the pixel, both 0 first pass through
MOV velocity.z, 1.0;
DP3 flowError.x, differential, velocity;
MOV differential.z, lambda.x;
DP3 differentialMagnitude.x, differential, differential;
RCP differentialMagnitude.x, differentialMagnitude.x;
MUL differentialScale.x, flowError.x, differentialMagnitude.x;
MUL differentialScale.x, differentialScale.x, -1.0;
MAD velocity, differential, differentialScale.x, velocity;
I run two passes of a seperable gaussian blur convolution over the resulting vector field, one horizontal and one vertical, and then iterate over again. The lambda value I have between 0 and 1, depending on the image, and I normally run between 2 and 10 iterations, again depending on my needs and the image quality.
Posted by petewarden at May 10, 2005 06:02 PM