Photo by Mark, Vicki, Ellaura, and Mason
A good friend of mine just asked me “What are GPUs?”. It came up because she’s a great digital artist who’s getting into VR, and the general advice she gets is “Buy a PC with a video card that costs more than $350”. What makes that one component cost so much, why do we need them, and what do they do? To help answer that, I thought I’d try to give an overview aimed at non-engineers.
Graphics Processing Units were created to draw images, text, and geometry onto the screen. This means they’re designed very differently than the CPUs that run applications. CPUs need to be good at following very complex recipes of instructions so they can deal with all sorts of user inputs and switch between tasks rapidly. GPUs are much more specialized. They only need to do a limited range of things, but each job they’re given can involve touching millions of memory locations in one go.
To see the difference between the kind of programs that run on CPUs and GPUs, think about a CPU reading from a text box. The CPU will sit waiting for you to press a key, and as soon as you do it might need to look in a list to figure out if there’s an autocomplete entry, check the spelling, or move to the next box if you hit return. This is a complex set of instructions with a lot of decisions involved.
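To make that a bit more concrete, here's a rough Python sketch of the branch-heavy, wait-and-decide work that a single keystroke can trigger. The dictionary and the "actions" here are made up for illustration, but the point is how many decisions the processor has to make for every key press.

```python
# A toy sketch of the decision-heavy work a CPU does for a text box.
# The dictionary and the actions are invented, just to show the branching.

DICTIONARY = ["graphics", "gpu", "hello", "help"]

def handle_keypress(key, text):
    """Decide what to do after a single keystroke."""
    if key == "RETURN":
        return text, "move to the next box"       # decision: leave the field
    text += key
    matches = [w for w in DICTIONARY if w.startswith(text)]
    if matches:                                   # decision: offer an autocomplete?
        return text, "suggest '%s'" % matches[0]
    if text not in DICTIONARY:                    # decision: flag the spelling?
        return text, "underline as misspelled"
    return text, "keep waiting for keys"

text = ""
for key in ["h", "e", "l", "q", "RETURN"]:
    text, action = handle_keypress(key, text)
    print(key, "->", action)
```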
By contrast, a typical GPU task would be drawing an image on-screen. A picture that’s 1,000 pixels wide and high has a million elements, and drawing it means moving all of those into the screen buffer. That’s a lot more work than just waiting for a key press, but it also involves a lot fewer decisions since you just need to move a large number of pixels from one place to another.
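Here's what that looks like as a sketch in Python, with NumPy's bulk array operations standing in for the kind of hardware copy a GPU does. Unlike the text-box example, there are no decisions at all, just one operation applied to a million pixels.

```python
import numpy as np

# A 1,000 x 1,000 picture with red, green, and blue values for each pixel.
image = np.random.randint(0, 256, size=(1000, 1000, 3), dtype=np.uint8)

# The "screen buffer" we want to draw the picture into.
screen = np.zeros_like(image)

# Drawing is just one bulk copy of a million pixels, with no branching
# and no waiting - exactly the shape of work a GPU is built for.
screen[:, :, :] = image

print("Pixels moved:", screen.shape[0] * screen.shape[1])
```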
The differences in the kinds of tasks that CPUs and GPUs need to do mean that they're designed in very different ways. CPUs are very flexible and able to do a lot of complicated tasks involving decision-making. GPUs are less adaptable, but they can operate on large numbers of elements at once, so they can perform many operations much faster.
The way GPUs achieve this speed is by breaking their tasks into much smaller pieces that can be spread across a large number of small processors running at once. Because the jobs they're given are simpler than the ones CPUs handle, it's easy to split them up like this automatically. As an example, you can imagine having hundreds of little processors, each of which is given one tile of an image to draw. By having them all work in parallel, the whole picture can be drawn much faster.
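Here's a minimal sketch of that tiling idea in Python. I'm using a pool of worker processes as a stand-in for the hundreds of little processors on a real GPU, and a fake shading function instead of real rendering, but the structure is the same: each worker fills in one tile of the picture independently, so the tiles can all be drawn at the same time.

```python
import numpy as np
from multiprocessing import Pool

WIDTH, HEIGHT, TILE = 1000, 1000, 250

def draw_tile(origin):
    """Fill in one TILE x TILE patch of the picture.

    Each tile is just shaded by its position here, as a stand-in for
    whatever real rendering work a GPU processor would do."""
    x0, y0 = origin
    tile = np.zeros((TILE, TILE), dtype=np.uint8)
    tile[:, :] = (x0 // TILE * 4 + y0 // TILE) * 16  # fake "rendering"
    return origin, tile

if __name__ == "__main__":
    origins = [(x, y) for x in range(0, WIDTH, TILE)
                      for y in range(0, HEIGHT, TILE)]
    picture = np.zeros((HEIGHT, WIDTH), dtype=np.uint8)
    with Pool() as pool:                  # many workers, like many GPU cores
        for (x0, y0), tile in pool.map(draw_tile, origins):
            picture[y0:y0 + TILE, x0:x0 + TILE] = tile
    print("Drew", len(origins), "tiles in parallel; picture shape:", picture.shape)
```

On a real GPU the splitting and scheduling happens in the hardware and the drivers rather than in your own code, but the shape of the job is the same.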
The key advantage of GPUs is this scalability. They can't do every job, but for the ones they can tackle, you can essentially just pack more processors onto the board to get faster performance. This is why video cards capable of handling the high resolutions and framerates you need for VR are more expensive: as you go up in price, they have more (and individually faster) processors to handle those larger workloads. This kind of scaling is much harder on CPUs, because it's much trickier to break the logic needed to run applications into smaller jobs.
This is a painfully simplified explanation, I know, but I'm hoping to get across what makes GPUs fundamentally different from CPUs. If you have a task that involves a lot of computation but few decision points, GPUs are set up to parallelize that job automatically. This is clearest in graphics, but it also comes up in deep learning, where there are similar heavy-lifting requirements across millions of artificial neurons. As Moore's Law continues to fade, leaving CPU speeds to languish, these sorts of parallel approaches will become more and more attractive.
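The deep learning connection is easy to see in code too. A single layer of artificial neurons is basically a huge pile of multiply-adds with almost no decision points, which is why it maps so well onto GPUs. Here's a tiny NumPy sketch of one layer, with made-up sizes, just to show the shape of the work.

```python
import numpy as np

# One made-up layer: 4,096 neurons each looking at 4,096 inputs,
# which works out to roughly 16 million multiply-adds.
inputs = np.random.rand(4096).astype(np.float32)
weights = np.random.rand(4096, 4096).astype(np.float32)

# All the heavy lifting is a single matrix-vector multiply, followed by
# a simple threshold - lots of arithmetic, hardly any decisions.
outputs = np.maximum(weights @ inputs, 0.0)

print(outputs.shape, "neuron outputs computed")
```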