Netezza shows there’s more than one way to handle Big Data

Photo by Nick Dimmock

As you may have noticed, I'm a strong advocate of MapReduce, Hadoop and NoSQL, but I'm not blind to their limits. They're perfect for my needs primarily because they're dirt-cheap. There's no way on God's earth I could build my projects on enterprise systems like Oracle; the prices are just too high. The trade-off I make is that I spend a lot of my own time on learning and maintenance, whereas much of the cost of commercial systems pays for the support companies receive, so they don't need their own employees to geek out like that.

Another recurring issue has been the relatively poor price/performance ratio of standard SQL databases once you reach terabytes of data. That's where Netezza has been interesting, and today's announcement that they're being acquired by IBM highlights how much success they've had with their unique approach.

The best way to describe Netezza's architecture is that they built the equivalent of graphics cards, but instead of focusing on rendering, they worked at the disk controller level to implement common SQL operations as the data was being loaded from the underlying files, before it even reached the CPU. As a practical example, instead of passing every row and column of a table up to main system memory, their FPGA-based hardware has enough smarts to weed out the unneeded rows and cells and pass only a much smaller set up to the CPU, which then does the more complex operations. For more information, check out Curt Monash's much more in-depth analysis.
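To make the idea concrete, here's a toy software sketch of that split. It is not Netezza's FPGA design or any real API; the function names `storage_side_scan` and `cpu_group_sum` and the `SALES` table are hypothetical. The first stage plays the role of the filtering hardware, dropping rows that fail the WHERE clause and discarding unneeded columns, so the "CPU" stage only ever sees the reduced stream.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List, Tuple

Row = Dict[str, Any]


def storage_side_scan(
    rows: Iterable[Row],
    predicate: Callable[[Row], bool],
    columns: List[str],
) -> Iterator[Tuple[Any, ...]]:
    """Hypothetical stand-in for the filtering hardware: drop rows that fail
    the WHERE clause and keep only the requested columns."""
    for row in rows:
        if predicate(row):
            yield tuple(row[c] for c in columns)


def cpu_group_sum(tuples: Iterable[Tuple[Any, ...]]) -> Dict[Any, int]:
    """Hypothetical stand-in for the CPU: the more complex work
    (a GROUP BY ... SUM) runs only on the reduced stream."""
    totals: Dict[Any, int] = {}
    for key, value in tuples:
        totals[key] = totals.get(key, 0) + value
    return totals


# Hypothetical table; the wide 'notes' column never leaves the scan stage.
SALES = [
    {"id": 1, "region": "east", "amount": 250, "notes": "x" * 1000},
    {"id": 2, "region": "west", "amount": 40, "notes": "x" * 1000},
    {"id": 3, "region": "east", "amount": 120, "notes": "x" * 1000},
    {"id": 4, "region": "west", "amount": 300, "notes": "x" * 1000},
]

# Equivalent to: SELECT region, SUM(amount) FROM sales WHERE amount > 100 GROUP BY region
reduced = list(storage_side_scan(SALES, lambda r: r["amount"] > 100, ["region", "amount"]))
result = cpu_group_sum(reduced)
print(reduced)  # [('east', 250), ('east', 120), ('west', 300)]
print(result)   # {'east': 370, 'west': 300}
```

The point of the split is that three small two-field tuples cross the boundary instead of four rows carrying kilobyte-sized text fields; doing that weeding in dedicated hardware next to the disk is what keeps the bulk of the data off the CPU in the first place.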

Why does this matter? It completely flies in the face of the trend, popularized by Google, of throwing a Mongolian horde of cheap commodity servers at Big Data problems. It also suggests that another turn of the Wheel of Reincarnation might be about to start. These turns happen when a common processing problem's requirements outstrip a typical CPU's horsepower, and the task is important enough that it becomes worthwhile to build specialized co-processors to handle it. CPU power then increases faster than the problem's requirements, so eventually the hardware capability gets folded back into the main processor. The classic example from my generation is the Amiga-era graphics blitters for 2D work, which were overtaken by software renderers in the 90s but were then reincarnated as 3D graphics cards.

At the moment we're in the 'Everything on the CPU with standard hardware' trough of the wheel for Big Data processing, but with Google hinting at limits to its MapReduce usage, maybe we'll be specializing more over the next few years?
