Is MySQL viable for data mining?

Photo by Aitor Escauriaza

I've been involved in an interesting Twitter conversation with Rafi Kam. I don't know anything about his background or plans, but he's obviously working on a data project. I was pleased to be able to point him to EC2 for Poets as a great introduction to Amazon's hosting, but this morning he asked "I'm concerned about nosql learning time and lack of simple querying. Can mysql be a viable back end for data mining?".

The quick answer is that MySQL and other traditional databases are absolutely viable for data mining. In most cases they're actually better than NoSQL solutions for anything that involves exploration and experimentation, simply because they have more mature tools and documentation, and because SQL lets you ask a new question of your data without writing new code for each one.

My advice is to always start with a relational database when you're prototyping your product. NoSQL systems like Cassandra offer advantages once you're dealing with truly massive data sets, but relational databases will get you a long, long way. Once your queries start slowing down, that's the time to look at optimizing your database, whether it's by switching to a key/value solution, or more traditional approaches like heavier indexing or even vertically scaling by just buying a faster machine!
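To make the "heavier indexing" step concrete, here is a minimal sketch of the prototype-first workflow: load some data, ask an ad-hoc question in SQL, and add an index once a lookup gets slow. It uses Python's built-in sqlite3 module as a stand-in so it runs without a server; that substitution, the `events` table, its columns, and the helper names are all assumptions made up for illustration. MySQL has its own `CREATE INDEX` and `EXPLAIN` statements, though its plan output looks different.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database holding a hypothetical 'events' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (user_id INTEGER, action TEXT, amount REAL)"
    )
    # Synthetic rows, purely for illustration.
    rows = [
        (i % 100, "buy" if i % 3 == 0 else "view", float(i % 7))
        for i in range(10_000)
    ]
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def top_buyers(conn, limit=3):
    """An ad-hoc exploratory question: who spent the most?"""
    sql = (
        "SELECT user_id, COUNT(*) AS purchases, SUM(amount) AS total "
        "FROM events WHERE action = 'buy' "
        "GROUP BY user_id ORDER BY total DESC, user_id ASC LIMIT ?"
    )
    return conn.execute(sql, (limit,)).fetchall()


def query_plan(conn, sql, params=()):
    """Return SQLite's description of how it would run a query."""
    plan_rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params)
    # The last column of each plan row is the human-readable detail.
    return " | ".join(row[3] for row in plan_rows)


if __name__ == "__main__":
    conn = build_demo_db()
    print(top_buyers(conn))

    lookup = "SELECT * FROM events WHERE user_id = ?"
    print("before:", query_plan(conn, lookup, (42,)))

    # The traditional fix for a slow lookup: index the filtered column.
    conn.execute("CREATE INDEX idx_events_user ON events (user_id)")
    print("after: ", query_plan(conn, lookup, (42,)))
```

Before the index, the plan reports a scan of the whole table; afterwards it should mention `idx_events_user`. The point is that both the exploratory question and the performance fix are a line or two of SQL each.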

Now, NoSQL and the MapReduce approach to data processing are a lot of fun to play with, so I highly recommend learning more about them and trying them out in toy projects. But unless the point of the project is to train yourself on the tools, start with something simpler.
