Don’t repeat yourself with XML and SQL

Rascallyrepeatingrabbits
Photo by TW Collins

One key principle of Agile development is Don’t Repeat Yourself. If you’ve got one piece of data, make sure it’s only defined in one place in your code. That way it’s easy to change without either having to remember everywhere else you need to modify, or introducing bugs because your software’s inconsistent.

This gets really hard when you’re dealing with data flowing back and forth between XML and SQL. There’s a fundamental mismatch between a relational database that wants to store its information in columns and rows, and the tree structure of an XML document. Stylus do a great job describing the technical details of why XML is from Venus and SQL is from Mars, but the upshot is that it’s hard to find a common language that you can use to describe the data in both. A simple example is a list of the recipients of a particular email. The natural XML idiom would be something like this:

<message>
<snipped the other data>
<recipients>
<email>bob@bob.com</email>
<email>sue@sue.com</email>
</recipients>
</message>

But in mysql, you’re completely listless. To accommodate a variable length collection of items you need to set up a separate table that connects back to the owner of the data. In this case you might have a seperate ‘recipients’ table with rows that contained each address, together with some identifier that linked it with the original message held in another table. It’s issues like this that make a native XML database like MarkLogic very appealing if you’re mostly dealing with XML documents.

What I’d like to do is define my email message data model once, and then derive both the XML parsing and mysql interaction code from that. That would let me rapidly change the details of what’s stored without having to trawl through pages of boiler-plate code. I’m getting close, sticking to a simple subset of XML that’s very close to JSON, but defining a general way to translate lists of items back and forth is really tough.

I’m trying to avoid being an architecture astronaut, but it’s one of those problems that feels worth spending a little bit of upfront time on. It passes the "will this save me more time than it takes in the next four weeks?" code ROI test. I’d welcome any suggestions too, this feels like something that must have been solved many times before.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: