A better source of Enron’s emails in PSTs

Seriousmining

Photo by Irish Typepad

Dr John Wang [Update: sorry, wrong John Wang!] has just started a new site called EnronData.org, dedicated to developing and refining the Enron email dataset. It’s off to a cracking start, offering all the Enron emails as 148 PST files, one for each ‘custodian’ (informally, each mail user). I did my own PST conversion, but primarily so that I had a large dataset to load onto an Exchange server and test Mailana against. John’s version is much closer to the original source data, so it will be a more realistic test for applications.

I’m really pleased John has put this together; it will be a boon to anyone doing heavy-duty email data mining. I can’t wait to see what else the project produces.
