One of the challenges of building a tool that does something useful with a corporation’s emails is finding a good data set to experiment on. No company is going to give a random developer access to all of its internal emails. That’s where Enron comes to the rescue. The Federal Energy Regulatory Commission released over 16,000 emails to the public as part of its investigation into the 2001 energy crisis.
They’re theoretically available online, but only through a database interface that seems designed to be hard to use and throws server errors whenever I try it. Luckily, the Commission does promise to mail you full copies of its .pst databases if you pay a fee. If only there were some kind of global electronic network that you could use to transmit files… Once I receive the data, I’ll check the license and, if it allows, make the emails available online myself.
I first became aware of this data through Trampoline Systems’ Enron Explorer, which uses it to demonstrate their email analysis. Since then, I’ve also come across a paper that builds on the same data to analyze how quickly people respond to emails.