I'm sorry to say that I won't be releasing the Facebook data I'd hoped to share with the research community. In fact I've destroyed my own copies of the information, under threat of a lawsuit from Facebook.
As you can imagine I'm not very happy about this, especially since nobody ever alleged that my data gathering was outside the rules the web has operated by since crawlers existed. I followed their robots.txt directions, and was even helped by microformatting in the public profile pages. Literally hundreds of commercial search engines have followed the same path and have the same data. You can even pull identical information from Google's cache if you don't want to hit Facebook's servers. So why am I destroying the data? This area has never been litigated and I don't have enough money to be a test case.
Despite the bad taste left in my mouth by the legal pressure, I actually have some sympathy for Facebook. I put them on the spot by planning to release data they weren't aware was available. I know from my time at Apple that reaching for the lawyers is a tempting first option when there's a nasty surprise like that. If I had to do it all over again, I'd try harder not to catch them off-guard.
So what's the good news? From my conversations with technical folks at Facebook, there seems to be a real commitment to figuring out safeguards around the widespread availability of this data. They have a lot of interest in helping researchers find ways of doing worthwhile work without exposing private information.
To the many researchers I've disappointed, there's a whole world of similar data available from other sources too. By downloading the Google Profile crawling code you can build your own data set, and it's easy enough to build something similar for Twitter. I'm already in the middle of some new research based on public Buzz information, so this won't be stopping my work, and I still plan to share my source data with the research community in the future.