The first thing I discovered when I looked over Facebook’s recent platform code release was a security flaw that lets you run malicious JavaScript through applications, bypassing their protections, but I won’t be blogging any details until the team has implemented a fix.
When they recently released some of their platform code as open source (link seems to be temporarily down, but you can download the source directly here), it led to a lot of discussion. Some of it covered the strategic significance of the move, which is aimed at keeping Facebook’s lead in the application space against competitors like OpenSocial, and some covered the implications of the unusual CPAL license they chose.
I’m much more interested in the technical lessons you can learn about Facebook’s code and architecture from the source. From looking through it, I’m confident this is drawn from their actual production code, so it’s a rare glimpse inside the implementation of a web application battle-tested with millions of users. I’ve uploaded a version with an Xcode project for easy browsing if you want to explore for yourself on a Mac.
There’s a disappointing lack of swearing in the comments, though I did find one "omg this is so retarded" in typeaheadpro.js. With that fun out of the way, a good place to start after the main README is to search for "FBOPEN:" in all files, since this brings up comments that were added to document the parts the developers thought would be interesting to users of the open version.
Examining the basic structure confirms that Facebook are still basically a LAMP shop. The only part that I wondered about was the M, for MySQL, since that’s traditionally been tough to scale, but all of the database access here is through raw SQL strings. They’re known for their use of memcache to speed up data fetching, but there’s no sign of it in the code they’ve released. I was hoping for some heavyweight examples of how to handle snooping on updates to invalidate memcache entries, but no such luck. They do have an interesting pattern of assembling their query strings using printf-style format strings and varargs, rather than directly appending, which results in cleaner-looking code. If you want to look at the implementation, that’s in lib/core/mysql.php.
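To make the format-string pattern concrete, here’s a minimal sketch in Python rather than PHP. It is not Facebook’s code: build_query and escape_sql_string are names I made up, the users table and its columns are hypothetical, and the escaping is deliberately naive. The only point is the shape of the call, where the query text stays in one readable piece and the values are passed as trailing arguments.

```python
def escape_sql_string(value: str) -> str:
    """Naive escaping, for illustration only (hypothetical helper).

    Real code should rely on the database driver's own escaping
    or on bound parameters instead of this.
    """
    return value.replace("\\", "\\\\").replace("'", "\\'")


def build_query(fmt: str, *args) -> str:
    """Fill a printf-style format string, escaping string arguments first."""
    safe_args = []
    for arg in args:
        if isinstance(arg, (int, float)):
            # Numbers are formatted by %d / %f and need no quoting.
            safe_args.append(arg)
        elif isinstance(arg, str):
            safe_args.append(escape_sql_string(arg))
        else:
            raise TypeError(f"unsupported argument type: {type(arg).__name__}")
    return fmt % tuple(safe_args)


# Hypothetical table and columns, just to show the call shape.
query = build_query(
    "SELECT name FROM users WHERE id = %d AND city = '%s'",
    42,
    "O'Brien",
)
print(query)
```

Compared with gluing the pieces together with string concatenation, the whole statement can be read at a glance and every value passes through one escaping step. In new code I’d still reach for a driver’s bound parameters first.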
One component I hadn’t seen before was Thrift, Facebook’s open source framework for building cross-language APIs and data structures. It takes an interface definition file, and then creates a lot of the glue you need to implement the methods and data structures in PHP, Java, C++, Ruby and Erlang. I was interested because I’ve found I need a lowest-common-denominator data definition and code generation framework as I end up bouncing between C++, PHP and SQL tables. Thrift doesn’t address the database storage side, which is where I hit problems too, since some basic data structures, like lists inside structures, don’t translate into a relational database unambiguously.
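Here’s a small Python sketch of what I mean by the list-inside-a-structure ambiguity. Everything in it is hypothetical (the Profile structure, its fields, and both mapping functions are my own invention and have nothing to do with Thrift’s generated code). It shows two equally plausible ways to store the same structure, which is exactly why a code generator can’t pick one for you.

```python
import json
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Profile:
    """Hypothetical structure containing a list field."""
    user_id: int
    name: str
    interests: List[str] = field(default_factory=list)


def to_child_rows(p: Profile) -> Tuple[tuple, List[tuple]]:
    """Mapping A: one parent row plus one child-table row per list item."""
    parent = (p.user_id, p.name)
    children = [(p.user_id, position, interest)
                for position, interest in enumerate(p.interests)]
    return parent, children


def to_serialized_row(p: Profile) -> tuple:
    """Mapping B: a single row with the list serialized into one column."""
    return (p.user_id, p.name, json.dumps(p.interests))


profile = Profile(user_id=7, name="Pete", interests=["hiking", "php"])
print(to_child_rows(profile))
print(to_serialized_row(profile))
```

Mapping A keeps the items queryable with plain SQL at the cost of a join, while mapping B is simpler to write but hides the list contents from the database.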
They look like they hit similar illegal character problems to my XML parsing woes, since they’ve got a call to iconv('utf-8', 'utf-8//IGNORE', $str) that they use to sanitize their input in strings.php.
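For anyone not working in PHP, a rough Python analogue of that trick is to decode the bytes as UTF-8 while ignoring anything that isn’t valid. This is a sketch of the same idea, not a port of strings.php, and strip_invalid_utf8 is a name I made up.

```python
def strip_invalid_utf8(raw: bytes) -> str:
    """Decode as UTF-8, silently dropping byte sequences that are not valid."""
    return raw.decode("utf-8", errors="ignore")


# 0xFF and 0xFE can never appear in well-formed UTF-8, so both are dropped,
# while the valid two-byte sequence for "é" survives.
dirty = b"caf\xc3\xa9 \xff\xfe ok"
clean = strip_invalid_utf8(dirty)
print(clean)
```

Dropping bytes silently is convenient before handing text to a strict parser, but it does lose data, so it may be worth logging when the cleaned string differs from the input.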