Why aren’t we using humans as robots?

Photo by Regolare

Yesterday I had lunch with Stan James of Lijit fame, and it was a blast. One of the topics that’s fascinated both of us is breaking down the walls that companies put up around your data. In the 90’s it was undocumented file formats and this decade it’s EULAs on web services like Facebook. The intent is to keep your data locked in to a service, so that you’ll remain a customer, but what’s interesting is that they don’t have any legal way of enforcing exactly that. Instead they forbid processing the data with automated scripts and giving out your account information to third-party services. It’s pretty simple to detect when somebody’s using a robot to walk your site, and so this is easy to enforce.

The approach I took with Google Hot Keys was to rely on users themselves to visit sites and view pages. I was then able to analyze and extract semantic information on the client side, as a post-processing step using a browser extension. It would be pretty straightforward to do the same thing on Facebook, sucking down your friends’ information every time you visited their profiles. I Am Not A Lawyer, but this sort of approach is impossible to detect from the server side and seems hard to EULA out of existence. You’re inherently running an automated script on the pages you receive just to display them, unless you only read the raw HTTP/HTML responses.

So why isn’t this approach more popular? One thing Stan and I both agreed on is that getting browser plugins distributed is really, really hard. Some days the majority of Google’s site ads seem to be for their very useful toolbar, but based on my experience only a tiny fraction of users have it installed. If Google’s marketing machine can’t persuade people to install client software, it’s obvious you need a very compelling proposition before you can get a lot of uptake.

How to build your own Facebook server

Photo by Coccinelle69

In the last post I talked about the mechanics of how an app communicates with Facebook. With the alpha release of Ringside, there’s now an example of how to implement the server side of Facebook. It’s open source, and the two most interesting parts are the underlying MySQL database and the PHP interface code that implements the API on top of that. Using MySQL makes it hard to scale to massive numbers of users, so it’s not ready to power Facebook yet. On the other hand, having enough users to strain a single database server is a good problem to have. At that point you should have the resources to reimplement something more advanced under the hood.

Having a reference host for any plugin architecture is immensely helpful, especially one that’s open source. For example, if I was having trouble with the details of fetching events, I could open up ringside/api/includes/ringside/api/facebook/EventsGet.php and inspect exactly what their implementation is. There’s no guarantee that it’s the same as Facebook’s code, but it’s at least an unambiguous and exact specification of what somebody else thinks it should be doing. To get your own copy of the source using SVN, run
svn co https://ringside.svn.sourceforge.net/svnroot/ringside ringside

The other exciting part of Ringside’s release is their MySQL schema. It could become a de facto standard for expressing the data that underlies all social networks. Anybody who’s able to take their own data source and translate it into the same tables can plug that into Ringside’s system. Turn the key, and you’ve got your own private Facebook. The schema is at ringside/api/config/ringside-schema.sql

If you want to customize it, the API source is full of great examples of how to work with the database to extend its capabilities, though the LGPL licence might require your changes to also be published.

What’s going on under the hood of Facebook’s API?


Photo by fallsroad

Facebook’s API comes wrapped in libraries for all the popular server languages, but there will come a day when you need to debug the raw HTTP transactions that they all boil down to. As a scripting language, the PHP implementation is easy to understand, and I ended up tweaking mine to output the exact text that’s flowing between me and Facebook. This was partly to help debugging, but also for my own curiosity. I’d like to model some of my interfaces on Facebook’s since it’s simple, robust and flexible.

You call a method by sending an HTTP request to "http://api.facebook.com/restserver.php". Arguments to the method are passed in the POST string sent as part of the request. Here’s an example for an event API call, split up on ampersands so that it won’t go off the edge of the blog, and with any secret values replaced with X:


This is generated by taking the normal PHP arguments to each method, along with stored login and API keys, and serializing them into this string. If CURL is present on the server, this is then used to send the request, otherwise PHP’s native HTTP access functions are used.
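A rough sketch of that serialization step, written in Python rather than the library’s PHP. The argument names (method, api_key, v, call_id, session_key, sig) and the MD5 signing scheme are assumptions about how the REST interface of the time worked, not something copied from the library, and the key values are placeholders.

```python
import hashlib
import time
from urllib.parse import urlencode

REST_URL = "http://api.facebook.com/restserver.php"  # endpoint quoted above


def build_post_body(method, params, api_key, secret, session_key=None):
    """Serialize one API call into a POST string.

    Assumed signing scheme: sort the arguments, join them as key=value with
    no separator, append the app secret, and send the MD5 of that as 'sig'.
    """
    args = dict(params)
    args["method"] = method
    args["api_key"] = api_key
    args["v"] = "1.0"
    args["call_id"] = str(time.time())  # assumed to need to grow with each call
    if session_key is not None:
        args["session_key"] = session_key

    raw = "".join(f"{key}={args[key]}" for key in sorted(args))
    args["sig"] = hashlib.md5((raw + secret).encode("utf-8")).hexdigest()
    return urlencode(args)


if __name__ == "__main__":
    body = build_post_body(
        "facebook.events.get",
        {"eids": "5172087276"},
        api_key="X" * 32,      # placeholder, not a real key
        secret="X" * 32,       # placeholder, not a real secret
        session_key="X" * 16,  # placeholder
    )
    # One argument per line, split on ampersands.
    print(body.replace("&", "\n&"))
```

The resulting string would be sent as the body of a POST to REST_URL by whatever HTTP client is available.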

Assuming that the call name (specified in "method") and the other arguments check out, then the Facebook server will return a string as its response. This string is in XML, and looks something like this:

<?xml version="1.0" encoding="UTF-8"?>
<events_get_response xmlns="http://api.facebook.com/1.0/"
xsi:schemaLocation="http://api.facebook.com/1.0/ http://api.facebook.com/1.0/facebook.xsd"
    <name>Blog World Expo example</name>

… <snip> …

The library then takes this simple XML string, and parses it into a PHP hierarchical array of values that looks like this:

    [0] => Array
            [eid] => 5172087276
            [name] => Blog World Expo example
            [tagline] => http://www.blogworldexpo.com/
            [nid] => 0
            [pic] => http://profile.ak.facebook.com/object2/5/55/s5172087276_7478.jpg
            [pic_big] => http://profile.ak.facebook.com/object2/5/55/n5172087276_7478.jpg
            [pic_small] => http://profile.ak.facebook.com/object2/5/55/t5172087276_7478.jpg
            [host] => BlogWorld

… <snip> …

This always matches the structure of the XML. Facebook use a restricted subset of XML that avoids tag attributes and anything else that might make it hard to map to this JSON-style format.
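Because the XML avoids attributes, the mapping can be done with a short recursive walk. Here is a minimal sketch in Python (not the library’s PHP) of that idea. The event wrapper tag, the second event and the rule "repeated child names mean a list" are assumptions made for illustration; the real library may decide lists differently.

```python
import xml.etree.ElementTree as ET

# Sample document. The <event> wrapper and the second event are made up.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<events_get_response xmlns="http://api.facebook.com/1.0/">
  <event>
    <eid>5172087276</eid>
    <name>Blog World Expo example</name>
    <host>BlogWorld</host>
  </event>
  <event>
    <eid>1</eid>
    <name>Second made-up event</name>
    <host>Nobody</host>
  </event>
</events_get_response>"""


def strip_namespace(tag):
    """Turn '{http://api.facebook.com/1.0/}name' into 'name'."""
    return tag.rsplit("}", 1)[-1]


def to_nested(element):
    """Map an element to a string (leaf), a list, or a dict of its children."""
    children = list(element)
    if not children:
        return (element.text or "").strip()
    names = [strip_namespace(child.tag) for child in children]
    if len(set(names)) < len(names):
        # Repeated tag names: treat the children as a list (assumed rule).
        return [to_nested(child) for child in children]
    return {name: to_nested(child) for name, child in zip(names, children)}


if __name__ == "__main__":
    events = to_nested(ET.fromstring(SAMPLE.encode("utf-8")))
    for event in events:
        print(event["eid"], event["name"])
```

One weakness of this heuristic is that a response containing a single event would come back as a dict rather than a one-element list, so a real implementation needs some other way to tell the two apart.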

Another possibility is that an error will be returned. In that case, the XML will normally just be a couple of tags, the error message string and the numeric error code. This gets converted to a PHP exception.
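A sketch of that error path in Python, assuming the error document has a root tag named error_response with error_code and error_msg children. Those tag names, the sample code 102 and its message are assumptions for illustration, and FacebookApiError is a hypothetical class name.

```python
import xml.etree.ElementTree as ET

# Illustrative error document; tag names and values are assumptions.
SAMPLE_ERROR = """<error_response xmlns="http://api.facebook.com/1.0/">
  <error_code>102</error_code>
  <error_msg>Session key invalid or no longer valid</error_msg>
</error_response>"""


class FacebookApiError(Exception):
    """Hypothetical exception carrying the numeric code and the message."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message


def local_name(tag):
    """Drop the '{namespace}' prefix ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def raise_for_error(xml_text):
    """Return the parsed root, or raise if the response is an error."""
    root = ET.fromstring(xml_text)
    if local_name(root.tag) != "error_response":
        return root
    fields = {local_name(child.tag): (child.text or "").strip() for child in root}
    raise FacebookApiError(int(fields.get("error_code") or 0),
                           fields.get("error_msg", ""))


if __name__ == "__main__":
    try:
        raise_for_error(SAMPLE_ERROR)
    except FacebookApiError as err:
        print("API call failed:", err)
```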

To dig into this code yourself, I recommend looking through facebookapi_php5_restlib.php in the client folder of the Facebook SDK. That’s a good place to add your own debugging code too, though there’s already some that can be enabled by setting the $GLOBALS['facebook_config']['debug'] variable to true.

A Facebook Ajax Example

Photo of the original Ajax by Oboulko

One of the toughest parts of the Facebook API is their Ajax support. There’s a good page on their wiki with a small piece of sample code, but since Event Connector uses Ajax heavily, I thought it would be a good real-world example. Here’s the PHP source code.

I’ve removed the application settings from config.php, so you’ll need to create your own application in Facebook and follow the same steps you do for the Footprints sample before you can use it. There are some inline comments explaining the control flow and covering some of the Ajax quirks. One thing to be aware of is the 10 second time-out in all Facebook page requests. If you’re doing any heavy work on the server, or it could get overloaded, you’ll need a strategy to prevent your users seeing an error screen, which is exactly why I went with Ajax for this situation.
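The general shape of that strategy can be sketched outside Facebook entirely. This is a Python illustration, not the Event Connector code: the first request kicks off the slow work and returns a placeholder straight away, and a second, Ajax-style request picks up the result once it is ready. All function names here are hypothetical, and the in-memory job table is a shortcut a real app would replace with something persistent.

```python
import time
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=4)
jobs = {}  # job id -> Future; in-memory only, for illustration


def slow_work(event_id):
    """Stand-in for heavy server-side processing."""
    time.sleep(0.2)
    return f"guest list for event {event_id}"


def render_page(event_id):
    """Initial page request: start the work and return immediately."""
    job_id = f"job-{event_id}"
    if job_id not in jobs:
        jobs[job_id] = executor.submit(slow_work, event_id)
    return {"html": "Loading...", "poll": job_id}


def poll(job_id):
    """Ajax callback: report the result if it is ready, without blocking."""
    future = jobs[job_id]
    if not future.done():
        return {"ready": False}
    return {"ready": True, "html": future.result()}


if __name__ == "__main__":
    page = render_page(5172087276)
    print(page["html"])
    while not poll(page["poll"])["ready"]:
        time.sleep(0.05)  # the browser would retry on a timer instead
    print(poll(page["poll"])["html"])
```

Neither request blocks on the slow work, so each one stays well inside the time-out no matter how long the processing takes.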


Slinky companies and public transport

Yesterday, Brad posted an article talking about bubble times in Boulder, and quoted a great line from Bill Perry about how they spawned ‘slinky companies’ that "aren’t very useful but they are fun to watch as they tumble down the stairs".

Rick Segal had a post about why he took the train to work, and how people-watching there was a great reality check to a lot of the grand technology ideas he was presented with.

And via Execupundit, I came across a column discussing whether people were really dissatisfied with their jobs, or just liked to gripe and fantasize. One employee who’d been involved in two start-ups that didn’t take off said "Most dreams aren’t market researched."

These all seemed to speak to the tough balance between keeping your feet on the ground and your eyes on the stars. As Tom Evslin’s tagline goes, "Nothing great has ever been accomplished without irrational exuberance." I’ve been wrestling with how to avoid creating a slinky with technology that sounds neat enough to be funded, but will never amount to anything. To do that, I’ve focused on solving a painful problem, and on validating both that the problem is widespread and that people like my solution.

I’ve turned my ideas into concrete services, and got them into the wild as quickly as possible. Google Hot Keys has proved that it’s possible to robustly extract data from screen-scraping within both Firefox and IE, but its slow take-up suggests there isn’t a massive demand for a swankier search interface. Defrag Connector shows that being able to connect with friends before a conference is really popular, but the lack of interest so far in Event Connector from conference promoters I’ve contacted shows me it won’t just sell itself. Funhouse Photo’s lack of viral growth tells me that I need to provide a compelling reason for people to contact their friends about the app, and not just rely on offering them tools to do so.

I really believe in all of these projects, but I want to know how to take them forward by testing them against the real world. All my career, I’ve avoided grand projects that take years before they show results. I’ve been lucky enough that all of the dozen or so major applications I’ve worked on have shipped; none were cancelled. Part of that is down to my choice of working on services that have tangible benefits to users, and can be prototyped and iteratively tested against that user need from an early stage. Whether it’s formal market research, watching people on trains, or just releasing an early version and seeing what happens, you have to test against reality.

I’m happy to take the risk of failing; there are a lot of factors I can’t control. What I can control is the risk of creating something useless!

Funhouse Photo User Count: 1,746 total, 70 active. Much the same as before, since I haven’t made any changes yet.

Event Connector User Count: 73 total, 9 active. Still no conference take-up. I did experiment with a post to PodCamp Boston’s forum to see if I could reach guests directly, but I think the only way to get good distribution is through the organizers.

Facebook and event promotion

As I’ve been approaching conference organizers to try Event Connector, I’ve been surprised at how few have Facebook events. It seems like a no-brainer to me if your audience includes anyone under thirty, since it only takes a couple of minutes to create an event. In return, you get a great platform for potential guests to discover your conference, and for attendees to hear from you and each other before and after the event. You’re being given permission to market to them, and even better, the participants themselves will spread the word as their attendance shows up on their friends’ feeds, and they get involved in the discussions on the event page itself.

Most of the events I have run across have been unofficial, started by participants rather than organizers. Without publicity from the promoters, these tend to attract only a few guests. To be effective you need to include a link to the event in some material that goes out to a decent number of your guests.

I don’t think it’s that conference organizers don’t want the benefits that Facebook events offer, since I see a lot of organizations trying to hand-roll similar services. PodCamp Boston has a page listing all of the attendees who wanted their names to be public, but as a plain text alphabetical list, it’s a lot harder to discover friends than the equivalent on Facebook. Facebook events are popular with guests: the New Media Expo 2008 one picked up over a hundred guests in the first few hours after it was created, and this is for an event almost a year away!

Trying to put myself in their shoes, I’d guess that the main obstacles are the fact that no one else is doing it, it’s an unknown quantity, it feels a bit out of their control, and they’ve never needed it before. It does require a willingness to try something new, but the reward for doing so before it’s mainstream is that you’ll get a lot of buzz, publicity and guest goodwill for taking that leap!

If you’re an event promoter, I’d highly recommend you set up a Facebook event, and give it a little promotion. It’s quick, free, and offers both you and your guests significant benefits.

Even better, once you’ve got one set up, you get an Event Connector for free. Go to the main page of the app, and your event will show up at the top. There’s a link you can mail out, and free Blogger, TypePad and Facebook profile badges you can distribute. It adds value to the plain Facebook events by allowing users to see which of their friends, and friends-of-friends, are going, which supplies the social proof that will persuade them to sign up.

Funhouse Photo User Count: 1,729 total, 63 active. The same steady growth, and looking at the breakdown, I see the same pattern of non-viral acquisition of users, mostly through the directory and searches.

Event Connector User Count: 72 total, 8 active. Still very quiet, with no conference signed up, and a trickle of users from the directory.

Facebook’s new application statistics


I’m a statistics junkie. Picking some significant metrics, and sticking with them to measure performance is the only way to figure out what’s working and what isn’t. I usually try to design in some measurement tools, but that’s hard with Facebook apps, since their setup hides the referring address/previous page and other useful information.

Luckily, they recently introduced a new statistics page for every app, which you can access from the More Stats link below the application name, from the main developer page.

The first big innovation is the ability to see how many people added and removed you during the previous 24 hours. Before, you could only guess at this by comparing the user totals from day to day, but this wouldn’t tell you how much turnover you had from people removing your app. The most useful part of this, and one that’s a bit hidden, is that the total number of adds is actually a link. If you click on it, you’ll see a bar graph like the one above, showing you exactly where your new users came from.
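A small worked example of why the day-to-day totals weren’t enough. The totals are the Funhouse Photo counts quoted in these posts, read oldest first and assuming they are roughly a day apart; the add and remove figures in the second half are invented purely to show the ambiguity.

```python
# Funhouse Photo totals from three consecutive posts, oldest first.
totals = [1723, 1729, 1746]

# Comparing totals only gives the net change between snapshots.
net_changes = [later - earlier for earlier, later in zip(totals, totals[1:])]
print(net_changes)  # [6, 17]


def net_change(adds, removes):
    """Net growth is all that a pair of totals can reveal."""
    return adds - removes


# Invented figures: a quiet day and a high-churn day look identical
# if you only see the totals.
quiet_day = net_change(adds=6, removes=0)
churny_day = net_change(adds=50, removes=44)
print(quiet_day == churny_day)  # True
```

Separate add and remove counts are what distinguish those two days.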

The top picture is for Funhouse Photo, and it tells me a lot. I was suspicious that my app was very non-viral because the growth in users was very linear, and this confirms that I’m getting the majority from the directory and direct searches, rather than feed stories or other friend-to-friend communications. To improve growth, that needs to change, and I’ll be able to tell very quickly if alterations to the app help by looking at those stats.

Less exciting, but still very useful, are the response metrics. I’ve had a recurrent problem with time-outs on my Facebook apps because they’re doing heavy processing on the server. It seems like any page request that takes more than 8 seconds to complete results in a Facebook error screen for the user, so to work around that I had to implement asynchronous Ajax loading of page elements that might take a while. Looking at the response statistics shows that neither of my apps that use this is returning error pages for any users, something I couldn’t verify before.

The final interesting feature is a selection of the URLs that were requested from the app recently. This sampling is a great way to figure out how people are using your app, which features they’re accessing and how often. My apps generally encode a lot of information in the URL using GET rather than POST, so I’m able to get quite a fine-grained look at my users’ interactions with them.
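If the state lives in the query string, a sample of requested URLs can be tallied with a few lines of script. This is a sketch in Python; the URLs, the page names and the eid and view parameters are all made up for illustration and are not taken from my apps.

```python
from collections import Counter
from urllib.parse import parse_qs, urlsplit

# Hypothetical sample of requested URLs.
SAMPLE_URLS = [
    "http://example.com/app/event.php?eid=5172087276&view=friends",
    "http://example.com/app/event.php?eid=5172087276&view=all",
    "http://example.com/app/index.php",
]


def summarize(urls):
    """Count requests per page and per (parameter, value) pair."""
    pages = Counter()
    params = Counter()
    for url in urls:
        parts = urlsplit(url)
        pages[parts.path] += 1
        for name, values in parse_qs(parts.query).items():
            for value in values:
                params[(name, value)] += 1
    return pages, params


if __name__ == "__main__":
    pages, params = summarize(SAMPLE_URLS)
    for path, count in pages.most_common():
        print(count, path)
    for (name, value), count in params.most_common():
        print(count, f"{name}={value}")
```

The same tally would be impossible with POST, since the arguments never appear in the URL.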

Funhouse Photo User Count: 1,723 total, 95 active. As I mentioned above, I’ve got new insight into the growth pattern from the add statistics. It explains why growth is so linear: there’s little friend-to-friend spreading of the app.

Event Connector User Count: 71 total, 11 active. The add source statistics show that most of the trickle of new users came from the product directory, which is what I’d expect since I don’t have a conference signed up yet.