How to persuade users to sign up with Facebook Connect


Photo by Poppy Thomas-Hill

A friend told me that she was encountering a lot of people who liked the idea of her new service, but were put off by the idea of using Facebook to connect. She asked me how we tackled this issue for Jetpac, so I thought I'd put together a quick summary of what we've found to help her, and anyone else who's struggling.

In my experience there are three kinds of potential users, who all require different approaches:

Facebook Negatives

Especially in the tech world, there's a small but real group of people who either don't have Facebook accounts at all, or who barely use them. I'm not a heavy Facebook user myself and I understand a lot of their reasons, so I just try to make it clear that we plan to support other services like Flickr and Instagram in the future, and leave them in peace.

App Enthusiasts

At the other end of the spectrum, there's a good number of people who don't have any concerns about adding new applications. This population is slowly shrinking, thanks to the feedback from friends who are annoyed when they get spammed, but they're still out there. That means if you can't persuade anyone to sign up for your service, then you're doing really badly and should triple-check your messaging.

Persuadable Skeptics

The biggest group are those who may be willing to connect, but want some reassurance before they do. Here are the things we've found help convince them:

Clear message – The name, tagline and copy on the website and in any ads need to make it very clear why Facebook is needed. People are very wary of signing up if they don't understand why your site needs access to their social network. A friend even suggested putting in an extra dialog before the external permissions page, spelling out exactly what the benefits of connecting are, and why it's necessary for your service, which we hope to try out soon.

Minimal permissions – Cut down the number of different permissions you're asking for to the absolute minimum. As an example, we originally asked for feed-posting permissions just so we could easily support an in-app way for users to comment on their friends' photos, but experience with spammy applications that abuse that power made many of our early users refuse. We reworked the feature to use a Facebook widget for commenting instead, so we didn't have to ask for posting rights, and our conversion rate went way up.

High production values – It sounds superficial, but a professional design for your site is essential. People are looking for any clues about your trustworthiness, and the fact that you've put a lot of effort into the look of your site reassures them that you aren't a scam. Sometimes judging a book by its cover is a useful heuristic.

Personal touch – High production values don't mean adopting a distant, corporate voice; that's guaranteed to put people off. Most likely you're a small team of enthusiasts like us, so use that as a strength and put yourselves front and center. Seeing that the team is proud of what they've built and willing to stand behind it makes a world of difference to wavering users. It's also a helpful culture-building tool internally: if the team knows they're putting their own reputations on the line, they'll be extra careful about protecting user information.

The data geekery behind Jetpac

My new startup has just gone public, and I wanted to talk a bit about the data geekery behind the consumer experience, and some of the technical and privacy challenges, so I threw together a quick video cast. You can get more information on the project over on the company blog, and by following us on Twitter. I'm pretty excited to finally be able to talk about what I've been working on for the last six months!

Don't forget to check out my new visualization too, an interactive word map of 50 million photo captions:


Five short links

Photo by Michael Donovan

Place graphs are the new social graphs – Fascinating work by Matt Biddulph, looking for geographic analogies (for example tell me the neighborhood in New York that's most similar to Noe Valley in San Francisco).

Yet another government portal to ignore – Though they're a massive step forward conceptually, most government open data efforts are crippled by terrible usability. I still find myself digging through the FTP server for the US Census, after failing to navigate their web interfaces.

Angels of the Right – There have been a lot of attempts to produce graphs showing networks of influence, but this is by far the most approachable and informative I've seen. It's actually useful for discovering things! Even better, Skye's helped package the code behind it into an open-source framework called nodeviz.

Data Illumination – An intriguing new data blog that's just started. There's not much content yet, but I like what's there so far. I'm guessing that the more readers and commenters it attracts, the more likely it is to keep growing.

Shuttle Radar Topography Mission – Free elevation data for the entire world, with samples as close as 30m in the US, and 90m for the rest of the globe.


Communists in Space, and now on the Kindle

Picture by Joseph Morris

When I was eight years old, I found a book in my brother's room about nuclear war. In it was a map showing the likely British targets of a Soviet nuclear strike as circles. I grew up in East Anglia, surrounded by American air bases, so everywhere for miles around was such a solid mass that you couldn't even see the individual dots. This so terrified me that for years I made excuses to avoid going into the nearby city of Cambridge; I had such a vivid picture in my head of roasting alive as the air caught fire.

A two-week school visit to Russia just before the fall of the USSR gave a glimpse of the grim and tawdry reality of the Soviet system (brown fruit juice, anyone?) but the idea of communists as terrifying bogeymen has never really left me. I've had a strange fascination, an impulse to understand how people ended up in such a twisted state, that's led me to read up on the early Soviet era, especially Stalin's particularly demonic rule. As I've got older I've also tried to understand what drove well-intentioned people to support terrible actions, and the humanistic resistance of others like George Orwell.

That all left me a prime audience for Ken Macleod's Fall Revolution series. I first came across Star Fraction by accident, but was immediately captured by a very British near future, inhabited by people I recognized. Trotskyite militants battle the Animal Liberation Front, a quasi-Richard-Dawkins summons familiars to attack enemies from his Seastead, and a combined UN/US 'peacekeeping force' has suffered the ultimate mission creep and runs the world from its space weapons platforms. Running through the book is a Communist conspiracy theory that blows the tired Templar myths out of the water because it's based on historical templates that actually happened. Communists truly ran effective underground organizations for decades and overthrew governments, so for someone with Macleod's knowledge of the movements (here's his take on Orwell in context) there's rich material to choose from.

In case this sounds too stuffy, it's fundamentally an adventure story with pleasant echoes of Neuromancer; it's not heavy reading. The only thing that has surprised me is how little attention it ever received; people seem far more focused on later books like his Cosmonaut Keep series. Star Fraction was one of those novels that stuck in my head, and since my paper copy is still in storage in the UK, I've been hoping for an ebook version so I could justify buying it again. When I saw Ken announce that one of his more recent books had just been released electronically, I went back to search for a copy of Star Fraction and finally found one for the Kindle, bundled with The Stone Canal as Fractions: The First Half of the Fall Revolution. I'm now a few chapters in and it's every bit as good as I remember, popping with wild ideas and a refreshingly different angle on the world.

Since I didn't see the news appear on Ken's blog, and he didn't know about it when I hassled him on Twitter a few months ago, consider this a public service announcement: Star Fraction is available as an ebook! If you find the idea of Communist Conspiracies in Space at all intriguing, buy it now, you won't be sorry.

How to enter a data contest – machine learning for newbies like me

Photo by John Carleton

I've not had much experience with machine learning; most of my work has been a struggle just to get data sets that are large enough to be interesting! That's a big reason why I turned to the Kaggle community when I needed a good prediction algorithm for my current project. I wasn't completely off the hook though; I still needed to create an example of our current approach, limited as it is, to serve as a benchmark for the teams. While I was at it, it seemed worthwhile to open up the code too, so I've created a new Github project:

It actually produces very poor results, but does demonstrate the basics of how to pull in the data and apply one of scikit-learn's great collection of algorithms. If you get the itch there's lots of room for improvement, and the contest has another two weeks to run!

Installing scikit-learn

Before you can run the Python scripts, you'll need to install the scikit-learn machine-learning framework. Here are the instructions.

It's also worth checking out the tutorial and their other guides; they've written some great documentation.
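If you have pip available, installation is usually a one-liner. The exact package names can vary with your Python setup, so treat this as a guideline rather than gospel:

```shell
# scikit-learn builds on numpy and scipy, so install them alongside it
pip install numpy scipy scikit-learn
```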

Getting the code

To pull the latest copy of this code and enter the directory run these commands:

git clone git://

cd MLloWorld/

Creating a model

Before you can predict unknown values, you need to train up the algorithm with example data. I've packaged a set of 40,000 items as a CSV file, with each column representing an attribute of the original photo albums. You'll need to run these through the training script to build a model that can be used for prediction. Here's the command:

python training_data.csv storedmodel

That may take ten or twenty minutes to run, but at the end you should have a file called storedmodel in the current directory.
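If you're curious what the training step boils down to, here's a minimal sketch of that kind of script: load the CSV, fit a classifier, and pickle the model to disk. The column layout, the choice of classifier, and the function name are all my illustrative assumptions, not the contest's actual code:

```python
import csv
import pickle

from sklearn.linear_model import LogisticRegression


def train(csv_path, model_path):
    """Fit a classifier on a CSV and pickle it to model_path.

    Assumes (for illustration only) that the first column is an id,
    the last column is the label, and everything between is numeric.
    """
    features, labels = [], []
    with open(csv_path) as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row
        for row in reader:
            features.append([float(value) for value in row[1:-1]])
            labels.append(int(row[-1]))

    # Any scikit-learn classifier would slot in here; logistic
    # regression is just a simple stand-in for this sketch.
    classifier = LogisticRegression()
    classifier.fit(features, labels)

    # Serialize the trained model so a separate prediction script
    # can load it later without retraining.
    with open(model_path, "wb") as f:
        pickle.dump(classifier, f)
```

The nice thing about scikit-learn's interface is that swapping in a different algorithm is usually a one-line change, since they all share the same fit/predict methods.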

Predicting results

Now that you have a model built, you can take the test set of data and predict their values:

python test_data.csv storedmodel > results.csv

This will also take a few minutes, but at the end you'll have a CSV file containing a list of the album ids and a prediction for each one. It's in the right format to submit to Kaggle, and if you look for the 'Full scikit-learn example' in the benchmarks at the bottom of the leaderboard, you'll see how this simple approach scored:

As you can see, it's not that great! If you modify the code and think you've improved its predictions, you can create a team and submit your new results to find out how well you've done. There's already stiff competition from the current teams of course!
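For completeness, the prediction side is the mirror image of training: unpickle the stored model, run it over each test row, and write out id/prediction pairs. Again, the column layout and function name here are assumptions for the sake of a sketch, not the real contest script:

```python
import csv
import pickle


def predict_to_csv(test_csv, model_path, out_csv):
    """Load a pickled model and write 'id,prediction' rows to out_csv.

    Assumes (for illustration) the first test column is the album id
    and the remaining columns are numeric features.
    """
    with open(model_path, "rb") as f:
        classifier = pickle.load(f)

    with open(test_csv) as f_in, open(out_csv, "w", newline="") as f_out:
        reader = csv.reader(f_in)
        writer = csv.writer(f_out)
        next(reader)  # skip the header row
        for row in reader:
            album_id = row[0]
            features = [float(value) for value in row[1:]]
            # predict() takes a 2D array, so wrap the single row
            prediction = classifier.predict([features])[0]
            writer.writerow([album_id, prediction])
```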

Notes on the internal data format

The trickiest part for me was getting the data into a format that scikit-learn's functions could understand. Because the CSV stores which words occurred for each album, the full row vector for any one of them could be thousands of entries long, most of them zero. To speed up the training and save on memory, I used scipy's sparse matrix class, coo_matrix, to store the results. You can see the sort of unpacking I do in the expand_to_vectors() function in the code.
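To make the sparse-matrix trick concrete, here's a small sketch of building a coo_matrix from per-album word counts. The input format (a dict mapping word index to count for each album) and the function name are made-up illustrations, but the memory-saving idea is the same: only the non-zero entries get stored.

```python
from scipy.sparse import coo_matrix


def expand_to_matrix(albums, vocabulary_size):
    """Build a sparse matrix with one row per album.

    Each album is a dict mapping word index -> occurrence count
    (an illustrative format, not the contest's actual CSV layout).
    """
    rows, cols, values = [], [], []
    for row_index, album in enumerate(albums):
        for word_index, count in album.items():
            rows.append(row_index)
            cols.append(word_index)
            values.append(count)

    # coo_matrix stores only the non-zero entries, so a row that is
    # thousands of columns wide but mostly empty costs almost nothing
    return coo_matrix((values, (rows, cols)),
                      shape=(len(albums), vocabulary_size))
```

The resulting matrix can be handed straight to most scikit-learn estimators, or converted with tocsr() for fast row slicing.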

[Update – Big thanks to Olivier Grisel who vastly improved the results by fixing some errors in the CSV reader and picking a more accurate and much faster classifier. I've integrated his changes, and now see a score of 0.44, which still puts it at the bottom of the leaderboard but is at least respectable!]