Invites Done Right

Invitation
Photo by Zaknitwij

Tonight I launched InvitesDoneRight. Here's why.

Despite being pulled away from my original focus on email, I'm still obsessed by how much valuable information is sitting neglected in our inboxes. Now that both Yahoo and Gmail support OAuth, I decided to release an application that's been on my mind for years.

If you're running a consumer web service, one of your most important distribution channels is your users sharing with their friends. Unfortunately there's never been an easy way to encourage this. Facebook might seem promising, but for good reason the service has made it hard for applications to broadcast indiscriminately to their users' social networks. Many services use contact importing, but address books are both notoriously incomplete and full of people you met once at a trade show. Without extra information, you're stuck presenting the user with a space-shuttle control panel full of checkboxes, and asking them to wade through and figure out who to send invites to. If you get it wrong, not only are your invites ineffective, they'll also be marked as spam by the recipients, making it very hard to reach even your existing users!

What's the answer? I think it's getting users' permission to scan message headers and pull out a shortlist of people they actually exchange emails with. The user gets a nice experience, with only a few people to pick from. The web service gets a better-targeted set of recipients, which means higher conversions and fewer spam reports.
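If you're curious what that scanning looks like in code, here's a rough sketch of the tallying step using PHP's imap extension. It's illustrative only: it uses plain credentials for simplicity where the real service uses OAuth, and the server address and 'Sent' folder name are placeholders that vary by provider.

```php
<?php
// Sketch: rank the addresses a user actually emails, using only headers.
// Plain credentials are used here for simplicity; the production flow is
// OAuth. Server address and folder name are provider-specific placeholders.
$user = 'user@example.com';
$password = 'app-password';
$stream = imap_open('{imap.example.com:993/imap/ssl}Sent', $user, $password);
if ($stream === false) {
    die('Could not connect: ' . imap_last_error());
}

// Only look at the most recent messages, and only their headers.
// Message bodies are never fetched.
$messageCount = imap_num_msg($stream);
$start = max(1, $messageCount - 500);

$tally = array();
for ($i = $start; $i <= $messageCount; $i++) {
    $header = imap_headerinfo($stream, $i);
    if (empty($header->to)) {
        continue;
    }
    foreach ($header->to as $recipient) {
        $address = strtolower($recipient->mailbox . '@' . $recipient->host);
        $tally[$address] = isset($tally[$address]) ? $tally[$address] + 1 : 1;
    }
}
imap_close($stream);

// The ten addresses the user mails most often become the shortlist.
arsort($tally);
$shortlist = array_slice(array_keys($tally), 0, 10);
```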

To implement this approach I've just launched the InvitesDoneRight service. If you have a website that signs up new users, just add an extra step that directs them to the service. I'll ask them for permission to figure out a shortlist of contacts, and then I'll call back a URL you provide with ten contacts that they've been in touch with recently. It couldn't be much simpler to integrate.
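To picture what integration involves, here's a minimal sketch of the callback endpoint your site would expose. The parameter names, the JSON payload shape, and the queue_invite_suggestion() helper are all hypothetical placeholders rather than the service's actual format.

```php
<?php
// Hypothetical callback handler: InvitesDoneRight posts the shortlist here.
// Parameter names and payload shape are invented for this sketch.
$userId = isset($_GET['user_id']) ? $_GET['user_id'] : null;
$contacts = json_decode(isset($_POST['contacts']) ? $_POST['contacts'] : '[]', true);

if ($userId === null || !is_array($contacts)) {
    http_response_code(400);
    exit('Missing or malformed callback parameters');
}

foreach ($contacts as $contact) {
    // Each entry might carry a display name and an email address.
    $name = isset($contact['name']) ? $contact['name'] : '';
    $email = isset($contact['email']) ? $contact['email'] : '';
    if ($email !== '') {
        // Your own storage or queueing code goes here (hypothetical helper).
        queue_invite_suggestion($userId, $name, $email);
    }
}

http_response_code(200);
echo 'OK';
```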

What about privacy? I'm still thinking hard about how this service could be abused, but I'm rigorous in removing all user data the instant they leave my site. It acts purely as a middle-man between the external service and the mail provider, and only passes the short contact list back to that external service. No other information is passed, stored, or shared anywhere. No email content at all is fetched, just who the recipients are.

I think this could be a powerful tool for websites, as well as improving users' experiences, but this launch is an experiment. Is it too creepy? Are there problems I'm missing that will render it ineffective? Tell me what you think in the comments, or email me.

Why user permissions don’t work

Termsandconditions
Photo by Andrew Currie

Tom Scott is shocked [Bad way of putting it, sorry, see the comments] that granting OAuth access to an application lets it read users' direct messages. That's obvious to me as a developer: everything else a non-private user does on the service is freely available through the search API and streams, so there's no need to ask for permission. The important point is that Tom's confusion shows how little even sophisticated users understand Twitter's comparatively simple security model. What made my heart sink was his suggestion that Twitter go down the Facebook path and offer more fine-grained permissions.

If users don't have the time to understand the current model, does he really think they'll spend time tweaking a set of checkboxes? When Internet Explorer throws up dialogs complaining about a mix of secure and insecure content on a page, does any user know what that means? The Facebook/Microsoft approach is a bunch of legalistic ass-covering by people keen to avoid blame, not a good way of getting informed consent from users.

Just look at this exhaustive study of the effect of contract disclosures on internet users' habits. Hardly anybody reads them, and the few who do don't change their behavior at all! The prevailing model of user security permissions hits exactly the same problem. We've trained our users to click through screens of gobbledegook without reading or caring.

So what's the answer? No checkboxes. No space-shuttle control panel of permissions. The only model that people understand is completely binary: public or private, open or closed. Look back to the phone book: everyone understood going ex-directory, and that's all you needed to know. In Twitter's case, this means redesigning the OAuth process so that it's just a 'give me access to your DMs' dialog, since for almost all users that's what it really means. It's not a technology problem, it's a user-experience one.

Messy beats tidy

Messypainting
Painting by Mark Chadwick

I was pleased to see James Clark admitting JSON has overtaken XML, at least for the web world, and William Vambenepe pointing out that RPC over HTTP is giving True REST a good kicking. Both technologies imposed a lot of up-front demands on developers and promised a lot of automated benefits once the world caught up. Neither delivered.

They failed for web developers because they require planning and design to use effectively. Most of us barely have an idea of who our customers are when we start a project, let alone a strong set of requirements. The strength of our world is that our systems are malleable enough that we can make it up as we go along. I spent most of my career in embedded systems and on the desktop, and it was an absolute delight to discover how simple it is to produce workable non-trivial applications on the web.

XML asks you to figure out a schema before you start. Proper REST APIs require providers to plan a set of URL locations and verbs to apply to those resources, and clients to figure out their conventions. Enthusiasts counter that you should be doing this sort of planning anyway to build a robust system, but that's a preference, not a law of nature. Doing a shoddy job quickly and learning from it often beats long development cycles.

Technologies like JSON, or APIs using HTTP for transport but with a domain-specific payload, fit into this chaotic development model. There's a similar dynamic with Hadoop: one of its biggest advantages is that you never have to pick a database schema; you just run jobs on the raw source files like logs or other data dumps.
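To see how low the barrier is, here's a toy PHP fragment consuming a JSON-over-HTTP endpoint. The URL and field names are invented for the example, and a production version would want error handling, but the point is that there's no schema or binding step at all.

```php
<?php
// Fetch a JSON payload over plain HTTP and start using it immediately.
// The URL and field names are made up for illustration.
$raw = file_get_contents('http://api.example.com/users/123');
$user = json_decode($raw, true);

// No schema, no WSDL, no code generation: pull out what you need and
// ignore the rest. If the provider adds new fields tomorrow, this still works.
echo $user['name'] . ' lives in ' . $user['location']['city'] . "\n";
```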

The painful thing for any adult supervisor watching this sort of development is that they know how much technical debt is being accumulated. The reason it makes sense is that the debt almost never comes due. We re-write whole systems as the requirements change out from under us, projects fail (almost never for technical reasons), or we switch to a well-designed open source framework someone else has built. It's no free ride; you can see how hard it was for Twitter to retrofit their systems to cope with massive scaling, but the successful startups have been the ones able to move fast. We're in an era where punk rockers beat chamber orchestras.

Five short links

Fivetrees
Photo by Elod Horvath

An academic wanders into Washington D.C. – Arvind went to attend an IETF workshop on privacy, presenting a one-pager we co-authored (though his contribution was far greater than mine). His experiences of dealing with government people ring very true, and I think he's absolutely right to advocate we all get more involved. As hackers we tend to assume we can code our way around government restrictions, but as everything from Napster to Wikileaks shows, if they get riled enough they can shut services down very effectively.

Glu – LinkedIn have open-sourced their application deployment framework. Not much material here for a sexy demo, but this addresses a problem I see a lot of data companies with large clusters struggling with. A battle-hardened system like this should be a big time-saver for the whole community.

The history of selling personal data – We all know somebody's making money from our personal information, so why not cut out the middle-man and sell it direct? A great run-down of some experiments in this area, including a tantalizing tale of a UK local council contemplating selling data and offering a tax cut in return (sadly with a dead link to the story).

Why Rosetta Stone's attack on Google's keyword advertising system should be rejected – This is one of the reasons technologists should be more involved with the world of government. Rosetta is attempting to prevent Google from showing competitors' ads when users search for "Rosetta Stone". I have issues with how much power Google wields in the search space, but this is an obvious attempt by the language software company to limit user choice for their own benefit. Trademarks exist to prevent confusion, and nobody will be tricked into buying a competing product just because their ads appear in this context.

A concise and brilliant peer-reviewed article on writer's block – I've never suffered from writer's block, presumably because my standards are so lax.

“Strata Rejects” lightning talks

Madscientist
Photo by Jennifer Rouse

I've met several people in the last few days who had interesting-sounding proposals that didn't make it through the review process for the Strata conference. After the last conversation, I was struck by the idea of an unofficial get-together the evening before the main conference. So (with absolutely no implied connection to Strata®!) I'm running a 'Rejects' series of lightning talks on Monday. Anyone who had a talk rejected, or didn't get it in before the deadline, gets five minutes to give the PDQ version.

It will be starting at 7pm on Monday January 31st, in an undisclosed location near the hotel in Santa Clara. Munchies, beer and transport will be provided. Email me with your name, and talk title if you'd like to speak, and I'll get back to you with the full details. Bonus points will be given for starting with 'They laughed at my ideas, but I'll show those fools, muhahahahah!'

Map your leads with ForceMapper

I've had a lot of sales professionals using OpenHeatMap to visualize their customers, and they've often asked if I could make it even easier. It doesn't get much simpler than an integrated Salesforce version, so just in time for Dreamforce, here's an early preview of ForceMapper:

https://forcemapper.com/

Just log in with your Salesforce ID and you'll get a dashboard showing your leads and accounts by state, country and city. It's only available for Enterprise-level Salesforce users currently, but once it's been accepted into AppExchange it should be usable by Partner customers too. We're pleased to offer this completely free for the next 30 days, and early users will be rewarded for their help once the premium version is rolled out.

It's not just a visualization tool – load the site up on your iPhone and it will use your current location to suggest nearby customers to visit, complete with directions.

Have questions? Call us free on 1 800 408 6046

Five short links

Fiveluck
Photo by Clara Alim

Lawnchair – “Sorta like a couch except smaller and outside”. Great little project for storing client-side data in a simple, modern way.

Synopsis Data Structures for Massive Data Sets – The more I deal with large data sets, the more I appreciate how useful approaches like Bloom Filters can be. This somewhat dense paper introduced me to a whole menagerie of other ‘synopsis data structures’ that store an incomplete but useful set of properties about a massive data set, within a surprisingly small memory footprint. (There's a minimal Bloom filter sketch after these links if you want to see the basic idea in code.)

Let your gray hair light your way through unfamiliar data – Short but interesting perspective from an investment banker on the attitude you need to effectively analyze data.

Burglaries in Auckland’s Eastern Suburbs – OpenHeatMap hits New Zealand, thanks to Reuben Schwarz

GetAround – I spent yesterday trying to make it to a series of meetings scattered around Silicon Valley using only my new folding bike and trains, but the sparse non-peak Caltrain schedule made it impossible. At the end of the day I bumped into Jessica Scorpio handing out donut holes in Palo Alto, and after my day her startup sounded very appealing. It lets you rent out your car when you’re not using it, like a peer-to-peer equivalent of ZipCar. I would feel a lot better about keeping my car for visits like yesterday’s if it wasn’t sitting idle for the rest of the time.
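As promised above, here's a minimal Bloom filter sketch to show the basic idea behind the simplest of these synopsis structures: a fixed bit array plus a few salted hashes gives you membership tests that never miss a real member, at the cost of occasional false positives, in far less memory than storing the full set. The sizes and hash scheme are arbitrary toy values, not tuned recommendations.

```php
<?php
// A toy Bloom filter: add() sets a handful of bits per item, and
// mightContain() reports 'definitely absent' or 'probably present'.
class BloomFilter
{
    private $bits;
    private $size;
    private $hashCount;

    public function __construct($size = 10000, $hashCount = 4)
    {
        $this->size = $size;
        $this->hashCount = $hashCount;
        $this->bits = array_fill(0, $size, false);
    }

    // Derive several different indexes by salting a single hash function.
    private function indexes($item)
    {
        $result = array();
        for ($i = 0; $i < $this->hashCount; $i++) {
            $result[] = abs(crc32($i . ':' . $item)) % $this->size;
        }
        return $result;
    }

    public function add($item)
    {
        foreach ($this->indexes($item) as $index) {
            $this->bits[$index] = true;
        }
    }

    // False means the item was never added; true means it probably was.
    public function mightContain($item)
    {
        foreach ($this->indexes($item) as $index) {
            if (!$this->bits[$index]) {
                return false;
            }
        }
        return true;
    }
}

$filter = new BloomFilter();
$filter->add('alice@example.com');
var_dump($filter->mightContain('alice@example.com')); // bool(true)
var_dump($filter->mightContain('bob@example.com'));   // almost certainly bool(false)
```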

Data is snake oil

Snakeoil
Photo by Library Company of Philadelphia

There's a whole new world of data emerging, along with cheap and easy tools for processing it. Unfortunately a lot of snake-oil salesmen have spotted this too, and are now eagerly mis-using 'big data' in their pitches. I was reminded of this when I read the recent Wall Street Journal article on health insurance companies looking at social network data. There's been detailed demographic and purchase data available for every household in the US for decades, so why haven't they used that existing data if the approach is as effective as the many hopeful consultants claim?

It's because data is powerful but fickle. A lot of theoretically promising approaches don't work because there are so many barriers between spotting a possible relationship and turning it into something useful and actionable. Russell Jurney's post on Agile Data should give you a flavor of how long and hard the path from raw data to product usually is. Here are some of the hurdles you'll have to jump:

Acquisition. Few data sets are freely available, and even if you can afford the price, the licensing terms are likely to be restrictive. Even if you have that sorted, you're at the mercy of the providers unless you're gathering it yourself. If they see you making money, in the best case they'll ramp up their price, and in the worst case they'll cut you off, either for reputational reasons or so they can offer a similar service themselves. Can you imagine the outcry if insurance companies penalized donors to cancer charities, as the article postulates? Nobody will want to provide data with that sort of reputational risk looming.

Coverage. No matter how good your analysis results are, if you only have source data on 10% of the targets the product will be useless.

Over-determination. Age, income and industry probably do a pretty effective job of predicting your chances of becoming overweight. Is going to the trouble of spotting that somebody's buying exercise equipment really going to improve your prediction enough to justify the expense of testing, implementing and tuning the process?

Poor correlations. The data may just not carry the answers you need. This is more common than you'd think; many relationships that seem like they should be rock-solid don't pan out when you test them against reality.

Noise. A lot of information gets lost in the noise of real-world data sets. I think of this as the Megan Fox problem; so many Facebook users were fans of her that she appeared in almost every region's top 10 list, and I had to run normalizing steps to remove her malign influence on my results (there's a rough sketch of the idea after this list). That of course degraded the overall fidelity of the conclusions.
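To make that normalizing step concrete, here's roughly the shape of the filter, reconstructed as a sketch rather than my original code. The function name and the 0.8 threshold are arbitrary choices for illustration.

```php
<?php
// Drop any entry that appears in almost every region's top-10 list,
// since it says nothing about what makes a region distinctive.
// The 0.8 threshold is an arbitrary example value.
function removeUbiquitousEntries(array $topTenByRegion, $maxShare = 0.8)
{
    // Count how many regional lists each entry appears in.
    $counts = array();
    foreach ($topTenByRegion as $region => $entries) {
        foreach ($entries as $entry) {
            $counts[$entry] = isset($counts[$entry]) ? $counts[$entry] + 1 : 1;
        }
    }

    // Anything present in more than $maxShare of the regions is treated
    // as background noise and filtered out of every list.
    $regionCount = count($topTenByRegion);
    $filtered = array();
    foreach ($topTenByRegion as $region => $entries) {
        $keep = array();
        foreach ($entries as $entry) {
            if (($counts[$entry] / $regionCount) <= $maxShare) {
                $keep[] = $entry;
            }
        }
        $filtered[$region] = $keep;
    }
    return $filtered;
}
```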

So what's the solution? As Russell says, you need a whole new approach to prototyping, focused on building something that works with actual data and lets you interactively explore what works in reality, versus the relationships you hope are there from thought experiments. At least the Aviva study in the article did try out its techniques on 60,000 records, though the report left me with lots of unanswered questions.

Next time somebody's trying to sell you on the awesomeness of their new data technique, ask to see a prototype. If they haven't got that far, it's snake oil.

A simple PHP LinkedIn OAuth example

Dogdoor
Photo by Ranger Gord

While I was researching my LinkedIn data scientists article I had coffee with Adam Trachtenberg. I hadn't used the LinkedIn API before, since when I last looked into it there was no way to find users just by their names and locations. I was happy to discover that the People Search API now makes this straightforward, so in the last few days I've been researching how I can integrate this into my work.

The biggest obstacle was getting past the OAuth login stage, since implementing a secure protocol over plain http means a convoluted dance, and no two vendors do it quite the same. There are a few other examples out there, but I adapted my Gmail/IMAP OAuth PHP code to work with their setup. For your delectation and delight, I present LinkedInOAuthExample, now live on github. It's written to be as concise and dependency-free as possible, but I still find the steps involved somewhat mind-bending. OAuth 2.0 is a lot cleaner, at the cost of requiring an https server; I hope that will become the default for future APIs.
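To give a flavor of why it's mind-bending, here's a condensed sketch of the trickiest single step: building the HMAC-SHA1 signature for the initial request-token call. It follows the general OAuth 1.0a recipe rather than being a drop-in replacement for the github code, and the endpoint and callback URLs are placeholders to check against LinkedIn's current documentation.

```php
<?php
// Condensed OAuth 1.0a request-token sketch. The endpoint and callback
// URLs are placeholders -- verify them against LinkedIn's documentation.
$consumerKey = 'your_api_key';
$consumerSecret = 'your_api_secret';
$url = 'https://api.linkedin.com/uas/oauth/requestToken';

// Every oauth_* parameter in the Authorization header must also appear
// in the signature base string.
$params = array(
    'oauth_callback' => 'http://example.com/callback',
    'oauth_consumer_key' => $consumerKey,
    'oauth_nonce' => md5(uniqid(rand(), true)),
    'oauth_signature_method' => 'HMAC-SHA1',
    'oauth_timestamp' => time(),
    'oauth_version' => '1.0',
);

// Sort parameters by name and percent-encode them into a single string.
ksort($params);
$pairs = array();
foreach ($params as $key => $value) {
    $pairs[] = rawurlencode($key) . '=' . rawurlencode($value);
}
$paramString = implode('&', $pairs);

// The base string is METHOD & encoded-URL & encoded-parameter-string.
$baseString = 'POST&' . rawurlencode($url) . '&' . rawurlencode($paramString);

// No token secret exists yet at this step, so the key ends in a bare '&'.
$signingKey = rawurlencode($consumerSecret) . '&';
$params['oauth_signature'] = base64_encode(hash_hmac('sha1', $baseString, $signingKey, true));

// Assemble the Authorization header and fire the request.
$headerParts = array();
foreach ($params as $key => $value) {
    $headerParts[] = $key . '="' . rawurlencode($value) . '"';
}
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, '');
curl_setopt($ch, CURLOPT_HTTPHEADER, array('Authorization: OAuth ' . implode(', ', $headerParts)));
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
echo curl_exec($ch); // a successful response contains oauth_token and oauth_token_secret
```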

Five short links

Pentahall
Photo by Erwin Morales

Insurers Test Data Profiles to Identify Risky Clients – Intriguing research from the WSJ that sheds light on possible downsides of all the information we’re sharing online. The concrete examples seem highly unlikely to pan out, though: we’ve had detailed household purchase data available for decades, and insurers haven’t found it useful.

IP over Avian Carriers – My favorite part is the test implementation.

A Bully Finds a Pulpit on the Web – The writing and human-interest angles on this article are excellent, but its central thesis, that Get Satisfaction and other review services are boosting complained-about sites' PageRank, is completely wrong. They use rel=”nofollow” specifically to avoid that sort of manipulation by spammers, as do all the major services that support user-generated content. Even Thor from GetSatisfaction was baffled – “The article approaches SEO in near-mystical terms”

HMS Invincible – Interested in a cheap aircraft carrier? I remember walking around this ship on Navy Days (my grandad was in the service), so it's a bit of a shock to discover it’s being sold off.

The Xenotext Experiment – Encoding poetry in the genome of a bacterium capable of surviving heavy radiation and hard vacuum. If this works, the poem will endure until the sun burns out.