The unknown marketing databases that know everything about you

Photo by Jovike

I'm amazed at how much information is available in marketing databases. InfoUSA will sell anyone data on 210 million consumers, which is pretty much every adult in the country. What kind of information? Name, address, age, gender, occupation, income, mail order history, charitable donations, pets, whether there's a grandparent in their household and even whether they've served in the military!

What really interests me is that this has been going on for decades, with no apparent public concern. I wonder if part of it is because people don't realize how much is available to marketers? Or don't they care about the privacy of this information? Either way, it's something to think about as we try to figure out the rules for similar information on the internet.

Class, and why I left Britain

Licensed from Getty Images

I recently read a meditation on this iconic photo, and it got me thinking about how the British class system affected my life.

For most of my childhood, I wasn't aware of class at all. My mum was a nurse on night shifts and my dad worked in a chemical factory and, by getting a degree through night school while we were growing up, rose to a position as a trained chemist. Looking back I'm amazed at how my parents raised three kids on their income, and while I noticed some of my school friends had bigger houses or their parents had two cars, I never felt any gap between us because of that.

The change came when I was 16 and started at Hills Road Sixth Form College. The British system gives kids a choice of where to do the last two years of high school, and Hills Road boasted amazing exam scores. I was incredibly geeky even then, so I leapt at the chance for more challenging lessons. By the end of the two years, I'd come to dread the place.

I started in 'Double maths', which was an advanced course that squeezed two A-levels' worth of material into the normal time allotted for one. Looking around the class of 20, it soon became apparent that 15 of them already knew each other, and had already been taught a lot of the material. I soon realized that they'd come from the posh Perse private school, and that Hills Road was attractive because spending their last two years there let them qualify as state school pupils for the admission quotas for Oxford and Cambridge universities.

Not only were they blowing me out of the water academically, they were bursting with self-confidence and always had a put-down to hand. I was already going through a stormy adolescence, and spending time in their presence left me feeling awful, like the 'oik' they considered me. I dropped out of double maths after a couple of terms, but never felt like I truly fitted in, in any of my classes.

It left me resentful, and I started to notice class a lot more. A neighbor who'd attended a private school got a well-paying job 'in the city' at 18, despite terrible exam results. From my perspective as I was stacking shelves at Tesco supermarket, that didn't seem right. After I got my degree and started working in the games industry, I noticed how even at such high-tech companies management often came from posh backgrounds. Managing seemed more like a class privilege than something you earned by experience. I looked around at people on TV, politicians and journalists and noticed how many of them seemed to come from the same class.

I ended up working closely with a few private school kids and got to know them pretty well. They seemed twisted by the system too, forced to show an outward confidence that was fragile, and left with a fear that their achievements might be more to do with connections than ability.

It affected me too; I had a chip on my shoulder, and I didn't like the sort of person I was becoming. I didn't want to spend the rest of my life either nursing a grudge, or climbing the social ladder. I was learning to be happy with who I was, and I just wanted to go through life with people accepting me for my abilities, not judging me by my accent.

In the summer between finishing Hills Road and starting university, I spent three months in a tree-house in Alaska. That's a whole different story, but what stuck with me was how people judged me. They were a lot more interested in what I could do than who my parents were. It was like a breath of fresh air being asked to honestly prove my abilities, and the memory of that summer stuck with me as I toiled back in the UK. I pushed myself hard to get the sort of specialized experience that was in demand in America.

Finally, after 5 years of learning everything I could about game console graphics, I took a job in the US. I told myself I'd just try it for 6 months, but pretty quickly I knew I couldn't go back; there were opportunities I could never dream of at home. I'm not naive; there are plenty of 'old boys' networks' over here (the Greek system springs to mind), but they're fragmented and in competition with each other. There's nothing as dominant or closed as the British system, because they all have to be open to newcomers or they quickly lose relevance.

I didn't leave Britain because I hated toffs, but because I hated dealing with the issue at all. Both alternatives, grudge-holding or social-climbing, meant burning massive amounts of energy on something completely unproductive, and I wouldn't like the person either would make me become. I love America because it's given me the freedom to get things done without wasting my time on class.

All the cool kids are using the Rapleaf API

Photo by Georgios Karamanis

I spend a lot of time worrying about the privacy implications of the new wave of information about people that's becoming available, but I'm also fascinated by the beneficial possibilities. Rapportive and Etacts are great examples of that, using public profile data in innovative ways to solve everyday problems.

What's less well-known is that they're both built on top of the Rapleaf API. Rapleaf has traditionally been focused on B2B applications, and any firm selling personal information to other companies is going to suffer from an 'ick' factor, but the new startups demonstrate that a 'phone book for the internet' can offer some practical benefits to users too.

I sat down with Auren and Dayo from Rapleaf on Friday and had a wide-ranging discussion about this world. They're very careful not to steal any of their partners' thunder by trumpeting the connection, are in the habit of keeping a low profile generally and so probably wouldn't want me to blog about this, but I think their API is a massively under-used resource in the startup world. If you're doing anything with sets of email addresses, you can offer your users much richer views of the people behind those addresses using Rapleaf. It's not perfectly accurate in the connections it finds, but it does a pretty good job and if you need an example of how to implement it, you can find one here in my FindByEmail project.

Just remember, this is personal information about real people you're dealing with, so use it for the forces of good, not evil! And if you want to remove your information entirely from Rapleaf, you can do that here.

Business plans for public data

Photo by Leo Reynolds

Social networks are making more information about their users public, and the tools to work with massive data sets are getting cheaper. A lot of companies are trying to figure out ways to make money from these two trends, so I wanted to give an overview of some practical revenue streams that either potential customers have asked me about or that I’ve seen competitors in this space using.

I’ll focus on cataloging what I’ve seen, rather than digging into the ethical debates that some of the applications raise. It’s important to understand what’s possible and happening right now so we can have a meaningful argument about what the rules should be.

Improved search results

I got started working in this area when one of my products needed to match up email contacts with their social network accounts. I wanted to automate the process of Googling a person’s name when you first exchange emails with them, and so my first thought was using an API to one of the existing search engines. Unfortunately Google actually blocks most Facebook results from their API, and Bing and Yahoo have very spotty coverage, missing a lot of users. That led me to write my own simple crawler to catalog Facebook profiles myself just to do those name/location lookups.

I later realized how much other fascinating information was available in those public profiles, but I ran across several smaller search startups willing to pay for just the information matching a name and location to a profile. It’s definitely not a massive market, but there’s money to be made, and since it’s identical to Google’s functionality it doesn’t raise many ethical questions.

Better targeting for direct email marketing

This is one of the least known but most lucrative uses for public profile data. A company with a large email list will run all the addresses through a lookup service that gives them a list of their customers’ social network accounts. That knowledge can then be used in all sorts of ways to target customers, from sending special Twitter offers only to people you know are on the service, to pulling detailed location information for localized campaigns. I’ve even heard rumors of a Vegas casino that upgrades guests to suites if it spots they have a lot of Twitter followers! If you want to see something like this in action, Flowtown offers a lot of these features thanks to the Rapleaf API.

Any business-to-business use of our personal information is inherently a bit creepy, but direct marketing firms have been doing similar analysis for decades using traditional data sources like magazine subscriber surveys, so this seems fairly uncontroversial.

Examples: Flowtown, Rapleaf

Hedge funds

The most direct link between information and money is in the financial world. For example, if you can detect that a brand is becoming popular before anyone else, you can buy shares in that firm and benefit from the price rise when that success shows up in their profits.

Hedge funds have been using non-traditional metrics for years, doing things like running their own focus groups and opinion polls, but recently there’s been a lot of interest in the flood of information flowing through social networks. Twitter is the most obvious example of a data source, but the audience is both small and heavily skewed towards geeks, making it hard to pull out meaningful information. My feeling is that this will only become really useful once mass-market data is more available. Imagine being able to spot companies where a lot of employees have recently updated their LinkedIn profiles, for an early warning of firms in trouble.

One challenge to this approach is that you need some kind of historical baseline to compare current figures against, to tell if they represent something real or are just noise. That’s a barrier because it means you need to have been collecting the data for some time before it starts to become valuable to hedge funds.
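As a rough illustration of why the baseline matters, here is a minimal Python sketch that scores a new figure against its own history using a z-score. The function name, the weekly counts and the threshold of 3 are all made up for the example; a real fund would need far more history and a more careful model than this.

```python
from statistics import mean, stdev


def baseline_z_score(history, current):
    """Return how many standard deviations `current` sits from the historical mean."""
    if len(history) < 2:
        raise ValueError("need at least two historical observations")
    spread = stdev(history)
    if spread == 0:
        raise ValueError("history has no variation, so a z-score is undefined")
    return (current - mean(history)) / spread


# Hypothetical weekly counts of employee profile updates at one company.
weekly_updates = [10, 12, 11, 9, 13, 10, 11, 12]
this_week = 25

z = baseline_z_score(weekly_updates, this_week)

# Assumed rule of thumb: more than 3 standard deviations away is worth a look.
is_unusual = abs(z) > 3.0
```

Without the `weekly_updates` history there is nothing to divide by, which is the barrier in a nutshell: the same count of 25 could be a spike for one firm and a quiet week for another.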

Again this seems to be an extension of existing processes, just slotting in public profiles as a new data source, so it’s hard to see what new ethical ground is being broken.

Examples: YouGov

General marketing intelligence

Marketing managers for big brands constantly have to make decisions about how to allocate their resources and craft their messages, and they need the right information to make good choices. My FanPageAnalytics project was aimed at those people, giving them unique information about who their fans and their competitors’ fans were, what else they were fans of and where they lived.

There’s definitely money to be made in this area, but brand managers are busy and non-technical, so they require something very targeted to their needs and don’t seek out new solutions. My feeling is that makes the leaders like Radian6 hard to beat even as the technology changes, because they have built relationships with most brand managers that give them a defensible distribution channel.

Examples: Radian6, Scout Labs

Reaching influencers for PR purposes

Public relations people want to persuade influential people to write about their clients. One problem is that they may not know who the influential people are in a given area, or they may know but be unable to reach them effectively.  Ever since I did my Twitter visualization, I’ve been asked about this use case repeatedly. The holy grail is being able to enter a topic, see who the most influential people are *and* who they are influenced by. Very often there are lesser-known specialists who are read by more popular writers for story ideas, and those sources may be an easier route to getting your stories to those mainstream influencers than approaching them directly.

This is one of the few areas where Twitter’s comparatively small user base is not an issue, since most people who broadcast to an audience are using the service as another channel. Using information from other networks to reach them can feel like stalking though, so I expect that the increasing availability of public data will be countered by celebrities locking down their privacy settings.

Examples: Klout

Recruitment targeting

Weak relationships, people you met once at a trade show, are surprisingly effective when it comes to getting a job. Recruiters contribute a massive chunk of LinkedIn’s revenue, and people are largely happy to see their resumes and connections shared for job-hunting purposes. It’s a pretty sweet position for LinkedIn, since it makes them the only customer-facing business that’s able to sell their users’ private data to other companies without fear of a backlash. It’s an area that could be helped by the new flood of public profile data too, especially if you can get some information about people’s connections. I’ve run across two different firms who’ve tapped into their employees’ friends networks on Facebook and Twitter to help fill positions, and I imagine there has to be a lot more innovation coming in this area.

Examples: LinkedIn

How to create a job in Elastic MapReduce

I’m on a crusade to spread the word about the potential of Elastic MapReduce to revolutionize data processing for startups (a 100 machine cluster for $10 an hour!) so I’ve produced a 7 minute screencast showing exactly how to create a new job. I’ve embedded the YouTube version above, or you can find a higher-quality version here.

I’m thinking about rolling out a series of these, taking you all the way from gathering the source data to visualizing the results, so please let me know what you do and don’t like about this version.

How to set a custom screen resolution in OS X

Photo by AMagill

I just lost an hour of my life trying to figure out how to set my MacBook Pro to 1280×720 on the main display, so to save anyone else from banging their head against the desk, here are the steps that finally worked for me:

1 – Download and install SwitchResX

2 – Go to System Preferences and click on the SwitchResX icon

3 – Click on Color LCD on the left side

4 – Choose the Custom Resolutions tab

5 – Click on the plus icon to create a new resolution

6 – Choose Scaled Resolution from the top drop-down menu

7 – Enter the resolution you want in the two boxes below

8 – Click OK

9 – Check to make sure the resolution that now shows up in the list is correct. I've found it will sometimes forget one of the values and set it to zero! If it does that, go back in and re-edit and save it until it does appear correctly.

10 – The Type column should read Scaled, and the Status should be Uninstalled. Now press Command-S to save your changes.

11 – You should now see Needs to Reboot in the Status column. As you may have guessed, this means you need to reboot your machine, so choose Restart from the main system menu.

12 – Once the system has restarted, go to the normal display preferences and you should see your new resolution listed there.

If this does fix your problem, please buy a copy of SwitchResX for 14 Euros to support its development.

Wondering why I needed a custom resolution of 1280×720? I'm working on building some more professional screencasts that are going to be run through a 720p video production pipeline, so I need to capture my whole screen at that resolution. I expected it to be fairly simple to set up, but everywhere I turned I hit baffling UI. It makes no sense that you can't set a custom resolution within Apple's preferences in the first place, and then I spent a lot of time and $20 on DisplayConfigX with no luck, before I figured out how to get SwitchResX working.

The decline and fall of enterprise relationship management

Photo by Hanadi Traifeh

For the first year of Mailana's life, I was focused on building a system that large companies could use to identify internal experts, based on the content of email messages that their employees sent. What I built was effectively an auto-generated LinkedIn, and you can see a demo of it here. There was a lot of interest in the idea, but it foundered on both privacy concerns and the fact that only a small percentage of people in a company spend time looking outside their immediate team. I had plenty of mechanisms to ensure people's information stayed under their control, but it still felt a bit creepy, and definitely freaked out legal and compliance folks. The real killer was that the people who drooled over it, the internal entrepreneurs and the uber-salesmen, were not the people writing the checks.

After I switched to applying the same technology to the consumer world, I still kept an eye on my competitors' progress. I've been pretty sad to see a lot of ERMs flounder, though there are still a few fighting on:


Tacit

Tacit was a big, late-'90s traditional enterprise software company that received a lot of investment. They were building on a very similar concept to mine, finding experts on technical topics based on mail messages. I heard some good reports from their users, but was also told their interface for contacting experts was extremely clunky, apparently because they had a lot of mechanisms to preserve privacy that got in the way of the user experience. The company closed and the technology was sold in a fire sale to Oracle a couple of years ago.

Visible Path

Another well-funded startup, VP spent six years in the '00s focused more on the connection side of corporate social networks, trying to offer value through identifying strong relationships both inside and outside the company. It was an interesting contrast with LinkedIn's approach of capturing any connections you have, with no way to differentiate between childhood friends and people you met once at a trade show. Unfortunately they hit the same sort of issues I did when trying to sell enterprise-wide systems, and were moving to a more individually-based product when they were bought out by Hoover's. I was very sad to see that the product has now been discontinued, since it had some rabid fans.

Contact Networks

Bought out by Thomson Reuters a couple of years ago, Contact Networks was also focused on mining the contact information that lives within an organization, but without worrying too much about the strengths of any connections. They still appear to be going strong, with some recent updates on their site.

Trampoline Systems

A firm I discovered when they attended the first Defrag, Trampoline were looking both at identifying experts and internal relationships. They used to have a great demonstration of their Sonar platform using the Enron email data set, but unfortunately that seems to have been taken down. They're still pushing ahead with their work, and recently have been looking at an innovative way of raising money that they've dubbed crowdfunding.

Microsoft's Knowledge Network

This was an innovative experiment by MS a few years back, rolling a lot of these expertise and relationship mining ideas into a prototype Outlook plugin. The add-in was removed after a few months, but I hear some of the same technology is finally making it into the latest versions of SharePoint and Outlook.

So what does the future hold? I think Microsoft's moves are a good indicator. We've now got a world where more and more social network features are being accepted within the enterprise, and internal services like SharePoint or Jive are the natural distribution channels for this sort of work. There have always been people who love having this sort of information to help them in their jobs; the problem's been getting it to them and then getting revenue back in return!

Blocked from accessing Gmail using OAuth and IMAP

Photo by Vizzzual

I was pretty excited to see Google rolling out an extension to IMAP that lets you authenticate using OAuth. That all sounds incredibly geeky, but it means you won't have to share your password with a site that wants to work with your inbox. Before this, any innovative services working with your messages had to request and store users' passwords in plain text!

I went ahead and implemented the new extension, and wrote a simple example showing how to use OAuth to log in to IMAP. It's all available in the Handmade IMAP library at with a live version running at

Unfortunately it looks like Google have blocked access to this feature for most developers. The awesome Etacts service is able to use it, but they seem to have disabled it for all other sites. I've sent out some emails to Google folks asking for help, but no response so far.

This is a real shame, since this is a great opportunity to close a big security hole, and remove any reason to share passwords with third-party sites. I hope it gets sorted out soon, and I'll let you know if I make any progress.

[Update – I got a reply from Eric Sachs at Google: "We ended up having much higher interest then was expected in that API, so we have decided that instead of answering questions about the current test version, we are going to focus on trying to get it fully launched in the next few weeks."]

How to implement the Twitter OAuth UI in PHP

Photo by Richard Parmiter

There are some great PHP libraries out there for handling OAuth and the Twitter API, but I never found a simple example showing how to handle the user interface side. It's a bit of a pain because you have to send the user away from your site and then deal with their return at some later time.

After I implemented my own workflow for this (on top of Abraham Williams' Twitter library), I thought it would be useful to strip it down to a template that other people could reuse. I've put the code up as and you can try it for yourself at

It creates an authorization link for users to click on, and handles retrieving their access tokens when they return from Twitter. In a real application you'd want to store all the information in a database, but to simplify the code I'm keeping the tokens in session variables. To get it running on your own server:

Go to, click on 'Register an application' and fill out the form. You'll need to make sure the Callback URL field points back to the place you're going to put the example code. In my case, this is

The next screen will give you the API keys you need. Open up config.php in an editor, and put the value from the heading 'Consumer key' into TWITTER_API_KEY_PUBLIC and the value from 'Consumer secret' into TWITTER_API_KEY_PRIVATE.

Copy the code up to your web server, and then you should have a working process for authorizing access to the Twitter API.
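To make the shape of that round trip a little more concrete, here is a minimal sketch of the decision the page has to make on each request. It's written in Python rather than PHP, it is not the code from the example itself, and it doesn't talk to Twitter at all: the function name, the step names and the session keys are all hypothetical, with a plain dict standing in for the session variables. The one real detail is that an OAuth 1.0 callback carries the request token back in an `oauth_token` query parameter.

```python
def next_step(session, query):
    """Decide what the page should do for this request.

    session: dict standing in for server-side session variables
    query:   dict of query-string parameters on the current request
    """
    # Already authorized on an earlier visit: nothing more to do.
    if "access_token" in session:
        return "authorized"

    # The user is coming back from Twitter with the request token we handed out.
    returned_token = query.get("oauth_token")
    if returned_token is not None and returned_token == session.get("request_token"):
        return "exchange_tokens"

    # First visit, or a callback we don't recognize: offer the authorization link.
    return "show_authorize_link"


# Simulated sequence of visits; every token value here is made up.
session = {}
first_visit = next_step(session, {})

session["request_token"] = "req-123"  # saved just before sending the user away
callback = next_step(session, {"oauth_token": "req-123"})
stale_callback = next_step(session, {"oauth_token": "something-else"})

session["access_token"] = "acc-456"  # saved once the exchange succeeds
later_visit = next_step(session, {})
```

Comparing the returned token against the one saved in the session is what lets the page tell a genuine return trip apart from a stale or unrelated request, which is why the request token has to be stored before the user leaves your site.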

Why you should visit Santa Cruz Island


Liz and I just got back from a four-day trip to Santa Cruz Island, helping to maintain the hiking trails. We drove all the way from Colorado to California for the opportunity, and we've been so many times we've lost count. On the drive, I was thinking about what keeps us coming back, and why I recommend it to anyone who loves the outdoors.


The trip over

To get to Santa Cruz Island you need to take an Island Packers boat from Ventura harbor. The trip only takes about an hour, but it packs in an amazing range of sea-life. Just yesterday we had a humpback whale leap out of the water and do a 180-degree twist only 200 feet from the boat. There are always hundreds of dolphins, and I've even had orcas approach close enough to bite me.


Solitude on LA's doorstep

That's the view I wake up to every morning on the island. With no permanent inhabitants or cell reception, the only vehicles a few ranger trucks, and a hundred square miles to lose yourself in, Santa Cruz is heaven for anyone looking to get away from it all. Even better, you can be there in just a couple of hours from the center of LA, whether you want a quick day trip or a longer camp out.

There's no commercial presence there at all, no food stands, not even a soda machine, so you'll need to be prepared for a trip back to the 19th century, but it's worth it for the tranquility.


Watch a world recover

When we first started visiting almost a decade ago, the sheep had only just been removed and there were still wild pigs roaming everywhere. Ecologically it was a mess: the sheep had devoured almost all the native vegetation, leaving nothing but brown grass to cover the hills in the summer; the pigs were digging up the dirt in search of roots and causing the hillsides to erode; and with no predators the mice were everywhere. Now they've eradicated all the pigs, deported the golden eagles that lived on them and reintroduced bald eagles, and got rid of the fennel thickets that choked the trails. The difference over just a few years has been astonishing, with clumps of buckwheat, coreopsis and the unique island oaks popping up over previously bare hillsides. Even better, the indigenous island foxes have gone from being endangered to pests in the campground in record time, with numbers up from a few hundred to 1,200 in just two years now the golden eagles are no longer picking them off.

You'll never get another chance to see a whole National Park turn itself from a barren wasteland into a natural garden packed with plants and animals you'll find nowhere else. Get out there now while it's still in progress, and I guarantee you'll be amazed at the changes as you keep coming back.


Experience a dark history

I don't know if the island attracts crazy people, or if it turns normally sane people a little nuts, but you'll be surprised at how many of the people you'll meet there have ended up with a borderline obsession with the place. I'm one to talk, driving 1200 miles to visit, but its recorded history is a long succession of feuds, disputes and dreams of private empires. The one man who built a successful ranching venture on the island left behind a family that squabbled for over a century, with lawsuits ricocheting so long that finally most of it was sold to pay the legal bills, with the final parcel taken over after a dawn helicopter raid by a SWAT team in 1997! Long before that there's evidence of over 13,000 years of Chumash habitation, possibly the earliest in the Americas, before the population was taken to the mainland for easier control. There's so much archaeology, it's hard to walk anywhere that doesn't show evidence of a midden or worked chert fragments.

You'll need to be a big donor or volunteer with the Nature Conservancy before you can visit the main ranch situated on their land (their acquisition of that property was more fallout of legal feuding; the previous owner was determined to avoid being forced to sell to the NPS) but you can explore the smaller stations such as Scorpion and Smugglers, with century-old groves of olive and cypress trees to shelter under. There's also a new visitor's center at Scorpion, with some amazing work by Exhibitology giving a fascinating look into the island's past.

I haven't even touched on the breathtaking hikes, secluded campgrounds like del Norte or diving so spectacular that Jacques Cousteau considered it the best in the temperate world. If you need to refresh your soul (and are willing to risk developing a lifetime obsession) visit Santa Cruz Island.