Blocked from accessing Gmail using OAuth and IMAP

[Image: brick wall. Photo by Vizzzual]

I was pretty excited to see Google rolling out an extension to IMAP that lets you authenticate using OAuth. That all sounds incredibly geeky, but it means you won't have to share your password with a site that wants to work with your inbox. Before this, any innovative service working with your messages had to request and store users' passwords in plain text!

I went ahead and implemented the new extension, and wrote a simple example showing how to use OAuth to log in to IMAP. It's all available in the Handmade IMAP library at http://github.com/petewarden/handmadeimap/ with a live version running at http://web.mailana.com/labs/handmadeimap/gmailoauthexample/
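If you're curious about the mechanics, the core of it is a single extra command. Once you have a signed OAuth request for the user's mailbox URL, you base64-encode it and hand it to IMAP's AUTHENTICATE command as the XOAUTH mechanism. Here's a rough PHP sketch of what goes over the wire; $signedParams is assumed to already hold the signed OAuth parameter string, and the full signing dance lives in the gmailoauthexample code:

<?php
// Sketch only: $signedParams is assumed to be the signed OAuth
// parameter string for the mailbox URL below; see gmailoauthexample
// for the full signing logic.
$url = 'https://mail.google.com/mail/b/youremail@gmail.com/imap/';
$connection = fsockopen('ssl://imap.gmail.com', 993, $errno, $errstr, 30);
fgets($connection); // swallow the server greeting

$xoauth = base64_encode('GET '.$url.' '.$signedParams);
fwrite($connection, 'a1 AUTHENTICATE XOAUTH '.$xoauth."\r\n");
echo fgets($connection); // 'a1 OK ...' on success, 'a1 NO ...' if you're blocked
?>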

Unfortunately it looks like Google have blocked access to this feature for most developers. The awesome etacts service is able to use it, but they seem to have disabled it for all other sites. I've sent out some emails to Google folks asking for help, but no response so far.

This is a real shame, since it's a great opportunity to close a big security hole and remove any reason to share passwords with third-party sites. I hope it gets sorted out soon, and I'll let you know if I make any progress.

[Update – I got a reply from Eric Sachs at Google: "We ended up having much higher interest than was expected in that API, so we have decided that instead of answering questions about the current test version, we are going to focus on trying to get it fully launched in the next few weeks."]

How to implement the Twitter OAuth UI in PHP

[Image: password pants. Photo by Richard Parmiter]

There are some great PHP libraries out there for handling OAuth and the Twitter API, but I never found a simple example showing how to handle the user-interface side. It's a bit of a pain because you have to send the user away from your site and then deal with their return at some later time.

After I implemented my own workflow for this (on top of Abraham Williams's Twitter library), I thought it would be useful to strip it down to a template that other people could reuse. I've put the code up at github.com/petewarden/twitteroauthexample and you can try it for yourself at web.mailana.com/labs/twitteroauthexample.

It creates an authorization link for users to click on, and handles retrieving their access tokens when they return from Twitter. In a real application you'd want to store all the information in a database, but to simplify the code I'm keeping the tokens in session variables. To get it running on your own server:

[Image: Twitter OAuth screenshot]
Go to http://twitter.com/oauth_clients, click on 'Register an application' and fill out the form. You'll need to make sure the Callback URL field points back to the place you're going to put the example code. In my case, this is http://web.mailana.com/labs/twitteroauthexample/

The next screen will give you the API keys you need. Open up config.php in an editor, and put the value from the heading 'Consumer key' into TWITTER_API_KEY_PUBLIC and the value from 'Consumer secret' into TWITTER_API_KEY_PRIVATE.
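After that step, the relevant part of config.php should end up looking something like this (the placeholder values are obviously not real keys, and the exact form of the file may differ slightly from this sketch):

<?php
// Paste in the values from your application's page on twitter.com
define('TWITTER_API_KEY_PUBLIC', 'yourconsumerkeyhere');    // 'Consumer key'
define('TWITTER_API_KEY_PRIVATE', 'yourconsumersecrethere'); // 'Consumer secret'
?>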

Copy the code up to your web server, and then you should have a working process for authorizing access to the Twitter API.
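If you just want the shape of the workflow without reading the full template, here's a condensed sketch along the same lines, built on Abraham Williams's TwitterOAuth library. The method names (getRequestToken, getAuthorizeURL, getAccessToken) may vary between versions of that library, so treat this as an outline rather than drop-in code:

<?php
require_once('twitteroauth/twitteroauth.php');
require_once('config.php');
session_start();

if (!isset($_GET['oauth_token'])) {
    // First visit: fetch a request token, remember it in the session,
    // and show a link that sends the user off to Twitter to authorize us.
    $connection = new TwitterOAuth(TWITTER_API_KEY_PUBLIC, TWITTER_API_KEY_PRIVATE);
    $requestToken = $connection->getRequestToken();
    $_SESSION['oauth_token'] = $requestToken['oauth_token'];
    $_SESSION['oauth_token_secret'] = $requestToken['oauth_token_secret'];
    $authorizeUrl = $connection->getAuthorizeURL($requestToken['oauth_token']);
    echo '<a href="'.htmlspecialchars($authorizeUrl).'">Sign in with Twitter</a>';
} else {
    // The user is back from Twitter: exchange the request token we
    // stored earlier (plus the verifier Twitter passes back) for an
    // access token. In a real application you'd store this in a database.
    $connection = new TwitterOAuth(TWITTER_API_KEY_PUBLIC, TWITTER_API_KEY_PRIVATE,
        $_SESSION['oauth_token'], $_SESSION['oauth_token_secret']);
    $accessToken = $connection->getAccessToken($_REQUEST['oauth_verifier']);
    $_SESSION['access_token'] = $accessToken;
    echo 'Authorized as '.htmlspecialchars($accessToken['screen_name']);
}
?>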

Why you should visit Santa Cruz Island

[Image: Cavern Point]

Liz and I just got back from a four day trip to Santa Cruz Island, helping to maintain the hiking trails. We drove all the way from Colorado to California for the opportunity, and we've been so many times we've lost count. On the drive, I was thinking about what keeps us coming back, and why I recommend it to anyone who loves the outdoors.

[Image: dolphins]

The trip over

To get to Santa Cruz Island you need to take an Island Packers boat from Ventura harbor. The trip only takes about an hour, but it packs in an amazing range of sea life. Just yesterday we had a humpback whale leap out of the water and do a 180-degree twist only 200 feet from the boat; there are always hundreds of dolphins, and I've even had orcas approach close enough to bite me.

[Image: morning view]

Solitude on LA's doorstep

That's the view I wake up to every morning on the island. With no permanent inhabitants or cell reception, the only vehicles a few ranger trucks, and a hundred square miles to lose yourself in, Santa Cruz is heaven for anyone looking to get away from it all. Even better, you can be there in just a couple of hours from the center of LA, whether you want a quick day trip or a longer camp out.

There's no commercial presence there at all: no food stands, not even a soda machine. You'll need to be prepared for a trip back to the 19th century, but it's worth it for the tranquility.

[Image: island fox]

Watch a world recover

When we first started visiting almost a decade ago, the sheep had only just been removed and there were still wild pigs roaming everywhere. Ecologically it was a mess: the sheep had devoured almost all the native vegetation, leaving nothing but brown grass to cover the hills in the summer; the pigs were digging up the dirt in search of roots and causing the hillsides to erode; and with no predators, the mice were everywhere. Since then all the pigs have been eradicated, the golden eagles that lived on them have been deported and bald eagles reintroduced, and the fennel thickets that choked the trails are gone. The difference over just a few years has been astonishing, with clumps of buckwheat, coreopsis and the unique island oaks popping up over previously bare hillsides. Even better, the indigenous island foxes have gone from endangered to campground pests in record time, with numbers up from a few hundred to 1,200 in just two years now that the golden eagles are no longer picking them off.

You'll never get another chance to see a whole National Park turn itself from a barren wasteland into a natural garden packed with plants and animals you'll find nowhere else. Get out there now while it's still in progress, and I guarantee you'll be amazed at the changes as you keep coming back.

[Image: railing]

Experience a dark history

I don't know if the island attracts crazy people, or if it turns normally sane people a little nuts, but you'll be surprised at how many of the people you meet there have ended up with a borderline obsession with the place. I'm a fine one to talk, driving 1,200 miles to visit, but its recorded history is a long succession of feuds, disputes and dreams of private empires. The one man who built a successful ranching venture on the island left behind a family that squabbled for over a century, with lawsuits ricocheting for so long that most of the island was eventually sold to pay the legal bills, and the final parcel was taken over after a dawn helicopter raid by a SWAT team in 1997! Long before that, there's evidence of over 13,000 years of Chumash habitation, possibly the earliest in the Americas, before the population was taken to the mainland for easier control. There's so much archaeology that it's hard to walk anywhere that doesn't show evidence of a midden or worked chert fragments.

You'll need to be a big donor or a volunteer with the Nature Conservancy before you can visit the main ranch on their land (their acquisition of that property was more fallout from the legal feuding; the previous owner was determined to avoid being forced to sell to the NPS), but you can explore the smaller stations such as Scorpion and Smugglers, with century-old groves of olive and cypress trees to shelter under. There's also a new visitor center at Scorpion, with some amazing work by Exhibitology giving a fascinating look into the island's past.

I haven't even touched on the breathtaking hikes, secluded campgrounds like del Norte, or diving so spectacular that Jacques Cousteau considered it the best in the temperate world. If you need to refresh your soul (and are willing to risk developing a lifetime obsession), visit Santa Cruz Island.

Facebook data destruction

I'm sorry to say that I won't be releasing the Facebook data I'd hoped to share with the research community. In fact I've destroyed my own copies of the information, under threat of a lawsuit from Facebook.

As you can imagine I'm not very happy about this, especially since nobody ever alleged that my data gathering was outside the rules the web has operated by since crawlers existed. I followed their robots.txt directions, and was even helped by microformatting in the public profile pages. Literally hundreds of commercial search engines have followed the same path and have the same data. You can even pull identical information from Google's cache if you don't want to hit Facebook's servers. So why am I destroying the data? This area has never been litigated and I don't have enough money to be a test case.

Despite the bad taste left in my mouth by the legal pressure, I actually have some sympathy for Facebook. I put them on the spot by planning to release data they weren't aware was available. I know from my time at Apple that reaching for the lawyers is a tempting first option when there's a nasty surprise like that. If I had to do it all over again, I'd try harder not to catch them off-guard.

So what's the good news? From my conversations with technical folks at Facebook, there seems to be a real commitment to figuring out safeguards around the widespread availability of this data. They have a lot of interest in helping researchers find ways of doing worthwhile work without exposing private information.

To the many researchers I've disappointed, there's a whole world of similar data available from other sources too. By downloading the Google Profile crawling code you can build your own data set, and it's easy enough to build something similar for Twitter. I'm already in the middle of some new research based on public Buzz information, so this won't be stopping my work, and I still plan to share my source data with the research community in the future.

Flexible access to Gmail, Yahoo and Hotmail in PHP

[Image: knitted mask. Photo by Poppalina]

I've been a heavy user of the php-imap extension, but last year its limitations drove me to reimplement the protocol in native PHP. Because it's a compiled extension implementing a thin wrapper on top of the standard UW IMAP library, any changes meant working in C code and propagating them through several API layers. A couple of problems in particular forced me to resort to my own implementation:

– Yahoo uses a strange variant of IMAP on its basic accounts, where you need to send an extra undocumented command before you can log in (there's a sketch of this below).

– Accessing message headers only gives you the first recipient of an email, and getting any more requires a full message download. This is a design decision by the library, not a limitation of the protocol, and it severely slowed down access to the information Mailana needed.
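Working at the socket level makes quirks like Yahoo's straightforward to handle. Here's a stripped-down sketch of the idea; the real code does proper tagging, error checking and response parsing, and the host name, port and exact form of Yahoo's undocumented ID command are from my own testing, so double-check them before relying on them:

<?php
// Minimal sketch: raw IMAP over a socket, with Yahoo's extra
// pre-login command. Host, port and the ID command's exact form
// are assumptions based on my testing; see the full library for
// the production version.
$connection = fsockopen('ssl://imap.mail.yahoo.com', 993, $errno, $errstr, 30);
fgets($connection); // read the server greeting

// Yahoo's basic accounts refuse LOGIN until they've seen this
// undocumented command.
fwrite($connection, "a1 ID (\"GUID\" \"1\")\r\n");
fgets($connection);

fwrite($connection, "a2 LOGIN youremail@yahoo.com yourpasswordhere\r\n");
echo fgets($connection); // 'a2 OK ...' if the login succeeded
?>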

I've now open-sourced my code as HandmadeIMAP. I chose the name to reflect the hand-crafted and somewhat idiosyncratic nature of the project. It's all pulled from production code, so while it isn't particularly pretty, it only implements the parts I need, and it focuses on supporting Gmail, Yahoo and Hotmail. On the plus side, it's been used pretty heavily and works well for my purposes, so hopefully it will prove useful to you too.

I'm also hoping to use it as a testing ground for Gmail's new OAuth extension to IMAP, which makes it possible to give mailbox access without handing over your password. Because new commands can easily be inserted, it should be possible to follow the new specification once the correct tokens have been received; I'll let you know how that progresses.

To give it a try for yourself, just download the code and run the test script, e.g.:

php handmadeimaptest.php -u youremail@gmail.com -p yourpasswordhere -a list -d

Thinking by Coding

[Image: thinker. Photo by Sidereal]

David Mandell took me under his wing at Techstars last summer, and we ended up spending a lot of time together. One day towards the end of the program he told me "Pete, you think by coding". That phrase stuck with me because it felt true, and summed up a lot of both my strengths and weaknesses.

I choose my projects by finding general areas that have interesting possibilities for fun and profit, but then almost immediately start hacking on code to figure out the details of the technical landscape. I find it intensely frustrating to sit and discuss the theory of what we should be building before we have a good idea of what's technically possible. Just as no battle plan survives contact with the enemy, so no feature list on a technically innovative product survives the realities of the platform's limitations.

I've never been a straight-A student, and I'm not a deep abstract thinker, but I have spent decades obsessed with solving engineering problems. I think at this point that's probably affected the wiring of my brain, or at least given me a good mental toolkit to apply to those sorts of puzzles. I find myself putting almost any issue in programming terms if I want to get a handle on it. For example, the path to sorting out my finances suddenly became clear when I realized I could treat it as a code optimization problem.

It also helps explain why I'm so comfortable open-sourcing my code. For me, the value I get out of creating any code is the deep understanding I have to achieve to reach a solution. The model remains in my head; the actual lines of code are comparatively easy to recreate once I've built the original version. If I give away the code, anyone else still needs to make a lot of effort to reach the level of understanding where they can confidently make changes, so the odds are good they'll turn to me and my mental model. My open-source projects act as an unfakeable signal that I really understand these domains.

This approach has served me well in the past, but it can be my Achilles heel. Most internet businesses these days don't require technical innovation. They're much more likely to die because no potential customers are interested than because they couldn't get the engineering working. Market risk used to be an alien concept to me, but the dawning realization that I kept building solutions to non-existent problems drove me into the arms of Customer Development.

I feel like I've now been punched in the face often enough by that issue that I've (mostly) learned to duck, but I'm also aware that I'm still driven by techno-lust. I love working in areas where the fundamental engineering rules are still being figured out, where every day I have the chance to experiment and discover something nobody else has known before. I can already feel most of the investors I've met shaking their heads sadly. I'm well aware that starting a revenue-generating business is almost impossible anyway, without also exposing yourself to the risk and pain of relying on bleeding-edge technology. The trouble is, that's what my startup dream has always been. I want to do something genuinely fresh and new with the technology, and I hope to be smart and fast enough to build a business before the engineering barriers to entry crumble.

I don't know if I'd recommend my approach to anyone else, but it seems baked into who I am. Oddly enough, I didn't fully understand that until I wrote this post. Perhaps there's some thinking that I can only do by writing too?

How to gather the Google Profiles data set

[Image: bee. Photo by Max Xx]

With the launch of Buzz, millions of people have created Google public profiles. These contain detailed personal information, including your name, a portrait, your location, your job title and employer, where you were born, where you've lived since, links to any blogs and other sites associated with you, and some public Buzz comments. All of this information is public by default, and Google micro-formats the pages to make them easier for crawlers to understand, allows crawling in robots.txt, and even provides a directory listing to help robots find all the profiles (which is actually their recommended way to build a firehose of Buzz messages).

This sort of information is obviously a gold-mine for researchers interested in things like migration and employment patterns, but I've been treading very carefully since this is people's personal information. I've spent the last week emailing people I know at Google, posting on the public Buzz API list, even contacting the various government privacy agencies who've been in touch, but with no replies from anyone.

Since it's now clear that there are a bunch of other people using this technique, I'm open-sourcing my implementation as BuzzProfileCrawl. As you can tell from looking at the code, this is not rocket science; it just runs some simple regular expressions on each page as it's crawled.
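To give a flavor of the technique, here's the sort of extraction it does. The hCard class names below (fn, locality) are the standard microformat ones, but the URL and patterns are simplified from memory; the actual code in BuzzProfileCrawl is a little more defensive, so treat this as illustrative:

<?php
// Illustrative sketch: pull a couple of microformatted fields out of
// a public profile page with plain regular expressions. The URL and
// patterns are placeholders; see BuzzProfileCrawl for the real ones.
$profileUrl = 'http://www.google.com/profiles/someusername';
$profileHtml = file_get_contents($profileUrl);

if (preg_match('/<span[^>]*class="fn"[^>]*>([^<]*)<\/span>/', $profileHtml, $matches)) {
    echo 'Name: '.$matches[1]."\n";
}
if (preg_match('/<span[^>]*class="locality"[^>]*>([^<]*)<\/span>/', $profileHtml, $matches)) {
    echo 'Location: '.$matches[1]."\n";
}
?>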

We need to have a debate about how much of this information we want exposed and how to balance innovation against privacy, but the first step is making it clear how much is already out there. There's a tremendous mismatch between what's technologically possible and ordinary people's expectations. I hope this example helps spark an actual debate on this, rather than the current indifference.