Hello Social World

February 15, 2011 By Pete Warden in Uncategorized

These days it doesn't make much sense to build a consumer site with its own private account system. If users sign in using Twitter or Facebook, you can have them easily spread the word on what they're doing in your application through tweets or wall posts. The old 'enter details'/'email confirmation'/'click link' process was a leaky funnel that lost potential users too, and was a pain to implement, since automated emails alone are an art in themselves.

I wanted to use this approach on one of my sites, but I couldn't find a good example. Especially in the Rails/Sinatra world there were plenty of building blocks available for doing the individual tasks, like authenticating through Twitter or Facebook, calling their APIs, and storing data, but nothing that showed how to use them together. So, because I needed to build it anyway and it would be a good Ruby learning exercise, I set out to create a minimal-but-complete template that handled user creation, authentication, custom data storage and external sharing. In the spirit of the classic 'Hello World' code examples, here's Hello Social World.

You can try a live demo at http://hellosocialworld.heroku.com/. It's built to do the bare minimum you need for a modern social site, letting users log in, edit some data they own on your site, and then share the results with their friends on Twitter or Facebook. The data in this case is just the user's favorite color, but imagine replacing that with whatever content they actually create on your site; blog posts, photo galleries, comments, etc. It's deliberately left with zero styling, so you can add your own view code.

I was really impressed by the elegance of Ruby and the Sinatra framework. Aside from the API keys, all the code's in a single 440-line file, and 120 of those lines are comments. This is possible because the gems handle most of the heavy lifting for me. It feels very productive to focus on my application logic instead of writing boilerplate code. There were moments of frustration when things went wrong, since the black-box nature of gems makes them tough to debug, but there seems to be a big enough community using them that they mostly just work.

I'll be using this template heavily myself over the next few weeks, which will hopefully give me ideas on improvements, but let me know if you have bug reports or suggestions. I'm looking forward to hearing people's thoughts on this approach to a turn-key social site too.

Ruby on Rails first impressions

February 12, 2011 By Pete Warden in Uncategorized

Photo by Afternoon Sunlight

Ruby on Rails has often attracted me, but I've never had a strong enough reason to dive in. This afternoon, as I unhappily contemplated writing yet another web app with Twitter authentication in PHP, I finally caved. I did think about trying Django instead, since I'm using Python heavily in the back end, but the examples I found felt a bit too bondage-and-discipline for a UI that I want to rapidly iterate on. So, five years after the rest of the world, here's what I discovered in my explorations.

Fantastic Documentation. I read through the first few chapters of the Rails Tutorial book to get a feel for how things worked. It's task-oriented, hands-on, focused on things I care about and generally nicely written. All of the code and examples Just Worked, and from the footnotes this is obviously an actively maintained and tested guide.

Arrgghh Version Changes. With my confidence boosted by working with the tutorial, I set out to integrate this open-source Twitter OAuth module. None of the directions worked, and some of them didn't even make sense based on what I'd learned. What is 'script/generate'? Is that like 'rails generate'? After digging I realized it was created for Rails 2 and I was on 3. How big a change could that be? I eventually got things working, but I had to hack so many things manually, from the option loading to the database migration, that it was obviously a massive shift in the way things were done. In a way it's really refreshing to find a community so willing to radically clean up things between versions, but it does orphan valuable external projects like this.

Everything's There. It is really nice to have a single, standard way to do most tasks. Since git is the recommended source tool, deploying to Heroku becomes very easy to explain. When things work, they just work. There seems to be One True Way for everything from laying out project files to which source code control system and database to use, and that feels very Apple-esque. The un-hackerish secret I learned from Mr Jobs was that denying people choices saves a lot of confusion and explaining. This is definitely very constraining at times, but when it works it gives a great user experience. I know there's nothing here that I couldn't technically be doing with PHP, but getting it all out of the box and easily google-able (because everyone's using a similar setup) makes a big difference.

I'm pretty happy with my progress today: I have a basic Twitter authentication cycle working, and I'm definitely discovering parts of Ruby that make me smile. I don't know if RoR will become my framework of choice going forward, but it definitely seems a strong fit for a lot of my tasks.

Have you seen this bike?

February 11, 2011 By Pete Warden in Uncategorized

Some of my friends have had their bike stolen, by thieves who broke through two locked doors and cut through a chain to get to it. The bike is pretty unique, an A2B electric 'Metro' model which I very rarely see around the city, so if you're near San Francisco and spot one of these in unusual circumstances or for sale, please get in touch. There's some distinctive orange tape on several parts, and they have a serial number to check if it's the same one.

They live near the top of a steep hill near me, in Haight, and use the bike to get to the Caltrain station for a commute to the valley. It will cost several thousand dollars to replace, so it's a real problem for them. The thieves were pretty brazen, breaking into an occupied apartment building during the night, and didn't take the keys or charger so they'll be lugging it around.

The police are involved, and they'll be scouring for this online too, but since this is such a rotten thing to happen, any help from my readers is much appreciated.

Five short links

February 10, 2011 By Pete Warden in Uncategorized

Photo by Chris in Plymouth

Visualizing Large Facebook Friendship Networks – There’s lots of academic work emerging using social network information. What I find really interesting are the techniques people are developing to make sense of the ‘hairball’ that results from a naive approach to plotting the raw networks, since people have so many friend connections.

What the strange persistence of rockets can teach us about innovation – A really fresh way of looking at technological progress, and a reminder that what seems inevitable now is often actually very path-dependent on the past.

Why did economists not spot the crisis? – The compelling answer is “We don’t reward or encourage people to be generalists”. Academic kudos is only available to hedgehogs who know one thing really well, not foxes who bounce around. I think the skills required to be a generalist are undervalued in the technology world too, and that causes very similar problems.

Africa Rules the World – Some commentary on a slick visualization of growth rates around the world. As he says, it’s a bit misleading because a 20% growth rate in a desperately poor country is not that much in absolute terms, but it does show the dynamism of Africa.

Hilary Mason on NPR – “Everything is interesting”. Bit.ly’s chief scientist does a great job explaining the joys and perils of data.

The American Way of Dating

February 7, 2011 By Pete Warden in Uncategorized

Photo by Brandon Warren

With the (mostly) shared language, it's easy for people from the UK to think that America is basically like Britain, apart from the funny accents. I had a little of that attitude when I moved here, but rapidly learned how wrong I was. With Valentine's coming up, I was reminded of one of the best examples of the alienness lurking under the surface: dating. As Kira Cochrane amusingly chronicled in The Guardian, the British standard is "go to a party, down some drinks, make eye contact with a person you fancy, proceed to kissing and often much more, wake up the next morning to find that you have magically become one half of a couple". It seems like the goal was to avoid any unambiguous declarations of interest, so that at any point either person can end the process without the other losing face.

This isn't how it usually works in the US, at least in the mainstream. The formality and rituals surrounding courtship feel like something out of a Noh play. The very idea of actually asking a near-stranger for a date, explicitly and with no particular preamble, in the full knowledge that you may be turned down, seems nothing short of revolutionary compared to the system I grew up with.

Kira ended up avoiding the rules when she was over here, but even she acknowledges there's a need they're filling. Maybe it's because American culture is so varied that the system has to be so explicit about intentions, since people growing up with radically different backgrounds will never be able to communicate using the subtle signs that the British rely on. There's also something refreshingly honest about the whole procedure. A friend was telling me about her travels in Ireland, and being romanced by a hopeful local man. She discovered he was married, with kids, so she asked if it was an open relationship. "Don't be disgusting, woman!" was the reply.

Eighteen Short Links

February 6, 2011 By Pete Warden in Uncategorized

Photo by Laura Thorne

With my book launch, BigDataCamp and Strata, I’ve accumulated a backlog, so here’s five short links, plus 13!

Gluecon – Eric Norlin knows how to put on a great conference on emerging topics, and the world of integrating different web services, APIs and data sources is one that’s close to my heart. I’m looking forward to seeing the tribe that he gathers in Colorado, and if you’re part of it, you should think about taking up this opportunity to demo your application.

Big Data with Ken Krugler – Ken’s off-the-cuff talk on the pre-electronic US Census was one of the highlights of BigDataCamp for me. This covers a lot of the same ground, but in much more depth. O’Reilly folks, you need to pull this guy on board somehow!

Mapfluence Data Catalog – A well-chosen set of geo and demographic data sets from UrbanMapping. It’s all commercial, which I have no objection to at all, but the lack of obvious pricing means you’ll have to invest time in negotiation with them to decide whether it’s for you. An unlabeled graph doesn’t count.

pipe2py – An intriguing open-source project that takes data flows built in Yahoo Pipes, and converts them into pure Python code. There’s also a quick tutorial available describing how to run the results on Google’s AppEngine.

PeopleSearch – A simple but effective hack, using Google’s custom search APIs to find people’s profiles on major services.

$3m Heritage Health Prize – A fantastic idea, using a Netflix-style data competition at Kaggle to research better ways to predict healthcare needs. There’s some questions around how to best preserve anonymity, but this is such an important goal that it’s worth accepting some small risks on the privacy front.

The O’Reilly Stylesheet – I love reading through stylesheets from different publishers. There’s been a few rules in here I’ve struggled to follow, like referring to a company as ‘it’ rather than ‘they’.

GroundCrew – A simple but effective service for organizing volunteers using cell phones.

Walkshed – There’s a lot of promise in visualizing attributes like walkability and accessibility across cities. A lot of these attributes are really hard to understand unless you devote serious time to exploring the neighborhoods, which made it tough to choose a location when I had to move to San Francisco as an outsider.

Map of Scientific Collaboration – A beautiful view of the citation networks in research papers, presented geographically. The next step is to make these interactive and explorable.

Chequered Airwaves – How the high-brow Czech language radio stations ceded the battle for minds to the less scrupulous German broadcasters in the run-up to the Second World War. This struck me as relevant when we consider the right approach to ignorant populist diatribes, in the debate I keep having with myself about how sensational to go.

Ruby Geocoder – The most recent version of the original Perl Tiger/Line US geocoder, rewritten in Ruby and able to ingest the latest shapefiles.

Hacking Lottery Scratchcards – There’s a whole world of statistical data hacking out there, revealing information that publishers never believed they could possibly be exposing.

Small Business Innovation Research Grants – There’s a massive world of US government money available to startups. The main drawbacks are the almost overwhelming barriers to getting through the initial paperwork, the pernicious influence of managing to please federal managers instead of real customers, and in this case becoming part of the military-industrial complex.

Where the Ladies At? App – I may not like it, but this is probably the future of location-based services. After all, Facebook basically started as a way to stalk fellow students at Harvard.

How the O’Reilly Animals are Chosen – I still have no idea how I got a bull for my cover, but given my childhood in a farming village I can’t complain.

Strata Interview – I talk about the Data Source Handbook on camera. I wasn’t happy with this one; I should have talked about all the cool maps people are building with OpenHeatMap instead of going off into an abstract ramble.

Europe vs the US on Privacy – There’s a strong tradition in Europe of assigning a higher value than the US to privacy relative to freedom of expression and innovation. There’s going to be an increasing clash over this as more and more data sources merge and reveal increasing amounts of personal-but-public information.

Discover public data with the Data Source Handbook

January 31, 2011 By Pete Warden in Uncategorized

I’m pleased to announce that the Data Source Handbook is now available from O’Reilly. It’s a compact ebook guide to the most useful APIs and bulk data sets I’ve found, packed with examples and advice. These are hand-picked services that I’ve actually spent time using during my own work, and I chose them because they add insights and information to data you’re already likely to be dealing with. You can check out the table of contents below, and I’ve also included a couple of excerpts.

It’s organized by the kind of data that you want to look up information on, from websites to locations, email addresses to ISBNs. There’s a whole new world of free or cheap public data out there, and I’ve been having a blast exploring it myself, so I hope you’ll enjoy it as much as I have. A big thanks to everyone who helped me compile this too, from my editors Mike Loukides and Teresa Elsey to all the helpful people on Quora, along with the many friends who emailed me ideas. Keep the suggestions coming; I’ll be working on an updated edition soon.

Websites:

WHOIS
Blekko
bit.ly
Compete
Delicious
BackType
PagePeeker

People by email:

WebFinger
Flickr
Gravatar
Amazon
AIM
Friendfeed
Google Social Graph
MySpace
Github
Rapleaf
Jigsaw

People by name:

WhitePages
LinkedIn
GenderFromName

People by account:

Klout
Qwerly

Search terms:

BOSS
Blekko
Bing
Google Custom Search
Wikipedia
Google Suggest
Wolfram Alpha

Locations:

SimpleGeo
Yahoo
Google Geocoding API
CityGrid
Geo-Coder-US
Geodict
GeoNames
US Census
Zillow Neighborhoods
Natural Earth
US National Weather Service
OpenStreetMap
MaxMind

Companies:

CrunchBase
ZoomInfo
Hoovers
Yahoo Finance

IP Addresses:

MaxMind
Infochimps

Books, films, music and products:

Amazon
Google Shopping
Google Book Search
Netflix
Yahoo music
Musicbrainz
The Movie DB
Freebase

WHOIS

The whois unix command is still a workhorse, and I’ve found this web service a decent alternative too. You can get the basic registration information for any website. In recent years, some owners have chosen ‘private’ registration which hides their details from view, but in many cases you’ll see a name, address, email and phone number for the person who registered the site. You can also enter numerical IP addresses here and get data on the organization or individual that owns that server.

Unfortunately the terms-of-service of most providers forbid automated gathering and processing of this information, but you can craft links to the Domain Tools site to make it easy for your users to access the information.

<a href="http://whois.domaintools.com/www.google.com">Info for www.google.com</a>
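If you need these links for a whole list of domains, a small helper keeps the markup consistent. This is only a sketch: the function name `whois_link` is made up for illustration, and it builds the link text without ever contacting Domain Tools.

```python
from html import escape
from urllib.parse import quote


def whois_link(domain):
    """Return an HTML anchor pointing at the Domain Tools WHOIS page for a domain.

    `whois_link` is a hypothetical helper name, not part of any library.
    """
    # Percent-encode anything unexpected in the domain before using it as a path.
    url = "http://whois.domaintools.com/" + quote(domain, safe="")
    # Escape both the attribute value and the visible text for safe HTML output.
    return '<a href="{}">Info for {}</a>'.format(escape(url, quote=True), escape(domain))


print(whois_link("www.google.com"))
```

Because the user clicks through to the site themselves, nothing here gathers or processes the WHOIS data automatically.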

There is a commercial API available through whoisxmlapi.com that offers a JSON interface and bulk downloads, which seems to contradict the terms mentioned in most WHOIS results. It costs $15 per thousand queries. Be careful though: it requires you to send your password as a non-secure URL parameter, so don’t use a valuable one.

curl "http://www.whoisxmlapi.com/whoisserver/WhoisService?\
domainName=oreilly.com&outputFormat=json&userName=<username>&password=<password>"
{"WhoisRecord": {
"createdDate": "26-May-97",
"updatedDate": "26-May-10",
"expiresDate": "25-May-11",
"registrant": {
"city": "Sebastopol",
"state": "California",
"postalCode": "95472",
"country": "United States",
"rawText": "O'Reilly Media, Inc.\u000a1005 Gravenstein Highway North
\u000aSebastopol, California 95472\u000aUnited States\u000a",
"unparsable": "O'Reilly Media, Inc.\u000a1005 Gravenstein Highway North"
},
"administrativeContact": {
"city": "Sebastopol",
...
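Once you have a response like the one above, pulling out the registrant's location takes only a few lines. The sketch below is a guess at how you might do it in Python: the sample string is a trimmed copy of the fields shown above rather than a full record, and `registrant_summary` is a name invented for this example.

```python
import json

# Trimmed copy of fields from the response shown above, not a complete record.
SAMPLE_RESPONSE = """
{"WhoisRecord": {
  "createdDate": "26-May-97",
  "expiresDate": "25-May-11",
  "registrant": {
    "city": "Sebastopol",
    "state": "California",
    "postalCode": "95472",
    "country": "United States"
  }
}}
"""


def registrant_summary(response_text):
    """Return a one-line location summary, skipping any fields that are missing."""
    record = json.loads(response_text).get("WhoisRecord", {})
    registrant = record.get("registrant", {})
    keys = ("city", "state", "postalCode", "country")
    parts = [registrant.get(key) for key in keys]
    return ", ".join(part for part in parts if part)


print(registrant_summary(SAMPLE_RESPONSE))
```

Using `.get()` throughout means a privately registered domain with no registrant block produces an empty string instead of an exception.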

Blekko

Blekko is the newest search engine in town, and one of its selling points is the richness of the data it offers. If you type in a domain name followed by /seo you’ll receive a page of statistics on that URL.

They are also very keen on developers accessing their data, so they offer an easy-to-use API through the /json slash tag, which returns a JSON object instead of HTML.

http://blekko.com/?q=cure+for+headaches+/json+/ps=100&auth=<APIKEY>&ft=&p=1
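A URL in that shape can be assembled programmatically. This is a minimal sketch that just mirrors the example above: `blekko_json_url` is a hypothetical helper, the slash tags are passed through verbatim, and leaving "/" and "=" unencoded is an assumption made to match the sample URL rather than something the Blekko documentation confirms.

```python
from urllib.parse import urlencode


def blekko_json_url(terms, api_key, slashtags=("/json", "/ps=100"), page=1):
    """Build a query URL in the same shape as the example above.

    The helper name and its parameters are illustrative, not an official client.
    """
    # Slash tags travel inside the q parameter, after the search terms.
    query = " ".join([terms] + list(slashtags))
    params = [("q", query), ("auth", api_key), ("ft", ""), ("p", page)]
    # safe="/=" keeps the slash tags readable, as in the sample URL.
    return "http://blekko.com/?" + urlencode(params, safe="/=")


print(blekko_json_url("cure for headaches", "<APIKEY>"))
```

Note that the placeholder key in the usage line gets percent-encoded; a real key made of letters and digits would pass through unchanged.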

To obtain an API key, email apiauth@blekko.com. Their terms of service are available at https://blekko.com/ws/+/terms, and while they’re somewhat restrictive, they are flexible in practice:

You should note that it prohibits practically all interesting uses of the blekko API. We are not currently issuing formal written authorization to do things prohibited in the agreement, but, if you are well behaved (e.g. not flooding us with queries), and we know your email address (from when you applied for an API auth key, see above), we will have the ability to attempt to contact you and discuss your usage patterns if needed.

Currently, the /seo results aren’t available through the JSON interface, so you have to scrape the HTML to obtain them. There’s a demonstration of that at https://github.com/petewarden/pagerankgraph.

A fundamental bug in HTML5’s Canvas?

January 30, 2011 By Pete Warden in Uncategorized

I still get requests for an HTML5 implementation of OpenHeatMap, so I guess I've done a terrible job of telling people about the Canvas-based renderer I've had in there since it launched. The confusion comes about because I default to Flash if your browser has it installed, since it's usually faster and there's still one problem with the Canvas implementation that I haven't been able to fix.

If you look at the screenshot above, you'll see pale white lines within the states. Those are boundaries between the internal polygons that they're made of, and in the Flash version they don't show up. The fundamental problem is that if you render two polygons that share an edge, Canvas will show a visible join along that edge, whereas Flash will seamlessly meld the two together, with no difference visible if they're the same color. I've put together a minimal page here to show the issue:

http://web.mailana.com/labs/stitchingbug/

The source, along with a Flash project doing the same thing and producing the expected results, is here:

http://github.com/petewarden/stitchingbug

The fundamental issue is that it's impossible to do any complex polygonal rendering if you can't stitch polygons together without seams. I don't know exactly what Flash's fill rules are, but they produce the correct results, as do the 3D renderers I've used in the past. It's cross-browser, which makes it seem deliberate, so any references to the rules used would also be appreciated. Here's the Canvas code:

    var ctx = canvas.getContext('2d');
    ctx.fillStyle = 'rgb(0,0,0)';

    // Left rectangle, ending on the shared edge at x = 50.5
    ctx.beginPath();
    ctx.moveTo(0, 0);
    ctx.lineTo(50.5, 0);
    ctx.lineTo(50.5, 100);
    ctx.lineTo(0, 100);
    ctx.closePath();
    ctx.fill();

    // Right rectangle, starting on the same shared edge
    ctx.beginPath();
    ctx.moveTo(50.5, 0);
    ctx.lineTo(100, 0);
    ctx.lineTo(100, 100);
    ctx.lineTo(50.5, 100);
    ctx.closePath();
    ctx.fill();

Anybody have any insights on this? I'd love to deprecate the Flash version, but I need to understand what's going on here, and I'm at a dead end. I'd love to hear there's something obvious I'm doing wrong.

Update – Thanks for the suggestions. I purposely simplified the example to avoid alpha, but that's the issue that most of the 'stroke()' approaches I'd already tried hit. I've included code to demo that below. I'm really not trying to be a jerk about this, honestly I'd love to know that I'm an idiot, as long as I find a solution. The public humiliation will be worth it, I swear.

    ctx.fillStyle = 'rgba(0,0,0,0.5)';
    ctx.strokeStyle = 'rgba(0,0,0,0.5)';

    ctx.beginPath();
    ctx.moveTo(0, 0);
    ctx.lineTo(50.5, 0);
    ctx.lineTo(50.5, 100);
    ctx.lineTo(0, 100);
    ctx.closePath();
    ctx.fill();
    ctx.stroke();

    ctx.beginPath();
    ctx.moveTo(50.5, 0);
    ctx.lineTo(100, 0);
    ctx.lineTo(100, 100);
    ctx.lineTo(50.5, 100);
    ctx.closePath();
    ctx.fill();
    ctx.stroke();
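One plausible reading of the seam, offered as an assumption rather than a confirmed account of any browser's renderer: if each fill is antialiased on its own by how much of a pixel it covers, and the results are blended with ordinary source-over compositing, then two polygons that each cover half of a pixel never add up to full coverage. The Python sketch below just works through that arithmetic for the pixel column straddling x = 50.5. The function names are invented for the illustration.

```python
def source_over(dest_alpha, src_alpha):
    """Standard source-over blend of a new alpha onto an existing one."""
    return src_alpha + dest_alpha * (1.0 - src_alpha)


def seam_alpha(edge_x, pixel_x, fill_alpha=1.0):
    """Alpha of the pixel starting at pixel_x after filling both polygons.

    Assumes (hypothetically) that each polygon contributes
    fill_alpha * coverage, where coverage is the covered fraction of the pixel.
    """
    left_coverage = min(max(edge_x - pixel_x, 0.0), 1.0)
    right_coverage = 1.0 - left_coverage
    alpha = 0.0
    for coverage in (left_coverage, right_coverage):
        alpha = source_over(alpha, fill_alpha * coverage)
    return alpha


print(seam_alpha(50.5, 50))        # opaque fill, edge in the middle of the pixel
print(seam_alpha(50.0, 50))        # opaque fill, edge on a pixel boundary
print(seam_alpha(50.5, 50, 0.5))   # half-transparent fill, as in the update
```

Under these assumptions the opaque case leaves the shared pixel at 0.75 alpha instead of 1.0, which would show up as a pale line, while an edge on a whole-pixel boundary reaches full coverage. With the 0.5 alpha fill the seam comes out lighter than the 0.5 interior, and adding a stroke of the same color would push it the other way by double-covering the edge.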

Trunk.ly and Egypt

January 29, 2011 By Pete Warden in Uncategorized

Photo by Al Jazeera

We have to be careful not to project our own obsessions onto the rest of the world. As one Egyptian commentator said; "it's not about you". Technology has clearly played a role in the protests, but I'm betting that cell phones and television are a lot more influential than social networks.

Still, it's been amazing to watch how Twitter has been used to spread the word, especially to the outside world. The only trouble is that there's so many links and comments, it's almost impossible to follow any one topic amongst the volume of messages. That's where the Trunk.ly curation service comes in.

As I was chatting to the founders Tim and Alex yesterday, they pointed me at one of their users, ExiledSurfer. Since the protests began he's been collecting a massive number of links and reports from inside the country and relaying them through his Twitter stream. When people get in touch asking for information, he points them at his trunk.ly stream that gives his links "with no noise". With the ability to search and categorize by tags, and a simple view showing just the links and snippets of information about them, it's a much better way of documenting the material than just Twitter.

With over 5,000 links in his archive, he's turned his messages from an ephemeral stream into something more like a library of reference material, all without having to do anything more than use Twitter the way he always has. This is a case where technology has enabled a single person to become vastly more effective as a news broker than we could have imagined just a few years ago.

I've been watching the progress of Trunk.ly since before it was a glimmer in Tim's eye, thanks to his wonderful weekly blog chronicling their startup progress as it happens. With a site that's headed into the top 20,000 worldwide according to Alexa, and over seven million links collected, it feels like there's a lot of people who like their curation model. I'm looking forward to seeing how it helps us all organize our knowledge, and maybe play a small part in spreading the word in situations like Egypt.

OpenHeatMap now supports Hong Kong and Canadian constituencies

January 29, 2011 By Pete Warden in Uncategorized

Thanks to some very helpful folks at the University of Hong Kong, I've been able to add support for parliamentary constituencies in both Hong Kong and Canada. Like all OpenHeatMap areas, all you need to do is list the names of the constituencies and the values you want to display in a CSV file, Excel spreadsheet, or a Google document. Upload it, choose your display options, and you can create your own interactive political maps.
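If your numbers live in a script rather than a spreadsheet, you can generate the CSV directly. The sketch below is illustrative only: the column headers `constituency` and `value` are placeholders I've assumed, the values are made up, and the exact headers OpenHeatMap expects should be taken from the documentation and example files.

```python
import csv
import io

# Made-up values for a few Canadian ridings, purely for illustration.
rows = [
    ("Vancouver Centre", 42.0),
    ("Ottawa Centre", 37.5),
    ("Halifax", 18.2),
]


def constituency_csv(rows, name_header="constituency", value_header="value"):
    """Return CSV text with one constituency name and one value per line.

    The default header names are assumptions, not confirmed OpenHeatMap columns.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow([name_header, value_header])
    writer.writerows(rows)
    return buffer.getvalue()


print(constituency_csv(rows))
```

Save the returned text to a `.csv` file and upload it the same way you would a spreadsheet.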

For more information and example files to download, go to the documentation sections on Hong Kong or Canada. I look forward to seeing what you build with these, so let me know how you get on.


