Is Michigan more beautiful than Italy?

Photo by Rachel Kramer

I can now officially pronounce Michigan the fifth most beautiful place in the world!

With the launch of Jetpac, my big data science job is identifying the photos you'll find most inspiring. I've been exploring the 50 million captions you've shared with us so far, trying to identify patterns, and it really is the most fun part of my day! There are so many surprises hidden in the data, but one of the biggest came when I calculated the places where people were most likely to use the word 'beautiful' in their captions (there's a sketch of the calculation after the list):

#1 Sedona, 10.8x

#2 Cabo San Lucas, 9.7x

#3 Lake Victoria, 6.6x

#4 Amarillo, TX, 6.2x

#5 Michigan's Upper Peninsula, 6.0x

#6 Algarve, Portugal, 5.9x

#7 Montevideo, Uruguay, 5.9x

#8 Bath, UK, 5.9x

#9 Florence, Italy, 5.8x

#10 Hood River Valley, Oregon, 5.2x
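The 'x' figure is just how much more often 'beautiful' appears in a place's captions than in captions overall. Here's a minimal Python sketch of that calculation, using a few made-up captions since the real data set obviously isn't included:

from collections import defaultdict

# Made-up (place, caption) pairs standing in for the real photo data.
captions = [
    ('Sedona', 'What a beautiful sunset'),
    ('Sedona', 'Hiking the red rocks'),
    ('Florence', 'Beautiful view of the Duomo'),
    ('Florence', 'Waiting in line at the Uffizi'),
]

total_count, total_hits = 0, 0
place_counts = defaultdict(int)
place_hits = defaultdict(int)
for place, caption in captions:
    hit = 1 if 'beautiful' in caption.lower() else 0
    total_count += 1
    total_hits += hit
    place_counts[place] += 1
    place_hits[place] += hit

# A place's multiplier is its rate of 'beautiful' captions divided by
# the rate across all captions everywhere.
baseline = float(total_hits) / total_count
for place in place_counts:
    rate = float(place_hits[place]) / place_counts[place]
    print('%s: %.1fx' % (place, rate / baseline))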

A lot of those made perfect sense (who doesn't love Sedona or Lake Victoria?), but I had to triple-check my calculations when Michigan showed up! How did the world center of trashed building photography end up in fifth place, above Florence?!

As I looked through my friends' photos on Facebook and the public ones on Flickr, it all started to make a lot more sense. The area around Lake Michigan, and the Upper Peninsula in particular, is full of stunning scenes, with storms, massive skies and cliffs producing amazing shots.

Photo by Kevin Dooley

The numbers don't lie! Now I just need to see if I can get a free vacation from the Michigan Tourist Commission; I'm itching to see the place for myself after being sucked into all the photos I've had to review. I've discovered somewhere new in the world that I'm dying to visit, even though I'd never have thought of going there in a million years.

If you're as interested in playing with the data as I am, I've also put up a tool where you can find the distribution of any word that shows up a lot in our 50 million photos, including 'beautiful'. I'd love to hear what patterns you find.

About that awesome video

No, not my one from yesterday, the product demo we have on the front page of Jetpac! I wasn't sure we should be spending time and effort on something that seemed non-critical, but it turned out to be incredibly useful in explaining what we're doing and persuading users to try us out. I spent years working on high-end video software, so I tend to be very critical of production work, but the producer Mike Kaney did an amazing job, even on very tricky elements like the reflections. What's even more impressive is that he achieved all this on a startup budget! If you're interested in getting something made for one of your own projects, check out his Rockbridge production company and tell him we sent you.

I have to mention the star performance by Jetpac's very own mad marketing genius Stephanie Southerland. Despite no background in acting, she's apparently a natural performer, even on top of a building in her swimwear on a freezing-cold November day.

How to persuade users to sign up with Facebook Connect

Photo by Poppy Thomas-Hill

A friend told me that she was encountering a lot of people who liked the idea of her new service, but were put off by the idea of using Facebook to connect. She asked me how we tackled this issue for Jetpac, so I thought I'd put together a quick summary of what we've found to help her, and anyone else who's struggling.

In my experience there are three kinds of potential users, who all require different approaches:

Facebook Negatives

Especially in the tech world, there's a small but real group of people who either don't have Facebook accounts at all, or who use them minimally. I'm not a heavy Facebook user myself and understand a lot of their reasons, so I just try to make it clear that we plan to support other services like Flickr and Instagram in the future, and leave them in peace.

App Enthusiasts

At the other end of the spectrum, there's a good number of people who don't have any concerns about adding new applications. This population is slowly shrinking, thanks to the feedback from friends who are annoyed when they get spammed, but they're still out there. That means if you can't persuade anyone to sign up for your service, then you're doing really badly and should triple-check your messaging.

Persuadable Skeptics

The biggest group are those who may be willing to connect, but want some reassurance before they do. Here are the things we've found help convince them:

Clear message – The name, tagline and copy on the website and in any ads need to make it very clear why Facebook is needed. People are very wary of signing up if they don't understand why your site needs access to their social network. A friend even suggested putting in an extra dialog before the external permissions page, spelling out exactly what the benefits of connecting are, and why it's necessary for your service, which we hope to try out soon.

Minimal permissions – Cut down the number of different permissions you're asking for to the absolute minimum. As an example, we originally asked for feed-posting permissions just so we could easily support an in-app way for users to comment on their friends' photos, but experience with spammy applications that abuse that power made many of our early users refuse. We reworked the feature to use a Facebook widget for commenting instead, so we didn't have to ask for posting rights, and our conversion rate went way up.

High production values – It sounds superficial, but a professional design for your site is essential. People are looking for any clues about your trustworthiness, and the fact that you've put a lot of effort into the look of your site reassures them that you aren't a scam. Sometimes judging a book by its cover is a useful heuristic.

Personal touch – High production values don't mean adopting a distant, corporate voice; that's guaranteed to put people off. Most likely you're a small team of enthusiasts like us, so use that as a strength and put yourselves front and center. Seeing that the team is proud of what they've built and willing to stand behind it makes a world of difference to wavering users. It's also a helpful culture-building tool internally: if the team knows they're putting their own reputations on the line, they'll be extra careful about protecting user information.

The data geekery behind Jetpac

My new startup has just gone public, and I wanted to talk a bit about the data geekery behind the consumer experience, and some of the technical and privacy challenges, so I threw together a quick video cast. You can get more information on the project over on the company blog, and by following us on Twitter. I'm pretty excited to finally be able to talk about what I've been working on for the last six months!

Don't forget to check out my new visualization too: an interactive word map of 50 million photo captions.

Five short links

Photo by Michael Donovan

Place graphs are the new social graphs – Fascinating work by Matt Biddulph, looking for geographic analogies (for example, tell me the neighborhood in New York that's most similar to Noe Valley in San Francisco).

Yet another government portal to ignore – Though they're a massive step forward conceptually, most government open data efforts are crippled by terrible usability. I still find myself digging through the FTP server for the US Census, after failing to navigate their web interfaces.

Angels of the Right – There have been a lot of attempts to produce graphs showing networks of influence, but this is by far the most approachable and informative I've seen. It's actually useful for discovering things! Even better, Skye's helped package the code behind it into an open-source framework called nodeviz.

Data Illumination – An intriguing new data blog that's just started. There's not much content yet, but I like what's there so far, and the more readers and commenters it attracts, the more likely it is to keep growing.

Shuttle Radar Topography Mission – Free elevation data for the entire world, with samples as close as 30m in the US and 90m for the rest of the world.


Communists in Space, and now on the Kindle

Picture by Joseph Morris

When I was eight years old, I found a book in my brother's room about nuclear war. In it was a map showing the likely British targets of a Soviet nuclear strike as circles. I grew up in East Anglia, surrounded by American air bases, so everywhere for miles around was such a solid mass that you couldn't even see the individual circles. This so terrified me that I made excuses for years to avoid going into the nearby city of Cambridge; I had such a vivid picture in my head of roasting alive as the air caught fire.

A two-week school visit to Russia just before the fall of the USSR gave me a glimpse of the grim and tawdry reality of the Soviet system (brown fruit juice, anyone?), but the idea of communists as terrifying bogeymen has never really left me. I've had a strange fascination, an impulse to understand how people ended up in such a twisted state, that's led me to read up on the early Soviet era, especially Stalin's particularly demonic rule. As I've got older I've also tried to understand what drove well-intentioned people to support terrible actions, and the humanistic resistance of others like George Orwell.

That all left me a prime audience for Ken MacLeod's Fall Revolution series. I first came across The Star Fraction by accident, but was immediately captured by a very British near future, inhabited by people I recognized. Trotskyite militants battle the Animal Liberation Front, a quasi-Richard-Dawkins summons familiars to attack enemies from his Seastead, and a combined UN/US 'peacekeeping force' has suffered the ultimate mission creep and runs the world from its space weapons platforms. Running through the book is a Communist conspiracy theory that blows the tired Templar myths out of the water, because it's based on historical templates that actually happened. Communists truly ran effective underground organizations for decades and overthrew governments, so for someone with MacLeod's knowledge of the movements (here's his take on Orwell in context) there's rich material to choose from.

In case this sounds too stuffy, it's fundamentally an adventure story with pleasant echoes of Neuromancer; it's not heavy reading. The only thing that has surprised me is how little attention it ever received; people seem far more focused on later books like his Cosmonaut Keep series. The Star Fraction was one of those novels that stuck in my head, and since my paper copy is still in storage in the UK, I've been hoping for an ebook version so I could justify buying it again. When I saw Ken announce that one of his more recent books had just been released electronically, I went back to search for a copy of The Star Fraction and finally found one for the Kindle, bundled with The Stone Canal as Fractions: The First Half of the Fall Revolution. I'm now a few chapters in and it's every bit as good as I remember, popping with wild ideas and a refreshingly different angle on the world.

Since I didn't see the news appear on Ken's blog, and he didn't know about it when I hassled him on Twitter a few months ago, consider this a public service announcement: The Star Fraction is available as an ebook! If you find the idea of Communist Conspiracies in Space at all intriguing, buy it now; you won't be sorry.

How to enter a data contest – machine learning for newbies like me

Photo by John Carleton

I've not had much experience with machine learning; most of my work has been a struggle just to get data sets that are large enough to be interesting! That's a big reason why I turned to the Kaggle community when I needed a good prediction algorithm for my current project. I wasn't completely off the hook though; I still needed to create an example of our current approach, limited as it is, to serve as a benchmark for the teams. While I was at it, it seemed worthwhile to open up the code too, so I've created a new Github project:

https://github.com/petewarden/MLloWorld

It actually produces very poor results, but does demonstrate the basics of how to pull in the data and apply one of scikit-learn's great collection of algorithms. If you get the itch there's lots of room for improvement, and the contest has another two weeks to run!

Installing scikits-learn

Before you can run the Python scripts, you'll need to install the scikits-learn machine-learning framework. Here are the instructions.
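For most setups, the install itself boils down to a single command, assuming you already have Python and pip; note that the package has been published as both scikits.learn and scikit-learn depending on the version, so check which name your release uses:

pip install scikit-learn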

It's also worth checking out the tutorial and their other guides; they've written some great documentation.

Getting the code

To pull the latest copy of this code and enter the directory run these commands:

git clone git://github.com/petewarden/MLloWorld.git

cd MLloWorld/

Creating a model

Before you can predict unknown values, you need to train up the algorithm with example data. I've packaged a set of 40,000 items as a CSV file, with each column representing an attribute of the original photo albums. You'll need to run these through the training script to build a model that can be used for prediction. Here's the command:

python train.py training_data.csv storedmodel

That may take ten or twenty minutes to run, but at the end you should have a file called storedmodel in the current directory.
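If you're wondering what a script like train.py has to do, here's a stripped-down sketch of the general shape. This isn't the repo's exact code; the choice of classifier and the assumption that the last CSV column holds an integer target label are mine:

import csv
import pickle
import sys

from sklearn.ensemble import RandomForestClassifier

input_path, model_path = sys.argv[1], sys.argv[2]
features, targets = [], []
# Assumes a headerless CSV with the target label in the last column.
with open(input_path) as input_file:
    for row in csv.reader(input_file):
        features.append([float(value) for value in row[:-1]])
        targets.append(int(row[-1]))

# Fit a straightforward classifier on the whole training set.
model = RandomForestClassifier()
model.fit(features, targets)

# Serialize the fitted model so the prediction script can reuse it.
with open(model_path, 'wb') as model_file:
    pickle.dump(model, model_file)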

Predicting results

Now that you have a model built, you can take the test set of data and predict their values:

python predict.py test_data.csv storedmodel > results.csv

This will also take a few minutes, but at the end you'll have a CSV file containing a list of the album ids and a prediction for each one. It's in the right format to submit to Kaggle, and if you look for the 'Full scikit-learn example' in the benchmarks at the bottom of the leaderboard, you'll see how this simple approach scored:

http://www.kaggle.com/c/PhotoQualityPrediction/Leaderboard

As you can see, it's not that great! If you modify the code and think you've improved its predictions, you can create a team and submit your new results to find out how well you've done. There's already stiff competition from the current teams of course!

http://www.kaggle.com/c/PhotoQualityPrediction/Submissions
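If you want to peek inside before you start hacking, the prediction side is even simpler. Here's a sketch in the same spirit as the training one, again not the repo's exact code, with the column layout as an assumption:

import csv
import pickle
import sys

input_path, model_path = sys.argv[1], sys.argv[2]
with open(model_path, 'rb') as model_file:
    model = pickle.load(model_file)

# Assumes the first column is the album id and the rest are features,
# and writes one 'id,prediction' line per row to standard output.
with open(input_path) as input_file:
    for row in csv.reader(input_file):
        album_id = row[0]
        features = [float(value) for value in row[1:]]
        prediction = model.predict([features])[0]
        print('%s,%s' % (album_id, prediction))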

Notes on the internal data format

The trickiest part for me was getting the data into a format that scikit-learn's functions could understand. Because the CSV stores which words occurred for an album, the full row vector for each one could be thousands of entries long, most of them zero. To speed up the training and save on memory, I used scipy's sparse matrix class, coo_matrix, to store the results. You can see the sort of unpacking I do in the expand_to_vectors() function in mlloutils.py.
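To give a flavor of what that unpacking involves, here's a simplified sketch rather than the actual expand_to_vectors() code. Each album contributes non-zero entries only for the word ids it contains, and coo_matrix assembles the result without ever building the dense rows:

from scipy.sparse import coo_matrix

# Made-up input: for each album, the ids of the words that appeared in it.
albums = [[2, 5], [0], [1, 2, 7]]
vocabulary_size = 8

rows, cols, values = [], [], []
for album_index, word_ids in enumerate(albums):
    for word_id in word_ids:
        rows.append(album_index)  # matrix row: which album
        cols.append(word_id)      # matrix column: which word
        values.append(1.0)        # presence flag for that word

# Build the sparse matrix directly from the coordinate lists.
matrix = coo_matrix((values, (rows, cols)),
                    shape=(len(albums), vocabulary_size))
print(matrix.toarray())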

[Update – Big thanks to Olivier Grisel who vastly improved the results by fixing some errors in the CSV reader and picking a more accurate and much faster classifier. I've integrated his changes, and now see a score of 0.44, which still puts it at the bottom of the leaderboard but is at least respectable!]

Why your startup should use data competitions

Photo by Brett Jordan

When I first came across Kaggle last year I loved the idea. They run Netflix-style data competitions as a service, and by linking real-world problems with the researchers who can solve them, everybody wins! The only thing that surprised me was that I didn't see startups creating contests, it seemed like a perfect way to bring in some amazingly talented helpers on a tight budget. It stayed on my mind, and when I hit a tough prediction problem with my current company, I knew I wanted to give Kaggle a try.

Happily I've got to know Anthony, Jeremy and the team quite well since I moved to San Francisco, so they were extremely helpful when I turned to them (and they cut me a great deal at starving-startup rates!). I first reached out on Wednesday, and we had the competition live on Saturday morning:

Photo Quality Prediction

With a prize pool of $5,000, we've attracted over thirty teams in less than two days! The results are already very promising, and there are still three weeks left.

If you're a startup, have a look at the amount of intelligent and enthusiastic help we're getting, and think about the problems you face in your business. You want to be focused on your product, so why not get help from experts on the machine-learning side? I bet they'll do a much better job than you have time for, and it won't break the bank. Kaggle's help has meant it's taken very little of my time as well, freeing me up to work on our core technology.

In my next post I'll walk you through exactly what it took to set up the competition through Kaggle's site, but go check it out now, and picture what you could do with dozens of machine-learning ninjas on your team.

Why we need an open-source geocoding alternative to Google

Photo by Marc Levin

You can't use Google's geocoding for anything but map display! I've always been surprised by how many services rely on the Google Maps API for general address-to-coordinate translation, despite it being prohibited unless you're displaying the results on one of their maps. Google have provided some fantastic resources for geo developers, and they've moved the whole field forward, but we can't rely on them for everything. The recent changes to their terms of service have alerted a few people to this long-standing issue, so here are the alternatives I've discovered over the years, and why I think you should look into open-source solutions.

Yahoo

The easiest change for an application developer is to use one of Yahoo's excellent geocoding APIs, either Placefinder for street addresses, or Placemaker for more unstructured names of places like towns, provinces or countries. There are no restrictions on how you use the data, and you get 50,000 requests a day. It has good coverage worldwide (though I recently noticed an issue with Finland).

The biggest downside is that they clearly have an uncertain future. Yahoo hasn't managed to monetize their awesome developer APIs, and most of the engineers involved in setting them up have left. It's nerve-wracking to build your application on an API that could disappear at any point!

Schuyler Erle

Schuyler is a one-man open-source-geocoding machine! He wrote the original Perl module for taking US Census data and looking up addresses, and also created an updated Ruby version for the Geocommons folks. I've found it works impressively well on US addresses. The biggest drawbacks are the requirement that you download and import many gigabytes of US census data before you can set it up on your own machine, and a lack of international coverage.

Nominatim

OpenStreetMap has created the Nominatim project for converting addresses into coordinates using its open-source collection of mapping information. Unfortunately it's way too logical for its own good, expecting to receive addresses that are strictly hierarchical. For example, it can't understand "40 Meadow Lane, Over, Cambridge CB24 5NF, United Kingdom", you have to mangle it to something unnatural like "40 Meadow Lane, Over, Cambridgeshire, England" before it starts to parse it, and even then it picks the wrong one as the first result. It also generally doesn't know where numbers fall on particular streets, since it relies on landmark points like pubs with numbers attached, and these are generally very sparse.
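You can poke at this behavior yourself through the public Nominatim endpoint, whose search API takes a free-form q parameter and can return JSON. Here's a quick Python 2 sketch; treat the exact response fields as an assumption and check the Nominatim docs if they've changed:

import json
import urllib

address = '40 Meadow Lane, Over, Cambridgeshire, England'
params = urllib.urlencode({'q': address, 'format': 'json'})
url = 'http://nominatim.openstreetmap.org/search?' + params
# Each result should include a display name and latitude/longitude strings.
for result in json.load(urllib.urlopen(url)):
    print('%s (%s, %s)' % (result['display_name'], result['lat'], result['lon']))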

Data Science Toolkit

Since I couldn't find anything that met my needs, I decided to take a shot at pulling together a lot of the existing resources into a more convenient package. I took Schuyler's work on the TIGER/Line data for US addresses, and used some of the Nominatim backend code with a more flexible front-end to handle a wider range of postal addresses. I then rolled up a couple of virtual machine packages so you don't have to do the messy data importing yourself; you can grab it as an Amazon AMI or a VMware image. You can also get started using the main datasciencetoolkit.org site through the API, but I wouldn't recommend it for heavy use since it's just a single machine.
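As a taste of the interface, here's the same sort of Python 2 sketch against the street2coordinates endpoint on the public server; the response layout shown here is from memory, so double-check it against the documentation before relying on it:

import json
import urllib

# street2coordinates returns a JSON dictionary keyed by the address you sent.
address = '2543 Graystone Place, Simi Valley, CA 93065'
url = ('http://www.datasciencetoolkit.org/street2coordinates/' +
       urllib.quote(address))
response = json.load(urllib.urlopen(url))
info = response[address]
print('%s, %s' % (info['latitude'], info['longitude']))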

Its main limitation is that it only handles US and UK addresses. The UK lookups are all done through OpenStreetMap data, so it should be possible to extend it worldwide given enough work; I just haven't been able to devote enough time to that. I'd love to see someone extend the current code though, or improve a different project like Nominatim, or even start a whole new one. There's already enough data out there to build a truly open API for geocoding, so let's make it happen!

Analytics adventures with Pig and Cassandra

Photo by Jes

After a lot of head-scratching, I ended up choosing Cassandra as the main data store for my latest project. I needed a system that could handle a loading process throwing hundreds of thousands of items a minute at it, denormalized across multiple indexes, whilst simultaneously serving up results to a web application, and so far it has performed magnificently. 

Unfortunately it has ended up being a victim of its own success. Now that we have tens of millions of pieces of user-generated content, we want to ask the data questions. That's not so easy with NoSQL, so here's some notes on the solution I ended up building.

Hadoopable

The only way to run code across data held on a large cluster of machines is to execute the processing on a similarly large cluster, preferably on machines that already have local copies of the data. That means using Hadoop, and since I'm using DataStax EC2 machine images for my Cassandra servers anyway, I started by trying to add some Brisk AMIs to my existing cluster. This is a Hadoop distribution, pre-configured to integrate with Cassandra, designed for exactly what I was hoping to do. Unfortunately I struggled to figure out the right startup parameters to get the newly-created machines talking to my existing cluster, despite some excellent help from Joaquin. I took a break from that, and discovered that hand-building the basic Cassandra components I needed wasn't too hard on my existing Hadoop machines, so I continued with more of a home-brew setup.

Pig or Hive?

In order to run Hadoop jobs on Cassandra data, you need a fast way to pull it out from your tables into a supported coding environment. The only two supported languages for this in the current Cassandra releases are Pig and Hive. Pig is a procedural data-transformation language, whereas Hive looks a lot like SQL. I think a lot more procedurally and I needed something that could handle some tough unpacking and formatting tasks, so I went with Pig.

Squeal!

I don't know if my experiences would have been any better with Hive, but I found I was walking a fairly lonely path using Pig in conjunction with Cassandra. Jeremy Hanna was a life-saver, and Brandon Williams put in a lot of hard work to get me up and running, but my initial encounter involved a couple of days of me tearing my hair out. I was trying to make some sense out of the results I was seeing on the latest stable release of Cassandra, but they left me baffled. It turned out that the recent introduction of types into the Cassandra adapter had broken all the existing example code, and left the schema reported for the data very different from the actual structure that was returned. Happily I was able to monkey up a messy patch, which Brandon then fixed properly, but it definitely made me realize how far out on the bleeding edge I was. That's a place I usually try to avoid for mission-critical projects!

The right tool for the job

With that overcome, I was able to move forward, but not as speedily as I had hoped. Pig's great strength is that it's a domain-specific language for data processing. It's a big bag of useful operations, with no particular grand design for the language. It reminds me of PHP or R, and I don't mean those comparisons as an insult; I have a fondness for this sort of language. When you're working inside their domain they're extremely productive: you almost never need to install extra dependencies, and everything's at your fingertips. Sadly, I found I was operating a bit outside of the mainstream for Pig.

Pig wrestling

As an example, I store a large array of records per user. In theory, I could store each item as a new row in Cassandra, with a secondary index for the user key, but in practice the performance and storage overhead of that approach rules it out. A CSV string is a simple but effective way of holding the data, so when I'm running a Pig script, I need to decode that string back into records. There is a smart CSV loader in the latest release, but it only works when you're reading in files, not on strings you've already loaded. To make the job easier, I reloaded all my data in a format that's a lot easier to parse (stripping out any line terminators or separator characters inside quoted strings, for example, as shown in the sketch below) and then set out to do the job using Pig's built-in primitives.
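To make that packing step concrete, here's a hypothetical Python sketch of the kind of sanitizing I mean. Once the separator characters can never appear inside a field, unpacking needs nothing smarter than split():

# Made-up records, as they might be packed into one Cassandra column value.
records = [
    ['photo123', 'A day at the beach', '42'],
    ['photo456', 'Storm over the lake', '17'],
]

def sanitize(field):
    # Remove anything we plan to use as a separator, so a plain
    # split() is guaranteed to recover the original structure.
    for separator in ('\n', '\r', ',', '|'):
        field = field.replace(separator, ' ')
    return field

packed = '|'.join(','.join(sanitize(f) for f in record)
                  for record in records)

# Unpacking is just two splits, with no quoting rules to worry about.
unpacked = [row.split(',') for row in packed.split('|')]
print(unpacked)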

I thought I could just use STRSPLIT to break up my strings into individual rows, but it turns out that it only returns a tuple. This is bad because turning a tuple into separate records for further processing is pretty involved at best. What you really need is what Pig calls a bag, an unordered set of records that can be easily turned into a proper stream of records. TOKENIZE is almost identical to STRSPLIT and returns a bag, so I thought I was in luck. Unfortunately it doesn't take a parameter allowing you to specify what characters to split the string on. Undaunted, I thought I should contribute something back to the code base and create a patch. I dusted off my Java neurons, and created the changes I needed. That let me finally parse the CSV files I was dealing with, but I was stuck as I tried to figure out how to clean up the code to offer it as a patch.

The problem is that my new TOKENIZE takes an optional second parameter to specify the custom characters to split on. I needed to keep the existing behavior for a single-parameter call to avoid breaking old scripts. That's easy enough to handle in the execution code, where I just check the number of arguments passed in, but there's also a separate function that exposes the signature. That doesn't support variable numbers of arguments, as a known limitation. The suggested workaround is to remove the signature definition entirely, but since that's presumably there for a reason, it didn't seem a sensible approach. In the end I was stymied, but since I could move forward with a slightly custom branch of Pig, I reluctantly abandoned the patch.

Happily ever after

I don't want to sound too negative about my experiences; now that I've got the basics set up, I'm able to write scripts very quickly and answer all sorts of questions. It's also amazing to think about the power all this free software unleashes, and the generosity of the community who helped me out as a newbie. If you're considering a similar project though, I would either budget for more research time than you might expect, or track down a native guide, somebody who has already done something similar in a production environment.

[Update – Jon Coveney added a great comment explaining more about how to solve the TOKENIZE issue I hit, so I'm including it below, and I'll do another post when I give it a try]

Hey Pete, thanks for writing about your experience. It's a goal of mine to find a project to use Cassandra for, and I'm sure I'll be walking through many similar problems. 

Just wanted to note something about the TOKENIZE piece. Pig can be a bit weird about variable arguments, but in this case, it shouldn't be too bad. You have two options. 

1 is to have an optional constructor which takes one parameter. You could then do 
DEFINE mytokenize TOKENIZE('*'); 

now you could just use mytokenize normally. Implementing this is as difficult as implementing the constructor. 

Another option is that you can have pig take 1 or 2 arguments. The limitation you pointed to doesn't actually apply here…that limitation is that if you have a function that takes a variable number of arguments _that also takes arguments of different types_ (this part is key) THEN you can't do it. [1] 

In your case, tokenize always takes a string, and an optional string delimiter. I personally would go the constructor route, but either is fine 🙂 I usually write UDF's with an initialize method instead of doing it all constructors anyway, so you would only have to check the number of arguments once. 

Happy hacking 
Jon 

[1] To explain where this applies, think of SQL's coalesce function. This is a function that can take both arguments of different types, and varying numbers of them. So you could do coalesce(1,2,3,4,5) or coalesce('hey','you','get','out') or whatever. Pig does not allow you to do this. With the getArgToFuncMapping, you can map to a function that takes a fixed number of arguments. Or you can have a varying number of arguments, but it can't be sensitive to the type of the input…while you can still then implement something like coalesce, it's going to be slow.