Five short links

Photo by Broken Simulcra

Email Data Source – These guys had a cunning idea: listen in to commercial mailing lists by subscribing to them. They then analyze all of the data they gather to build a detailed picture of different industries' and companies' email marketing. It surprised me at first, but a lot of the companies I talk to find their email lists are their most effective marketing channels despite their distinct lack of trendiness, so I'm pleased to see someone innovating around them.

Brien Lane, Melbourne – This Australian alley has been covered with charts representing real demographic information from the area. I love seeing visualization like this out in the real world, and it makes me want to visit. Here are some more photos.

Clue is a renewable resource – This reminds me so much of my experiences at Apple. I spent over a year battling their legal department to honor an agreement we'd made when I joined, to allow me to just fix bugs in the same open-source project that had got me hired. A good friend spent a lot longer trying to get them to sign off on an Objective C mode he'd built for Emacs, and as far as I know still hasn't succeeded in releasing that simple config into the wild with the company's blessing. And Apple is actually one of the good guys when it comes to open-source, so I can only imagine what some other places must be like.

Chartbeat for the ChatRoulette site – I've been using Chartbeat on one of my own sites recently, but actually seeing it running on a site with serious numbers of visitors makes its power a lot clearer.

Official Seattle crime map – While it's nowhere near as slick as others like the San Francisco Crimespotting map, I'm impressed to see a city government produce one of these for itself. Hopefully more official bodies will see the advantages of making data available in an easy-to-use form like this.

Five short links

Portraits from the Congo at 50 – An astonishing collection of photos showing people living in the DR of Congo, together with short stories talking about their lives. Anyone who’s read In the Footsteps of Mr Kurtz will understand what a hell-on-earth the Congo has been for the last hundred years, but the tenacity of people determined to keep living their lives is amazing.

Conspiratorial Thinking – The best explanation I’ve seen of why otherwise-smart people can go spectacularly wrong when they only have a superficial understanding of a domain. The other side of any argument rarely consists of idiots and crazy people, so when I find myself asking “how could they be so dumb?”, it’s usually a sign I’m missing something important.

Mountain Lion Kittens in the Santa Monica Mountains – Liz was lucky enough to see the back end of a lion disappearing down a trail when we lived in LA. I never saw one myself, but always felt amazed to be living in a place so wild it still had them roaming free.

Data sets for data mining – A good list of high-quality sources of large data sets

Goin’ down that road feeling bad – At the start of this song Woody Guthrie talks about its creator, how “he wrote this song… or got it started”. The dominant model of the 20th century was the ‘auteur theory’, trying to find a single person to focus on as the sole driving force behind any project, but I felt the way Woody phrased it there captures a lot more of the reality of the creative process. Everything worthwhile I’ve been involved in has taken both a crazy person to start things rolling and a lot of people to join in and actually build it. I feel a post about “folk coding” coming on, because the open source world seems to have a lot in common with the way traditional music was passed around and improved.

Five short links

Photo by Search Engine People

Informed Consent in Information Technology – An awesome PhD thesis on the problems with those ridiculous license agreements we all click through without reading, and even better with some practical suggestions on how to fix those problems. Apparently Catherine's now looking for more funding to continue her work – am I allowed to dream that Apple or Microsoft might want to bring her on board to fix their EULAs?

TravellerMap – I was never quite cool enough to play the Traveller role-playing game back in the 80's, but they built a fascinating background universe. I stumbled across this site by accident, but the author has built a beautifully detailed interactive map for exploring the whole galaxy, and I'm in awe of this as a labor of love.

Analysis of the 'Flash Crash' – I've always been hooked on odd events, and May's sudden stock-market drop and recovery is one of the oddest I've come across. I don't have enough financial world chops to understand everything in this paper, but it's a detailed technical post-mortem of what actually happened.

Wikiposit – Another rich collection of public data sets, mostly financial, with the site code released under the GPL

Swarm Light – This art installation sends shivers down my spine every time I watch it, and it's a technical masterpiece too, using hundreds of CPUs to control the lights. Make sure you go to 1'30'' in the video, because that's where it really starts to take off.

Don’t shave that yak – God loves lazy programmers

Photo by Liminal Mike

I just wasted four days of my life on something completely worthless. It started off innocently enough: I wanted to take a cloud of data points and display them as a nice heatmap.

The Story

Hmmm, sounds just like the mesh creation I used to do when I was back in games, so let's dust off that Computational Geometry textbook and write a Delaunay Triangulator. Sweet, there's even an example in Javascript I can adapt to Actionscript. Awesome, it all works on my test data. Oh, it's O(n^2) in complexity, so it doesn't work so well on my larger test of 1,000 points, and it will take the lifetime of the universe to process the 30,000 points I need it to.

No worries, I'm a clever chap, I'll dig through the literature and find a better algorithm. They all seem to be either too complex to implement easily or don't have the performance characteristics I need. I don't need a strict Delaunay arrangement, so I should be able to brew up my own divide-and-conquer version that uses the exact algorithm on sub-sets of the points and then stitches them back together with Delaunay-like strips.

Huh, looks like there's a bug in the convex hull creation code I wrote. And another. And another. Arrrggh! I need a better way of visualizing what's going on, so I'll build a canvas-based web page that lets me view the output of my algorithm. And then I need to…

What? It's Saturday afternoon? Where did my week go? And why am I still banging my head against this code? What was I trying to do again? Oh yes, display a heatmap of these points. So why have I been debugging convex hull merging code for the last two days? There must be a simpler way.

<… two hours pass …>

There we are, I just adapted some point blob rendering code I was already using, so I don't have to worry about triangulating a massive cloud of points, I just throw blobs at the screen and build up the image. Works great. Now I just have to write a blog post to remind myself once again – *Never Shave a Yak!*
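For anyone curious what "throwing blobs at the screen" can look like, here is a minimal sketch in Python rather than the Actionscript used for the original. The function names, the Gaussian falloff, and the `radius` parameter are all assumptions made for illustration, not the code from this post. Each point adds a soft blob to an intensity grid, and overlapping blobs add up, so dense regions end up hotter without any triangulation.

```python
import math


def render_heatmap(points, width, height, radius=3.0):
    """Accumulate one soft Gaussian blob per point into a 2D intensity grid.

    `points` is an iterable of (x, y) pairs in pixel coordinates.
    `radius` is the blob's standard deviation in pixels (an assumed parameter).
    """
    grid = [[0.0] * width for _ in range(height)]
    reach = int(math.ceil(radius * 3))  # a blob is negligible beyond ~3 sigma
    for px, py in points:
        cx, cy = int(round(px)), int(round(py))
        # Only touch the small square of pixels around the point.
        for y in range(max(0, cy - reach), min(height, cy + reach + 1)):
            for x in range(max(0, cx - reach), min(width, cx + reach + 1)):
                d2 = (x - px) ** 2 + (y - py) ** 2
                grid[y][x] += math.exp(-d2 / (2.0 * radius * radius))
    return grid


def normalize(grid):
    """Scale intensities into the 0..1 range so they can be mapped to colors."""
    peak = max(max(row) for row in grid)
    if peak == 0:
        return grid
    return [[value / peak for value in row] for row in grid]
```

For a fixed blob radius the work grows linearly with the number of points, which is why this kind of approach copes with tens of thousands of points where an O(n^2) triangulator does not.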

The Lesson

Yak shaving is a term I first ran across in the Jargon File, and it stuck in my head because it's so common and so dangerous in programming. It's when you're working on a task because you need to do it to get something else done, which you're doing to complete another job, and so on up a long dependency chain towards your real goal. This happens a lot when you're coding, and just like in my story, each step is very logical but you ultimately end up wasting massive amounts of time on something that has very little effect on what you really want to do.

I swear that the biggest reason I'm a more effective programmer now than when I was 20 is that I'm better at spotting when I'm shaving a yak, and finding another way. The biggest clue is when I'm working too hard. I went into a serious deep dive for the last few days, staying up late, skipping dog walks and getting way behind on my emails (sorry to anyone waiting for a reply!). If I'm making progress on something core this sort of crunch time is actually an energizing process as long as I don't keep it up too long. In this case I was feeling frustrated, and looking back it was largely because it was a peripheral task that at some level I knew didn't have to be solved.

There are already lots of other reasons to embrace lazy programming. Fewer lines of code means fewer bugs. The best route to an easy life is writing solid code that doesn't require constant maintenance, and is documented enough so people don't bug you with questions. Harness your inner laziness to spot yak shaving too, and find a simpler way when you're spinning your wheels on a peripheral task.

Five short links

Photo by Grant Mitchell

Delegate co-memberships – A network map showing which groups Republican and Democratic convention delegates belong to and how large ones like the NRA and the Sierra Club are connected to each other

A short note on random load balancing – Interesting algorithm that has a lot of the advantages of doing completely random assignments of tasks to buckets, but with a lot less variance in the work assigned to each bucket
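The linked note isn't reproduced here, so exactly which algorithm it covers is an assumption. One well-known scheme that fits the description is the "power of two choices": pick two buckets at random and give the task to the less loaded one. The sketch below compares it with plain random assignment. The function names and the task and bucket counts are made up for illustration.

```python
import random


def assign_random(num_tasks, num_buckets, rng):
    """Give each task to one uniformly random bucket."""
    loads = [0] * num_buckets
    for _ in range(num_tasks):
        loads[rng.randrange(num_buckets)] += 1
    return loads


def assign_two_choices(num_tasks, num_buckets, rng):
    """Sample two random buckets per task and use the less loaded one."""
    loads = [0] * num_buckets
    for _ in range(num_tasks):
        a = rng.randrange(num_buckets)
        b = rng.randrange(num_buckets)
        target = a if loads[a] <= loads[b] else b
        loads[target] += 1
    return loads


if __name__ == "__main__":
    plain = assign_random(20000, 100, random.Random(1))
    paired = assign_two_choices(20000, 100, random.Random(1))
    # The spread between the busiest and idlest bucket is the thing to compare.
    print("random:      max", max(plain), "min", min(plain))
    print("two choices: max", max(paired), "min", min(paired))
```

Each assignment still only needs two random numbers and two load lookups, so it keeps most of the simplicity of the purely random approach while the busiest bucket stays much closer to the average.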

Politicosphere – A network map of political news sites. I’m impressed by the presentation, and it’s actually a pretty useful way of exploring the political blogosphere

Myths and Fallacies of “Personally Identifiable Information” – Another great post by Arvind, this time dissecting why “PII” is not a very helpful concept, since it encourages people in the mindset that a simple “anonymization” of obvious identifiers is enough to safeguard people’s identity

Tell them no, just never use that word – The hardest part of being an engineer is bridging the gap between users’ expectations and the limits imposed by technology. At the start of my career if I was asked for something impossible I’d say no and explain why. It took me a while to learn to shut up and let them explain what they wanted in more detail and think of creative ways of satisfying their underlying requirements

Five short links

Photo by Ken-ichi

Timetric – A large collection of data sets, complete with online tools to chart and analyze them, via Pete Forde

Jsonduit – These JSON streams could be really powerful for building mashups, since they bypass the same-origin policy that makes combining data so hard, via Pete Forde

USA Election Atlas – The presentation is a bit old-school, but you can find almost anything you'd want to know about past and present American election results

SimpleDB essentials – Hard-won wisdom on getting the most out of SimpleDB, from Sid Anand who's using it heavily at Netflix. My own experiments with it stalled because it was so hard to upload large amounts of data reliably. My code is available for anyone who wants to pick up where I left off

TrendsMap – An intriguing geographical prototype, showing Twitter trending topics on a map, via Régis Gaidot