Foo Short Links

Foocamp
Photo by Kay Thaney

Networks of book makers in late Medieval England – Alex Gillespie's talk on medieval manuscripts was eye-opening in a lot of ways. I never realized that you could get cheap books before printing arrived, on demand from local scribes. The impact of the technology wasn't so much due to the price, as the fact that mass production made books far more plentiful than ever before, with a much more centralized distribution model.

I also loved discovering Girdle Books, bound with their own little raincoats so they could hang from the owner's belt. There was a lot of discussion about what the move to ebooks could learn from these sorts of historical examples, with Alex riffing on the idea of texts as souls that inhabit physical bodies, and how creepy that makes electronic readers, as the virtual books flitting in and out of them seem like body snatchers.

I was struggling to make a point about how Girdle Books were an ostentatious way of using the written word to connect socially, and how that was a real loss with ebooks, but it came off sounding like I just missed showing off to girls on the bus. In fact I just want to show off to people on the internet, or to be less flippant I think we'll really miss that process of discovery if we no longer see books in people's hands, on coffee tables, or in their bookcases.

Google Consumer Surveys – This was a really unusual Foo talk, it was almost a pure product pitch, but I was really glad I attended because it's an incredibly useful product I'd never heard of. You design a survey, Google charges you 10 cents for each person who answers, and they handle getting a statistically representative group of 1,000+ users over the course of a day or two. For startups, this is a fast and cheap way of testing ideas, like the old hack of creating a Google Ad with your value proposition and seeing who clicks, but on steroids.

What I also liked is that they're providing a new revenue source to the newspaper industry. The questions appear on local news sites as an alternative to registering to read a full story, and pull in a lot more money for the publishers than regular ads.

Boxie the story-gathering robot – Taking a lesson from little sisters everywhere, the team at MIT set out to use raw cuteness to get other people to assist Boxie in its mission. I, for one, welcome our new robot overlords if they're all so adorable. The talk (with terrible audio unfortunately) is here.

Inside Etsy's gambit to hire more female programmers – I've been a long-time fan of Marc Hedlund, but I hadn't run across his initiative at Etsy to hire more engineers who were women. The results show how effective publicity can be, with over 600 female applicants for 40 slots on the Hacker School program, and demonstrate that there are effective ways to recruit from a wider pool. I'm going to take inspiration for my own hiring process (email me!).

Legal Entity Identifiers – Dan Goroff described how the financial world is trying to illuminate financial risks by assigning LEIs to all corporations. My deep fear is that this is the wrong approach. It would make the analysis easier if the data was perfect, but the true problem is robustly identifying entities in the first place. I'd rather have fuzzy, redundant identifiers like company names, addresses, account numbers, etc, and use them to build a relationship graph. Instead I worry all the time will go into building the id scheme, and we'll never get the financial relationship data that is the real gold-dust. I'm not doing the topic justice here, I need to do a longer post, but it was a great presentation on a crucial debate I wasn't even aware of.
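To make the "fuzzy, redundant identifiers" idea a little more concrete, here is a minimal sketch of linking records into a relationship graph. It is only an illustration under loud assumptions: the record fields (`name`, `address`, `account`), the `normalise` and `link_records` helpers, and the sample data are all made up, and real entity resolution would need far more forgiving matching than stripping punctuation and case.

```python
import re
from collections import defaultdict


def normalise(value):
    """Crude normalisation (an assumption): lowercase, keep only letters and digits."""
    return re.sub(r"[^a-z0-9]", "", value.lower())


def link_records(records):
    """Group record indexes that share any identifier after normalisation.

    Each record is a dict of hypothetical identifier fields. Two records land
    in the same group if they are connected by a chain of shared identifiers.
    """
    parent = list(range(len(records)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen = {}
    for index, record in enumerate(records):
        for field, value in record.items():
            key = (field, normalise(value))
            if key in first_seen:
                # Same identifier seen before: merge the two groups.
                parent[find(index)] = find(first_seen[key])
            else:
                first_seen[key] = index

    groups = defaultdict(list)
    for index in range(len(records)):
        groups[find(index)].append(index)
    return sorted(groups.values())


# Made-up sample data, purely for illustration.
records = [
    {"name": "Acme Corp.", "address": "1 Main St", "account": "111"},
    {"name": "ACME CORP", "address": "PO Box 9", "account": "222"},
    {"name": "Acme Holdings", "address": "PO Box 9", "account": "333"},
    {"name": "Globex", "address": "5 Elm Rd", "account": "444"},
]
groups = link_records(records)
```

The first three records end up in one group even though no single identifier is shared by all of them, which is the appeal of redundancy: one noisy or missing field doesn't break the chain.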

Five short links

Origamifive
Photo by Goran Konjevod

Juju at scale – Spinning up 2,000 EC2 instances automagically using Ubuntu's cloud tools. The thought of all that raw power at your fingertips is amazing, and unthinkable for anyone outside of a major corporation just a few years ago. We live in an age of wonders.

The TTY demystified – I never cease to learn new things about the Unix stack I rely on. This history is a good lesson in how tricky and changing requirements can be managed with good engineering, but end up freezing handling for decades-obsolete hardware in code we'll likely be using for centuries to come.

Men invented the internet – We're entering a very weird time, as computing becomes a higher-status profession it feels like women are even less welcome.

Cloudian – Layering an S3 API on top of a Cassandra hosted service. Amazon's cloud service interfaces have become a de-facto standard, and other providers should adopt them. They feel like the system calls of a distributed OS.

Tigerweb – An online viewer of the latest version of the US Census's geo data. It's a great way to explore the completely free and open data on boundaries, natural features, and especially roads that the government makes available. If only it didn't rely on Silverlight!

 

We need a growth hacker

Sprout
Photo by Ariari

Jetpac's been a success, with a flood of five star ratings on the App Store, great reviews, and most importantly users are spending a long time on the product! We know people love it, so now we're gearing up to take it to the next level, getting the app into more people's hands.

This is so important to us that the next key member of our team will be a growth hacker. Funnily enough, we'd already drawn up the job description but were struggling for a title when we came across Andrew Chen's article. I know several of the people on his list, and we've been inspired by the success of their approaches, so his description seems perfect for what we need.

We're looking for an existing expert growth hacker or someone with the aptitude for it. You'll get to join a funded team of successful entrepreneurs early enough to make a big difference, and have a big stake in our success. If you're interested email me at growthhacker@jetpac.com, and please send this on to anyone you know who might be keen.

Five short links

Fiveshortgraybles
Picture by Fred Seibert

I've got Eurosong fever, Ted – I can't describe how much I love this data analysis, big thanks to Anthony Goldbloom for pointing me to it. It's both a wonderful example of the insights you can obtain into the real world from surprising data sets, and an excuse to enjoy the delights of Ruslana and Verka Serduchka.

Setun – An experimental computer from Russia in the 60s, built on ternary logic instead of binary (here's a Wikipedia summary). It's not that long ago that we were arguing over things that we take for granted now. I wonder if we'll have to revisit those assumptions as we keep innovating?

Green Marl – On the subject of revisiting assumptions, MapReduce isn't the only way to run distributed algorithms, and I've been trying to wrap my head around projects like this graph analysis framework from the Stanford Pervasive Parallel Computing team.

The role of intuition in business – Metrics are like a compass leading you to a local maximum, the human part of all our jobs is knowing when to make a big leap to get to a whole new surface.

Cassandra compression is like getting more servers for free! – I'm itching to try this on our cluster, once I upgrade to a newer Cassandra version.

Five short links

Fivefish
Photo by Linda Cronin

Probabilistic data structures for web analytics and data mining – A lot of the time we're processing massive amounts of data and producing very detailed intermediate results, only to throw almost all that detail away because all we want is a much smaller summary of the data's properties. I've got a lot of mileage out of approaches like this that cut out the middleman and produce much more manageable intermediates by throwing away parts of the data early when they are unlikely to be significant.
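As one small example of throwing detail away early, here is a sketch of the Misra-Gries frequent-items summary. It isn't necessarily one of the structures the linked article covers, and the `misra_gries` name and the sample stream are made up for illustration. The point is that it tracks at most `k - 1` counters no matter how many distinct items flow past, yet any item making up more than `1/k` of the stream is guaranteed to survive.

```python
def misra_gries(stream, k):
    """Approximate heavy hitters using at most k - 1 counters.

    The surviving counts are underestimates, but any item that occurs more
    than len(stream) / k times is guaranteed to still be present at the end.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free slot: decrement every counter and drop the ones that
            # reach zero. This is where the insignificant detail gets lost.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


# Made-up stream: two popular pages plus a long tail of one-off visits.
stream = ["home"] * 50 + ["about"] * 30 + [f"page-{n}" for n in range(20)]
summary = misra_gries(stream, k=3)
```

With `k=3` the summary never holds more than two entries, and "home" (half the stream) always survives. The long tail of one-off pages never gets stored at all.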

Twenty-four hours in a VC's life – Professional investors' theoretical economic role is as discoverers of new information, finding opportunities for deploying other people's savings in useful ways. It's interesting to see some of that information transfer in action.

Haptic labs soft maps – I would have loved one of these quilt maps as a kid.

TopCoder – Using the competition model to drive software development.

American healthcare fraud and scalable investigative reporting – I'm always excited to see the new data techniques being used for more than ad targeting, and journalism is one of the most promising areas where they can make a difference.

Five short links

Fivepuppies
Photo by Pinké

Brain of Mat Kelcey – Mat's been doing some interesting work with Common Crawl, and his blog is a must-read for anyone interested in extracting data from unstructured text.

Google releases natural language dictionaries – Based around Wikipedia page titles as their list of concepts, Google Research have released a really interesting resource, a bit like a thesaurus for machines. Even better, it's available under a liberal CC BY license, so there should be no problem using it in any sort of project.

Dumb like me – A scary story for anyone who makes their living with their mind, from my friend Russ Jurney; "Smart people, like the very attractive, get special treatment they do not know they are getting". Despite all our techno-utopianism, we're still reliant on a fallible hardware platform made of meat.

NSA's security guide to iOS 5 – A wealth of detailed practical information on securing Apple mobile devices.

Things you might not know about jQuery – A good refresher on some of the less obvious cool features of the framework. I've been using .data() extensively on Jetpac.

Want a magic wand?

RedRealmWand-560
I've collaborated on and off with Nicholas Napp for years, including on a National Science Foundation grant for computer vision on mobile devices. He's extremely experienced in the world of traditional toys, as well as video games, and he's had a compelling dream of tying together motion sensors, smart phones, and a rich real-world game system to produce something magic. The gameplay uses precise location tracking and gestural control through a wand to give you an interface to cast spells that hurt or help other combatants in fights, or help you progress through adventures in other ways.

I love this sort of overlay of virtual layers on top of the physical world, and I believe Nick and his very talented creative collaborator Kevin Mowrer can execute on their vision. They're now on Kickstarter raising money to get started, so if you're intrigued, go check it out. I've donated $250 myself, and I can't wait to get my hands on an early wand.

Five short links

Fivedwarves
Photo by Randy Robertson

Fancy ML techniques don't matter much – "The reason I don’t like Kaggle is that it’s all about squeezing more juice out of existing data." There's a lot of hard-earned wisdom in this post, but I think he's over-estimating the professional world's familiarity with machine learning techniques, and underestimating how hard they are to acquire. I love Kaggle because it allows me to outsource a whole lot of work that requires very specialized skills, so I don't have to support a full-time ML engineer, and I don't want to spend the time and resources I'd need to train an existing team-member to be good at it when we'll only use it occasionally.

Are your cookies colluding? – The Mozilla folks have released a plugin showing how ad networks are connected, with a network graph visualization that actually seems useful, rather than just being pretty.

An interactive map of the Roman Empire – Calculates the travel time and cost for journeys in the ancient world. Tools like these bring back a perspective that anyone used to modern transport has lost, especially around the crucial power of the sea as much cheaper and faster than land for travel. I was first struck by their power when I ran across time-based maps like this for the medieval world, showing how much more connected coastal settlements were to fishing villages in other countries than to inland towns in their own. They helped me understand how England held on to Dunkirk for so long!

Image Vision Labs – Offers advanced image-processing algorithms as a service. We seem to be locked in an escalating arms race between users determined to upload pictures of their genitals, and platforms determined to stop them.

Pilot lights are evil – Data-driven detective work on where the actual energy usage is going, with a conclusion that's given away in the title, but remains surprising!

Add humans to your data pipeline

Fargo-wood-chipper-scene

I was lucky enough to meet Chris Van Pelt of Crowdflower tonight, and it was fascinating to hear about some of the new developments bubbling away at the company. I'm a longtime fan, they add a lot of value beyond what you get from more basic crowd-sourcing services like Mechanical Turk, but I've always seen them as only an incremental improvement on their competitors. What Chris talked me through over beers felt like a true step forward though.

We started by chatting about their Real Time Foto Moderation tool. This is basically a penis removal tool for photo uploads; you feed in a stream of images and after a short delay you get back flagged results showing which were accepted according to the sort of criteria used by Apple's App Store for content. I was fascinated to hear about some of the rules – bare-chested guys are fine if they're outdoors, but not if they're inside!

This may not sound that revolutionary, but think about what this means. Your application code is calling an API, and getting results back, but behind the curtain is a workforce of humans! Chris likes to call this an RPC, a Remote Person Call. I'm not aware of any other service that allows this kind of unsupervised interaction; crowd-sourcing has always been much more of a batch process with manual transfers of inputs and outputs between the human and automated stages.

This is important because it turns human tasks into modules that can be flexibly inserted into your data pipeline just by signing up on the web site and installing a Ruby gem. This changes crowd-sourcing from a cumbersome custom process that you have to extensively plan up-front into something you can experiment with just like you would any other API. You can build prototypes in a few minutes, test ideas, benchmark against other solutions, and start shipping code much faster.

Chris is free to experiment on the other side of the abstraction layer too. He might partially or completely automate the process and applications would never need to know, as long as the quality of results is consistent. Human-driven versions are likely to be more expensive than computational ones, and the price people are willing to pay for particular services will be a strong signal of which ones are worth sinking developer time into.
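Here's a rough sketch of what that abstraction layer looks like from the application's side. To be clear, this is not Crowdflower's API: every name below (`Moderator`, `make_human_moderator`, `make_automated_moderator`, `accepted_images`) is hypothetical, and the "human" backend is just a lookup table standing in for a real remote person call.

```python
from typing import Callable, Dict, Iterable, List, Set

# A moderator is anything that maps an image id to accept (True) or reject (False).
# The pipeline only depends on this shape, not on who or what is behind it.
Moderator = Callable[[str], bool]


def make_human_moderator(verdicts: Dict[str, bool]) -> Moderator:
    """Stand-in for a remote person call: returns a verdict recorded from a human."""
    def moderate(image_id: str) -> bool:
        return verdicts[image_id]
    return moderate


def make_automated_moderator(blocked: Set[str]) -> Moderator:
    """Stand-in for an automated classifier: rejects anything on a blocklist."""
    def moderate(image_id: str) -> bool:
        return image_id not in blocked
    return moderate


def accepted_images(image_ids: Iterable[str], moderator: Moderator) -> List[str]:
    """Pipeline stage: keep the images the moderator accepts, in order."""
    return [image_id for image_id in image_ids if moderator(image_id)]


# Made-up uploads, purely for illustration.
uploads = ["beach.jpg", "nsfw.jpg", "cat.jpg"]
human = make_human_moderator({"beach.jpg": True, "nsfw.jpg": False, "cat.jpg": True})
automated = make_automated_moderator({"nsfw.jpg"})

by_humans = accepted_images(uploads, human)
by_machine = accepted_images(uploads, automated)
```

Because `accepted_images` only sees a function that takes an id and returns a verdict, swapping the human backend for an automated one is a one-line change for the caller, as long as the quality of the answers holds up.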

There are a lot of hard problems that benefit from a human in the loop, from sentiment analysis to transcription, and I'd love to have a library of APIs for all those that I could drop into my data pipeline as I'm working on new features. Crowdflower is starting to make this possible, so I'll be excited to follow their progress as they roll out more services. If you have an AI-hard problem that's driving you crazy, they might have a solution that lets you pretend we've solved AI!

David Thomas, RIP

Grandad

My grandfather David Thomas had a long life, and packed a lot in. He was one of the youngest lot to fight in World War II, but he didn't like to talk too much about the actual service he'd done. The easiest parts to get him talking about were the people, friends he'd lost, or who he'd stayed in touch with afterwards back into civilian life. He'd ended up in the navy, and on his way to a land base in Sierra Leone servicing torpedo bombers, he'd endured weeks below decks. He knew there wasn't much of a chance that far below if a U-boat struck, but what he remembered was the stink of so many men, without much access to a shower. He got on with it though.

That was his strength, getting on with it. At first when he came back from the war he worked on the buses, where his aircraft engine skills proved handy. When the buses went on strike, he needed to keep supporting his family and switched over to a job at the Post Office. That's one thing I remember, he always had wonderful access to catalogs showing special editions of stamps, and gave me discounted entry to the mail-order "Dinosaur Club" thanks to his connections. He was always keeping his eye out for things like that, little ways to help first his two daughters, then the grandkids like me, and finally the great-grandkids when they arrived.

He was devoted to his wife, my Nan, too, visiting her every day, all day in the hospital for months before she passed away a few years ago. He stayed active right until his end, despite an array of medical problems. It must have helped that he was surrounded by friends and family who loved him. I remember virtual traffic jams of people coming in to see him in his hospital bed, and within a few hours of a new ward the nurses would be new friends. One of the best presents I was ever able to give him was a calendar showing our pet photos, and the exact name, age, ownership, and character of all the animals in the latest one he received was a hot topic of conversation on my last visit to him two weeks ago. He devoured a box of chocolates that were another gift, but just a few days later he had a peaceful end, surrounded by family.

He's somebody I admire very much, for many reasons, but his kindness and lifetime of hard work to support his family stand out most of all. I miss him, but the positive impact he had through the way he lived his life will be around for a long time to come.