Five short links


Photo by Andrew Hudson

EntityTagger – A pleasantly practical natural-language processing paper, via Nat Torkington

How prostitution and alcohol make Uber better – A clever tabloid hook for an interesting data story. One thing I've heard that might explain part of the pattern is that police shifts vary regularly by day, which can impact arrest times.

Social Network Analysis for Telecoms – I've repeatedly heard this used as an anecdote, but it wasn't until it came up at an event this week, where I was sitting next to Mike Driscoll, that he was able to point me to his original research. It's great to see the source, and I can understand why it's now a classic example of how useful data science can be.

Hue Histograms – A charming way of visualizing image color characteristics, from another friend's company. I'm looking at good ways of anonymizing image data that still preserve enough signal to be useful for machine learning, and this has given me some ideas.
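To make the idea concrete, here is a minimal sketch of a hue histogram in Python. It is not the linked project's method, just one plausible way to do it: the function name `hue_histogram`, the bin count, and the `min_saturation` cutoff are all assumptions made for illustration. The appeal for anonymization is that the result keeps a summary of the colors while throwing away where each pixel was.

```python
import colorsys


def hue_histogram(pixels, bins=12, min_saturation=0.1):
    """Count pixels per hue bucket.

    pixels is an iterable of (r, g, b) tuples with 0-255 channels.
    Near-gray pixels are skipped, since their hue carries little meaning
    (the min_saturation cutoff is an illustrative assumption).
    """
    counts = [0] * bins
    for r, g, b in pixels:
        h, s, _v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        if s < min_saturation:
            continue
        # h is in [0.0, 1.0); clamp the index defensively for the last bin.
        index = min(int(h * bins), bins - 1)
        counts[index] += 1
    return counts


if __name__ == "__main__":
    sample = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (128, 128, 128)]
    print(hue_histogram(sample, bins=4))
```

The output is just a short list of counts, so no spatial layout of the original image survives in it.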

Break an image into tiles – On the topic of images, I was pleasantly surprised at how easy ImageMagick was to install on OS X through MacPorts; I used to dread the failed dependencies. I used the recipe in the article for a hack I'm quite proud of. I needed to generate 'percent of the world seen' thumbnails for Jetpac public profiles shared on Facebook, so I manually created the HTML for a page with a grid of one hundred elements, one for each number, took a screenshot, and then ran it through the grid command to get the numbered images I needed. You can see it in action if you like this sneak peek of my public profile page - you can unlike it afterwards if you don't want my new pensive portrait in your stream.
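For anyone who would rather script the slicing step, here is a rough Python sketch of the arithmetic behind cutting a screenshot into a grid. It is not the ImageMagick recipe from the article; the function name `tile_boxes` and the example dimensions are hypothetical, and it only computes the crop rectangles rather than touching any image files.

```python
def tile_boxes(width, height, columns, rows):
    """Return (left, top, right, bottom) crop boxes in row-major order.

    Integer division spreads any leftover pixels across the tiles, so the
    boxes cover the whole image without gaps or overlaps.
    """
    boxes = []
    for row in range(rows):
        top = row * height // rows
        bottom = (row + 1) * height // rows
        for col in range(columns):
            left = col * width // columns
            right = (col + 1) * width // columns
            boxes.append((left, top, right, bottom))
    return boxes


if __name__ == "__main__":
    # A hypothetical 1000x500 screenshot cut into a 10x10 grid of tiles.
    boxes = tile_boxes(1000, 500, 10, 10)
    print(len(boxes), boxes[0], boxes[-1])
```

Each box uses the (left, upper, right, lower) order that Pillow's `Image.crop` accepts, so the list could be fed straight into a loop that saves one numbered file per tile.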

