Five short links


Photo by Joe Baz

Tech’s untapped talent pool – I’m a massive fanboy of sociologists. They can reliably answer questions about human behavior in ways that are light-years ahead of most data analysis you see online. Data science’s big advantage is that we have massive new sources of information, and more data beats better algorithms, but I’m excited to see what happens when sociology’s algorithms meet the online world’s data!

ZIP codes are not areas – This one confused the hell out of me when I started getting serious about geo data, but the only true representation of ZIPs is as point clouds, where every building with an address is a point. The spatial patterns make drawing a boundary even for a single moment in time hard enough, but as houses are built and demolished, the layout changes in unexpected ways.

It’s hard not to leak timing information – A cautionary tale of how tough it can be to be sure even a simple function like a string comparison doesn’t give away useful information to a malicious user.
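
To make the string comparison problem concrete, here is a minimal Python sketch. It is not the code from the linked post, and the function names are my own. The first function returns as soon as it hits a mismatch, so its running time hints at how many leading bytes an attacker has guessed correctly. The second accumulates differences across every byte before deciding, and the third leans on the standard library's `hmac.compare_digest`, which is designed for exactly this job.

```python
import hmac


def naive_equal(a: bytes, b: bytes) -> bool:
    """Early-exit comparison: running time depends on where the first mismatch is."""
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:
            return False  # bails out early, which is the timing leak
    return True


def xor_accumulate_equal(a: bytes, b: bytes) -> bool:
    """Looks at every byte before answering, so there is no early exit.

    Still reveals whether the lengths match, and an interpreter gives no
    hard guarantee that each loop iteration takes the same time.
    """
    if len(a) != len(b):
        return False
    diff = 0
    for x, y in zip(a, b):
        diff |= x ^ y  # stays zero only if every byte pair matches
    return diff == 0


def library_equal(a: bytes, b: bytes) -> bool:
    """Delegates to the standard library's timing-attack-resistant comparison."""
    return hmac.compare_digest(a, b)
```

All three return the same answers; the difference is only in how much the running time reveals. Even the hand-rolled XOR version is best treated as an illustration, since a high-level language makes it hard to be sure what the machine actually does, so in practice I'd reach for the library call.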

PLOS mandates data availability. Is this a good thing? – We all love open data and reproducible science, but there are hard practical problems around the mechanics of making big data sets available, ensuring they’ll be downloadable over the long term, and avoiding deanonymization attacks.

Better performance at lower occupancy – Processors are incredibly complicated beasts, and our simple mental models break down when we’re trying to squeeze the last drops of performance out of them. This is a great example of how even the manufacturers don’t understand how to best use their devices, as a Berkeley researcher demonstrates how to get far better performance from an Nvidia GPU than the documented best practices allow.

