Five short links


Photo by Jovino

Debunking the 100x GPU vs CPU myth – I love having GPUs available, and I couldn’t have done my recent deep belief web demo without them, but this paper matches my experience. You can do amazing things with highly-tuned CPU code, and in a lot of real applications any speed gains on the computation side are swamped by the time it takes to transfer data to and from the graphics card.
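
To make the transfer point concrete, here's a back-of-the-envelope sketch. All of the timings are hypothetical numbers picked for illustration, not measurements from the paper or from any real workload, and `effective_speedup` is just a helper name invented for this example.

```python
def effective_speedup(cpu_ms, gpu_compute_ms, transfer_ms):
    """End-to-end speedup once copies to and from the card are counted."""
    return cpu_ms / (gpu_compute_ms + transfer_ms)

# Hypothetical timings in milliseconds, for illustration only.
cpu_ms = 10_000         # the whole job on a tuned CPU implementation
gpu_compute_ms = 100    # the same math on the GPU, a 100x faster kernel
transfer_ms = 900       # moving the data to the card and the results back

kernel_only = cpu_ms / gpu_compute_ms
end_to_end = effective_speedup(cpu_ms, gpu_compute_ms, transfer_ms)

print(f"kernel-only speedup: {kernel_only:.0f}x")
print(f"end-to-end speedup:  {end_to_end:.0f}x")
```

With these made-up figures the headline 100x shrinks to 10x as soon as the copies are included, and any further kernel tuning barely moves the total.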

OpenStreetMap isn’t all that open – I never understood the intent of the bondage-and-discipline open database license OSM adopted, but in practice it means that it’s very hard to use for general geocoding. If you use the data to look up the coordinates of a street address, then any data derived from that position is subject to the attribution and sharing requirements in any application it’s used in, no matter how many generations removed from the original. I can’t ask users of the Data Science Toolkit to publicly share their spreadsheets just because I’ve added a lat, lon column for them, so I’m using alternative open sources that don’t infect data sets they interact with.

Privacy in sensor-driven human data collection – I’m not sold on all the recommendations, but this working paper is a must-read if you’re working with sensor data and want to understand where the land mines are. Even plain-old accelerometer data can be very revealing.

Biometric word list – Like “Alpha/Tango/Foxtrot”, but for clearly speaking byte streams aloud, with words picked to minimize errors. Full of fragments of stories, including a “skydive racketeer”, “facial fortitude”, “slingshot rebellion”, and “highchair holiness”.

Potential, possible, or probable predatory scholarly open-access publishers – This is the dark side of the opening of the academic world. I keep getting contacted by dodgy-looking publications, so this list looks like a great resource. [Update – Cameron Neylon has a good comment on the background of this list. I’ll be looking at his suggestions instead!]

Five short links


Photo by Axel Taferner

Downloading software safely is nearly impossible – I’m resigned to the fact that a determined-enough attacker can access my data, since at the end of the day there’s always duct tape and rusty pliers, but the size of the holes in the stack we have to trust to get our hands on software is still painful to behold. See the followup too.

Bulk whois data – If you ask them nicely, ARIN will send you a complete dump of all their whois contact information, or you can buy it with no questions asked from a third-party supplier. More data that we theoretically know is public, but that becomes more problematic when it’s available en masse.

Dog poop, Facebook, and optimism – In the computer world we’re uncovering all sorts of interesting insights into hidden aspects of humanity, but we haven’t been able to get them into the hands of all the sociologists, historians, planners, aid workers, medical researchers et al who can really use them. I’m hoping Nicholas Christakis’s Human Nature Lab at Yale will bridge some of that gap, and I’m very interested to see what emerges.

Potato programming – Even though I’m not a fan overall, I still learned a lot from my forays into functional programming. I almost never mutate values after I’ve assigned them, and I find code a lot cleaner when I can avoid lower-level for loop constructs in favor of something like a map or each. I just ran across this term for the clunky code that explicit looping produces, and it’s a memorable way of describing the one-potato, two-potato anti-pattern.
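
As a rough illustration of the difference, here's a small Python sketch. The word list and the `shout` helper are made up for the example, and the language choice is arbitrary; the same contrast shows up in most languages that offer a map or each.

```python
words = ["skydive", "racketeer", "slingshot"]

# One potato, two potato: manual index bookkeeping, and a result list
# that gets mutated on every pass through the loop.
looped = []
i = 0
while i < len(words):
    looped.append(words[i].upper())
    i += 1

# The same transformation with map: no counter, nothing mutated after
# it has been assigned.
def shout(word):
    return word.upper()

mapped = list(map(shout, words))

# A list comprehension is the more common Python spelling of the same idea.
comprehended = [shout(word) for word in words]

print(looped == mapped == comprehended)
```

All three versions produce the same list; the difference is how much incidental machinery the reader has to wade through to see that.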

Verification handbook – This is a free handbook aimed at helping reporters separate rumor from fact when news is breaking, but it’s just as useful for readers of journalism. There are so many more sources of information these days that responsible citizens in a modern society have to be able to intelligently question what they’re being told.