Debunking the 100x GPU vs CPU myth – I love having GPUs available, and I couldn’t have done my recent deep belief web demo without them, but this paper matches my experience. You can do amazing things with highly-tuned CPU code, and in a lot of real applications any speed gains on the computation side are swamped by the time it takes to transfer data to and from the graphics card.
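To make the transfer-overhead point concrete, here is a back-of-the-envelope sketch. It is not taken from the paper: the function name `end_to_end_speedup` and all of the timings are hypothetical, chosen only to show how a fixed copy cost eats into a large kernel speedup.

```python
def end_to_end_speedup(cpu_seconds, kernel_speedup, transfer_seconds):
    """Overall speedup once host-to-device and device-to-host copies are counted.

    cpu_seconds:      time for the tuned CPU version (hypothetical)
    kernel_speedup:   how much faster the GPU is on the computation alone
    transfer_seconds: total time spent copying data to and from the card
    """
    gpu_seconds = cpu_seconds / kernel_speedup + transfer_seconds
    return cpu_seconds / gpu_seconds


# Illustrative numbers only: a 1 second CPU job and a "100x" GPU kernel,
# with increasing amounts of time spent moving data across the bus.
for transfer in (0.0, 0.05, 0.5):
    speedup = end_to_end_speedup(1.0, 100.0, transfer)
    print(f"transfer={transfer:.2f}s -> overall speedup {speedup:.2f}x")
```

With these made-up figures, a transfer cost of just 5% of the original CPU time already pulls the overall gain far below 100x, and once copying takes half the CPU time the GPU version is less than twice as fast.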
OpenStreetMap isn’t all that open – I never understood the intent of the bondage-and-discipline Open Database License OSM adopted, but in practice it means that the data is very hard to use for general geocoding. If you use the data to look up the coordinates of a street address, then any data derived from that position is subject to the attribution and sharing requirements in any application it’s used in, no matter how many generations removed from the original. I can’t ask users of the Data Science Toolkit to publicly share their spreadsheets just because I’ve added lat and lon columns for them, so I’m using alternative open sources that don’t infect the data sets they interact with.
Privacy in sensor-driven human data collection – I’m not sold on all the recommendations, but this working paper is a must-read if you’re working with sensor data and want to understand where the land mines are. Even plain-old accelerometer data can be very revealing.
Biometric word list – Like “Alpha/Tango/Foxtrot”, but for clearly speaking byte streams aloud, with words picked to minimize errors. Full of fragments of stories, including a “skydive racketeer”, “facial fortitude”, “slingshot rebellion”, and “highchair holiness”.
Potential, possible, or probable predatory scholarly open-access publishers – This is the dark side of the opening of the academic world. I keep getting contacted by dodgy-looking publications, so this list looks like a great resource. [Update – Cameron Neylon has a good comment on the background of this list. I’ll be looking at his suggestions instead!]