Five short links

Photo by Harshvardhan Dhawan

The county problem in the West – This is a brilliant example of why you need to understand some GIS basics to sensibly use even the most basic geographic statistics. The large size and arbitrary boundaries of western US counties mean that the default view of historical settlement is muddled, and only by switching to alternate spatial partitions can you understand what was actually going on.

The cost of satisfaction – Patients who are satisfied with their doctors are more likely to die than the malcontents! This appears to be a real effect, judging by the statistics, and I wonder if it’s because picky patients are more likely to push for more information and second opinions? Whatever the cause, it’s a good reminder that even the most obvious metrics might not match up with the goal you’re trying to achieve.

Don’t mix threads and fork – The complexities of getting threads to play nicely with fork() are mind-boggling, and in practice seem insurmountable.

Free GIS data – An impressive list of free-as-in-beer geographic data. As the page recommends, do look into the licensing terms for any dataset you want to use. If you use OpenStreetMap files, for example, you might be surprised at the requirements of the Open Database License.

Signals from the void – A blend of the inspiring vision of picturing the black hole at the center of our galaxy with the mundane grind of performing research from bleak mountain-tops, at the mercy of the weather and unreliable equipment. This story rings very true, especially around the chaos behind the project and the personalities of the people who are attracted to the quest.

Five short links

Photo by Jed Sullivan

Sunlight intensity based global position system – It turns out you can geo-locate underwater sensors to within a few kilometers just by measuring the sunset and sunrise times. It’s a beautifully cheap way to figure out where fixed-position outdoor sensors are, since taking light measurements a few hundred times across a day is simple to implement, and doesn’t take much power or computation.
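To make the idea concrete, here is a rough sketch of how sunrise and sunset times can turn into a position. It is not the linked project's method, just the textbook geometry: the midpoint between sunrise and sunset gives solar noon and hence longitude, and the length of the day gives latitude once you know the sun's declination for that date. The function names are my own, and the sketch ignores the equation of time, atmospheric refraction, and the effect of water on the light measurements, so treat it as an illustration only.

```python
import math


def day_length_hours(latitude_deg, declination_deg):
    """Forward model: hours of daylight at a latitude for a given solar declination."""
    cos_half_day = -math.tan(math.radians(latitude_deg)) * math.tan(
        math.radians(declination_deg)
    )
    cos_half_day = max(-1.0, min(1.0, cos_half_day))  # clamp for polar day/night
    # The sun's hour angle moves 15 degrees per hour.
    return 2.0 * math.degrees(math.acos(cos_half_day)) / 15.0


def estimate_position(sunrise_utc_hours, sunset_utc_hours, declination_deg):
    """Estimate (latitude, longitude) in degrees from sunrise/sunset times in UTC hours.

    Assumes a known solar declination for the date, and ignores the equation
    of time and refraction, so real-world errors will be much larger.
    """
    if abs(declination_deg) < 1e-6:
        # Near the equinoxes every latitude has a 12 hour day, so latitude
        # cannot be recovered from day length alone.
        raise ValueError("latitude is not recoverable at zero declination")

    # Longitude: solar noon happens at 12:00 UTC on the prime meridian and
    # one hour later for every 15 degrees you travel west.
    solar_noon = (sunrise_utc_hours + sunset_utc_hours) / 2.0
    longitude = (12.0 - solar_noon) * 15.0

    # Latitude: invert cos(H) = -tan(latitude) * tan(declination), where H is
    # the hour angle between solar noon and sunset.
    half_day_angle = math.radians((sunset_utc_hours - sunrise_utc_hours) / 2.0 * 15.0)
    tan_latitude = -math.cos(half_day_angle) / math.tan(math.radians(declination_deg))
    latitude = math.degrees(math.atan(tan_latitude))
    return latitude, longitude
```

Feeding it a sunrise and sunset generated by day_length_hours() for a known latitude should return that latitude, which is a handy sanity check before trying it on noisy sensor data.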

NSNotificationCenter with blocks considered harmful – Managing the lifetime of memory allocations is incredibly hard, and this is a cautionary tale about how nasty it can get.

Blackhash – Do you really trust your security audit company with your hashed password files? An interesting approach that allows them to do some limited testing, without handing over the data itself.

Geo-located Twitter as the proxy for global migration patterns – Understanding how the world is connected by analyzing people who tweet from multiple countries.

Analyzing the Iranian Embassy bombing in Beirut from photos – The format of the slideshow is a bit hard to navigate, but it’s worth stepping through. Felim explains how he used a combination of ground and satellite photos to verify that a suspect video was actually taken at the right time and place.

Five short links

Photo by egazelle

How to program unreliable chips – It’s been a vitally useful simplification to pretend that computer calculations are 100% reliable, but as our data volumes grow and chips shrink, we’ll need to start planning for errors.

Abigail’s regex to test for prime numbers – A thing of terrifying beauty.
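If you haven't seen it before, the trick is to write the number in unary, as a string of n ones, and let backtracking hunt for a divisor. The original is a Perl one-liner; below is a small Python port of the same pattern. The is_prime() wrapper is my own naming, and this is strictly a curiosity, since the run time grows badly with n.

```python
import re

# Matches unary strings whose length is NOT prime:
#   ^1?$        covers 0 and 1
#   ^(11+?)\1+$ covers any length that splits into two or more equal
#               groups of at least two ones, i.e. a composite number
NOT_PRIME_UNARY = re.compile(r"^1?$|^(11+?)\1+$")


def is_prime(n):
    """Return True if n is prime, by testing a string of n ones against the regex."""
    return NOT_PRIME_UNARY.match("1" * n) is None


print([n for n in range(30) if is_prime(n)])
# prints [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```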

Disguise detection – Using cheap IR detectors to check that the face you’re detecting in visible light isn’t a latex mask.

Mapping “For Whom the Bell Tolls” – A thoughtful look at how locations occur in the novel, with the visualization in its proper place as another tool in the analysis, rather than being seen as the final product. It’s worth following some of the links too; you’ll find gems like this analysis of the baggage and future of GIS.

Know thy Java object memory layout – All abstractions leak around the edges, and I love catching glimpses of the machinery that’s whirring away inside black boxes. The complexity of the accumulating layers of software archaeology we’re building on top of is staggering.

Five short links

Photo by Christophe Kummer

Why manual memory management can be worse for performance than garbage collection – I spent over a decade coding in C and C++, and these are true words. Instead of GC pauses you’ll have ‘deallocation hiccups’ whenever a big object’s destructor runs or a scope change occurs, and reference counting is an intrusive performance hog that leads to horrific constructs like loops that repeatedly Release() a COM object until the reference count is zero. This allocation meditation from someone switching to C from Python is worth a read too; it captures the near-obsession with malloc that C programmers have to develop.

Proper handling of SIGINT/SIGQUIT – Have you ever wondered what’s going on when you press Control-C in the terminal? This article is a great case study on how a seemingly simple requirement spirals into tough-to-get-right complexity when you have to integrate it into a wider system.
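One commonly recommended pattern for this problem, sketched here under my own assumptions rather than quoted from the article, is that a program which catches SIGINT to clean up should finish by restoring the default handler and re-sending the signal to itself. That way the parent shell sees a process that was killed by SIGINT, rather than one that exited normally, and can stop the rest of a script. The example below is POSIX-only; the child program and the run_child() helper are hypothetical, and the child sends itself the signal as a stand-in for a real Control-C.

```python
import signal
import subprocess
import sys

# Source for a hypothetical child program that cleans up on SIGINT and then
# dies *from the signal*, instead of calling exit().
CHILD_SOURCE = r'''
import os, signal, sys, time

def on_sigint(signum, frame):
    sys.stderr.write("cleaning up\n")             # placeholder for real cleanup
    signal.signal(signal.SIGINT, signal.SIG_DFL)  # restore the default action
    os.kill(os.getpid(), signal.SIGINT)           # re-raise so the parent can tell

signal.signal(signal.SIGINT, on_sigint)
os.kill(os.getpid(), signal.SIGINT)  # stand-in for the user pressing Control-C
time.sleep(5)
sys.exit(1)  # only reached if the signal never arrived
'''


def run_child():
    """Run the child and return its exit status as reported by subprocess."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_SOURCE],
        stderr=subprocess.DEVNULL,
    )
    return result.returncode
```

On POSIX systems subprocess reports death-by-signal as a negative return code, so run_child() should come back as -signal.SIGINT. A handler that called sys.exit() instead would look like an ordinary exit to the caller.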

Use multiple CPU cores with your Linux commands – How to use GNU Parallel to speed up your grep, awk, and sed-ing.

Thieves pose as truckers to steal huge cargo loads – The interesting part is that criminals are doing intensive web research to build themselves convincing false identities from publicly-available information. Open data has its downsides.

Accidental aRt – When R attacks!