Five short links

Photo by RHiNO NEAL

Your ideal performance/consistency tradeoff – It's unclear what the right number of nodes and level of redundancy for a Cassandra cluster are for any particular performance requirements, so most of us experiment until we have something that vaguely seems to work. Thanks to the folks at Berkeley, there's now a better way to figure it out via an interactive tool. Interestingly, they ended up using a Monte Carlo simulation rather than a formula, which shows how complex the problem is.
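To get a feel for why simulation is a natural fit here, below is a minimal Monte Carlo sketch. It is not the Berkeley tool's actual model: the function name `estimate_consistency`, the exponential latency distribution, and the 5 ms mean are all assumptions made purely for illustration. It estimates how often a read started `t_ms` after a write commits sees the new value, given N replicas, R read responses, and W write acknowledgements.

```python
import random


def estimate_consistency(n, r, w, t_ms, trials=20000, mean_latency_ms=5.0, seed=0):
    """Estimate the chance that a read sees the latest write (toy model).

    Assumption: every network hop has an independent exponential delay.
    """
    rng = random.Random(seed)

    def delay():
        return rng.expovariate(1.0 / mean_latency_ms)

    consistent = 0
    for _ in range(trials):
        # Time at which each replica receives the write.
        write_arrival = [delay() for _ in range(n)]
        # The write commits once the w-th acknowledgement comes back.
        ack_arrival = sorted(arrival + delay() for arrival in write_arrival)
        commit_time = ack_arrival[w - 1]

        # The read starts t_ms after the commit and is sent to all replicas.
        read_start = commit_time + t_ms
        responses = []
        for arrival in write_arrival:
            request_arrival = read_start + delay()
            response_arrival = request_arrival + delay()
            has_new_value = arrival <= request_arrival
            responses.append((response_arrival, has_new_value))

        # The coordinator only waits for the first r responses.
        responses.sort()
        if any(fresh for _, fresh in responses[:r]):
            consistent += 1
    return consistent / trials


if __name__ == "__main__":
    for t in (0, 5, 20):
        print(t, estimate_consistency(n=3, r=1, w=1, t_ms=t))
```

In this toy model, any setting with R + W > N always reads the latest value, while smaller quorums trade some staleness for lower latency, and the staleness shrinks as `t_ms` grows.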

Why is finance so complex? – One of the most interesting articles I've read in a long time. It posits that finance is effectively a benign con trick, and relies on a lack of transparency to encourage people to take risks they wouldn't if they fully understood what they were getting into. The idea is that it's a collective action problem that only works if everyone jumps on board, and so the opacity helps persuade people to do that and achieve a better overall result than if they made an individually-rational choice. The model seems like it might explain other odd features of our social world.

Run a MapReduce job across five billion web pages for 25 cents – I have a massive data-crush on Common Crawl, and this is a fantastic practical demonstration of why I'm so excited. 

Clickjacking – The web's security model is more like Windows' than Unix's. It's been grafted onto an underlying system that was designed without any security foundations, and there are lots of gaps where different components interact in exploitable ways. This page explains how there's no reliable way to prevent malicious sites from hosting your site as an invisible frame and tricking users into taking actions by unknowingly clicking on it. Luckily we're in a world where software can be frequently updated, unlike '90s desktop software, so at least if this becomes widespread we might quickly see some fixes.

Muse – A noble experiment in mining useful data from your own email archives. It's still a bit too buggy to really get a feel for how interesting the results could be though.

Five short links

Photo by Let Ideas Compete

Rust – A trap to ensnare unwary web crawlers, by Tim McNamara. It creates pathological patterns of input data that will slow down naive robots by the sheer volume of processing required, whilst using minimal resources on the server thanks to elegant event-driven code. It's effectively a reversed denial-of-service attack, designed to overwhelm malicious or thoughtless crawlers of your site. Well-written and robust robot scripts will cope with malformed input of course, but the odds are that any crawler that's bringing your site to its knees with an unreasonable number of requests won't be a masterpiece of engineering!
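The general idea of making the server's work cheap and the crawler's work expensive can be sketched with a lazy generator. This is not Tim McNamara's code: the name `trap_chunks`, the link format, and the page size are hypothetical, chosen only to show the shape of the technique. Each page is produced one small chunk at a time, so the server never holds a whole page in memory, while every link leads to yet another generated page.

```python
import random


def trap_chunks(path, links_per_page=1000, seed=0):
    """Lazily yield an HTML page full of links to further trap pages.

    Hypothetical sketch: links are derived from the requested path, so the
    same URL always produces the same page without any stored state.
    """
    rng = random.Random(f"{path}:{seed}")
    base = path.rstrip("/")
    yield "<html><body>\n"
    for _ in range(links_per_page):
        token = f"{rng.getrandbits(64):016x}"
        yield f'<a href="{base}/{token}">{token}</a>\n'
    yield "</body></html>\n"


if __name__ == "__main__":
    for chunk in trap_chunks("/maze", links_per_page=3):
        print(chunk, end="")
```

A streaming or event-driven web server could write each chunk to the socket as it is produced. A crawler that follows every link without a depth or per-site limit keeps descending, which is exactly the behavior a well-written robot avoids.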

Seeing like a database – Written by another fan of Seeing like a State, this has a great quote from Jay Owens at the end, noting "the asymmetry of personal data, open for the 99% & deep analytics for the 1%".

HttpBin – Echoes back information about HTTP requests you send it, including things like headers, data, and forced result codes. I'm just thankful it introduced me to the 418 (I'm a teapot) status code; I can't believe I've been writing web code for so long without checking for that possibility.
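As a small illustration of the forced result codes, here is a sketch using only Python's standard library. It assumes the public instance at httpbin.org and its `/status/<code>` endpoint; the helper names `status_url` and `fetch_status` are made up for this example.

```python
import urllib.error
import urllib.request

HTTPBIN = "https://httpbin.org"  # assumed public instance


def status_url(code, base=HTTPBIN):
    """Build the URL that asks the service to reply with a given status code."""
    return f"{base}/status/{code}"


def fetch_status(code, base=HTTPBIN, timeout=10):
    """Request a forced status code and return what the server actually sent."""
    try:
        with urllib.request.urlopen(status_url(code, base), timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx and 5xx responses; the code is on the exception.
        return err.code


if __name__ == "__main__":
    print(fetch_status(418))
```

Because `urllib` treats 4xx and 5xx responses as exceptions, the sketch catches `HTTPError` to read the code, which is the kind of handling a test against a forced status is meant to exercise.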

Drone landscapes, intelligent geotextiles, geographic countermeasures – I'd never realized how deeply adding processing to landscape structures could change our world. This is a compelling exploration of some of the possibilities, and I'm especially struck by the possibilities for a robot-readable world.

An end to bad heir days – The copyright on James Joyce's work finally expired! The enforcement process became a poster child for how the combination of insanely-long copyright terms and ornery heirs can derail the enjoyment and exploration of an artist's work. Thankfully scholars are now free to quote Joyce's work and letters, and I've just downloaded A Portrait of the Artist as a Young Man to re-read in celebration.

Five short links

Photo by Richard Paterson

The Ugliest Map in the World – Such an eyewatering color scheme, you'd think I'd designed it. The swimming-pool bottom caustics for the ocean areas really clinch it.

The Life of a Typeahead Query – An exploration of how hard it is to make an easy interface. Great to see a practical example of how someone architected a real-world system with messy requirements.

Ending the Infographic Plague – Visualizations are an excellent hack for getting publicity, which inevitably leads to pollution by bad actors.

The Mess that is NPM – I really, really want to use Node.js, but the library ecosystem isn't quite mature enough for me to use in production. There's a lot of non-technical community hacking that you need to do to create a strong set of modules, and responsible maintainership isn't something I'm perfect at with all my projects, so I know how hard it is.

Brain Grain – Tasty little HTML5 visualization of world-wide migration. It's pretty simple, but has some innovations I've not seen elsewhere and uses animation effectively.

And last but not least, Jetpac is now rounding out a fundraising round, so if you're on AngelList any comments or recommendations would be very welcome.