Five short links


Photo by Morgan

Starivore Extraterrestrials? – Are some of the strange binary star systems we’re discovering actually evidence of a strange form of life? Almost certainly not, but it’s worth reading just for the sheer audacious imagination of the idea.

Want to API enable your COBOL applications? – Over the years I’ve developed tremendous respect for the depth of subtle requirements that have been baked into legacy applications through countless undocumented changes. When I was younger my first instinct was always to rewrite them, but after discovering by painful experience that the complexity of the old software almost always reflected the poorly-articulated complexity of the users’ needs, I learned to love shims like these. By wrapping modern web APIs in a layer that looks like the file system that COBOL programs understand, you can keep the knowledge embedded in them.

Bad Attitude – The motivational-speaker framing will drive you crazy, but there’s truth and real research buried in this analysis of how your attitude affects you at work. It might explain why I often struggled in corporate jobs where I didn’t have as much of a personal connection to the bigger goals: “true believers” in a company’s mission are rewarded over skeptics, regardless of talent.

Getting real about distributed system reliability – We’ve spent man-months dealing with Cassandra setup and maintenance at Jetpac. It’s a massive investment for a small startup, and I struggled to avoid it for exactly the reasons Jay brings up. The real cost is the amount of time it takes to keep things running reliably, and if DynamoDB had been available when we started it would have been my technology of choice. I even considered using my S3-as-a-database approach to keep the maintenance time minimal!

JSON parser as a single Perl regex – Terrifyingly cool. Coolly terrifying.

Five short links


Photo by Stefano

Seven command-line tools for data science – 90% of data science is loading the damn stuff, and this is a great set of basic utilities for a lot of the formats you’ll have to deal with.

Classifying digits with deep-belief networks – A very readable guide to the new new thing in machine learning!

Our logo looks like underpants – British people are weird.

Busting the King’s Gambit, this time for sure – I don’t know chess, but the state of the art of computerized analysis is amazing.

Sloane’s Gap – A numerical investigation into strange properties of a large collection of number series. I learned that 11630 is the first uninteresting number, and there are 350 interesting sequences that contain 1729, so it’s even more exciting than Ramanujan thought!


Geocode the world with the new Data Science Toolkit


Picture by Nicholas Raymond

I’ve published a new version of the Data Science Toolkit, which includes David Blackman’s awesome TwoFishes city-level geocoder. It’s largely based on data from the Geonames project, and the biggest improvement is that the Google-style geocoder now handles millions of places around the world in hundreds of languages:

http://www.datasciencetoolkit.org/maps/api/geocode/json?sensor=false&address=القاهرة

{
  "status": "OK",
  "results": [
    {
      "types": [
        "locality",
        "political"
      ],
      "address_components": [
        {
          "types": [
            "locality",
            "political"
          ],
          "short_name": "Cairo",
          "long_name": "Cairo, EG"
        },
        {
          "types": [
            "country",
            "political"
          ],
          "short_name": "EG",
          "long_name": "Egypt"
        }
      ],
      "geometry": {
        "viewport": {
          "southwest": {
            "lng": 31.1625480652,
            "lat": 29.9635601044
          },
          "northeast": {
            "lng": 31.3563537598,
            "lat": 30.1480960846
          }
        },
        "location": {
          "lng": 31.24967,
          "lat": 30.06263
        },
        "location_type": "APPROXIMATE"
      }
    }
  ]
}
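If you want to call the geocoder from a script, here is a minimal Python sketch of building the request URL and pulling the coordinates out of a response shaped like the one above. The function names are my own for illustration and are not part of the toolkit, and the sample dictionary is just a trimmed copy of the Cairo response rather than a live result.

```python
from urllib.parse import urlencode

# Endpoint taken from the example URL above.
BASE_URL = "http://www.datasciencetoolkit.org/maps/api/geocode/json"


def build_geocode_url(address):
    """Return the request URL, percent-encoding non-ASCII addresses as UTF-8."""
    return BASE_URL + "?" + urlencode({"sensor": "false", "address": address})


def first_location(response):
    """Return (lat, lng) for the first result, or None if there isn't one."""
    if response.get("status") != "OK" or not response.get("results"):
        return None
    location = response["results"][0]["geometry"]["location"]
    return location["lat"], location["lng"]


# Trimmed copy of the response shown above, used instead of a network call.
sample = {
    "status": "OK",
    "results": [
        {"geometry": {"location": {"lng": 31.24967, "lat": 30.06263}}}
    ],
}

url = build_geocode_url("القاهرة")
cairo = first_location(sample)
print(url)
print(cairo)
```

To hit the server for real, you could pass the URL to `urllib.request.urlopen` and feed the result to `json.load`, assuming the endpoint is reachable from where you’re running.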

You can also access the TwoFishes API directly, which offers a lot of very powerful features like breaking down search queries into their where and what parts, so you can do something useful with “Pizza New York”.
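To give a feel for what splitting a query into what and where parts means, here is a toy Python sketch. It is not the TwoFishes algorithm or its API; it just looks for the longest run of trailing words that matches a tiny, made-up list of place names, and the names `KNOWN_PLACES` and `split_what_where` are hypothetical.

```python
# Hypothetical, hand-written gazetteer; a real geocoder has millions of entries.
KNOWN_PLACES = {"new york", "cairo", "san francisco"}


def split_what_where(query, places=KNOWN_PLACES):
    """Split a query into (what, where) using the longest trailing place name."""
    words = query.split()
    # Try the longest suffix first, then progressively shorter ones.
    for start in range(len(words)):
        candidate = " ".join(words[start:]).lower()
        if candidate in places:
            return " ".join(words[:start]), " ".join(words[start:])
    # No known place found, so treat the whole query as the "what" part.
    return query, ""


print(split_what_where("Pizza New York"))
```

A real system also has to cope with places that appear at the start of a query, ambiguous names, and other languages, which is where a proper geocoder earns its keep.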

For the first time I’ve made AMIs available in all the EC2 regions worldwide, and you can download or torrent the Vagrant version. Have fun, and let me know about any issues or improvements you’d like to see!

Five short links


Photo by Racineur

Software development without estimates, specs, or other lies – The secret to being a good coder is understanding the business problem you’re being paid to solve. I know, you just want to code, but that skill’s getting democratized to death. Your real value manifests when you hunt down a ton of messy and contradictory needs and figure out a solution that works for the most important ones.

The mystery of San Francisco English – Did you know San Franciscans used to sound like Brooklynites?

On Chomsky and the two cultures of statistical learning – Peter Norvig dismantles Chomsky’s dismissal of statistical models. Bring popcorn.

Five lies your world map told you – Country borders aren’t nearly as well-defined as you might think. Some great examples in here, including the Bir Tawil Trapezoid that neither Sudan nor Egypt wants to claim.

If it doesn’t work on mobile, it doesn’t work – Must-read data on understanding mobile. Brian Boyer writes from the trenches of web development, and you’ll learn why almost no mobile use is by people on the move, that users prefer reading on phones to desktops or even tablets, and that American hourly usage patterns are very similar to British, but without a tea-time spike at 4pm!

OpenHeatMap is back on Github!


Photo by Yasa

After my post last week about OpenHeatMap’s removal from Github, I’m very pleased to say that it’s back up! The Github support team got in touch quickly after my post. We never did figure out what happened with my emails, but they worked hard with me over the weekend to get all the file removal issues straightened out. As I mentioned in the original article, I’ve been fond of Github for a long time, so I’m very pleased that we could get this sorted. Thanks to John Greet especially for collaborating with me on the technical side of the removals, on a Saturday afternoon!

I’ve also had a lot of questions about why I didn’t just move to an alternative provider like BitBucket. OpenHeatMap is a niche project that’s only useful to a small, scattered group of people. I open-sourced it so as many of those folks as possible could find it, and in my experience Github’s overall popularity and social hooks make it the best place for projects like that to be discovered. I do think there’s a future for self-hosted alternatives, the way WordPress works with commercial and open off-shoots, but projects like Gitlab still feel very clunky compared to Github right now.