Five short links

Grasschains
Photo by Peter Kurdulija

The Visualization Trap – The authors argue that visualizations are dangerous because they’re too persuasive, using accident reconstructions as an example where the computer-generated animation makes viewers more likely to take a strong position on the cause than witnesses to the actual event. I think that production values are a big part of this. We’re unconsciously impressed by the amount of money that someone spends on a presentation. It’s like peacock feathers: if they can expend that many resources on their argument, they must have a lot of confidence in it. That’s why commercials cost millions, and visualizations are just another high-cost way of telling stories, with the same unfair persuasive advantage as any other expensive medium

Statistical Intensity Map Creator – A neat little (commercial) Flash map for displaying US state data

Modest Maps – An awesome open-source project making it easy to include tile-based zoomable maps in either Flash or Python on the server side. One of the authors is Michal Migurski of Stamen, who produce some amazing visualizations

Extracting Place Semantics from Flickr Tags – Users are generating massive amounts of data by tagging photos with known locations. Can we use that information to build a rich database of information on places?

The Buzzer – A spooky Russian radio station that’s been broadcasting an enigmatic signal for decades. Some claim it’s just for atmospheric research, but is it actually a “dead man’s switch” for a nuclear apocalypse?

Five short links

Chainlink
Photo by (nz)dave

WEKA – If you've got big sets of data that you're trying to find patterns in, you should be using WEKA. It's still a very technical process, but the team at Waikato University have assembled a fantastic open-source toolkit of turnkey algorithms to run

TransparencyData – Tonnes of lovely data on US political contributions, and joy-of-joys, they offer full dumps not just an API. The privacy implications of all this data being so easily accessible are worth pondering though

How the CPI analyzed mortgage lenders – I ran across the Palantir guys several years ago, and I've been consistently impressed with their expertise at visualizing complex data. This video is from a while back but it shows off how capable their platform is

The Hitchhiker's Guide to the Galaxy on tail risk – "The major difference between a thing that might go wrong and a thing that cannot possibly go wrong is that when a thing that cannot possibly go wrong goes wrong it usually turns out to be impossible to get at or repair". If there were ten commandments for engineers, this would be on there.

The Athens Affair – A compelling read covering some amazing technical feats performed by still-uncaught hackers who bugged the Greek government's cell phones

How to turn a file of addresses into latitude, longitude coordinates

Addressfile
Photo by TypeFiend

Some friends recently needed a hand converting a large set of addresses into map coordinates, so I pulled together some code I was using in other projects into a small script. It uses Yahoo's Geoplanet API since I've found that gives good results with a lot fewer restrictions than Google's geocoder. Since it seemed like this might be handy for other people too, I've put the code up on github at http://github.com/petewarden/geocodefile

To use it, get a free app ID from Yahoo and then run

./geocodefile.php -i testdata.txt -o output.txt -s

After a few seconds it should complete, and the output should contain estimated latitude,longitude coordinates for all the locations in the test data.
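
As a rough illustration of the file-in, coordinates-out flow described above, here is a minimal Python sketch. It is not the logic of geocodefile.php: the tab-separated output layout is an assumption, and `geocode` is a hypothetical callable standing in for whatever geocoding service you use (no real API is called here).

```python
import csv
from typing import Callable, Iterable, Optional, TextIO, Tuple

Coordinates = Tuple[float, float]
# Hypothetical signature: takes an address string, returns (lat, lon) or None.
Geocoder = Callable[[str], Optional[Coordinates]]


def geocode_lines(addresses: Iterable[str], out: TextIO, geocode: Geocoder) -> int:
    """Write one tab-separated row (address, lat, lon) per non-blank input line.

    Returns the number of addresses that could be geocoded.
    """
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    found = 0
    for raw in addresses:
        address = raw.strip()
        if not address:
            continue  # skip blank lines
        coords = geocode(address)
        if coords is None:
            # Keep the row with empty coordinates so failures stay visible.
            writer.writerow([address, "", ""])
        else:
            found += 1
            writer.writerow([address, f"{coords[0]:.6f}", f"{coords[1]:.6f}"])
    return found
```

To try it without any network access, pass a dictionary lookup such as `{"1 Main St": (40.0, -105.25)}.get` as the `geocode` argument and an open file or `io.StringIO()` as `out`.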

Five short links

Lynxhelicopters
Photo by Defence Images

HDFS blog – Dhruba Borthakur works with Facebook’s Hadoop cluster, and while I wish he’d update his blog more often, every post is packed full of in-depth details about optimizing your Hadoop usage, all obviously learned the hard way!

Sexy Data Geeks – A brilliantly concise rundown of why the world of Big Data is so fascinating right now (via Dániel Molnár)

USDA food atlas – There’s some delectable data hidden in this visualization, but I have a hard time understanding what it’s trying to tell me. It’s still a great project though. (via Joe Mako)

Housing and Transport Affordability Index – I really like what CNT have done with their geo data, I find the interface a lot easier than the USDA food atlas, though the sheer amount of information presented can be a bit overwhelming.

How to make perfect McDonalds-style french fries – Back in Scotland I used to frequently eat at McDonalds because their 99 cent (59p) burgers were a great source of cheap protein, and I wasn’t a fan of the local alternatives (I remember walking into a bakery and asking what was in one of their meat pies, and getting the answer ‘Meat’ – when I asked what kind, the lady just looked puzzled and repeated ‘Meat!’). One of the joys of the US is that I have a lot more choice these days, but McDonalds still make the best french fries, so I was happy to see this comprehensive guide to making your own at home. Even if you’re not a fan of the fries, you’ll be amazed by the depth and rigor of his detective work

Do you have data you’d like to show on a map?

I'm working on a new open-source project to make it easy for anyone to show their data in an interactive map on the web, and I need volunteers to test it. I'm looking for people who have spreadsheets they'd like to turn into maps, and part of what I'm looking to test is that I'm covering the most popular ways of specifying locations, so I'm interested in everything from zip codes, to street addresses, latitude and longitude to country names. If this sounds like you, please drop me an email via pete@petewarden.com and I'll get you started making maps right away.

How to suck at raising angel investment

Angels
Photo by Alice Popkorn

Today I'm heading back to Techstars to talk to the new class of startups, and it seems like a good time to reflect on what I learned going through the program. Since the three months were focused on raising angel financing and Mailana never did get any investment, it's worth looking at what I did wrong. Here's how to kill your chances of raising angel money.

Be ambivalent about investment

I still don't truly believe planes can fly. I've flown hundreds of thousands of miles, but when I stare at that big hunk of metal sitting on the asphalt, it seems completely implausible that it can climb through the air. Rationally I know it works, but my gut still tells me it's impossible. I feel the same way about early-stage technology investment. I see it happening all around me, both on a personal level and in the products I use every day, but I still find it hard to wrap my head around the idea that people will really hand over money for something as risky as a technology startup.

That put me in the worst possible position for raising money. I was interested in getting more resources to build the business, but wary of the strings attached. That meant I burnt up valuable time asking for investment, but wasn't committed enough to close a deal. As Brad Feld said in one of the talks, "Do or don't do, there is no try" when it comes to fund-raising.

Have a bias towards technology risk

I tried to pretend that I was driven by the market in what I was doing, but in my heart I've always been driven by the changes in technology that make new things possible. Almost no investor will be current enough on the geeky details of whatever area you're working on to judge the risk of whether you can actually build something that's never been built before. Most of them are extremely familiar with the human side of the business world, so they do know what questions to ask about your market. Put simply, they can't tell if you're bullshitting about the delivery and barriers to entry to any untried technology, but they can spot bogus market estimates a mile away. Building around technology risk radically limits the pool of investors willing to bet on your company.

Be a lone founder

This one has been beaten to death elsewhere, but a single founder is a major red flag for most investors. It's like seeing someone eating alone in a restaurant. Sure, there are all sorts of reasonable explanations, but it leaves a question hanging – "what's wrong with that guy that I don't know about?". It also left me with zero time to make product progress while I was talking to investors.

Don't provide reassurance

I've always tried to be very honest that I'm groping and iterating my way towards something that works but that I don't have a master plan. My hope is that I'm finding a thousand ways not to build a lightbulb, and I'll soon find one that works. As a sales pitch to potential investors, that sucks, and I can understand why. If you're going to be putting your money into a company, you want the founder to exude confidence, even if you know that's irrational based on the facts. If nothing else it's a social mechanism that investors hope will motivate everyone to live up to their side of the deal. It's also a crucial part of leadership, something you need to keep the team motivated through the rocky times.

Going back to Techstars, I realize I'm supposed to say that the experience was fun. It wasn't. It was painful and the constant rejection was emotionally grueling, but it was incredibly valuable. I made some amazing friends, absorbed a massive amount of wisdom from some of the smartest people I've ever met, and I'd do it again in a heartbeat. I just hope I've learned enough from it all that I'll be making a whole new set of mistakes over the next year.

What’s my problem with money?

Goldcoins
Photo by Tao Zhyn

I really struggle with conversations about money, and this causes a lot of problems as I work on turning my big-bag-o'-technology into a business. My family never talked about money when I was growing up, and I absorbed that general dread of any financial discussion. The proper way to approach negotiations was as a true guesser: spending a lot of time figuring out what the other side would consider a fair price without directly asking, and only making a proposal once I was certain it was acceptable, to avoid the social calamity of a refusal.

This approach works great if you're living in a Jane Austen novel. It's only effective if you're dealing with somebody you have known for years, and you have the time to climb inside each other's heads. I'm constantly dealing with strangers who've grown up in a completely different culture, so reading the signals is almost impossible. I'm having to push through my discomfort to become an asker, using logic to ignore all the warning lights that go off in my head flashing "You're being a jerk".

This is on my mind thanks to two recent conversations. A friend was overwhelmed by invitations to speak at conferences but had trouble saying no. He knew that he should be using money as a filter, but struggled to ask for a fee, even though the demand for his time clearly outstripped the supply! I'm having a similar problem with consulting: there are so many fascinating projects that people have asked me for help on that I've over-committed, and find myself with no time for my own work. I've avoided charging most of the people I've been helping, and when I have, I've gone with an hourly fee based off my salary at Apple. This came to a head recently when a partner rejected my standard rate as too low, explaining it just wouldn't be credible to his bosses, and recommended I double it!

When I heard that, at first I couldn't figure out why it felt so wrong. Even writing this post about it is a struggle, and I think it all comes down to that same 'guesser' model of negotiations I've carried in my head. Without feedback from the other side I always fell back on something external to anchor on, my previous salary, even if that didn't make sense. It's not like I don't need the money, I just paid my lawyers $14,000 (thanks Facebook) and I'm still using my savings from Apple to help pay the rent.

For my business to be successful I need to behave in a way that I grew up considering pushy. I can't completely blame my reluctance on being British: my brother grew up selling go-kart rides from the school bus-stop and graduated to trading in cars and real-estate. Logically I know that the market sets the rate for whatever you're selling, and a fair price is whatever people are willing to pay. My problem with money is in my own head, and the only solution is to learn ways of talking about it openly. I hope this post is a good start.

Five short links

Bikelinks
Photo by Ian Sane

On the Leakage of Personally Identifiable Information Via Online Social Networks – This was the basis of a somewhat sensational WSJ article, but the original paper is interesting because it points out how standard web mechanisms like referer headers can be exploited now that our Facebook ID numbers can be used to look up valuable information
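
To make the referer mechanism concrete: when a profile page's URL carries the user's ID in its query string, any third-party server that receives a request from that page sees the full URL in the Referer header and can read the ID straight out of it. The Python sketch below shows that extraction step only. The URL shape and the `id` parameter name are assumptions for illustration, not details taken from the paper.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def profile_id_from_referer(referer: str, param: str = "id") -> Optional[str]:
    """Return the value of `param` in the referring URL's query string, if any."""
    query = parse_qs(urlsplit(referer).query)
    values = query.get(param)
    return values[0] if values else None


# Hypothetical Referer value as a third-party server might receive it.
leaked = profile_id_from_referer("https://social.example/profile.php?id=12345&ref=nf")
```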

This Week in Maps – Steve ‘OpenStreetMap’ Coast alerted me to his new podcast covering the wonderful world of geo. The latest episode with Matt Galligan even gets a bit feisty!

Large Scale Social Media Analysis with Hadoop – If you’re trying to do anything with even moderate amounts of data from networks like Twitter, using a MapReduce model for your processing will save you so much time and effort. This presentation is a great guide to getting started
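
For readers new to the MapReduce model, here is a toy in-process Python sketch of its three stages (map, group by key, reduce) counting hashtags in tweets. It is not Hadoop code and is not taken from the presentation; the function names are made up for illustration, and a real cluster would run the map and reduce steps in parallel across machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_hashtags(tweet: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit (hashtag, 1) for every hashtag in one tweet."""
    for word in tweet.split():
        if word.startswith("#") and len(word) > 1:
            yield word.lower(), 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key, as the framework does between the two steps."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: sum the counts for one hashtag."""
    return key, sum(values)


def count_hashtags(tweets: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence on a small in-memory input."""
    pairs = (pair for tweet in tweets for pair in map_hashtags(tweet))
    return dict(reduce_counts(k, v) for k, v in shuffle(pairs).items())
```

The appeal of the model is that each map call looks at one record and each reduce call looks at one key, so neither needs to know how much data there is in total.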

Living in denial, why sensible people reject the truth – When I was young, there was a vaccination scare that led my parents to skip some of my shots. Sure enough I came down with Whooping Cough shortly afterward, so I’ve always had a personal fascination with how we can judge when something’s a real worry, and when it’s just pseudo-science. I loved this exploration into how harmful beliefs like those are started and sustained

The Dumbest Question – It’s standard for financial advisers to begin by asking their clients ‘How much risk are you comfortable with?’. This is a terrible starting point, nobody has a real idea of how much of an abstract ‘risk’ they want to take with their money, they just have goals, dreams and constraints

Five short links

Golfbunker
Photo by Robert1407

Hadoop Cluster Chef – A very useful project by Flip Kromer of InfoChimps. It's designed to make building compute clusters across a range of technologies very easy, and it's built by someone in the trenches so it has a lot of street-smarts baked in, things like using spot EC2 instances for rock-bottom server prices

Comparing email address validation regular expressions – This is the mother of all email REs, with a lot of testing behind it. As someone who's wrestled with this problem myself I'm impressed, though I lean towards using a much more accepting version for testing user input
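
As an example of what a "much more accepting version" might look like, here is a deliberately loose check in Python. This pattern is an illustrative assumption, not the expression from the linked comparison and not necessarily the one used on this blog: it only insists on something, an @, something, a dot, and something, with no spaces or extra @ signs.

```python
import re

# Deliberately loose: local part, "@", domain containing at least one dot.
LOOSE_EMAIL = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def looks_like_email(text: str) -> bool:
    """Return True if the stripped text has the rough shape of an email address."""
    return LOOSE_EMAIL.fullmatch(text.strip()) is not None
```

A check this permissive will let through some addresses that can never be delivered, but it avoids rejecting unusual addresses that are perfectly valid, which tends to be the more annoying failure for users.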

How to check if an email address exists without sending an email – I wasn't aware that you could use SMTP to discover if an email address is valid. It makes the "we can't add an API to look up users by email on our service because spammers will use it to validate emails" line from social networks look even more like a flimsy excuse
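
A minimal sketch of the idea using Python's standard `smtplib`, assuming the technique relies on the server's reply to RCPT TO: you start a mail transaction, name the recipient, note the response code, and then reset the session so nothing is sent. The function takes an already-connected server object; finding the right mail host for a domain needs an MX lookup, which is not shown.

```python
import smtplib


def rcpt_accepted(server: smtplib.SMTP, sender: str, recipient: str) -> bool:
    """Ask an already-connected SMTP server whether it would accept `recipient`.

    Issues HELO, MAIL FROM and RCPT TO, then resets the session with RSET so
    no message is ever sent. A 250 reply to RCPT TO suggests the mailbox exists.
    """
    server.helo()
    server.mail(sender)
    code, _message = server.rcpt(recipient)
    server.rset()
    return code == 250
```

Usage would look something like `with smtplib.SMTP("mx.example.com", 25, timeout=10) as server: rcpt_accepted(server, "me@example.org", "someone@example.com")`, where the host name is a placeholder. Treat the answer as a hint only, since many servers accept every recipient or refuse to answer probes like this.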

Reflections on startup life – week 28 – Tim Bull has been writing weekly entries about the process of building a company around Tribalytic. Writing this stuff in realtime makes it a great antidote to the 'we knew what we were doing all along' PR spin that always obscures the real story of successful startups

Tree of Ténéré – This was the most isolated tree on the planet, over a hundred miles from its nearest neighbor. The last remnant of a grove that grew in another era when the Sahara had water, its roots stretched over 30 meters downwards, and it was a landmark for generations of travelers. Then it was run over by a truck.

Was the financial meltdown an alien invasion?

Alienhead
Photo by Scott Wills

Before you start worrying I've gone all Icke on you, the answer is no, but what's interesting about the question is that it's a lot harder to answer than it used to be. Todd Vernon's article on the still-unexplained stock market crash of May 3rd reminded me of something, but it took a while to put my finger on it. Then I realized, it was from Vernor Vinge's A Deepness in the Sky.

In the novel Vinge describes financial weapons that are unleashed by a malevolent intelligence on an unsuspecting civilization, business plans that explode their systems. At the time I was naive enough that I couldn't picture how just transmitting an idea could have that sort of impact, but now it's pretty obvious. All you need is an alien in the early 2000s to email an investment banker "Hey, everyone wants triple-A securities. If you bundle this risky debt into tranches you can persuade the ratings agencies to certify part of it as AAA and you'll make a fortune".

What I find fascinating as a computer engineer is that this is all my profession's fault. We've built an amazing infrastructure of information plumbing to allow the financial system to operate on auto-pilot, without those pesky humans sucking up salaries and slowing the whole process down. The late lamented Tanta opened my eyes to how the job of a mortgage underwriter had changed from being a gatekeeper, almost a detective trying to sniff out risky loans, to someone whose focus was making sure that the submitted forms matched the criteria set in the computer programs they now used to grade loans. In a very literal sense these programs are artificial intelligences, expert systems that try to mechanize the thought process that we used to rely on loan officers to go through.

These loans were then bundled and sold as securities using complex computer models built on historical data (which didn't include falling house prices). These securities were then bought and resold by investors using their own complex trading programs. This was all in the debt market, but the same takeover of decisions by software occurred in equities, and the end result is a system composed of hundreds of thousands of different programs all talking to each other and making enormous financial decisions on their owners' behalf. What's truly scary about this size of system is that it's reached a scale where it's so complex that it's impossible to understand why anything happened. That's what worries me about the May 3rd crash: it's the first time a vital information system we've built has proved too complex to debug. In the pre-computer world, you could just interview everyone who bought or sold on the day of a crash and ask them why they took action. We can't do that with our programs, which is why the crash will remain a mystery.

In another part of Deepness, Vinge describes a civilization whose systems have reached the point where they're so entangled and baroque that nobody can fix them when they crash, and the whole world is destined to collapse. Again I couldn't picture that when I first read it, but now I can. In the financial crisis we had cargoes that couldn't be shipped because the standard letters of credit between banks were no longer being honored. Our fundamental mechanisms for delivering food were taken out by problems in our financial information systems! Our AIs don't need to become self-aware to turn into Skynet, they're quite capable of causing serious damage just as they are.

So how can we fix this? As a start we need all programs making financial decisions to produce a clear audit trail in a standard format, explaining not only what actions they took but why. With this sort of log available for every market participant, forensic investigators would be able to build a picture of the cause of events like May 3rd. It may well prove to be impossible for some existing code to come up with meaningful reasons for its decision tree, but that's a feature of this proposal, not a bug. If the operators can't justify the actions of the programs they're running, then that's a clear sign they're too dangerous to be interfacing with our financial system.
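
To give a feel for what one entry in such an audit trail could contain, here is a small Python sketch that records both the action and the reasons behind it as one JSON object per line. No such standard format exists in this post; every field name below is a hypothetical choice made for illustration.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class DecisionRecord:
    """One audit-trail entry: what a program did and why (hypothetical fields)."""
    timestamp: str        # ISO 8601 time of the decision, in UTC
    program: str          # which system made the decision
    action: str           # for example "SELL"
    instrument: str       # what was traded
    quantity: int
    reasons: List[str]    # human-readable justification, in order
    inputs: Dict[str, float] = field(default_factory=dict)  # data the decision used


def to_log_line(record: DecisionRecord) -> str:
    """Serialize a record as a single-line JSON object with stable key order."""
    return json.dumps(asdict(record), sort_keys=True)


example = DecisionRecord(
    timestamp="2010-05-03T18:45:00Z",
    program="example-trader",
    action="SELL",
    instrument="XYZ",
    quantity=1000,
    reasons=["price fell below stop-loss threshold"],
    inputs={"last_price": 41.2, "stop_loss": 42.0},
)
line = to_log_line(example)
```

The `reasons` field is the part that matters for the argument above: a program that cannot fill it in with anything meaningful is exactly the kind that would fail the test.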