Facebook changes the rules for the public web

A friend recently sent me this link to a new legal document Facebook have added to their site:

http://www.facebook.com/apps/site_scraping_tos_terms.php

It's the first time I've seen Facebook formally lay out how they think the world should treat the web pages they've made public and have indicated they allow to be crawled through their robots.txt. What it says is what they told me when they threatened to sue me a few months ago: anyone who crawls the web must obtain prior written permission from every site.

Why should you care? It's their attempt to have their cake and eat it too. They want to make as much information as possible about their members public so that they can get traffic from search engines and drive brands to prioritize their Facebook pages, but they know they have users trapped as long as their data is hard to transfer out of the service into any potential competitors.

So, stuck between these two incompatible goals, they've reached for the lawyers. They could change their robots.txt to disallow crawling, or remove the pages they've made public, but that would remove their valuable search traffic. There's a lot of legal backing to the rules in robots.txt, but you'll need deeper pockets than mine to contest Facebook's new interpretation.

What it means in practice is that large established companies are able to crawl (though always with the threat of legal action hanging over them) but smaller, newer startups will be attacked by Facebook's lawyers as soon as they look threatening. Google definitely fall foul of the new rules (caching web pages, the use of data for advertising purposes), so I'd be interested to know if they've signed up. I know these changes would make it impossible for them to get started today, since they'd have to contact each and every website before they crawled them and respond to things like "an accounting of all uses of data collected through Automated Data Collection within ten (10) days of your receipt of Facebook’s request for such an accounting". Avoiding that sort of mess was exactly why the industry agreed on robots.txt as a standard.
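For anyone who hasn't written a crawler, here's a minimal sketch of how the robots.txt convention works from the crawler's side, using Python's standard urllib.robotparser module. The robots.txt contents, the "ExampleCrawler" user agent and the example.com URLs are all invented for illustration; this is not Facebook's actual file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, invented for this example only.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the site's published rules permit fetching this URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(EXAMPLE_ROBOTS_TXT)

# "ExampleCrawler" is a made-up user agent name.
print(allowed(parser, "ExampleCrawler", "http://example.com/people/jane"))
print(allowed(parser, "ExampleCrawler", "http://example.com/private/settings"))
```

The point is that the site publishes its rules once, in a machine-readable file, and every crawler can check them without contacting anybody.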

To be completely clear, I understand that Facebook need to protect their users' privacy. This does nothing to help that. Anyone malicious is free to gather and analyze all the information they have made public about people, since Facebook has left it all completely in the open with no technical safeguards. What this does is give Facebook a legal stick to beat anyone legitimate who tries to openly use the data they've made available in a way they decide they don't like.

Five short links

Fivedogs
Photo by Xanboozled

The Git Parable – After reading this story by Tom Preston of github, the way git works actually makes sense. It’s still a little maddening that my common workflow is more involved than it was on svn, but I understand more about how that flexibility can also be very powerful. via Elben

MacMail Power User Tips – TA from Gist has a great article on how to use the Apple desktop mail program effectively. I’m exclusively on gmail these days, and I constantly miss these sort of advanced features from Outlook, but the convenience of webmail is hard to beat.

Mapping conference connections – This is an area I’ve been fascinated by, and I actually spent some time with the guys on the late lamented Eventvue trying to create a compelling product around the same idea. I’d love to have a social map of how I’m connected to conference attendees before I went, but it’s been surprisingly tough to turn that into a business. via Eutiquio

Mapping the world’s photos – This is a couple of years old, but it’s still amazing to see the details that get highlighted when you map the locations of 35 million photos from Flickr. via Michael

Poly9 Globe – A Flash component that lets you render an interactive 3D globe and overlay information over it. Very nicely done.

An end to the loneliness of the open-source coder?

Lonewolf
Photo by Ucumari

I’ve been publishing free software since I was 15, back in the days of ‘public domain’ floppy disks and magazine listings. I still get lovely emails from people using my open-source visual effects plugins, and I’m still so amazed by the magic of what computers can do that I can’t help but keep sharing code around the areas that fascinate me.

What’s been strange though is how solitary my open-source coding work has been. In my commercial work I usually end up being the guy who talks to everyone on the team and knows how all the pieces fit together. Partly that’s because all the juiciest bugs are in the gaps between the modules, but I’m also the sort of person who loves to learn other people’s code. By contrast, even my popular free projects have never been shared endeavors. I haven’t even found anyone willing to take on the task of porting my plugins to new versions of After Effects, now that I no longer work in that world.

I don’t think I’m alone in this. Looking around, there’s a massive long tail of open-source modules that are being ignored by potential users and contributors, either because they don’t know about them or because the barriers to getting involved are too high. Today something happened that gives me hope that things are changing.

I recently began publishing all of my new code on github. I tried it out because somebody nagged me to in the blog comments, and I stayed with it because the website interface was so straightforward and friendly. It’s what Sourceforge would have been if they had any UI skills. I’m still at the stage with git where I’m typing in commands from tutorials without really knowing what I’m doing, and occasionally cursing the new mental model it forces on me, but I’m able to get simple tasks done.

What I realized this afternoon is that its familiar interface hides a deeper hidden infrastructure, something that has the potential to change the way open-source works. I received a pull request for ParallelCurl!

So what? Well, first it was great to know people are using the project enough to want to make changes, but more importantly it made me realize how much github has lowered the barriers to contributing to an open-source project. In the old days I remember biting my nails to the quick as I reviewed, patched, tested and documented small fixes to large codebases. Writing to the mailing list to get your patch accepted was an art all to itself, and on both ends it was a painful amount of work to get changes from new contributors. It was fraught with social problems too: the process had the potential to be confrontational and unpleasant if the patch was rejected.

Github changes all that. SoftwareElves was able to create his own branch of the code, make some changes and then just drop me a notification that he had a new version I should consider rolling in. Reviewing and accepting his changes was simple, but if I’d been unresponsive or hadn’t liked them, his branch of the code would still have been a first-class citizen and there would have been no awkwardness involved. Both git and github have collaboration baked in, which sounds obvious for a version control system but I realize now has been lacking from every other service I’ve used. Github is a social network for code.

There’s still going to be some tumbleweed blowing through the long tail of open-source projects, but Github is a massive step forward. I’m eagerly anticipating lots more people pointing out my mistakes, because the world of open-source will be a lot more productive with that sort of collaboration.

Five short links

Grasschains
Photo by Peter Kurdulija

The Visualization Trap – The authors argue that visualizations are dangerous because they’re too persuasive, using accident reconstructions as an example where the computer-generated animation makes viewers more likely to take a strong position on the cause than witnesses to the actual event. I think that production values are a big part of this. We’re unconsciously impressed by the amount of money that someone spends on a presentation. It’s like peacock feathers: if they can expend that many resources on their argument, they must have a lot of confidence in it. That’s why commercials cost millions, and visualizations are just another high-cost way of telling stories, with the same unfair persuasive advantage as any other expensive medium.

Statistical Intensity Map Creator – A neat little (commercial) Flash map for displaying US state data

Modest Maps – An awesome open-source project making it easy to include tile-based zoomable maps in either Flash or Python on the server side. One of the authors is Michal Migurski of Stamen, who produce some amazing visualizations

Extracting Place Semantics from Flickr Tags – Users are generating massive amounts of data by tagging photos with known locations. Can we use that information to build a rich database of information on places?

The Buzzer – A spooky Russian radio station that’s been broadcasting an enigmatic signal for decades. Some claim it’s just for atmospheric research, but is it actually a “dead man’s switch” for a nuclear apocalypse?

Five short links

Chainlink
Photo by (nz)dave

WEKA – If you've got big sets of data that you're trying to find patterns in, you should be using WEKA. It's still a very technical process, but the team at Waikato University have assembled a fantastic open-source toolkit of turnkey algorithms to run

TransparencyData – Tonnes of lovely data on US political contributions, and joy-of-joys, they offer full dumps not just an API. The privacy implications of all this data being so easily accessible are worth pondering though

How the CPI analyzed mortgage lenders – I ran across the Palantir guys several years ago, and I've been consistently impressed with their expertise at visualizing complex data. This video is from a while back but it shows off how capable their platform is

The Hitchhiker's Guide to the Galaxy on tail risk – "The major difference between a thing that might go wrong and a thing that cannot possibly go wrong is that when a thing that cannot possibly go wrong goes wrong it usually turns out to be impossible to get at or repair". If there were ten commandments for engineers, this would be on there.

The Athens Affair – A compelling read covering some amazing technical feats performed by still-uncaught hackers who bugged the Greek government's cell phones

How to turn a file of addresses into latitude, longitude coordinates

Addressfile
Photo by TypeFiend

Some friends recently needed a hand converting a large set of addresses into map coordinates, so I pulled together some code I was using in other projects into a small script. It uses Yahoo's Geoplanet API since I've found that gives good results with a lot fewer restrictions than Google's geocoder. Since it seemed like this might be handy for other people too, I've put the code up on github at http://github.com/petewarden/geocodefile

To use it, get a free app ID from Yahoo and then run

./geocodefile.php -i testdata.txt -o output.txt -s

After a few seconds it should complete, and the output should contain estimated latitude,longitude coordinates for all the locations in the test data.
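If you're curious about the general shape of a script like this, here's a rough sketch in Python. It is not a port of geocodefile.php and it doesn't call Yahoo's API. The geocoding step is a pluggable function, and the fake_geocode stub, the address and the coordinates below are all made up for illustration. The CSV output layout is also my assumption, not necessarily what the real script writes.

```python
import csv
import io
from typing import Callable, Iterable, Optional, TextIO, Tuple

Coordinates = Tuple[float, float]  # (latitude, longitude)


def geocode_lines(
    lines: Iterable[str],
    geocode: Callable[[str], Optional[Coordinates]],
    out: TextIO,
) -> int:
    """Write one CSV row per address: address, latitude, longitude.

    Addresses the geocoder can't resolve get empty coordinate columns.
    Returns the number of addresses that were resolved.
    """
    writer = csv.writer(out)
    found = 0
    for raw in lines:
        address = raw.strip()
        if not address:
            continue  # skip blank lines in the input file
        coords = geocode(address)
        if coords is None:
            writer.writerow([address, "", ""])
        else:
            writer.writerow([address, coords[0], coords[1]])
            found += 1
    return found


# Hypothetical stand-in for a real geocoding service; the data is invented.
FAKE_INDEX = {"1 Example Street, Testville": (10.0, 20.0)}


def fake_geocode(address: str) -> Optional[Coordinates]:
    return FAKE_INDEX.get(address)


buffer = io.StringIO()
count = geocode_lines(
    ["1 Example Street, Testville\n", "\n", "Nowhere Lane\n"],
    fake_geocode,
    buffer,
)
print(buffer.getvalue())
```

To use something like this for real you'd swap fake_geocode for a function that queries whichever geocoding service you have access to, and pass in open file handles instead of the in-memory buffer.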

Five short links

Lynxhelicopters
Photo by Defence Images

HDFS blog – Dhruba Borthakur works with Facebook’s Hadoop cluster, and while I wish he’d update his blog more often, every post is packed full of in-depth details about optimizing your Hadoop usage, all obviously learned the hard way!

Sexy Data Geeks – A brilliantly concise rundown of why the world of Big Data is so fascinating right now (via Dániel Molnár)

USDA food atlas – There’s some delectable data hidden in this visualization, but I have a hard time understanding what it’s trying to tell me. It’s still a great project though. (via Joe Mako)

Housing and Transport Affordability Index – I really like what CNT have done with their geo data, I find the interface a lot easier than the USDA food atlas, though the sheer amount of information presented can be a bit overwhelming.

How to make perfect McDonalds-style french fries – Back in Scotland I used to frequently eat at McDonalds because their 99 cent (59p) burgers were a great source of cheap protein, and I wasn’t a fan of the local alternatives (I remember walking into a bakery and asking what was in one of their meat pies, and getting the answer ‘Meat’ – when I asked what kind, the lady just looked puzzled and repeated ‘Meat!’). One of the joys of the US is that I have a lot more choice these days, but McDonalds still make the best french fries, so I was happy to see this comprehensive guide to making your own at home. Even if you’re not a fan of the fries, you’ll be amazed by the depth and rigor of his detective work

Do you have data you’d like to show on a map?

I'm working on a new open-source project to make it easy for anyone to show their data in an interactive map on the web, and I need volunteers to test it. I'm looking for people who have spreadsheets they'd like to turn into maps, and part of what I'm looking to test is that I'm covering the most popular ways of specifying locations, so I'm interested in everything from zip codes, to street addresses, latitude and longitude to country names. If this sounds like you, please drop me an email via pete@petewarden.com and I'll get you started making maps right away.

How to suck at raising angel investment

Angels
Photo by Alice Popkorn

Today I'm heading back to Techstars to talk to the new class of startups, and it seems like a good time to reflect on what I learned going through the program. Since the three months were focused on raising angel financing and Mailana never did get any investment, it's worth looking at what I did wrong. Here's how to kill your chances of raising angel money.

Be ambivalent about investment

I still don't truly believe planes can fly. I've flown hundreds of thousands of miles, but when I stare at that big hunk of metal sitting on the asphalt, it seems completely implausible that it can climb through the air. Rationally I know it works, but my gut still tells me it's impossible. I feel the same way about early-stage technology investment. I see it happening all around me, both on a personal level and in the products I use every day, but I still find it hard to wrap my head around the idea that people will really hand over money for something as risky as a technology startup.

That put me in the worst possible position for raising money. I was interested in getting more resources to build the business, but wary of the strings attached. That meant I burnt up valuable time asking for investment, but wasn't committed enough to close a deal. As Brad Feld said in one of the talks, "Do or don't do, there is no try" when it comes to fund-raising.

Have a bias towards technology risk

I tried to pretend that I was driven by the market in what I was doing, but in my heart I've always been driven by the changes in technology that make new things possible. Almost no investor will be current enough on the geeky details of whatever area you're working on to judge the risk of whether you can actually build something that's never been built before. Most of them are extremely familiar with the human side of the business world, so they do know what questions to ask about your market. Put simply, they can't tell if you're bullshitting about the delivery and barriers to entry to any untried technology, but they can spot bogus market estimates a mile away. Building around technology risk radically limits the pool of investors willing to bet on your company.

Be a lone founder

This one has been beaten to death elsewhere, but a single founder is a major red flag for most investors. It's like seeing someone eating alone in a restaurant. Sure there's all sorts of reasonable explanations but it leaves a question hanging – "what's wrong with that guy that I don't know about?". It also left me with zero time to make product progress while I was talking to investors.

Don't provide reassurance

I've always tried to be very honest that I'm groping and iterating my way towards something that works but that I don't have a master plan. My hope is that I'm finding a thousand ways not to build a lightbulb, and I'll soon find one that works. As a sales pitch to potential investors, that sucks, and I can understand why. If you're going to be putting your money into a company, you want the founder to exude confidence, even if you know that's irrational based on the facts. If nothing else it's a social mechanism that investors hope will motivate everyone to live up to their side of the deal. It's also a crucial part of leadership, something you need to keep the team motivated through the rocky times.

Going back to Techstars, I realize I'm supposed to say that the experience was fun. It wasn't. It was painful and the constant rejection was emotionally grueling, but it was incredibly valuable. I made some amazing friends, absorbed a massive amount of wisdom from some of the smartest people I've ever met, and I'd do it again in a heartbeat. I just hope I've learned enough from it all that I'll be making a whole new set of mistakes over the next year.

What’s my problem with money?

Goldcoins
Photo by Tao Zhyn

I really struggle with conversations about money, and this causes a lot of problems as I work on turning my big-bag-o'-technology into a business. My family never talked about money when I was growing up, and I absorbed that general dread of any financial discussion. The proper way to approach negotiations was as a true guesser: spending a lot of time figuring out what the other side would consider a fair price without directly asking, and only making a proposal once I was certain it was acceptable, to avoid the social calamity of a refusal.

This approach works great if you're living in a Jane Austen novel. It's only effective if dealing with somebody you have known for years, and you have the time to climb inside each others' heads. I'm constantly dealing with strangers who've grown up in a completely different culture so reading the signals is almost impossible. I'm having to push through my discomfort to become an asker, using logic to ignore all the warning lights that go off in my head flashing "You're being a jerk".

This is on my mind thanks to two recent conversations. A friend was overwhelmed by invitations to speak at conferences but had trouble saying no. He knew that he should be using money as a filter, but struggled to ask for a fee, even though the demand for his time clearly outstripped the supply! I'm having a similar problem with consulting. There are so many fascinating projects that people have asked me for help on that I've over-committed, and find myself with no time for my own work. I've avoided charging most of the people I've been helping, and when I have, I've gone with an hourly fee based off my salary at Apple. This came to a head recently when a partner rejected my standard rate as too low, explaining it just wouldn't be credible to his bosses, and recommended I double it!

When I heard that, at first I couldn't figure out why it felt so wrong. Even writing this post about it is a struggle, and I think it all comes down to that same 'guesser' model of negotiations I've carried in my head. Without feedback from the other side I always fell back on something external to anchor on, my previous salary, even if that didn't make sense. It's not like I don't need the money: I just paid my lawyers $14,000 (thanks Facebook) and I'm still using my savings from Apple to help pay the rent.

For my business to be successful I need to behave in a way that I grew up considering pushy. I can't completely blame my reluctance on being British: my brother grew up selling go-kart rides from the school bus-stop and graduated to trading in cars and real-estate. Logically I know that the market sets the rate for whatever you're selling, and a fair price is whatever people are willing to pay. My problem with money is in my own head, and the only solution is to learn ways of talking about it openly. I hope this post is a good start.