Five short links

Lynx
Photo by Ucumari

The Institute for Unnecessary Research – Art, science and magic, makes me wish I was at NetSci to hear about Cybernetic Bacteria

Pentaho – Some friends have been producing amazing results with this open-source business intelligence package, it's a really impressive framework for exploring your data

An analog video synthesizer – I'm fascinated by the outer reaches of video technology, and I love this application of old-school analog technology to video stream by LZX

CreditUnions.me – I've ranted about this before, but if you're in the US and want your money to be lent out to local businesses rather than pumped into financial speculation and prefer customer service from actual humans, you should check out this site to find a nearby credit union

Karmasphere – I haven't spent much time with it, but so far this friendly interface to Hadoop looks very promising. Elastic MapReduce does a good job of making Hadoop available to the masses, but this goes a long way to make it a lot less intimidating too

From pub to pub on the West Highland Way

Mcewans

One of the joys of hiking in Britain is that you can end every day with a pint of beer. Even the remotest hamlets will have a pub, even if there are barely enough inhabitants to staff it, let alone provide customers. To take advantage of this, we set out on the mother of all pub crawls – 40 miles from bar to bar along the West Highland Way.

Sitting in Lauders pub in the center of Glasgow, I savored my first pint of '80 shilling', the national beer that I've never seen outside of Scotland. It's closest to what you'd call a 'bitter' in England, dark, rich and not fizzy like the typical 'lager' beers in the US. Every brewery has its own version and you'll rarely go wrong in Scotland if you ask for 'a pint of 80'. As we enjoyed our pints, we were entertained by a group of 12 year-olds smoking cigarettes and spattering each other by stamping on ketchup packets.

The next day, we caught the train to the start of our hike, Crianlarich, about half-way along the Way from its beginnings in Glasgow. The station itself had the world's tiniest tea-room squeezed into what appeared to be a converted toilet cubicle on the platform, but fortified by visions of a luke-warm pint at the end of the day we started out on our hike.

Though our first stage was only six miles, we stumbled over the sort of history that tends to accumulate in a small country that's been inhabited for 8,000 years. We passed a pond where Robert the Bruce had thrown away his sword whilst being pursued, and slightly later the spot where he stopped and defeated his pursuers. From my memories of Dundee's nightlife, I'm guessing he just head-butted them into submission.

We rapidly realized that most people weren't carrying 30 pound packs, and instead had taken advantage of the van shuttle services to take their gear between each stop. That just seemed like cheating, but then we weren't the real hard-core either: some people were carrying all their camping gear in even heavier packs, or even running the whole 95 miles in under 24 hours!

Just before we arrived in Tyndrum, a large patch of bare earth accompanied a cheery sign explaining that the area had been used for hundreds of years to crush lead ore using child labor, leaving it poisoned and lifeless, and pointing just off the trail to our B&B. Happily it was upstream from the old leadworks, but it was yet another reason to stick to refreshments other than tap-water.
Glengarry

Wandering bedraggled into Glengarry House, we were immediately greeted by the owners Ellen and Pat. Wonderfully helpful, they magicked up a welcome cup of tea as we got settled in our room, and we started to regret our plans to eat out that night as we smelt the meal they prepared for their other guests. Steeling ourselves, we wandered half a mile down the way to Tyndrum proper, and made ourselves at home in Paddy's Bar. My McEwans 80 shilling hit the spot, but the place itself was fairly anonymous, a bit like a converted portakabin. Moving across the road for food, The Real Food Cafe was stupendous, with an amazing menu of lovingly prepared versions of British classics like fish and chips, all using local, organic and even gluten-free ingredients. The seating was at two long bars, which led to a lot of socialising between the parties.

Next morning we chatted to our hosts a little more (apparently April is a much quieter month than May, with just as good weather), then strapped on our packs and, after a brief diversion to The Green Welly Stop for postcards, headed for our next stop at Bridge of Orchy. Again, this was a short stage, only 7 miles, but it still left our feet aching as we came down the hill into the village. I'd booked us into the main rooms at the Bridge of Orchy Hotel (they also have a bunkhouse), and the staff treated us like kings. The hotel is almost entirely staffed by South Africans, and both my pint of Belhaven Best and the following meal of salmon were superb and delivered with great ceremony. The barmaid offered me a tasting glass of some of the other beers while we were browsing, and Liz was handed an impressive data sheet on the different characteristics of the whiskies after she asked the bartender's advice. The receptionist enthusiastically told us "That's sensational!" after she heard that we'd done an extra reconnaissance hike up the hill that evening, which became our phrase of the day.

Day 3 was the first of the tough hikes, 12 miles to the Kingshouse pub with 1500 feet of elevation. We had a slog up and over the ridge we'd explored the night before, passing the Inveroran hotel and campground, then a seemingly never-ending march up a sloping old drovers road, headed to Ba Bridge. We were apprehensive about this section because we'd seen signs warning that the track was going to be used as part of the Six Day Trials motorbike event, and we did get passed by a few, but they were remarkably quiet, more like mopeds than the Harleys we used to live next door to in California. It was fascinating to watch 'guys trying to ride a motorbike up a waterfall' as Liz put it, and I was actually glad we caught them up close.

The Kingshouse pub appears as you come over the final ridge, and it looks deceptively close. When we finally arrived, we both nursed a Calder's 80 Shilling and called for a taxi to our B&B, since Kingshouse is alone in the middle of a stunningly beautiful stretch of moorland and they'd been booked up even 5 months ago! The taxi was going to take a while to get to us, so we ended up having a second pint which led to a very chatty ride.

Kingshouse

Kinlochleven itself is in a valley at the head of a loch, surrounded by steep hills wooded with beech trees that looked golden with the sun on their spring leaves. The view took my breath away, and our room at The Highland Getaway had a back window looking out onto a wide stream. Wandering around the town, we stopped at the Tailrace Inn for dinner and another pint. It felt like a traditional British pub, and my Tennent's Ember 80 shilling was almost spicy, but Liz's request for whiskies was met by the barman with a scowl and a gesture at the shelf behind him. A final whisky at the Harlequin Cafe attached to our B&B was a lot more pleasant, with the dark wood and red wallpaper that every pub needs.

Kinlochleven

We skipped the Kingshouse to Kinlochleven stage of the way, missing out on the Devil's Staircase (Beelzebub gets all the cool geography), but our final day was the real test: 16 miles and 2,000 feet of elevation to make it to Fort William, with our heavy packs on. It was just as painful as we'd imagined, but we finally stumbled into Fort William and the end of the way. With astonishing lack of foresight, nobody had built a pub there, and we ended up walking all the way to the town center until we found liquid refreshment. The Crofter gave me another chance to try McEwans, while Liz fell back to the old favorite of Guinness. Wandering down the High Street, I discovered that the little gift shops selling figurines had apparently progressed onto Stripper Fairies since my last visit:
Stripperfairies

I had high hopes for our final lodging, the Cruachan Hotel. As Liz said when she saw its website "it looks like a castle!", and I'd gone crazy and spent $200 for the night. I had my first misgivings when the sign out the front advertised vacancies and $75 rooms for two. That first impression was borne out by the pokey room that looked like a time capsule from 1973, complete with no toilet lid, a single pillow each that was flat as a pancake, mould in the bathroom and strange stains on the wall heater.

We really didn't care though, we'd made it! It was an amazing trip, I just wish we could have had more time and managed the whole 95 miles. The West Highland Way is a hidden treasure, I was surprised at how few people we encountered, and almost everyone was local. If you're considering a visit to Scotland and want to experience that amazing landscape up close, I can't think of a better vacation. Just don't forget to check out the '80'!

Scottishmoor

Five short links

Vikinggolf
Photo by Jeff the Trojan

I'm flying home from the UK today (volcanoes permitting) so if you're waiting for a reply to your email I should be caught up soon.

Facebook files criminal charges against startup for violating terms of service – I think Arvind nails it with his analysis of Facebook's real problem. They're scared that they're losing the trust of their users, and they've fallen back to legal bullying rather than trying to solve the thorny issues that caused it.

Top 10 reasons you should quit Facebook – The incredible popularity of Dan Yoder's article makes it clear there's deep concern about Facebook in the tech community. I'm still amazed by everything they've achieved, and don't want them to jump the shark, but if this starts to resonate with people outside the bubble then they're in deep trouble.

Amazon S3 file deletion fail – Amazon make it very hard to delete large amounts of data once you've uploaded it. You can't even delete a bucket without removing all of the contents first, which makes no sense at all. Of course, you could always delete your account, and somehow they can remove the data then!

Location-based art: Audio Graffiti – Art is an incredibly underrated driver of technology, artists with a technical bent often come up with amazingly innovative uses of new tools, and this presentation made me miss my days of hacking on VJ software.

Jedi vs Tescos – I spent three years working for Tescos on the checkouts, and learned an amazing amount about customer service from one of the few UK stores to care about it. I thought the spokesman's comments at the bottom were pitch-perfect, "If Jedi walk around our stores with their hoods on, they'll miss lots of special offers". (via Overlawyered)

Five short links

Chain
Photo by Wink

Tribalytic – My friends Alex and Tim have been doing some fascinating work applying statistical analysis to Twitter conversations. Their 'space shuttle control panel' interface can be a bit off-putting, but if you dig down you can see insights into the Chirp talks that prompted the most traffic and spot Binh from Klout looking forward to the after-party.

Breadcrumbs – Kate McKinley's created a great demonstration of how many different ways there are to store data about a user, so advertisers can identify them as they move around the web. She's also produced a good paper covering the details, which reminds me of Arvind's work on 'super-cookies'.

Spinn3r have announced in their email newsletter that they're now crawling public Facebook pages and making the results available as a feed for their commercial subscribers. "We're indexing all Facebook public pages, which do not require login, including public fan pages and their wall posts, videos, albums and pictures. We also index Facebook public groups including topics and the comments responding to these topics. The current volume is in excess of 50k permalinks and 30k comments per hour."

Sendgrid have received $5m in VC funding. I went through Techstars this summer with Isaac and Jose, and loved their quiet focus on solving a vital problem, helping companies reach their email subscribers without ending up in the spam box. They have been kicking ass and earning revenue, and this injection of cash will help them reach even higher.

"Plates of Spaghetti" graphs – Almost all visualizations are terrible at communicating information, but are often fantastic marketing devices, drawing people into looking into the source data. I like the quote "The “data visualizations of the year” really are impressive if you think
of them as super-cool illustrations (replacements for the usual photos
or drawings that might accompany a newspaper or magazine article) rather
than as visual displays of quantitative information
". I've long been mulling a post entitled "Most visualizations are useless" (including mine!)

Five short links

Sausages
Photo by Tammy Green

I'm flying off to visit my family in the UK today, but I have a backlog of interesting URLs I wanted to blog about, so I'm temporarily stealing Nat Torkington's Four Short Links format. However, since I go up to 11, my version has five.

Thoughts from the Man Who Would Sell the World, Nicely – I've long been a fan of 80leg's service, they're democratizing crawling. How will the crawled companies react to this now that literally anyone can download millions of profiles from services like LinkedIn and MySpace, with no licensing or terms-of-service restrictions?

Fetch Technologies – On the same topic, I don't know that much about Fetch but they seem to be a sophisticated and well-funded company based on crawling the public web to gather information for commercial purposes.

The World Bank Bares All – I was very excited to discover that the World Bank offers over 1,000 different measures for countries for free. Not only that, but you can download a CSV file of all the data, instead of being restricted to an API. I'm now using this for an upcoming project, I hope more providers consider data dumps in addition to APIs, they open up so many more uses.

IndieMapper.com – A well-produced service for visualizing geographic data on the web. It's great to see more GIS tools migrating online, it opens up the results to a much larger audience.

Never hire job hoppers. Never. They make terrible employees – Mark's since walked this article back a bit. It reminded me of the evidence that we all have a bias to hire people exactly like ourselves, and Bob Sutton's take on it: "Interviews are strange in that people have excessive confidence in them, especially in their own abilities to pick winners and losers — when in fact the real explanation is that most of us have poor and extremely self-serving memories."

How to look up locations from IP addresses for free

Youarehere
Photo by Mag3737

I'm working on a project where I need to convert large numbers of IP addresses to latitude/longitude positions, and I was pretty depressed looking at Quova's rates starting at $8 per thousand queries. I was happy to lose a bit of quality for a cheaper rate, so I was overjoyed to come across MaxMind's free database of city-level IP lookups. Even better, I could install it on my own server rather than making remote API calls, which makes dealing with large numbers of lookups a lot quicker.

There was some example PHP code available, but it had PEAR dependencies I'd rather avoid, so I made some alterations and uploaded my sample code to github.com/petewarden/geoip_example with a live demo running at web.mailana.com/labs/geoip_example/

Before you can run it on your own server you'll need to install the data files, either using the one included in the package or downloading the latest from http://www.maxmind.com/app/geolitecity

Once you have the GeoLiteCity.dat file downloaded and unzipped, copy it to /usr/local/share/GeoIP, or update the code to reflect the location you've actually installed it in.
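
To give a feel for why a local lookup is so quick, here's a minimal Python sketch of the general idea: keep a sorted table of IP ranges in memory and binary-search it. This is not MaxMind's file format or the PHP code in my repository. The table rows are made-up sample data on reserved documentation address blocks, and the names SAMPLE_RANGES, build_index and lookup are purely illustrative.

```python
import bisect
import ipaddress

# Made-up sample rows: (first IP in range, last IP in range, latitude, longitude).
# These are reserved documentation address blocks, not real MaxMind data.
SAMPLE_RANGES = [
    ("192.0.2.0", "192.0.2.255", 40.015, -105.2705),
    ("198.51.100.0", "198.51.100.255", 55.8642, -4.2518),
    ("203.0.113.0", "203.0.113.255", 51.5074, -0.1278),
]


def build_index(rows):
    """Convert dotted-quad ranges to integers and sort them by range start."""
    index = sorted(
        (int(ipaddress.IPv4Address(start)), int(ipaddress.IPv4Address(end)), lat, lon)
        for start, end, lat, lon in rows
    )
    starts = [row[0] for row in index]
    return starts, index


def lookup(ip, starts, index):
    """Return (latitude, longitude) for an IPv4 address, or None if no range covers it."""
    value = int(ipaddress.IPv4Address(ip))
    # Find the last range whose start is <= the address.
    pos = bisect.bisect_right(starts, value) - 1
    if pos < 0:
        return None
    _start, end, lat, lon = index[pos]
    if value > end:
        return None  # The address falls in a gap between ranges.
    return lat, lon


STARTS, INDEX = build_index(SAMPLE_RANGES)
print(lookup("198.51.100.42", STARTS, INDEX))
```

Each lookup is a handful of integer comparisons with no network round trip, which is where the speed advantage over a remote API comes from.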

Big thanks to the MaxMind folks for making this available under the LGPL, they'll definitely be getting my business next time I need a paid geo-location service.

Is a phone book for the internet emerging?

Glasgowphonebook
Photo by Martin Deutsch

A programmer's basic instinct is to automate any manual task you find yourself doing repeatedly. That's why I'm amazed we haven't built better solutions for finding people online. Most people go through the following steps when they want to know more about someone they've just met:

1 – Type a name into Google, LinkedIn or Facebook and see what public profiles appear.

2 – Figure out which profiles are for the right person based on what else you know about them, either a rough location, their job, friends you have in common or other sites they list.

As more and more information is published about individual users step two gets easier, because you can cross-check across multiple accounts. Maybe my LinkedIn profile doesn't give enough details to be sure that I'm the Pete Warden you met, but it links to my Twitter account where I'm rambling about my upcoming UK visit, and that fits with the funny accent you remember.

What's missing is a good set of tools to assist the second step. It's silly to have people wasting time doing this sort of detective work manually, when some simple automation would speed up the whole process. The data on Twitter, LinkedIn and other public profiles has some structure, it just requires some smarter indexing on the search engine side to make use of it. My Twitter profile lists data in hCard format so it's easy to figure out that http://twitter.com/petewarden is about a person called "Pete Warden" based in Boulder, CO. My LinkedIn profile also uses hCard and describes a person called "Pete Warden" in the Greater Denver Area. Why not make a wild guess and present all the profiles that are close matches like that together in the search results? Sure, the grouping will be wrong sometimes, but most of the time it will cut out a lot of messing around on the user's part to do the same process manually.
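
As a rough illustration of that 'wild guess' grouping, here's a minimal Python sketch. It assumes the name and location have already been pulled out of each profile's hCard, and it leans on a hypothetical hand-made table (REGION_OF) to know that Boulder sits inside the Greater Denver Area. The function names, the 0.9 name-similarity threshold and the third sample profile are all my own inventions for the example, not anything a search engine actually does.

```python
from difflib import SequenceMatcher

# Hypothetical hand-made table mapping a city to the wider region that
# another site might use for the same place.
REGION_OF = {"boulder, co": "greater denver area"}


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences don't matter."""
    return " ".join(text.lower().split())


def region(location):
    """Map a location to its wider region when we know one, else keep it as-is."""
    loc = normalize(location)
    return REGION_OF.get(loc, loc)


def likely_same_person(a, b, name_threshold=0.9):
    """Guess that two profiles match if the names are close and the regions agree."""
    name_score = SequenceMatcher(
        None, normalize(a["name"]), normalize(b["name"])
    ).ratio()
    return name_score >= name_threshold and region(a["location"]) == region(b["location"])


def group_profiles(profiles):
    """Greedily put each profile into the first group containing a likely match."""
    groups = []
    for profile in profiles:
        for group in groups:
            if any(likely_same_person(profile, member) for member in group):
                group.append(profile)
                break
        else:
            groups.append([profile])
    return groups


profiles = [
    {"site": "twitter", "name": "Pete Warden", "location": "Boulder, CO"},
    {"site": "linkedin", "name": "Pete Warden", "location": "Greater Denver Area"},
    {"site": "example", "name": "Pete Warden", "location": "London, UK"},
]
groups = group_profiles(profiles)
for group in groups:
    print([p["site"] for p in group])
```

With the sample data the Twitter and LinkedIn profiles land in one group and the London namesake in another. A real version would need a proper gazetteer in place of the hand-made table, and would still get some groupings wrong.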

Google's Profiles would be great holders for that sort of information, but they require users to fill out yet another set of forms. Sites like 123people.com try to automate the whole process, but frankly don't do a good job and are packed with off-putting ads. 

It's the spread of services like Gist, Xobni and Rapportive that gives me hope that change is on the horizon. Data flows into them from either their own customer base or providers like Rapleaf, and they're starting to build unified pictures of people online. Just like a phone book in the old days, you should be able to enter someone's name and get whatever information they've chosen to publish about themselves.

Sheep, sex and Nazis

Sheep
Photo by Gisela Giardino

Maybe it's growing up with British tabloid headlines, but I wish Sam Apple had chosen "Sheep, sex and Nazis" as a title instead of "Schlepping through the Alps". The phrase is his own description of the world he chronicles, and is closer to the spirit of the book than the actual title which had me expecting a light-hearted travelogue in the vein of "Round Ireland with a Fridge".

Sam's the editor of The Faster Times and he got in touch to offer his sympathies after my run-in with Facebook. Since anybody who likes my blog obviously has excellent judgment, I googled him and was intrigued to see he'd written about his experiences as a young American Jew following a wandering Yiddish-singing shepherd around Austria.

I started reading unsure what to expect, but what struck me was the honesty of his descriptions. That quality sounds unremarkable, but is amazingly hard to achieve because people are so contradictory. Whenever you're telling a story rough edges get rounded off to make it flow, even if it's just omitting certain details. He manages to capture an intimate portrait of a few people he got close to, and through them of a whole country with a lingering dark past, without simplifying details to make answers to their dilemmas seem easier than they truly are.

He flies to Austria to learn more about Hans Breuer, the last shepherd to wander through the Austrian Alps with his flock. Hans' father is a non-practicing Jew, and Hans himself has become obsessed with the rich Yiddish culture from pre-war Europe, taking it on himself to memorize and perform the old songs wherever he can. Sam's family came to the US from the old world before the Nazis came to power, and he's aghast at Austria's post-war response to the Holocaust. The heart of the book is his attempt to pin down that collective failure by understanding individuals, and his honesty forces him to acknowledge the less noble sides of his own quest, Hans' faults and even the human side of the single open anti-semite he tracks down. These nuances mean you can't help but see parts of yourself in all the characters, and realize that people much like us committed and covered up horrors that are hard to imagine.

If you've ever enjoyed Orwell, I recommend you pick up Schlepping. As he put it, "To see what is in front of one's nose needs a constant struggle", and Sam's obvious struggle to be true to the reality of his subjects brings the book into the same league as "The Road to Wigan Pier" in its insights on a crucial topic.

How to download a list of your Facebook fans

Windturbines
Photo by Tochis

A friend recently pointed me to this New York Times story on web coupons and privacy. Aside from the implications of tying in your social accounts to your buying habits, what struck me was Jonathan Treiber's assertion that "when someone joins a fan club, the user’s Facebook ID becomes visible to the merchandiser".

One of the biggest complaints I've heard from companies involved in Facebook fan pages is that they don't know who their fans are. In traditional direct marketing they have a list of customer names and their postal or email addresses, but with fan pages they only get permission to contact their fans indirectly. Everything has to go through Facebook, and at any point the network could decide to cut them off or charge them to talk to their own customers. Presumably that control has a lot of potential value to Facebook, so they don't offer an official API for page owners to get a list of their fans, but Jonathan's quote got me wondering if there was another way.

Googling around, I ran across this post from Gist's very own Adam Loving explaining how he'd managed to download all the fans for his page. The process is a bit technical, and may well violate the ToS so use it at your own discretion, but it sounds like it's already in widespread usage by page owners.

The story itself makes me think that Beacon 2.0 is likely to be run by third-party companies on top of Facebook and other social network data. There's now widespread access to public profiles across all the networks, it's going to be hard to stuff that genie back into the bottle.

Is making public data more accessible a threatening act?

Megaphone
Photo by Altemark

One of the most interesting questions to come out of the Facebook debate was about making public data more easily accessible. Everything I was looking at releasing was available through a Google search and through many other commercial companies, so in a simplistic view it was already completely public and releasing it in a convenient form made no difference. However that doesn't match our intuitive reactions, we are a lot more relaxed when data is theoretically available to anyone but hard to get to than when there's an easy way to access it.

One of my favorite researchers in this area, Arvind Narayanan, recently started a series of articles that try to turn this gut reaction into a usable model. I also spent a very productive lunch with Jud Valeski, Josh Fraser and Jon Fox hashing out the implications of the coming wave of accessibility, so here's a few highlights from that discussion.

Prop 8. Information about donors to political campaigns has always been public, but traditionally required a visit to city hall to dig through piles of paper. Suddenly the donors behind Prop 8 in California found themselves listed on a map anyone could access on the internet. While predictions of violence or boycotts didn't materialize, Scott Eckern ended up resigning from his job once his donation became widely known. I'm pretty certain he wasn't aware that his donation would be public knowledge, it's a clear case where the distribution channel made the information much more powerful.

InfoUSA. Imagine a thought experiment where I downloaded the income, charitable donations, pets and military service information for all 89,000 Boulder residents listed in InfoUSA's marketing database, and put that information up in a public web page. That's obviously pretty freaky, but absolutely anyone with $7,000 to spare can grab exactly the same information! That intuitive reaction is very hard to model. Is it because at the moment someone has to make more of an effort to get that information? Do we actually prefer that our information is for sale, rather than free? Or are we just comfortable with a 'privacy through obscurity' regime?

So what's my conclusion? On the one hand, the web has created so many amazing innovations because it's a fantastic way to make information more available, and initial privacy concerns have faded into the background as people become more used to services. On the other, the jury's not back on how the revolution will end. Is everyone really going to be their own public broadcaster on Twitter, or are we going to retreat into more private forums in the wake of future freakouts? I don't know the answer, but everyone working in this area needs to be thinking about more than the technical aspects of data accessibility.