If you’re stuck in the office on Monday waiting for a compile to finish and you wish you were exploring the wilderness, check out this diary from Liz’s cousin Erin as she tackles the Pacific Crest Trail, 2,600 miles from Mexico to Canada. I’m really impressed she made it all the way, and I’m looking forward to seeing what she takes on next!
Monthly Archives: August 2008
Heading back to Blighty
Tomorrow Liz and I are flying to Heathrow, and for the first couple of days we’ll be exploring London. During the day I’ll be introducing her to Kew Gardens, one of my favorite places on the globe, full of the most wonderful plants that the British Empire could plunder. We’ll be returning to London Walks in the evening. We’ve now done pretty much every tour they offer on previous visits, so we’ll be trying some for the second time. If you have a ghoulish streak, I highly recommend the Jack the Ripper walk. It’s pretty chilling to visit the sites of the murders, and even the pubs the victims were picked up from.
After that I’ll be in a happy whirlwind of family gatherings in my Cambridgeshire village, followed by a week in a cottage on the south-west coast of Ireland. Our main goal is to see some rain, after the last 6 months of SoCal heat. I just want to see some green grass that doesn’t rely on sprinklers.
I’ll be checking my email when I can, and probably squeezing in some development while we’re traveling, but there won’t be many updates here for a couple of weeks. My Twitter account may get a bit more use if I’m able to SMS, but we’ll see if that’s too much of a technical challenge for me.
Come to Defrag!
I’ve had a lot of new visitors recently interested in my browsing history experiments. If you’re seriously excited about the possibilities of this sort of implicit data analysis, you really need to join me at the Defrag conference at the start of November. I’ve already blogged about how much I got out of last year, but it’s the only place I’ve found where everyone just gets the possibilities of this stuff. You’ll be rubbing shoulders with everyone from technologists and journalists to potential customers and investors, all very into figuring out where we can take these ideas.
Eric has just extended the early bird pricing until the end of August, and with my ‘pete1’ code you’ll get an extra $100 [Now with an extra $200 thanks to my speaker’s code and with no time limit, thanks Eric!] off on top. I hope to see you there.
How to pull browsing history from the image cache
I was trying to think of ways to make the browser history hack more useful. One of the limitations is that you can only tell if a user has been to an exact URL. So you can tell if someone’s recently been to the main New York Times page at http://nytimes.com/ but that won’t match if they went directly to http://nytimes.com/somestory.html . You can partially work around this by testing a lot of popular internal links (e.g. all the stories from the front page) but this is a lot harder.
You can download the full example from http://funhousepicture.com/imagecachetest/imagecachetest.html, but here’s the heart of the test:
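// Reconstructed wrapper (the function name is an assumption): true if the image at url is already in the cache
function isImageInCache(url) {
    var image = new Image();
    image.src = url;
    return !((image.naturalHeight == 0 || image.naturalWidth == 0 || image.complete == false));
}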
There are plenty of limitations to this approach. For one thing, the test itself pollutes the cache by loading all the images it’s testing, so you can only reliably run this once. All subsequent reloads will show every tested site as having been visited, until you clear your cache. I think I could fix this using cookies to hold the results after the first time, but I haven’t implemented that yet. You also have to identify a common image across the range of pages you’re testing, and with redesigns that URL is likely to change every few months at least. It’s also highly dependent on how long an image remains in the cache.
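Here’s a rough sketch of how the cookie idea might look. It isn’t something I’ve run against real sites, and every name in it (looksCached, parseCookies, testSitesOnce, the ‘cachetest_’ cookie prefix) is made up for illustration. The image factory and the cookie holder are passed in as arguments, an assumption that lets the same code run outside a browser with stand-in objects.

```javascript
// Hypothetical helper: true when an image object reports itself as already loaded.
function looksCached(image) {
    return image.complete === true &&
        image.naturalWidth > 0 &&
        image.naturalHeight > 0;
}

// Hypothetical helper: turn a cookie string like "a=1; b=0" into an object.
function parseCookies(cookieString) {
    var result = {};
    cookieString.split(';').forEach(function (pair) {
        var index = pair.indexOf('=');
        if (index > 0) {
            var name = pair.slice(0, index).trim();
            result[name] = decodeURIComponent(pair.slice(index + 1).trim());
        }
    });
    return result;
}

// Test each site once and remember the answer in a cookie, so later page
// loads read the stored result instead of re-testing a cache that the
// first run has already polluted.
// sites: object mapping a short cookie-safe name to an image URL.
// createImage: function returning a fresh image object (assumed injectable).
// cookieJar: object with a document.cookie-style "cookie" property.
function testSitesOnce(sites, createImage, cookieJar) {
    var stored = parseCookies(cookieJar.cookie || '');
    var results = {};
    Object.keys(sites).forEach(function (name) {
        var key = 'cachetest_' + name;
        if (stored.hasOwnProperty(key)) {
            // Already tested on an earlier load, so trust the stored answer.
            results[name] = stored[key] === '1';
            return;
        }
        var image = createImage();
        image.src = sites[name];
        results[name] = looksCached(image);
        cookieJar.cookie = key + '=' + (results[name] ? '1' : '0') + '; path=/';
    });
    return results;
}
```

In a page you would call it along the lines of testSitesOnce({ example: 'http://example.com/logo.png' }, function () { return new Image(); }, document), where the URL is a placeholder. The cookies here last only for the browser session. A longer-lived version would need an expiry date, and the stored answers would eventually go stale as the real cache changes.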
It’s exciting to be able to pull out this sort of history information. It’s a good complement to the link style checking, and brings some of the possibilities of the implicit web a little closer to realization.
Santa Monica Mountain trailheads now on Google Maps
It took her several days, but Liz has just finished off her map of the trailheads in the Santa Monica Mountains. There are descriptions for each of the locations, covering the trails they lead to, how much parking there is, nearby campsites, which agency owns the land and whether bikes or horses are allowed. This was originally going to be just so she could easily link to the meeting points for trailwork from the SMMTC website, but it’s turned into a great resource for anyone who’s interested in getting out into the mountains.
I’m really proud of what she’s accomplished, and it demonstrates how Google’s map-building application opens the door to anyone building rich maps, in a way that just wasn’t possible before. Maybe this will help a few more people discover the beautiful wilderness we have on our doorstep here in LA.
How to speed up the history testing hack
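var iframe = document.getElementById('linktestframe');
var visited = []; // assumed to start as an empty array; the initializer was lost
var isIE = iframe.currentStyle;
var currentNode = iframe.firstChild;
if (isIE) {
    // Internet Explorer exposes computed styles through currentStyle
    while (currentNode) {
        var displayValue = currentNode.currentStyle ? currentNode.currentStyle["display"] : "none";
        if (displayValue != "none")
            visited.push(currentNode); // assumed: the lost loop body recorded the visible link
        currentNode = currentNode.nextSibling;
    }
} else {
    var defaultView = document.defaultView;
    var functionGetStyle = defaultView.getComputedStyle;
    while (currentNode) {
        // Skip text nodes, which have no computed style
        var displayValue = (currentNode.nodeType == 1) ? functionGetStyle(currentNode, null).getPropertyValue("display") : "none";
        if (displayValue != "none")
            visited.push(currentNode); // assumed, as above
        currentNode = currentNode.nextSibling;
    }
}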
Can I guess your politics from your browsing history?
I’m convinced there’s fun to be had with history analysis, so I’ve created a politically focused Facebook app based on my previous tag cloud. Here’s the results of analyzing your history:
I’ll have some more details on the optimizations I did to make the analysis usable on Internet Explorer tomorrow.
Where to go if you want startup inspiration
I’m a comparative late-comer to Twitter, but I’ve started to get hooked. One of the things that pleasantly surprised me is how useful it can be. You can ask questions, or respond to them, and generally do the flea-picking off each other’s backs that’s required to keep relationships alive, all through a very zen interface.
As someone who reads the back of cereal packets if there’s nothing else to hand, I try to direct my reading addiction into useful channels, mostly towards sources of startup advice and inspiration over the last few years. This has meant personal blogs like Brad’s, Fred’s, Don’s, or topic-based ones like VentureHacks or AskTheVC. The trouble is blog posts are time-consuming, which means there’s a big barrier to passing on a quick link, so posts only happen occasionally. That’s where Sam Huleatt has stepped in, with a use for Twitter I’d never thought of.
His new startuptweet stream is collecting a massive number of videos, stories and blog posts on things that startups care about, like a Stanford introduction to the VC process or Paul Graham discussing how to motivate great hackers. He’s already posted a large number of high-quality resources in just a few days, and I’m hopeful that the ease of posting will make it possible for him to keep up the pace. Check out the full site, and start following!
The insanity of retention policies
I was doing some more research into other companies doing enterprise document analysis, and the combination of staring at this page from PSS Systems and having just finished Bleak House made me step back and realize what a fundamentally dumb idea retention policies for legal reasons are.
The one great principle of the English law is to make business for itself. There is no other principle distinctly, certainly, and consistently maintained through all its narrow turnings. Viewed by this light it becomes a coherent scheme and not the monstrous maze the laity are apt to think it. Let them but once clearly perceive that its grand principle is to make business for itself at their expense, and surely they will cease to grumble.
Retention policy is a euphemism for deletion policy. Emails over a certain age are deleted, even from backups, usually after 6 or 12 months. The sole reason for this is so that if you’re sued, you aren’t able to hand over older documents, and there’s no question that you deleted them specifically out of a guilty conscience; it’s just your blanket policy. As one of Dickens’s lawyers says:
Being in the law, I have learnt the habit of not committing myself in writing.
There’s no good technical reason for deleting old emails. You’ve already made those backup tapes, so it’s actually more work to make sure that old ones are destroyed. You also have to make sure you do keep any messages that relate to currently active lawsuits, which is where PSS Systems comes in by semantically analyzing documents to spot those that might be needed in discovery.
Email is the collective memory of an organization, and removing old emails is deliberate corporate amnesia. It’s needed because so many recent court cases have hinged on ‘incriminating’ memos, and with thousands of messages written every day, it’s almost certain that somebody’s dry sarcasm could be painted as deadly serious in front of a jury.
Why does this matter? You’re losing the history of the company. Unless you have explicitly copied them, all those old conversations and attachments you might need to refer back to one day are gone. It’s like putting a back-hoe through an archaeological site: you can never get that information back. Just like archaeology, I’m convinced that there will be new techniques in the future that can pull more information out of that data than we can today. Old email should be an asset, not a liability. Unfortunately as long as the legal climate keeps companies terrified of losing the litigation lottery, they’ll keep deleting.
Just a good little pointless thing?
Robert posted a comment on my BrainCloud post saying that "its a good little pointless thing thats always fun". That’s a pretty fair description of what it does right now. It’s basically a lava lamp for the internet. So why am I so interested in the technology behind it?
The promise of the implicit web is based on knowing information about your users without requiring them to manually enter it. It seems silly that you have to type in all your friends to Facebook when your email inbox makes it pretty clear who most of them are. If I knew which products you’d bought, or which sites you’d visited, I could figure out which to recommend in the future.
There’s a pretty wide consensus that there are lots of interesting applications we could write based on data like that. The trouble is security concerns make it almost impossible to gather it unless you’re the owner of a well-used site. Amazon can offer recommendations because they have information on all their customers’ buying habits. No startup can build that application or anything like it without the data, so there’s a barrier to entry that favors the big incumbents.
One approach to get over the barrier is breaking out of the security sandbox with a browser extension. Medium is taking that route, and offering some interesting new search tools thanks to all the data they can gather. It’s really, really hard to get people to install anything though, which makes it a time-consuming and expensive route to follow.
That’s why my eyes lit up when I saw Mike’s social history hack. For the first time, there’s a way of gathering some implicit data without either being a big site owner or requiring installation. There isn’t a killer app for it yet, but I’m hopeful once we all poke at the technique’s limitations, we can figure out some compelling uses.