Heading back to Blighty

747
Photo by Caribb

Tomorrow Liz and I are flying to Heathrow, and for the first couple of days we’ll be exploring London. During the day I’ll be introducing her to Kew Gardens, one of my favorite places on the globe, full of the most wonderful plants the British Empire could plunder. In the evenings we’ll be returning to London Walks; we’ve now done pretty much every tour they offer on previous visits, so we’ll be trying some for the second time. If you have a ghoulish streak, I highly recommend the Jack the Ripper walk; it’s pretty chilling to visit the sites of the murders, and even the pubs the victims were picked up from.

After that I’ll be in a happy whirlwind of family gatherings in my Cambridgeshire village, followed by a week in a cottage on the south-west coast of Ireland. Our main goal is to see some rain, after the last 6 months of SoCal heat. I just want to see some green grass that doesn’t rely on sprinklers.

I’ll be checking my email when I can, and probably squeezing in some development while we’re traveling, but there won’t be many updates here for a couple of weeks. My twitter account may get a bit more use if I’m able to SMS, but we’ll see if that’s too much of a technical challenge for me.

Come to Defrag!

[Defrag 2008 conference header image]

I’ve had a lot of new visitors recently interested in my browsing history experiments. If you’re seriously excited about the possibilities of this sort of implicit data analysis, you really need to join me at the Defrag conference at the start of November. I’ve already blogged about how much I got out of last year, but it’s the only place I’ve found where everyone just gets the possibilities of this stuff. You’ll be rubbing shoulders with everyone from technologists and journalists to potential customers and investors, all focused on figuring out where we can take these ideas.

Eric has just extended the early bird pricing until the end of August, and with my ‘pete1’ code you’ll get an extra $100 [Now with an extra $200 thanks to my speaker’s code and with no time limit, thanks Eric!] off on top. I hope to see you there.

How to pull browsing history from the image cache

Tracks
Photo by PigDump

I was trying to think of ways to make the browser history hack more useful. One of the limitations is that you can only tell if a user has been to an exact URL. So you can tell if someone’s recently been to the main New York Times page at http://nytimes.com/ but that won’t match if they went directly to http://nytimes.com/somestory.html. You can partially work around this by testing a lot of popular internal links (e.g. all the stories from the front page), but this is a lot harder.
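
As a rough sketch of that workaround (not something from the original example), you could generate one hidden test link per candidate URL and then apply the usual visited-link style check to them; the story URLs and the container id here are invented for illustration:

// Rough sketch: build one test link per candidate internal URL, ready for
// the same visited-link style check used by the main history hack.
// The URLs and the 'linktestdiv' id are made up for this example.
var candidateUrls = [
    'http://nytimes.com/',
    'http://nytimes.com/somestory.html',
    'http://nytimes.com/anotherstory.html'
];

var container = document.getElementById('linktestdiv');
for (var i = 0; i < candidateUrls.length; i += 1)
{
    var link = document.createElement('a');
    link.href = candidateUrls[i];
    link.appendChild(document.createTextNode(candidateUrls[i]));
    container.appendChild(link);
}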

That got me wondering if there was some common property that all the pages on a site are likely to share, something that leaves a trace I can test for. Most websites have a logo image that’s used on most of their pages, and I realized that if I could tell if an image was cached by the browser, I’d have proof that the user had visited some page there recently. How could I tell if an image was cached? Well, if it is in the cache, it should take a lot less time to create it than if it has to be fetched from the network. I gave this idea a quick test, and found that cached images were indeed created synchronously in Javascript, whereas uncached ones took some time. Rather than doing any complex callbacks, I checked the .complete property of each image immediately after creation, and rather to my surprise, this seemed reliable. Here’s an example of it in action, checking for a few common sites:

You can download the full example from http://funhousepicture.com/imagecachetest/imagecachetest.html, but here’s the heart of the test:

// Returns true if the image already has its data available - a cached image
// is populated synchronously when it's created, so its dimensions and
// .complete flag are set straight away, while an uncached one isn't.
function isImageLoaded(image)
{
    return !(image.naturalHeight == 0 || image.naturalWidth == 0 || image.complete == false);
}

// Creates an image for the given URL and immediately checks whether it
// loaded, which should only be the case if it was already in the cache.
function isImageInCache(url)
{
    var image = new Image();
    image.src = url;
    return isImageLoaded(image);
}
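
As a usage sketch, here’s roughly how you might check a handful of sites; the domains and logo paths below are placeholders rather than the ones from my test page:

// Hypothetical usage: each entry maps a site to a common image it uses on
// most pages. These URLs are invented placeholders.
var logosToTest = {
    'somenewssite.com': 'http://somenewssite.com/images/logo.gif',
    'someblog.com': 'http://someblog.com/header/banner.png'
};

var probablyVisited = [];
for (var siteName in logosToTest)
{
    if (isImageInCache(logosToTest[siteName]))
        probablyVisited.push(siteName);
}
alert('Probably visited recently: ' + probablyVisited.join(', '));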

There are plenty of limitations to this approach. For one thing, the test itself pollutes the cache by loading all the images it’s testing, so you can only reliably run it once. All subsequent reloads will show every tested site as having been visited, until you clear your cache. I think I could fix this using cookies to hold the results after the first time, but I haven’t implemented that yet. You also have to identify a common image across the range of pages you’re testing, and with redesigns that URL is likely to change every few months at least. It’s also highly dependent on how long an image remains in the cache.
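
Here’s a rough, untested sketch of that cookie idea; the cookie name is arbitrary and this isn’t part of the example code:

// Untested sketch: remember the first run's results in a cookie so later
// reloads aren't fooled by the cache pollution from the test itself.
// The 'imagecacheresults' name is arbitrary.
function saveResults(visitedSites)
{
    var expires = new Date();
    expires.setDate(expires.getDate() + 30); // keep for 30 days
    document.cookie = 'imagecacheresults=' +
        encodeURIComponent(visitedSites.join(',')) +
        '; expires=' + expires.toUTCString();
}

function loadResults()
{
    var match = document.cookie.match(/(?:^|; )imagecacheresults=([^;]*)/);
    if (!match)
        return null;
    return decodeURIComponent(match[1]).split(',');
}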

It’s exciting to be able to pull out this sort of history information; it’s a good complement to the link style checking, and it brings some of the possibilities of the implicit web a little closer to realization.

Santa Monica Mountain trailheads now on Google Maps

[Screenshot of the trailhead map]

It took her several days, but Liz has just finished off her map of the trailheads in the Santa Monica mountains. There are descriptions for each of the locations, covering the trails they lead to, how much parking there is, nearby campsites, which agency owns the land, and whether bikes or horses are allowed. This was originally going to be just so she could easily link to the meeting points for trailwork from the SMMTC website, but it’s turned into a great resource for anyone who’s interested in getting out into the mountains.

I’m really proud of what she’s accomplished, and it demonstrates how Google’s map-building application opens the door to anyone building rich maps, in a way that just wasn’t possible before. Maybe this will help a few more people discover the beautiful wilderness we have on our doorstep here in LA.

How to speed up the history testing hack

Speedometer
Photo by Abed Dodokh

The original browser history Javascript ran very slowly in Internet Explorer. When it needed to check thousands of sites, like for the gender test or my tag cloud, it could take several minutes. If it was going to be generally useful, I needed to speed it up a lot. The first thing I did was move the test link creation over to the server side, so there was a prebaked HTML div containing all the links, rather than building it on the fly. This didn’t make much difference though, so I started poking at the testing code. What I found made a massive difference was switching from indexing into an array of all the links to walking through them with each element’s next sibling. I’ve included the function below, and it now only takes a couple of seconds to check thousands of URLs:

function getVisitedSites()
{
    // The server prebakes this element with one link per site to test; links
    // whose computed display isn't "none" are treated as visited.
    var iframe = document.getElementById('linktestframe');

    var visited = [];

    // currentStyle only exists in Internet Explorer, so use it to branch.
    var isIE = iframe.currentStyle;
    if (isIE)
    {
        // Walk the links by following sibling pointers rather than indexing
        // into a collection; this is what makes IE so much faster.
        var currentNode = iframe.firstChild;
        while (currentNode != null)
        {
            if (currentNode.nodeType == 1)
            {
                var displayValue = currentNode.currentStyle["display"];
                if (displayValue != "none")
                    visited.push(currentNode.innerHTML);
            }
            currentNode = currentNode.nextSibling;
        }
    }
    else
    {
        var defaultView = document.defaultView;
        var functionGetStyle = defaultView.getComputedStyle;

        var currentNode = iframe.firstChild;
        while (currentNode != null)
        {
            if (currentNode.nodeType == 1)
            {
                var displayValue = functionGetStyle(currentNode, null).getPropertyValue("display");
                if (displayValue != "none")
                    visited.push(currentNode.innerHTML);
            }
            currentNode = currentNode.nextSibling;
        }
    }

    return visited;
}
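
For context, here’s a hypothetical sketch of how the function might be called, with a comment describing the kind of prebaked markup and CSS it assumes; the element id matches the one looked up above, but the CSS rules are my own illustration rather than a copy of the real page:

// Hypothetical usage. It assumes the server has written something like
//   <div id="linktestframe"><a href="http://nytimes.com/">nytimes.com</a> ...</div>
// with CSS along the lines of:
//   #linktestframe a { display: none; }
//   #linktestframe a:visited { display: inline; }
// so that only visited links end up with a display value other than "none".
window.onload = function ()
{
    var visitedSites = getVisitedSites();
    alert('Visited sites: ' + visitedSites.join(', '));
};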