How to pull browsing history from the image cache

Tracks
Photo by PigDump

I was trying to think of ways to make the browser history hack more useful. One of its limitations is that you can only tell whether a user has been to an exact URL. You can tell if someone’s recently been to the main New York Times page at http://nytimes.com/, but that won’t match if they went directly to http://nytimes.com/somestory.html. You can partially work around this by testing a lot of popular internal links (e.g. all the stories linked from the front page), but that takes a lot more work and still misses any page you didn’t think to test.

That got me wondering if there was some common property that all the pages on a site are likely to share, something that leaves a trace I can test for. Most websites have a logo image that’s used on most of their pages, and I realized that if I could tell whether that image was cached by the browser, I’d have good evidence that the user had visited some page there recently. How could I tell if an image was cached? Well, if it’s in the cache, it should take a lot less time to load than if it has to be fetched from the network. I gave this idea a quick test, and found that cached images were indeed available synchronously in JavaScript, whereas uncached ones took some time. Rather than setting up any complex callbacks, I checked the .complete property of each image immediately after creating it, and rather to my surprise, this seemed reliable. I put together an example of it in action that checks for a few common sites.
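For comparison, here’s a rough sketch of the callback-based timing approach I decided to skip. It is not what the example page does. The names timeImageLoad, looksCached and CACHE_THRESHOLD_MS are made up for illustration, the 20 ms cutoff is a guess rather than a measured value, and the createImage and now parameters exist only so the function can be exercised outside a browser.

```javascript
// Sketch: time how long an image takes to load, then report through a callback.
// `createImage` and `now` are optional, illustrative parameters; in a browser
// they default to `new Image()` and `Date.now()`.
function timeImageLoad(url, callback, createImage, now) {
    createImage = createImage || function () { return new Image(); };
    now = now || function () { return Date.now(); };

    var image = createImage();
    var start = now();
    var finished = false;

    // Report at most once, whether the load succeeded or failed.
    function finish(loaded) {
        if (finished) { return; }
        finished = true;
        callback({ url: url, loaded: loaded, elapsedMs: now() - start });
    }

    image.onload = function () { finish(true); };
    image.onerror = function () { finish(false); };
    image.src = url; // handlers are attached first so a fast load is not missed
}

// Hypothetical cutoff: loads faster than this are guessed to be cache hits.
var CACHE_THRESHOLD_MS = 20;

function looksCached(result) {
    return result.loaded && result.elapsedMs < CACHE_THRESHOLD_MS;
}
```

A timing threshold like this would need tuning per connection, which is part of why the synchronous .complete check is more appealing.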

You can download the full example from http://funhousepicture.com/imagecachetest/imagecachetest.html, but here’s the heart of the test:

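// True once the image has finished loading and decoded to non-zero dimensions.
function isImageLoaded(image)
{
    return image.complete && image.naturalWidth !== 0 && image.naturalHeight !== 0;
}

// True if the image at `url` appears to be in the browser cache.
// Cached images are available synchronously, so the check runs right after src is set.
function isImageInCache(url)
{
    var image = new Image();
    image.src = url;
    return isImageLoaded(image);
}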
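To show how a check like this might be driven across several sites, here’s a minimal sketch. SITE_LOGOS and findVisitedSites are names I’ve invented for illustration, and the logo URLs are placeholders rather than real paths. The cache check is passed in as a parameter, so in a browser you would hand it isImageInCache.

```javascript
// Hypothetical site-to-logo mapping; these URLs are placeholders, not real logo paths.
var SITE_LOGOS = {
    "example.com": "http://example.com/images/logo.png",
    "example.org": "http://example.org/static/logo.gif"
};

// Returns the names of the sites whose logo URL passes the supplied cache check.
function findVisitedSites(siteLogos, isCached) {
    var visited = [];
    for (var site in siteLogos) {
        if (Object.prototype.hasOwnProperty.call(siteLogos, site) && isCached(siteLogos[site])) {
            visited.push(site);
        }
    }
    return visited;
}

// In a browser: var visited = findVisitedSites(SITE_LOGOS, isImageInCache);
```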

There are plenty of limitations to this approach. For one thing, the test itself pollutes the cache by loading all the images it’s testing, so you can only reliably run it once. All subsequent reloads will show every tested site as having been visited, until you clear your cache. I think I could fix this by using cookies to hold the results after the first run, but I haven’t implemented that yet. You also have to identify a common image across the range of pages you’re testing, and with redesigns that URL is likely to change every few months at least. The test is also highly dependent on how long an image remains in the cache.
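The cookie idea might look something like the sketch below. This is untested against the real example page. The cookie name, the 30-day lifetime and all three function names are assumptions, and the cookie reading and writing are passed in as arguments so the logic can be tried outside a browser.

```javascript
// Hypothetical cookie name for the stored results.
var RESULTS_COOKIE = "imageCacheResults";

// Serialize a { site: boolean } object into a string suitable for document.cookie.
function encodeResultsCookie(results, maxAgeSeconds) {
    var value = encodeURIComponent(JSON.stringify(results));
    return RESULTS_COOKIE + "=" + value + "; max-age=" + maxAgeSeconds + "; path=/";
}

// Look for earlier results in a document.cookie-style string; null if absent or unreadable.
function decodeResultsCookie(cookieString) {
    var parts = cookieString.split(";");
    for (var i = 0; i < parts.length; i++) {
        var part = parts[i].replace(/^\s+/, "");
        if (part.indexOf(RESULTS_COOKIE + "=") === 0) {
            try {
                return JSON.parse(decodeURIComponent(part.substring(RESULTS_COOKIE.length + 1)));
            } catch (e) {
                return null;
            }
        }
    }
    return null;
}

// Run the cache-polluting test only when no stored results exist.
// `runTest` performs the checks and returns the results object;
// `saveCookie` writes the cookie string it is given.
function getResults(cookieString, runTest, saveCookie) {
    var stored = decodeResultsCookie(cookieString);
    if (stored !== null) { return stored; }
    var fresh = runTest();
    saveCookie(encodeResultsCookie(fresh, 60 * 60 * 24 * 30)); // assumed 30-day lifetime
    return fresh;
}

// In a browser (runAllChecks is a placeholder for your own test function):
// getResults(document.cookie, runAllChecks, function (c) { document.cookie = c; });
```

The stored answers only describe the cache as it was on the first run, so the cookie lifetime is a trade-off between stale results and re-polluting the cache.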

It’s exciting to be able to pull out this sort of history information. It’s a good complement to the link style checking, and it brings some of the possibilities of the implicit web a little closer to realization.
