One of the things that’s driven me to experiment with Java applets, Firefox extensions, and now IE BHOs is the conviction that the only way we’ll get to a richer and more usable web is through client-side code.
There’s lots of data stored on servers across the world, data that could be used to create some really interesting applications. For example, if there’s a URL that my Facebook friends have tagged in del.icio.us, or dugg, or stumbledupon, I want any links to that page marked with a gold star as I browse the web. As a human user, I have access to all the data I need to do that. Ignoring the digital life aggregation issue until another post, there’s a fundamental problem that’s blocking this.
Can’t get the data!
That data is mostly only accessible as human-readable web pages, or as an RSS XML feed. This means there’s no reliable way for a computer to take those text representations of the data, and figure out what they actually mean. It’s a weak AI problem, but even weak AI problems are hairy. It’s not quite as bad as trying to turn a hamburger back into a cow, but it’s close.
This is something that GoogleHotKeys has to do, to extract the search result links from all of the other links on a typical results page. Even for the comparatively simple and uniform format used for those pages, it took some time to get right, and it’s still vulnerable to small changes on Google’s side, such as when they added Orkut to their menu.
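The kind of extraction this takes can be sketched like so. The markup and the "result" class name below are made up for illustration, not Google's actual page structure, but the shape of the problem is the same:

```python
from html.parser import HTMLParser

class ResultLinkParser(HTMLParser):
    """Collects hrefs of links inside elements marked with class 'result'
    (a hypothetical class name, standing in for whatever the site uses today)."""
    def __init__(self):
        super().__init__()
        self.in_result = False
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div" and attrs.get("class") == "result":
            self.in_result = True
        elif tag == "a" and self.in_result:
            self.links.append(attrs.get("href"))

    def handle_endtag(self, tag):
        if tag == "div":
            self.in_result = False

page = ('<div class="result"><a href="http://example.com">Example</a></div>'
        '<div class="ad"><a href="http://ads.example.com">An ad</a></div>')
parser = ResultLinkParser()
parser.feed(page)
print(parser.links)  # only the result link, not the ad link
```

The fragility is obvious once it's written down: if the site renames that class, or wraps results in an extra element, the parser silently returns nothing, and nobody tells you it broke.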
How do we fix this?
It would be much better if all sites gave access to their data directly, in a computer-readable form through an API. Facebook has had great success doing just that, Amazon has had a flash-based API available for a couple of years, and Google used to give you access to search results. There isn’t much standardization in what these APIs return, though, apart from their usually being XML (with some JSON mavericks). They’re also usually subject to a lot of restrictions by the site owners, can be shut down like Google’s was, and easily fall afoul of security restrictions like the cross-domain policy unless you run through a proxy.
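That lack of a standard shape means every API needs its own extraction code, even when the underlying data is identical. A sketch, using a made-up "bookmark" record as two different services might return it:

```python
import json
import xml.etree.ElementTree as ET

# The same hypothetical record, as two different APIs might serve it.
xml_body = '<bookmark><url>http://example.com</url><tag>design</tag></bookmark>'
json_body = '{"bookmark": {"url": "http://example.com", "tag": "design"}}'

# Nothing is shared between the two code paths: each format
# needs its own parser and its own navigation logic.
url_from_xml = ET.fromstring(xml_body).findtext("url")
url_from_json = json.loads(json_body)["bookmark"]["url"]

print(url_from_xml == url_from_json)  # same data, two parsers to maintain
```

Multiply that by every service you want to pull from, and the integration cost adds up fast.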
The semantic web is a different approach to solving this problem: it allows you to specify the meaning of the text representations, e.g. this link is a result link, this link is a friend link. Unfortunately, there’s no point in site owners going to the extra work of embedding that information into their web pages if there are no compelling applications that use it. And since no sites implement it, there are no compelling applications! It’s a chicken-and-egg problem.
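To make the "this link is a friend link" idea concrete: the XFN microformat does exactly this by putting the relationship in a link's rel attribute, and a client tool could read it with a few lines of code. The page snippet here is made up, but the rel="friend" convention is real XFN:

```python
from html.parser import HTMLParser

class XfnParser(HTMLParser):
    """Collects links whose rel attribute marks them as friends, XFN-style."""
    def __init__(self):
        super().__init__()
        self.friend_links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # rel can hold several space-separated values, e.g. "friend met"
        if tag == "a" and "friend" in attrs.get("rel", "").split():
            self.friend_links.append(attrs.get("href"))

page = ('<a rel="friend met" href="http://alice.example.com">Alice</a>'
        '<a href="http://news.example.com">A news story</a>')
parser = XfnParser()
parser.feed(page)
print(parser.friend_links)  # just Alice, because her link says so
```

Notice there's no guessing here: the meaning is declared, not inferred, which is the whole point of the approach.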
What this means for client-side tools
This is why client tools are so important. They aren’t subject to the draconian security restrictions of server-side code. They have access to the same data as users, without the arbitrary restrictions site owners apply to APIs. This sounds like an invitation to a DoS attack, but site owners can still block an individual machine if it starts being obnoxious and flooding their servers. With a server-based proxy, they can easily block you on a whim; with a client-side tool, they can’t even distinguish you from a regular user as long as you’re well-behaved.
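"Well-behaved" mostly comes down to not hammering the server. A minimal throttle sketch, where the one-request-per-interval pacing is an arbitrary choice, not any site's published limit:

```python
import time

class PoliteFetcher:
    """Spaces out requests so a client-side tool looks like a human browsing."""
    def __init__(self, min_interval=1.0):
        self.min_interval = min_interval  # seconds between requests (arbitrary)
        self.last_request = 0.0

    def wait_turn(self):
        # Sleep just long enough to keep min_interval between requests.
        delay = self.min_interval - (time.monotonic() - self.last_request)
        if delay > 0:
            time.sleep(delay)
        self.last_request = time.monotonic()

fetcher = PoliteFetcher(min_interval=0.1)
start = time.monotonic()
for _ in range(3):
    fetcher.wait_turn()  # a real tool would fetch a page here
elapsed = time.monotonic() - start
print(f"total wait: {elapsed:.2f}s")  # two enforced gaps after the free first call
```

A tool that paces itself like this generates less load than someone clicking through the site by hand, which is why blocking it makes no sense for the site owner.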
It’s a great environment for innovating in ways that just aren’t possible on the server. This is where I think we can build some compelling and practical semantic web tools, cutting through the Gordian knot that’s blocking development. I’m convinced we’ll see some appear within the next couple of years.
Why aren’t there more client-side tools?
There are two big obstacles. First, client-side web tools are hard to write. Development environments are poorly documented, buggy, and don’t have big user communities. This is something I’m trying to do my small bit to help with, by documenting how to develop for IE. I don’t see this changing in the near term, but if there were more commercial demand, developers would overcome it.
More serious is the distribution problem. It’s hard to persuade users to download and install your tool when server-side tools can give near-instant satisfaction and don’t raise security concerns. I don’t have a good answer to this one yet, but believe me, I’m working on it!