A search engine is designed to take some keywords and return the web pages that match them. What fascinates me is that this mapping of words to pages could easily be run in reverse: given a particular web page, tell me which keywords are most likely to find it. My hunch is that this set of words, maybe presented as a tag cloud, would give a pretty good summary of what the page is about.
The closest example I’ve found out there is this blog entry. It’s got what appears at first to be a fairly random list of keywords, but dig into them and it looks like Darrin is a Vancouver-based Titanic fan who’s posted about the beautiful agony art project and written a lot of wedding posts.
What’s really interesting about this is that the search terms that show up aren’t just based on textual frequency within the site; they’re also a product of how often people search for particular words in the first place. Essentially it gives a lot more weight to terms people actually care about, rather than to every term that’s merely statistically improbable.
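To make that weighting concrete, here’s a rough Python sketch. It ranks a page’s words by on-page frequency, damped by how often people actually search for each word. The `query_volume` table is a stand-in for data only a search engine (or your own logs) would really have, and the `log()` damping is just one plausible choice, not a formula any engine is known to use.

```python
import math
import re
from collections import Counter

def likely_search_terms(page_text, query_volume, top_n=10):
    """Rank a page's words by how likely a search for them is to
    land here: on-page frequency times a damped measure of how
    often the word gets searched for at all.

    query_volume is assumed to map a term to its historical
    search frequency (hypothetical data source)."""
    words = re.findall(r"[a-z']+", page_text.lower())
    tf = Counter(words)
    scores = {
        term: count * math.log(1 + query_volume.get(term, 0))
        for term, count in tf.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

Feed it the text of a page plus a table of query volumes and the top of the list should look a lot like the tag cloud I’m imagining: words that are both prominent on the page and genuinely searched for float up, while obscure-but-frequent boilerplate scores zero.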
At the moment the only way to implement this is to process an individual site’s visitor logs and pull out the frequency of the keyword searches that lead to a visit. However, search engines know the historical frequency of particular query terms up front, so it would be possible for them to take an arbitrary new page and simulate which searches would be likely to land on it. You could do something similar for a mail message; essentially you’d be filtering statistically improbable phrases down to statistically improbable and interesting phrases instead.
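For the log-processing route, here’s a minimal sketch of pulling those keyword frequencies out of a standard combined-format access log. It assumes the search engine passes the query in a `q` parameter and that the log lives at `access.log`; both are assumptions that vary in practice.

```python
from collections import Counter
from urllib.parse import urlparse, parse_qs

def keyword_counts(log_lines):
    """Tally search terms from the referrer URLs in a combined-format
    access log, where the referrer is the quoted field after the
    status code and byte count."""
    counts = Counter()
    for line in log_lines:
        fields = line.split('"')
        if len(fields) < 6:
            continue  # not a full combined-format line
        referrer = fields[3]  # the Referer header, e.g. a search results URL
        query = parse_qs(urlparse(referrer).query)
        # Most engines put the search string in a 'q' parameter (an assumption)
        for term in query.get('q', [''])[0].lower().split():
            counts[term] += 1
    return counts

if __name__ == '__main__':
    with open('access.log') as log:  # hypothetical log path
        for term, n in keyword_counts(log).most_common(20):
            print(n, term)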