If implicit data’s so great, why did DirectHit die?


DirectHit was a technology that aimed to improve search results by promoting links that people both clicked on and spent time reading. These days we’d probably describe it as an attention data algorithm, which places it firmly in the implicit web universe. It launched to great excitement in the late ’90s, but it never fulfilled its promise. There was some talk of it lingering on in Ask’s technology, but if so it’s a very minor and unpromoted part. If the implicit web is the wave of the future, why did DirectHit fail?

Feedback loops. People will click on the top result three or four times more often than the second one. That means even a minor difference in the original ranking between the top result and the others will be massively exaggerated if you weight by clicks. Here the click rate is driven by the external factor of result ranking, rather than by the content quality you’re hoping to rate. It’s a systematic error that crops up whenever you present the user with an ordered list of choices. For example, I’d bet that people at the top of a list of Facebook friends in a drop-down menu are more likely to be chosen than those further down. Unless you randomize the order in which you show lists, which is pretty user-unfriendly, it’s hard to avoid this problem.
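To make the feedback loop concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the `POSITION_BIAS` shares, the quality numbers and the function names are my assumptions, not anything DirectHit is known to have used. It shows that re-ranking by raw clicks just confirms the original order, while dividing out the position effect lets the slightly better result surface.

```python
# Hypothetical click shares per position: the top slot draws 3x the
# clicks of the second, in line with the "three or four times" figure.
POSITION_BIAS = [0.60, 0.20, 0.10]


def expected_clicks(ranking, quality, impressions=1000):
    """Expected clicks per result: position bias times true quality."""
    return {
        doc: impressions * POSITION_BIAS[pos] * quality[doc]
        for pos, doc in enumerate(ranking)
    }


def rerank_by_raw_clicks(ranking, clicks):
    """Naive attention ranking: most-clicked result first."""
    return sorted(ranking, key=lambda doc: clicks[doc], reverse=True)


def rerank_by_normalized_clicks(ranking, clicks):
    """Divide out the clicks each position would attract anyway."""
    pos = {doc: i for i, doc in enumerate(ranking)}
    return sorted(
        ranking,
        key=lambda doc: clicks[doc] / POSITION_BIAS[pos[doc]],
        reverse=True,
    )


# "b" is actually slightly better, but the original ranker put "a" first.
quality = {"a": 0.50, "b": 0.55, "c": 0.30}
ranking = ["a", "b", "c"]

clicks = expected_clicks(ranking, quality)
raw = rerank_by_raw_clicks(ranking, clicks)
normalized = rerank_by_normalized_clicks(ranking, clicks)

print(raw)         # the original order is locked in
print(normalized)  # "b" moves ahead of "a"
```

The catch is that the normalized version assumes you already know the position bias. In practice you would have to estimate it, most likely through the same kind of order randomization that makes for an unfriendly interface.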

Click fraud. Anonymous user actions are easy to fake. There’s an underground industry devoted to clever ways of pretending to be a user clicking on an ad. The same technology (random IP addresses, spoofed user agents) could easily be redirected to create faked attention data. To my mind, the only way to avoid this is to have some kind of trusted user identification associated with the attention data. That’s why Amazon’s recommendations are so hard to fake: you need to not only be logged in securely but also spend money to influence them. It’s the same reason that Facebook are pushing so hard for their Beacon project: they’re able to generate attention data that’s linked to a verified person.
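As a rough sketch of what tying attention data to identity could look like, the Python below counts only clicks that carry a verified user ID, and gives each user a single vote per URL. The event format, the `attention_scores` name and the example URLs are all hypothetical, and it simply assumes some trusted login system exists upstream.

```python
from collections import defaultdict


def attention_scores(events):
    """Score URLs by the number of distinct verified users who clicked.

    events: iterable of (user_id, url) pairs, where user_id is None
    for anonymous clicks. This format is an assumption for the sketch.
    """
    voters = defaultdict(set)
    for user_id, url in events:
        if user_id is None:
            continue  # anonymous clicks are trivially faked, so skip them
        voters[url].add(user_id)  # the set caps each user at one vote per URL
    return {url: len(users) for url, users in voters.items()}


events = [
    ("alice", "good.example"),
    ("bob", "good.example"),
    (None, "spam.example"),       # a bot hammering the link anonymously
    (None, "spam.example"),
    (None, "spam.example"),
    ("mallory", "spam.example"),  # one real account clicking repeatedly
    ("mallory", "spam.example"),
]

scores = attention_scores(events)
print(scores)
```

This only raises the cost of faking to the cost of acquiring verified accounts, which is the same trade Amazon makes by requiring a purchase.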

It’s a bad predictor of quality. Related to the feedback loop problem, whether someone clicks on a result link and how much time they spend there don’t have a strong enough relationship to whether the page is relevant. I’ll often spend a lot of time scrolling down through the many screens of ads on Experts Exchange on the off-chance they have something relevant (though at least they no longer serve up different results to Google). If I do that first and fail to get anything, and then immediately find the information I need on the next result link I click, should the time spent on the first page be seen as a sign of quality, or just of deliberately poor page design? This is something to keep in mind when evaluating attention data algorithms everywhere. You want to use unreliable data as an indicator and helper (e.g. in this case you could show a small bar next to results showing the attention score, rather than affecting the ranking), not as the primary controlling metric.
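Here is a small Python sketch of that "indicator, not controller" idea: the result order comes from the underlying ranker and is left alone, and the attention score only decides how full a little text bar is. The `render_results` function, the bar format and the sample scores are my own invention for illustration.

```python
def render_results(ranked_urls, attention, width=10):
    """Show an attention bar beside each result without reordering them."""
    top = max(attention.values(), default=0) or 1  # avoid dividing by zero
    lines = []
    for url in ranked_urls:  # order from the underlying ranker, untouched
        filled = round(width * attention.get(url, 0) / top)
        bar = "#" * filled + "." * (width - filled)
        lines.append(f"[{bar}] {url}")
    return lines


# The ad-heavy page soaks up far more attention, but it stays in second
# place because the attention score never touches the ranking.
ranked = ["docs.example", "ads.example"]
attention = {"docs.example": 20, "ads.example": 100}

lines = render_results(ranked, attention)
for line in lines:
    print(line)
```

The reader can still see that one page soaks up attention, and can decide for themselves whether that reflects quality or just a long scroll past the ads.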

SEO Theory has an in-depth article on the state of click management that I’d recommend if you’re interested in more detail on the fraud that went on when DirectHit was still alive.
