When I was 16, I got a copy of The New Hacker’s Dictionary, aka The Jargon File. An entry that stuck in my head was the one for the AI term Wall Follower. Harvey Wallbanger was an entry in an early AI contest in which the contestants had to solve a maze. The other robots all had sophisticated algorithms with thousands of lines of code; all Harvey did was keep moving forward, turning so that his finger was always on the left wall. Of course, he beat them all.
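To show how little there is to Harvey's trick, here is a minimal sketch of a left-hand wall follower in Python. It is not Harvey's actual code: the grid-of-strings maze, the `MAZE` layout, and the names `find`, `is_open`, and `follow_left_wall` are all assumptions made up for illustration. The rule is only guaranteed to work on simply connected mazes, so the sketch caps the number of steps in case it gets stuck circling an island.

```python
# Headings in clockwise order: north, east, south, west, as (row, col) steps.
DIRS = [(-1, 0), (0, 1), (1, 0), (0, -1)]

# Hypothetical test maze: '#' is a wall, 'S' the start, 'G' the goal.
MAZE = [
    "#######",
    "#S#   #",
    "# # # #",
    "#   #G#",
    "#######",
]


def find(maze, ch):
    """Return the (row, col) of the first cell containing ch."""
    for r, row in enumerate(maze):
        c = row.find(ch)
        if c != -1:
            return (r, c)
    raise ValueError(f"{ch!r} not found in maze")


def is_open(maze, r, c):
    """A cell is open if it is inside the grid and not a wall."""
    return 0 <= r < len(maze) and 0 <= c < len(maze[r]) and maze[r][c] != "#"


def follow_left_wall(maze, start, goal, heading=1, max_steps=10_000):
    """Walk from start to goal, keeping a 'finger' on the left wall.

    Returns the list of visited cells, or None if the walker is boxed in
    or runs out of steps (assumed safety cap, e.g. for mazes with loops).
    """
    pos = start
    path = [pos]
    for _ in range(max_steps):
        if pos == goal:
            return path
        # Prefer turning left, then straight, then right, then back.
        for turn in (-1, 0, 1, 2):
            d = (heading + turn) % 4
            nr, nc = pos[0] + DIRS[d][0], pos[1] + DIRS[d][1]
            if is_open(maze, nr, nc):
                heading = d
                pos = (nr, nc)
                path.append(pos)
                break
        else:
            return None  # walls on all four sides
    return None


path = follow_left_wall(MAZE, find(MAZE, "S"), find(MAZE, "G"))
print(path)
```

The whole "algorithm" is the four-way preference order inside the loop; there is no map of the maze and no memory beyond the current position and heading.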
Whenever I fall too deeply in love with the technology I’m building, I try to remember Harvey. Often a little Brute Force and Cunning will produce better results than something more intellectually challenging.
I was thinking of that when I read this paper, on email categorization using statistics. The authors are clearly off-the-charts smart, and they present some promising techniques, but it feels like their goal is unrealistic. Nobody will accept having their incoming email filed into folders unreliably, even if the filing is right 90% of the time. I think it’s much more interesting to use the same techniques to present information to the user, by applying a bunch of approximate, content-based tags that help the user search and browse their email. They’re trying to build something like Yahoo’s web directory; I’d much rather have an imperfect but useful and scalable service, like Google’s web search, for email.