EXHIBIT – This lightweight library for creating visualizations by harvesting data embedded in pages looks like a great way of encouraging people to add semantic structure to their HTML. Which is good for me, since it makes my crawling a lot easier. Via Carole Goble
ScraperWiki – Another find from my search for better scraping tools. What impresses me about this site is its active community. There have been a lot of attempts at bringing tools like this to the masses, but I'm hopeful the time is right for this one to succeed. Via Dan Armstrong
AddToIt – An enterprise take on converting unstructured text into useful information. There's a whole quiet world of commercial companies offering scraping services, which is partly what gives the field a shady reputation. Their promise to handle cases where "The data you would like to scrape is protected" certainly adds to that impression. That's a shame, because I bet there's a lot of interesting technology behind these approaches that will never see the light of day.
Turbo Encabulator – This is probably what I sound like when I talk to normal people about my day at work. A games artist friend would always mutter 'glib-glob.cpp' when I started to get too jargonified, after a source code file name I mentioned once that sent him into hysterics. Hey, it's a GameLIBrary-GLOBals C Plus Plus module, made sense to me! After they caused most of our early Motion bugs, 'Pbuffers' became a shorthand codeword for technical nonsense talk in our team at Apple, since they sounded so made up but were the engineers' explanation for everything that went wrong.