Try a secret new search engine


Well, I’m not sure about secret, but it sure is mysterious. showed up in my visitor logs, and when I visited the site I found rather a nifty visual search interface. It’s got thumbnails of the top results, and automatically generated keywords sorted by type down the left side:


The interesting part is that most of the site is returning 404 or authorization errors, which makes me wonder if they might still be in stealth mode. Unfortunately, email messages to their public address bounce, they’ve got a private domain registration so I can’t get any contact details from that, and Google searches don’t turn up any more information, so I can’t check with them before mentioning it here.

They’re using snap for the thumbnails, and I’m not sure how they’re pulling out the tags. The keywords definitely look automatically generated, rather than user driven. I’d love to know more about their work, so if anyone has more details or a way to contact them, email me or add a comment.

Welcome hackszine readers!


Jason Striegel over at hackszine, the blog of Make magazine, has been a big supporter of my hacking with Google, and has just published an update on my IE porting work. He mentions the wiki I’ve set up to shed light on the obscure world of IE plugins, and you can look forward to lots of other fun stuff on the Facebook API here as I learn more about it. Thanks for the mention, Jason!

GoogleHotKeys is Go!


I’ve officially launched Google Hot Keys. Check out the new site for the Internet Explorer and Firefox versions, help and screenshots.

There have already been a lot more Firefox downloads since the name change; now I’m going to work hard to promote it to IE users too. So far, I’ve started the submission process for CNet’s, at the confusingly named! It looks like this will take around three or four weeks to go through their submission process, though I could speed it up if I paid $250, or went for a $9-a-month package. One nice bonus is that it will appear on Windows Marketplace once CNet accepts it.

I’ll be reaching out to some of the people who’ve shown a past interest in PeteSearch, since I’m finally happy that I’ve got a product that will deliver a great experience! There’s an audience out there for this, and I will be pushing hard to get it in front of them.



This ZDNet article really reminded me why I started PeteSearch. It’s from 2004 and laments the lack of progress in search, and not much has changed since then! Sure, there have been iterative improvements, more flexible term matching, search history and the like, but nothing that a Google user from 2000 would be surprised by.

Part of the problem is that Google can copy most foreseeable outside innovations if they turn out to be popular. It’s really hard to make a business case for funding a company that would in effect be providing them with free R&D, with little prospect of a return. Google themselves are experimenting with new models, but without real competition, they’re in no rush to cook their golden goose. Ask has been the most innovative with its search UI, but even that is still based around the same basic layout.

One phrase really struck me from the article: "Don’t expect users to apply more than the basic tools and techniques to acquire information from a search engine." The stats show that only three percent of people used quotes or other advanced syntax.

A lot of people have concluded that this means people don’t want anything better, and there’s no point trying to improve the page-of-links presentation of results. My firm belief, and the reason I’m experimenting, is that it’s just a local maximum in the space of possible UIs. Why not have a grid of 25 thumbnails, with the positions of the terms marked on each? Or live snippets of the actual rendered page below each link, not just the text? Or a microfiche-style view, where you cycle through all the pages at speed in full-screen? Sure, these are random examples, but are people in 2030 really still going to be using plain pages of links?

First PeteSearch IE beta build released

I just completed the installer, and so I’m now releasing the first public build of PeteSearch for IE:
It requires Windows 2000 or later, and Internet Explorer 7. Please give it a try, and let me know how you get on. I’ve updated the source repository, and I’ll be adding an article on how I built the installer soon.

More posts on porting Firefox add-ons to IE

Defrag Conference

I’ve just signed up for Defrag, a conference focused on the implicit web. In their own words:

Defrag is the first conference focused solely on the internet-based tools that transform loads of information into layers of knowledge, and accelerate the “aha” moment.

People often talk about information overload, and trying to cut down the amount of data people have to deal with. That approach leads to solutions where a computer tries to do part of the user’s mental processing for them, which is a slippery slope towards talking paperclips.

I want to give people more information, but in a form they can digest. I want to present something that all our wonderful pattern-matching circuitry can sink its teeth into. We’ve had millions of years of adaption to spotting pumas in the undergrowth, we should take advantage of that.

It feels like a lot of the Defrag folks are thinking along similar lines, so I’m hoping to meet some interesting people who are working at the same coal-face, and get advice and inspiration. Plus I’ve never been to Denver, so maybe me and Liz can combine it with a vacation!

PeteSearch now on

PeteSearch has finally made it through Mozilla’s review process, and is now on the main add-on site. It’s great to see it on such a high-profile site, but one of the really nifty things is that you now don’t have to go through the two-step process of adding as a trusted site in Firefox to install the addon, since is trusted by default.

It was a long and rocky road getting approval, since they recently added a new sandbox system, and require user reviews before they’ll allow an addon onto the public site. This does make sense, but unfortunately there are almost no users writing reviews in the sandbox, I think at least partly because it’s tough to get to even if you know about it, and almost impossible to discover if you don’t.

Luckily Pavel Cvrček, Mike Shaver and Shawn Wilsher came to my assistance, and helped me work out how to get the user reviews from news sites and blogs taken into account when they’re evaluating my addon for publication. Shawn has a post explaining the policy in more detail, but in simple terms you need to add links to the external reviews in the ‘Notes to Reviewer’ section before you nominate it. You get to that section by clicking on the version number link, eg

Thanks Pavel, Mike and Shawn, I really appreciate your help!

Porting Firefox extensions to Internet Explorer – Part 4

In my previous articles, I described how to set up the compiler, build a basic BHO, work with regular expressions and fetch pages using XMLHttpRequest. I’ve now combined those components into the first working version of PeteSearch in IE. The source code is available through SourceForge, or you can download it as a zip archive.

This version detects search pages from Google, Live, Ask, Technorati and Exalead, pulls out the search result links, fetches the contents of those pages to ensure they’re still active and relevant, and updates the search page to show each link’s status. I’ll be covering the remaining tasks in later articles, which include implementing hot-keys, adding a split-screen view to IE and writing an installer.
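To make the link-extraction step concrete, here’s a rough sketch of how result links could be pulled out of a page’s HTML with standard C++ regular expressions. This is not the actual PeteSearch source; the real extension uses its own, more involved per-engine patterns, and this just shows the general shape of the step:

```cpp
#include <regex>
#include <string>
#include <vector>

// Sketch only: pull the href targets out of anchor tags in a chunk of
// search-result HTML. A real implementation would use per-engine
// patterns and filter out non-result links.
std::vector<std::string> ExtractResultLinks(const std::string& html)
{
    std::vector<std::string> links;
    // Match href="..." inside anchor tags (naive, for illustration only).
    std::regex linkPattern("<a[^>]*href=\"([^\"]+)\"");
    auto begin = std::sregex_iterator(html.begin(), html.end(), linkPattern);
    auto end = std::sregex_iterator();
    for (auto it = begin; it != end; ++it) {
        links.push_back((*it)[1].str());
    }
    return links;
}
```

Once the links are extracted, each one can be fetched to check that the target page is still active and relevant, and the search page updated with its status.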

More posts on porting Firefox add-ons to IE

XMLHttpRequest in C++ on Windows Example

In the first three parts I covered how to get a simple IE extension built. For PeteSearch I need to be able to fetch web pages, so the next step is to figure out how to do that in Internet Explorer.

Firefox lets extensions use XMLHTTPRequest objects to fetch pages. It’s quite lovely; well documented and tested since it’s the basis of most AJAX sites, and with an easy-to-use interface. The first thing I looked for was an IE equivalent.

There’s a COM interface called IXMLHTTPRequest that looked very promising, with almost the same interface, but it turned out to involve some very gnarly code to implement the asynchronous callback in C++. It was also tough to find a simple example that didn’t involve a lot of ATL and MFC cruft, and it required a fairly recent copy of the MSXML DLL, which comes in multiple different versions. All in all, I ruled it out because it was just sucking up too much time, and I dreaded the maintenance involved in using something so complex.

There’s also the newer IWinHttpRequest object, but that’s only available on XP, 2000 and NT 4.0, and seems far enough off the beaten track that there’s not much non-MS documentation on it.

I finally settled on a really old and simple library, WinINet. It’s a C-style API, lower-level than XMLHttpRequest, and a bit old-fashioned, with some situations that require CPU polling, but it offers a full set of HTTP handling functions. It’s also been around since 1996, so it’s everywhere, and there are lots of examples out on the web. Since I liked the XMLHttpRequest interface, I decided to write my own C++ class implementing the same methods, using WinINet under the hood.

Here’s the code I came up with. It implements a class called CPeteHttpRequest that has the classic XMLHttpRequest interface, with a simple callback API for async access. I’m making it freely available for any commercial or non-commercial use, and I’ll cover my experiences using it with PeteSearch in a later article.
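To give a feel for the approach, here’s a hypothetical sketch of what an XMLHttpRequest-style wrapper interface can look like in C++. The class name, method names and the stubbed transport below are my own illustration, not the actual CPeteHttpRequest code; in the real class the send step would kick off an asynchronous WinINet fetch, where this stub just reports success with a canned body so the callback flow is visible:

```cpp
#include <functional>
#include <string>
#include <utility>

// Hypothetical sketch of an XMLHttpRequest-style wrapper. The methods
// mirror the familiar JavaScript object; the transport is a stub
// standing in for the Windows-only WinINet calls.
class HttpRequestSketch {
public:
    using Callback = std::function<void(int status, const std::string& body)>;

    void open(const std::string& method, const std::string& url) {
        m_method = method;
        m_url = url;
    }

    void setCompletionCallback(Callback callback) {
        m_callback = std::move(callback);
    }

    // Stub transport: immediately invokes the callback with a canned
    // body, so the shape of the async callback API can be seen end to end.
    void send() {
        if (m_callback)
            m_callback(200, "<html>stub response for " + m_url + "</html>");
    }

private:
    std::string m_method;
    std::string m_url;
    Callback m_callback;
};
```

The appeal of this shape is that the calling code looks the same whether the transport underneath is WinINet, WinHttp or anything else, which is exactly what makes swapping implementations later painless.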

Edit – It turns out that WinINet is actually very prone to crashing when used heavily in a multi-threaded app. You should use my WinHttp-based version of this class instead.

More posts on porting Firefox add-ons to IE

PeteSearch and the semantic web applied

PeteSearch is a semantic web application: it takes web pages designed to be read by humans and turns them into data that can be processed by software. It’s a pretty specialized application, focused purely on pages that list external sites associated with particular search terms, but the wide range of sites I’m able to support using the same code shows that my approach is robust.

The model I use for search pages is that they must contain three pieces of information:

  • A list of search terms, embedded in the URL
  • A list of external sites associated with those terms
  • A link to the next page of results

All of the recognition of these is data-driven, using a definition for each engine that includes

  • The host name and action used by the engine, so we can tell what’s a page of search results. For Google that’s
  • What precedes the search terms in the URL, eg for Google that’s q=
  • Which external sites are linked to, but not part of the results, eg Google links to for definitions of words
  • Which words indicate an external link that isn’t part of the results, eg Google links to the cached results on numbered servers using Cached as the link’s text
  • Which word is used for the link to the next page of results. In English that’s almost always Next, but I also support other languages

You can experiment with this by editing the SearchEngineList.js file inside PeteSearch; it contains an array of these definitions, and an explanation of the exact format they’re stored in. It’s pretty straightforward to add a new engine that fits this pattern, and most of them do.
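As an illustration of the data-driven model: the real definitions live in SearchEngineList.js, but the same fields can be sketched as a plain struct, along with the term-extraction step the engine prefix enables. The field names here are my own, with the Google examples from the list above:

```cpp
#include <string>
#include <vector>

// Illustration only: mirrors the fields of a per-engine definition as
// described in the text, not the actual SearchEngineList.js format.
struct SearchEngineDefinition {
    std::string host;                       // host name and action of the engine
    std::string termsPrefix;                // what precedes the terms in the URL, eg "q="
    std::vector<std::string> excludedHosts; // external sites linked to but not results
    std::vector<std::string> excludedWords; // link text marking non-results, eg "Cached"
    std::string nextPageWord;               // link text for the next results page, eg "Next"
};

// Pull the search terms out of a result-page URL using the definition:
// everything between the prefix and the next '&' (or the end of the URL).
std::string ExtractSearchTerms(const SearchEngineDefinition& engine,
                               const std::string& url)
{
    size_t start = url.find(engine.termsPrefix);
    if (start == std::string::npos)
        return "";
    start += engine.termsPrefix.size();
    size_t end = url.find('&', start);
    if (end == std::string::npos)
        end = url.size();
    return url.substr(start, end - start);
}
```

Keeping all of this as data rather than code is what lets one set of extraction logic support many engines: adding a new one is a matter of filling in another definition, not writing another parser.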

The only way that the semantic web is going to progress beyond proofs of concept is if there’s some concrete, practical and commercial application for it. I’ve seen this with AI: its applications in robotics and games have pushed the field forward much more than pure research has. The semantic web is stuck in a chicken-and-egg situation; nobody builds applications because nobody builds sites that are data sources, because nobody builds applications, and so on.

I’m not the only one to notice this. Piggy Bank is an MIT project that’s much more ambitious; it works like Greasemonkey in that it provides a framework to support data capture from many different sites using plugin scripts.

My goal is to demonstrate that a semantic web application can be useful today, in the real world, by creating a compelling tool based on my approach. I’m worried that unless somebody can show something useful, it’s going to succumb to the AI curse and remain the technology of tomorrow indefinitely!