Do you ever stare at a blank document wondering where to start?

Blankpaper
Photo by Mark78_xp

When I’m coding, designing a web page or writing up a document, it’s often helpful to start with an existing example. I’ll usually finish up completely rewriting it, but having a guide to the expected structure and main points to hit makes the process much faster. For business and legal documents, that’s where DocStoc comes in.

They offer a platform for users to share free templates for things like business agreements, wills and expense worksheets. It’s focused on professional documents, which separates it from services like Scribd, and has a Flash-based interface for browsing through the material. They have a rating system that’s designed to help you find the most useful content. One thing I’d love to see is an official seal of approval on some of the legal documents. At the moment I would be nervous using it for something like a will without some reassurance.

They’re run out of Los Angeles, and recently announced a $3.25 million Series B funding round with Rustic Canyon Partners. I met Jason Nazar, the entrepreneur behind DocStoc, when he gave a talk at the Entrepreneurs Mentor Society last year. Back then it was still in the early stages, and it’s great to see it turn into such a local success story. I do wonder if the same idea could be applied to a company intranet, so that commonly used document templates could be shared in a central location?

What does Ask’s failure mean for search?

Fail
Photo by Fridgeuk

I was really sad to hear about the layoffs and change of strategy at Ask. They were working really hard to do something different with search, and this can only help fuel the belief that search doesn’t need to change. I didn’t always like the end result, and I never switched to using them full-time, but they were the only mainstream engine that was moving the technology forward.

I won’t go into the history behind their retreat, since other people have already done better jobs than I ever could. Danny Sullivan has the best overall roundup, and Jonathan Salem Baskin did a prescient piece on their marketing problems back in January. What I’m interested in is where it leaves the alternative search engine industry.

Ask had some great advantages over a startup. They had a large group of existing users to test new ideas on. Their history and contacts helped them publicize their new technology, and if they achieved a breakthrough their user base would mean a lot of word-of-mouth buzz. They also had a massive marketing budget, though it ended up being spent in some strange ways. My hope was that they would succeed by either coming up with a killer feature in-house, or combining with one of the promising startups. Showing that users really do want more out of search would then trigger a technology arms race with the big guys, and we’d all benefit from the progress.

Instead, any challenger to Google’s crown will now have to build a user base organically, and gather contacts and marketing resources from scratch. I’m still confident that there is a better way to search than a flat list of links, but this removes one of the paths to proving that. I hope it doesn’t make the investment climate harder for those small horizontal search engines too.

An easy way to create your own search plugin for any site

Mycroft
AltSearchEngines recently explained how to find over a thousand plugins that add new engines to the search box in the top right of your browser. If the one you want isn’t there, or you need one for your own site, I’m going to show how you can create your own search plugin for Firefox. You don’t have to write any code. All you need is an example URL.

I recently installed Lijit on my blog, and I’d like to offer a search box plugin for searching on my site. The first hurdle is finding an example URL to base the plugin on. Lijit usually displays its results in an overlay, with no change in the address bar, but I spotted the permalink button that goes to a normal web page.

Ligitpermalink

For the search engine you’re using, do a search for a single term (in my case "camping"), and make a note of the full URL given for the result page. In the case of my Lijit blog search, the permalink version is

http://www.lijit.com/pvs/petewarden?q=camping&pvssearchtype=site&preserved_referer=http%3A%2F%2Fpetewarden.typepad.com

To start creating your plugin, go to the Mycroft Projects Search Plugin Generator. You’ll see a form with a series of fields to fill out. Luckily you will be able to ignore most of these, and I’ll show you what you need to do for the others. Once all the right information is in there, submitting the form will write the plugin code for you!

Ligiturl

The most crucial part of the form is the top "Query URL". This tells Firefox how to generate the right address for the search engine you’re using. The generator takes an example search engine URL, and works out how to build links that search for any keywords.

The generator needs to know where the search terms are supposed to be in the URLs for this engine, so in the Query box below I tell it the term I was looking for, "camping".
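To make that step concrete, here is a rough JavaScript sketch of the kind of analysis involved. It is not the Mycroft generator's actual code, and the function names `describeQueryUrl` and `buildSearchUrl` are made up for illustration. It takes the example URL and the example term, works out which parameter carried the term, and then rebuilds the address for any other keywords.

```javascript
// Work out which query parameter held the example search term.
// Hypothetical helper, not part of Mycroft or Firefox.
function describeQueryUrl(exampleUrl, exampleTerm) {
  const url = new URL(exampleUrl);
  const params = [...url.searchParams.entries()]; // values are already decoded
  const hit = params.find(([, value]) => value === exampleTerm);
  if (!hit) {
    throw new Error(`"${exampleTerm}" is not the value of any parameter`);
  }
  return {
    action: url.origin + url.pathname, // everything before the "?"
    termParam: hit[0],                 // the parameter that carries the keywords
    params,                            // all parameters, in their original order
  };
}

// Rebuild a results URL for new keywords, keeping the other parameters fixed.
function buildSearchUrl(template, keywords) {
  const query = new URLSearchParams(
    template.params.map(([name, value]) =>
      name === template.termParam ? [name, keywords] : [name, value]
    )
  );
  return `${template.action}?${query}`;
}

const lijit = describeQueryUrl(
  "http://www.lijit.com/pvs/petewarden?q=camping&pvssearchtype=site" +
    "&preserved_referer=http%3A%2F%2Fpetewarden.typepad.com",
  "camping"
);
console.log(lijit.termParam);
console.log(buildSearchUrl(lijit, "hiking trails"));
```

Using a single, distinctive word for the example search matters here: if the same word showed up as the value of more than one parameter, a simple match like this would pick the wrong one.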

Mycroftscreen2

Below that, enter a URL for the home page of the search engine you’re using.

Ligitmain

Leave the CharSet entry as None, and leave the Categories section blank. The next section, Results, is tricky. Some obscure parts of Firefox want to extract the search result links using the information from here, but all we want to do is direct the user to the right page. We should be able to leave this blank, but unfortunately the generator fails if you do. Instead, fill in the first four boxes with "Dummy Entry", so the generator has some entries to work with.

Mycroftscreen4

You can leave the remaining entries in the Results section blank. Moving down to the Plugin part, there are three final boxes you need to fill.

Mycroftscreen5

Since you’re the author, enter your name followed by your email address in angle brackets. The name is what appears in the drop-down menu for the search box, and the description should be a short explanation of what the plugin is for.

That’s all the information you need to enter, so hit the "Send" button at the bottom of the form. Mozilla then analyzes the information you’ve submitted, and tries to create the right code for your plugin. You should see a couple of new sections appear at the bottom of the page. The first box is the HTML that the engine returned for your example search, which isn’t that interesting. The crucial part is the lower section of text, titled Plugin Source.

Mycroftscreen6

This contains the actual code you need for your plugin. I’ve uploaded the example that the generator creates for searching this blog with Lijit here. To create your own file, copy and paste everything that’s in a typewriter font inside the light grey box into your favorite text editor, like Notepad or TextEdit. Make sure you’re in plain text mode if it supports fonts or colors. Save the file as the name of your search engine, with the .src extension, for example petesearch.src.

Now you have two choices for how to install the plugin. If you just want to use it on your own machine, you can copy it to the directories described on this page. On Linux it’s /usr/lib/Mozilla/searchplugins, for OS X use /Applications/Mozilla.app/Contents/MacOS/Search Plugins/ and on Windows it’s C:\Program Files\Mozilla.org\Mozilla\searchplugins\.

If you want to put it on a website for other people to install, you’ll need a small section of JavaScript. Here’s a cut-down version that will install it when it’s clicked on.


<a href="http://petewarden.typepad.com/searchbrowser/files/petesearch.src" onclick="window.sidebar.addSearchEngine(this.href, '', 'PeteSearch', ''); return false;">Install</a>

This will work fine on Firefox, but if you want to gracefully fail on other browsers you’ll need some more complex code to detect if the plugin format is supported. Here’s a page from Mozilla that explains what you’ll need to do. Alternatively you can just label the link as Firefox-only.
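As a rough idea of what that detection could look like, here is a minimal sketch. It is not the code from the Mozilla page, and the helper names `installSearchPlugin` and `wireInstallLink` are made up for this example. The only browser call it relies on is the same `window.sidebar.addSearchEngine` used in the link above, and it checks that the function exists before calling it.

```javascript
// Try to install a Sherlock plugin. Returns true if the browser accepted
// the request, false if it has no window.sidebar.addSearchEngine.
// "win" is normally the global window; passing it in keeps this testable.
function installSearchPlugin(win, pluginUrl, name) {
  const sidebar = win && win.sidebar;
  if (sidebar && typeof sidebar.addSearchEngine === "function") {
    // Arguments: plugin URL, icon URL, suggested name, suggested category.
    sidebar.addSearchEngine(pluginUrl, "", name, "");
    return true;
  }
  return false;
}

// Attach the installer to an ordinary link. When the browser can't install
// the plugin, the click is left alone and the link simply opens the .src file.
function wireInstallLink(link, win, name) {
  link.addEventListener("click", (event) => {
    if (installSearchPlugin(win, link.href, name)) {
      event.preventDefault();
    }
  });
}
```

In a page you would call something like `wireInstallLink(document.getElementById("install"), window, "PeteSearch")` once the link exists, where the `install` id is just an assumption for the example.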

This guide shows how to create a Sherlock plugin, which will work with all versions of Firefox. There’s also a new standard called OpenSearch, which works with Firefox 2 and Internet Explorer. It has some nifty features like being able to add your plugin to the search box whenever a user is visiting a site, but no user-friendly generator.

It’s a wild, wild web

Viewfour
While browsing my visitor logs, I came across viewfour.com. It’s an interesting site that does something similar to my old SearchMash Java applet and ManagedQ’s much more advanced engine, displaying live previews of search results. Unfortunately it does suffer from a problem with frame-busting sites. For example, this search for Pete Warden winds up with the toolfarm preview taking over the parent frame. That’s a big reason why you need either some decent script-blocking code, or to deploy it as a browser extension where you can prevent child frames from taking control.

I was curious to discover that there weren’t any organic reviews for the site that I could find, and the copyright was 2005. Most of the Google results pointed to download pages. It also includes a link to ViewSmart, a spyware/malware blocker, which seemed like an odd combination to go with a search engine. In fact, the only user-created review I found in the first few pages was this negative one from a spyware information site. I don’t recommend paying too much attention to anonymous posters, but if you do try out the search site, it would be prudent to avoid the additional download until I can find out more information about it. I’ll see if I can get more information directly from the author, SSHGuru.

Now you can try ManagedQ for yourself

Explosion

My anonymous friends over at ManagedQ have left their private beta and opened their search service to everyone. I already covered how helpful their regular expression in-page searching can be, and they have a lot more to offer too, like their entity extraction and the most accurate thumbnails I’ve seen. You can see more reviews on AltSearchEngines and thenextweb.com.

I’ve been having some fun using the regular expressions I posted a few days ago with ManagedQ. To see their power, follow these steps:

1) Go to managedq.com and enter your main search terms (e.g. pete warden)
2) On the results page, start typing to bring up the in-page search box
3) Delete anything that’s already in there and enter the following regular expression:
/([0-9]{3})[^0-9a-z]*([0-9]{3})[^0-9a-z]*([0-9]{4})/

This should highlight any phone numbers in the results pages. I made the expression a bit more restrictive than my previous version to exclude letters as phone number separators.
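If you want to try the expression outside ManagedQ, here is a small JavaScript sketch that runs it over a string. The sample text and the `findPhoneNumbers` name are made up for illustration, and the only change to the pattern is the `g` flag so that every match is returned.

```javascript
// The same expression as above, with the "g" flag added to find every match.
const PHONE = /([0-9]{3})[^0-9a-z]*([0-9]{3})[^0-9a-z]*([0-9]{4})/g;

// Return each phone number found in the text, normalized to 123-456-7890.
function findPhoneNumbers(text) {
  return [...text.matchAll(PHONE)].map(
    ([, area, exchange, line]) => `${area}-${exchange}-${line}`
  );
}

// Made-up sample text with two differently formatted numbers.
console.log(findPhoneNumbers("Call (310) 555-0123 or 310.555.0199"));

// Lowercase letters between the digit groups are not accepted as separators.
console.log(findPhoneNumbers("part 310a555b0123"));
```

One thing to be aware of is that the character class only excludes lowercase letters, so uppercase letters between the groups would still count as separators unless you extend it to `[^0-9a-zA-Z]`.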

Managedqnumbers

Why ManagedQ’s in-page searching is so useful

Mqlogo

After stumbling across ManagedQ last week and giving them an unplanned launch, I wasn’t expecting a warm reception from the team. Thankfully it turned out that I already knew one of the founders, which explained why they’d appeared in my visitor logs. They were even kind enough to invite me onto their beta program!

I’m a long-time advocate of unbundling search engines and presentation, so I’m naturally pretty excited about how they overlay a deeply interactive UI on top of Google search results. There are a lot of features I could talk about, but I’ll focus on one of the most novel, the in-page searching.

In ManagedQ, search results show up as a grid of images, each showing a snapshot of the page. Unlike other thumbnail search engines, these are live HTML frames not just pre-canned images. The power of this is pretty obvious once you start trying to narrow down your search. To start with, you can just start typing a word anywhere on the page and all occurrences of that word will show up within each thumbnail.

For example, if you do a search on "Peter Thiel", and then want to narrow it to results that talk about PayPal, you type in the term and the thumbnails instantly either show you where the word is in the page:

Paypalinpage

or indicate that the term isn’t there:

Paypalmissing

As it stands, this is powerful stuff. I rely heavily on the summaries Google shows below every result to understand what’s on each page. Now I can create custom summaries to find out more about a whole set of results at once. The in-page query stays active as you move through the results, so you can power-search by rapidly browsing through all the pages.

Where it gets even more interesting is when regular expressions are added to the mix. REs are the building blocks of most text processing languages, and offer a very flexible way of describing patterns of letters and numbers. For example, you can describe some text that contains a dollar sign, followed by a number, a space and the start of a word, with /\$\d* \w/

If you type that as your in-page search for Peter Thiel, you’ll get results that look like this:

Digitlarge

In detail, each thumbnail now shows every place that a dollar amount is followed by a word, which pulls out all of the fund figures that are mentioned in connection with Peter.

Digitsmall

This is very useful if you’re doing heavy research. By crafting different REs you can match all sorts of useful patterns, like C function calls with /\w*\(/ , or find a gene in a particular context. Since regular expressions just look like a cat walked across your keyboard to most of the world, the team is planning on offering shortcuts for common queries like dollar amounts.
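To get a feel for what these expressions pick out, here is a toy JavaScript version of an in-page search. It is only a sketch of the idea, not ManagedQ's implementation, and the page titles, text and the `inPageSearch` name are all invented for the example. Each "page" is just a string, and the function reports every match per page, with an empty list standing in for the "term isn't there" thumbnail.

```javascript
// Invented result set: page title -> visible text of the page.
const pages = {
  "Fund news": "The fund raised $220 million last year and $30 million before that.",
  "Biography": "Peter Thiel co-founded PayPal.",
  "Code sample": 'printf("hi"); total = add(a, b);',
};

// Run one pattern against every page and collect the matches for each.
function inPageSearch(pageTexts, pattern) {
  // Make sure the pattern is global so match() returns every occurrence.
  const flags = pattern.flags.includes("g") ? pattern.flags : pattern.flags + "g";
  const everyMatch = new RegExp(pattern.source, flags);
  const results = {};
  for (const [title, text] of Object.entries(pageTexts)) {
    results[title] = text.match(everyMatch) || []; // [] means "not on this page"
  }
  return results;
}

// The dollar-amount expression from the post: it stops after the first
// letter of the following word, because \w matches a single character.
console.log(inPageSearch(pages, /\$\d* \w/));

// A stricter variant that requires at least one digit and takes the whole word.
console.log(inPageSearch(pages, /\$\d+ \w+/));

// The C function call expression.
console.log(inPageSearch(pages, /\w*\(/));
```

For highlighting in a thumbnail the short form is fine, since all you need is the position of the match. The stricter variant is more useful if you want to copy the amounts out.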

To my mind, the big advance here is in the workflow. Traditionally you do a search and then click through to the results pages, eyeballing each one for the information you want. If the results aren’t good enough, you’ll go back and refine your query, doing a complete new search. With ManagedQ, you’ve suddenly got an interactive refinement stage that lets you poke and prod the result set and easily get a lot more information. You can instantly narrow your search by ignoring bad results that don’t contain terms you want, without throwing away all the others that could be interesting. You can get a quick feel for whether the results are worth exploring by throwing in good indicator terms that are likely to be in the ones you want. And as I mentioned at the start, you’ve suddenly got the ability to pull out your own summaries rather than relying on Google’s.

Expect to hear more from me on ManagedQ as I dig into its feature set. The concept of breaking out search presentation from the indexing engine has a lot of promise. Even this early version is a powerful demonstration of how far that approach can take you.

Try a secret new search engine

Mqlogo

Well, I’m not sure about secret, but it sure is mysterious. http://alpha.managedq.com/ showed up in my visitor logs, and visiting the site it looks like rather a nifty visual search interface. It’s got thumbnails of the top results, and automatically generated keywords sorted by type down the left side:

Mqscreenshot

The interesting part is that most of the site is returning 404 or authorization errors, which makes me wonder if they might still be in stealth mode. Unfortunately, email messages to their public inquiries@managedq.com address bounce, they’ve got a private domain registration so I can’t get any contact details from that, and Google searches don’t get me any more information, so I can’t check with them before mentioning it here.

They’re using snap for the thumbnails, and I’m not sure how they’re pulling out the tags. The keywords definitely look automatically generated, rather than user driven. I’d love to know more about their work, so if anyone has more details or a way to contact them, email me or add a comment.