SearchMash FAQ

What is SearchMash for?

SearchMash makes searching faster and more reliable by weeding out dead or stale results.

Why do I get a security warning?

SearchMash uses a securely signed Java applet to process and display the web pages. Because a signed applet can be granted permissions beyond the normal browser sandbox, such as fetching pages from other sites, the browser wants to make sure you trust it before it’s run.
You should trust SearchMash because it’s fully open-source, highly secure, and certified by Verisign. Its source code is open to scrutiny on Sourceforge, and its full security model is described on this blog.

Why do some pages show up incorrectly in the preview?

There are some bugs in the way the preview page is loaded that cause some images and CSS style sheets to fail to load. I made some fixes recently, but some issues remain. I will be addressing these as soon as possible.

How do I report a bug?

The easiest way is to just email me at searchbrowser@gmail.com. Any details you can give me about the operating system, browser and Java versions you used would be very helpful. If you’re comfortable using Sourceforge’s bug tracker, you can also enter one there directly.

Where can I get more information?

This blog has regular updates on the latest SearchMash news.
The Squidoo lens has articles on how to use SearchMash.
For the technically minded, the SourceForge project page gives you access to all the source code, as well as a list of current bugs and feature requests.

Preview polished

I’ve been talking with Philipp Lenssen from the excellent Google Blogoscoped, and he had some great feedback. In particular he called out the problems with some things not showing up correctly in the preview window.

This is something I’d noticed too, and I have a bug filed on it in sourceforge’s bug tracker, but it hadn’t made it to the top of my list.

I spent a bit of time on it this morning, and managed to greatly improve it. First off, I was changing the HTML to all lower case as part of my security measures, which messed up any case-sensitive resource paths. I’m doing all my search and replace case insensitively now, so I was able to remove that.

I also discovered that the BASE tag I was adding to resolve relative resources should actually point to the full URL of the page. I was trying to be too clever, and setting it to the URL up until the last /, assuming it had to be a folder name. Just setting it to the full URL made a lot more pages show up correctly.
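The browser already drops the last path segment of the base URL when it resolves a relative link such as img/a.png, so the full page URL is exactly what the BASE tag should hold. A minimal Java sketch of inserting the tag, assuming the page HTML and its URL are available as strings (the class and method names are hypothetical, not SearchMash’s own):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: inserts <base href="..."> holding the page's full URL.
final class BaseTagInserter {
    // Matches <head> or <HEAD lang="en"> in any case, but not <header>.
    private static final Pattern HEAD_OPEN =
            Pattern.compile("<head\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    private BaseTagInserter() {}

    static String insertBase(String html, String pageUrl) {
        // Escape '&' and '"' so the URL is a valid attribute value.
        String escaped = pageUrl.replace("&", "&amp;").replace("\"", "&quot;");
        String baseTag = "<base href=\"" + escaped + "\">";
        Matcher m = HEAD_OPEN.matcher(html);
        if (m.find()) {
            // Put the base tag first inside <head> so it precedes every relative URL.
            int end = m.end();
            return html.substring(0, end) + baseTag + html.substring(end);
        }
        // No <head> element found: fall back to prepending the tag.
        return baseTag + html;
    }
}
```

For comparison, java.net.URI applies the same resolution rule: resolving img/a.png against http://example.com/dir/page.html gives http://example.com/dir/img/a.png, with no trimming needed beforehand.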

There are still some pages that have problems, so I’m leaving the bug open, but these two changes seem to have fixed most of the issue.

Philipp also asked why I don’t just set the URL of the frame, rather than writing the HTML into a local one. That’s a good question, since the current functionality would still work. I’m planning on adding something to help find search terms in the page in the future, though, highlighting and scrolling to the right words, and that wouldn’t be possible just by setting the location. I like to make life difficult for myself!

Improved security foundation

I’ve checked in two big changes to the applet that prevent malicious scripts from causing harm, even if they make it through the script blocking: removing cookies, and blocking sites that don’t show up in a google search. The ability to access cookies and intranet sites is the pot of gold at the end of the rainbow for malicious hackers. Now that the applet blocks that access, there hopefully shouldn’t be much point in trying to crack the other layers of security, such as script blocking.

First, I’ve written my own HTTP handler based on TCP/IP sockets, rather than using Java’s high-level functions. This means there’s no chance of cookies being sent with page requests: the high-level functions would add them automatically, but my version doesn’t even have access to the cookie jar. That removes the possibility of accessing personal or secure information through cookies.
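A minimal sketch of what a socket-level handler like this can look like in Java. It is an illustration under assumptions, not the applet’s actual code: the class name is hypothetical, it speaks plain HTTP/1.0 only, and it uses Java 7+ conveniences (try-with-resources, StandardCharsets) that a 1.4-era applet would not have. The point is that the request text is built by hand, so the only headers sent are the ones written here.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical minimal HTTP/1.0 client on a raw TCP socket. It never goes through
// URLConnection or java.net.CookieHandler, so stored cookies cannot be attached.
final class CookieFreeHttp {
    private CookieFreeHttp() {}

    // Builds the complete request text; the only headers are Host and Connection.
    static String buildRequest(URI uri) {
        String path = uri.getRawPath();
        if (path == null || path.isEmpty()) {
            path = "/";
        }
        if (uri.getRawQuery() != null) {
            path += "?" + uri.getRawQuery();
        }
        String host = uri.getPort() == -1 ? uri.getHost() : uri.getHost() + ":" + uri.getPort();
        return "GET " + path + " HTTP/1.0\r\n"
                + "Host: " + host + "\r\n"
                + "Connection: close\r\n"
                + "\r\n";
    }

    // Sends the request and returns the raw response bytes (status line, headers, body).
    static byte[] fetch(URI uri) throws IOException {
        if (!"http".equalsIgnoreCase(uri.getScheme())) {
            throw new IOException("this sketch only handles plain http: " + uri);
        }
        int port = uri.getPort() == -1 ? 80 : uri.getPort();
        try (Socket socket = new Socket(uri.getHost(), port)) {
            OutputStream out = socket.getOutputStream();
            out.write(buildRequest(uri).getBytes(StandardCharsets.US_ASCII));
            out.flush();
            InputStream in = socket.getInputStream();
            ByteArrayOutputStream response = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                response.write(buffer, 0, n);
            }
            return response.toByteArray();
        }
    }
}
```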

Second, I’ve limited page requests to URLs that either start with http://www.google.com/search? or are contained in the last search results page returned from google. Since local intranet servers should not show up in google search results, this should ensure that only public servers are accessed.
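The allow-list rule can be sketched in a few lines of Java. This is a hypothetical illustration, not SearchMash’s own filter: it assumes the results page is available as HTML, and it pulls links out with a rough regular expression (no entity decoding) where a real parser would be more careful.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical request filter: a URL may be fetched only if it is a google search
// URL or appeared as a link on the most recent results page.
final class RequestFilter {
    private static final String SEARCH_PREFIX = "http://www.google.com/search?";
    // Rough extractor for double-quoted href values on <a> tags, in any case.
    private static final Pattern HREF = Pattern.compile(
            "<a\\b[^>]*\\bhref\\s*=\\s*\"([^\"]+)\"", Pattern.CASE_INSENSITIVE);

    private final Set<String> allowedResults = new HashSet<>();

    // Replaces the allow-list with the links found on a new results page.
    void updateFromResultsPage(String resultsHtml) {
        allowedResults.clear();
        Matcher m = HREF.matcher(resultsHtml);
        while (m.find()) {
            allowedResults.add(m.group(1));
        }
    }

    boolean isAllowed(String url) {
        return url.startsWith(SEARCH_PREFIX) || allowedResults.contains(url);
    }
}
```

Because the prefix includes the path and the "?", a host such as www.google.com.example.net does not pass the first test, and an intranet address is only reachable if it appeared in the search results.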

These changes close down the two main dangers attributed to cross-domain XMLHttpRequests: sending private information with the requests, and intranet access. Chris Holland has an article discussing an informal RFC for cross-domain requests, and these are the key points he brings up. There’s also a good mailing-list discussion he references that seems to agree.

My main difference with Chris is his assumption that the only way to ban intranet access is to rely on all accessible servers to implement a new protocol, explicitly allowing cross-domain access. I believe the technique I use, verification of the domain’s existence in a trusted directory of public servers, such as google’s search database, should be enough to exclude intranet sites.

I think there’s another reason for the general preference for an opt-in approach to cross-domain access: the belief that pulling pages from another site without explicit permission is a Bad Thing. This is something I’ve given a lot of thought to (see Why the hell are you sucking my bandwidth for your mashup?), and my conclusion was that it depends on the context. I’m sticking pretty closely to how the original publishers intended their pages to be shown in a browser, ads and all, just providing a different work flow for searching.

I think this ability to remix and mash up external web pages is something that can be abused, but it also has huge potential to enrich users’ lives with new creations. It’s disruptive, and a bit subversive, but I think the world will be a better place with more mashups.

Microsoft’s script debugger

I discovered MS’s script debugger whilst investigating the preview frame sticking problem in Internet Explorer. I’m used to developing on Safari with just error messages to go by, so having even a basic debugger was a pleasant surprise. I also hear good things about Firefox’s debugger. My only niggle so far is that I can’t figure out how to see which error caused the debugger to be invoked, but I’m sure that’s just an RTFM issue.

I started off by turning off “disable script debugging” in IE’s preferences, but unfortunately that didn’t do very much. It turned out you have to install the debugger manually before the option does anything, which made sense once I realized it. I downloaded it from a link in an MSDN blog.

Prettification

I’ve made some cosmetic changes to SearchMash, so that you don’t just see a blank screen while it’s loading. You now see “Loading…” in the results frame, and a quick primer on how to use SearchMash in the preview. The primer is there because I had a lot of feedback that the initial screen was confusing to first-time users, since it’s just the google start page in a frame. Hopefully the extra information should make it easier to get started.

I also worked on the problem with the preview window sometimes causing script errors in IE, filed as “Preview window can stop working after changing pages” and “Previewing missing sites on IE stops window” in Sourceforge’s bug tracker. I added a try/catch around the access to the preview frame’s document, and tried to reset security by setting it back to the original local ‘src’ if there was an exception. I still see the exception occurring (I’m using MS’s very handy script debugger, which I’ll cover soon), but I haven’t been able to reproduce the problem with the window getting stuck. I hope my changes will unwedge it if it does get into that state.

Browser Compatibility

One of the design goals of SearchMash was to work in a lot of different browsers, which is why I tried to avoid using browser-specific technology.

Both the LiveConnect technology that allows Java and JavaScript to call each other, and the JavaScript document object model (DOM) that I use to edit HTML in a page, are well supported by the major browsers. The biggest hurdle is having a recent version of Java installed. I’ve seen figures of around 80-90% of desktops having some version of Java, but I haven’t seen a breakdown by version. I suspect many of those are 1.1, which is why I hope to back-port MashProxy to that version. It’s a respectable deployment, but not as high as Flash (I’ve seen 98% figures for some version of that).

My main development platform is OS X, so Safari and Firefox get the most testing, and SearchMash works without problems on both. I’ve also tried some of the other Mac browsers, such as Opera and IE 5.1, but these are not widely used, so I haven’t spent the time to ensure SearchMash works on them. I did hit one quirk in Safari’s LiveConnect implementation: Java strings don’t seem to be converted to JavaScript ones when they’re passed back to the script, but stay as wrapped Java objects. This means you can call Java string functions on them from the script, but JavaScript string functions don’t work. Since there’s some overlap between JavaScript’s and Java’s string functions, it took me a while to figure out what was happening, but once I did, I was able to force a conversion by creating a new JavaScript string variable from the one that was passed back.

Since most of the world uses a PC, making sure that SearchMash works on Firefox and Internet Explorer was a priority. I only have one machine with Windows XP available, but I’ve been able to make sure it runs on IE 6 and 7, and on Firefox 1.5.

I’ve found IE’s Java-to-JavaScript connection to be a bit more picky than the other browsers’. For example, I’ve just found a bug that can cause the applet to crash when I change pages in IE; it looks like using an out-of-date JavaScript DOM object can cause a security exception.

I’ve done no Linux testing, because I don’t have a system set up to run it. I didn’t hit any big differences between Firefox on the Mac and Windows, so I’m hopeful it will just work on Linux/Firefox too.

Supported Browsers

  • Internet Explorer 6.0 (Windows)
  • Internet Explorer 7.0 (Windows)
  • Safari 2.0 (OS X)
  • Firefox 1.5 (Windows and OS X)

Untested

  • Internet Explorer 5.0 (Windows)
  • Opera (Windows)
  • Netscape
  • No Linux testing has been done

Tested, but not working

  • Internet Explorer 5.1 (OS X)
  • Opera (OS X)

Bugs and features

One of the nice things that sourceforge offers is a feature and bug tracking system. You can browse the current bugs and feature requests from the links on the main project page.

I’ve started things off with some bugs and features I’ve had on my mental list for a while, but you should feel free to jump in and add your own requests.

Bugs

  • Make MashProxy Java 1.1 compatible
    MashProxy doesn’t really need to use any new Java features, but it relies on a couple just because it was written in a 1.4 environment. I’d like to remove those dependencies, so it’ll run even on really old versions of Java.

  • Previewing missing sites on IE stops window
    I noticed this while using my parent’s PC on vacation; I normally develop on a Mac with Safari, where I don’t see the problem. Moving the mouse over a missing link, which invokes the preview, causes the preview window to stop responding to any further requests to show pages, even valid ones.
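The post doesn’t say which 1.4 features MashProxy depends on, so as a purely illustrative assumption: String.split and the java.util.regex package both arrived in Java 1.4, while java.util.StringTokenizer has been available since 1.0. A 1.1-friendly replacement for a simple delimiter split might look like this (the class name is hypothetical), avoiding regular expressions, generics and enhanced for loops:

```java
import java.util.StringTokenizer;

// Hypothetical Java 1.1-compatible stand-in for a 1.4-only call such as
// text.split("\\s+"). Uses only classes and syntax available in Java 1.1.
final class OldJavaSplit {
    private OldJavaSplit() {}

    // Splits on any run of the given delimiter characters. Empty tokens are
    // skipped, which differs from split() when the text starts with a delimiter.
    static String[] split(String text, String delimiters) {
        StringTokenizer tokens = new StringTokenizer(text, delimiters);
        String[] result = new String[tokens.countTokens()];
        for (int i = 0; i < result.length; i++) {
            result[i] = tokens.nextToken();
        }
        return result;
    }
}
```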

Features

  • Improve appearance of status indication
    I think either icons or special formatting of the title would be better than the current (found) or (missing) text that’s added after each link.

  • Provide ask.com as an alternative to google
    I normally only use google, but it seems like it wouldn’t be too hard technically to parse ask.com’s results too, and offer users a choice.

  • Check for search terms in the page
    This was one of the big features I wanted to help my searching, but I ran out of time to implement it before the first release. It would catch out sites that do ‘cloaking’ (showing one version of a page to google to get the search terms, but another to normal users).

  • Show multiple search pages
    Ten search results to a page sometimes feels a bit stingy, and it seems like it wouldn’t be too hard to concatenate multiple pages of results, one above the other. I’m not sure how many to show at once, but I’d probably try four pages, and forty results, and see how that feels.
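Of these ideas, the search-term check is the most algorithmic. A minimal Java sketch, assuming the fetched page HTML and the query string are already in hand; the class and method names are hypothetical. It strips script and style blocks and then all tags, and reports which query terms never appear in the remaining text. It uses plain substring matching (so "cat" would match inside "concatenate") and doesn’t decode HTML entities, so it’s a rough first pass rather than a robust cloaking detector.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch: reports which search terms are absent from a page's visible text.
final class TermChecker {
    private TermChecker() {}

    // Removes script/style elements, then replaces every remaining tag with a space.
    static String visibleText(String html) {
        String noCode = html.replaceAll(
                "(?is)<script\\b.*?</script\\s*>|<style\\b.*?</style\\s*>", " ");
        return noCode.replaceAll("<[^>]*>", " ");
    }

    // Returns the lower-cased query terms that do not occur in the page text.
    static List<String> missingTerms(String html, String query) {
        String text = visibleText(html).toLowerCase(Locale.ROOT);
        List<String> missing = new ArrayList<>();
        for (String term : query.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!term.isEmpty() && !text.contains(term)) {
                missing.add(term);
            }
        }
        return missing;
    }
}
```

A page whose missing list is non-empty for a query it ranked on would be a candidate for the kind of cloaking described above.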