Browsing speed boost

I’ve added a “Show Next Result” button to the search page. This moves the preview forward to the next link in the results, and if you’re at the end of a results page, it fetches the next one. The beauty of this is that you can go through links very quickly without having to hunt through the results page.

I was also hoping to get the up and down keys to cycle through results, but this looks like a cross-browser minefield. I had it working in Safari, but only after you manually set the focus by clicking in the window; I couldn’t get it working in Firefox, and I haven’t even looked at IE yet!

Another addition I want is some indication of which link is being previewed. I’m thinking about setting the table border of the snippet below the result to 1, so you get a selection box.

One disturbing thing I noticed while testing my latest changes on the PC is that YouTube links in the results seem to hang IE when they’re previewed. This seems to be new; I’m worried there’s been some Flash change, and it might affect a lot of sites. Firefox and Safari still seem fine, but I’ll be trying to work out what’s happening tomorrow. It’s filed as Previewing YouTube sites hangs IE.

A relaxing intermission


Robert Seidel creates some of the most mesmerizing work around, and he’s just put out another video online. Try to check out the really high-res version if you’ve got the bandwidth; you’ll be rewarded, though there is a YouTube version if you can’t face an 80 MB download.

There’s something about the movement and distortion in his animation: it’s twisted and disturbing, but very appealing. I love the mess and organic gunk that’s all over everything he does. My years programming 3D graphics have taught me to appreciate how much depth that dirt and imperfection can add to an image. I’m tired of hygienic abstraction.

If you like that video, check out Grau, my first introduction to him.

HTTP socket code fixes


I was chasing up the cause of the bug Socket based fetches can fail, and I tracked it down to an error in the way I was forming my initial HTTP GET request. I was putting the full URL as the argument on the first line, rather than just the path part. Interestingly, most servers accepted this, but this one didn’t. I’ve updated the applet to do the right thing in this case, and also added some logging code that helped me track down the problem.
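The fix described above can be sketched in Java roughly like this (the class and method names are my own, not the applet’s actual code): for a direct, non-proxy connection, the request line should carry only the path and query, with the host moved into a Host header.

```java
import java.net.URL;

// Illustrative sketch: build an HTTP/1.0 GET request line from just
// the path (and query) of a URL, rather than the full URL.
public class RequestLine {
    public static String buildRequest(String urlString) throws Exception {
        URL url = new URL(urlString);
        // An empty path means the site root.
        String path = url.getPath().length() == 0 ? "/" : url.getPath();
        if (url.getQuery() != null) {
            path += "?" + url.getQuery();
        }
        return "GET " + path + " HTTP/1.0\r\n"
             + "Host: " + url.getHost() + "\r\n"
             + "\r\n";
    }
}
```

Lenient servers will also accept `GET http://example.com/a/b HTTP/1.0`, which is why the bug only showed up against stricter hosts.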

Something else that was really useful was the Live HTTP Headers addon for Firefox. It gives you a full dump of the header information that’s passed back and forth between Firefox and the hosts it contacts. I’ve tried to absorb the RFC, but nothing beats being able to see a working example. I was also impressed by how easy it was to install; it makes me wonder how hard it would be to create a Firefox plugin version of SearchMash in the future.

I also fixed another bug that would sometimes prevent pages from loading, and removed the separate status request check I did before, which used URLConnection. Now I just make a single socket connection for every page, which should be faster and easier to maintain.

Added a firefox search plugin


I just finished a search plugin for Firefox, so you can now add SearchMash to your search toolbar, just like the standard Yahoo, Google and eBay ones. To install it, go to the SearchMash page and click on the “Add SearchMash to Firefox’s toolbar” link.

Once it’s there, you can do searches with SearchMash by clicking on the icon in the top-right search box (it shows a ‘G’ and does Google searches by default) and selecting ‘S’ for SearchMash instead.



I noticed a lot of traffic after the Programmable Web listing. Some of it was directly from there, but most was from StumbleUpon. Welcome to everyone who discovered SearchMash through that, and thanks to ChaseLightning and DaBug for the recommendation.

I hadn’t heard of StumbleUpon before, which probably shows my ignorance, but I’m very impressed by what I’ve learnt. I’ve added myself as petewarden, and I’ll be having some fun with it. On a business model note, it was very slick that the referring URL in my logs directed me to a customized page asking me if I wanted to create an ad campaign through StumbleUpon. If I was a commercial organization it would be very tempting, since they’ve already proved they can send traffic to you. From seeing that, I’d bet that SU has a profitable future ahead, and I’m happy for them since they also provide such a neat service for users.

You may also like… (aka the competition!)

Ask and Snap both offer preview images of some websites in their search results. Unlike SearchMash, these are pre-rendered thumbnails, so you can’t click on them; they only have them for the most popular sites, and they may be out of date.

Google Preview is a free Firefox plugin that provides similar functionality to Ask and Snap’s thumbnails, but pulls the images from Alexa rather than a proprietary database.

Browster is an ad-supported browser plugin, and the only other tool that gives you a live preview of web pages. As a browser plugin, it requires an install, doesn’t do any search term checking, and is only available for the PC. The upside is that, as a plugin, its integration is tighter than SearchMash’s.

For more mashups, ProgrammableWeb has a great directory, and recently gave SearchMash four stars.

Avoiding cloaking – big upgrade


One of my main goals with SearchMash has been to save search time by skipping pages that don’t actually have the terms I want, despite what they tell Google. Typically, these are subscription sites that require registration and login to see the information Google’s indexed, and that’s normally too big a barrier for me.
To avoid those pages, SearchMash now checks the web pages it gets for the search terms, and tells you if it doesn’t find some of the terms.
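The term check can be sketched in Java roughly as follows (the names are illustrative, not the applet’s actual code): a case-insensitive scan of the fetched page text, collecting any search terms that don’t appear.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: report which search terms are missing from a
// fetched page, so cloaked or login-walled results can be flagged.
public class TermCheck {
    public static List<String> missingTerms(String pageText, String[] terms) {
        String haystack = pageText.toLowerCase();
        List<String> missing = new ArrayList<String>();
        for (String term : terms) {
            // Case-insensitive substring match against the page text.
            if (!haystack.contains(term.toLowerCase())) {
                missing.add(term);
            }
        }
        return missing;
    }
}
```

A real check would first strip HTML tags so terms aren’t accidentally matched inside markup, but the idea is the same.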
I also upgraded the status display of results over the weekend, so you’ll see a clearer indication of any problems getting the page. I now draw a line through pages that couldn’t be fetched, and color any links that had errors as red.

SearchMash FAQ


What is SearchMash for?

SearchMash makes searching faster and more reliable by weeding out dead or stale results.

Why do I get a security warning?

SearchMash uses a securely signed Java applet to process and display the web pages. Because signed applets have such powerful capabilities, the browser wants to make sure you trust one before it’s run.
You should trust SearchMash because it’s fully open-source, highly secure, and certified by Verisign. Its source code is open to scrutiny on SourceForge, and its full security model is described on this blog.

Why do some pages show up incorrectly in the preview?

There are some bugs with the preview page loading that cause some images and CSS style sheets to fail to load. I made some fixes recently, but there are still some remaining issues. I will be addressing these as soon as possible.

How do I report a bug?

The easiest way is to just email me. Any details you can give me about the operating system, browser and Java versions you used would be very helpful. If you’re comfortable using SourceForge’s bug tracker, you can also enter one there directly.

Where can I get more information?

This blog has regular updates on the latest SearchMash news.
The Squidoo lens has articles on how to use SearchMash.
For the technically minded, the SourceForge project page gives you access to all the source code, as well as a list of current bugs and feature requests.

Preview polished

I’ve been talking with Philipp Lenssen from the excellent Google Blogoscoped, and he had some great feedback. In particular he called out the problems with some things not showing up correctly in the preview window.

This is something I noticed too, and I have a bug filed in SourceForge’s bug tracker, but it hadn’t made it to the top of my list.

I spent a bit of time on it this morning and managed to greatly improve it. First off, I was changing the HTML to all lower case as part of my security measures, which messed up any case-sensitive resource paths. I’m now doing all my search and replace case-insensitively, so I was able to remove that step.
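In Java, that kind of case-insensitive, literal search and replace can be sketched like this (a minimal example, not the applet’s actual code), using `Pattern.CASE_INSENSITIVE` with quoting so the search string and replacement are treated literally rather than as regex syntax:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch: replace every occurrence of a literal string,
// ignoring case, without lower-casing the whole document first.
public class CaseFreeReplace {
    public static String replaceAll(String html, String find, String replaceWith) {
        Pattern p = Pattern.compile(Pattern.quote(find), Pattern.CASE_INSENSITIVE);
        return p.matcher(html).replaceAll(Matcher.quoteReplacement(replaceWith));
    }
}
```

This way `<SCRIPT>`, `<Script>` and `<script>` are all caught, while resource paths elsewhere in the HTML keep their original case.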

I also discovered that the BASE tag I was adding to resolve relative resources should actually point to the full URL of the page. I was trying to be too clever, and setting it to the URL up until the last /, assuming it had to be a folder name. Just setting it to the full URL made a lot more pages show up correctly.
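The BASE-tag fix could look roughly like this in Java (an illustrative sketch; the applet’s real insertion logic may differ): inject a `<base>` element pointing at the page’s full URL, right after the opening `<head>` tag.

```java
// Illustrative sketch: add a BASE tag with the page's full URL so
// that relative image and stylesheet paths resolve correctly.
public class BaseTag {
    public static String addBase(String html, String pageUrl) {
        // Point at the full URL, not just the apparent folder:
        // the browser works out the base folder itself.
        String baseTag = "<base href=\"" + pageUrl + "\">";
        int headPos = html.toLowerCase().indexOf("<head>");
        if (headPos >= 0) {
            int insertAt = headPos + "<head>".length();
            return html.substring(0, insertAt) + baseTag + html.substring(insertAt);
        }
        // No <head> found: prepend so the browser still sees it early.
        return baseTag + html;
    }
}
```

Trimming the URL back to the last “/” breaks on servers that rewrite paths, which is why setting the full URL fixes more pages.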

There are still some pages that have problems, so I’m leaving the bug open, but these two changes seem to have fixed most of the issue.

Philipp also asked why I don’t just set the URL of the frame, rather than writing the HTML into a local one. That’s a good question, since the current functionality would still work. I’m planning on adding something to help find search terms in the page in the future, though, highlighting and scrolling to the right words, and that wouldn’t be possible by just setting the location. I like to make life difficult for myself!

Improved security foundation


I’ve checked in two big changes to the applet that prevent malicious scripts from causing harm even if they make it through the script blocking: removing cookies, and blocking sites that don’t show up in a Google search. The ability to access cookies and intranet sites is the pot of gold at the end of the rainbow for malicious hackers. Now that the applet blocks that access, there hopefully shouldn’t be much point in trying to crack the other layers of security, such as script blocking.

First, I’ve written my own HTTP handler based on TCP/IP sockets, rather than using Java’s high-level functions. This means there’s no chance of cookies being sent with page requests. The high-level functions would automatically add them, but my version doesn’t even have access to the cookie jar. So that removes the possibility of accessing personal or secure information through cookies.
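A cookie-free socket fetch can be sketched like this (an illustrative example with my own names, not the applet’s actual code): the request string is written directly to the TCP socket, so nothing in the cookie jar can ever be attached.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.Socket;

// Illustrative sketch: fetch a page over a raw socket. Because we
// build the request bytes ourselves, no Cookie header is ever sent.
public class SocketFetch {
    public static String buildRequest(String host, String path) {
        return "GET " + path + " HTTP/1.0\r\n"
             + "Host: " + host + "\r\n"
             + "\r\n";   // deliberately no Cookie header
    }

    public static String fetch(String host, String path) throws Exception {
        Socket socket = new Socket(host, 80);
        try {
            OutputStream out = socket.getOutputStream();
            out.write(buildRequest(host, path).getBytes("US-ASCII"));
            out.flush();
            BufferedReader in = new BufferedReader(
                new InputStreamReader(socket.getInputStream()));
            StringBuilder response = new StringBuilder();
            String line;
            while ((line = in.readLine()) != null) {
                response.append(line).append("\n");
            }
            return response.toString();
        } finally {
            socket.close();
        }
    }
}
```

Compare this with `URLConnection`, which consults the browser’s cookie store and adds a Cookie header automatically when running as a signed applet.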

Second, I’ve limited the page requests to those that either go to Google itself or are contained in the last search results page returned from Google. Since local intranet servers should not show up in Google search results, this should ensure that only public servers are accessed.

These changes close down the two main dangers attributed to cross-domain XMLHttpRequests: sending private information with the requests, and intranet access. Chris Holland has an article talking about an informal RFC for cross-domain requests, and these are the key points he brings up. There’s also a good mailing list discussion he references that seems to agree.

My main difference with Chris is his assumption that the only way to ban intranet access is to rely on all accessible servers implementing a new protocol that explicitly allows cross-domain access. I believe the technique I use, verifying the domain’s existence in a trusted directory of public servers such as Google’s search database, should be enough to exclude intranet sites.

I think there’s another reason for the general preference for an opt-in approach to cross-domain access, that pulling pages from another site without explicit permission is a Bad Thing. This is something I’ve given a lot of thought to (see Why the hell are you sucking my bandwidth for your mashup?), and my conclusion was that it depends on the context. I’m sticking pretty closely to how the original publishers intended their pages to be shown in a browser, ads and all, just providing a different work flow for searching.

I think this ability to remix and mash up external web pages is something that can be abused, but it also has huge potential to enrich users’ lives with new creations. It’s disruptive, and a bit subversive, but I think the world will be a better place with more mashups.