Microsoft’s script debugger

I discovered MS’s script debugger whilst investigating the preview frame sticking problem in Internet Explorer. I’m used to developing on Safari with just error messages to go by, so having even a basic debugger was a pleasant surprise. I also hear good things about Firefox’s debugger. My only niggle so far is that I can’t figure out how to tell which error caused it to invoke the debugger, but I’m sure that’s just an RTFM issue.

I started off by turning off “disable script debugging” in IE’s preferences, but unfortunately that didn’t do very much. It turns out you have to manually install the debugger before that option does anything, which made sense once I realized it isn’t bundled with IE. I downloaded it from a link in an MSDN blog.

Prettification

I’ve made some cosmetic changes to SearchMash, so that you don’t just see a blank screen while it’s loading. You now see “Loading…” in the results frame, and a quick primer on how to use SearchMash in the preview. The primer is there because I had a lot of feedback that the initial screen was confusing to first-time users, since it’s just the google start page in a frame. Hopefully the extra information will make it easier to get started.

I also worked on the problem with the preview window sometimes causing script errors in IE, filed as “Preview window can stop working after changing pages” and “Previewing missing sites on IE stops window” in SourceForge’s bug tracker. I added a try/catch around the access to the preview frame’s document, and tried to reset security by setting it back to the original local ‘src’ if there was an exception. I still see the exception occurring (I’m using MS’s very handy script debugger, which I’ll cover soon), and I haven’t been able to reproduce the problem with the window getting stuck, but I hope my changes will unwedge it if it’s in that state.
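
As a rough illustration, here’s the shape of that recovery code. The frame name ‘preview’, the function name and the local page ‘blank.html’ are all placeholders rather than the actual SearchMash source.

    // Hedged sketch of the try/catch recovery described above;
    // all the names here are illustrative.
    function writeToPreview(html) {
        var frame = window.frames['preview'];
        try {
            // In IE this access throws a security exception if the
            // frame still holds a document from another domain.
            frame.document.open();
            frame.document.write(html);
            frame.document.close();
        } catch (e) {
            // Send the frame back to a local page, so the next
            // attempt passes the same-origin check again.
            frame.location.href = 'blank.html';
        }
    }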

Browser Compatibility

One of the design goals of SearchMash was to work in a lot of different browsers, which is why I tried to avoid using browser-specific technology.

Both the LiveConnect technology that allows Java and JavaScript to call each other, and the JavaScript document object model (DOM) that I use to edit HTML in a page, are well supported by the major browsers. The biggest hurdle is having a recent version of Java installed. I’ve seen figures of around 80-90% of desktops having some version of Java, but I haven’t seen a breakdown by version; I suspect many of those are 1.1, which is why I hope to back-port MashProxy to that version. It’s a respectable deployment, but not as high as Flash’s (I’ve seen 98% figures for some version of that).

My main development platform is OS X, so Safari and Firefox get the most testing, and SearchMash works without problems on both. I’ve also tried some of the other Mac browsers, such as Opera and IE 5.1, but these are not widely used, so I haven’t spent the time to ensure SearchMash works on them. I did hit a quirk in Safari’s LiveConnect implementation: Java strings don’t seem to be converted to JavaScript ones when they’re passed back to the script, but stay as wrapped Java objects. This means you can call Java string functions on them from the script, but JavaScript functions don’t work. Since there’s some overlap between JavaScript’s and Java’s string functions, it took me a while to figure out what was happening, but once I did, I was able to force a conversion by creating a new JavaScript string variable from the one that was passed back.
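
In case it helps anyone hitting the same quirk, this is the shape of the workaround; the applet name and the getPageText() method are stand-ins for whichever LiveConnect call returns the string.

    // Hedged sketch: 'mashproxy' and getPageText() are illustrative
    // names, not SearchMash's real ones.
    var raw = document.applets['mashproxy'].getPageText();
    // On Safari 'raw' can still be a wrapped java.lang.String, so
    // JavaScript-only methods fail on it. Creating a new JavaScript
    // string from it forces the conversion.
    var text = String(raw);
    var hit = text.match(/search/i);   // regex methods are JavaScript-only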

Since most of the world uses a PC, making sure that SearchMash works on Firefox and Internet Explorer was a priority. I only have one machine with Windows XP available, but I’ve been able to make sure it runs on IE 6 and 7, and on Firefox 1.5.

I’ve found IE’s Java-to-JavaScript connection to be a bit pickier than the other browsers’. For example, I’ve just found a bug that can cause the applet to crash when I change pages in IE; it looks like using an out-of-date JavaScript DOM object can cause a security exception.
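
One plausible way to avoid the stale reference, sketched here with an illustrative frame name, is to look the document up freshly on every use rather than caching it across page changes:

    // Risky: a cached reference goes stale once the frame navigates,
    // and handing it to the applet then triggers the exception.
    // var resultsDoc = window.frames['results'].document;

    // Safer sketch: look up a fresh reference each time it's needed.
    function getResultsDocument() {
        return window.frames['results'].document;
    }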

I’ve done no Linux testing, because I don’t have a system set up to run it. I didn’t hit any big differences between Firefox on the Mac and Windows, so I’m hopeful it will just work on Linux/Firefox too.

Supported Browsers

  • Internet Explorer 6.0 (Windows)
  • Internet Explorer 7.0 (Windows)
  • Safari 2.0 (OS X)
  • Firefox 1.5 (Windows and OS X)

Untested

  • Internet Explorer 5.0 (Windows)
  • Opera (Windows)
  • Netscape
  • Any browser on Linux

Tested, but not working

  • Internet Explorer 5.1 (OS X)
  • Opera (OS X)

Bugs and features

One of the nice things that SourceForge offers is a feature and bug tracking system. You can check out the bugs here and the features here, or click on the links from the main project page.

I’ve started things off with some bugs and features I’ve had on my mental list for a while, but you should feel free to jump in and add your own requests.

Bugs

  • Make MashProxy Java 1.1 compatible
  • MashProxy doesn’t really need to use any new Java features, but it relies on a couple just because it was written in a 1.4 environment. I’d like to remove those dependencies, so it’ll run even on really old versions of Java; there’s a sketch of one possible approach after this list.

  • Previewing missing sites on IE stops window
  • I noticed this while using my parents’ PC on vacation; I normally develop with Safari on a Mac, where I don’t see the problem. Moving the mouse over a missing link to invoke the preview causes the preview window to stop responding to any further requests to show pages, even valid ones.
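
To make the first bug above concrete, here’s a hedged sketch of a Java 1.1-compatible fetch, assuming the 1.4-era dependency is the Swing threading call mentioned in the MashProxy write-up below; all the names are illustrative, not MashProxy’s actual code.

    // Hedged sketch: everything used here exists in Java 1.1, so no
    // SwingUtilities.invokeLater (which needs Java 1.2+) is required.
    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URL;

    interface FetchListener {
        void pageFetched(String page);   // illustrative callback
    }

    class FetchThread extends Thread {
        private String url;
        private FetchListener listener;

        FetchThread(String url, FetchListener listener) {
            this.url = url;
            this.listener = listener;
        }

        public void run() {
            try {
                BufferedReader in = new BufferedReader(
                    new InputStreamReader(new URL(url).openStream()));
                StringBuffer page = new StringBuffer();
                String line;
                while ((line = in.readLine()) != null) {
                    page.append(line).append('\n');
                }
                in.close();
                // Deliver the result on this worker thread, instead
                // of marshalling it through Swing.
                listener.pageFetched(page.toString());
            } catch (Exception e) {
                listener.pageFetched(null);
            }
        }
    }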

Features

  • Improve appearance of status indication
  • I think either icons or special formatting of the title would be better than the current (found) or (missing) text that’s added after each link.

  • Provide ask.com as an alternative to google
  • I normally only use google, but it seems like it wouldn’t be too hard technically to parse ask.com’s results too, and offer users a choice.

  • Check for search terms in the page
  • This was one of the big features I wanted to help my searching, but I ran out of time to implement it before the first release. It would catch out sites that do ‘cloaking’ (showing one set of results to google to get the search terms, but another to normal users); there’s a sketch of the basic check after this list.

  • Show multiple search pages
  • Ten search results to a page sometimes feels a bit stingy, and it seems like it wouldn’t be too hard to concatenate multiple pages of results, one above the other. I’m not sure how many to show at once, but I’d probably try four pages (forty results) and see how that feels.
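
For the search-terms check, the core test could be as simple as this sketch; ‘pageText’ is whatever the applet fetched, and the names are illustrative.

    // Hedged sketch of the cloaking check described above.
    function containsSearchTerms(pageText, query) {
        var text = String(pageText).toLowerCase();
        var terms = query.toLowerCase().split(/\s+/);
        for (var i = 0; i < terms.length; i++) {
            if (text.indexOf(terms[i]) == -1) {
                // The page never mentions this term, which suggests it
                // showed different content to the search engine.
                return false;
            }
        }
        return true;
    }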

Getting a Certificate

As I mention in my post on building your own MashProxy applet, you’ll need to sign the applet you build with an RSA-signed certificate. Once you’ve got a certificate, the process of signing is fiddly but pretty well documented, so I’m going to focus on acquiring one.

Self-signed

For testing purposes, a self-signed certificate is good enough, and creating one is easy. The downside is that there’s no verification of any information you put in the certificate; for example, you could claim that you’re Bill Gates at Microsoft. The point of signing an applet is to guarantee that the code comes from a known and verified person or organization. Since self-signed certificates don’t offer that guarantee, many browsers won’t run applets signed with them, or will only run them after the user clicks ok on scary security dialogs.
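
For example, with the tools that ship in the JDK, creating a self-signed RSA certificate and signing a jar with it is a two-command job; the alias, keystore and jar names here are placeholders.

    # keytool prompts for name and organization details, but verifies
    # none of them -- hence the Bill Gates example above.
    keytool -genkey -alias testsign -keyalg RSA -keystore teststore -validity 365

    # Sign the applet jar with the key you just generated.
    jarsigner -keystore teststore MashProxy.jar testsign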

Trusted Third Parties

Firms like Verisign and Thawte are what’s known as ‘Trusted Third Parties’ (TTPs). They do the work of checking that people who want a certificate are actually who they claim to be, by checking phone numbers, addresses and official documentation, and once they’re satisfied, they’ll issue a certificate containing that information. This certificate is itself signed with the TTP’s own certificate, which ships with all the major browsers. The chain of trust is that the browser publisher believes in the TTP’s procedures, so anyone the TTP signs is also treated with a higher level of trust.

In practice this means fewer scary security warnings, and the ability to run on even high security settings.

The downside is that the TTP’s checking procedures can take a long time, need a lot of documentation, and are also fairly costly (several hundred dollars for a year). I used Verisign, and I was very happy with their service, though they’re not the cheapest. Cynthia Klocke dealt with my order very efficiently; if you mail me, I can give you her contact details. Be aware that you’ll need a registered business name, and a number in the phone book for that business where they can reach you; they don’t register individuals, though I’ve heard Thawte will. Here’s a quick description of Verisign’s procedures.

Cross-domain Choices

People have tried a lot of different ways of fetching web pages from third-party sites. The biggest division is between server-based methods and those that run purely on the client.

Server methods

Server-based methods all rely on the privileged status of requests that come from the same domain as the web page the script is on. The usual security restriction only allows data to be read from the script’s domain, so these route external page requests through the script’s host. They use the server as a proxy, a middleman passing requests on from the client to the external site and then passing the results back to the client. Jason Levitt has a good article on some different ways of implementing server proxies, but they all have a lot in common (there’s a sketch of the client side after this list).

  • No client setup needed
  • The proxy will work with almost any browser, and it’s very painless for the end-user.

  • You need to configure your server
  • All of the methods involve some degree of either fiddling with Apache config files or setting up CGI scripts to do the redirection. This can be a problem if you don’t have the access or the experience to set that up on the server.

  • You can only access what your server can see through its connection
  • This helps security, because it means there’s no chance of a malicious peek at intranet servers, and you don’t have access to the user’s cookies. It can be a problem though if you want to check the availability and contents of a site as it appears to the user. For example with SearchMash, I wanted to bypass cloaking, and deal with what the user sees, rather than what the server gives search engines.

  • All traffic goes through your server
  • I’ve seen arguments that this is a good thing ethically, because you’re sharing the bandwidth pain with the site you’re fetching from. It seems a bit inelegant and wasteful though, since you’re using more network resources than if the fetch was handled directly, and I’d prefer to handle throttling explicitly rather than relying on a server’s bandwidth. SearchMash has no throttling; it’s something I’ll need to consider if traffic grows.
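
Whichever server-side flavor you pick, the client side looks much the same; here’s a sketch assuming a hypothetical ‘proxy.cgi’ script on the page’s own domain, with ‘handlePage’ as an illustrative callback.

    // Hedged sketch: 'proxy.cgi' and handlePage() are illustrative.
    // (IE 6 needs 'new ActiveXObject("Microsoft.XMLHTTP")' instead.)
    var request = new XMLHttpRequest();
    var target = encodeURIComponent('http://example.com/page.html');
    // The request goes to the script's own domain, satisfying the
    // same-origin rule; the server does the external fetch.
    request.open('GET', '/proxy.cgi?url=' + target, true);
    request.onreadystatechange = function () {
        if (request.readyState == 4 && request.status == 200) {
            handlePage(request.responseText);
        }
    };
    request.send(null);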

Client methods

There are two existing ways to do cross-domain fetches without using a server proxy: signed scripts and FlashXMLHttpRequest. I haven’t used either, since they both have limitations that made them unsuitable for SearchMash, but I’ll summarize what I understood from my research.

FlashXMLHttpRequest

For full information on FlashXMLHttpRequest, check out Julian Couvreur’s blog. It uses Flash’s HTTP library to do the fetching. It’s a great package: it has a JavaScript API that’s just like XMLHttpRequest, Flash is available on almost all machines, and there are no scary security warnings that the user has to click through. Flash will only fetch from sites that explicitly allow it, though, so mashing from arbitrary domains is not possible. This makes security less of a headache, but didn’t support the sort of access I needed for SearchMash.
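
The opt-in works through a policy file that the target site has to serve from its root; from what I understand, a minimal permissive example looks like this:

    <?xml version="1.0"?>
    <!-- crossdomain.xml, served from the target site's root; without
         it, Flash refuses cross-domain fetches from that site. -->
    <cross-domain-policy>
        <allow-access-from domain="*" />
    </cross-domain-policy>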

Signed Scripts

I didn’t find a definitive article on signed scripts, but here are a few of the articles that I found useful. Using signed scripts allows cross-domain page fetches using the standard XMLHttpRequest API, but it brings up a ‘do you trust this signed script?’ security dialog and only works on Mozilla browsers, which ruled it out for me.

MashProxy

This blog has most of the information on my implementation. It’s very similar in concept to FlashXMLHttpRequest, but uses Java’s HTTP functions rather than Flash’s. The big difference for my purposes is that it supports fetching from any web site, which makes security a lot harder to implement, but opens up a lot of functionality. It requires a certificate, brings up a security window, and has an API that’s not familiar to users of XMLHttpRequest. Java doesn’t seem to be as widely deployed as Flash, but it’s on around 80-90% of desktops. The current implementation doesn’t work with Java 1.1 (it uses some simple Swing thread functions), but I hope to back-port it in the future. It also requires a visible applet to run in IE, though this can be small.
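
To give a flavor of how it differs from XMLHttpRequest, here’s a hypothetical usage sketch; the applet, method and callback names are made up for illustration, not MashProxy’s real API.

    // Hypothetical sketch only, not MashProxy's actual interface.
    var proxy = document.applets['mashproxy'];
    // The applet fetches on a Java thread, then calls the named
    // JavaScript function back through LiveConnect when it's done.
    proxy.fetchPage('http://example.com/', 'onPageFetched');

    function onPageFetched(pageText) {
        // Force a native JavaScript string (see the Safari note above).
        var text = String(pageText);
        // ... parse the page here ...
    }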

Vacation

I’m back in the UK for the next week, my first trip home in three years! We’ve just headed up to my parents’ in Cambridge, after two days in London. The visit to the US embassy for the routine visa stuff was the usual nightmare of arbitrary bureaucracy, but I survived. Liz spent the afternoon in Regent’s Park, and we went on a guided walk around London in the evening. She’s got some photos up at http://lizbaumann.com/Britain2006.html; the hotel room is really something to be seen, very Austin Powers.

It’s raining and blowing a gale here at the moment, and I’m loving it after the complete lack of weather in LA! A walk around my home village of Over was muddy, but looking around the 800-year-old church and graveyard always gives me a real sense of wonder.

After a weekend of marmite, tea and roast dinners here at my parents’, we’re heading up to Keswick in the Lake District for a few days of hiking and warm beer. My brother, sister and sister-in-law are all coming too, and it’ll be the first time we’ve gone away together since we were kids.