MashProxy’s implementation

Cog

MashProxy is a very small Java applet, only 385 lines long and 12Kb when it’s compiled. It’s job is to recieve URLs from the JavaScript of the page it’s running in, and return the HTML contents of those pages.

The first thing it has to do is call back the page it’s contained in when it’s initialized, so that the script knows it’s safe to start asking it to fetch pages. Calling an applet before it’s initialized doesn’t work, and this was the most reliable way to discover once it’s active. I also tried checking alive state and other methods, but this was the most reliable across different browsers.

It’s in the init function that the current document’s location is checked, and if it isn’t on the whitelist, or being run from a local file, the applet silently fails.

If it’s determined that it’s ok to run, the applet stores the current JavaScript window and document objects, which it will use for future callbacks, and then calls the SB_NotifyAppletLoaded_Forward() JavaScript function in the current document. This is hardcoded to this function name, it’s got a _Forward suffix because the SearchMash implementation runs the applet in a frame, and so the JSWindow calls back into that frame, and the function is just a stub that calls the real code in the main frame.

It also starts up a seperate thread to handle incoming requests. This is because callbacks into signed applets from JavaScript don’t get the same security privileges, so it’s necessary to just pass the information onto a different thread from such a function, and have that trusted thread do the actual work.

As far as I can tell from Sun’s comments on the need for this, it is to prevent accidental exposure of signed applet’s functions to JavaScript, and it does ensure that you have to explicitly enable any trusted code before you can call it. However, it may be that they were hoping to prevent JavaScript invoking trusted Java code at all, which is not the effect they achieved, they just made the code to do it more complex.

The main entry point to the applet is the pageRequest() function. This takes a URL, and a string containing the JavaScript function name to call back. For now I’ve disabled being able to set the callback, and hardcoded the call back to SB_PageRequestDone_Forward() in the applet’s frame, to make it tougher to use the applet for malicious third-party scripts loaded in other frames.

The callback then pushes the arguments into member variables of the applet, and signals the trusted thread to handle a request by waking it. The trusted thread reads in the arguments, and does the actual work.

It first fires off a header request using HTTPURLConnection to the web page to get the status code, and quickly discover if it’s missing or moved. After that, it requests the full page. There’s also a timeout thread on these that kicks in after 20 seconds of no response, and sets the status to HTTP_CLIENT_TIMEOUT.

When the results are returned, the pageRequestDone() function is called, either with null for the contents if the page couldn’t be found, or a BufferedReader representing the contents, the source URL and the status code. The reader is converted into a string, and the JavaScript callback function (hardcoded currently to SB_PageRequestDone_Forward) is called with the results.

The applet is able to handle multiple concurrent requests because of its threading model. One quirk is that I had to include a JSObject jar to enable the ‘liveconnect’ functionality I needed to be able to call back and forth between JavaScript and Java.

Security

Lock

No security is perfect, but there’s a number of important restrictions built into MashProxy and SearchMash.

The security in MashProxy is based around the Java applet being digitally signed. This signature confirms that the code has not been tampered with, and was written by the author. I’m using a certificate from Verisign, which I had to give solid proof of my identity to obtain.

This proof of identity means that it’s very easy to track someone down who did write malicious code. It reduces the decision of ‘Do I want to run this code on my machine?’ to ‘Do I trust the author of the code?’.

I’m an established and reputable open source programmer, as a search on my name will verify, and I’ve been distributing widely used executable code (such as FreeFrame plugins for many years with no security problems. This should give you some assurance that my intentions are good.

The other question is whether I’ve done a competent job safeguarding the security in my implementation? I believe I have, and to back that up I’ll cover the details of what I’ve done below. The source code is also freely available through the SourceForge project for anyone who wants to examine it for themselves.

The basic foundation of MashProxy is that signed Java applets running within a web browser are able to fetch pages from anywhere on the web, and aren’t restricted by the same-domain policy that controls most page access functions.

My initial attempt at a search mash ran entirely within a signed applet, but I discovered that support for parsing and rendering HTML was too limited in Java, and I wasn’t happy with what I could achieve.

Taking what I’d learnt, I looked into other possible ways of implementing the functionality I wanted. JavaScript fitted the bill, since it has great support for page display and parsing, is well supported and documented, and is cross-browser. The one thing it didn’t have is the ability to load pages from other domains.

However, it was able to call into a signed applet, and the applet could then load the pages from any domain. So, I took my original code for the full applet, and reduced it to just a single public function to load a web page. I was then able to call it from JavaScript and build SearchMash.

Now, since the applet doesn’t know what JavaScript is calling it, this opens up a lot of possibilities for abuse. I set out to engineer some restrictions into the applet to prevent malicious usage:

Applet Safeguards

The major safeguard is that I only allow the applet to run from pages on the petewarden.com or mashproxy.com, sites I control. This ensures that the applet can’t be used with my signature on other people’s sites. I also allow the applet to be run from pages that are on the local machine (checking for file as the protocol), with the assumption that local files are trusted. This means that development using my signed applet will still be possible by third parties using local files, but to deploy they’ll need to get a certificate and rebuild their own applet with their site added to the whitelist of trusted domains.

To check the page I’m running on, I call JApplet’s getDocumentBase() function, which returns the page that invokes the applet (not the location of the applet, that would still allow untrusted third parties to call it).

I also make sure that all requests begin with “http://”, (even though the HTTPURLConnection code that I call in Java shouldn’t allow anything else), to restrict any other protocols from being invoked.

The one thing I wasn’t able to do was prevent the passing of cookies automatically. I would prefer to always work without cookies, to avoid the possibility of accessing personal data, but these seem to be added automatically by the browser to all requests made through HTTPURLConnection.

SearchMash JavaScript

These are the measures I’ve taken to ensure the applet can’t be used maliciously by someone else. I’ll now cover what the JavaScript that implements SearchMash actually does, step by step, to provide evidence it won’t leave you vulnerable.

When the main index.html is loaded, it contains three frames, the one for results, the one for the preview, and a small one to hold the applet. The small one is necessary because some browsers don’t want to run invisible applets, which seems sensible.

The onload function calls code that Internet Explorer needs to run the applet automatically, because of patent issues. Then, the applet is loaded, and calls back to SB_NotifyAppletLoaded(), which triggers the first fetch of the google search page.

When a search page is loaded, I add JavaScript hooks into the page links so the user can navigate through pages without leaving the mashup. I also add mouse-over hooks to activate the preview frame. I parse all the links in the page, and put in page requests to the applet for all that appear to be external to Google.

The applet calls back to JavaScript when a page is loaded or determined to be missing. If it’s an external page, the current search page is searched for the link, the status text is updated, and the contents are stored for later use in the preview frame. If it’s a search results page, then the link hooks are added, and the search frame is updated.

At no point in the code is any information from the fetched pages passed back to my site, or any other external site. Those pages never leave your local machine, which ensures that your information is kept private.

Remaining Vulnerabilities

Being able to read from arbitrary domains opens up some malicious possibilities:

  • Reading from web pages using the browsers’ cookies to access personal information
  • Reading from web pages that are within a private intranet
  • Passing any information obtained back to the attackers website by encoding the information in a URL

Because the HTML from the remote sites that SearchMash fetches is loaded into frames that have access to the applet, any scripts inside the HTML loaded from external sites could use the applet’s functions.

I’ve added some checks to try and remove scripts from loaded pages, but it’s notoriously tough to parse out all scripts (see the myspace samy exploit).

Something that would limit the threat would be removing access to cookies. I’m looking into using a lower level functions in Java, or a third-party library that would do let me disable them.

Introducing MashProxy

Wavehello

Hi, I’m Pete Warden (that’s not me in the photo!), and as a fun project, I decided to create a search mashup that let me search the way I wish I could.

While I was doing that, I discovered that there was a big restriction on using XMLHttpRequest and AJAX. You can only request pages from the same server you’re on, as a security measure. This obviously makes doing a mashup of pages on other servers much more difficult.

The standard ways of working around this involve setting up a way to use your server to fetch external web pages. There were several reasons I wanted to avoid this:

– It doesn’t scale with the number of users, since everything has to go through your server.
– A big goal of the project was to discover if the pages found in the search were accesible to the user. This isn’t possible if it’s a remote server doing the checking.
– Setting up the server to act as a proxy requires at least some knowledge of scripting and Apache

The main reason that client-side proxies haven’t been done before is the potential for security holes that it opens up. Chris Shiflett has a great article that covers the problems if XMLHttpRequest were opened up to allow cross-domain requests, which is equivalent to what MashProxy allows.

Julian Couvreur also has helped my understanding of the issues. He’s written something similar using Flash rather than Java, FlashXMLHttpRequest.

I’ll discuss the security policy I adopted in my next post, including the safeguards against abuse I’ve implemented and possible remaining problems.

In short, MashProxy is a Java applet that lets JavaScript request web pages, just like XMLHttpRequest, but without the same domain restriction. This let me build the SearchMash project, implementing my ideas on a better interface to search results. It’s open source, and up on SourceForge, and my hope is that other developers will use it as an easier way to create mashups. I want to see more mashing, and I think the server proxy requirements have been holding things back.