MashProxy’s implementation

MashProxy is a very small Java applet, only 385 lines long and 12 KB when compiled. Its job is to receive URLs from the JavaScript of the page it’s running in, and return the HTML contents of those pages.

The first thing it has to do once it’s initialized is call back the page that contains it, so that the script knows it’s safe to start asking it to fetch pages. Calling an applet before it’s initialized doesn’t work, and this callback was the most reliable way to discover when it becomes active. I also tried checking the applet’s alive state and other methods, but none were as reliable across different browsers.

The init function is also where the current document’s location is checked. Unless the page is on the whitelist or is being run from a local file, the applet silently fails.
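A check like this could be sketched roughly as follows. This is a minimal illustration, not MashProxy’s actual code: the LocationCheck class, the isAllowed() helper and the example.com host names are all placeholders I’ve made up, and I’m assuming the page URL comes from the applet’s getDocumentBase().

```java
import java.net.URL;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Illustrative only: the class name and host list are placeholders,
// not MashProxy's real whitelist.
final class LocationCheck {
    private static final Set<String> ALLOWED_HOSTS =
            new HashSet<>(Arrays.asList("example.com", "www.example.com"));

    /** In an applet, the argument would typically come from getDocumentBase(). */
    static boolean isAllowed(URL documentLocation) {
        return documentLocation != null
                && isAllowed(documentLocation.getProtocol(), documentLocation.getHost());
    }

    /** True for local files and for pages served from a whitelisted host. */
    static boolean isAllowed(String protocol, String host) {
        if ("file".equalsIgnoreCase(protocol)) {
            return true; // page opened from the local disk
        }
        return host != null && ALLOWED_HOSTS.contains(host.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("http", "www.example.com"));
        System.out.println(isAllowed("file", ""));
        System.out.println(isAllowed("http", "other.example.net"));
    }
}
```

If the check fails, init() would simply return without storing anything or notifying the page, which is what makes the failure silent.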

If it’s OK to run, the applet stores the current JavaScript window and document objects, which it will use for future callbacks, and then calls the SB_NotifyAppletLoaded_Forward() JavaScript function in the current document. The function name is hardcoded. It has a _Forward suffix because the SearchMash implementation runs the applet in a frame, so the JSWindow calls back into that frame, and the function there is just a stub that calls the real code in the main frame.

It also starts up a separate thread to handle incoming requests. This is because calls into signed applets from JavaScript don’t get the applet’s full security privileges, so a function called that way has to just pass the information on to a different thread, and have that trusted thread do the actual work.

As far as I can tell from Sun’s comments on the need for this, it is to prevent accidental exposure of a signed applet’s functions to JavaScript, and it does ensure that you have to explicitly enable any trusted code before you can call it. However, it may be that they were hoping to prevent JavaScript from invoking trusted Java code at all. If so, that is not the effect they achieved; they just made the code to do it more complex.

The main entry point to the applet is the pageRequest() function. This takes a URL, and a string containing the name of the JavaScript function to call back. For now I’ve disabled being able to set the callback, and hardcoded it to SB_PageRequestDone_Forward() in the applet’s frame, to make it tougher for malicious third-party scripts loaded in other frames to use the applet.

pageRequest() then pushes its arguments into member variables of the applet, and signals the trusted thread to handle a request by waking it. The trusted thread reads in the arguments, and does the actual work.
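The hand-off between the two threads can be sketched with plain wait/notify. This is only an illustration under my own assumptions: the RequestHandoff class, its field names and the Consumer standing in for the privileged fetch are hypothetical, not the applet’s real members.

```java
import java.util.function.Consumer;

// Sketch of the hand-off; names are illustrative, not MashProxy's actual fields.
final class RequestHandoff {
    private final Object lock = new Object();
    private final Consumer<String> worker; // stands in for the privileged fetch
    private String pendingUrl;             // written by the JavaScript-facing call

    RequestHandoff(Consumer<String> worker) {
        this.worker = worker;
    }

    /** Starts the trusted thread; in an applet this would be called from init(). */
    void start() {
        Thread trusted = new Thread(this::runLoop, "trusted-worker");
        trusted.setDaemon(true);
        trusted.start();
    }

    /** Called from JavaScript: only stores the argument and wakes the worker. */
    void pageRequest(String url) {
        synchronized (lock) {
            pendingUrl = url;
            lock.notifyAll();
        }
    }

    private void runLoop() {
        while (true) {
            String url;
            synchronized (lock) {
                while (pendingUrl == null) {
                    try {
                        lock.wait(); // sleep until pageRequest() signals
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                }
                url = pendingUrl;
                pendingUrl = null;
            }
            worker.accept(url); // the actual work runs on the trusted thread
        }
    }
}
```

Note that this sketch keeps a single pending slot, so a second request arriving before the worker wakes would overwrite the first. Swapping the field for a queue would avoid that.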

It first fires off a header request to the web page using HttpURLConnection to get the status code, and to quickly discover whether it’s missing or moved. After that, it requests the full page. There’s also a timeout thread on these that kicks in after 20 seconds of no response, and sets the status to HTTP_CLIENT_TIMEOUT.
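As a rough sketch of the status check, assuming the header request is an HTTP HEAD: the StatusProbe class and fetchStatus() are made-up names, and instead of a separate timeout thread this version uses HttpURLConnection’s built-in connect and read timeouts, which is a different technique from the one the applet uses.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.SocketTimeoutException;
import java.net.URL;

// Hypothetical helper, not MashProxy's code. Assumes an http or https URL.
final class StatusProbe {
    static final int TIMEOUT_MS = 20_000; // the 20 second limit from the text

    /** Sends a HEAD request and returns the HTTP status code, or -1 on failure. */
    static int fetchStatus(String pageUrl, int timeoutMs) {
        HttpURLConnection conn = null;
        try {
            conn = (HttpURLConnection) new URL(pageUrl).openConnection();
            conn.setRequestMethod("HEAD");          // headers only, no body
            conn.setInstanceFollowRedirects(false); // so a moved page shows up as 301/302
            conn.setConnectTimeout(timeoutMs);
            conn.setReadTimeout(timeoutMs);
            return conn.getResponseCode();
        } catch (SocketTimeoutException e) {
            return HttpURLConnection.HTTP_CLIENT_TIMEOUT; // 408
        } catch (IOException e) {
            return -1; // my own marker for "no usable response"
        } finally {
            if (conn != null) {
                conn.disconnect();
            }
        }
    }
}
```

A caller would then only go on to request the full page when the status is HTTP_OK, for example after StatusProbe.fetchStatus(url, StatusProbe.TIMEOUT_MS).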

When the results are returned, the pageRequestDone() function is called with the contents (null if the page couldn’t be found, otherwise a BufferedReader representing them), along with the source URL and the status code. The reader is converted into a string, and the JavaScript callback function (currently hardcoded to SB_PageRequestDone_Forward) is called with the results.
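Converting the reader into a string might look something like this. It’s a minimal sketch: ReaderText and readAllOrNull() are hypothetical names, and returning null when the read fails part-way is my assumption rather than something the applet is known to do.

```java
import java.io.BufferedReader;
import java.io.IOException;

// Hypothetical helper for turning the page contents into a single string.
final class ReaderText {
    /** Returns the full text, or null if there is no reader or reading fails. */
    static String readAllOrNull(BufferedReader reader) {
        if (reader == null) {
            return null; // page couldn't be found
        }
        StringBuilder out = new StringBuilder();
        char[] buffer = new char[4096];
        try (BufferedReader in = reader) { // closes the reader when done
            int count;
            while ((count = in.read(buffer)) != -1) {
                out.append(buffer, 0, count);
            }
            return out.toString();
        } catch (IOException e) {
            return null; // assumption: treat a failed read like a missing page
        }
    }
}
```

Reading in blocks rather than line by line keeps the page’s original line breaks intact, which matters if the script on the other side parses the HTML.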

The applet is able to handle multiple concurrent requests because of its threading model. One quirk is that I had to include a JSObject jar to enable the LiveConnect functionality I needed to call back and forth between JavaScript and Java.
