No security is perfect, but there’s a number of important restrictions built into MashProxy and SearchMash.
The security in MashProxy is based around the Java applet being digitally signed. This signature confirms that the code has not been tampered with, and was written by the author. I’m using a certificate from Verisign, which I had to give solid proof of my identity to obtain.
This proof of identity means that it’s very easy to track someone down who did write malicious code. It reduces the decision of ‘Do I want to run this code on my machine?’ to ‘Do I trust the author of the code?’.
I’m an established and reputable open source programmer, as a search on my name will verify, and I’ve been distributing widely used executable code (such as FreeFrame plugins for many years with no security problems. This should give you some assurance that my intentions are good.
The other question is whether I’ve done a competent job safeguarding the security in my implementation? I believe I have, and to back that up I’ll cover the details of what I’ve done below. The source code is also freely available through the SourceForge project for anyone who wants to examine it for themselves.
The basic foundation of MashProxy is that signed Java applets running within a web browser are able to fetch pages from anywhere on the web, and aren’t restricted by the same-domain policy that controls most page access functions.
My initial attempt at a search mash ran entirely within a signed applet, but I discovered that support for parsing and rendering HTML was too limited in Java, and I wasn’t happy with what I could achieve.
The major safeguard is that I only allow the applet to run from pages on the petewarden.com or mashproxy.com, sites I control. This ensures that the applet can’t be used with my signature on other people’s sites. I also allow the applet to be run from pages that are on the local machine (checking for file as the protocol), with the assumption that local files are trusted. This means that development using my signed applet will still be possible by third parties using local files, but to deploy they’ll need to get a certificate and rebuild their own applet with their site added to the whitelist of trusted domains.
To check the page I’m running on, I call JApplet’s getDocumentBase() function, which returns the page that invokes the applet (not the location of the applet, that would still allow untrusted third parties to call it).
I also make sure that all requests begin with “http://”, (even though the HTTPURLConnection code that I call in Java shouldn’t allow anything else), to restrict any other protocols from being invoked.
The one thing I wasn’t able to do was prevent the passing of cookies automatically. I would prefer to always work without cookies, to avoid the possibility of accessing personal data, but these seem to be added automatically by the browser to all requests made through HTTPURLConnection.
When the main index.html is loaded, it contains three frames, the one for results, the one for the preview, and a small one to hold the applet. The small one is necessary because some browsers don’t want to run invisible applets, which seems sensible.
The onload function calls code that Internet Explorer needs to run the applet automatically, because of patent issues. Then, the applet is loaded, and calls back to SB_NotifyAppletLoaded(), which triggers the first fetch of the google search page.
At no point in the code is any information from the fetched pages passed back to my site, or any other external site. Those pages never leave your local machine, which ensures that your information is kept private.
Being able to read from arbitrary domains opens up some malicious possibilities:
- Reading from web pages using the browsers’ cookies to access personal information
- Reading from web pages that are within a private intranet
- Passing any information obtained back to the attackers website by encoding the information in a URL
Because the HTML from the remote sites that SearchMash fetches is loaded into frames that have access to the applet, any scripts inside the HTML loaded from external sites could use the applet’s functions.
I’ve added some checks to try and remove scripts from loaded pages, but it’s notoriously tough to parse out all scripts (see the myspace samy exploit).
Something that would limit the threat would be removing access to cookies. I’m looking into using a lower level functions in Java, or a third-party library that would do let me disable them.