Facebook security hole revealed by open-sourcing their platform

Whitehat
Photo by Jek in the Box

I uncovered a way to run arbitrary Javascript within Facebook Applications while looking through the open platform source. I reported it to Facebook, and they’ve now fixed it, so here’s the details.

1 – Load up Internet Explorer (tested on 7.0, but should be valid on other versions, but not on Firefox or Safari)

2
– Go to the FBML test console at http://developers.facebook.com/tools.php?fbml . This tool allows you to preview how your app’s HTML will work after Facebook has run it through its security system.

3
. In the large box, copy and paste the following line:
<input type="text" onfocusin="alert(‘foo’);"/>
4. Click on the Preview button below the large text area
5. You should now see an empty text box appear below "Facebook FBML Test Console" on the right-hand side of the screen

6
. Click on the text box

Result: You should see an alert box appear containing the text "foo".

Why
is this significant?

Facebook carefully designs its applications system
to restrict Javascript to a very small set of safe functions. With full
access to Javascript, app developers can access all of the information
in your profile, can trigger actions such as mailing friends or adding
applications without permissions, and generally cause mayhem. The code
above just shows an alert box, but it can run any code you want, since
it’s not recognized as Javascript by Facebook’s security screening code.

How was this discovered?

I was looking through the recently
released Facebook Open Platform source code, and noticed that only a
small number of event attributes were being checked for in lib/fbml/wrapper.php. I had a
security bug in some of my own code that was caused by not catching all
the possible attributes, so that made me suspicious that their
production code might miss these too. Microsoft in particular has a
large number of little-known attributes for Internet Explorer
documented here: http://msdn.microsoft.com/en-us/library/ms533051(VS.85).aspx

I tried a few at random in the FBML console, and discovered that
onfocusin wasn’t scrubbed. It’s likely that more of those from the list
are missed too.

As I know to my own cost, it’s a massive
headache to catch all the possible ways of hiding a script within HTML.
I’ve spent a long time staring at the cross-domain scripting (XSS)
cheat sheet at http://ha.ckers.org/xss.html
which documents all the possible ways of fooling a parser, it might be
a good idea to test some of the additional methods they specify too.

How did Facebook respond?

The good news is they fixed the problem a couple of days after I reported it. It took a lot of work to find someone to approach, they don’t have any apparent process for this which is worrying. Instead I just trawled through their blog until I found someone who’d posted about security. He sent me an initial reply asking for more details, but then didn’t respond any further to my emails. All in all, not a very professional setup, they could learn a lot from more established companies. Even Microsoft has a program designed to help people report security issues! Now they’re responsible for a lot of sensitive personal information, they need to think very carefully about security.

Security hole in Facebook Footprints example

Blackhat
As I was working on Funhouse Photo last night, I ran into a bug where a database insertion operation was failing. I realized that this was because I had copied the Footprints example, and created the query using string concatenation, eg:

$res = mysql_query('SELECT `from`, `to`, `time` FROM footprints WHERE `to`=' . $user . ' ORDER BY `time` DESC', $conn);

In my case, it was an internally generated string that was causing the problem, because it contained an unescaped single quote. I realized that I was also passing in strings I’d received from the user via POST. This means that someone could easily run an SQL injection attack, and get full access to my database!

An injection attack works by passing a user input string that contains a quote and semi-colon to prematurely finish the query command the host thinks they’re running. The rest of the string contains some other arbitrary command that the attacker wants to run. In the query above, if $user was actually passed as

12345'; DROP TABLE footprints; --

then the complete query would read

SELECT `from`, `to`, `time` FROM footprints WHERE `to`='12345'; DROP TABLE footprints; --'ORDER BY `time` DESC

In this example a table’s deleted, but you could insert almost any SQL statement.

(Edit – PatMan pointed out in the comments that mysql actually has a nice security feature that only allows a single statement in a PHP mysql execute. This is reassuring, it definitely makes it harder to do any damage, though it still leaves the door open to plenty of data-access exploits)

There are two main ways to avoid this. In PHP4 and earlier, magic quotes is turned on by default, which means all posted strings have their quotes auto-magically prefixed with backslashes. This turned out to be a dodgy solution for a lot of reasons, and it was turned off by default in PHP5, which is what I’m running.

A better fix, and the one I ended up using, is to call mysql_real_escape_string() on any strings before you concatenate them into a query. You could also use something other than concatenation to create your query.

Now, I’m a PHP/mysql newbie, but it seems like a pretty Bad Thing that Facebook’s example app contains this security hole. Newbies like me are also the ones least likely to already know that it’s a security problem, and so will copy it blindly into their own apps.

Hopefully the guys over at Facebook will be able to update the example, before it infects too many other apps, and gives hackers access to all the personal data that’s being stored. I have posted on the developer discussion board, but it’s unclear if there’s any other way to alert the team.

Funhouse Photo User Count : 5. I worked on the ‘send a photo to a friend’ feature last night, that’s still not working (a story for another post) so I’ve held off publicizing it. It is in the queue for the directory now though.

Last modified, better preview, and ask

Clock_1
The status line now shows when a site was last modified, which can be useful to know if you’re searching for time-sensitive information. Not all sites return the header I use to find that out though.

I’d noticed there were some sites still showing up with missing pictures or bad formatting in the preview. I spent some time debugging some of the problem cases, and realized there were some bugs in the code I was using to insert the tag that resolves relative URLs. I put in fixes for those problems, which made a lot of CSS based sites appear much better.
It also allowed some scripts to run that hadn’t before, so I then had to fix the bugs in my script blocking that let those through. The end result is that preview works a lot more reliably.

I’ve also been experimenting with offering ask.com as an alternative to Google. I managed to get something working pretty quickly, but hit problems with their use of Javascript in the results pages. It works fine in Safari and Internet Explorer, but Firefox throws a lot of errors. I’ll be returning to debug that some more when I have a chance, for now I’ve left the code in, but I’m not providing any obvious links to it. If you want to experiment, use http://mashproxy.com/search?ask to invoke it.

Improved security foundation

Rainbow

I’ve checked in two big changes to the applet that prevent malicious scripts from causing harm, even if they make it through the script blocking, by removing cookies and blocking sites that don’t show up in a google search. The ability to access cookies and intranet sites is the pot of gold at the end of the rainbow for malicious hackers. Now the applet blocks that access, there hopefully shouldn’t be much point in trying to crack the other layers of security, such as script blocking.

First, I’ve written my own HTTP handler based on TCP/IP sockets, rather than using Java’s high-level functions. This means there’s no chance of cookies being sent with page requests. The high-level functions would automatically add them but my version doesn’t even have access to the cookie jar. So that removes the possiblity of accessing personal or secure information through cookies.

Second, I’ve limited the page requests to those either those that start with http://www.google.com/search? or that are contained in the last search results page returned from google. Since local intranet servers should not show up on google search results, this should ensure that only public servers are accessed.

These changes close down the two main dangers attributed to cross-domain XMLHttpRequests, sending private information with the requests, and intranet access. Chris Holland has an article talking about an informal RFC for cross-domain requests, and these are the key points he brings up. There’s also a good mailing list discussion he references that seems to agree.

My main difference with Chris is his assumption that the only way to ban intranet access is to rely on all accessible servers to implement a new protocol, explicitly allowing cross-domain access. I believe the technique I use, verification of the domain’s existence in a trusted directory of public servers, such as google’s search database, should be enough to exclude intranet sites.

I think there’s another reason for the general preference for an opt-in approach to cross-domain access, that pulling pages from another site without explicit permission is a Bad Thing. This is something I’ve given a lot of thought to (see Why the hell are you sucking my bandwidth for your mashup?), and my conclusion was that it depends on the context. I’m sticking pretty closely to how the original publishers intended their pages to be shown in a browser, ads and all, just providing a different work flow for searching.

I think this ability to remix and mashup external web-pages is something that can be abused, but also has huge potential to enrich user’s lives with new creations. It’s disruptive, and a bit subversive, but I think the world will be a better place with more mashups.

Security wishlist

Notebook

Writing SearchMash, there’s a lot of security features I wish I had access to.

  • One way access to frames
  • My big headache is running external, untrusted HTML in a frame that has full access to my applet. There should be a way to set a frame with programmatic content, but not allow any scripts in it access to other frames. I believe that the security=restricted frame attribute might allow this in IE, but it’s not supported by the other browsers.

  • Disabling scripting for frames
  • I’d be happy if I could turn off scripting entirely for a particular frame. This is what my blacklisting code does, but it seems like it would be a lot easier and more robust to do it at the browser level.

  • Turn off cookies for Java
  • I don’t want my page fetches to send cookies, but there doesn’t seem to be any way to disable this when running an applet inside the browser.

  • Signed scripts
  • These apparently allow JavaScript scripts the same privileges as signed Java applets. I say apparently because they’re only supported by the netscape family, so since I care about supporting IE, I haven’t tried them. If MS supported signed scripts, it would remove the need for a Java runtime. I don’t think it would solve any of the real security isses though.

    Most of these requests are for more finegrained control over the security restrictions within the browser. It’s mostly things that are exposed to the user as global switches anyway (like disabling cookies or JavaScript), so it doesn’t seem like it should be problematic to allow increased restrictions when required. I think they would make writing a secure Ajax app a lot easier.

Script blocking updated

Stop
I’ve checked in some changes to the SearchMash JavaScript, intended to entirely remove any script content from displayed pages. Previously I was just removing &lt script&gt tags, now I look for javascript: urls, eval calls and events (onload, etc). I’ve also added some ‘canonicalization’ (there should be a better word for that) steps to try and avoid some of the common workarounds for regex blocking, like inserting newlines or spaces.

I also changed over the way I work out if a link in a results page is a search link (and so should open in the left frame). This change should restrict the pages that get opened as search results to those on the google domain, previously it would have been possible for someone to set up a http://www.notgoogle.com domain and I’d open it there.

For both of these changes, I switched over to using regular expressions, since that’s a lot easier to understand than my previous logic.

I feel a bit better about the security of running external html in my domain now, I might try and get back to some feature work after this.

Now, I think I need to spend my remaining weekend with a margarita and a hot tub, and prepare for my real job tomorrow.

Security

Lock

No security is perfect, but there’s a number of important restrictions built into MashProxy and SearchMash.

The security in MashProxy is based around the Java applet being digitally signed. This signature confirms that the code has not been tampered with, and was written by the author. I’m using a certificate from Verisign, which I had to give solid proof of my identity to obtain.

This proof of identity means that it’s very easy to track someone down who did write malicious code. It reduces the decision of ‘Do I want to run this code on my machine?’ to ‘Do I trust the author of the code?’.

I’m an established and reputable open source programmer, as a search on my name will verify, and I’ve been distributing widely used executable code (such as FreeFrame plugins for many years with no security problems. This should give you some assurance that my intentions are good.

The other question is whether I’ve done a competent job safeguarding the security in my implementation? I believe I have, and to back that up I’ll cover the details of what I’ve done below. The source code is also freely available through the SourceForge project for anyone who wants to examine it for themselves.

The basic foundation of MashProxy is that signed Java applets running within a web browser are able to fetch pages from anywhere on the web, and aren’t restricted by the same-domain policy that controls most page access functions.

My initial attempt at a search mash ran entirely within a signed applet, but I discovered that support for parsing and rendering HTML was too limited in Java, and I wasn’t happy with what I could achieve.

Taking what I’d learnt, I looked into other possible ways of implementing the functionality I wanted. JavaScript fitted the bill, since it has great support for page display and parsing, is well supported and documented, and is cross-browser. The one thing it didn’t have is the ability to load pages from other domains.

However, it was able to call into a signed applet, and the applet could then load the pages from any domain. So, I took my original code for the full applet, and reduced it to just a single public function to load a web page. I was then able to call it from JavaScript and build SearchMash.

Now, since the applet doesn’t know what JavaScript is calling it, this opens up a lot of possibilities for abuse. I set out to engineer some restrictions into the applet to prevent malicious usage:

Applet Safeguards

The major safeguard is that I only allow the applet to run from pages on the petewarden.com or mashproxy.com, sites I control. This ensures that the applet can’t be used with my signature on other people’s sites. I also allow the applet to be run from pages that are on the local machine (checking for file as the protocol), with the assumption that local files are trusted. This means that development using my signed applet will still be possible by third parties using local files, but to deploy they’ll need to get a certificate and rebuild their own applet with their site added to the whitelist of trusted domains.

To check the page I’m running on, I call JApplet’s getDocumentBase() function, which returns the page that invokes the applet (not the location of the applet, that would still allow untrusted third parties to call it).

I also make sure that all requests begin with “http://&#8221;, (even though the HTTPURLConnection code that I call in Java shouldn’t allow anything else), to restrict any other protocols from being invoked.

The one thing I wasn’t able to do was prevent the passing of cookies automatically. I would prefer to always work without cookies, to avoid the possibility of accessing personal data, but these seem to be added automatically by the browser to all requests made through HTTPURLConnection.

SearchMash JavaScript

These are the measures I’ve taken to ensure the applet can’t be used maliciously by someone else. I’ll now cover what the JavaScript that implements SearchMash actually does, step by step, to provide evidence it won’t leave you vulnerable.

When the main index.html is loaded, it contains three frames, the one for results, the one for the preview, and a small one to hold the applet. The small one is necessary because some browsers don’t want to run invisible applets, which seems sensible.

The onload function calls code that Internet Explorer needs to run the applet automatically, because of patent issues. Then, the applet is loaded, and calls back to SB_NotifyAppletLoaded(), which triggers the first fetch of the google search page.

When a search page is loaded, I add JavaScript hooks into the page links so the user can navigate through pages without leaving the mashup. I also add mouse-over hooks to activate the preview frame. I parse all the links in the page, and put in page requests to the applet for all that appear to be external to Google.

The applet calls back to JavaScript when a page is loaded or determined to be missing. If it’s an external page, the current search page is searched for the link, the status text is updated, and the contents are stored for later use in the preview frame. If it’s a search results page, then the link hooks are added, and the search frame is updated.

At no point in the code is any information from the fetched pages passed back to my site, or any other external site. Those pages never leave your local machine, which ensures that your information is kept private.

Remaining Vulnerabilities

Being able to read from arbitrary domains opens up some malicious possibilities:

  • Reading from web pages using the browsers’ cookies to access personal information
  • Reading from web pages that are within a private intranet
  • Passing any information obtained back to the attackers website by encoding the information in a URL

Because the HTML from the remote sites that SearchMash fetches is loaded into frames that have access to the applet, any scripts inside the HTML loaded from external sites could use the applet’s functions.

I’ve added some checks to try and remove scripts from loaded pages, but it’s notoriously tough to parse out all scripts (see the myspace samy exploit).

Something that would limit the threat would be removing access to cookies. I’m looking into using a lower level functions in Java, or a third-party library that would do let me disable them.