Security hole in Facebook Footprints example

Blackhat
As I was working on Funhouse Photo last night, I ran into a bug where a database insertion operation was failing. I realized that this was because I had copied the Footprints example, and created the query using string concatenation, eg:

$res = mysql_query('SELECT `from`, `to`, `time` FROM footprints WHERE `to`=' . $user . ' ORDER BY `time` DESC', $conn);

In my case, it was an internally generated string that was causing the problem, because it contained an unescaped single quote. I realized that I was also passing in strings I’d received from the user via POST. This means that someone could easily run an SQL injection attack, and get full access to my database!

An injection attack works by passing a user input string that contains a quote and semi-colon to prematurely finish the query command the host thinks they’re running. The rest of the string contains some other arbitrary command that the attacker wants to run. In the query above, if $user was actually passed as

12345'; DROP TABLE footprints; --

then, assuming the query wraps the value in single quotes, the complete query would read

SELECT `from`, `to`, `time` FROM footprints WHERE `to`='12345'; DROP TABLE footprints; --' ORDER BY `time` DESC

In this example a table’s deleted, but you could insert almost any SQL statement.
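
To make the mechanics concrete, here's a minimal sketch of how plain concatenation turns user input into SQL. The helper name build_footprints_query() is mine rather than anything from the Footprints example, it assumes the value is wrapped in single quotes, and nothing is actually sent to a database:

```php
<?php
// Hypothetical helper: builds the query by plain concatenation,
// assuming the value is wrapped in single quotes. Unsafe on purpose.
function build_footprints_query($user) {
    return "SELECT `from`, `to`, `time` FROM footprints WHERE `to`='"
        . $user . "' ORDER BY `time` DESC";
}

// A normal numeric id produces the query that was intended.
echo build_footprints_query('12345'), "\n";

// The attack string closes the quote early and smuggles in a second statement.
echo build_footprints_query("12345'; DROP TABLE footprints; --"), "\n";
```

Printing the string before running it is also a handy way to spot this kind of bug while debugging.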

(Edit – PatMan pointed out in the comments that PHP’s mysql_query() only executes a single statement per call, which is a nice security feature. This is reassuring, and it definitely makes it harder to do any damage, though it still leaves the door open to plenty of data-access exploits.)

There are two main ways to avoid this. In PHP4 and earlier, magic quotes was turned on by default, which meant all posted strings had their quotes auto-magically prefixed with backslashes. This turned out to be a dodgy solution for a lot of reasons, and it was turned off by default in PHP5, which is what I’m running.

A better fix, and the one I ended up using, is to call mysql_real_escape_string() on any strings before you concatenate them into a query. You could also use something other than concatenation to create your query.
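
As an illustration of the "something other than concatenation" option, here's a rough sketch using a prepared statement. This is not what I used in Funhouse Photo. It assumes PHP's PDO extension with an in-memory SQLite database standing in for MySQL so it can run without a server, and the table layout and the footprints_for() name are made up for the example:

```php
<?php
// Assumption: PDO with an in-memory SQLite database stands in for MySQL here,
// so the sketch runs without a server. The table layout is illustrative.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE footprints ("from" TEXT, "to" TEXT, "time" INTEGER)');
$db->exec("INSERT INTO footprints VALUES ('111', '12345', 1)");

// Hypothetical helper: the ? placeholder keeps the user's data out of the SQL text.
function footprints_for(PDO $db, $user) {
    $stmt = $db->prepare(
        'SELECT "from", "to", "time" FROM footprints WHERE "to" = ? ORDER BY "time" DESC'
    );
    $stmt->execute([$user]);
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}

$normal = footprints_for($db, '12345');
$attack = footprints_for($db, "12345'; DROP TABLE footprints; --");

echo count($normal), " row(s) for the real id\n";
echo count($attack), " row(s) for the attack string\n";
```

The attack string is simply compared against the column as an odd-looking value, so it matches nothing and the table is left alone.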

Now, I’m a PHP/mysql newbie, but it seems like a pretty Bad Thing that Facebook’s example app contains this security hole. Newbies like me are also the ones least likely to already know that it’s a security problem, and so will copy it blindly into their own apps.

Hopefully the guys over at Facebook will be able to update the example, before it infects too many other apps, and gives hackers access to all the personal data that’s being stored. I have posted on the developer discussion board, but it’s unclear if there’s any other way to alert the team.

Funhouse Photo User Count : 5. I worked on the ‘send a photo to a friend’ feature last night, but that’s still not working (a story for another post), so I’ve held off publicizing it. It is in the queue for the directory now though.

Facebook app statistics

Chart

One of the things I really like about having an app on Facebook is that I can easily see how many people are using it. This may sound trivial, but for my Firefox and IE extensions, I’m stuck trying to estimate usage based on downloads. Now I’m on the main Firefox add-on site, I can get stats on downloads from there, but they’re a bit opaque:
[Screenshot: Firefox add-on download stats]

It’s good to see I’m approaching 4000 downloads in the past few weeks, but until recently that total had stayed at 867 for a long time. And that per-week total has always shown 1. So I’m not convinced they’re very up-to-date, though I’m pretty sure they aren’t over-counting downloads, just slow in showing them.

As an aside, these stats indicate that focusing my message really paid off. PeteSearch had only around 200 downloads after several weeks, on the same site.

For my IE plugin, I have to go off my web site logs. I’ve used some of the built-in traffic-monitoring tools, but they all seem geared towards overall site statistics rather than the particular measure I want: "How many people with agents you recognize (eg not robots) downloaded this file over a particular period of time? How many since the beginning of time, and were they repeat customers (ie upgrading)?"
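
Here's a rough sketch of the kind of measure I mean. It assumes the server writes Apache's combined log format, treats one IP address as one person (a crude proxy), and uses a simple keyword match to spot robots. The count_downloads() name, the file path and the sample lines are all invented for illustration:

```php
<?php
// Counts successful downloads of $path between two Unix timestamps,
// skipping agents that look like robots. Assumes Apache combined log lines.
function count_downloads(array $lines, $path, $from, $to) {
    $pattern = '/^(\S+) \S+ \S+ \[([^\]]+)\] "GET (\S+)[^"]*" (\d{3}) \S+ "[^"]*" "([^"]*)"/';
    $per_ip = [];
    foreach ($lines as $line) {
        if (!preg_match($pattern, $line, $m)) {
            continue; // not a line this sketch understands
        }
        list(, $ip, $when, $url, $status, $agent) = $m;
        if ($url !== $path || $status !== '200') {
            continue;
        }
        if (preg_match('/bot|crawler|spider|slurp/i', $agent)) {
            continue; // crude robot filter, an assumption of this sketch
        }
        $time = DateTime::createFromFormat('d/M/Y:H:i:s O', $when);
        if ($time === false || $time->getTimestamp() < $from || $time->getTimestamp() > $to) {
            continue;
        }
        $per_ip[$ip] = ($per_ip[$ip] ?? 0) + 1;
    }
    // One IP address counts as one person; more than one download counts as a repeat.
    $repeat = count(array_filter($per_ip, function ($n) { return $n > 1; }));
    return ['downloads' => array_sum($per_ip), 'people' => count($per_ip), 'repeat' => $repeat];
}

// Made-up sample lines: two real downloads, one robot, one outside the period.
$sample_log = [
    '1.2.3.4 - - [10/Jun/2007:13:55:36 -0700] "GET /downloads/plugin.exe HTTP/1.1" 200 2326 "-" "Mozilla/4.0 (compatible; MSIE 7.0)"',
    '1.2.3.4 - - [12/Jun/2007:09:01:02 -0700] "GET /downloads/plugin.exe HTTP/1.1" 200 2326 "-" "Mozilla/4.0 (compatible; MSIE 7.0)"',
    '5.6.7.8 - - [11/Jun/2007:10:00:00 -0700] "GET /downloads/plugin.exe HTTP/1.1" 200 2326 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '9.9.9.9 - - [02/Jul/2007:10:00:00 -0700] "GET /downloads/plugin.exe HTTP/1.1" 200 2326 "-" "Mozilla/4.0 (compatible; MSIE 7.0)"',
];
$june = count_downloads($sample_log, '/downloads/plugin.exe',
    strtotime('2007-06-01 00:00:00 UTC'), strtotime('2007-06-30 23:59:59 UTC'));
print_r($june);
```

Passing a very early start time would give the "since the beginning of time" figure from the same function.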

What I end up doing is just looking at the raw latest visitors log, and getting a qualitative impression of downloads. What I see is that Firefox users outnumber IE by at least ten to one. There are several possible explanations for this:

  • My promotion’s been a lot more effective in the Firefox market.
  • People who are early-adopters interested in something like GoogleHotKeys are also more likely to have adopted Firefox.
  • IE users don’t have much of an add-on selection, so they’re unused to installing extensions, and are probably more wary of the security risks.

Even once you know the download totals, unless you build in an unpopular phoning-home capability, it’s hard to know how many people are actually using it. I have a help link added to every Google page that users are free to click on, so that is at least an existence proof for usage; if I see it showing up, I know somebody’s gone ahead and installed it.

Facebook gives both the developer and the rest of the world a simple and up-to-date view of how many people are using an app. This should help me understand what’s working and what isn’t, and learn from my customers very quickly. It’s also a good motivating score-card!

To provide some edutainment, I’ll include the current number of Funhouse Photo users in every post, along with a short explanation of anything that’s happened to explain them. Here’s the first report:

5 users – This includes my sister! I’ve sent out links to a few people to test it, but it’s not in the directory, and I haven’t tried to promote it in any other ways yet, since there are still some bugs to iron out.

Funhouse Photo launched

Clown

As I mentioned yesterday, I’ve been interested in doing server-side image processing using ImageMagick. After a couple of evenings of hacking, I’ve now created my first Facebook app, Funhouse Photo!

It allows you to apply fun effects to your portrait, along with a caption, and have that show up in your profile. There are currently twenty effects, with some preset captions that you can customize.

Funhousescreenshot

I’m also planning on adding the ability to play with your friends’ pictures, and send them on. I’m keeping it out of the public directory until it’s had a bit of testing, but if you’re on Facebook, give it a whirl!

Here’s roughly how it works:

  • The current user portrait URL is retrieved from the Facebook API
  • wget is used to pull that image down to my server
  • I then run the presets, stored as command-line ImageMagick scripts, with the portrait as the input image
  • Both the retrieved portrait and the processed images are cached on-disk, so subsequent calls won’t involve any processing
  • The user picks one of the effects, and the choice is stored in a mysql database
  • If they then customize the caption, that’s also stored
  • When both choices have been made, the FBML of the profile frame is set to point to the processed image’s URL
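
The fetch, process and cache steps above can be sketched roughly like this. It's a simplified illustration rather than the app's actual code: the function names, cache file layout and URL are invented, and the commands are only built and printed, not executed:

```php
<?php
// Hypothetical cache layout: one file per user for the portrait,
// one per user and effect for each processed image.
function portrait_path($cache_dir, $user_id) {
    return "$cache_dir/{$user_id}_portrait.jpg";
}
function effect_path($cache_dir, $user_id, $effect) {
    return "$cache_dir/{$user_id}_{$effect}.jpg";
}

// Returns the shell commands still needed for one user and effect,
// skipping any step whose output is already cached on disk.
function commands_needed($cache_dir, $user_id, $portrait_url, $effect, $preset_args) {
    $commands = [];
    $portrait = portrait_path($cache_dir, $user_id);
    $processed = effect_path($cache_dir, $user_id, $effect);
    if (file_exists($processed)) {
        return $commands; // fully cached, nothing to run
    }
    if (!file_exists($portrait)) {
        // Pull the original portrait down with wget.
        $commands[] = 'wget -q -O ' . escapeshellarg($portrait) . ' ' . escapeshellarg($portrait_url);
    }
    // Run the preset's ImageMagick options with the portrait as the input image.
    $commands[] = '/usr/bin/convert ' . escapeshellarg($portrait) . ' ' . $preset_args . ' ' . escapeshellarg($processed);
    return $commands;
}

// Example with a made-up 'swirl' preset; the URL is a placeholder.
$demo_dir = sys_get_temp_dir() . '/funhouse-demo';
foreach (commands_needed($demo_dir, 42, 'http://example.com/portrait.jpg', 'swirl', '-swirl 90') as $cmd) {
    echo $cmd, "\n";
}
```

Keeping the cache check in one place means a repeat visit for the same user and effect returns no commands at all, which is the whole point of caching on disk.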

One of the things I’m most interested to see is how the performance holds up with multiple users. I can make a rough estimate of its CPU usage, but there are also a lot of other key factors in overall performance. In particular, pulling down the original image using wget seems like an unusual use of a hosting server, and I assume they’re set up for serving data, rather than pulling it, so I’ll be curious to see how that works out. The image processing itself is obviously CPU intensive, but I’m hoping with such small images (100×75), it won’t be too bad.

ImageMagick review

Imlogo

I’ve known about ImageMagick for a while, and I’ve been looking for a chance to try it out. If you haven’t run across it, it’s a command-line image processing tool, and having finally played with it for a Facebook app, I’m very impressed!

It’s installed everywhere

I was very pleasantly surprised to find that my dreamhost account already had version 6.2 pre-installed. Even more surprising, my old and usually-sparse WebHSP server does too! This meant no messing around trying to build it from source, or install the binaries and deal with dependencies. The only disappointment was finding that it wasn’t built in to OS X on my home machine, though there are binaries available of course.

It’s heavily used

The framework is in its sixth version, and it’s obviously being used by a lot of people. This gives it two big advantages: there’s been a lot of testing to catch bugs, and there’s some great documentation available. I particularly love the usage documentation. It’s bursting at the seams with examples solving practical problems, and that’s the way I learn best.

It’s both elegant and full of features

It must have been a tricky balancing act to get through six versions, with all the change in people and requirements, and keep the conventions for all the commands consistent, whilst constantly adding more. The way the image flow is specified is necessarily pretty hard to visualize, since it’s inherently a tree structure that’s been compressed into a line of text, but since all the operations work with the flow in the same way, it’s possible to figure out what’s going on without referring to the documentation on specific options.
The features include a wide and useful range of built-in filters, and a variety of expansion mechanisms. The only thing I missed was a filter plugin SDK; it would be nice to add to the current set using third-party operations.

It’s fast

I haven’t measured performance quantitatively, but I’ve been experimenting with doing heavy operations on large arrays of images across the web, and I haven’t seen any lag yet.

As an example, here’s how to call ImageMagick to generate a thumbnail through PHP. I stole this from the excellent dreamhost support wiki, and you’ll need to make sure you can exec() from your version of PHP if you’re on another hosting provider.

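<?php
       // Path to the ImageMagick convert binary (may differ on your host)
       $location='/usr/bin/convert';
       // Resize to a thumbnail 150 pixels wide
       $command='-thumbnail 150';
       $name='glass.';
       $extfrm='jpg';
       $extto='png';
       $output="{$name}{$extto}";
       // Escape the file names so unusual characters can't break the shell command
       $convert=$location . ' ' . $command . ' ' . escapeshellarg($name . $extfrm) . ' ' . escapeshellarg($output);
       exec($convert);
       // Quote the attribute value so the generated HTML is valid
       print '<img src="' . htmlspecialchars($output) . '">';
?>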

Welcome hackszine readers!

Pylon

Jason Striegel over at hackszine, the blog of Maker magazine, has been a big supporter of my hacking with Google, and has just published an update on my IE porting work. He mentions the wiki I’ve set up to shed light on the obscure world of IE plugins, and you can look forward to lots of other fun stuff on the Facebook API here as I learn more about it. Thanks for the mention Jason!

The spectacular Angeles mountains

Angeles1

I just got back from a gruelling hike up Mount Baden-Powell in the Angeles mountains, the range that surrounds the north of Los Angeles. They’re a different beast than the Santa Monicas, higher, steeper and more rugged, pure heaven for someone who grew up in a place flatter than Kansas. In the winter, I often see snow on the peaks above me as I’m driving through a heat-wave in the Valley!

Yesterday, me, Liz and a few friends from work did a 2,800′ climb over four miles, to get to the peak at 9,400′. That was a killer, especially with the elevation gain. We, of course, ran into the obligatory old bloke near the top, who had hiked twelve miles to get there from the other end, and was happy to tell us about all the 17,000′ one-day elevation gains he’d done. My excuse is that all some of these retired folks do is go tearing around the mountains all week. How am I meant to compete with that, sitting in front of a computer for ten hours most days?

On a clear day, you can look down on the desert 8000 feet below to the north, and out to the Channel Islands in the Pacific to the south. The only down-side is that it’s so wind-swept and snow-covered in the winter, that only a few hardy plants survive at higher elevations, mostly gnarled pines and a few tough shrubs. The Santa Monicas are still my favorite haunt, with all their wild-flowers and waterfalls, but I love having such a spectacular range on my doorstep.

Angeles2

“I’ll tell you what to do.”

Charles
"Go that way, really fast. If something gets in your way, turn." – Charles, describing how to ski the ‘impossible’ k12 in Better Off Dead.

I had a big response to my last post on web annotation. One sentence in there sticks out as similarly accurate, but useless advice: "you can build it, and they won’t come, unless you get marketing and distribution right too." So here’s some more concrete analysis of why people don’t come.

There are three reasons someone might use your product: because they like playing with new technology, because they have a painful problem they need to solve, or because it looks like fun.

Geeks are driven by a lust for technology, business customers want to stop the pain, and consumers want to kill boredom. OneNote is the only tool that’s totally focused on the business market. The others are after consumers, but actually seem to be mostly limited to geeks so far.

I did find another good summary of the annotation market from techcrunch. One related tool I didn’t mention before was Stumbleupon, and compared to the others it’s got a stellar adoption rate, with around two million registered users in April.

What’s different about Stumbleupon? It’s instant fun. All of the other services require the user to invest some time and effort before they get a reward. With SU, you install, answer a few quick questions, and it starts showing you sites straight away. That’s fun! That’s what pulls in consumers.

One of my first jobs was helping to port the PC game Diablo to the Playstation 1. Gary Liddon, a programming god from the 8-bit days, explained to me a big reason why it was so successful: every monster you kill explodes in a big, flashy, noisy shower of gold and goodies. It’s like an infinite series of piñatas, and people respond to quick and frequent rewards.

Diigo is a beautifully crafted tool, but it doesn’t offer a reward until you’ve spent some time setting things up, linking with friends, and creating your own content. That’s fine for us geeks, or people who have a strong enough need to invest that time to solve a problem, but it’s not going to be something that grabs the idle masses, even though a lot of them would like it once they had put the effort in!

Trailfire takes a bit more of a consumer-focused approach, with popular trails right on its front-page, giving an immediate example of what reward you’ll get. Diigo does have a ‘what’s hot’ page, but it’s deeper in the site.

This may sound like I’m treating consumers like idiots, but the truth is that most normal people don’t enjoy playing with new technology, they need a strong reason to try it, and they find any setup or installation really annoying, even things we’d not think twice about.

PC games don’t sell, apart from a few blockbusters, and even they have tiny sales figures compared to pretty much any franchise sports game. A lot of people who buy consoles already have a PC with comparable power, so why do they bother? The difference is, you (or your granny buying your Christmas present) make sure the sticker on the game’s box says Wii in the store. Then you put the disc in the console, turn it on and play. People buy the game because they anticipate a quick and reliable reward.

With a PC, you take a notebook full of specs to the store to make sure you reach the minimum requirements, then you sit through eight screens of installation mumbo-jumbo and requests to make choices about indecipherable technical details. Then you realize there’s a DLL conflict, and Office no longer runs. There’s no certainty of a reward at the end, and it takes a long time to get there, so most people don’t bother.

Most Web 2.0 tools seem doomed to a geek ghetto, unless we can get the time-to-reward down to seconds rather than minutes or hours.

Comments and notes and annotations, oh my!

Highlight
In the real world, people add their own layer of information on top of printed documents by scribbling notes on them, and highlighting passages. They can share the modified document with their friends, and pass on their opinions and insights.

This is really hard to do on the computer. Distribution formats like HTML pages and PDFs are designed to deliver information from the producer to the consumer. You can’t even add your own notes, let alone share them with others.

With the real-world analogy so glaring, there have been a lot of attempts to fix this. The Wikipedia article on web annotation alone includes twenty-two active projects! In a different direction, MS Office now includes OneNote, which is a tool for creating free-form notes, and sharing them with friends. It focuses on pulling content into a separate notebook, rather than adding notes on top of the content.

So why haven’t you heard of most of these web annotation tools? Casey has a post covering the history of web annotation, from 2004. There have been a few successes since then, but most of the annotators have struggled to get the critical mass of users they need to be useful. One of my favorites is JumpKnowledge, which has a focus on letting you email pages with your comments on top, using their AWE Firefox plugin.

Reading their blog is pretty painful. When Yaakov describes trying to get coverage for their system as "like banging my head against a wall" I can sympathize. He’s got some great reviews, from some big names, but it seems like even they’re no longer in active development.

[Update: I just heard from Yaakov, and they have been working on some interesting new features, which is great news! See his comment on this post.]

This is a good reminder that you can build it, and they won’t come, unless you get marketing and distribution right too.

Diigo seems like the most successful annotator out there at the moment. It’s in active development, supports a lot of cool features like sharing with groups, and some neat search functionality for things like looking for inbound links to the current page, or searching on the same site. Trailfire is another very active annotation tool, that does some content analysis of the page you’re on to figure out if there are related pages in its database.

It still feels like the killer app for web annotation is missing. Maybe people really don’t want to scribble on web pages as much as us techies think, or maybe we just haven’t found the right way to do it. I thought that JumpKnowledge’s focus on mailing web pages together with annotations had a lot of potential, but it doesn’t seem to have caught on. Diigo has tools for easily posting annotated pages to a blog, which also seems like it could be really popular, but I don’t see too many blogs being created that way.

Take that, Google!

Kick

Mike Gunderloy over at WebWorker Daily included GoogleHotKeys in their weekend roundup, and I got a real kick out of his "Take that, Google!" title! Thanks too to Anne Zelenka for passing that on to Mike. Her blog has some really thoughtful articles on a wide range of subjects, including whether to focus on a niche for your blog or just post about everything that interests you. That’s definitely a struggle I can sympathize with!

The photo’s one I discovered through Stock Exchange, by Linden Laserna. Check out his Deviant Art page for more eye-catching photography.

BHOs and threads

Threads2

Vlad Simionescu asked me some questions about how BHOs behave with threads. This isn’t an area either of us has been able to find documentation on, so I’ll just have to give a description of what I’ve seen in practice. I’ll be posting a request to the MSDN forum to see if anybody in a position to give a definitive answer can correct anything I’ve got wrong.

Internet Explorer uses a single-threaded apartment model for threading, as I discovered in a post from Tony Schreiner. Exactly what this means, I have no idea, since I’m just used to plain Unix pthreads, but I’m sure some googling would resolve the differences between the various Windows thread models.

In practice, it appears that each window (pre-IE7) or each tab (IE7) has its own thread. Since there’s a BHO instance created for every browser instance, this means that there’s a one-to-one mapping between each instance of your BHO, and a thread. It seems like every Invoke() call to an instance of a BHO is made on the same thread, and that thread is the one associated with the browser tab/window.

This is important, because as Tony’s post explains, you can’t use COM interfaces on different threads without jumping through some hoops. Because every call seems to arrive on the same thread, you can store pointers to COM objects that will be used from descendants of your Invoke() method, without losing sleep about possible threading errors. It seems like you only have to worry about making your code thread-safe if you create your own threads. You do need to be able to cope with multiple BHO instances running simultaneously on different threads, but this should be trivial as long as you’re avoiding global variables.

This one-thread-per-BHO behavior is implicitly relied on in both Sven Groot’s examples and my work on hooking into the Windows messaging procedure. We both use the current thread to work out which BHO we should pass events onto, since there’s no other way to map a window procedure call to a plugin instance.

More posts on porting Firefox add-ons to IE