The best barbecue and butchers in Los Angeles

[Photo: Green Acres barbecue chicken]

Driving along Simi Valley’s main street near lunchtime, you may spot a cloud of smoke billowing from a sidewalk in the distance. As you get closer, going past the motorbike and hot tub stores, something will start to smell really, really good. That’s Green Acres market, where every day they hold a massive outdoor barbecue with the tenderest chicken I’ve ever tasted.

The first time I came across Green Acres was at my local Vons meat counter. It was the second place I’d visited that day on a quest for a nice joint of beef on the bone for a traditional Sunday roast. The old butcher behind the counter shook his head at my request; that wasn’t something they had any demand for, and straightforward cuts and ground beef were all they carried. Seeing the disappointment on my face, he looked from side to side and then leant over the counter. "Try Green Acres market on Los Angeles Avenue, between Sycamore and Tapo Canyon".

Walking in, it wasn’t much bigger than a 7-11, but the whole of one long wall was taken up with fresh meat. One end started off with more styles of steak than I knew existed, ready to be cut to your liking. After that there was everything from pork, chicken and fish to pre-mixed marinated selections for all sorts of dishes. The butcher serving me helped me pick out exactly the cut and size I wanted. After handing over a lovely prime rib joint, he reached under the counter and gave me a printed note card with their house recipe for perfect prime rib. It surprised me with a low temperature, only 200 degrees Fahrenheit, for the last couple of hours after an initial browning phase, but the result was the best roast I’ve ever cooked, with the meat melting in your mouth.

Their lunchtime barbecue is made with the same fresh meat they sell, and expertly cooked. Biting into their chicken sandwich is a beautiful experience, though probably one you’ll want to keep secret from your cardiologist thanks to the buttered rolls. The chicken is light and soft, with a barbecue sauce that still lets you taste the meat. I’ve never used their catering service myself, but it’s top of my list for the next time I need food at an event.

[Photo: Green Acres menu]

How to time MySQL queries in PHP

[Photo by BadBoy69]

When you run a MySQL query in the console, you get a line of information telling you how long it took to run. I was hoping to pull the same information in PHP to help me profile my database usage, but unfortunately there isn’t any way to access it directly through the API.

What you can do instead is time the mysql_query() call itself, by recording a timestamp before and after and subtracting to get the total. This isn’t ideal, since it will include a small amount of overhead for things like the socket connection to the database, but it’s good enough for most purposes. This is the code I’m using, as seen in phpMyAdmin:

// Record a timestamp just before the query. microtime() returns
// "microseconds seconds" as a space-separated string.
list($usec, $sec) = explode(' ', microtime());
$querytime_before = ((float)$usec + (float)$sec);

// your query goes here

// Record a second timestamp once the query returns.
list($usec, $sec) = explode(' ', microtime());
$querytime_after = ((float)$usec + (float)$sec);

// The difference is the elapsed time in seconds, including a little
// overhead beyond the query itself.
$querytime = $querytime_after - $querytime_before;
$strQueryTime = 'Query took %01.4f sec';
echo sprintf($strQueryTime, $querytime);
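If you’re timing more than one query, it’s worth wrapping the pattern in a helper so the boilerplate lives in one place. Here’s a minimal sketch along the same lines; the function name is my own invention:

function timed_query($sql) {
    // Timestamp before the call we're measuring.
    list($usec, $sec) = explode(' ', microtime());
    $before = ((float)$usec + (float)$sec);

    $result = mysql_query($sql);

    // Timestamp after, then report the difference.
    list($usec, $sec) = explode(' ', microtime());
    $after = ((float)$usec + (float)$sec);
    printf('Query took %01.4f sec', $after - $before);

    return $result;
}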

Facebook security hole revealed by open-sourcing their platform

[Photo by Jek in the Box]

I uncovered a way to run arbitrary Javascript within Facebook Applications while looking through the open platform source. I reported it to Facebook, and they’ve now fixed it, so here are the details.

1. Load up Internet Explorer (tested on 7.0, and it should be valid on other versions, though not on Firefox or Safari).

2. Go to the FBML test console at http://developers.facebook.com/tools.php?fbml . This tool allows you to preview how your app’s HTML will work after Facebook has run it through its security system.

3. In the large box, copy and paste the following line:

<input type="text" onfocusin="alert('foo');"/>

4. Click on the Preview button below the large text area.

5. You should now see an empty text box appear below "Facebook FBML Test Console" on the right-hand side of the screen.

6. Click on the text box.

Result: You should see an alert box appear containing the text "foo".

Why is this significant?

Facebook carefully designs its applications system to restrict Javascript to a very small set of safe functions. With full access to Javascript, app developers can access all of the information in your profile, trigger actions such as mailing friends or adding applications without permission, and generally cause mayhem. The code above just shows an alert box, but it could run any code you want, since it’s not recognized as Javascript by Facebook’s security screening code.

How was this discovered?

I was looking through the recently released Facebook Open Platform source code, and noticed that only a small number of event attributes were being checked for in lib/fbml/wrapper.php. I had a security bug in some of my own code that was caused by not catching all the possible attributes, so that made me suspicious that their production code might miss these too. Microsoft in particular has a large number of little-known attributes for Internet Explorer documented here: http://msdn.microsoft.com/en-us/library/ms533051(VS.85).aspx

I tried a few at random in the FBML console, and discovered that onfocusin wasn’t scrubbed. It’s likely that more of those from the list are missed too.
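The general lesson is that checking against a fixed list of event attributes is fragile. Here’s a minimal sketch, my own code rather than Facebook’s actual fix, of the more robust approach of treating every attribute that starts with "on" as a potential script handler:

function is_event_handler($attribute_name) {
    // Every HTML event attribute (onclick, onmouseover, onfocusin, ...)
    // starts with "on", so reject the whole prefix rather than trying
    // to enumerate each handler individually.
    return strncasecmp($attribute_name, 'on', 2) === 0;
}

// A scrubber would then drop any attribute where this returns true,
// instead of comparing against a hand-maintained list.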

As I know to my own cost, it’s a massive headache to catch all the possible ways of hiding a script within HTML. I’ve spent a long time staring at the cross-site scripting (XSS) cheat sheet at http://ha.ckers.org/xss.html, which documents all the possible ways of fooling a parser; it might be a good idea for Facebook to test some of the additional methods it lists too.

How did Facebook respond?

The good news is that they fixed the problem a couple of days after I reported it. It took a lot of work to find someone to approach, though; they don’t have any apparent process for this, which is worrying. Instead I just trawled through their blog until I found someone who’d posted about security. He sent me an initial reply asking for more details, but then didn’t respond any further to my emails. All in all, it’s not a very professional setup, and they could learn a lot from more established companies. Even Microsoft has a program designed to help people report security issues! Now that they’re responsible for a lot of sensitive personal information, they need to think very carefully about security.

How to speed up your website with YSlow

[Photo by Ezu]

One of the downsides of the increase in widgets and customization over the last few years is that they often result in a web page that takes seconds to load. Thanks to my desktop app heritage, I’m really sensitive to this, since poor responsiveness in an application destroys the user experience. The emotional response to waiting is frustration, which both gives users a subconscious motivation to avoid your site and a chance to get distracted by something else and abandon your service.

That’s made me very wary of installing new widgets on this blog, since I sometimes see long loading times even now, and I’ve never quite been sure why. I wanted a new discussion service though, and Intense Debate looked very appealing, so I resolved to install it and also figure out how to profile my site.

Firebug is the best tool for getting under the hood and understanding what Firefox is up to when you load a page, but it’s aimed more at debugging script, CSS and markup problems than at understanding performance issues. That’s where YSlow, a free plugin for Firebug from Yahoo, comes in.

It’s based on some principles of website optimization that Yahoo have worked out. It applies these rules programmatically to your page and then gives you a report card detailing the problems in each area. My site received an F. There’s a whole lot of improvements I’ll be looking at implementing, but one interesting one is setting a long expiry time for external objects like scripts and images. This is inconvenient when you change a resource on the server, since you also need to change its name, but Yahoo estimate that 80% of fetches can be avoided in a typical scenario if you set an expiry header that allows the browser to cache the resource locally. I’ll be poking some of my widget providers to see if that’s possible.
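If you’re serving a resource through PHP yourself, you can attach those headers directly. Here’s a minimal sketch of what a far-future expiry looks like; the file name and versioning scheme are made up for illustration:

// Serve a versioned script with a year-long expiry so browsers cache it.
// Bump the version in the file name whenever the contents change, since
// the old copy will be served from cache until it expires.
$lifetime = 31536000; // one year in seconds
header('Content-Type: application/javascript');
header('Expires: ' . gmdate('D, d M Y H:i:s', time() + $lifetime) . ' GMT');
header('Cache-Control: max-age=' . $lifetime);
readfile('scripts/widgets-v12.js');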

I highly recommend giving YSlow a shot on your site; you’ll learn a lot about what your page loads actually involve, and probably get some ideas for improving performance.

Does the opening of Facebook’s source reveal anything?

[Photo by Paste Magazine]

The first thing I discovered when I looked over Facebook’s recent platform code release was a security flaw that lets you run malicious Javascript through applications, bypassing their security, but I won’t be blogging any details until the team has implemented a fix.

When they recently released some of their platform code as open source (the link seems to be temporarily down, but you can download the source directly here), it led to a lot of discussion about the strategic significance of the move, aimed at keeping Facebook’s lead in the application space against competitors like OpenSocial, and about the implications of the unusual CPAL license they chose.

I’m much more interested in the technical lessons you can learn about Facebook’s code and architecture from the source. From looking through it, I’m confident this is drawn from their actual production code, so it’s a rare glimpse inside the implementation of a web application battle-tested with millions of users. I’ve uploaded a version with an Xcode project for easy browsing if you want to explore for yourself on a Mac.

There’s a disappointing lack of swearing in the comments, though I did find one "omg this is so retarded" in typeaheadpro.js. With that fun out of the way, a good place to start after the main README is to search for "FBOPEN:" in all files, since this brings up comments that were added to document the parts the developers thought would be interesting to users of the open version.

Examining the basic structure confirms that Facebook are still basically a LAMP shop. The only part I wondered about was the M of MySQL, since that’s traditionally been tough to scale, but all of the database access here is through raw SQL strings. They’re known for their use of memcache to speed up data fetching, but there’s no sign of it in the code they’ve released. I was hoping for some heavy-weight examples of how to handle snooping on updates to invalidate memcache entries, but no such luck. They do have an interesting pattern of assembling their query strings using printf-style format strings and varargs, rather than directly appending, which results in cleaner-looking code. If you want to look at the implementation, it’s in lib/core/mysql.php.
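Here’s a rough sketch of that pattern; this is my own simplified version, not the code from lib/core/mysql.php:

// Build a query from a printf-style format string, escaping every
// argument in one place before it reaches the SQL.
function queryf($format /* , ... */) {
    $args = array_slice(func_get_args(), 1);
    $escaped = array_map('mysql_real_escape_string', $args);
    return mysql_query(vsprintf($format, $escaped));
}

// Usage:
// $result = queryf("SELECT * FROM users WHERE name='%s' AND age=%d", $name, $age);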

One component I hadn’t seen before was Thrift, Facebook’s open source framework for building cross-language APIs and data structures. It takes an interface definition file, and then creates a lot of the glue you need to implement the methods and data structures in PHP, Java, C++, Ruby and Erlang. I was interested because I’ve found I need a lowest-common-denominator data definition and code generation framework as I end up bouncing between C++, PHP and SQL tables. They don’t address the database storage side, which I hit problems with too since some basic data structures like lists inside structures don’t translate into a relational database unambiguously.

They look like they hit similar illegal character problems to my XML parsing woes, since they’ve got a call to iconv('utf-8', 'utf-8//IGNORE', $str) that they use to sanitize their input in strings.php.
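In other words, byte sequences that aren’t valid UTF-8 simply get dropped. A quick sketch of the idiom, with a hypothetical input variable; the exact behavior on bad input varies a little between iconv implementations, so treat it as illustrative:

$raw = $_POST['message']; // hypothetical input that may contain invalid bytes
$clean = iconv('utf-8', 'utf-8//IGNORE', $raw); // keep only valid sequences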

What makes software patents so damaging?

[Photo by Indrani Soemardjan]

I have a visceral aversion to software patents, after several incidents in my career where I was forced to deliver a lower quality product to the customer purely because of absurd patents. We ended up having to cut not only a ghost car mode from one game, but also any way of comparing your current time against your previous laps, since that was covered too. I was also involved in finding prior art against a crazy patent for retiming video based on audio output, something that had been done for years but still got granted.

I know I’m not alone; pretty much every engineer I know agrees the system is broken, but I’ve had a hard time explaining why to people outside the industry. I recently came across a new book that might help explain the impact, Patent Failure by James Bessen and Michael J. Meurer. Tim Lee has an extended review in three parts (1, 2, 3) on Megan McArdle’s blog, and for me the most fascinating evidence was these two graphs:

[Graph: patent licensing profits (dotted) vs. litigation costs (solid), chemical and pharmaceutical industries]

[Graph: patent licensing profits (dotted) vs. litigation costs (solid), all other industries]

The dotted lines on each graph are the profits made from licensing patents, and the solid lines are the total costs to alleged infringers. The top shows the chemical and pharmaceutical industries, where the costs of defending against patent litigation are a small fraction of the licensing fees. By contrast, in all other industries a lot more money changes hands to defend infringement actions than is made by licensing.

The graph only goes up to 2000, and if my experience is anything to go by, the gold rush by patent trolls to sue deep-pocketed companies has only increased since then. This seems like a pretty clear and quantitative sign that something is rotten in the state of non-chemical patents, since infringement penalties were meant to be a deterrent to trespassing on someone else’s IP, not the main way of making money from inventions. Either the non-chemical industries are full of bad actors stealing other people’s ideas, or it’s simply not possible to avoid being vulnerable to patent infringement actions: there’s obviously a big financial incentive not to infringe, and yet it’s still happening.

Tim digs up a couple of reasons why there might be such a difference between the two sectors. With chemical compounds, there’s a very standard way of describing them in a formula, so it’s a simple process to search patents and see if what you’ve discovered has already been claimed. By contrast, with software patents it’s practically impossible to first find all the relevant patents, and then to tell whether you’re infringing.

Searching patents for software and expecting to find all or even most of them is like searching the web for every page that talks about a subject. There’s an almost infinite set of natural language variations that might describe what you’re after, and any search will also bring up a large number of irrelevant results. Worse, it’s not obvious even to patent attorneys whether a given piece of engineering infringes a patent. Tim gives the great example of the point-of-sale kiosk patent that suddenly got accepted as applying to websites with no physical presence. Not even an attorney would have imagined it was relevant until the court decision accepted its broad scope.

The original rationale behind patents was to distribute information about new inventions in exchange for a temporary monopoly for the inventor. Software patents are almost entirely useless as an information source, and it’s pretty obvious from these figures that their main effect is as an employment boon for lawyers and patent trolls who produce nothing that helps society. The damage to the software industry in stifled innovation is huge, when patents were supposed to do exactly the opposite.

Where to hike and camp in Big Sur

[Photo by Jon Iverson]

I was preparing my own article on some of the trails and camps I went to last month in Big Sur, but then I discovered the wonderful http://hikinginbigsur.com. This site is beautifully designed with some breathtaking photos, clear, witty and concise hike directions, and well-drafted maps.

On my last trip I car-camped at Limekiln State Park, but we took a day hike to check out some of the walk-in campgrounds nearby in southern Big Sur. These are always handy if you want to camp out at a busy time of the year, or on short notice, since walk-ins rarely require reservations and are generally lightly used thanks to the hurdle of backpacking everything in.

We started off at the Salmon Creek trailhead, near one of my favorite waterfalls, and headed towards Spruce and Estrella campgrounds. Jon has a great description and map of the hike, and we checked out the camps along the way. The first one is Spruce, at around 2 miles up the trail, after some stiff climbing. It’s in a nice location, near the junction of two streams in a shaded grove. There’s some fire pits and flat areas for tents, but no other amenities. You’ll need a fire permit from the Forest Service to stay overnight at either of these campgrounds. It’s also a good idea to check in with a ranger station to let them know you’ll be staying overnight, and check on conditions. Here’s one of the fire pits:

[Photo: fire pit at Spruce campground]

About 1.5 miles further on is the second campground, Estrella. There’s a couple of steep, slippery sections that might be tough to navigate with a backpack, though they’re probably safe if you’re careful. It’s located under some firs in a small meadow, with a stream below. I couldn’t find out whether this stream and the one at Spruce run year-round, but it seemed likely based on the strength of the flow in May, and on seeing the Salmon Creek waterfall going strong in late summer. You should always bring enough water to get you there and back in case they are dry, but otherwise relying on treating the stream seems safe. Like Spruce, there’s a few fire pits scattered around, as well as some flat sites for tents.

[Photo: Estrella campground]

If you want to find more hike-in campgrounds, check out the forest service maps. Here’s one for the Ventana Wilderness that we were hiking through:

[Map: Forest Service map of the Ventana Wilderness]

When should you use sessions in PHP?

[Photo by BrittneyBush]

For anyone switching to the web from traditional desktop programming, one of the hardest things to wrap your head around is the lack of state. There’s no inherent way of keeping information around while you’re interacting with a user. Each page request starts with a blank slate; you don’t have in-memory variables that can keep track of useful information.

If you’re working in PHP, this is where sessions look like a great solution. They’re a general-purpose mechanism built around cookies that lets you store arbitrary variables which are remembered across all page requests from a particular user. Under the hood, they set a single session id cookie on the user’s machine, which is sent along with any subsequent page requests. That id is used to load a file from the server’s disk containing a list of variable names and values stored for that user. Any changes or additions the server makes to the data are saved into that same file.

From the programmer’s point of view, you call session_start() and then have access to a global associative array, $_SESSION[]. You set and read entries in this array, and they remain persistent across page requests for a given user as long as they keep sending the cookie. This all looks like a very natural model for storing state, one that traditional app programmers would feel very comfortable with. You could do something similar by setting cookies directly, but then you’re exposing a lot of information to the user and opening the door to malicious tinkering with your internal server variables.
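As a minimal sketch of that model, here’s a page counter that uses nothing but the session array:

session_start(); // loads any existing data for this user's session id

if (!isset($_SESSION['visit_count'])) {
    $_SESSION['visit_count'] = 0; // first request from this user
}
$_SESSION['visit_count'] += 1; // written back to disk at the end of the request

echo 'Pages loaded this session: ' . $_SESSION['visit_count'];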

As you might have guessed, there’s no such thing as a free lunch, and sessions have some significant drawbacks. The data is stored in a file on the server’s disk, which means that you’re tied to a single server and can’t load balance without duplicating that file and any changes across all machines. The file is locked so it can only be accessed by one request at a time, which means that simultaneous requests get serialized, a serious problem if one of them involves a long-running calculation. The locking can also produce deadlocks if you’re making sub-requests within the main page request to fetch parts of the page and passing the session id cookie along manually. In general, the behind-the-scenes nature of sessions makes it tough to tell who’s connected and to debug state problems.
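One partial mitigation for the locking, if a page only needs to read its session data, is to release the lock as soon as you’re done with it by calling session_write_close(). The variable name below is made up for illustration:

session_start();
$userid = $_SESSION['userid']; // read what we need up front
session_write_close(); // releases the file lock so other requests can proceed
// ...any long-running work here no longer blocks the user's other page loads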

Some of these issues are fixed if you write your own handler to back the sessions with a database rather than a file. You still end up with locking, though, and the database access makes the operation much more expensive. It also requires some planning ahead to know exactly what state you want to store, which abandons a lot of the flexibility that makes sessions so useful.
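For reference, here’s roughly the shape of a database-backed handler. This is a skeleton only, assuming an open MySQL connection and a hypothetical sessions table, with error handling omitted:

function sess_open($save_path, $name) { return true; }
function sess_close() { return true; }

function sess_read($id) {
    $id = mysql_real_escape_string($id);
    $result = mysql_query("SELECT data FROM sessions WHERE id='$id'");
    if ($row = mysql_fetch_assoc($result)) {
        return $row['data'];
    }
    return ''; // no existing session
}

function sess_write($id, $data) {
    $id = mysql_real_escape_string($id);
    $data = mysql_real_escape_string($data);
    mysql_query("REPLACE INTO sessions (id, data, updated) VALUES ('$id', '$data', NOW())");
    return true;
}

function sess_destroy($id) {
    $id = mysql_real_escape_string($id);
    mysql_query("DELETE FROM sessions WHERE id='$id'");
    return true;
}

function sess_gc($max_lifetime) {
    mysql_query("DELETE FROM sessions WHERE updated < NOW() - INTERVAL $max_lifetime SECOND");
    return true;
}

// Must be registered before session_start() is called.
session_set_save_handler('sess_open', 'sess_close', 'sess_read',
                         'sess_write', 'sess_destroy', 'sess_gc');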

I ended up with my own API for storing and reading information about each session in a database, using a special cookie ID as a key, generated once a user logs in and is authenticated. I also have a convention where the ID is passed through POST or GET parameters to make sub-requests very easy. It isn’t that different from storing sessions in a database, but it does avoid the locking problem, and makes the database cost explicit on the programming side. The fact that it’s associated with a particular user, and can only be created by logging in, makes it harder to spoof too, and lets you limit the number of connections for a single user.

Denormalization: The forbidden optimization

[Photo: Lambada]

One of the key principles you learn about relational database design is to always normalize your data. This gets quietly thrown out of the window once you have to scale up web applications, either to cope with large numbers of users or with large amounts of data. I’m hitting this point with the 500,000 emails in the Enron collection, and chatting with Kwin from Oblong made me realize what a dirty little industry secret this optimization is.

Normalization means storing any fact in just one location in your database, so that you avoid update anomalies where you have conflicting information in multiple places. For instance, if you have someone’s address stored in several tables, you might forget and only update one of them when it changes, but still be using the old address for queries relying on the other tables. The downside to normalization is that many queries require joins to fetch all the information you need, and these can be slow on large data sets.

Denormalization means duplicating facts across different tables so that reads can avoid joins and so run much faster. It’s dangerous because you lose the safety net of automatically consistent updates, and you have to make your data-writing code more complex. For my work on mail, as for most web services, most of the time is spent reading data, which is why it’s such an appealing optimization.
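To make that concrete, here’s a toy sketch using the sort of schema I’m dealing with; the table and column names are invented:

// Normalized: reading a message's sender name needs a join every time.
//   SELECT messages.subject, people.name
//   FROM messages JOIN people ON messages.sender_id = people.id;

// Denormalized: sender_name is copied into the messages table, so reads
// become a single-table query, but every name change must touch both tables.
$newname = mysql_real_escape_string($newname);
mysql_query("UPDATE people SET name='$newname' WHERE id=$personid");
mysql_query("UPDATE messages SET sender_name='$newname' WHERE sender_id=$personid");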

It’s actually just another form of caching, on the memory-intensive end of the classic performance/memory tradeoff spectrum. Memcached is another layer of caching that works well if you’ve got a lot of repeated queries, though again it complicates the update logic. Indexing within a database is another form of caching frequently needed data, though that’s handled behind the scenes for you.

There are some fascinating case studies out there on how sites like eBay and Flickr have broken all the old rules to get the performance they need. Google’s BigTable doesn’t specify anything about normalization, but the fact that it’s a simple map between keys and values, with no complex queries possible, makes it very tempting to duplicate your data with keys for the common read operations.