How to time mysql queries in PHP

Photo by BadBoy69

When you run a mysql query in the console, you get a line of information telling you how long it took to run. I was hoping to pull the same information in PHP to help me profile my database usage, but unfortunately there isn’t any way to access that directly through the API.

What you can do instead is time the mysql_query() call itself, by recording a time stamp before and after, and subtracting to get the total. This isn’t ideal since it will include a small amount of overhead for things like the socket connection to the database, but it will be good enough for most purposes. This is the code I’m using, as seen in phpMyAdmin:

// Record a time stamp just before the query, using the same
// microtime() parsing that phpMyAdmin uses
list($usec, $sec) = explode(' ', microtime());
$querytime_before = ((float)$usec + (float)$sec);

// your query goes here

// Record a second time stamp once the query has returned
list($usec, $sec) = explode(' ', microtime());
$querytime_after = ((float)$usec + (float)$sec);

// The difference is the elapsed time in seconds
$querytime = $querytime_after - $querytime_before;

$strQueryTime = 'Query took %01.4f sec';
printf($strQueryTime, $querytime);
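
If you find yourself timing queries in a lot of places, it's easy to wrap the pattern up in a helper. Here's a minimal sketch, assuming the mysql_* API; the function name and the use of microtime(true) to skip the explode() step are my own:

function timed_query($sql)
{
    // microtime(true) returns the time directly as a float
    $before = microtime(true);
    $result = mysql_query($sql);
    $elapsed = microtime(true) - $before;
    printf('Query took %01.4f sec<br/>', $elapsed);
    return $result;
}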

How to speed up your website with Yslow

Photo by Ezu

One of the downsides of the increase in widgets and customization over the last few years is that they often result in a web page that takes seconds to load. Thanks to my desktop app heritage, I’m really sensitive to this, since poor responsiveness in an application destroys the user experience. The emotional response to waiting is frustration, which both gives users a subconscious motivation to avoid your service and a chance to get distracted by something else and abandon it.

That’s made me very wary of installing new widgets on this blog, since I sometimes see long loading times even now, and I’ve never quite been sure why. I wanted a new discussion service though, and Intense Debate looked very appealing, so I resolved to install it and also figure out how to profile my site.

Firebug is the best tool for getting under the hood and understanding what Firefox is up to when you load a page, but it’s aimed more at debugging script, CSS and markup problems than at understanding performance issues. That’s where Yslow, a free plugin for Firebug from Yahoo, comes in.

It’s based on some principles of website optimization that Yahoo have worked out. It applies these rules programmatically to your page and then gives you a report card detailing the problems in each area. My site received an F. There are a whole lot of improvements I’ll be looking at implementing, but one that’s interesting is setting a long expiry time for external objects like scripts and images. This is inconvenient when you change a resource on the server, since you also need to change its name, but Yahoo estimate that 80% of fetches can be avoided in a typical scenario if you set an expiry header that allows the browser to cache the resource locally. I’ll be poking some of my widget providers to see if that’s possible.
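
For resources you serve through PHP yourself, setting that header is straightforward. Here's a minimal sketch, assuming an image whose URL changes whenever its content does, so a year-long expiry is safe; the filename is made up for illustration:

<?php
// Allow the browser to cache this response for a year
$one_year = 60 * 60 * 24 * 365;
header('Content-Type: image/png');
header('Cache-Control: public, max-age=' . $one_year);
header('Expires: ' . gmdate('D, d M Y H:i:s', time() + $one_year) . ' GMT');
readfile('logo_v2.png'); // hypothetical versioned image file
?>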

I highly recommend giving Yslow a shot on your site; you’ll learn a lot about what your page loads actually involve, and probably get some ideas on improving performance.

When should you use sessions in PHP?

Photo by BrittneyBush

For anyone used to traditional desktop programming who’s switching to the web, one of the hardest things to wrap your head around is the lack of state. There’s no inherent way of keeping information around while you’re interacting with a user. Each page request starts with a blank slate; you don’t have in-memory variables that can keep track of useful information.

If you’re working in PHP, this is where sessions look like a great solution. They’re a general-purpose mechanism built around cookies, and let you store arbitrary variables that are remembered across all page requests from a particular user. Under the hood, they set a single session id cookie on the user’s machine that’s sent along with any subsequent page requests. That id is used to load a file from the server’s disk containing the names and values of the variables stored for that user. Any changes or additions the server makes to the data are saved back into that same file.

From the programmer’s point of view, you call session_start() and then have access to a global associative array, $_SESSION[]. You set and read entries in this array, and they persist across page requests for a given user as long as they keep sending the cookie. This all looks like a very natural model for storing state, one that traditional app programmers would feel very comfortable with. You could do something similar by setting cookies directly, but then you’re exposing a lot of information to the user and opening the door to malicious tinkering with your internal server variables.
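
As a concrete example, here's a minimal sketch of that API in action, counting page views for a visitor; the 'visits' variable name is just for illustration:

<?php
session_start(); // loads (or creates) the session data for this user's cookie

if (!isset($_SESSION['visits']))
    $_SESSION['visits'] = 0;
$_SESSION['visits']++; // written back to the session file when the request ends

echo 'You have viewed this page ' . $_SESSION['visits'] . ' times';
?>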

As you might have guessed, there’s no such thing as a free lunch, and sessions have some significant drawbacks. The data is stored in a file on the server’s disk, which means you’re tied to a single server and can’t load balance without duplicating that file and any changes across all machines. The file is locked so it can only be accessed by one request at a time, so simultaneous requests from the same user get serialized, which is a serious problem if one of them involves a long-running calculation. The locking also results in deadlocks if you’re making sub-requests within the main page request to get parts of the page and passing the session id cookie along manually. In general, the behind-the-scenes nature of sessions makes it tough to tell who’s connected and to debug state problems.

Some of these issues are fixed if you write your own handler to back up the sessions to a database rather than to a file. You still end up locking though, and the database access makes the operation much more expensive. It also requires some planning ahead to know exactly what state you want to store, which abandons a lot of the flexibility that makes sessions so useful.
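
For reference, here's a minimal sketch of what a database-backed handler looks like through session_set_save_handler(); db_query() and the sessions table are hypothetical stand-ins for whatever database layer you're using:

<?php
function sess_open($save_path, $session_name) { return true; }
function sess_close() { return true; }

function sess_read($id)
{
    // Fetch the serialized session data for this id, or '' if there's none yet
    $row = db_query('SELECT data FROM sessions WHERE id=?', array($id));
    return $row ? $row['data'] : '';
}

function sess_write($id, $data)
{
    db_query('REPLACE INTO sessions (id, data, updated) VALUES (?, ?, NOW())',
        array($id, $data));
    return true;
}

function sess_destroy($id)
{
    db_query('DELETE FROM sessions WHERE id=?', array($id));
    return true;
}

function sess_gc($max_lifetime)
{
    db_query('DELETE FROM sessions WHERE updated < NOW() - INTERVAL ? SECOND',
        array($max_lifetime));
    return true;
}

session_set_save_handler('sess_open', 'sess_close', 'sess_read',
    'sess_write', 'sess_destroy', 'sess_gc');
session_start();
?>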

I ended up with my own API for storing and reading information about each session in a database, using a special cookie ID as a key, generated once a user logs in and is authenticated. I also have a convention where the ID is passed through POST or GET parameters to make sub-requests very easy. It isn’t that different from storing sessions in a database, but it does avoid the locking problem, and makes the database cost explicit on the programming side. The fact that it’s associated with a particular user, and can only be created by logging in, makes it harder to spoof too, and lets you limit the number of connections for a single user.

Denormalization: The forbidden optimization

Lambada

One of the key principles you learn about relational database design is to always normalize your data. This gets quietly thrown out of the window once you have to scale up web applications, either to cope with large numbers of users or with large amounts of data. I’m hitting this point with the 500,000 emails in the Enron collection, and chatting with Kwin from Oblong made me realize what a dirty little industry secret this optimization is.

Normalization means storing any fact in just one location in your database, so that you avoid update anomalies where you have conflicting information in multiple places. For instance, if you have someone’s address stored in several tables, you might forget and only update one of them when it changed, but still be using their old address for queries relying on the other tables. The downside to normalization is that many queries require joins to fetch all the information you need, and these can be slow on large data sets.

Denormalization means duplicating facts across different tables so that reads can avoid joins and so run much faster. It’s dangerous because you lose the safety net of automatically robust updates, and you have to make your data-writing code more complex. For my work on mail, and for most web services, the real time is spent on reading data, which is why it’s such an appealing optimization.
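
To make this concrete, here's a minimal sketch in the spirit of the mail data; the table and column names are made up for illustration:

// Normalized: getting a message's sender name means joining to the people table
$normalized = mysql_query(
    'SELECT messages.subject, people.name ' .
    'FROM messages JOIN people ON messages.sender_id = people.id');

// Denormalized: the sender's name is copied into the messages table, so the
// common read avoids the join entirely, at the cost of having to update both
// tables whenever a name changes
$denormalized = mysql_query('SELECT subject, sender_name FROM messages');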

It’s actually just another form of caching, on the memory-intensive end of the classic performance/memory tradeoff spectrum. Memcached is another layer of caching that works well if you’ve got a lot of repeated queries, though again it complicates the update logic. Indexing within a database is another form of caching frequently needed data, though that’s handled behind the scenes for you.
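
If you add memcached to the mix from PHP, the read-through pattern looks something like this minimal sketch using the pecl Memcache extension; the key name and the run_expensive_query() helper are hypothetical:

$cache = new Memcache();
$cache->connect('localhost', 11211);

$key = 'top_senders';
$senders = $cache->get($key);
if ($senders === false)
{
    // Cache miss: run the expensive query and keep the result for five minutes
    $senders = run_expensive_query(); // hypothetical helper
    $cache->set($key, $senders, 0, 300);
}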

There are some fascinating case studies out there on how sites like Ebay and Flickr have broken all the old rules to get the performance they need. Google’s BigTable doesn’t specify anything about normalization, but the fact that it’s a simple map between keys and values, with no complex queries possible, makes it very tempting to duplicate your data under keys for the common read operations.

How to write an Ajax update function with PHP

Photo by Bored-Now

I’ve been writing a lot of Ajax code to request some information from a server and then update an element on the page with the returned HTML. The basic XMLHttpRequest code to do this is pretty simple, but I’ve specialized the code to do a couple of common things. First, it always replaces the HTML of the element with the ID given in $replacename. Second, it takes in a Javascript variable name so you can dynamically alter the URL parameters that are passed in. That second part is really useful when you want a client-side event to trigger the fetch: you can write <select onchange="yourajaxfunction(this.value);"> in a menu, and then define the values in each menu item. Here’s the PHP code for the function body:

function add_ajax_fetch_script($fetchurl, $parametersjsvar, $replacename)
{
?>
    var xhr;

    // Try the ActiveX objects first for older IE, then fall back to
    // the native XMLHttpRequest object
    try
    {
        xhr = new ActiveXObject('Msxml2.XMLHTTP');
    }
    catch (e)
    {
        try
        {
            xhr = new ActiveXObject('Microsoft.XMLHTTP');
        }
        catch (e2)
        {
            try
            {
                xhr = new XMLHttpRequest();
            }
            catch (e3)
            {
                xhr = false;
            }
        }
    }

    // When the request completes, replace the target element's HTML
    // with whatever the server sent back
    xhr.onreadystatechange = function()
    {
        if (xhr.readyState == 4)
        {
            if (xhr.status == 200)
                document.getElementById("<?=$replacename?>").innerHTML = xhr.responseText;
            else
                document.getElementById("<?=$replacename?>").innerHTML = "Error code " + xhr.status;
        }
    };

    xhr.open("GET", "<?=$fetchurl?>"+<?=$parametersjsvar?>, true);
    xhr.send(null);

<?php
}

To use this code, you’d write out the signature and name of your Javascript function, call add_ajax_fetch_script(), and then terminate the JS function with a closing curly brace. For example:

function yourajaxfunction(urlsuffix)
{
<?php
add_ajax_fetch_script("http://someurl.com&quot;, "urlsuffix", "someelementid");
?>
}
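
On the server side, the URL you're fetching just needs to print the HTML fragment you want injected. Here's a minimal sketch of such an endpoint; the parameter name and markup are made up for illustration, assuming the menu passes something like ?item=42 as the suffix:

<?php
// Hypothetical endpoint for the Ajax call above
$item = isset($_GET['item']) ? (int)$_GET['item'] : 0;
echo '<p>You picked item number ' . $item . '</p>';
?>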

How to set up Linux for web development through Parallels on OS X

Photo by CarlosLuis

I wanted a setup for web development that matched my production server, but let me do local development on my MacBook Pro. I’m a big fan of Parallels running Windows, so I set out to get Red Hat Fedora server running too. My requirements were that I should easily be able to install extensions that aren’t standard on OS X PHP like IMAP and GD, and that I could save files to my local drive and immediately run them through the server without having to copy anything.

Getting started with Fedora was painless thanks to this ready-made Parallels disk image. I downloaded that and it loaded immediately with no setup required. I then ran yum to update the system to the latest patches, and I was in business. I had to remove yum-updatesd before I could open any of the desktop software installer applications, but once that was working I could run the Add Software application. There I chose the parts I needed, like MySQL, PHP and other random web development additions.

After that was going, I created a test.php file containing <?php phpinfo(); ?> and placed it in /var/www/html/. Loading up Firefox inside Fedora and pointing it to http://localhost/test.php gave me the expected information dump. Everything was going smoothly, so I should have known there was trouble in store.

The only remaining part was adding a link back to my OS X filesystem within Linux, so that Apache could access my files without having to do any copying. Parallels offers a great bridge between the Windows and mac file stores, but I couldn’t find anything that easy for Fedora. What I did run across was sshfs, which uses fuse and ssh to create a virtual folder inside Linux that points back to a directory on a system accessed through the network.

I went through all the steps to get that set up, but spent a very long time getting 403 Permission Denied errors every time I tried to access OS X files through Apache. After a lot of hair-pulling, I figured out how to make everything play nicely together. It involves loosening the permissions model, so I don’t recommend doing this on a production server for security reasons, but it should be fine for local development. Here are the steps:

  • On Linux, make sure you have sshfs, and you’ve added your current Linux user to the fuse group with su -c 'gpasswd -a <username> fuse'
  • Again on Fedora, get fuse running with service fuse start and add it to the startup sequence with echo 'service fuse start' >> /etc/rc.local
  • On OS X, go to Preferences->Sharing and turn on remote login. Make a note of the IP number it displays on that window.
  • To test the remote login, in a Linux terminal window do ssh <mac user name>@<mac IP number>, eg ssh petewarden@10.0.1.196 . If this doesn’t work, you’ll need to stop and check the IP number and your Parallels network setup.
  • Now you can try to set up the filesystem connection through SSH. On your Fedora terminal, type sshfs -o allow_other,default_permissions <mac user name>@<mac IP number>:<Path to your mac folder> /var/www/html/testbed , eg sshfs -o allow_other,default_permissions petewarden@10.0.1.196:/Users/petewarden/Sites/testbed /var/www/html/testbed . The magic bit here is the extra options for allowing other users to access the folder. Without these, the apache user won’t be able to read the files and you’ll get the 403 errors. Be warned though: this step is necessary to fix them, but you’ll also need to follow the steps below before all the permission problems go away.
  • Because the files show up with a user ID and group ID that’s unknown to the Linux filesystem, Apache’s strict security settings will refuse to display them. To get around this, you need to make the security settings less restrictive. First you’ll need to disable Security Enhanced Linux, aka SELinux. To do this through the GUI, go to the Security preferences and click the SELinux tab. Then choose "Disabled" from the dropdown menu. You could also do this on a per-program basis, but I wanted to keep it simple.
  • Next you need to remove suEXEC, another security feature of Apache. To do this just move the file itself, on my system at /usr/sbin/suexec to another location eg mv /usr/sbin/suexec /usr/bin/suexec_disabled
  • Finally, restart Apache with service httpd restart and try navigating to one of your pages. With any luck you’ll now be able to save files from OS X and immediately see the changes in Firefox within Fedora.

One of my main reasons for this setup was to easily install extensions. After going through these steps I was able to just run yum install php-gd to get the gd graphics library, a project that had previously taken me hours of fiddling on OS X even with fink.

Update: I’m now using the Parallels Linux instance directly from OS X. To do that I had to enable HTTP in the firewall settings on Fedora, and then ran ifconfig to work out the IP address that it had acquired. After that I can navigate to http://x.x.x.x/ in my OS X copy of Firefox and access my files, without having to ever switch to the Linux desktop.