Queuing up commands in unix


I’m trying to lay down some decent foundations for processing images for a few thousand users from a single server. Because of the thrashing problem, one of the building blocks has to be a single queue for the CPU-intensive image processing commands. Thrashing is when you run out of physical RAM and end up spending the majority of your time swapping virtual memory blocks on and off disk. This usually happens when there are multiple memory-hungry processes competing; as they’re time-sliced, they repeatedly push each other’s memory out of RAM and pull their own back in. The result is that they actually take a lot longer to all finish than if they’d each run in sequence.

Well, this is an old, old problem, so I was hopeful that there’d be a tried and trusted unix mechanism to deal with it. The simplest way to avoid thrashing in my case is to make sure these operations run one at a time: keep a single queue, and only start the next command once the previous one has completed.

After some digging, I did discover batch, and it looked very promising, but I ran into a few wrinkles. It expected a script file on disk as input, rather than the dynamic commands I’d be feeding it, but I was able to avoid that by piping the output of an echo "<command to execute>" into the batch command. More serious was error reporting; batch will sendmail the results of its executions, but there doesn’t seem to be any other way of accessing that data. My simple tests worked, judging by the results I saw on disk, but some of the more complex ones failed, and I haven’t been able to work out why yet, because none of my server accounts have mail set up.
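
As a rough sketch, driving that pipe from PHP might look something like this; the helper name and the convert invocation are placeholders rather than my actual code:

    <?php
    // Hand one command line to batch(1), which queues it and runs it
    // when the system load allows (batch is part of the at package).
    function enqueue_image_command($cmd)
    {
        // escapeshellarg() quotes the command so it survives the echo;
        // batch reads the job to run from its standard input.
        shell_exec('echo ' . escapeshellarg($cmd) . ' | batch');
    }

    enqueue_image_command('convert /tmp/source.jpg -resize 640x480 /tmp/result.jpg');
    ?>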

Once I’ve got that working, my plan is to put all image processing commands into the queue, rather than executing them synchronously with the PHP page generation as I am at the moment. Then I’ll need a timeout call in the HTML itself, to refresh the page frequently until all the images have been completed. With all that in place, I should be able to show users some evidence that things are happening even when the server’s under heavy load, rather than the blank page they currently see.
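
The refresh half of that, sketched in PHP with a placeholder path, could be as simple as checking whether the queued job has produced its output file yet:

    <?php
    // $processed_path is a placeholder for the expected output file.
    $processed_path = '/var/www/cache/result.jpg';
    if (!file_exists($processed_path)) {
        // Ask the browser to re-request this page every five seconds
        // until the queued job has written the image to disk.
        echo '<meta http-equiv="refresh" content="5">';
        echo '<p>Your image is being processed...</p>';
    } else {
        echo '<img src="/cache/result.jpg">';
    }
    ?>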

Funhouse Photo User Count: 268 total, 78 active. Good to see it still climbing, and I’m not losing too many people. The active count is down, but that’s still around 40% of the users at the time the count was taken, so it’s a good base to build on.

2 responses

  1. Hey Pete, one idea to deal seamlessly with asynchronously loading images with high latency in HTML docs is to use the onerror handler in the img tag. You can put a setTimeout in the handler that changes the src attribute to force an image refresh, usually appending some arbitrary GET string to ensure that a persistent browser cache doesn’t interfere with your plans. The onerror will keep getting tripped as long as the server keeps returning a 404, so each image loops on its own until it loads properly.
    If you want to ensure a fresh source image, you can have the load of the PHP page hosting the images kick off the batch process that creates the images on disk, while the onerror handler loops, replacing the src attribute after a one-second pause each time, until it stops getting tripped.
    Or if you really want to conserve resources, you can set up the first trip of the onerror handler to make an Ajax call to the PHP script that uses ImageMagick to create the dynamic image.
    This pattern requires your ImageMagick script to write the image out to a predictable disk location, as opposed to printing a PNG MIME type directly as its output.
    The beauty of the pattern is that you let the browser and web server’s built-in 404 detection drive the PHP ImageMagick code optimally, so no image is calculated twice, and there’s no unnecessary delay between the user and the image (there’s a sketch of this loop after the comments).
    Good luck, and I’m looking forward to hearing more about your progress.
    -Stephen

  2. Thanks for all that, Stephen, you’ve obviously thought this through pretty deeply!
    Part of my current approach is to encode the source URL and the image operations into the result image URL, so it’s both a unique location and a recipe for producing the final image (e.g. go and fetch the image at URL foo, then apply the operation named bar); there’s a sketch of this encoding after the comments. The image is created if it’s not already present, but if it’s there, the pre-baked image is served up.
    One of the ideas I have been thinking about for distributing the image processing amongst multiple servers is having the processing be triggered directly by an image fetch, but I need to develop that thought a bit more…
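
Here’s a rough sketch of the onerror loop Stephen describes, written as a PHP page emitting the img tag; the cache path is a placeholder, and once the image finally loads the handler just stops firing:

    <?php
    // $src is a placeholder for the predictable disk location that
    // the ImageMagick script writes the finished image to.
    $src = '/cache/processed.png';
    ?>
    <img src="<?php echo $src; ?>"
         onerror="var img = this;
                  setTimeout(function () {
                      // the cache-busting query string forces a re-fetch
                      img.src = '<?php echo $src; ?>?' + new Date().getTime();
                  }, 1000);">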
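
And here’s a sketch of the recipe-style URLs from my reply; the function names and cache layout are made up for illustration, the point being that the encoding is reversible, so an image fetch alone carries enough information to create the file if it’s missing:

    <?php
    // Illustrative only: pack the source URL and operation name into
    // the result image URL, using URL-safe base64 so the server can
    // decode it again and do the work when the file doesn't exist yet.
    function recipe_url($source_url, $operation)
    {
        $encoded = strtr(base64_encode($source_url), '+/', '-_');
        return "/cache/{$operation}/{$encoded}.jpg";
    }

    function decode_recipe($encoded)
    {
        return base64_decode(strtr($encoded, '-_', '+/'));
    }

    echo recipe_url('http://example.com/foo.jpg', 'bar');
    ?>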
