How to speed up key/value database operations using a RAM cache

Ram
Photo by Olduser

In my previous post I gave raw timings for a typical analysis job on top of various key/value stores. In practice I use another trick to speed up these sort of processes; caching values in RAM for the duration of a run and then writing them all out to the store in one go at the end. This helps performance because there's some locality in the rows I'm accessing, so it's worth keeping previously-fetched or written data in memory and reducing the amount of disk IO needed. If this helps you will depend on your database usage patterns, but I've found it invaluable for my analysis of very large data sets.

The way I do this is by creating a PHP associative array mapping keys to values, populating it as I fetch from the store, and delaying writes until a final flushToDisk() call at the end of the script. This is very inelegant, it means you have to watch PHP's memory usage since the default 32MB max is easy to hit, and ensuring that final flush call is made is error-prone. The performance boost is worth it though, here's the figures using the same test as before, but with the cache enabled:

Null: 22.1s

Ram: 23.9s

Redis domain: 27.5s

Memcache: 27.9s

Redis TCP: 29.6s

Tokyo domain: 29.9s

Mongo 31.1s

Tokyo TCP: 33.6s

MySQL: 182.9s

To run these yourself, download the PHP files and add a -r switch to enable the RAM cache, eg

time php fananalyze.php -f data.txt -s mongo -h localhost -p 27017 -r

They're all significantly faster than the original run with no caching, and Redis using domain sockets is approaching the speed with no store at all, suggesting that the store is not the bottle-neck for this test. In practice, most of my runs are with hundreds of thousands of profiles, not 10,000, and the RAM cache becomes even more of a win, though the space used expands too! I've included the code for the cache class below:

<?php

// A key value store interface that caches the read and written values in RAM,
// as a PHP associative array, and flushes them to the supplied disk-based
// store when storeToDisk() is called

require_once('keyvaluestore.php');

class RamCacheStore implements KeyValueStore
{
    public $values;
    public $store;
   
    public function __construct($store)
    {
        $this->store = $store;
        $this->values = array();
    }

    public function connect($hostname, $port)
    {
        $this->store->connect($hostname, $port);
    }
   
    public function get($key)
    {
        if (isset($this->values[$key]))
            return $this->values[$key]['value'];

        $result = $this->store->get($key);
        $this->values[$key] = array(
            'value' => $result,
            'dirty' => false,
        );
       
        return $result;
    }

    public function set($key, &$value)
    {
        $this->values[$key] = array(
            'value' => $value,
            'dirty' => true,
        );
    }
   
    public function storeToDisk()
    {
        foreach ($this->values as $key => &$info)
        {
            if ($info['dirty'])
            {
                $this->store->set($key, $info['value']);
                $info['dirty'] = false;
            }
        }
    }
   
}

?>

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: