How to run CURL fetches in parallel in PHP

Brightthreads (photo by Incurable Hippie)

PHP is my workhorse language, for a lot of reasons I'll need to blog about soon. One problem I keep running into though is how serial it is, especially when it comes to making CURL calls to access web APIs. FindByEmail is a great example; I'm calling over a dozen different APIs, and I have to wait for each one to finish before I can call the next. It can easily take 20 seconds to run the script, where almost all the time is spent idly waiting for a single API call to complete.

There is a solution for this sort of problem: curl_multi_exec() lets you fire off multiple CURL requests at once. Unfortunately the interface is awful; it feels like I might as well be writing in C, which is unsurprising since it's a thin layer over the underlying C library. Typically I'm going through some inputs and fetching URLs as I go, so to speed up those sorts of tasks I wrote a much simpler interface on top of the curl_multi engine. ParallelCurl lets you just specify a URL and a callback function, and it handles all the mechanics of running your requests simultaneously.
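
To give a sense of what the raw interface involves, here's roughly the boilerplate you end up writing with the curl_multi functions directly (a minimal sketch with placeholder URLs, and error handling left out):

<?php
// Fetch several URLs at once using the raw curl_multi_* functions.
$urls = array('http://example.com/one', 'http://example.com/two');

$multi = curl_multi_init();
$handles = array();
foreach ($urls as $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_multi_add_handle($multi, $ch);
    $handles[] = $ch;
}

// Drive all the transfers until none are still running.
$running = 0;
do {
    curl_multi_exec($multi, $running);
    curl_multi_select($multi); // wait for activity rather than busy-looping
} while ($running > 0);

// Collect the results and clean up.
foreach ($handles as $ch) {
    $content = curl_multi_getcontent($ch);
    // ... do something with $content here ...
    curl_multi_remove_handle($multi, $ch);
    curl_close($ch);
}
curl_multi_close($multi);
?>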

To give you an idea of the possible speedup, the test script, which makes 100 calls to Google's search API, takes nearly two minutes without parallelization but runs in 11 seconds with 20 requests in flight at once. Of course you need to be careful not to overwhelm your target server!

The code's up on github, and using it is as simple as calling:

$parallelcurl->startRequest('http://example.com', 'on_request_done', array('something'));
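
For a fuller picture, here's a sketch of how a fetch loop might look. startRequest() matches the call above; the constructor argument, the callback signature, and the finishAllRequests() call are my assumptions about the rest of the interface, so check the repository for the actual details:

<?php
require_once('parallelcurl.php');

// Assumed callback signature: the page content, the URL, the cURL handle,
// and whatever user data was passed in to startRequest().
function on_request_done($content, $url, $ch, $user_data) {
    $httpcode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    if ($httpcode !== 200) {
        print "Fetch error $httpcode for '$url'\n";
        return;
    }
    print "Fetched '$url'\n";
}

$max_requests = 10; // how many fetches to keep in flight at once (assumed constructor argument)
$parallelcurl = new ParallelCurl($max_requests);

foreach (array('http://example.com/one', 'http://example.com/two') as $url) {
    // Queue the request; the callback fires as each fetch completes.
    $parallelcurl->startRequest($url, 'on_request_done', array('something'));
}

// Block until every outstanding request has finished (assumed method name).
$parallelcurl->finishAllRequests();
?>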
