Generalized PHP Caching Solution

I recently became responsible for maintaining the service level of a Joomla! website. The site is hosted on a CentOS VPS instance with NGINX and PHP running under FastCGI. It had previously been hosted on a shared server, but was forced to move to a VPS when the load grew.

Unfortunately, even on the VPS, the site was getting crushed by traffic. It was taking upwards of 10 seconds to render a page through Joomla!, and the site wasn’t even under *that* heavy a load (approximately 1,200 unique visitors per day). I don’t know whether something is wrong with the Joomla instance or whether the server has simply reached the limit of what it can handle.

Obviously, 10 seconds to render a page is unacceptable, so this problem had to be solved one way or another. Faced with a *live* website whose performance needed fixing ASAP (like yesterday), I didn’t have the luxury of experimenting with Nginx configuration variables and just hoping. I needed to do something quick that had a high probability of success.

I chose to write a generalized page cache in PHP.

When implementing a caching strategy you really need to be aware of how the site works so that you can be sure you’re not breaking things. Luckily, this site is pretty static. Only a few select people need to log in and add content; the rest of the public just needs to view it. This is a good situation, because it means I don’t have to fiddle with a separate cache for every user. I can use the cache for the public and skip it for users who are logged in (or trying to log in).

Note that there are a number of plugins available for Joomla that might help improve performance, but, not being a Joomla expert, I decided to treat Joomla as a black box and implement the cache in front of it. All that was necessary was to add a small hook to the beginning of the index.php file that checks the file-system cache for a cached version of the currently requested page. If the page is there, we return it and stop execution without even loading Joomla. If the page is not there, we register a shutdown hook and let the request continue to Joomla. At the end of the request, the shutdown hook runs, saves the generated page to the cache, and outputs it to the client.

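The hook itself is only a few lines: it calls checkCache(), and if no cached copy is found, it starts output buffering by calling ob_start() with flushBuffer() as the callback.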

The call to ob_start() causes PHP to write all output to a buffer rather than to the client. When script execution completes, PHP passes the buffered content to flushBuffer(), a function we define ourselves. flushBuffer() is responsible for saving the content to the cache so that it can be served from there next time.

Now, let’s look at some of the secret sauce inside the checkCache() and flushBuffer() functions.

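checkCache() looks for a cached file for the requested page; if one exists and has not expired, it sends that file to the client and stops execution.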

My actual code includes quite a few more options than this to account for whether the user is logged in and maintaining certain headers in the cache, but this gives you an idea of the workflow.
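As a rough illustration of those extra options, the sketch below classifies each request as `use` (serve from and write to the cache), `refresh` (regenerate and overwrite the cached copy) or `bypass` (never touch the cache). The cookie name `site_logged_in`, the `X-Cache-Refresh` header and the login paths are hypothetical placeholders, not Joomla! defaults; substitute whatever your installation actually sets.

```php
<?php
// Sketch only: decide how a request interacts with the page cache.
// Requires PHP 8 (str_starts_with). All names below are hypothetical.

const BYPASS_COOKIES = ['site_logged_in'];           // hypothetical cookie set after login
const LOGIN_PATHS    = ['/administrator', '/login']; // hypothetical login / back-end URLs

function cacheMode(array $server, array $cookies): string
{
    // Only plain GETs are cacheable; POSTs include login attempts.
    if (($server['REQUEST_METHOD'] ?? 'GET') !== 'GET') {
        return 'bypass';
    }
    $path = parse_url($server['REQUEST_URI'] ?? '/', PHP_URL_PATH) ?: '/';
    foreach (LOGIN_PATHS as $prefix) {
        if (str_starts_with($path, $prefix)) {
            return 'bypass'; // login pages are never cached
        }
    }
    foreach (BYPASS_COOKIES as $name) {
        if (isset($cookies[$name])) {
            return 'bypass'; // logged-in users always see live pages
        }
    }
    // A hypothetical "X-Cache-Refresh" request header forces regeneration.
    if (isset($server['HTTP_X_CACHE_REFRESH'])) {
        return 'refresh';
    }
    return 'use';
}
```

In the index.php hook, only a `use` request would consult checkCache(); `use` and `refresh` requests would start output buffering with flushBuffer(), and `bypass` requests would go straight to Joomla untouched.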

The flushBuffer() function is simpler still.

All it does is save the buffered content to a file in the filesystem and return the buffer contents so that they are displayed to the user.
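Put together, the hook, checkCache() and flushBuffer() might look something like the sketch below. It is not the original code: it assumes a cache directory under the system temp directory, one file per URL named by the md5 of `REQUEST_URI`, a 24-hour lifetime, and that only 200 responses are worth caching. `cacheFile()`, `cacheHook()`, `CACHE_DIR` and `CACHE_TTL` are hypothetical names. Unlike the description above, this checkCache() returns the page and leaves echoing and exiting to the hook, which keeps it easy to test.

```php
<?php
// Sketch of a file-system page cache in front of Joomla!.
// Assumptions: cache location, md5-of-URI file names, 24-hour TTL,
// and only 200 responses are cached.

define('CACHE_DIR', sys_get_temp_dir() . '/page-cache'); // hypothetical location
define('CACHE_TTL', 86400);                              // seconds (24 hours)

// Map a request URI to its cache file.
function cacheFile(string $uri): string
{
    return CACHE_DIR . '/' . md5($uri) . '.html';
}

// Return the cached page if a fresh copy exists, otherwise null.
function checkCache(string $uri): ?string
{
    $file = cacheFile($uri);
    if (!is_file($file) || time() - filemtime($file) >= CACHE_TTL) {
        return null;
    }
    $page = file_get_contents($file);
    return $page === false ? null : $page;
}

// Output-buffer callback: store the finished page, then return it
// unchanged so PHP sends it to the client.
function flushBuffer(string $buffer): string
{
    $status = http_response_code(); // false when not running under a web server
    if ($buffer === '' || ($status !== false && $status !== 200)) {
        return $buffer; // don't cache empty or error pages
    }
    if (!is_dir(CACHE_DIR)) {
        @mkdir(CACHE_DIR, 0775, true);
    }
    $file = cacheFile($_SERVER['REQUEST_URI'] ?? '/');
    // Write to a temp file, then rename it over the real one.
    $tmp = $file . '.' . getmypid() . '.tmp';
    if (file_put_contents($tmp, $buffer) !== false) {
        rename($tmp, $file);
    }
    return $buffer;
}

// Call this as the first statement of index.php.
function cacheHook(): void
{
    $page = checkCache($_SERVER['REQUEST_URI'] ?? '/');
    if ($page !== null) {
        echo $page;
        exit; // cache hit: Joomla! is never loaded
    }
    // Cache miss: buffer everything Joomla! prints; PHP hands the
    // buffer to flushBuffer() when it is flushed at the end of the request.
    ob_start('flushBuffer');
}
```

index.php would then begin by requiring this file and calling `cacheHook();` before any Joomla! code runs. Writing to a temporary file and renaming it means that, on a POSIX filesystem, a concurrent request sees either the old copy or the complete new one, never a partially written page.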

Even this tiny addition reduces the server load dramatically, and it buys us some time to refine the caching strategy further.

Some small additions that you’ll likely want to make to this setup include:

  1. Allow a particular header or cookie to refresh a page on demand.
  2. Skip the cache if the user is accessing a login page or is already logged in.
  3. Save certain headers of the original response and send them back with the cached page. (Note: you should not retain cookies or other headers that could contain confidential information; keep just Content-Type and other stateless headers.)
  4. Adjust the expiry time as appropriate for your setup.
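For item 3, one way to decide which headers survive is an allowlist applied to `headers_list()` when the page is saved, with the kept lines replayed through `header()` before a cached page is echoed. The allowlist contents and the function names `cacheableHeaders()` and `replayHeaders()` are assumptions, and how the kept lines are stored next to the page (for example, JSON-encoded in a sidecar file) is left open.

```php
<?php
// Sketch only: keep stateless response headers, drop everything else
// (Set-Cookie in particular). The allowlist is an assumption.

const CACHEABLE_HEADERS = ['content-type', 'content-language'];

// Filter raw "Name: value" lines, e.g. the array returned by headers_list().
function cacheableHeaders(array $headerLines): array
{
    $kept = [];
    foreach ($headerLines as $line) {
        $name = strtolower(trim(strstr($line, ':', true) ?: ''));
        if (in_array($name, CACHEABLE_HEADERS, true)) {
            $kept[] = $line;
        }
    }
    return $kept;
}

// Send stored header lines before echoing a cached page.
function replayHeaders(array $headerLines): void
{
    foreach ($headerLines as $line) {
        header($line);
    }
}
```

Using an allowlist rather than a blocklist means a header nobody thought about (a new session or tracking header added by an extension, say) is dropped by default instead of being served to every visitor.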

Unfortunately, our site still has a problem. We can’t increase the expiry time beyond 24 hours because information is refreshed too frequently, and one change can potentially update any page on the site. Combined with the fact that there are thousands of articles on the site, this means there is a high likelihood of cache misses. If the site is being crawled by a couple of search engines, Joomla is still very likely to have to handle multiple simultaneous requests, and this could still crush the server.

One solution, whose implementation I’ll describe in the next article, is to not refresh the cache immediately when an expired page is requested, but rather to just return the expired page to the client, and queue the page for refresh. Then have a daemon run in the background to refresh one page at a time, based on the refresh queue. This will ensure that Joomla is never overloaded.
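The queue half of that idea can be sketched with a plain text file holding one URI per line, guarded by `flock()` so the web requests adding entries and the daemon removing them don’t trample each other. The file format and the names `queueRefresh()` and `nextRefresh()` are assumptions; the implementation in the follow-up article may differ.

```php
<?php
// Sketch only: a file-backed refresh queue, one URI per line.

// Called when an *expired* page is served: queue it once.
function queueRefresh(string $queueFile, string $uri): void
{
    $fh = fopen($queueFile, 'c+'); // create if missing, don't truncate
    if ($fh === false) {
        return;
    }
    flock($fh, LOCK_EX);
    $queued = array_filter(explode("\n", (string) stream_get_contents($fh)), fn ($l) => $l !== '');
    if (!in_array($uri, $queued, true)) {
        fseek($fh, 0, SEEK_END);
        fwrite($fh, $uri . "\n");
    }
    flock($fh, LOCK_UN);
    fclose($fh);
}

// Called by the background daemon: remove and return the oldest URI, or null.
function nextRefresh(string $queueFile): ?string
{
    $fh = fopen($queueFile, 'c+');
    if ($fh === false) {
        return null;
    }
    flock($fh, LOCK_EX);
    $queued = array_values(array_filter(explode("\n", (string) stream_get_contents($fh)), fn ($l) => $l !== ''));
    $uri = array_shift($queued); // null when the queue is empty
    ftruncate($fh, 0);
    rewind($fh);
    fwrite($fh, $queued ? implode("\n", $queued) . "\n" : '');
    flock($fh, LOCK_UN);
    fclose($fh);
    return $uri;
}
```

The daemon would loop on `nextRefresh()`, request each URI from Joomla with the cache bypassed so a fresh copy gets written, and sleep briefly whenever the queue is empty. Because it handles one URI at a time, Joomla never has to render more than one refresh at once on its behalf.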
