Quick and Dirty PHP Caching

Caching your database-driven website pages has a plethora of benefits, not the least of which being improved speed and reduced server loads. This article will explain how to set up a simple caching system, and will also address when and where caching might not be appropriate.

For me, the impetus to switch to a caching method for one of my database-driven sites was Mosso, since they bill by CPU cycle, and I have one site that is, well, humongous (60k+ pages), and it happens to be the highest-traffic site on the account. While the database queries were all very efficient, and each page had, on average, no more than 6 queries, implementing a cache would noticeably help both performance and CPU usage. This caching solution was a temporary fix, while we switched to a new CMS that was already using a robust caching system. It’s quick, it’s dirty, but it got the job done for the interim.

We’ll walk through how to execute a simple PHP cache, and then I’m going to explain how doing so without a little forethought will screw you right in the ear. Note that this is called a Quick and Dirty solution for a reason. There are more complex, more efficient methods available, but this covers some basics.

Using output buffering, caching pages is incredibly easy. Simply put, output buffering allows you to control when output is sent from the script. This is particularly handy if you’re using cookies or sessions or some other process that sends headers to the browser before the page loads (as anyone who has gotten those pesky “headers already sent” errors can tell you.)
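To make that concrete, here is a minimal sketch of output buffering on its own, before any caching is involved. The capture_output() helper is a name I made up for illustration; ob_start() and ob_get_clean() are the standard PHP functions doing the work.

```php
<?php
// Hypothetical helper: run $render and return whatever it echoed as a string.
function capture_output(callable $render): string
{
    ob_start();                     // start buffering; nothing reaches the browser yet
    $render();                      // any echo/print in here goes into the buffer
    return (string) ob_get_clean(); // hand back the buffer contents and stop buffering
}

$html = capture_output(function () {
    echo "<h1>Hello</h1>";
    echo "<p>Generated at request time.</p>";
});

echo $html; // the output is only sent now
```

Because the output sits in a string until you echo it, you are free to do something else with it first, such as writing it to a file.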

Please note that this article assumes your cache files will be created in a directory called ‘cache’ – and that this cache directory must be writable by the webserver.

Please also note: the syntax highlighter was made of fail for this article and was double, sometimes triple converting HTML entities. I have fixed it a dozen times, and then every time I edit the post, I have to fix it all over again. So if you notice any funky characters that don’t look like they belong in the script snippets, they probably don’t. Let me know and I’ll fix them, yet *again*.

The basic stuff

In all its 6-lines of glory, this is actual, working caching code.

// TOP of your script
ob_start();   // start the output buffer
$cachefile ="cache/cachefile.html";
// Your normal PHP script and HTML content here
// BOTTOM of your script
$fp = fopen($cachefile, 'w'); // open the cache file for writing
fwrite($fp, ob_get_contents()); // save the contents of output buffer to the file
fclose($fp); // close the file
ob_end_flush(); // Send the output to the browser

There are, of course, two major flaws with just using the script above. First, we’re always writing to the same cachefile.html file, which would only be useful if your website had only one page. And second, notice that the script writes to the cache, but never actually retrieves the cache file – it’s still running through the whole script every time. But, this is just the beginning. That’s all there is to the actual caching part – the rest of this article will deal with the when/where of caching, but the how is that right there.

Which brings us to the next step… adding the ability to check whether or not a cache file exists, and use that instead of running through the normal script. We’re going to keep using the one-page website model for now, but I’ll get into creating cache files for different pages later.

Checking for a cache file

Creating the cache file from database-driven content is easy, as we’ve seen – but it’s only useful if we actually check whether a cache file exists and serve that instead of live database output. Using the modification below, we check to see if a cache file already exists, and if it does, we include it and exit instead of running through the normal PHP script.

// TOP of your script
ob_start();   // start the output buffer
$cachefile = 'cache/cachefile.html';
if (file_exists($cachefile)) {
// the page has been cached from an earlier request
include($cachefile); // include the cache file
exit; // exit the script, so that the rest isn't executed
} 

This is marginally more useful, since it actually prevents the script from executing if a cache file exists, however the way this is currently written, it will include that file for an indefinite time, never actually executing your full script again. Normally, in a cache situation, we want the ability to “expire” content after a certain time, so an updated version will be displayed and cached. You could automatically force a new page cache file to be generated by setting a cron job to automatically delete your cache files every hour/day/week/whatever – or you could handle this on the script level.

Setting cache urls

In our examples, we’ve been using cache/cachefile.html as the filename for the cache file that is generated. As I mentioned, this is great if your site is only one page, but otherwise every page this script is run on will create the same cache file, so you’ll end up serving the same cached file as content for every page on your site. Not awesome.

The easiest way to create individual cache files for each specific page is to do something like this:

$cachefile = basename($_SERVER['SCRIPT_URI']);

This takes the unique url of the page requested and uses that as the cached filename.

But, there’s a gotcha. If your site uses pages that pass GET requests, such as a search page, etc – the SCRIPT_URI won’t see that as part of the url, so once someone does a search, all subsequent search requests will serve that same cached file unless you make the file name unique to each GET request.

In other words, if your search is located at yoursite.com/search.php, and when someone performs a search, the url looks something like yoursite.com/search.php?q=foo, PHP sees that url as search.php, regardless of the query string. So basically, it will break your search, big time.

NOTE: It may not be worth caching every GET request if your site doesn’t get a lot of traffic to files that use this. Or if disk space is a concern. Since there are a potentially unlimited number of GET strings that could be passed to your script (even bogus ones that don’t return valid results on your site), you may want to evaluate whether or not caching search pages is appropriate. In my case, it was – but it may not be for everyone. At the very least, if you opt to do this, make sure you’ve got some sanity checking in there so some asshole with a grudge can’t just sit there creating new, bogus query strings to eat up your disk space.
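As a rough sketch of that kind of sanity checking, you could build the cache file name only from the GET parameters your page actually uses and ignore everything else. The cache_key() function and the parameter names "q" and "page" are assumptions for illustration, and it uses an md5() hash of the allowed parameters to keep file names short and predictable.

```php
<?php
// Hypothetical helper: build a cache file name from the script name plus
// only the GET parameters we explicitly allow. Unknown parameters are ignored,
// so bogus query strings cannot create an unlimited number of cache files.
function cache_key(string $script, array $get, array $allowed): string
{
    $params = array_intersect_key($get, array_flip($allowed));
    ksort($params); // the same parameters in a different order give the same key
    if (!$params) {
        return $script; // no relevant GET data: plain file name
    }
    return $script . '_' . md5(http_build_query($params));
}

// Assumed usage: search.php only cares about "q" and "page".
$cachefile = 'cache/' . cache_key('search.php', $_GET, ['q', 'page']);
```

This doesn't stop someone from requesting thousands of different real search terms, so you may still want a cap on the number or total size of cache files.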

If you decide to cache query string data, you could do something like this:

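$cachefile = basename($_SERVER['SCRIPT_URI']);
if ($_SERVER['QUERY_STRING'] != '') {
// swap '+' and '/' for '-' and '_' so the encoded string is safe in a file name
$cachefile .= '_'.strtr(base64_encode($_SERVER['QUERY_STRING']), '+/', '-_');
}

This basically just grabs the file name, checks to see if there is any GET data passed, and if there is, it generates a url-safe base64-encoded string that you can use as your cache file name. (Plain base64_encode() output can contain "/" and "+", which is why the strtr() call replaces them.)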

Setting an expiration

You have three basic options for expiring your cache:

  1. Set up a cron job to automatically delete all of your cache files at specified intervals
  2. Check the data source file for modification, and expire it if the source file is newer than the cache file
  3. Check the timestamp of the cache file and delete+regenerate if it is older than x

Cron Job: Setting up a cron job to delete your entire cache at specific intervals is arguably the easiest solution, but not really the most efficient, especially with very large websites. Rather than just deleting the page that’s been determined to be expired, you’re deleting (and then subsequently regenerating) a large number of files in one shot.
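If you do go the cron route, a slightly gentler variation is to have cron run a small PHP script that only removes files past a certain age, rather than wiping the whole directory. This is a sketch under assumptions: purge_cache() is a made-up name, and it expects a flat "cache" directory sitting next to the script.

```php
<?php
// Hypothetical cleanup script, meant to be run from cron via the PHP CLI.
// Deletes files in $dir that were last modified more than $maxAge seconds ago
// and returns how many files it removed.
function purge_cache(string $dir, int $maxAge): int
{
    $removed = 0;
    foreach (glob(rtrim($dir, '/') . '/*') ?: [] as $file) {
        if (is_file($file) && time() - filemtime($file) > $maxAge) {
            if (unlink($file)) {
                $removed++;
            }
        }
    }
    return $removed;
}

// Assumed layout: a "cache" directory next to this script.
purge_cache(__DIR__ . '/cache', 2 * 60 * 60); // anything older than 2 hours
```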

Data Source: Checking the data source file for modification is potentially the smartest way to handle caching, since it means the cache would never be expired if the data didn’t change. That certainly makes sense to do, since a page that hasn’t been updated doesn’t need to be regenerated, so you’re really getting the most bang for your caching buck.

The problem arises when you’re caching pages that are dynamically generated based on database records. The actual script that generates the data may not have been changed for quite some time, but the data records you’re fetching from the database may have been changed, so just checking the cache file date against the date the script was last modified will not give you what you need.

A workaround there would be that you could do a quick db query at the top of every page to find out when the record was last modified and compare that to the modification time on the cache file, but that means that every page, even your cached pages, will be performing a database hit on every page load. This may be perfectly acceptable to you, but it’s something to consider. Perhaps a better way of handling this would be to modify the content management system by which you publish content, so that the cache file is only deleted when you publish edits. This method would be the most thorough and efficient way, since your cache file would only be updated when you update something, and would be left to be served statically unless the data has changed. Although that’s outside the scope of this quick and dirty article, extending the code below to accommodate that wouldn’t take much work.
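Both ideas in that paragraph are small in code terms. The sketch below assumes you can get a Unix timestamp for when the record last changed (for example from a "last_modified" column, which is an assumed name), and that your CMS has some kind of publish step you can hook into. The function names are made up for illustration.

```php
<?php
// Hypothetical helper: the cache is fresh only if it exists and is at least
// as new as the last change to the underlying data.
// $dataModified is a Unix timestamp fetched with one cheap query.
function cache_is_fresh(string $cachefile, int $dataModified): bool
{
    return is_file($cachefile) && filemtime($cachefile) >= $dataModified;
}

// Hypothetical hook for the CMS "publish" action: delete the page's cache
// file so the next request regenerates it. Returns true if no cache file
// is left behind.
function invalidate_cache(string $cachefile): bool
{
    return !is_file($cachefile) || unlink($cachefile);
}
```

The first function still costs one database hit per page load; the second costs nothing at request time, since the work happens only when you publish.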

Cache Timestamp: We’re going to address the third option, since it’s the most commonly used and would serve as the foundation for the second option anyway.

// TOP of your script
$cachefile = 'cache/'.basename($_SERVER['SCRIPT_URI']);
$cachetime = 120 * 60; // 2 hours
// Serve from the cache if it is younger than $cachetime
if (file_exists($cachefile) && (time() - $cachetime < filemtime($cachefile))) {
include($cachefile);
echo "<!-- Cached ".date('jS F Y H:i', filemtime($cachefile))." -->";
exit;
}
ob_start(); // start the output buffer 

This script gets the file name, sets a cache time, checks to see if the cache file exists, and if it does, it checks if the cache file is younger than the cachetime. If the cache is still valid, it includes the file and exits the script. If not, it will continue on to execute the script and create a new cache file. It also tacks on a comment at the very end of the cached output that tells you when the file was cached. This can be helpful in debugging, and helping you verify that the page you’re seeing is in fact a cached version, not a live version. (You can see this in action if you view the source of this page and look down at the very bottom of the source code.)

The script, the whole script and nothing but the script

Put all together, this is what our caching script looks like:

// TOP of your script
$cachefile = 'cache/'.basename($_SERVER['SCRIPT_URI']);
$cachetime = 120 * 60; // 2 hours
// Serve from the cache if it is younger than $cachetime
if (file_exists($cachefile) && (time() - $cachetime < filemtime($cachefile))) {
include($cachefile);
echo "<!-- Cached ".date('jS F Y H:i', filemtime($cachefile))." -->";
exit;
}
ob_start(); // start the output buffer
// Your normal PHP script and HTML content here
// BOTTOM of your script
$fp = fopen($cachefile, 'w'); // open the cache file for writing
fwrite($fp, ob_get_contents()); // save the contents of output buffer to the file
fclose($fp); // close the file
ob_end_flush(); // Send the output to the browser

Gee… Oh… Cache challenges

I know. Going to hell for that awful joke. Moving on…

Caching is a great way to speed things up on dynamic sites and save on server resources – however if your site has any kind of more advanced features, you need to be selective about where you apply it. The cache is not smart, so you have to be. Ideally, you’ll be building your caching system into the site as you develop the site and the content administration system – but if you end up having to add caching later, you really have to think everything through.

Examples of things that WILL break if you use caching unless you specifically work around them:

User login: “Welcome, user” logged-in functionality (the first user who logs in will create the cache, and everyone else logging in will see the first user’s name instead of their own!)

Voting: If you have any kind of voting functionality built into your pages, new votes will not be captured and old ratings will be displayed

Anything requiring a POST request: Same as above. The first person submitting the form will get correct results, but anyone submitting it after them will get the first user’s cached results.

Geo-IP lookup: If you’re displaying geographically relevant information to the user based on their IP address, the same rules apply. The first user hitting your site will create the cache file and everyone else accessing it will see the first user’s geographic results instead of their own.

And so on…

That said, all hope is not lost. Depending on the situation and what functionality I’m trying to preserve, I usually handle this one of three ways:

Only serve cached files to users who are NOT logged in. This takes care of a lot of the issues right there – if a user has a profile preferences page, email preferences page, or whatever – all of these will be cached by the first user accessing them. The easy way around this is simply to serve live data to the user if they are logged in, cached pages if they are not. This will reduce the effectiveness of your caching system to some degree, but many users never bother logging in, so you’re still getting a significant savings. (If 90% of your site’s content is only available to logged-in users, you may need to rethink your caching system though.)
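A rough sketch of that check, which also skips the cache for POST requests: it assumes your login code marks a logged-in user with $_SESSION['user_id'], which is an assumed key, so substitute whatever your site actually sets. can_use_cache() is a made-up name.

```php
<?php
// Hypothetical helper: decide whether this request may use the page cache.
function can_use_cache(string $method, array $session): bool
{
    if ($method !== 'GET') {
        return false;                  // never serve or store POST (form) requests
    }
    return empty($session['user_id']); // only anonymous visitors get the cache
}

// At the top of the page, before the cache check (after session_start()):
// $useCache = can_use_cache($_SERVER['REQUEST_METHOD'], $_SESSION ?? []);
```

You would then wrap both the "serve from cache" block and the "write the cache file" block in if ($useCache), so logged-in users neither see nor create cached pages.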

Use AJAX. This is one of the few situations where AJAX really can be 100% appropriate. Since AJAX requests are made separately from the page itself and go to a live script instead of the page cache, this is a great solution for your voting script situations. Mind you, you should make sure your solution degrades gracefully for users who have JavaScript turned off.

Only cache parts of your page instead of the whole thing. With a little more work, you can set up your caching system to only cache parts of your page, and not the entire page. This may reduce the effectiveness of the caching system, but may be necessary depending on your situation.
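Partial caching can reuse the same output-buffering trick, just wrapped around one chunk of the page instead of the whole thing. This is only a sketch: cache_fragment(), the file name, and render_top_stories() are all hypothetical.

```php
<?php
// Hypothetical helper: return the cached copy of one page fragment if it is
// younger than $ttl seconds; otherwise run $render, cache its output, and
// return that instead.
function cache_fragment(string $file, int $ttl, callable $render): string
{
    if (is_file($file) && time() - filemtime($file) < $ttl) {
        return (string) file_get_contents($file);
    }
    ob_start();
    $render();                                 // the expensive part
    $html = (string) ob_get_clean();
    file_put_contents($file, $html, LOCK_EX);  // lock while writing
    return $html;
}

// Assumed usage: cache the heavy part for 10 minutes, render the per-user part live.
// echo cache_fragment('cache/top-stories.html', 600, 'render_top_stories');
// echo 'Welcome, ' . htmlspecialchars($username);
```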

One final gotcha

You should consider a graceful way of handling database failures as well. Say you have your cache time set for 3 days – a long time by some standards, but not at all unreasonable if you have disk space to spare and your content doesn’t update that often. If your database throws an error when your cache file is being regenerated, that error will continue to be displayed for 3 days, even if the database error has been corrected. You should consider how to handle that gracefully, even if it’s a cheap and dirty method. For example, you could set up a website monitoring service that notifies you when your content has changed. If your page isn’t loading properly, you’ll be notified by text or email, and that will give you the opportunity to fix the error and manually blow out your cache so it can regenerate.
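Another cheap approach is simply not to write the cache file when something went wrong while building the page. The sketch below assumes your own code sets a flag (here called $dbOk, a made-up name) to false whenever a query fails; store_page() is hypothetical too.

```php
<?php
// Hypothetical helper: write the buffered page to the cache only when the
// page was generated without errors.
function store_page(string $cachefile, string $html, bool $ok): bool
{
    if (!$ok) {
        return false; // leave any existing (stale but working) cache file alone
    }
    return file_put_contents($cachefile, $html, LOCK_EX) !== false;
}

// BOTTOM of your script (assumed: $dbOk was set while running the queries):
// store_page($cachefile, ob_get_contents(), $dbOk);
// ob_end_flush();
```

That way an error page gets shown to the one unlucky visitor, but it never ends up in the cache for everyone else.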

A note for WordPress users

If you’re using WordPress and are looking for a way to reduce server load and speed your blog up, you’re in luck. WP Super Cache is an unparalleled caching solution for WordPress that is basically plug-and-play, no coding required.

Caching Libraries

Thanks to the fabulous comments to this article (and I genuinely do mean that), I am reminded to remind you that this method is exactly what it says it is - quick and dirty – and it makes NO attempt to be the best solution to your caching needs. It is as much an exercise in considering where caching is appropriate (and inappropriate) as it is anything else.

For more sophisticated (and certainly more elegant) solutions, check out PEAR’s Cache_Lite, XCache (lighttpd), eAccelerator and Zend_Cache, and read up on APC and memcached.

About snipe

I’m a tech geek/dev/infosec-nerd/scuba diver/blacksmith/sword-fighter/crime fighter/ENTP/warcrafter/activist, and the former CTO and CSO at a business innovation agency in New York City. Tweet at me @snipeyhead.
  • Jon

    Me again, follow up. I should have really Googled before commenting because I've got the answer. Use readfile() in place of include(). This outputs whatever is in the file as-is, no processing is done on the content (which I'm guessing is also going to be marginally quicker).

    Using readfile() instead of include() also prevents parsing problems that I had, where certain unexpected ASCII characters were causing the error:

    Warning: Unexpected character in input: '' (ASCII=16) state=1 in /some/directory/yoursite/www.example.com/cache/cachefile.html on line 255

    readfile() sidesteps this problem and lets the browser deal with the character, not PHP.

    Sorry if I went on a bit there. In a bit, Jon.

  • http://www.snipe.net snipe

    That's definitely an option, but you should still consider escaping HTML and other potentially harmful scripting before outputting it to the browser. If you're storing this in a db, you might also consider using mysql_real_escape_string():
    http://php.net/manual/en/function.mysql-real-es

  • http://codestips.com/ Lucas

    Nice tutor!!

  • http://tlrobinson.net/ tlrobinson

    I took this idea one step further and made it into a re-usable script that can be included at the top of any .php to cache the output of that page: http://github.com/tlrobinson/cacheme.php

  • http://www.elnashra.com/ dany

    Really nice article, I've written a very similar script this morning.
    I am the developer of a news website that receives more than 100k visitors a day and for such websites caching is a must but has its downside.
    The best solution for such websites is to use tight expiry time not more than 2 minutes max.

  • mdelannoy

    a little enhancement for those interested:
    <?php
    // beginning of page
    // starting page load time
    $time_start = microtime(true);

    if ($_SERVER['SERVER_NAME'] == "localhost")
        $cachedir = "development_server_cache";
    else
        $cachedir = "production_server_cache";

    $cachefile = $cachedir . $_SERVER['PHP_SELF'];

    if (
        file_exists($cachefile)
        &&
        (filemtime($_SERVER['DOCUMENT_ROOT'] . $_SERVER['PHP_SELF']) < filemtime($cachefile))
        &&
        !isset($_GET["nocache"])
    ) {
        include($cachefile);
        echo "<!-- Cached ".date('jS F Y H:i', filemtime($cachefile))." -->\n";
        $time_end = microtime(true);
        $time = $time_end - $time_start;
        echo "<!-- took me $time seconds to load page -->\n";
        exit;
    }
    ob_start(); // start the output buffer
    ?>
    <?php
    // end of cache
    $time_end = microtime(true);
    $time = $time_end - $time_start;
    echo "<!-- took me $time seconds to serve uncached page -->\n";

    if (!file_exists(dirname($cachefile)))
        mkdir(dirname($cachefile), 0777, TRUE);

    $fp = fopen($cachefile, 'w'); // open the cache file for writing
    fwrite($fp, ob_get_contents()); // save the contents of output buffer to the file
    fclose($fp); // close the file
    ob_end_flush(); // Send the output to the browser
    ?>

  • xcoder

    I built a class for this >> http://www.copypastecode.com/20949/

    All you need to do is include it and call cache::load() at the top of the page and cache::create() at the bottom =)

  • Aussiedude

    Just in case it helps anyone, I’ve been doing my homework on this for a while now, so I can create an effective template/cache system for the site I’m writing right now (in PHP from scratch, it’s really more of a personal learning experience, teaching myself PHP while making a site at the same time).

    I’ve designed my template system to build the page from small templated components that are combined together to form the whole page. Such as one template for how to display dialog messages, comments, profiles, menu’s, etc.

    That way, if the page being requested was requested before, and the output is exactly the same for both, it’ll use a cached version of the entire page.

    However, if not, it’ll use cached versions of certain parts of the page, and for the rest it’ll just make it up as it’s needed.

    (However the downside to this is, for whole page caching situations, it’ll still need to build the little components first before it builds the entire thing, but at least it’ll be making the little components from the cache. Optimally, it’d be best if it could go just straight to the full page cache, but it’s better than nothing for now until I can find a solution for that.)

    My templates just use PHP and the output buffer, I didn’t write my own tag system or anything, so all my templates are just php files in a template folder I load using my website’s main “engine” (if it could be called that?). The variables for the template are just passed through in a single associative array.

    Which makes it easy to tell if a cached version of the template can be used or not. I’ve got a folder called ‘cache’, where versions of each template are stored, with the name of the php file, followed by a short hash of the array’s contents, and unix timestamp.

    Example of a cache file for a template called ‘poll-view.php’:
    poll-view.d81cfdaaa2.1281634990.php

    (Format: file-name.10-character-hash.unix-time-stamp.php)

    If the same array values are passed to the same template, the output will be the same every time. Hence, if two hashes of the array are the same then it’s highly likely the two hashed arrays have the same contents. (As for hashing the array, I just implode it, use md5, then grab the first 10 characters).

    As for how long to keep it for, I also included as you can see, a unix time stamp in the cache files name. When a new cache file needs to be added, a check is done to see if the folder has too much in it, if so, the oldest cache files are removed, until there is enough space.

    Though I haven’t finished that part of the code. Not sure if I’ll make it remove a template when the folder is above a certain number of MB’s, or just based on file count. MB’s would be better I’m guessing, but it depends on how easy it is to do that over counting the number of files in the folder.

    Even still, that does mean for each template component, it’s going to do a fair bit of work to add that cache file. So I’m thinking that might be something for a cron job to do..

    The only real benefit I see myself getting from it all is if I store a lot of cached templates, and if most of the requests use existing cached versions of the templates. If not, I think it might actually make my website slower. But I’m designing it to have an on/off feature, so I can experiment, see if it speeds up the website any, and if so, I’ll use it.

    So, if you’re currently making a website that needs caching right now, hope that helps! :3

    (Sorry for the length of the post and the typos)

  • http://sven.webiny.com Sven

    Caching and reading from PHP is not bad, but using more advanced methods and going around PHP can bring you much better performance. I did a benchmark on my blog where I compared reading cache from PHP and reading cache from Apache, you can read more about it here:
    http://sven.webiny.com/advanced-cache-mechanism-using-php-cpp-and-apache/

  • Anonymous

    Wow, you were inspired a little too much from the original article it seems;

    http://www.theukwebdesigncompany.com/articles/php-caching.php

    • http://www.snipe.net snipe

      Dear “[email protected]” – I’ve never seen that article before, and the two are nothing like each other. Do they cover the same things? Absolutely – it’s a freaking caching article – output buffering is sort of going to come up, and it’s going to be handled the same way every time. All ob caching articles are going to cover similar things because it’s done the same way every time. Duh.

  • http://www.facebook.com/teguh.budimulia Teguh Budimulia

    Awesome. I’ll take the code :)

  • Youwontgetmyemail

    Hey, tried it – really cool!
    I have an addition to your list of caching libraries: Zoocache is very easy to integrate into an application and you can choose a method to store your contents in. Also nice: It provides a blacklist using RegExp and you can write your own function to generate the storage key. This makes it possible to deliver different versions of the page for e.g. different webbrowsers! Hosted at: http://gihub.com/marcelklehr/zoocache.php

    • http://www.snipe.net snipe

      Great, I’ll definitely check that out, thanks!

      Side note, I really hate when people use fake email addresses to leave comments. If you irrationally think I’m going to do something bad with your email address, you shouldn’t bother posting here, since I’m obviously a piece of shit that doesn’t deserve your time – or use one of the oauth options like twitter, so I never even see your email. I find using a fake one pointless and disrespectful. And more frustrating, you won’t even have the option of being notified of this reply.

  • http://www.valpocreative.com ValpoCreative

    Great tutorial, I’ve been trying to implement a nice simple cache system server side. Seems you’re the only developer that bothers to explain how to implement it.

  • Jordan Moore

    Does my job perfectly. Had an extremely simple script which pulled some (potentially thousands) of entries out of a database at a time, but the database wasn’t frequently updated. With a bit of tweaking I’ll be able to serve a cache 100% of the time and only update the cache when the database is updated – thanks!

  • http://www.orphanstear.org/ DangerNerd

    Hello there!

    Thanks for making this idea so accessible.   For some reason, even though I knew this could be done, the application of it for simple caching never even popped into my head!   Simple.  Obvious.  Easy.

    The quoting in your code example has been mangled (you asked us to mention if that happened again) but other than that, it all went well.

    I used this as a starting point and then built automatic directory structure creation into it.   No fun having 600,000+ files in one directory.

    Ended up using readfile() instead of include, and noticed a very sizable performance gain, at least in my setup.  Your concern about escaping the file could be mitigated by the fact that everything that is being output into the file should already be sanitized when written.   Upon re-insertion of that same pre-sanitized data, everything should be ok.

    I also applied this to cache only certain parts of a script’s output.  This way the heavy lifting portion is cached on some pages that have a lot of per-user customization,  without having to cache a different version for every user.

    Thanks again for pointing me in the right direction.

  • jeev

    Been looking over the entire internet for a clear explanation like this one! Kudos!

  • João Verissimo Ribeiro