
If you haven’t noticed yet, this website is powered by WordPress, and its speed is optimized using W3 Total Cache. Previously I was using WP Super Cache, but I recently decided to try W3 Total Cache because its feature set (minify, CDN, etc.) looked better. It actually worked better for my setup, so I kept it as my caching plugin.

But it took me a while to figure out that its “Cache Preload” option was not working well, or at least not working as I expected. “Automatically prime the page cache” was checked, but pages were not in the cache…

After searching a bit I found the culprit. W3 Total Cache relies on the internal cron of WordPress, which is only triggered by site activity. If your site has had no visitors for a while, loading a page will be slower because W3TC: 1. Checks whether the current page is cached and whether that copy has expired. 2. Cleans up the expired page and regenerates a current copy. 3. Finally serves that file.

My first attempt to fix that was setting up a real cron job for wp-cron.php, but that didn’t work well. Cron was running, but W3TC still failed to prime the cache. There are a couple of threads about that on the WordPress support forums.

Optimus Cache Prime (OCP) is a smart cache preloader for websites with XML sitemaps. It crawls all URLs in a given sitemap so the web server builds cached versions of the pages before visitors or search engine spiders arrive.

My next attempt was to use Optimus Cache Prime, a Python script written by Patrick Mylund Nielsen.

That was exactly what I needed… But…

My host had Python 2.4 installed and ocp.py required 2.5+

I asked my hosting provider (Inmotion), but they said they cannot change that on shared hosting and that I should move to a virtual private server (VPS) just for that… Yeah, sure!

Finally, I have decided to stop being lazy and created my own solution using PHP which I will be sharing here with you.

Idea

This PHP script has the same basic idea as Optimus Cache Prime and uses sitemap.xml as its source.

It reads the sitemap.xml file and parses the local URLs listed in it. Then it checks whether the cache file for each URL exists in W3TC. If the cache file exists, it skips that URL. If not, it visits the link using minimal resources, causing W3TC to re-create the cache for that page.
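
The cache check described above can be sketched roughly as follows. This is a minimal sketch, assuming the W3TC Disk: Enhanced layout in which each page is stored as _index.html under a folder that mirrors its URL path. The helper names cache_path_for_url() and needs_warming() are hypothetical and do not appear in warm.php, and unlike the script this sketch does not stop at three path levels.

```php
<?php
// Hypothetical helper (not part of warm.php): map a sitemap URL to the
// cache file that would exist if the page were already cached.
function cache_path_for_url($url, $rootp = "wp-content/w3tc/pgcache", $index = "_index.html")
{
    // Keep only the path part of the URL, e.g. "/contact-us/"
    $path = parse_url($url, PHP_URL_PATH);
    $path = is_string($path) ? $path : "";

    // Split into non-empty segments and decode percent-encoded characters
    $segments = array_filter(explode("/", $path), 'strlen');
    $dir = $rootp;
    foreach ($segments as $segment) {
        $dir .= "/" . urldecode($segment);
    }
    return $dir . "/" . $index;
}

// Hypothetical helper: a page needs warming only when its cache file is missing.
function needs_warming($url)
{
    return !file_exists(cache_path_for_url($url));
}
```

Only URLs for which needs_warming() returns true would then be requested, which is what keeps repeated runs cheap.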

System Requirements

    • PHP5 (Required due to SimpleXML, you may change that to use with earlier versions of PHP)
    • W3 Total Cache
    • WP Super Cache (Not tested! But it should work)

Usage

    • Copy the code into a php file (warm.php) and place it in the site root where sitemap.xml exists.
    • Review/edit the configuration options inside the script.
    • Set a cron job to run that script every 5 minutes (or even every minute as the code is very easy on the system resources.)
    • Sample: */5 * * * * php -q /home/youraccount/public_html/warm.php
    • That’s it!

Features

    • Reads sitemap.xml as a file, saving a web server call.
    • Checks for the local cache file before trying to re-cache, saving resources.
    • Optionally uses priority tags in sitemap.xml
    • Configurable page limit per session, useful for larger sites.
    • Frees memory and stops executing as soon as possible to save further resources.
    • Failsafe to stop executing in case of a URL or network problem.
    • New: Option to fix trailing slash cache creation problem.

Code & Download

    • Current version: Version 2.1 – 21 August 2011
    • Download warm.php as ZIP file.

License

This code is free to use, distribute, modify and study. If you modify it please keep my copyright intact. When referencing please link back to this website / post in any way e.g. direct link, credits etc. If you find this useful, please leave a comment and share using the buttons below!

<?php
// W3 TOTAL CACHE WARMER (PRELOADER) by Pixel Envision (E.Gonenc)
// Version 2.1 - 21 August 2011

// Configuration options
$priority = true;        // Use priorities defined in sitemap.xml (true/false)
$ppi = 10;               // Pages to be cached per interval
$delay = 0.5;            // Delay in seconds between page checks, default is half a second
$quiet = true;           // Do not output process log (true/false)
$trailing_slash = false; // Add trailing slash to URLs, which might fix cache creation problems (true/false)

$sitemap = "sitemap.xml"; // Path to sitemap file relative to warm.php

// Defaults for W3TC
$index = "_index.html";             // Cache file to check
$rootp = "wp-content/w3tc/pgcache"; // Root of cache

// Do not change anything below this line unless you know what you are doing
ignore_user_abort(true);
set_time_limit(600);

// Resolve the relative paths above from the script's own folder, even when cron starts it elsewhere
chdir(dirname(__FILE__));

$xml = simplexml_load_file($sitemap);
if ($xml === false) {
	echo "Unable to load $sitemap, exiting...\n";
	exit(1);
}
$UL = $UP = array();
foreach ($xml->url as $url_list) {
	// Cast the SimpleXML elements to plain strings so they can be sorted and reused after unset()
	$UL[] = (string) $url_list->loc;
	$UP[] = (string) $url_list->priority;
}
unset($xml);
if ($priority == true) { arsort($UP, SORT_NUMERIC); } // Highest priority first, keys preserved
$i = 0;
foreach ($UP as $key => $val) {

	// Build the expected cache path from up to three URL path segments
	$path = $rootp;
	$url = $UL[$key];
	$sub = explode("/", $url);
	for ($n = 3; $n <= 5; $n++) {
		// empty() avoids "Undefined offset" notices on short URLs
		if (!empty($sub[$n])) { $path .= "/" . urldecode($sub[$n]); }
	}
	$path .= "/" . $index;

	if (file_exists($path)) {
		if ($quiet != true) { echo "Priority: " . $val . " => Skipped: " . $path . "\n"; }
	} else {
		if ($trailing_slash == true) { $url = rtrim($url, "/") . "/"; }
		// Request headers only, which is enough to make the cache plugin build the page
		$ch = curl_init();
		curl_setopt($ch, CURLOPT_URL, $url);
		curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
		curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 15);
		curl_setopt($ch, CURLOPT_HEADER, true);
		curl_setopt($ch, CURLOPT_NOBODY, true);
		$ret = curl_exec($ch);
		curl_close($ch);
		if ($ret) { $i++; } else { echo "Unable to connect $url, exiting...\n"; break; }
		usleep((int) ($delay * 1000000));
		if ($quiet != true) { echo "Priority: " . $val . " => Warmed: " . $path . " by visiting " . $url . "\n"; }
	}
	// Stop once the per-run page limit has been reached
	if ($i < $ppi) { flush(); } else { break; }

}
exit;
?>


94 Responses to “PHP Cache Warmer (Preloader) for W3 Total Cache”

  1. Hello, does this still work? I really want to set it up but the cache folder seems to be elsewhere now and it doesn’t seem to be working for me :( using newest updates

    • Hi Frantisek, to be honest I’m not sure. When W3 Total Cache was slow to catch up with the WP 3.x update, I moved on to WP Super Cache and haven’t looked at W3 since…

  2. One more thing I want to mention here. In the latest version, which is currently 0.9.2.11, the path to cache folder and folder name have also been changed.

    • Hi Aamir, unfortunately I won’t be able to verify this. At some point last year W3TC had problems keeping up with WP updates (and I want to stay current due to security fixes), so I moved to WP Super Cache and am still using it. W3TC’s internal preload may have been fixed in the meantime, but if it hasn’t, it should be easy to update the cache warmer’s folder settings…

  3. Could you please confirm whether your script is still needed for the latest version of W3 Total Cache? I am going to use the Preload feature but am not sure whether it will work.

  4. I have WordPress with Quick-Cache running but auto cache preload doesn’t work.

    I tried v2.2.3 by Kevin Dean and all I got was a bunch of errors after running warm.php:

    Warning: simplexml_load_string() [function.simplexml-load-string]

    Any idea how to fix it?

    • Is this the full error string? Just a wild guess but there seems to be a problem with the xml parser, are you sure your sitemap xml is valid?

      • Here is the full error log http://pastebin.com/kkCKYpHQ
        My sitemap.xml is auto-generated by the Google Sitemap Generator Plugin by Arne Brachhold. I checked its validity with a few different validators and it seems okay.

        • It looks like the XML parser fails at parsing the “&raquo;” entity (“right-pointing double angle quotation mark”). I think if you could remove or replace those (with a standard quotation mark) it might work…
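
One way to work around that kind of failure is to swap the named HTML entities that XML does not define for their numeric equivalents before parsing. The following is only a sketch: load_sitemap_string() is a hypothetical helper, not part of any warm.php version, and the entity map covers just a few common cases.

```php
<?php
// Hypothetical helper: make a sitemap string parseable when it contains
// HTML-only named entities such as &raquo;, then load it with SimpleXML.
function load_sitemap_string($raw)
{
    // XML itself only defines &amp; &lt; &gt; &quot; &apos;
    // so other named entities are replaced with numeric references.
    $map = array(
        '&raquo;' => '&#187;',
        '&laquo;' => '&#171;',
        '&nbsp;'  => '&#160;',
    );
    $fixed = strtr($raw, $map);

    // Collect parser errors quietly instead of printing warnings
    libxml_use_internal_errors(true);
    return simplexml_load_string($fixed); // SimpleXMLElement, or false on failure
}
```

If other entities show up in the error log, they can be added to the map in the same way.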

  5. If you use nginx in front of apache, I highly recommend you follow this guide to be sure nginx is serving the cached file – and not apache.

    http://codex.wordpress.org/Nginx

    However, this breaks the “warm.php” file and it will always show “No Cache”.

    To fix this issue, edit this line :

    curl_setopt ($ch, CURLOPT_NOBODY, true);

    and set it to “false” – and everything will work again :)

  6. Hello:

    Have you tried “Quick Cache” plugin? It seems to be the alternative to W3TC with cache preloading working ok.

    I would really like to know your opinion on this plugin Quick Cache.

    Thanks a lot.

    • Hi Jano,

      Yes I did; actually, this site is running on Quick Cache right now. I moved on from W3TC a few months ago as they couldn’t keep up with WP updates. So far, I’m pretty happy with it… :)

  7. When I run W3 TOTAL CACHE WARMER 2.2.3 multiple times in a row (in cron and web), it still shows :

    No Cache: / (1.0)
    No Cache: /contact-us/ (0.6)
    No Cache: /comment-form-validation-for-wordpress/ (0.6)
    No Cache: /add-facebook-like-button-in-wordpress/ (0.6)

    … etc

    The cache does exist .. but it runs like it doesn’t exist.

    Any ideas?

    • Hi Dave,

      As it’s David’s version I’m not sure what might be the cause, we’ll need him to step in for a reply…

      • My mistake (it helps if I read the comments) – “With W3 Total Cache, you need to set the page cache to Disk:Enhanced”.

        Mine was set to Disk:Basic. It’s working great now!

        • Great to hear that! And thanks for sharing the info here, I’m sure it’ll be helpful for others too…

  8. Thanks for the script. Works great for me, though I made the following modification to avoid errors that looked like this:

    PHP Notice: Undefined offset: 4 in /[my path here]/warm.php on line 38
    PHP Notice: Undefined offset: 5 in /[my path here]/warm.php on line 39

    I changed
    if($sub[3]) {$path.=”/”.urldecode($sub[3]);}
    if($sub[4]) {$path.=”/”.urldecode($sub[4]);}
    if($sub[5]) {$path.=”/”.urldecode($sub[5]);}
    to
    if(count($sub)>3) {$path.=”/”.urldecode($sub[3]);}
    if(count($sub)>4) {$path.=”/”.urldecode($sub[4]);}
    if(count($sub)>5) {$path.=”/”.urldecode($sub[5]);}

    • Try my warm2.2.3.zip version linked further down the page. Many modifications were made and it no longer relies on the portions of the code you mention.

  9. Hi,

    When I run warm.php it seems to not detect any existing cache files and starts from the beginning again. My cache is not formatted the same way as you mention in your documentation. I just have a bunch of numbered folders and some strange named file.

    Is it different in newer versions of W3 Total Cache?

    • I’m not using W3TC right now (trying Quick Cache for a while), so I’m not sure if anything changed in W3 Total Cache. Don’t you have a “wp-content/w3tc/pgcache” folder? If you could let me know your path and file/folder names, I might be able to help you further…

      I would also suggest trying Kevin’s version; look for the link a few posts down…

    • With W3 Total Cache, you need to set the page cache to Disk:Enhanced. Disk:Basic is the older method with the files as you describe and is not compatible with the warming script.

  10. You wouldn’t happen to have a list of mobile phone browser referrers that I can add into W3TC, please?

  11. Do you think or know if it’s possible to retro fit warm to joomla system – cache plugin?

    • I don’t have experience with Joomla, but as long as you can configure it to cache pages on visit, yes, warm should work. The only part you might need to figure out is where it checks whether the page is cached or not. It’s important not to cause extra load for already cached pages.

  12. Version 2.2.3

    • Some systems are configured with allow_url_fopen set to false, which also prevents simplexml_load_file from working on URLs. simplexml_load_file has now been replaced with a cURL solution.
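
As an illustration of that approach (not the actual code from warm2.2.3.zip), the sitemap can be downloaded with cURL and then handed to simplexml_load_string. The helper names fetch_with_curl() and parse_sitemap_string() are hypothetical, and the sketch assumes the PHP cURL extension is installed.

```php
<?php
// Hypothetical helper: download a document with cURL so that
// allow_url_fopen does not need to be enabled.
function fetch_with_curl($url)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 15);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    $body = curl_exec($ch); // string on success, false on failure
    curl_close($ch);
    return $body;
}

// Hypothetical helper: turn sitemap XML held in a string into an array
// of url => priority pairs, highest priority first.
function parse_sitemap_string($body)
{
    if (!is_string($body) || $body === '') {
        return array();
    }
    libxml_use_internal_errors(true); // no warnings on invalid XML
    $xml = simplexml_load_string($body);
    if ($xml === false) {
        return array();
    }
    $pages = array();
    foreach ($xml->url as $entry) {
        $pages[(string) $entry->loc] = (string) $entry->priority;
    }
    arsort($pages, SORT_NUMERIC);
    return $pages;
}
```

A call such as parse_sitemap_string(fetch_with_curl("http://www.example.com/sitemap.xml")) would then return an empty array instead of a wall of warnings when the server answers with an HTML error page.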

    Download Link:
    http://www.rhubarbproductions.com/warm/warm2.2.3.zip

  13. Hi, Thanks for this script first of all. I downloaded V2.2 and I ran it from the browser. It worked and cached pages were stored in w3tc/pgcache/ directory.

    I can directly access them via http://www.mydomain.com/wp-content/w3tc/pgcache/samplepage/_index.html

    But when I access the page http://www.mydomain.com/samplepage the source code shows:
    #########
    Engine: disk: enhanced
    Cache key: _index.html_gzip
    Caching: enabled
    Status: not cached
    #########

    It always shows Status as not cached. And also the cache key is showing as _index.html_gzip instead of _index.html.

    when I tried accessing directly http://www.mydomain.com/wp-content/w3tc/pgcache/samplepage/_index.html_gzip then it shows a 404 error.

    Can you please confirm me whether it is serving cached pages or not (status is always not cached) while cached files are created in w3tc directory after running warm.php. Is it having to do something with my gzip setting (the cache key). If so where can I modify it.

    Thanks

    • Sounds more like a W3 Total Cache discussion than the warm script since all that does is trigger your cache like a regular web page hit.

      _index.html and _index.html_gzip are both generated by w3 total cache. Seeing a gzip version is normal.

      Looks like you have the debug output turned on in the plugin. Fine for testing, but can slow things down once you’re satisfied and should be off most of the time. When running normally, the source of a cached file should end with a “Served from:” line that provides the date and time. If it’s older than the current time and the time is the same through reloading the page, then it’s serving cached files.

      Alternatively, one thing definitive to check is to look in your cache folder and if the date modified of a file updates each time you load the same page then something is wrong… Possibly a conflict with the theme or plugin. In those cases, you may need to try Quick Cache or WP Supercache in PHP mode instead.

      • Thanks, got it working. I have one problem though. My WP theme loads additional stylesheets when it detects browsers like Firefox or Chrome, to add some CSS3 effects such as shadows and rounded corners. With your plugin I wanted to access the sitemap using Firefox, Chrome or any WebKit browser as the user agent, so that the additional stylesheet is also added into the cache and the effects show.

        I tried changing the user agent in your script to-
        curl_setopt ($ch, CURLOPT_USERAGENT, 'Mozilla/6.0 (Macintosh; I; Intel Mac OS X 11_7_9; de-LI; rv:1.9b4) Gecko/2012010317 Firefox/10.0a4');
        curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3");

        But still it doesn’t work as needed. Can you suggest me anything. I tried changing double quotes to single quotes etc.

        Thanks

        • The problem with that is that caching delivers the same output for all browsers. I don’t think W3 Total Cache has a way around that at the moment. I know Quick Cache can do it with its SALT option.

          I’d try my best not to deliver extra CSS that way. CSS3 items are ignored by browsers that don’t support them, so unless you’re overriding CSS, just leave the CSS3 items in there for everyone so the same stylesheet is delivered to all. It should be possible to get the latest versions of the main browsers working off of the same stylesheets.

  14. Version 2.2.2

    • Reorganized settings so all important options are grouped at top of file.

    • Clarified some of the instructions.

    • For Cron and Command Line use, if warm.php is still configured with the default http://www.example.com, PHP / XML errors are suppressed and an instructional error message is provided instead.

    • Miscellaneous minor bug fixes.

    Download Link:

    http://www.rhubarbproductions.com/warm/warm2.2.2.zip

    • Hello,
      today I received email from my hosting company:
      you have reached the limits of the resources available for your account.
      Memory resources were limited for your site.
      Your CPU usage was at 96% out of 100% – so it can be limited soon.
      Does W3 Total Cache or warm use a lot of memory and CPU? What setting can lower the memory and CPU usage? I only run warm in the browser right now.
      thank you!

      • Hi Judy,

        I’ll let Kevin reply to that as I’m still unable to check his version thoroughly…

        But if you don’t need its extended features, you may try my version (2.1) to test whether it’s caused by the warmer or W3TC itself…

      • I don’t think warm would be enough to cause an issue. It can cause a brief higher cpu usage when you warm many pages rapidly since it’s the same as a browser viewing those pages rapidly. If the pages are skipped, it shouldn’t be an issue. You could try increasing the delay between page loads.

        What’s probably more likely is how you have W3 Total Cache configured. It has options for using various memory based methods of caching that could potentially use a lot of server memory that you may not have available. Looking at your home page it looks like w3tc is set to use disk methods. If you’re on a shared / virtual server it’s likely that you should disable database and object caching since it may actually be slower. For example, MYSQL loads databases into memory for fast access, but if you use disk caching for the database you’re removing that advantage if the disk is not quick to respond, which is likely. Using just the disk based Page Cache should be lower impact on the server once the pages are cached than loading the pages without the cache. It’s a basic HTML load versus HTML / PHP / Database load without the cache.

        You should monitor the cpu and memory usage via top on the command line while you’re running warm or monitoring other traffic on the site to observe the impact.

        Also, useful information regarding w3tc can be gleaned in the plugin forum: http://wordpress.org/tags/w3-total-cache?forum_id=10

        • Thank you for the reply. Yes, it’s not caused by warm; when I don’t use warm, it still reaches the memory limit. I have 14 plugins installed and I have no idea which one causes this. I asked the hosting company, but they haven’t replied yet.

          • For stats about your plugins you can install the following plugin:

            http://wordpress.org/extend/plugins/p3-profiler/

            This plugin can be disabled when you’re not running tests so it doesn’t contribute to any plugin impact on a normal basis.

            Once pages are cached the plugin hit would not be a factor for those page loads, only the original caching of them.

            You may just be seeing a simultaneous traffic increase on your server. Each Apache process uses a bunch of memory, and memory could run out if too many processes are launched before MaxClients is reached for Apache. Also, other running processes could be a factor as well.

            Just some thoughts.

          • This plugin could prove useful as well in observing memory usage for non-cached pages.

            http://wordpress.org/extend/plugins/memory-viewer/

  15. Hi, thank you for your script. When I run the cron job I get this. Is anything wrong with my settings?
    Cannot bind/listen socket – [2] No such file or directory.
    Couldn’t create FastCGI listen socket on port lic_html/warm.php

    • Hi Judy,

      That looks like a server side setup or a cron path problem…

      I believe “lic_html/warm.php” should read “public_html/warm.php”, right?

  16. I’ve made some updates to warm.php for the following:

    Changelog

    Version 2.2.1 – 8 March 2012 by Rhubarb Productions (K.Dean) http://www.rhubarbproductions.com

    • Accelerated display of the log even more on systems where gzip compression was interfering with quick flushing of the output buffer.

    • Will now not show the Priority () if no priority is set.

    • Added HTML header and footer to browser output so the page can have a title and be further styled if desired.

    • Added User Agent so Warm can be identified in site logs. User Agent is also required to generate Quick Cache files.

    See previous post for 2.2 changes.

    Download Link:

    http://www.rhubarbproductions.com/warm/warm2.2.1.zip

    • Slight tweak to Warm 2.2.1

      http://www.rhubarbproductions.com/warm/warm2.2.1.zip

      Instead of a manual setting for disabling gzip compression it now will do it automatically if apache_setenv() is enabled on their server.

      On servers that do not support the automatic disabling of gzip encoding via apache_setenv(), the output log will not appear until much later or at the end of processing in the browser.

      On servers that support htaccess, try adding the following to the site’s .htaccess file to disable gzip for warm and the output log should appear quickly.

      SetEnvIfNoCase Request_URI warm\.php$ no-gzip dont-vary

    • Hi,
      I use warm2.2.1 and got this. What’s wrong? Thank you!
      http://www.example.com – Set to warm 10 pages per run with a 2 second delay

      Warning: simplexml_load_file() [function.simplexml-load-file]: http://www.example.com/sitemap.xml:1: parser error : Space required after the Public Identifier in /home/rithum/public_html/warm.php on line 228

      Warning: simplexml_load_file() [function.simplexml-load-file]: <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN"> in /home/rithum/public_html/warm.php on line 228

      Warning: simplexml_load_file() [function.simplexml-load-file]: ^ in /home/rithum/public_html/warm.php on line 228

      Warning: simplexml_load_file() [function.simplexml-load-file]: http://www.example.com/sitemap.xml:1: parser error : SystemLiteral " or ‘ expected in /home/rithum/public_html/warm.php on line 228

      Warning: simplexml_load_file() [function.simplexml-load-file]: <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN"> in /home/rithum/public_html/warm.php on line 228

      Warning: simplexml_load_file() [function.simplexml-load-file]: ^ in /home/rithum/public_html/warm.php on line 228

      Warning: simplexml_load_file() [function.simplexml-load-file]: http://www.example.com/sitemap.xml:1: parser error : SYSTEM or PUBLIC, the URI is missing in /home/rithum/public_html/warm.php on line 228

      Warning: simplexml_load_file() [function.simplexml-load-file]: <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN"> in /home/rithum/public_html/warm.php on line 228

      Warning: simplexml_load_file() [function.simplexml-load-file]: ^ in /home/rithum/public_html/warm.php on line 228

      Sitemap Start: http://www.example.com/sitemap.xml

      Warning: Invalid argument supplied for foreach() in /home/rithum/public_html/warm.php on line 241

      Warning: simplexml_load_file() [function.simplexml-load-file]: http://www.example.com/sitemap.xml:1: parser error : Space required after the Public Identifier in /home/rithum/public_html/warm.php on line 101

      Warning: simplexml_load_file() [function.simplexml-load-file]: <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN"> in /home/rithum/public_html/warm.php on line 101

      Warning: simplexml_load_file() [function.simplexml-load-file]: ^ in /home/rithum/public_html/warm.php on line 101

      Warning: simplexml_load_file() [function.simplexml-load-file]: http://www.example.com/sitemap.xml:1: parser error : SystemLiteral " or ‘ expected in /home/rithum/public_html/warm.php on line 101

      Warning: simplexml_load_file() [function.simplexml-load-file]: <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN"> in /home/rithum/public_html/warm.php on line 101

      Warning: simplexml_load_file() [function.simplexml-load-file]: ^ in /home/rithum/public_html/warm.php on line 101

      Warning: simplexml_load_file() [function.simplexml-load-file]: http://www.example.com/sitemap.xml:1: parser error : SYSTEM or PUBLIC, the URI is missing in /home/rithum/public_html/warm.php on line 101

      Warning: simplexml_load_file() [function.simplexml-load-file]: <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN"> in /home/rithum/public_html/warm.php on line 101

      Warning: simplexml_load_file() [function.simplexml-load-file]: ^ in /home/rithum/public_html/warm.php on line 101

      Warning: Invalid argument supplied for foreach() in /home/rithum/public_html/warm.php on line 104

      Run Complete

      • You need to set:

        $manual_server_url = "http://www.example.com";

        to your actual domain name to run via command line or cron.

        • Thank you for your reply,I got this now:
          http://www.saleinfo.org – Set to warm 10 pages per run with a 0.5 second delay

          Sitemap Start: http://www.saleinfo.org/sitemap.xml

          No Cache: / (1.0)
          No Cache: /特价手机/ (0.6)
          No Cache: /大家电清仓/ (0.6)
          No Cache: /麦包包销售排行榜/ (0.6)
          No Cache: /国际品牌太阳镜销售排行榜/ (0.6)
          No Cache: /特价书/ (0.6)
          No Cache: /资生堂销售排行榜/ (0.6)
          No Cache: /gps销售排行榜/ (0.6)
          No Cache: /雅顿销售排行榜/ (0.6)
          No Cache: /服饰箱包特价清仓区/ (0.6)
          Does “No Cache” mean the page is already cached?

          • No Cache means it tried to Warm the cache but afterward the file wasn’t found in the cache.

            This can happen for 2 reasons:

            • The page in the sitemap is not generated by wordpress. In that case add the page or a matching word to the excludes.

            • The location of the cache path is not correct

            In this case, it may be the Asian characters used in the URLs or the URL encoding. I haven’t had a test case like that one.

            When it gets down to the following in your sitemap, does it warm or skip these?

            /membership-confirm/
            /membership/
            /add-new-confirm/
            /add-new/
            /edit-item/
            /profile/
            /dashboard/

            When you look in W3TC’s cache folder, are the file names created using the Asian characters or the URL-encoded characters? There may just be an encode/decode that needs to be added.
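
To make that question concrete, here is a small sketch that lists both spellings a cache folder could have for a given URL path, so each can be checked with file_exists(). cache_dir_candidates() is a hypothetical helper and the cache root is assumed to be the default one from the script. It uses rawurldecode()/rawurlencode(), which differ from the script’s urldecode() only in how a literal “+” is treated.

```php
<?php
// Hypothetical helper: return the decoded and the percent-encoded variant
// of the cache folder for a URL path (duplicates removed).
function cache_dir_candidates($url_path, $rootp = "wp-content/w3tc/pgcache")
{
    $trimmed = trim($url_path, "/");

    // Decoded form, e.g. real UTF-8 characters in the folder name
    $decoded = rawurldecode($trimmed);

    // Encoded form, e.g. %E7%89%B9... for each path segment
    $encoded = implode("/", array_map('rawurlencode', explode("/", $decoded)));

    return array_values(array_unique(array(
        rtrim($rootp . "/" . $decoded, "/"),
        rtrim($rootp . "/" . $encoded, "/"),
    )));
}
```

For plain ASCII paths both variants are identical, so only one candidate is returned.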

            I’ll try a test here and see what happens as well.

          • I tried some asian text for a page name here and it worked as expected except that my warm run shows the text as asian characters rather than url-encoded characters.

            I’m using the same sitemap plugin as you, but I’m using the 4.0b8 version. Not sure if that would make any difference.

            I actually just ran your warm, and it appears to work for me. It was showing skipped and a few No Cache. So, if those pages are consistently pages that shouldn’t be and can’t be cached, add them to the excludes so the script will run faster next time.

          • Also, when running in the browser I noticed your warm page wasn’t displaying until the script was finished. Did you try the suggested htaccess line that came with the warm files? I’ve only had my servers to test on, so I want to see how it fares out in the wild. If it works, you should see the warm progress more quickly, line by line. Although it’ll be more noticeable with pages to warm than with skips.

      • Try Version 2.2.2

        Download Link:

        http://www.rhubarbproductions.com/warm/warm2.2.2.zip

        • Hi,
          Thank you for the reply, I’ll try warm2.2.2 later. When I run it in the browser, every line is skipped, which I think means all pages are warmed. When I run it on the command line, the email shows every line as No Cache; maybe it’s just a log error.

          • Sounds like when run on the command line it’s not finding your cache files.

            Are you running the same warm.php as the browser?

            Does your command line login have the proper permissions to read everything? Maybe try logging in as root if you aren’t already and are able to.

            I’ve tested the command line running as root under FreeBSD, Red Hat Linux and Cent OS.

  17. I’ve made some updates to warm.php for the following:

    Changelog

    Version 2.2 – 5 March 2012

    • Added compatibility for Sitemapindex files that link to other sitemaps where the urls are actually located.

    • Added an Exclude list where files or folders that are not cache-able (such as pdf files) can be listed so they’re always excluded.

    • Removed limitation on sub-folder depth so it doesn’t require modifying the script to delve deeper into a site.

    • Added ability to auto-detect the server url so the script can just be dropped and viewed in the browser without configuration. Cron or Command Line still requires further configuration.

    • Added absolute path detection so script can be run on the command line from anywhere.

    • Fixed an issue where, on some systems, more than the configured number of pages were being warmed.

    • Added the ability to temporarily override the settings for Pages per run, Delay, and quiet mode via query string or command line parameters. For example, if your script is configured to run quiet, with a delay of 5 seconds and 10 pages per run, but you want to quickly change it to show the log, set the delay to 0.5 seconds and increase the pages to 20, you can use one of the following:

    Command Line:
    php /path/to/example.com/warm.php -p 20 -q 0 -d 0.5

    Browser:
    http://www.example.com/warm.php?p=20&q=0&d=0.5
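
A rough sketch of how such overrides could be read is shown below. It is not the code from warm2.2.zip: apply_overrides() and the $settings keys are hypothetical names, and only the p, q and d parameters from the examples above are handled.

```php
<?php
// Hypothetical helper: merge temporary overrides (p = pages per run,
// d = delay in seconds, q = quiet mode) into the configured settings.
function apply_overrides(array $settings, array $input)
{
    if (isset($input['p']) && is_numeric($input['p'])) {
        $settings['ppi'] = max(1, (int) $input['p']);
    }
    if (isset($input['d']) && is_numeric($input['d'])) {
        $settings['delay'] = max(0.0, (float) $input['d']);
    }
    if (isset($input['q'])) {
        $settings['quiet'] = ((int) $input['q']) !== 0;
    }
    return $settings;
}

// Configured defaults matching the example: quiet, 5 second delay, 10 pages
$defaults = array('ppi' => 10, 'delay' => 5.0, 'quiet' => true);

// On the command line getopt() reads "-p 20 -q 0 -d 0.5";
// in the browser the same values arrive in $_GET.
$input = (PHP_SAPI === 'cli') ? getopt("p:q:d:") : $_GET;
$settings = apply_overrides($defaults, $input ? $input : array());
```

Because the overrides are applied to a copy of the defaults, the configured values in the file stay untouched for the next cron run.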

    • Created an HTML version of the Output Log for viewing in the Browser. Provides links to sitemaps & pages for the listed URLs.

    • Output Log now lists the current Pages per Run count and delay between pages.

    • Added the ability to force the Output log to update more often when viewing in browser. The initial time to display the first content appears to be limited to your php configuration.

    • In addition to the previous Skipped and Warmed indicators, there are now a few more:

    “Sitemap Start” – shows the initial sitemap provided.

    “Sitemap” – If the initial sitemap is a sitemapindex, this shows the current sitemap being parsed as provided by the sitemapindex.

    “No Cache” – shows when a page has been warmed, but a cache page wasn’t generated. These are good candidates to be added to the Exclude list.

    “Next Run” – instead of stopping when the Page per run count is reached, the script will show you which pages that need warming are being delayed until the next run of the script.

    • Just before the “Warmed” indicator is the numerical count for the current Page per Run, so you can see it count up to the configured number.

    • The Output log Priority number has been moved to the end of the line in parenthesis.

    • The Output log now shows a “Run Complete” when finished so on larger sites you know when it’s done.
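
The per-page indicators described above can be summarised as a small decision function. This is only an illustrative sketch based on the descriptions in this changelog, not the logic from warm2.2.zip; page_status() and its parameters are hypothetical.

```php
<?php
// Hypothetical helper: choose the log label for one sitemap URL.
//   $cached_before  cache file existed before the page was visited
//   $warmed_so_far  pages already warmed in this run
//   $ppi            configured pages per run
//   $cached_after   cache file exists after the visit (only relevant when visited)
function page_status($cached_before, $warmed_so_far, $ppi, $cached_after = false)
{
    if ($cached_before) {
        return "Skipped";   // already cached, nothing to do
    }
    if ($warmed_so_far >= $ppi) {
        return "Next Run";  // needs warming, but this run's limit is used up
    }
    // The page is visited here; the label depends on whether a cache file appeared
    return $cached_after ? "Warmed" : "No Cache";
}
```

Checking for an existing cache file first means an already cached page is reported as Skipped even after the page limit has been reached.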

    Download Link:

    http://www.rhubarbproductions.com/warm/warm2.2.zip

    • Hi Kevin,

      Thank you very much for sharing your update, nicely done! I’ll check that out and probably include it in to the main post as well…

      • I’ve updated the warm2.2.zip to now contain a default file for both W3 Total Cache and WP Super Cache users.

      • I replaced the warm2.2.zip with a small bug fix where Next Run could show for a page when it should show Skipped if the Skipped page came after the Pages per Run count was passed.

      • OK, in addition to the previous W3 Total Cache and WP Super Cache versions, the warm2.2.zip now comes with a version for Quick Cache.

  18. I don’t know enough about code to make changes so I installed it as is. I ran the test you suggested and the script cached 10 previously uncached pages. The process continued to run according to Entry Processes on my c-panel. Then, I guess the process timed out and Entry Processes went to zero.

    I think my c-panel is seeing scripts that run the pages, then stay active until they time out. After they run the pages their cpu usage drops to zero, but each running script counts as one entry process. If this is what is happening, I guess it’s not a big deal as long as my host doesn’t nail me when I reach 20 processes.

    • The script exits as soon as it finishes its task, but on your configuration it somehow stays open until the timeout. Since your CPU usage drops to zero, I don’t think you’ll have problems with your host.

      Just to be on the safe side, you can try this… Locate the following line:

      set_time_limit(600);

      600 is the maximum execution time in seconds, which is 10 minutes. Change that value to something like 30 (half a minute). Unless your pages are slow to load, that should be more than enough to cache 10 pages…

      You may also try to set your cron with longer intervals…like every 10 mins or so…

      I hope this helps… :)

  19. Hi,

    I’ve spent all day trying to get various plug ins to preload my cache, tried your solution and I finally have preloaded cached files! However, soon after I activated the cron-job the Entry Processes began creeping upward at a steady rate. What’s going on? Any idea? I turned off the cron-job because I was worried when I maxed out of Entry Processes my host would penalize me.

    Best,

    TJ

    • Hi TJ,

      The ppi value (default is 10) is designed to prevent that kind of problem. Are you using the script exactly as downloaded? Also, what are your cron (frequency) settings?

      In your case, I would disable the cron (for testing), set quiet = false; so it outputs what it does, and call the script from your browser.

      If it caches 10 (previously uncached) pages and stops, then it is working fine. In that case the problem might be with your cron setup. But if it keeps running after the ppi limit, we should look somewhere else…

  20. Excellent !

    If I use this cron job, can I turn off (uncheck) in W3 Total Cache the option to Cache Preload and your cron will still work ? There is a bug in W3TC calling an Undefined function (PgCacheAdmin.php) so I would like to turn it off …

    Thanks !

  21. Thanks for this script. It’s definitely helping us. Oddly, the page cache was primed on some of our websites, but all were in the same hosting environment. This script has made it work on all sites.

  22. Hi,

    I recently found this post and implemented the cronjob on my server (PHP). It appears that W3TC uses different paths now? It currently is re-caching all of the URLs because it can’t find the appropriate path. W3TC uses hex encoding, and the warming file is using a straight human-readable path.

    Example:

    [home dir]wp-content/w3tc/pgcache/target/_index.html (what it’s looking for)
    Priority: 1.0 => Warmed: [home dir]wp-content/w3tc/pgcache/target/_index.html by visiting [url of file]/target

    The real path (when viewing over FTP etc):
    [home dir]wp-content/w3tc/pgcache/0/f/9/0f962be9f87d071794797001119c496a

    • Hi Matt,

      I’m using the latest version of W3TC (Version 0.9.2.4) on this website and the paths are in the original format, not encoded like yours… Which page cache mode of W3TC are you using? Disk: Enhanced or Disk: Basic?

  23. Thanks so much for this script. It is the missing piece that completes w3tc. (At least in the way I use w3tc). Had it working in 15 minutes. Ended up with a 2 minute cron with w3tc expires at 8 hours and cleanup at 8 hours. Cache is primed 99+% of the time now.
    I think I will also follow up on OCP v2.0. It seems like it has merits for larger sites.

  24. You rock!

    Just used this to make a sluggish Amazon Instance punch well above its weight.

    Only thing I changed was:

    //Do not change anything below this line unless you know what you are doing
    set_time_limit(0);
    ignore_user_abort(1);

    That’s it :)

    Thanks again!

    Hugh

  25. Hi,

    Just tested the warmer and I have enabled output so I can check what is going to be warmed. The main issue I have is that the cached files are not created. The warmer is trying to warm the same URL again and again.

    The issue is not with the warmer, but with the W3TC. I have modified one line as suggested here http://wordpress.org/support/topic/plugin-w3-total-cache-preloading-page-cache-does-not-save-pages but alas.

    The other issue is that the W3TC is caching some pages from time to time, but I cannot get a clue why =[

    I hope they will fix these undocumented features =]

    Keep up the good work!

    • Hi,

      Does your W3TC create cached pages upon a normal browser visit? If so, the following is worth a shot…

      Please modify the code as follows and try it again…

      From 10 to 30 => curl_setopt ($ch, CURLOPT_CONNECTTIMEOUT, 30);
      Comment out (or totally remove) => //curl_setopt ($ch, CURLOPT_NOBODY, true);

      The fix you have mentioned is for the W3TC internal primer, which is not working for me at all… But the above two changes basically do the same thing in my code: they change the request method to GET instead of HEAD…

      Anyway, if you can try that let me know if it works or not!
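
      For anyone following along, the two changes above amount to something like the sketch below. It is only a rough illustration, assuming PHP's cURL extension is loaded; warm_request_options and warm_url are hypothetical names, not the functions used in the warmer script:

      ```php
      <?php
      // Hypothetical helper: builds the cURL options for one warm-up request.
      // $fetchBody = true sends a normal GET; false sends a HEAD request (no body).
      function warm_request_options(bool $fetchBody = true, int $connectTimeout = 30): array
      {
          return [
              CURLOPT_RETURNTRANSFER => true,            // keep the response out of the output
              CURLOPT_CONNECTTIMEOUT => $connectTimeout, // seconds to wait for the connection
              CURLOPT_NOBODY         => !$fetchBody,     // true would turn the request into HEAD
          ];
      }

      // Hypothetical helper: requests one URL and returns the HTTP status code.
      function warm_url(string $url, bool $fetchBody = true): int
      {
          $ch = curl_init($url);
          curl_setopt_array($ch, warm_request_options($fetchBody));
          curl_exec($ch);
          return (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
      }
      ```

      The idea is that a full GET makes the server render the whole page, which is what gives the caching plugin something to store.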

      • Hi,

        W3TC is working somehow: some pages are cached in wp-content/w3tc/pgcache/, but when I visit a page that is not cached… nothing happens. The page source does not have the W3TC comment line. Basically I have no idea why only some pages are created in wp-content/w3tc/pgcache/ =[

        Modified the warmer code as you suggested, but alas, nothing changed.

        It gets weirder and weirder. I have another site using W3TC and page caching is fine there (except the preloading part). I exported its settings and imported them into the site with the issues, and nothing changed. Sad panda =[

        Regards,

        • I see, there is definitely something wrong with your setup…

          I would suggest trying out WP Super Cache instead. If I weren’t using the W3TC minify options, I might have switched back to it as well… It’s easier to set up and also caches sub-pages (such as page 2, page 3, etc.) with disk cache.

          You may also use the warmer with it just by modifying the following (not tested):

          $index = "index.html"; // Cache file to check
          $rootp = "wp-content/cache/supercache/www.yourdomainhere.com"; // Root of cache
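
          A rough sketch of how those two settings could combine into the file path the warmer checks, going by the path format shown in comment 22 (cache root + URL path + index file). cache_file_for is a hypothetical name, not a function from the script, and the example.com URLs are placeholders:

          ```php
          <?php
          // Hypothetical helper: maps a page URL to the cache file to look for.
          function cache_file_for(string $url, string $rootp, string $index): string
          {
              // Only the path part of the URL matters; a URL without a path yields ''.
              $path = trim((string) parse_url($url, PHP_URL_PATH), '/');
              $file = rtrim($rootp, '/');
              if ($path !== '') {
                  $file .= '/' . $path;
              }
              return $file . '/' . $index;
          }

          // W3TC-style layout, as in the path quoted in comment 22.
          $w3tc = cache_file_for('http://www.example.com/target/', 'wp-content/w3tc/pgcache', '_index.html');

          // WP Super Cache-style layout, using the two settings above.
          $wpsc = cache_file_for('http://www.example.com/', 'wp-content/cache/supercache/www.example.com', 'index.html');
          ```

          The warmer would then only need a file_exists() check on the returned path to decide whether a page still needs warming.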

          • Hi,

            Happy face: found the bugger that was keeping W3TC from working properly. One plugin had the line @ob_start("ob_gzhandler"); and after commenting it out, W3TC started to cache all pages.

            Still, the preload is broken. The warmer is giving me results, but I cannot find the pages that are warmed in cache =[

            Going to revert the changes and back to 2.0 vanilla.

            Will drop you a line after I do some more testing on the vanilla version.

            Regards,

          • DARN!

            Found the second bugger that was keeping the warmer from actually doing its work. All the links in sitemap.xml were missing the trailing slash… The warmer tells me it is warming the URL, but no cache file is generated.

            So after fixing the trailing slash, the warmer started to load the proper URLs and the cache is there. YAY!

            I’m not sure if this can be added to the logic, because it depends on current user setup of permalinks, sitemap, etc.

            Anyway, it is working as intended now =]

            Regards,

          • Ah, great to hear that!

            Yes, I think I will add a check/fix for that since that might happen to others as well…
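
            One possible shape for such a check, as a sketch only. It assumes the site uses directory-style permalinks, which will not be true for every setup; with_trailing_slash is a hypothetical name and nothing like it exists in the script yet:

            ```php
            <?php
            // Hypothetical helper: appends a trailing slash to directory-style URLs.
            function with_trailing_slash(string $url): string
            {
                $parts = parse_url($url);
                // Leave unparsable URLs and URLs with a query string or fragment alone.
                if ($parts === false || isset($parts['query']) || isset($parts['fragment'])) {
                    return $url;
                }
                $path = $parts['path'] ?? '';
                // Already ends with a slash: nothing to do.
                if ($path !== '' && substr($path, -1) === '/') {
                    return $url;
                }
                // Looks like a file (for example /sitemap.xml): do not touch it.
                if (pathinfo($path, PATHINFO_EXTENSION) !== '') {
                    return $url;
                }
                return $url . '/';
            }

            $fixed = with_trailing_slash('http://example.com/about');
            ```

            A slug that contains a dot would be treated as a file here and left unchanged, so this is a heuristic rather than a full fix.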

  26. You’re the man! Thank you so much! Will try it out and let you know. :)

  27. Hey, I have been having the same problem with my w3 cache not preloading as expected so I wanted to try your script.

    I did not setup a cron yet, but just uploaded it to my site and hit it with my browser to test it out such as http://mysite.com/warm.php.

    I have a few questions for you…

    1) How many pages does the script attempt to cache? I have W3 Cache set to 10 pages every 900 seconds. When I ran the script, it appeared to cache far more than 10. In fact, the script still looks like it is running, as I see tons of new pages being added to the site even though the site has no traffic… it is a test site.

    2) Does your script go by the W3 Cache Prime settings or can you control how many pages are cached within the script each time the script is run?

    3) When I ran the script in my browser, the page never finished loading. It just kept running and running. Even after I close my browser window, it still appears to be running and creating pages. When will it stop or how do I kill it?

    4) My site map is quite lengthy. Is it trying to cache all the pages on a single run?

    I am not that familiar with PHP, so I really don’t know how to modify your code. But it does look like it is working, but just all at once. I want to limit how many pages are cached at one time to preserve server resources.

    Thanks for the script and any advice you can give.

    Dustin

    • Just to follow up on my last post: it has been over 45 minutes and apparently the script is still running, since pages are still being created. The processor on my server has been running at nearly 100% the entire time. The script really does seem to work, but it sucked up all my server resources when it ran for the first time because none of the pages existed in the cache. If there is a way to limit how many pages are cached on a single run, this will be an awesome script. Thanks in advance for any help you can give. Dustin

    • Hi Dustin,

      Currently this script is pretty basic without any limiting/throttling functions but I think I’ll need to add those to make it actually useful for larger sites such as yours. Let me try to answer your questions…

      1) Current version tries to cache all pages provided in sitemap.xml
      2) Script has no connection to W3 Cache Prime settings.
      3) It should die out when the PHP execution limit is reached (defined in PHP settings)
      4) Yes, refer to answer #1

      I think I should update it really soon ;)

      So based on your input, I think I will add a setting for limiting the cached pages per run. You could set it to 10 and it would stop caching after 10 pages. Also, it looks like the delay between loads (current default is 1) should be smarter: it should actually wait for the connection to end and then apply the defined delay.

      Those two updates should solve your current problems.

      Hopefully, future versions might be W3TC aware or could be a plugin for it…
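
      To make the idea concrete, here is a rough sketch of what a per-run limit with a delay could look like. It is not the actual update; warm_batch, its parameters, and the stand-in callbacks are all hypothetical:

      ```php
      <?php
      // Hypothetical sketch: warm at most $ppi uncached pages per run,
      // pausing only after each request has finished.
      function warm_batch(array $urls, int $ppi, callable $isCached, callable $warm, int $delaySeconds = 1): array
      {
          $warmed = [];
          foreach ($urls as $url) {
              if (count($warmed) >= $ppi) {
                  break; // the remaining pages wait for the next run
              }
              if ($isCached($url)) {
                  continue; // already cached, does not count toward the limit
              }
              $warm($url); // blocks until the request is done
              $warmed[] = $url;
              if ($delaySeconds > 0) {
                  sleep($delaySeconds);
              }
          }
          return $warmed;
      }

      // Stand-in data and callbacks for illustration; a real run would check
      // the cache directory and request each URL over HTTP.
      $cached = ['http://example.com/a/' => true];
      $urls = ['http://example.com/a/', 'http://example.com/b/', 'http://example.com/c/', 'http://example.com/d/'];
      $result = warm_batch(
          $urls,
          2,
          function (string $url) use ($cached): bool { return isset($cached[$url]); },
          function (string $url): void { /* request $url here */ },
          0
      );
      ```

      With a limit like this, a cron job every 10 to 15 minutes would work through a long sitemap a few pages at a time instead of all at once.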

      • Yes, setting a limit would be awesome. Then I could set it to run every 10-15 minutes and run 10-15 pages at a time. That would give the script time to finish by the time it runs again. I look forward to your update. Thanks again for the great script! :)

  28. Hi,

    I’m sorry to hear you weren’t able to run OCP due to the old Python installation.

    I’ve just recently released version 2.0, which doesn’t require any runtime. If you’re still interested in the local cache-probing feature, and a variable concurrent request option (for very fast priming of thousands of pages), you might find it useful!

    • Hi Patrick,

      Thanks for letting me know, I will definitely check it out!

      At least one good thing has come out of this, as now we also have a PHP version with similar functionality… Mine also does local cache-probing, but of course a concurrent request option is really a must for large sites…
