How to efficiently invalidate cache?

Make me a cache money millionaire!

  • Help me set up a simple 100% caching forward proxy for my home. I dicked around with Squid and then Apache + mod_proxy/mod_cache yesterday afternoon, and while they proxy beautifully, they get very few cache hits even on static content: almost or exactly 0%. I'm at work, so I'm going to be a touch fuzzy on the details, and I can't implement anything until I'm home.

Goal: I'm looking to set up a forward caching proxy for static content on my home network, mostly for browsing on my home machines (not counting mobile devices like the iPad, iPhone, or Android devices) and largely from Firefox on the Mac, which is my principal browser. That Mac runs Firefox with FoxyProxy Standard installed, and I set up rules so that "mostly" static content (jpg, gif, png, css, js, and known static pages from whitelisted URLs/sites, even those with query strings I can trust to be minimally volatile for my purposes) is sent to the proxy, hopefully to be cached for a few days. That would avoid the round-trip time and lag of loading from the original site on repeat visits or while browsing around, especially given how flaky my Comcast connection is. This is especially useful for sites like imgur, which I visit more often than is healthy and which has all those thumbnail images on the home page. I also have a couple of Greasemonkey scripts that preload pages on sites like Craigslist, which on page reload would really cut down on traffic leaving my router. Anything not matching these whitelist rules (*.jpg, etc.) bypasses the proxy altogether and loads as normal. And yes, I am well aware of the risks, but honestly I trust my instincts and web knowledge, and my ability to disable FoxyProxy with one click if I suspect erratic behavior. And no, the browser's default caching is not nearly aggressive enough for my tastes.

Setup: The base machine is a 2008 Mac Pro running OS X 10.6 (Snow Leopard, I believe; definitely not Lion). I have VMware Fusion 4 running a couple of Windows VMs and an Ubuntu 11.10 VM.
I was setting up the proxies in my Win2k3 VM simply because it was there: it acts as little more than a VPN client for Terminal Services sessions to work, and it tends to be running in the background whenever the Mac is powered on, which is to say 24/7. The ideal, for my short-term purposes, is a caching proxy on the Win2k3 VM or the Mac (the Linux VM is for experimenting, and is less stable and less consistently available), so that by filtering in the browser I effectively get a local disk cache that supplants the browser's in a way I can explicitly view and control.

Failures: FoxyProxy works fine when enabled: I see traffic going to the proxy only for the whitelisted content, and pages continue to work. Squid and Apache (mod_proxy/mod_cache/mod_disk_cache) both work great as proxies and even seem to create cache files... yet even for URLs that aren't parameterized, such as http://site.com/static/images/1234abcd.jpg, both show cache misses despite repeat visits. Even when I just click forward/back, the browser requests the content anew and the proxy logs a miss (TCP_MISS in Squid, the SetEnv/CustomLog trick in Apache, with Netmon 3.x confirming that the proxy re-requests content it ostensibly cached). Some content does get written to the cache folder, but it doesn't appear to be used: the miss ratio is almost 100% in Apache and exactly 100% in Squid.

Squid was a snap to install and set up, but its logs showed TCP_MISS 100% of the time while it proxied, despite the cache folder being populated with *some* content. I tried adjusting the refresh_pattern and cache rules; again content was written to disk, and again the logs showed TCP_MISS 100% of the time on page reload (by reload I mean both F5 and simply revisiting the same URL in a new tab).
Because Apache is about as universal as it gets, I tried it after Squid failed, and it exhibited the same behavior: it proxies images fine and writes files to disk, so browsing is seamless... but it doesn't appear to actually use the cache on follow-up visits. I tried enabling just about every cache option in mod_cache, including the several that ignore certain headers or violate the HTTP standard (normally a bad idea, but acceptable here because I whitelist via FoxyProxy)... but no dice: it still won't serve from cache.

Outcome: Basically, I want a 100% caching forward proxy that I can whitelist some types of traffic to (via FoxyProxy) and have that traffic served from the disk cache for N minutes/days (configurable) before it expires. Ostensibly, Apache should work fine for this, but while it caches some files to disk, it doesn't then use them. I'd prefer to run the caching proxy on the Win2k3 VM, but since I have Mac and Linux as options those would work as well, although Linux is the most volatile of the three, since it's a VM that gets upgraded/rebuilt relatively often.

  • Answer:

    Have you checked what cache-related directives sites are sending in their HTTP response headers? For instance, using the Live HTTP Headers add-on (https://addons.mozilla.org/en-US/firefox/addon/live-http-headers/), I can see that Metafilter sends a 'Cache-Control: private' directive. This marks the response as user-specific and instructs shared caches not to store it. I imagine a lot of sites do that, and it could be that Apache and Squid don't cache such responses by default.

hincandenza at Ask.Metafilter.Com
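To make the 'Cache-Control: private' point concrete, here is a minimal Python sketch of the storability check a shared (proxy) cache applies: under HTTP/1.1, `private` and `no-store` both keep a response out of a shared cache. The `store_private` and `store_no_store` flags are hypothetical, loosely mirroring mod_cache's CacheStorePrivate and CacheStoreNoStore overrides, and the parser is deliberately simplified.

```python
def parse_cache_control(value):
    """Parse a Cache-Control header value into {directive: argument-or-None}.

    Simplified: does not handle commas inside quoted arguments, and treats
    field-name forms such as private="Set-Cookie" like a bare 'private'.
    """
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


def shared_cache_may_store(cache_control, store_private=False, store_no_store=False):
    """Rough check of whether a shared cache may store a response.

    store_private / store_no_store are hypothetical override switches,
    modelled loosely on mod_cache's CacheStorePrivate / CacheStoreNoStore.
    """
    d = parse_cache_control(cache_control or "")
    if "no-store" in d and not store_no_store:
        return False  # no-store forbids storage in any cache
    if "private" in d and not store_private:
        return False  # private forbids storage in a shared cache
    return True
```

With the overrides off, `shared_cache_may_store("private, max-age=0")` returns `False`, which is the behavior Ron suspects Apache and Squid show by default.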


Other answers

Seconding what Ron said. Many sites don't set cache expiration on static content, so that content is effectively served with a "don't cache me" instruction. The sad thing is that configuring caching correctly on their web servers would be in their interest as well as yours: their bandwidth costs would go down and users would get faster-loading pages. Big popular sites like Facebook and Twitter have caching strategies. Let me check Metafilter... Here is what YSlow (http://yslow.org/) says about Metafilter on the subject of caching:

There are 6 static components without a far-future expiration date.
  • (no expires) http://static.chartbeat.com/js/chartbeat.js
  • (no expires) http://connect.decknetwork.net/i/atmail_envelope.png
  • (2012/5/22) http://www.google-analytics.com/ga.js
  • (no expires) http://d217i264rvtnq0.cloudfront.net/styles/mefi/favicon.ico
  • (2012/4/14) http://www.metafilter.com/scripts/favorite_front031611-min.js
  • (no expires) http://connect.decknetwork.net/deckMF_js.php?...

I don't necessarily agree with YSlow's criticism of Metafilter; it is just an example. Are you hoping to override the caching instructions given by the server? It seems like that would give you a long series of small, irritating problems to deal with.

ErikH2000
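The "no expires" issue comes down to how a cache computes a response's freshness lifetime. The Python sketch below follows the usual HTTP/1.1 order (s-maxage, then max-age, then Expires minus Date, then a heuristic fraction of the time since Last-Modified) and falls back to a default when none of those is present, capping the result. The `default_expire`, `max_expire` and `lm_factor` knobs are only loosely modelled on mod_cache's CacheDefaultExpire, CacheMaxExpire and CacheLastModifiedFactor; the default values are chosen for illustration, and this is not mod_cache's actual code path.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def _http_date(value):
    """Parse an HTTP date header; return an aware UTC datetime or None."""
    if not value:
        return None
    try:
        dt = parsedate_to_datetime(value)
    except (TypeError, ValueError, IndexError):
        return None  # e.g. "Expires: 0" or "Expires: -1"
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return dt


def freshness_lifetime(headers, default_expire=3600, max_expire=86400, lm_factor=0.1):
    """Seconds a shared cache may treat a response as fresh (simplified sketch).

    headers: dict of response headers with lowercase names.
    """
    cc = {}
    for part in headers.get("cache-control", "").split(","):
        name, sep, arg = part.strip().partition("=")
        if name:
            cc[name.lower()] = arg.strip('"') if sep else None

    # 1. Explicit lifetime: s-maxage (shared caches only) beats max-age.
    for directive in ("s-maxage", "max-age"):
        if cc.get(directive) is not None:
            try:
                return min(max(int(cc[directive]), 0), max_expire)
            except ValueError:
                pass  # malformed value: fall through to the next rule

    date = _http_date(headers.get("date")) or datetime.now(timezone.utc)

    # 2. Expires relative to Date; an unparseable Expires means "already stale".
    if "expires" in headers:
        expires = _http_date(headers["expires"])
        if expires is None:
            return 0
        return min(max(int((expires - date).total_seconds()), 0), max_expire)

    # 3. Heuristic: a fraction of the time since Last-Modified.
    last_modified = _http_date(headers.get("last-modified"))
    if last_modified is not None:
        age = max((date - last_modified).total_seconds(), 0)
        return min(int(age * lm_factor), max_expire)

    # 4. Nothing to go on: fall back to the configured default.
    return min(default_expire, max_expire)
```

For a file last modified ten days before the response's Date, the 10% heuristic yields one day; a response with no Cache-Control, Expires or Last-Modified at all gets `default_expire`, which is the case CacheDefaultExpire is meant to cover.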

Ron: Right, but if you look at the mod_cache documentation (http://httpd.apache.org/docs/2.2/mod/mod_cache.html), there are a number of directives, including CacheStorePrivate, which I specifically set to "On" (the default is Off), so in theory it should cache even with the Cache-Control: private header. I basically enabled, across the board, every option on that page that forcibly overrides normal cache behavior. About the only one I didn't set was CacheIgnoreHeaders, because I wasn't sure which headers to specify. I'll look at the HTTP headers, see whether any others are coming through, and check whether they can be explicitly overridden with CacheIgnoreHeaders.

ErikH2000: I also set CacheIgnoreNoLastMod and CacheDefaultExpire to handle the no-expiration-date issue. Now we're getting to a place where I'd have to double-check the conf, and since I don't leave SSH open on my router unless I'm going on vacation, I can't get back into my machine from work to check right now. Also, as I said in my initial writeup, I am aware of where these problems could crop up, which is why I'd use such a proxy only as a whitelist proxy for file types or specific sites, and simply click to disable FoxyProxy if anything seemed "irritating" or broken. As you say, a lot of sites don't make good use of caching, or have many small page elements whose cost is dominated by the extra request rather than the actual transfer time (and many sites don't have pipelining enabled, etc.). I'm not oblivious to issues of scale and caching on web servers... which is why I trust myself to put a cache I control in front of my browsing, where I can whitelist sites or content types as desired. FoxyProxy supports regex whitelisting as well as simple wildcards.

hincandenza
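Since FoxyProxy can match URLs with regular expressions, here is the same whitelist idea sketched with Python's `re` module (not FoxyProxy's own matcher). Both patterns are hypothetical examples: one sends plain-HTTP image, CSS and JS URLs to the proxy, the other sends anything on imgur.com. They are limited to `http://` on purpose: HTTPS requests pass through a forward proxy as an opaque CONNECT tunnel, so neither Squid nor Apache could cache them anyway.

```python
import re

# Hypothetical whitelist rules; anything that matches is routed to the proxy.
STATIC_RULES = [
    # Static file extensions on plain HTTP, optionally followed by a query string.
    re.compile(r"^http://[^/?#]+/[^?#]*\.(?:jpe?g|gif|png|css|js)(?:\?[^#]*)?$",
               re.IGNORECASE),
    # Everything on imgur.com or any of its subdomains, plain HTTP only.
    re.compile(r"^http://(?:[a-z0-9-]+\.)*imgur\.com/", re.IGNORECASE),
]


def route(url, rules=STATIC_RULES):
    """Return "proxy" if any whitelist rule matches the URL, else "direct"."""
    return "proxy" if any(rule.search(url) for rule in rules) else "direct"
```

For example, `route("http://site.com/static/images/1234abcd.jpg")` returns `"proxy"`, while an HTML page or any HTTPS URL goes `"direct"`.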

A thought just occurred to me: I have all my cache values in an httpd-cache.conf in /conf/extras, with an Include line in the main httpd.conf... but is it possible that httpd.conf overrides those values later in the file? I assume the Include is processed in place. I hadn't even checked, so for all I know the resulting settings are not what I think they are from /extras/httpd-cache.conf. Is there an easy way to see the runtime configuration at startup, perhaps through a verbose logging setting?

hincandenza
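On the Include question: Apache reads an included file at the point where the Include line appears, so a single-valued directive repeated later in httpd.conf can replace the value from the included file. The Python sketch below splices Include targets in place and reports where each directive's last occurrence came from; it works on a hypothetical in-memory dict of file contents so it is self-contained. It is an approximation: it ignores <IfModule> and other section context, wildcard Includes, ServerRoot-relative paths, and directives such as CacheEnable/CacheDisable that accumulate rather than override. Note also that a whole <IfModule mod_disk_cache.c> block is silently skipped if that module isn't loaded, which is worth ruling out first: `httpd -M` lists the loaded modules, and mod_info's server-info page, if enabled, shows the directives Apache actually parsed.

```python
import re

INCLUDE = re.compile(r"^\s*Include\s+(\S+)", re.IGNORECASE)
DIRECTIVE = re.compile(r"^\s*([A-Za-z]+)\s+(.*?)\s*$")


def expand(name, files, depth=0):
    """Yield (source_file, line) pairs, splicing Include targets in place."""
    if depth > 10:
        raise RecursionError("Include nesting too deep")
    for line in files[name].splitlines():
        m = INCLUDE.match(line)
        if m:
            yield from expand(m.group(1), files, depth + 1)
        else:
            yield name, line


def last_values(name, files):
    """Map each directive name to (file, value) of its LAST occurrence."""
    result = {}
    for source, line in expand(name, files):
        if line.lstrip().startswith(("#", "<")):
            continue  # skip comments and section tags like <IfModule ...>
        m = DIRECTIVE.match(line)
        if m:
            result[m.group(1)] = (source, m.group(2))
    return result
```

If httpd.conf includes the cache file and then sets CacheDefaultExpire again further down, `last_values` attributes the winning value to httpd.conf, which is exactly the override being worried about here.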

Just FYI, this is my Apache httpd-cache.conf:

    # http://httpd.apache.org/docs/2.2/mod/mod_proxy.html
    <IfModule mod_proxy.c>
        ProxyRequests On
        <Proxy *>
            Order Deny,Allow
            Deny from all
            Allow from all
        </Proxy>
        ProxyVia On
    </IfModule>

    <IfModule mod_cache.c>
        <IfModule mod_disk_cache.c>
            CacheRoot E:/PROXYCACHE
            CacheEnable disk /
            CacheDirLevels 3
            CacheDirLength 2
            CacheIgnoreCacheControl On
            CacheIgnoreNoLastMod On
            CacheIgnoreQueryString On
            CacheStoreNoStore On
            CacheStorePrivate On
            CacheMaxFileSize 100000000
            CacheDefaultExpire 259200
            CacheMaxExpire 432000
        </IfModule>
        ProxyTimeout 60
        #NoProxy 192.168.*.*/255.255.*.*
        # When acting as a proxy, don't cache the list of security updates
        CacheDisable http://security.update.server/update-list/
    </IfModule>
    # End of proxy directives

hincandenza
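One thing worth noticing in the posted <Proxy *> block: it uses `Order Deny,Allow` with both `Deny from all` and `Allow from all`. Under Apache 2.2's mod_authz_host rules, Deny,Allow means "allowed by default, and a matching Allow overrides a matching Deny", so this block admits every client that can reach the proxy. Behind a home router that may be harmless, but a LAN-only rule such as `Allow from 192.168` is the usual tightening. The Python sketch below models just those two Order modes with simplified matching ("all" or a full/partial dotted IP prefix); the addresses in the examples are hypothetical.

```python
def _matches(client_ip, rule):
    """'all' matches everyone; otherwise a full or partial dotted IP prefix."""
    prefix = rule.rstrip(".")
    return rule == "all" or client_ip == prefix or client_ip.startswith(prefix + ".")


def allowed(client_ip, order, allow, deny):
    """Simplified model of Apache 2.2 Order/Allow/Deny evaluation."""
    in_allow = any(_matches(client_ip, r) for r in allow)
    in_deny = any(_matches(client_ip, r) for r in deny)
    if order == "Deny,Allow":
        # Allowed by default; a matching Allow overrides a matching Deny.
        return in_allow or not in_deny
    if order == "Allow,Deny":
        # Denied by default; a matching Deny overrides a matching Allow.
        return in_allow and not in_deny
    raise ValueError("unsupported Order: " + order)
```

With the posted rules, `allowed("203.0.113.7", "Deny,Allow", allow=["all"], deny=["all"])` is `True`; changing the Allow list to `["192.168"]` limits the proxy to the home LAN.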
