How Web Caching Works In A CDN (Proxy) Chain Using The Cache-Control HTTP Header

When a web browser requests a web page from the original web server, the request may pass through several intermediary proxy or CDN servers, as in the picture below. Each node in the chain (the client browser, the CDN proxy servers, and the original web server) can cache the requested web page resources according to the cache policy configured on it. The next time you or another client requests the same resources, the web browser, an intermediary proxy server, or the original web server can return its cached copy, which reduces page load time and improves performance.

[Figure: CDN web proxy server chain]

As the picture above shows, I have set up two CDN servers between the original web server and the client web browser, and each node in the chain can have its own cache.

  1. Original Web Server Cache: software that caches web resources on the web server. This is a shared cache, so it can be accessed by many different client users. In my example, the original web server runs WordPress, so I installed the WP Super Cache plugin, which generates static HTML versions of dynamic pages and stores them in the server cache to improve server performance.
  2. Cloudflare or Ezoic CDN Cache: also a shared cache. Different clients can be served the cached web resources as long as they have not expired. If a cached resource has expired, or a client tells the CDN server to revalidate it, the CDN server reloads the updated resource from its upstream data source (Ezoic CDN loads resources from Cloudflare CDN, and Cloudflare CDN loads resources from the original web server). Note that Cloudflare CDN does not cache HTML pages by default, so when Ezoic CDN reloads an HTML page, Cloudflare only passes the request through and the HTML content effectively comes straight from the original web server.
  3. Client Web Browser Cache: a private cache whose data is used only by that client's web browser. The cached data is stored locally on your computer, mostly on disk. When a cached web resource expires, the browser gets the updated resource from the Ezoic CDN server.

1. HTTP Cache-Control Header.

We control the cache policy with the Cache-Control HTTP header. This header can be used in both request and response headers. Common Cache-Control values include the following.

  1. public: any cache, including shared proxy and CDN caches, may store a copy of the response.
  2. private: the response is meant for a single user, so only a private cache such as the client browser may store it; shared proxy and CDN caches must not.
  3. max-age=N: the resource is considered fresh for N seconds and becomes stale afterwards. Each cache layer in the chain only guarantees that its own cached copy is fresh.
  4. no-cache: a cache may store the response, but before returning the cached copy to a client it must revalidate it with the original web server. If the server confirms the copy is still valid, the cached copy is returned; otherwise the cache updates its copy from the original web server.
  5. no-store: no cache, shared or private, may store a copy of the request or response.
  6. For more Cache-Control values, please refer to the Cache-Control reference documentation.
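To make the directive list concrete, here is a minimal Python sketch that splits a Cache-Control value into its directives and applies the public/private/no-store rules above to decide whether a cache may keep a copy. The function names parse_cache_control and may_store are hypothetical, and the parser is deliberately simplified: it does not handle quoted values that contain commas, and it ignores the many other storage rules in the HTTP caching specification.

```python
from __future__ import annotations


def parse_cache_control(value: str) -> dict[str, str | None]:
    """Split a Cache-Control value into {directive: argument}.

    Directive names are case-insensitive, so they are lower-cased.
    Directives without an argument (e.g. "no-cache") map to None.
    """
    directives: dict[str, str | None] = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


def may_store(value: str, shared_cache: bool) -> bool:
    """Rough check of whether a cache may keep a copy of the response."""
    d = parse_cache_control(value)
    if "no-store" in d:
        return False  # no cache at all may store it
    if shared_cache and "private" in d:
        return False  # CDN/proxy caches must not store private responses
    return True


print(parse_cache_control("public, max-age=86400"))
# {'public': None, 'max-age': '86400'}
print(may_store("private, max-age=600", shared_cache=True))   # False
print(may_store("private, max-age=600", shared_cache=False))  # True
```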

1.1 Set Cache-Control Header In HTTP Request.

  1. When used in an HTTP request, it tells the upstream cache servers how to handle the request.
  2. For example, if you check the Disable Cache checkbox in the Firefox developer tools, you can see a Cache-Control: no-cache header in the Request Headers section.
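A request-side Cache-Control header can also be sent from code. The sketch below uses Python's standard urllib.request module to build a request carrying Cache-Control: no-cache, roughly what the browser sends with Disable Cache checked. The URL is a placeholder assumption, and the actual network call is left commented out.

```python
import urllib.request

# Placeholder URL, used only for illustration.
URL = "https://www.example.com/"

# Ask the caches along the path to revalidate instead of serving a stored copy.
req = urllib.request.Request(URL, headers={"Cache-Control": "no-cache"})

# urllib normalises header names with str.capitalize(), hence "Cache-control".
print(req.get_header("Cache-control"))  # no-cache

# Sending the request would look like this:
# with urllib.request.urlopen(req) as resp:
#     print(resp.status, resp.headers.get("Cache-Control"))
```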

1.2 Set Cache-Control Header In HTTP Response.

  1. You set the Cache-Control response header on the server side (in your source code or in .htaccess).
  2. When this header reaches the client, the browser knows whether a web resource can be loaded from the local cache (while it is still fresh) or must be requested again from upstream (once it has expired).
  3. For example, the header cache-control: public, max-age=86400 tells the client browser that the resource can be served from its disk cache for 86400 seconds (one day). In the browser inspector you can see this under Headers > General > Status Code, which shows 200 (from disk cache). Once 86400 seconds have passed since the response was received, the browser requests the resource again from its upstream data source (for example, the Ezoic CDN server).
  4. Now consider a header such as cache-control: max-age=0, must-revalidate, no-cache, no-store. The max-age=0, must-revalidate, no-cache part means the browser must revalidate the resource with its upstream data source before every use: if the resource has not been modified, the upstream source returns status code 304 (Not Modified) and the browser uses its cached copy; if it has been modified since the last retrieval, the upstream source returns status code 200 with the new content, and the browser updates its cached copy. The extra no-store directive is stricter still: it forbids storing the response at all, so with no-store present the browser keeps no copy to revalidate and downloads the full resource on every request.
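The freshness rules in the examples above can be sketched in Python as follows. This is a simplified model, assuming the age of a stored copy is measured from when this cache received it (real caches also account for the Age header added by upstream caches); is_fresh and next_action are hypothetical helper names.

```python
from __future__ import annotations

import time


def is_fresh(max_age: float, received_at: float, now: float | None = None) -> bool:
    """Return True while the stored copy is younger than max_age seconds.

    Simplification: age is counted from when this cache received the response;
    the Age header that upstream caches add is ignored.
    """
    if now is None:
        now = time.time()
    return now - received_at < max_age


def next_action(max_age: float, no_cache: bool, received_at: float, now: float) -> str:
    """Decide how a browser cache handles a request for a stored response."""
    if not no_cache and is_fresh(max_age, received_at, now):
        # Fresh copy: served locally, e.g. "200 (from disk cache)".
        return "serve from cache"
    # Stale, or no-cache: ask upstream; 304 -> reuse the copy, 200 -> replace it.
    return "ask upstream"


received = 1_000_000.0  # arbitrary timestamp for the demo
print(next_action(86400, False, received, received + 3600))   # serve from cache
print(next_action(86400, False, received, received + 90000))  # ask upstream
print(next_action(0, True, received, received + 1))           # ask upstream
```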

1.3 How To Check Whether A Web Resource Has Been Modified Or Not.

  1. To check whether a web resource has changed, the server uses the ETag HTTP response header. Its value identifies the version of the resource, typically a hash of the resource content, and it is generated and stored on the server side.
  2. When the cached copy needs revalidation, the client sends the stored ETag value back in the If-None-Match request header. If it matches the server's current ETag, the content has not changed: the server replies 304 Not Modified without a body and the cached copy is reused. If the values differ, the resource has changed, so the server returns the new content with a new ETag and the cache is updated.
  3. Validating freshness with the ETag header reduces website traffic and improves performance, because an unchanged resource costs only a small 304 response instead of a full download.
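Here is a minimal sketch of the server side of ETag validation, assuming the ETag is a quoted, truncated SHA-256 hash of the response body (servers are free to use other version identifiers). The respond function is a hypothetical stand-in for a request handler, and the Cache-Control: no-cache value it sends is just one plausible pairing. It uses exact string comparison and does not handle weak (W/) ETags or If-None-Match: *.

```python
from __future__ import annotations

import hashlib


def make_etag(body: bytes) -> str:
    """Build a strong ETag: a quoted, truncated SHA-256 hash of the content."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body: bytes, if_none_match: str | None) -> tuple[int, dict[str, str], bytes]:
    """Hypothetical GET handler returning (status, headers, body)."""
    etag = make_etag(body)
    headers = {"ETag": etag, "Cache-Control": "no-cache"}
    if if_none_match is not None:
        client_tags = [tag.strip() for tag in if_none_match.split(",")]
        if etag in client_tags:
            # Unchanged: no body is sent, the client reuses its cached copy.
            return 304, headers, b""
    # First request or changed content: send the full resource and its ETag.
    return 200, headers, body


page = b"<html>v1</html>"
status, hdrs, _ = respond(page, None)           # first visit
print(status)                                   # 200
status, _, body = respond(page, hdrs["ETag"])   # revalidation, content unchanged
print(status, len(body))                        # 304 0
status, _, _ = respond(b"<html>v2</html>", hdrs["ETag"])  # content changed
print(status)                                   # 200
```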

2. Cache-Control Header In The CDN Proxy Server Chain.

[Figure: CDN web proxy server chain]

  1. As discussed at the beginning of this article, each node in the chain above can have its own cache policy.
  2. Each node's cache policy can change the Cache-Control header value or even add extra response headers (for example, Cloudflare adds a CF-Cache-Status header and Ezoic adds an x-ezoic-cdn header).
  3. So you can think of each upper node as the original data source of the node below it. Each node maintains the freshness of its own cached copy, and when that copy expires, the node requests the resource from the node above it.
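The chain behaviour described above can be modelled with a small Python simulation. Each CacheNode treats its upstream node as its data source, applies its own max-age policy, and tags the response with its own status header. The class names, max-age values, and X-...-Cache header names are hypothetical; they only mimic headers like CF-Cache-Status and x-ezoic-cdn and do not reproduce their real formats.

```python
from __future__ import annotations

from typing import Protocol


class Source(Protocol):
    def fetch(self, path: str, now: float) -> tuple[str, dict[str, str]]: ...


class OriginServer:
    """Stand-in for the original web server: always returns current content."""

    def __init__(self, content: str) -> None:
        self.content = content
        self.requests = 0

    def fetch(self, path: str, now: float) -> tuple[str, dict[str, str]]:
        self.requests += 1
        return self.content, {}


class CacheNode:
    """A cache that treats its upstream node as its original data source."""

    def __init__(self, name: str, upstream: Source, max_age: float) -> None:
        self.name = name
        self.upstream = upstream
        self.max_age = max_age  # this node's own freshness policy, in seconds
        self.store: dict[str, tuple[str, dict[str, str], float]] = {}

    def fetch(self, path: str, now: float) -> tuple[str, dict[str, str]]:
        entry = self.store.get(path)
        if entry is not None and now - entry[2] < self.max_age:
            body, headers, status = entry[0], dict(entry[1]), "HIT"
        else:
            # Stale or missing: reload from the node above and store it.
            body, upstream_headers = self.upstream.fetch(path, now)
            self.store[path] = (body, upstream_headers, now)
            headers, status = dict(upstream_headers), "MISS"
        # Each node adds its own header, like CF-Cache-Status or x-ezoic-cdn.
        headers[f"X-{self.name}-Cache"] = status
        return body, headers


origin = OriginServer("<html>v1</html>")
cloudflare = CacheNode("Cloudflare", origin, max_age=3600)
ezoic = CacheNode("Ezoic", cloudflare, max_age=600)
browser = CacheNode("Browser", ezoic, max_age=60)

body, headers = browser.fetch("/", now=0)  # cold start: every node misses
print(headers)
# {'X-Cloudflare-Cache': 'MISS', 'X-Ezoic-Cache': 'MISS', 'X-Browser-Cache': 'MISS'}
browser.fetch("/", now=120)  # browser copy is stale, Ezoic copy is still fresh
print(origin.requests)  # 1
origin.content = "<html>v2</html>"
body, _ = browser.fetch("/", now=4000)  # every copy has expired
print(body, origin.requests)  # <html>v2</html> 2
```

Because each layer has its own max-age, a request travels upward only until it reaches a layer whose copy is still fresh; the original web server is contacted only when every layer along the way is stale.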
