Push Your CDN Cache Hit Ratio Past 95 Percent
Stale-while-revalidate, surrogate keys, and origin-shield patterns that keep traffic off your servers and pages near instant.
A high cache hit ratio is one of those numbers that sounds like infrastructure trivia until you watch what it does to a business. Every request your CDN serves from the edge is a request your origin never sees, a page that returns in single-digit milliseconds from a POP near the user, and a server you do not have to scale. Push the ratio from 80 percent to 95 percent and the miss rate falls from 20 percent of requests to 5 percent: you have cut origin traffic by three quarters, made the median page load feel instant, and quietly removed your busiest server from the critical path. That instant median is not a vanity metric either, since every slow second has a measurable cost in conversion and revenue.
Most sites sit lower than they should, not because their content is uncacheable but because the cache is misconfigured in a handful of predictable ways. The fixes are unglamorous and the payoff is large. Here is how to find the leaks and seal them.
Measure the ratio that matters
Cache hit ratio is hits divided by total cacheable requests. Every major CDN reports this number, and the target worth chasing for a content-heavy site is 90 percent or higher. Below that, you are paying origin and latency costs you do not need to pay.
But the headline number hides the useful signal. Break the ratio down by content type and by URL path. Static assets like images, CSS, and JavaScript should be hitting close to 100 percent; if they are not, something is actively breaking caching for files that should never miss. Those image bytes are also where the biggest transfer savings live, which is why an image pipeline that serves the perfect byte count pairs so well with a high hit ratio. HTML and API responses are where the interesting work is. Look at the misses, group them, and you will usually find two or three patterns responsible for most of the leakage.
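The breakdown described above can be sketched in a few lines of Python. This is a minimal illustration, not a log parser for any particular CDN: the record shape (path, cache status), the "HIT"/"MISS" values, and the bucketing rules in bucket_for are all assumptions made for the example, and real log formats differ by vendor.

```python
from collections import defaultdict

# Hypothetical log records: (url_path, cache_status). The field layout and
# the "HIT"/"MISS" values are assumptions; adapt them to your CDN's logs.
SAMPLE_LOG = [
    ("/static/app.css", "HIT"),
    ("/static/app.js", "HIT"),
    ("/img/hero.jpg", "HIT"),
    ("/img/hero.jpg", "MISS"),
    ("/products/1234", "MISS"),
    ("/products/1234", "HIT"),
    ("/api/cart", "MISS"),
    ("/api/cart", "MISS"),
]

STATIC_SUFFIXES = (".css", ".js", ".jpg", ".png", ".webp", ".svg", ".woff2")


def bucket_for(path: str) -> str:
    """Group a URL path into a coarse content bucket (illustrative rules)."""
    if path.startswith("/api/"):
        return "api"
    if path.endswith(STATIC_SUFFIXES):
        return "static"
    return "html"


def hit_ratio_by_bucket(records):
    """Return {bucket: hits / total} for the given (path, status) records."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for path, status in records:
        bucket = bucket_for(path)
        totals[bucket] += 1
        if status == "HIT":
            hits[bucket] += 1
    return {bucket: hits[bucket] / totals[bucket] for bucket in totals}


if __name__ == "__main__":
    for bucket, ratio in sorted(hit_ratio_by_bucket(SAMPLE_LOG).items()):
        print(f"{bucket}: {ratio:.0%}")
```

Grouping the misses the same way, by path prefix instead of content type, is usually enough to surface the two or three patterns doing most of the damage.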
The cookie and header leaks that quietly disable caching
The most common reason a cacheable response misses is that something on it tells the CDN not to cache it. Two culprits dominate.
Cookies on static hosts. If your origin sets a cookie on a response, most CDNs treat that response as personalized and refuse to cache it, or vary the cache on the cookie value so every visitor gets a unique entry that never gets reused. Analytics cookies, session cookies set globally, and framework defaults all do this. Strip unnecessary cookies and Set-Cookie headers from responses for static and semi-static content before they reach the cache. A single global session cookie can drag a 99 percent asset hit ratio down to nothing, and an oversized one can do worse than that, as Safari 520 errors from large auth cookies demonstrate.
Restrictive Cache-Control directives. The private and no-store directives tell a shared cache such as a CDN not to store the response, and no-cache forces it to revalidate with origin before every reuse. They get applied by default in a lot of frameworks and proxies, then never removed. Audit what your origin actually emits. A response that says private will not be cached at the edge, unless you explicitly override it in CDN configuration, no matter how cacheable the content really is. Replace those directives with explicit, generous TTLs for anything that does not contain per-user data.
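An audit of the two culprits above can start as a small header check. The sketch below is a hypothetical helper (cache_blockers is not part of any library) that takes response headers as a plain dict and lists the reasons a shared cache would likely skip or revalidate the response. It assumes default CDN behavior; vendor-specific overrides are not modeled.

```python
def cache_blockers(headers: dict) -> list:
    """List reasons a shared cache would likely skip or revalidate a response.

    `headers` is a plain {name: value} dict; names are matched
    case-insensitively. Vendor-specific CDN overrides are not modeled.
    """
    normalized = {name.lower(): value for name, value in headers.items()}
    reasons = []

    # A Set-Cookie header usually marks the response as personalized.
    if "set-cookie" in normalized:
        reasons.append("sets a cookie")

    cache_control = normalized.get("cache-control", "")
    # Keep only directive names, dropping any "=value" part.
    directives = {
        part.strip().split("=", 1)[0].lower()
        for part in cache_control.split(",")
        if part.strip()
    }
    for directive in ("private", "no-store", "no-cache"):
        if directive in directives:
            reasons.append(f"Cache-Control: {directive}")

    if not cache_control:
        reasons.append("no Cache-Control header (a default decides)")
    return reasons


if __name__ == "__main__":
    leaky = {"Cache-Control": "private, max-age=0", "Set-Cookie": "sid=abc"}
    clean = {"Cache-Control": "public, max-age=3600"}
    print(cache_blockers(leaky))
    print(cache_blockers(clean))
```

Running a check like this against a sample of URLs per route, using whatever HTTP client you already have, gives you a list of leaks to close rather than a single discouraging ratio.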
The discipline here is to be explicit. Set a deliberate Cache-Control on every route rather than letting framework defaults decide, and the leaks close one by one. Getting these headers right across a fleet of vhosts is exactly the kind of unglamorous correctness work that disciplined server administration exists to handle, and it sits next to the other headers you are setting deliberately, like the security headers every Next.js app should ship.
Stale-while-revalidate: serve fast and stay fresh
The hardest content to cache is the kind that updates, because a short TTL means frequent misses and a long TTL means stale data. Stale-while-revalidate breaks that trade-off.
The mechanism: when a cached object passes its freshness window, the CDN serves the stale copy to the user immediately and fetches a fresh copy from origin in the background. No visitor waits on the revalidation. The stale response goes out in milliseconds, the fresh one lands in the cache a moment later, and the next visitor gets the update. You get the latency of a hit on content that is at most one refresh behind, for as long as requests keep arriving inside the stale window.
Cache-Control: public, max-age=60, stale-while-revalidate=600
That header says: treat this as fresh for 60 seconds, and for the next 600 seconds after that, keep serving the stale copy instantly while quietly refreshing it. For a homepage, a category listing, a blog index, anything that updates but not per-user, this single directive is often the largest hit-ratio improvement you can make. Frequently updated content gets a short max-age plus a generous stale-while-revalidate, and your origin sees a trickle of background revalidations instead of a flood of blocking misses.
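To make the two windows concrete, here is a minimal sketch of the decision a cache makes for an object of a given age. The function names (parse_cache_control, swr_state) and the state labels are invented for this example; a real CDN's implementation and edge-case handling will differ.

```python
def parse_cache_control(value: str) -> dict:
    """Parse a Cache-Control value into {directive: int seconds, or True for flags}."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, raw = part.partition("=")
        raw = raw.strip().strip('"')
        directives[name.strip().lower()] = int(raw) if raw.isdigit() else True
    return directives


def _seconds(directives: dict, name: str) -> int:
    """Return a directive's numeric value, or 0 if absent or not numeric."""
    value = directives.get(name, 0)
    return value if type(value) is int else 0


def swr_state(cache_control: str, age_seconds: int) -> str:
    """Classify a cached object by age (state labels are illustrative)."""
    directives = parse_cache_control(cache_control)
    max_age = _seconds(directives, "max-age")
    swr = _seconds(directives, "stale-while-revalidate")

    if age_seconds < max_age:
        return "fresh: serve from cache"
    if age_seconds < max_age + swr:
        return "stale: serve from cache, revalidate in background"
    return "expired: fetch from origin before responding"


if __name__ == "__main__":
    header = "public, max-age=60, stale-while-revalidate=600"
    for age in (30, 120, 700):
        print(age, "->", swr_state(header, age))
```

With the header above, only an object that has sat untouched for more than 660 seconds forces a visitor to wait on origin, which is why a busy page almost never produces a blocking miss.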
Surrogate keys: purge precisely instead of waiting out a TTL
The reason teams set timid TTLs is fear of stale content they cannot clear. If the only way to update a cached page is to wait for its TTL to expire, you keep TTLs short and your hit ratio suffers. Surrogate keys remove that fear.
A surrogate key, also called a cache tag, is a label you attach to a cached object. Tag the page for product 1234 with product-1234, tag every page that lists that product with the same key, and when the product changes you purge by the tag in a single API call. Every page carrying that tag is invalidated at once, no matter how many URLs are involved, while everything else stays cached.
This inverts the strategy. Instead of short TTLs and frequent misses, you set long TTLs, cache aggressively, and purge surgically the instant content actually changes. The hit ratio climbs because objects live in the cache until they genuinely need to leave, not until an arbitrary clock runs out. Fastly built its purge API around exactly this; most enterprise CDNs offer some form of tag-based purging. Wire your CMS or application to emit the right surrogate keys and to purge them on write, and you get freshness without sacrificing the ratio.
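The bookkeeping behind tag-based purging can be illustrated with a toy in-memory model. This is a sketch of the idea only: TaggedCache and its methods are hypothetical names, and it does not represent any CDN's API. In production the CDN maintains this index for you, and your application's job is to send the keys with each response and call the vendor's purge endpoint on write.

```python
class TaggedCache:
    """Toy in-memory cache that indexes entries by surrogate key."""

    def __init__(self):
        self._objects = {}      # url -> cached body
        self._keys_by_url = {}  # url -> set of surrogate keys
        self._urls_by_key = {}  # surrogate key -> set of urls

    def store(self, url, body, surrogate_keys):
        """Cache `body` under `url` and index it by each surrogate key."""
        self._remove(url)  # drop any previous entry and its index links
        self._objects[url] = body
        self._keys_by_url[url] = set(surrogate_keys)
        for key in surrogate_keys:
            self._urls_by_key.setdefault(key, set()).add(url)

    def _remove(self, url):
        """Evict one URL and unlink it from every key that referenced it."""
        self._objects.pop(url, None)
        for key in self._keys_by_url.pop(url, set()):
            urls = self._urls_by_key.get(key)
            if urls:
                urls.discard(url)

    def purge_key(self, key):
        """Evict every URL tagged with `key`; return the purged URLs, sorted."""
        urls = sorted(self._urls_by_key.pop(key, set()))
        for url in urls:
            self._remove(url)
        return urls

    def cached_urls(self):
        """Return the URLs still cached, sorted."""
        return sorted(self._objects)


if __name__ == "__main__":
    cache = TaggedCache()
    cache.store("/products/1234", "<html>...</html>", ["product-1234"])
    cache.store("/category/shoes", "<html>...</html>", ["product-1234", "product-5678"])
    cache.store("/products/5678", "<html>...</html>", ["product-5678"])
    cache.store("/about", "<html>...</html>", ["page-about"])

    print("purged:", cache.purge_key("product-1234"))
    print("still cached:", cache.cached_urls())
```

One purge call removes both the product page and the category page that lists it, while the unrelated pages keep their long TTLs.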
Origin shield: stop the misses from stampeding your server
Even with everything above tuned, the misses you do have can hurt out of proportion. A CDN runs many edge POPs around the world. When an object is not cached, each POP fetches it from your origin independently, so a single popular but uncached object, or a cache purge during peak traffic, can fan out into dozens of simultaneous origin requests for the same thing.
Origin shield fixes the stampede. You designate one mid-tier POP or region as a shield that sits between all the edge POPs and your origin. Edge misses go to the shield first; the shield is the only layer that ever talks to your origin. So instead of 40 POPs each requesting the same uncached object, the shield fetches it once, caches it, and serves the other 39 from there. Your origin sees one request where it would have seen forty.
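The forty-to-one arithmetic can be checked with a tiny simulation. It is deliberately simplified: the function origin_requests is invented for this example, every POP is assumed to miss on the same object exactly once, and the shield is assumed to collapse those misses into a single origin fetch (either because they arrive one after another or because the shield coalesces concurrent requests).

```python
def origin_requests(pop_count: int, shielded: bool) -> int:
    """Count origin fetches when every edge POP misses on the same object once.

    Assumes the shield answers all but the first miss from its own cache,
    i.e. misses are sequential or concurrent requests are coalesced.
    """
    origin_hits = 0
    shield_has_object = False

    for _pop in range(pop_count):
        if not shielded:
            # No shield: each POP goes straight to origin.
            origin_hits += 1
        elif not shield_has_object:
            # First miss through the shield: one origin fetch fills it.
            origin_hits += 1
            shield_has_object = True
        # Otherwise the shield serves this POP from its own cache.

    return origin_hits


if __name__ == "__main__":
    print("without shield:", origin_requests(40, shielded=False))
    print("with shield:   ", origin_requests(40, shielded=True))
```

The saving scales with the number of POPs and with how often objects expire or are purged, so it matters most right after a large purge at peak traffic.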
This is the difference between an origin that idles comfortably and one that buckles every time a cache entry expires under load. Enabling shielding consolidates cache-miss traffic, smooths out your origin's request pattern, and makes the rare miss survivable. It also improves the odds that a cached copy, even a stale one, exists somewhere to serve when origin is briefly unreachable, provided the CDN is allowed to serve stale content on error (the stale-if-error directive is the standard way to permit that).
Put it together
The path past 95 percent is a sequence, not a single switch. Measure the ratio broken down by content type so you know where the misses live. Strip the cookies and remove the restrictive Cache-Control directives that are silently disabling the cache. Layer stale-while-revalidate over anything that updates but is not personalized, so freshness never costs you a blocking miss. Tag content with surrogate keys so you can cache aggressively and purge precisely. Then put an origin shield in front of the whole thing so the misses you have left never stampede.
None of these are exotic. They are configuration done deliberately instead of by default, and together they take a server that strains under traffic and move most of that traffic off it entirely. The pages get faster, the origin gets quieter, and the next traffic spike stops being an emergency. The slice of traffic that still reaches origin is where slashing time to first byte with streaming server rendering earns its keep, and reining in egress is part of capping the bandwidth bill quietly eating your margin.