Cache Strategies in Distributed Systems

Caching is the cornerstone of high-performance distributed systems. It’s the closest thing we have to a magic wand for reducing latency and protecting overwhelmed databases. Yet, when basic caching is deployed naively in systems with massive, concurrent traffic—like Netflix releasing a new season, an e-commerce platform starting a Black Friday sale, or millions tuning into an IPL streaming match—the cache itself can become the cause of a catastrophic system failure.

This blog explores the advanced cache strategies necessary to move beyond simple Time-To-Live (TTL) expiration and handle the realities of distributed scale.

1. Why Basic TTL Caching Is Not Enough

In a simple setup, you cache a piece of data (e.g., a product details page) and set a TTL (Time-To-Live) of 60 seconds. For those 60 seconds, your database is safe. Every request is served from the lightning-fast in-memory cache.

The problem occurs at second 61.

If your system receives 10,000 requests per second for that product, at second 61, the cache entry expires. Now, every single one of those 10,000 concurrent requests detects a "cache miss" and synchronously hammers the database to recompute the exact same value.

How Cache Expiry Causes Traffic Spikes

This phenomenon is known as the Thundering Herd problem (or a Cache Stampede). A backend database designed to handle 500 queries per second is suddenly hit with 10,000. It chokes, latency increases dramatically, and the application servers waiting on the database responses start to back up, leading to cascading failures.

2. Advanced Strategies to Prevent Cache Stampedes

To tame the thundering herd, we need to ensure that when data needs refreshing, the backend isn't overwhelmed by redundant work.

TTL Jitter – Adding Randomness to Expiration

The simplest solution to the thundering herd is to stop all keys from expiring at the exact same moment. If you are caching thousands of products during a sale, don’t set their TTLs to exactly 30 minutes.

TTL Jitter adds a small, random amount of time to the base TTL. Instead of 600 seconds, you set the TTL to 600 + rand(0, 30) seconds. This "smears" the expiration times over a window of 30 seconds. Instead of a single spike of 10,000 requests at second 600, you have a smoothed distribution of requests between second 600 and second 630.

Mutex / Cache Locking

The thundering herd happens because 10,000 requests try to do the same work simultaneously. The Mutex / Cache Locking strategy ensures only one request is allowed to recompute the value.

The Logic:

Request A hits a cache miss.
It attempts to acquire a distributed lock (e.g., using Redis SETNX—set if not exists) for that specific key.
If Request A gets the lock, it proceeds to fetch data from the DB, updates the cache, and releases the lock.
If Requests B, C, and D hit a cache miss while Request A has the lock, they cannot get the lock. They must wait, retry after a short delay, or return a fallback value until the cache is populated.

This serializes the regeneration process, turning 10,000 potential DB queries into exactly one.

Probability-Based Early Expiration (Probabilistic Early Re-computation)

This is a more mathematically elegant solution. Instead of waiting for the key to expire completely, the system makes a probabilistic decision to refresh the key before it officially expires.

We don't need to get heavy on the math, but conceptually, the algorithm uses a function (often involving an exponential component) where the probability of refreshing increases as the current time gets closer to the TTL expiration.

Example: If a key has 60 seconds to live, a request hitting it at second 10 has a 0.1% chance of triggering a background refresh. A request at second 55 might have a 40% chance. This ensures that a popular key is almost guaranteed to be refreshed by someone before it dies, "smearing" the refresh load without requiring explicit locking.

3. Balancing Freshness, Latency, and Consistency

The strategies above primarily focus on load management. The following strategies help balance the inherent tradeoffs in distributed caching: How fresh must the data be? How fast must it load? Must it be consistent across all users?

Stale-While-Revalidate (SWR) Strategy

This strategy optimizes for the lowest possible latency at the cost of slight freshness.

In an SWR flow (commonly used by CDNs), the system defines two windows of time for a cached item: its Fresh period (TTL) and its Stale period.

The logic:

A request arrives. If the data is within its Fresh TTL, it is served immediately.
If the data has exceeded its Fresh TTL but is still within its Stale Window, the system serves the stale copy immediately to the user (achieving ultra-low latency).
Simultaneously, the system fires an asynchronous background request to "revalidate" (refresh) the cache entry.

Real-world CDNs use this behavior frequently. The first user to hit an expired item at the edge gets the older version instantly, while the edge node updates its copy for the next user.

Cache Warming / Pre-Warming

All the strategies discussed so far are reactive: they react to user requests, hitting misses or near-misses. Cache Warming is proactive. You populate the cache before the users arrive.

This is critical before a planned traffic spike, such as a product launch (e.g., a new iPhone launch on an e-commerce site) or the absolute start of a major sale. Without pre-warming, the very first wave of users will hit a cold cache, causing an instant stampede.

Pre-warming usually involves running scripts to query critical, high-volume APIs and database keys (e.g., top 100 featured products, homepage layout) and injecting them into the cache layer just before the traffic event.

Summary: When to Use Which Strategy

There is no one-size-fits-all caching strategy. Your choice depends on the behavior you want your distributed system to exhibit under load.

Feature / Scenario	TTL Jitter	Mutex Locking	SWR	Cache Warming
Main Goal	Smooth Backend Spikes	Serialize Recomputation (Strict single fetch)	Low Latency (Serve instantly)	Proactive Readiness
Freshness	High	High	Acceptably Stale	High (Depends on warming time)
Tradeoff	Slight added staleness	Complexity of lock management	Complexity of async logic	Need to predict "hot" keys
Complexity	Low	High	Medium	Medium
Example Use Case	Product Catalog during normal traffic	Expensive Financial Reports	Trending News Articles (High throughput, lag acceptable)	New Product Launch (Scheduled Event)

Cache Strategies in Distributed Systems

1. Why Basic TTL Caching Is Not Enough

How Cache Expiry Causes Traffic Spikes

2. Advanced Strategies to Prevent Cache Stampedes

TTL Jitter – Adding Randomness to Expiration

Mutex / Cache Locking

Probability-Based Early Expiration (Probabilistic Early Re-computation)

3. Balancing Freshness, Latency, and Consistency

Stale-While-Revalidate (SWR) Strategy

Cache Warming / Pre-Warming

Summary: When to Use Which Strategy

Comments

System Design

Understanding the Thundering Herd Problem

More from this blog

Understanding the Thundering Herd Problem

Building a Persona Bot: Lessons from Simulating Hitesh Choudhary & Piyush Garg

Boost Your Coding Efficiency with My AI-Powered Terminal Assistant

Unlocking the AI Chakra: Become the Hokage of Tech

Command Palette

1. Why Basic TTL Caching Is Not Enough

How Cache Expiry Causes Traffic Spikes

2. Advanced Strategies to Prevent Cache Stampedes

TTL Jitter – Adding Randomness to Expiration

Mutex / Cache Locking

Probability-Based Early Expiration (Probabilistic Early Re-computation)

3. Balancing Freshness, Latency, and Consistency

Stale-While-Revalidate (SWR) Strategy

Cache Warming / Pre-Warming

Summary: When to Use Which Strategy

Comments

System Design

Understanding the Thundering Herd Problem

More from this blog