Monitoring Redis: Prevent Cache Failures That Cascade Into Outages

Redis is often the silent hero of your architecture. It caches database queries, stores sessions, manages rate limits, powers real-time features, and handles message queues. It works so well that teams forget it's there.

Until it fails. And when Redis fails, everything behind it fails too.

The Redis Cascade Effect

Here's what typically happens when Redis goes down:

Cache misses spike — Every request that normally hits Redis now hits your database
Database overloads — It's suddenly handling 10-100x more queries than normal
Response times spike — 50ms responses become 5-second responses
Connection pools exhaust — Database connections run out
Application errors — Services start returning 500 errors
Total outage — The application becomes unusable

The entire cascade can unfold in under 60 seconds.
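The 10-100x figure follows from the hit rate. If a fraction h of reads is served by Redis, the database normally sees only the remaining 1 - h. When Redis disappears, it sees all of them. Here is a minimal sketch of that arithmetic. It assumes every cache miss becomes exactly one database query, which real read paths may not match.

```python
def db_load_multiplier(hit_rate: float) -> float:
    """How many times more read queries the database receives once the cache is gone.

    Assumes each cache miss becomes exactly one database query, so the database
    normally sees (1 - hit_rate) of reads and all of them without Redis.
    """
    if not 0.0 <= hit_rate < 1.0:
        raise ValueError("hit_rate must be in [0, 1)")
    return 1.0 / (1.0 - hit_rate)


for h in (0.90, 0.95, 0.99):
    print(f"hit rate {h:.0%}: database load x{db_load_multiplier(h):.0f}")
```

A 90% hit rate gives roughly a 10x jump and a 99% hit rate roughly 100x. This matches the range above, and it means that the better your cache performs, the harder the database falls when the cache goes away.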

What to Monitor

Connection Health

Port 6379 availability — Basic TCP check, catches complete crashes
Connected clients vs. maxclients — Alert at 80% to prevent connection refusal
Rejected connections — Any rejected connection is a problem
Connection rate — Sudden spikes might indicate a connection leak
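The sketch below covers the connection checks. It assumes the INFO fields have already been parsed into a dictionary of strings: connected_clients comes from the clients section, and rejected_connections and total_connections_received come from the stats section. maxclients is passed in separately, for example from CONFIG GET maxclients. The function name and the connection-rate baseline are hypothetical placeholders.

```python
def check_connections(info, maxclients, prev_info=None, interval_s=30.0,
                      usage_threshold=0.80, max_new_conns_per_s=500.0):
    """Return alert messages for the connection-health checks.

    info and prev_info are parsed INFO fields (string values) from the current
    and previous poll. max_new_conns_per_s is a hypothetical baseline; set it
    from your own traffic.
    """
    alerts = []

    # Connected clients vs. maxclients: warn before Redis starts refusing connections.
    connected = int(info["connected_clients"])
    if connected >= usage_threshold * maxclients:
        alerts.append(f"connected_clients={connected} is >= {usage_threshold:.0%} of maxclients={maxclients}")

    # rejected_connections is a cumulative counter, so alert on any increase.
    rejected = int(info["rejected_connections"])
    prev_rejected = int(prev_info["rejected_connections"]) if prev_info else 0
    if rejected > prev_rejected:
        alerts.append(f"{rejected - prev_rejected} connection(s) rejected since last check")

    # Connection rate from the cumulative total_connections_received counter.
    if prev_info:
        new_conns = int(info["total_connections_received"]) - int(prev_info["total_connections_received"])
        rate = new_conns / interval_s
        # A negative delta means the counter was reset by a restart; skip this round.
        if new_conns >= 0 and rate > max_new_conns_per_s:
            alerts.append(f"connection rate {rate:.0f}/s exceeds baseline of {max_new_conns_per_s:.0f}/s")

    return alerts
```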

Memory

Used memory vs. maxmemory — Alert at 85%
Memory fragmentation ratio — Should be close to 1.0; values well above it waste memory, and values below 1.0 can mean Redis memory is being swapped to disk
Evicted keys — If keys are being evicted, you're at capacity
Expired keys — Normal, but sudden spikes might indicate unusual patterns
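The memory checks can be sketched the same way. used_memory, maxmemory and mem_fragmentation_ratio come from the INFO memory section, and evicted_keys comes from the stats section. The function name and the fragmentation thresholds are hypothetical. Small datasets often show a high fragmentation ratio that is harmless, so tune these thresholds to your own data.

```python
def check_memory(info, prev_info=None, usage_threshold=0.85,
                 frag_high=1.5, frag_low=1.0):
    """Return alert messages for the memory checks.

    info/prev_info: parsed INFO fields (string values). frag_high and frag_low
    are hypothetical thresholds.
    """
    alerts = []

    used = int(info["used_memory"])
    limit = int(info["maxmemory"])
    if limit == 0:
        # maxmemory 0 means no limit is configured, so there is nothing to compare against.
        alerts.append("maxmemory is 0 (no limit); the usage alert cannot be evaluated")
    elif used >= usage_threshold * limit:
        alerts.append(f"used_memory is {used / limit:.0%} of maxmemory")

    # mem_fragmentation_ratio is used_memory_rss / used_memory.
    frag = float(info["mem_fragmentation_ratio"])
    if frag > frag_high:
        alerts.append(f"fragmentation ratio {frag:.2f}: memory is being wasted")
    elif frag < frag_low:
        alerts.append(f"fragmentation ratio {frag:.2f}: RSS below used memory, possible swapping")

    # evicted_keys is cumulative; any increase means Redis hit maxmemory and evicted data.
    if prev_info and int(info["evicted_keys"]) > int(prev_info["evicted_keys"]):
        alerts.append("keys were evicted since last check: Redis is at capacity")

    return alerts
```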

Performance

Command latency — P50, P95, P99 latency for key operations
Operations per second — Baseline and alerting on anomalies
Hit rate — keyspace_hits / (keyspace_hits + keyspace_misses). Below 90% often signals a problem, though the right baseline depends on your workload
Slow log entries — Commands taking longer than the slow-log threshold
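The hit-rate formula above and the latency percentiles can be computed like this. keyspace_hits and keyspace_misses are cumulative counters in the INFO stats section, so this sketch uses the difference between two polls to get a current rate. The latency samples are assumed to come from timing your own client calls. Nearest-rank is used for simplicity and is only one of several percentile definitions. Both function names are illustrative.

```python
import math


def hit_rate(info, prev_info=None):
    """keyspace_hits / (keyspace_hits + keyspace_misses), or None if there were no lookups.

    Both counters are cumulative since startup, so pass the previous poll's
    fields to get the hit rate for the interval in between.
    """
    hits = int(info["keyspace_hits"])
    misses = int(info["keyspace_misses"])
    if prev_info is not None:
        d_hits = hits - int(prev_info["keyspace_hits"])
        d_misses = misses - int(prev_info["keyspace_misses"])
        # Negative deltas mean the counters were reset (restart or CONFIG RESETSTAT).
        if d_hits >= 0 and d_misses >= 0:
            hits, misses = d_hits, d_misses
    total = hits + misses
    return hits / total if total else None


def percentile(samples, p):
    """Nearest-rank percentile (p in 0-100) of latency samples, e.g. p=99 for P99."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]
```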

Persistence

Last RDB save status — Was the last snapshot successful?
Last RDB save time — How long since the last successful save?
AOF status — Is the append-only file being written?
AOF rewrite status — Is the background rewrite completing?
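Each persistence question maps to a field in the INFO persistence section: rdb_last_bgsave_status, rdb_last_save_time (a Unix timestamp of the last successful save), aof_enabled, aof_last_write_status and aof_last_bgrewrite_status. Here is a sketch that checks them. The one-hour save age is a hypothetical threshold. If you run without RDB snapshots, turn the RDB check off.

```python
import time


def check_persistence(info, max_save_age_s=3600, check_rdb=True, now=None):
    """Return alert messages for RDB/AOF persistence.

    info: parsed INFO fields (string values). max_save_age_s is a hypothetical
    threshold; set check_rdb=False if you run without RDB snapshots.
    """
    now = time.time() if now is None else now
    alerts = []

    if check_rdb:
        if info["rdb_last_bgsave_status"] != "ok":
            alerts.append("last RDB background save failed")
        # rdb_last_save_time is the Unix timestamp of the last successful save.
        age_s = now - int(info["rdb_last_save_time"])
        if age_s > max_save_age_s:
            alerts.append(f"no successful RDB save for {age_s / 60:.0f} minutes")

    if info.get("aof_enabled") == "1":
        if info.get("aof_last_write_status") != "ok":
            alerts.append("last AOF write failed")
        if info.get("aof_last_bgrewrite_status") != "ok":
            alerts.append("last AOF background rewrite failed")

    return alerts
```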

Replication (If Using Replicas)

Connected replicas — Are all replicas connected? (reported as connected_slaves in INFO)
Replication lag — How far behind are replicas?
Master link status — Is the replica connected to the master?
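On a master, the INFO replication section lists each replica as slave0, slave1 and so on. Each entry is a comma-separated value such as ip=...,port=...,state=online,offset=...,lag=..., and master_repl_offset gives the master's own offset. On a replica, master_link_status reports up or down. The sketch below measures lag as the byte difference between offsets. The lag field inside each entry reports seconds since the last acknowledgement instead. The threshold and function names are hypothetical.

```python
def parse_replica(value):
    """Parse a master's slaveN field, e.g. 'ip=10.0.0.2,port=6379,state=online,offset=1234,lag=0'."""
    return dict(item.split("=", 1) for item in value.split(","))


def check_replication(info, expected_replicas, max_lag_bytes=1_000_000):
    """Return alert messages for replication health.

    info: parsed INFO replication fields (string values). max_lag_bytes is a
    hypothetical threshold on offset difference, not on time.
    """
    alerts = []
    if info["role"] == "master":
        connected = int(info["connected_slaves"])
        if connected < expected_replicas:
            alerts.append(f"only {connected} of {expected_replicas} replicas connected")
        master_offset = int(info["master_repl_offset"])
        for key, value in info.items():
            # Replica entries are named slave0, slave1, ... in INFO output.
            if key.startswith("slave") and key[5:].isdigit():
                replica = parse_replica(value)
                lag_bytes = master_offset - int(replica["offset"])
                if replica["state"] != "online":
                    alerts.append(f"{key} state is {replica['state']}")
                elif lag_bytes > max_lag_bytes:
                    alerts.append(f"{key} is {lag_bytes} bytes behind the master")
    elif info.get("master_link_status") != "up":
        # On a replica, master_link_status reports whether the link to the master is up.
        alerts.append("replica has lost its link to the master")
    return alerts
```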

Setting Up Redis Monitoring

Layer 1: External Port Check

Monitor port 6379 (or your custom port) every 30 seconds. This catches a complete Redis crash within one check interval.
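A basic TCP check only needs to open a connection. Here is a minimal sketch using the standard library. The function name and the two-second timeout are illustrative choices. A successful connect only proves that something is listening on the port. It does not prove that Redis is answering commands, which is what the next layer covers.

```python
import socket


def redis_port_open(host="127.0.0.1", port=6379, timeout_s=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout_s."""
    try:
        # create_connection raises OSError (refused, unreachable, timed out) on failure.
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False
```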

Layer 2: Application-Level Check

Create a health endpoint in your application that:

Writes a key to Redis (SET health_check timestamp)
Reads it back (GET health_check)
Returns the round-trip time

Monitor this endpoint. It catches connectivity issues, authentication problems, and performance degradation.
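The three steps above can be sketched as a framework-agnostic function plus a small wrapper that returns an HTTP status and body. It assumes a client with set(key, value) and get(key) methods, such as a redis-py redis.Redis instance. redis-py returns bytes from get unless decode_responses=True is set. The function names are hypothetical. If several application instances share one Redis, give each its own key so they do not overwrite each other between the SET and the GET.

```python
import time


def redis_health_check(client, key="health_check"):
    """SET a timestamp, GET it back, and return the round-trip time in milliseconds."""
    value = str(time.time())
    start = time.perf_counter()
    client.set(key, value)
    got = client.get(key)
    elapsed_ms = (time.perf_counter() - start) * 1000
    if isinstance(got, bytes):  # redis-py returns bytes unless decode_responses=True
        got = got.decode()
    if got != value:
        raise RuntimeError(f"health check read back {got!r}, expected {value!r}")
    return elapsed_ms


def health_endpoint(client):
    """Return (http_status, body) for a monitoring tool to poll."""
    try:
        rtt_ms = redis_health_check(client)
        return 200, {"redis": "ok", "rtt_ms": round(rtt_ms, 2)}
    except Exception as exc:  # connection, auth, or data errors all mean "unhealthy"
        return 503, {"redis": "error", "detail": str(exc)}
```

An external monitor can then alert on a non-200 status or on an rtt_ms value above your baseline.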

Layer 3: Redis Metrics

Collect Redis INFO output and track key metrics. Most monitoring tools can parse Redis INFO directly.
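The raw output of INFO (for example from redis-cli INFO) is plain text. Section headers start with "#", and each metric is a key:value line. A minimal parser that flattens it into a dictionary of strings looks like this. The function name is illustrative. If you use redis-py, Redis.info() already returns a parsed dictionary, so you would not need it.

```python
def parse_info(text):
    """Parse the text returned by the Redis INFO command into a flat dict of strings."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and section headers such as "# Memory".
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only; values such as "keys=10,expires=2" stay intact.
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value
    return fields
```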

Common Redis Failure Scenarios

Memory Exhaustion

Redis has a maxmemory setting. When it is reached, behavior depends on the eviction policy (maxmemory-policy). Two common choices:

noeviction (the default): Returns errors on write commands (no data is evicted, but writes fail)
allkeys-lru: Evicts least recently used keys (data loss, but stays functional)

Monitor memory usage and alert well before maxmemory.
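Besides a fixed threshold, you can also alert on trend. Here is a sketch that linearly extrapolates how many hours remain before used_memory reaches maxmemory, using two samples. The function name is hypothetical. A straight-line projection is crude for workloads where TTL expiry frees memory in bursts, so treat it as an early warning rather than a forecast.

```python
def hours_until_full(used_then, used_now, hours_between, maxmemory):
    """Linearly extrapolate how many hours until used_memory reaches maxmemory.

    Returns None when memory is flat or shrinking (no exhaustion projected).
    """
    if hours_between <= 0:
        raise ValueError("hours_between must be positive")
    growth_per_hour = (used_now - used_then) / hours_between
    if growth_per_hour <= 0:
        return None
    # Already past the limit counts as zero hours left.
    return max(0.0, (maxmemory - used_now) / growth_per_hour)
```

For example, you might alert when the projection drops below a day. That window is a judgment call, not a Redis default.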

Slow Commands Blocking Everything

Redis executes commands on a single thread. One slow command (like KEYS * on a large dataset) blocks ALL other commands until it finishes. Monitor the slow log and alert on any command that takes more than 100 ms.
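SLOWLOG GET returns entries whose first four fields are an ID, a Unix timestamp, the execution time in microseconds, and the command arguments. Redis only records commands slower than slowlog-log-slower-than, which is also in microseconds and defaults to 10000 (10 ms), so a 100 ms alert works with the default. This sketch assumes entries in that raw shape and remembers the newest ID so each slow command alerts only once. The function names are hypothetical.

```python
def _as_text(arg):
    """Slow-log arguments may arrive as bytes or str depending on the client."""
    return arg.decode(errors="replace") if isinstance(arg, bytes) else str(arg)


def new_slow_commands(entries, last_seen_id=-1, threshold_ms=100.0):
    """Return (alerts, newest_id) for unseen SLOWLOG GET entries slower than threshold_ms.

    Each entry is assumed to be in the raw reply shape [id, unix_time, duration_us, args, ...].
    Pass newest_id back in as last_seen_id on the next poll to avoid duplicate alerts.
    """
    alerts = []
    newest_id = last_seen_id
    for entry in entries:
        entry_id, started_at, duration_us, args = entry[0], entry[1], entry[2], entry[3]
        if entry_id <= last_seen_id:
            continue
        newest_id = max(newest_id, entry_id)
        duration_ms = duration_us / 1000  # the slow log records microseconds
        if duration_ms > threshold_ms:
            command = " ".join(_as_text(a) for a in args)
            alerts.append(f"{command} took {duration_ms:.1f} ms (at {started_at})")
    return alerts, newest_id
```

Slow-log IDs start over when Redis restarts. One approach is to reset last_seen_id whenever uptime_in_seconds from INFO goes down.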

Persistence Fork Failure

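Redis forks the process for RDB snapshots and AOF rewrites. On systems with limited memory, the fork can fail and persistence stops. Redis logs the error and reports it in INFO (for example rdb_last_bgsave_status:err), but nothing alerts you unless you watch for it. If RDB snapshots are enabled, the default stop-writes-on-bgsave-error yes setting also makes Redis reject writes after a failed save.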
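On Linux, a common cause of fork failures is the kernel's memory overcommit mode. Redis recommends vm.overcommit_memory = 1 and warns at startup when it is 0. Here is a sketch of a host-level check. It assumes a Linux host where the setting is readable at /proc/sys/vm/overcommit_memory, and the function names are hypothetical.

```python
from pathlib import Path

OVERCOMMIT_PATH = Path("/proc/sys/vm/overcommit_memory")  # Linux only


def read_overcommit(path=OVERCOMMIT_PATH):
    """Read the kernel's vm.overcommit_memory mode (0, 1 or 2)."""
    return int(Path(path).read_text().strip())


def overcommit_warning(mode):
    """Return a warning unless mode is 1, the setting Redis recommends so fork() can succeed."""
    if mode == 1:
        return None
    return (f"vm.overcommit_memory={mode}: fork() for BGSAVE/BGREWRITEAOF may fail "
            "under memory pressure; Redis recommends setting it to 1")
```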

Network Partition

If Redis becomes unreachable due to a network issue, your application might keep running, but with degraded performance (falling back to the database for every request).
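One way to make this degradation visible is to count fallbacks at the point where the application reads from the cache. Here is a sketch of a cache-aside wrapper that serves from the database when Redis is unreachable and exposes the fallback ratio as a metric. The class and its parameters are hypothetical. With redis-py you would pass redis.exceptions.ConnectionError and redis.exceptions.TimeoutError as errors, because they do not inherit from Python's built-in ConnectionError.

```python
class CacheWithFallback:
    """Cache-aside reads that fall back to the database when Redis is unreachable.

    Counts fallbacks so a silent partition shows up as a metric instead of
    only as slower responses. cache needs get/set; load_from_db(key) -> value.
    """

    def __init__(self, cache, load_from_db, errors=(ConnectionError, TimeoutError)):
        self.cache = cache
        self.load_from_db = load_from_db
        self.errors = errors  # exception types that mean "Redis unreachable"
        self.requests = 0
        self.fallbacks = 0

    def get(self, key):
        self.requests += 1
        try:
            value = self.cache.get(key)
        except self.errors:
            # Redis is unreachable: serve from the database and record the fallback.
            self.fallbacks += 1
            return self.load_from_db(key)
        if value is None:
            # Ordinary cache miss: load and repopulate, ignoring a failed write.
            value = self.load_from_db(key)
            try:
                self.cache.set(key, value)
            except self.errors:
                pass
        return value

    def fallback_ratio(self):
        """Fraction of reads served by the database because Redis was unreachable."""
        return self.fallbacks / self.requests if self.requests else 0.0
```

Alerting when fallback_ratio stays above zero catches a partition that a port check run from a different network location would miss.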

The Minimum Redis Monitoring Setup

Port monitor on Redis port (30-second interval)
Application health check that reads/writes Redis (60-second interval)
Memory usage alert at 85% of maxmemory
Connected clients alert at 80% of maxclients
Hit rate alert below 90%

Five monitors that guard against your most likely cascade failure.
