
Monitoring Redis: Prevent Cache Failures That Cascade Into Outages

When Redis goes down, your database gets hammered, response times spike, and your entire application crumbles. Here's how to monitor Redis before it takes everything down.

UptimeGuard Team
February 12, 2026 · 9 min read

Redis is often the silent hero of your architecture. It caches database queries, stores sessions, manages rate limits, powers real-time features, and handles message queues. It works so well that teams forget it's there.

Until it fails. And when Redis fails, everything behind it fails too.

The Redis Cascade Effect

Here's what typically happens when Redis goes down:

  1. Cache misses spike — Every request that normally hits Redis now hits your database
  2. Database overloads — It's suddenly handling 10-100x more queries than normal
  3. Response times spike — 50ms responses become 5-second responses
  4. Connection pools exhaust — Database connections run out
  5. Application errors — Services start returning 500 errors
  6. Total outage — The application becomes unusable

The entire cascade can unfold in under 60 seconds.
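
The 10-100x figure in step 2 follows from simple arithmetic. Here is a rough sketch, assuming every read goes through the cache and that the database only sees cache misses while Redis is healthy. The function name is ours, for illustration only.

```python
def db_load_multiplier(hit_rate: float) -> float:
    """How many times more queries reach the database if the cache disappears.

    With the cache up, only misses (1 - hit_rate) reach the database.
    With the cache down, every request does.
    """
    if not 0.0 <= hit_rate < 1.0:
        raise ValueError("hit_rate must be in [0, 1)")
    return 1.0 / (1.0 - hit_rate)


for rate in (0.90, 0.95, 0.99):
    print(f"hit rate {rate:.0%}: database load x{db_load_multiplier(rate):.0f}")
```

Under these assumptions, a 90% hit rate means 10x the database load when Redis disappears, 95% means 20x, and 99% means 100x. The better your cache works, the harder the fall.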

What to Monitor

Connection Health

  • Port 6379 availability — Basic TCP check, catches complete crashes
  • Connected clients vs. maxclients — Alert at 80% to prevent connection refusal
  • Rejected connections — Any rejected connection is a problem
  • Connection rate — Sudden spikes might indicate a connection leak
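
As a minimal sketch, the first three checks could look like the function below. It assumes `info` is a dict of parsed INFO fields (redis-py's `Redis.info()` returns one) and that you pass in `maxclients` yourself, for example from `CONFIG GET maxclients`. The function name and the `prev_rejected` bookkeeping are illustrative, not part of any library.

```python
from typing import List, Mapping


def connection_alerts(info: Mapping, maxclients: int,
                      prev_rejected: int = 0,
                      threshold: float = 0.80) -> List[str]:
    """Return alert messages for client saturation and rejected connections."""
    alerts = []

    usage = info["connected_clients"] / maxclients
    if usage >= threshold:
        alerts.append(f"clients at {usage:.0%} of maxclients")

    # rejected_connections is a cumulative counter, so compare it
    # against the value seen in the previous sample.
    newly_rejected = info["rejected_connections"] - prev_rejected
    if newly_rejected > 0:
        alerts.append(f"{newly_rejected} connections rejected since last sample")

    return alerts
```

The same delta approach works for connection rate: subtract the previous `total_connections_received` from the current one and divide by the sampling interval.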

Memory

  • Used memory vs. maxmemory — Alert at 85%
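  • Memory fragmentation ratio — Should be close to 1.0. Values well above 1.0 mean the allocator is holding memory Redis isn't using; values below 1.0 usually mean Redis is being swapped to disk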
  • Evicted keys — If keys are being evicted, you're at capacity
  • Expired keys — Normal, but sudden spikes might indicate unusual patterns
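
A hedged sketch of these memory checks, again assuming `info` is a dict of parsed INFO fields. The 1.5 fragmentation threshold is our assumption for illustration; tune it to your workload.

```python
from typing import List, Mapping


def memory_alerts(info: Mapping, prev_evicted: int = 0,
                  usage_threshold: float = 0.85,
                  frag_threshold: float = 1.5) -> List[str]:
    """Return alert messages for memory pressure, fragmentation, and evictions."""
    alerts = []

    maxmemory = info.get("maxmemory", 0)
    if maxmemory > 0:  # maxmemory 0 means no limit is configured
        usage = info["used_memory"] / maxmemory
        if usage >= usage_threshold:
            alerts.append(f"memory at {usage:.0%} of maxmemory")

    if info["mem_fragmentation_ratio"] > frag_threshold:
        alerts.append(f"fragmentation ratio {info['mem_fragmentation_ratio']}")

    # evicted_keys is cumulative, so alert only on growth since the last sample.
    if info["evicted_keys"] > prev_evicted:
        alerts.append("keys evicted since last sample")

    return alerts
```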

Performance

  • Command latency — P50, P95, P99 latency for key operations
  • Operations per second — Baseline and alerting on anomalies
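  • Hit rate — keyspace_hits / (keyspace_hits + keyspace_misses). For a read-heavy cache, a sustained rate below 90% usually points to a problem such as undersized memory, TTLs that are too short, or a cold cache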
  • Slow log entries — Commands taking longer than the slow-log threshold
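
One detail worth sketching: `keyspace_hits` and `keyspace_misses` are cumulative counters, so a hit rate computed from raw values averages over the whole server uptime and reacts slowly. The illustrative helper below computes the rate over the window between two INFO samples instead. The function name is ours.

```python
from typing import Mapping, Optional


def windowed_hit_rate(prev: Mapping, curr: Mapping) -> Optional[float]:
    """Hit rate between two INFO samples, or None if there was no traffic."""
    hits = curr["keyspace_hits"] - prev["keyspace_hits"]
    misses = curr["keyspace_misses"] - prev["keyspace_misses"]
    total = hits + misses
    if total <= 0:
        # No lookups in the window, or counters were reset (e.g. a restart).
        return None
    return hits / total
```

Returning `None` rather than 0 for an idle window avoids firing a "hit rate below 90%" alert when nobody is reading at all.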

Persistence

  • Last RDB save status — Was the last snapshot successful?
  • Last RDB save time — How long since the last successful save?
  • AOF status — Is the append-only file being written?
  • AOF rewrite status — Is the background rewrite completing?
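
These four checks map onto INFO persistence fields. A minimal sketch, assuming `info` is a dict of parsed INFO fields and that one hour is an acceptable snapshot age (that limit is our assumption, so set it to match your save schedule):

```python
import time
from typing import List, Mapping, Optional


def persistence_alerts(info: Mapping, max_save_age_s: float = 3600,
                       now: Optional[float] = None) -> List[str]:
    """Return alert messages for failed or stale RDB saves and AOF problems."""
    now = time.time() if now is None else now
    alerts = []

    if info.get("rdb_last_bgsave_status") != "ok":
        alerts.append("last RDB save failed")

    # rdb_last_save_time is the Unix timestamp of the last successful save.
    age = now - info["rdb_last_save_time"]
    if age > max_save_age_s:
        alerts.append(f"last successful RDB save was {age:.0f}s ago")

    if info.get("aof_enabled"):
        if info.get("aof_last_write_status") != "ok":
            alerts.append("last AOF write failed")
        if info.get("aof_last_bgrewrite_status") != "ok":
            alerts.append("last AOF rewrite failed")

    return alerts
```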

Replication (If Using Replicas)

  • Connected slaves — Are all replicas connected?
  • Replication lag — How far behind are replicas?
  • Master link status — Is the replica connected to the master?
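
A rough sketch of the replication checks. It assumes `info` is a dict of parsed INFO fields in which each `slaveN` entry has itself been parsed into a dict with an `offset` key, which mirrors how redis-py presents it; if your parser leaves `slaveN` as a raw string, you would need to split it first. `expected_replicas` and the one-megabyte lag limit are illustrative assumptions.

```python
from typing import List, Mapping


def replication_alerts(info: Mapping, expected_replicas: int = 1,
                       max_lag_bytes: int = 1_000_000) -> List[str]:
    """Return alert messages for missing replicas, lag, or a broken master link."""
    alerts = []

    if info.get("role") == "master":
        connected = info.get("connected_slaves", 0)
        if connected < expected_replicas:
            alerts.append(f"{connected}/{expected_replicas} replicas connected")
        for i in range(connected):
            replica = info.get(f"slave{i}")
            if isinstance(replica, dict):
                # Lag measured as bytes of replication stream not yet acknowledged.
                lag = info["master_repl_offset"] - replica["offset"]
                if lag > max_lag_bytes:
                    alerts.append(f"slave{i} is {lag} bytes behind")
    elif info.get("master_link_status") != "up":
        alerts.append("replica has lost its link to the master")

    return alerts
```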

Setting Up Redis Monitoring

Layer 1: External Port Check

Monitor port 6379 (or your custom port) every 30 seconds. This catches complete Redis crashes immediately.
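
If you want to see what such a check does under the hood, here is a minimal sketch using only the Python standard library. The function name and the 2-second timeout are our choices for illustration.

```python
import socket


def port_is_open(host: str, port: int = 6379, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False
```

Calling `port_is_open("redis.internal.example")` (a placeholder hostname) every 30 seconds gives you the Layer 1 signal. Keep in mind that an open port only proves the process is accepting connections, not that it is answering commands.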

Layer 2: Application-Level Check

Create a health endpoint in your application that:

  1. Writes a key to Redis (SET health_check timestamp)
  2. Reads it back (GET health_check)
  3. Returns the round-trip time

Monitor this endpoint. It catches connectivity issues, authentication problems, and performance degradation.
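
The three steps above can be sketched as a single function. This is an illustration under stated assumptions, not a drop-in implementation: `client` is any object with redis-py style `set(name, value)` and `get(name)` methods, and the function name and result shape are ours.

```python
import time


def redis_health_check(client, key: str = "health_check") -> dict:
    """Write a timestamp, read it back, and report the round-trip time."""
    stamp = str(time.time())
    start = time.perf_counter()
    try:
        client.set(key, stamp)
        value = client.get(key)
    except Exception as exc:  # connection, auth, and timeout errors all land here
        return {"ok": False, "error": str(exc)}
    elapsed_ms = (time.perf_counter() - start) * 1000

    # redis-py returns bytes unless the client was created with decode_responses=True.
    if isinstance(value, bytes):
        value = value.decode()

    return {"ok": value == stamp, "round_trip_ms": round(elapsed_ms, 2)}
```

Your health endpoint can return this dict as JSON and respond with a non-200 status when `ok` is false. If several application instances run the check at once, give each its own key so they don't overwrite each other's timestamp.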

Layer 3: Redis Metrics

Collect Redis INFO output and track key metrics. Most monitoring tools can parse Redis INFO directly.
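
INFO output is plain text: `# Section` header lines followed by `key:value` lines. If your tooling doesn't parse it for you, a minimal parser could look like this sketch (the function names are ours; client libraries such as redis-py already do this when you call `info()`):

```python
def _coerce(value: str):
    """Convert numeric strings to int or float; leave everything else as text."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


def parse_info(raw: str) -> dict:
    """Parse raw Redis INFO text into a flat dict of metrics."""
    metrics = {}
    for line in raw.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and section headers
        key, _, value = line.partition(":")
        metrics[key] = _coerce(value)
    return metrics
```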

Common Redis Failure Scenarios

Memory Exhaustion

Redis has a maxmemory setting. When it is reached, behavior depends on the eviction policy (maxmemory-policy). Two common choices:

  • noeviction: Returns errors on write commands (no data is lost, but writes fail)
  • allkeys-lru: Evicts least recently used keys (data loss, but stays functional)

Monitor memory usage and alert well before maxmemory.

Slow Commands Blocking Everything

Redis executes commands on a single thread, so one slow command (like KEYS * on a large dataset) blocks all other commands until it finishes. Monitor the slow log and alert on any command taking longer than 100ms.
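
As a sketch of that alert, the function below filters slow log entries by duration. It assumes each entry is a dict with `id`, `duration` (in microseconds, as SLOWLOG reports it), and `command` keys, a shape modeled on what redis-py's `slowlog_get()` returns; adapt the keys if your client differs. The function name is ours.

```python
from typing import Iterable, List, Mapping, Tuple


def slow_commands(entries: Iterable[Mapping],
                  threshold_ms: float = 100.0) -> List[Tuple[int, float, str]]:
    """Return (id, duration_ms, command) for entries slower than threshold_ms."""
    slow = []
    for entry in entries:
        duration_ms = entry["duration"] / 1000  # microseconds to milliseconds
        if duration_ms > threshold_ms:
            command = entry["command"]
            if isinstance(command, bytes):
                command = command.decode(errors="replace")
            slow.append((entry["id"], duration_ms, command))
    return slow
```

Redis only records commands slower than `slowlog-log-slower-than` (10000 microseconds by default), so the default configuration already captures everything above a 100ms threshold.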

Persistence Fork Failure

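Redis forks the process for RDB snapshots and AOF rewrites. On systems with limited memory, the fork can fail, and persistence stops until a later attempt succeeds. Nothing pages you by default: the failure shows up only in the Redis log and in fields like rdb_last_bgsave_status. With the default stop-writes-on-bgsave-error yes setting, Redis also starts rejecting writes after a failed RDB save.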

Network Partition

If Redis becomes unreachable due to a network issue, your application might continue running but with degraded performance, falling back to the database for every request. From the database's point of view, this is the same cascade as a Redis crash.

The Minimum Redis Monitoring Setup

  1. Port monitor on Redis port (30-second interval)
  2. Application health check that reads/writes Redis (60-second interval)
  3. Memory usage alert at 85% of maxmemory
  4. Connected clients alert at 80% of maxclients
  5. Hit rate alert below 90%

Five monitors are enough to catch your most likely cascade failure before it starts.
