Monitoring Docker Containers: What Breaks and How to Catch It
Containers crash, restart, run out of memory, and fail health checks — all while your orchestrator tries to hide the problem. Here's how to maintain visibility.
Containers are supposed to make operations simpler. And they do — until they don't. The abstraction that makes containers powerful also makes them harder to monitor. Problems that would be obvious on a bare-metal server can be invisible inside a container.
How Containers Fail Differently
Silent Restarts
When a container crashes, your orchestrator (Docker Compose, Kubernetes, ECS) automatically restarts it. From the outside, the service might appear healthy — it's running, right? But each restart means a brief outage, lost in-flight requests, and potentially lost state.
OOMKills
Containers run with memory limits (when you set them). When a container exceeds its limit, the kernel kills the process: no graceful shutdown, no error log from the application, no warning. The orchestrator restarts it and the cycle continues.
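An OOMKill does leave fingerprints, even though the process never gets a chance to log: Docker records it in the container's state, and the exit code is typically 137 (128 + SIGKILL). Assuming a container named `my-app`:

```shell
# "true" if the kernel OOM-killed the container's main process
docker inspect --format '{{.State.OOMKilled}}' my-app

# Exit code 137 (128 + SIGKILL) is another strong hint
docker inspect --format '{{.State.ExitCode}}' my-app
```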
Zombie Containers
The container is running. The process inside is alive. But it's not actually doing anything useful — stuck in a deadlock, waiting on a resource that will never arrive, or caught in an infinite loop that consumes CPU but produces no results.
Image Pull Failures
A new deployment requires pulling an updated image. If the registry is down or the image tag doesn't exist, the container can't start. The old container might keep running (good) or might have been stopped first (bad).
Networking Issues
Containers communicate over virtual networks. DNS resolution between containers can fail. Network policies can block traffic. Load balancer health checks might pass even when the application is broken.
What to Monitor
Container-Level Metrics
- Restart count — A container that keeps restarting has a problem, even if it's "running"
- Memory usage vs. limit — How close are you to an OOMKill?
- CPU usage — Sustained high CPU might indicate a loop or inefficiency
- Network I/O — Sudden drops might indicate connectivity issues
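The raw numbers come from `docker stats` (or the metrics API on Kubernetes); the alerting logic on top of them is simple. A minimal sketch of the memory-vs-limit check, with illustrative container names and readings:

```python
def memory_pressure(usage_bytes: int, limit_bytes: int, threshold: float = 0.85) -> bool:
    """True when a container is close enough to its limit to risk an OOMKill."""
    return limit_bytes > 0 and usage_bytes / limit_bytes >= threshold

# Example readings, e.g. parsed from `docker stats --no-stream`
containers = {
    "api":    (900 * 2**20, 1024 * 2**20),   # 900 MiB of a 1 GiB limit
    "worker": (200 * 2**20, 1024 * 2**20),
}

at_risk = [name for name, (used, limit) in containers.items()
           if memory_pressure(used, limit)]
# at_risk == ["api"]
```

A container with no limit configured (`limit_bytes == 0`) is skipped rather than flagged, since there is no meaningful percentage to compute; that case is the "Ignoring Resource Limits" pitfall below.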
Application-Level Health
- HTTP health checks — Not just "is the port open" but "does the app respond correctly"
- Readiness checks — Can the container actually serve traffic? (Database connected, cache warmed, etc.)
- Custom health endpoints — Return detailed status including dependency health
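One way to structure a "detailed status" endpoint is to run each dependency probe and fold the results into a single status code. This sketch uses hypothetical probe names; real probes would ping the database, cache, and so on:

```python
from typing import Callable

def health_status(checks: dict[str, Callable[[], bool]]) -> tuple[int, dict]:
    """Run every dependency probe; return 200 only if all of them pass."""
    results = {}
    for name, probe in checks.items():
        try:
            results[name] = "ok" if probe() else "fail"
        except Exception:
            # A probe that crashes counts as a failure, not a 500
            results[name] = "fail"
    code = 200 if all(v == "ok" for v in results.values()) else 503
    return code, results

# Hypothetical probes standing in for real connectivity checks
code, detail = health_status({
    "database": lambda: True,
    "cache": lambda: False,
})
# code == 503 because the cache probe failed
```

Returning 503 with the per-dependency detail in the body gives both the load balancer and a human operator something actionable.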
Orchestrator-Level Monitoring
- Pod/task status — Running, pending, failed, evicted
- Deployment rollout status — Is the deployment stuck?
- Resource pressure — Are nodes running out of CPU, memory, or disk?
- Scheduling failures — Can new containers be placed?
Practical Monitoring Setup
Layer 1: External HTTP Monitoring
Monitor your containerized services from outside your cluster:
- HTTP checks on exposed endpoints
- Keyword validation to confirm correct responses
- Response time tracking
This is your user-perspective view. If this is green, users are happy regardless of what's happening internally.
Layer 2: Container Health Checks
Configure proper health checks in your container definitions:
Docker Compose:
healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
  interval: 30s
  timeout: 10s
  retries: 3
  start_period: 40s
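If you build your own images, the same check can be baked into the Dockerfile so it travels with the image (port and path assumed to match the Compose example; `curl` must exist in the image):

```dockerfile
HEALTHCHECK --interval=30s --timeout=10s --start-period=40s --retries=3 \
  CMD curl -f http://localhost:3000/health || exit 1
```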
Layer 3: Log Monitoring
Containerized applications should log to stdout/stderr. Aggregate these logs and monitor for:
- Error rate spikes
- OOMKill messages
- Connection refused errors
- Timeout patterns
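Error-rate spike detection reduces to counting matching log lines per window and comparing against a baseline. A minimal sketch, with an illustrative pattern and thresholds:

```python
import re

# Illustrative pattern; tune to your application's log format
ERROR_PATTERN = re.compile(r"\b(ERROR|FATAL|OOMKilled|Connection refused)\b")

def error_rate(lines: list[str]) -> float:
    """Fraction of log lines in this window that match an error pattern."""
    if not lines:
        return 0.0
    hits = sum(1 for line in lines if ERROR_PATTERN.search(line))
    return hits / len(lines)

def spiking(current: float, baseline: float, factor: float = 3.0) -> bool:
    """Alert when the current window's rate is well above the baseline."""
    return current > baseline * factor

window = [
    "INFO request handled in 12ms",
    "ERROR upstream timeout",
    "INFO request handled in 9ms",
    "ERROR Connection refused by db:5432",
]
# error_rate(window) == 0.5, which spikes against a 5% baseline
```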
Layer 4: Restart Monitoring
Track container restart counts. Alert when:
- Any container restarts more than 3 times in 10 minutes
- Total cluster restarts exceed normal baseline
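The counter itself comes from `docker inspect` (`RestartCount`) or the orchestrator's API; the first alert rule above is a sliding-window count over restart timestamps. A sketch with illustrative timestamps in seconds:

```python
def restart_alert(restart_times: list[float],
                  window_s: float = 600,
                  max_restarts: int = 3) -> bool:
    """True when more than max_restarts restarts fall inside one window."""
    if not restart_times:
        return False
    times = sorted(restart_times)
    lo = 0
    for hi, t in enumerate(times):
        # Slide the window's left edge forward until it spans <= window_s
        while t - times[lo] > window_s:
            lo += 1
        if hi - lo + 1 > max_restarts:
            return True
    return False

restart_alert([0, 100, 200, 300])   # 4 restarts in 10 minutes -> alert
```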
Common Pitfalls
Health Checks That Lie
A health check that only returns 200 without actually testing the application is worse than no health check — it provides false confidence. Test real functionality: database connectivity, cache access, critical dependencies.
Monitoring Only the Load Balancer
The load balancer says all backends are healthy. But it's only checking TCP port availability, not application health. Add HTTP-level health checks with content validation.
Ignoring Resource Limits
Running without memory/CPU limits means one runaway container can starve others. Set limits AND monitor usage relative to those limits.
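In Compose, that means explicit per-service limits. The values here are illustrative; this is the `deploy.resources` form:

```yaml
services:
  api:
    deploy:
      resources:
        limits:
          cpus: "0.50"      # at most half a CPU core
          memory: 256M      # exceeding this triggers an OOMKill
```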
The Minimum Container Monitoring Setup
- External HTTP monitoring on all exposed services (30-second intervals)
- Proper health checks in every container definition
- Restart count monitoring with alerts
- Memory usage monitoring (alert at 85% of limit)
- Log aggregation with error rate tracking
Containers make deployment easy. Don't let them make monitoring hard.
Written by the UptimeGuard Team