
The Complete Glossary of Uptime Monitoring Terms

MTTR, SLO, P99, error budget, synthetic checks: uptime monitoring has its own vocabulary. This glossary defines the terms you are most likely to encounter.

UptimeGuard Team
July 25, 2025 | 8 min read | 9,131 views

Uptime monitoring comes with its own vocabulary. Whether you are new to monitoring or just need a quick reference, this glossary defines the terms you are most likely to encounter, grouped alphabetically.

A-D

Alert Channel — The notification method used to deliver monitoring alerts (Slack, SMS, email, PagerDuty, webhooks).

Alert Fatigue — The phenomenon where too many alerts cause teams to ignore or miss important notifications.

Availability — The proportion of time a service is operational and accessible, usually expressed as a percentage (for example, 99.9%).

Check Interval — How frequently a monitoring service tests your endpoint. Common intervals range from 30 seconds to 5 minutes.

Cold Start — The initial delay when a serverless function or container starts up after being idle.

Circuit Breaker — A design pattern that stops calling a failing service to prevent cascading failures.
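A minimal sketch of the circuit breaker pattern, for illustration only. The class name, the threshold of consecutive failures, and the reset timeout are assumptions chosen for the example; production libraries usually add per-error rules and richer half-open handling.

```python
import time


class CircuitBreaker:
    """Opens after N consecutive failures, then rejects calls until a timeout passes."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout  # seconds to wait before a trial call
        self.clock = clock                  # injectable so the logic can be tested
        self.failures = 0
        self.opened_at = None               # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                # Circuit is open: fail fast instead of calling the failing service.
                raise RuntimeError("circuit open: call skipped")
            # Timeout elapsed: fall through and allow one trial call (half-open).
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            raise
        # A success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Wrapping an outbound call as `breaker.call(fetch_profile, user_id)` (where `fetch_profile` is a hypothetical function) means that once the dependency has failed repeatedly, callers get an immediate error rather than waiting on timeouts.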

Downtime — Any period when a service is unavailable to users.

E-I

Error Budget — The amount of allowable unreliability defined by your SLO. A 99.9% SLO leaves a 0.1% error budget, which is about 43 minutes of downtime in a 30-day month.
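The error budget arithmetic can be sketched in a few lines. The function names and the 30-day default period are assumptions for illustration; teams that budget per calendar month or per quarter would pass a different period.

```python
def error_budget_minutes(slo_percent, period_days=30):
    """Minutes of allowed downtime for an availability SLO over a period."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (100.0 - slo_percent) / 100.0


def budget_remaining(slo_percent, downtime_minutes, period_days=30):
    """Budget left after the downtime already recorded (negative means overspent)."""
    return error_budget_minutes(slo_percent, period_days) - downtime_minutes


print(round(error_budget_minutes(99.9), 1))   # about 43.2 minutes in 30 days
print(round(error_budget_minutes(99.99), 2))  # about 4.32 minutes in 30 days
```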

Escalation Policy — Rules defining who gets notified and when if the primary on-call does not respond.

False Positive — A monitoring alert that fires when no actual problem exists.

Graceful Degradation — Designing systems to maintain partial functionality when components fail.

Heartbeat Monitoring — A check that expects regular pings from your application. If the ping stops, an alert fires. Used for cron jobs and background tasks.
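On the monitoring side, a heartbeat check reduces to comparing the time of the last ping against the expected schedule. This is a sketch under assumptions: the function name and the separate grace period are illustrative, not any particular product's behavior.

```python
from datetime import datetime, timedelta, timezone


def heartbeat_missed(last_ping, expected_every, grace, now=None):
    """Return True if no ping arrived within the expected interval plus grace."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_ping > expected_every + grace
```

For a cron job that runs every 5 minutes, `heartbeat_missed(last_ping, timedelta(minutes=5), timedelta(minutes=1))` would start returning True once more than 6 minutes pass without a ping.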

HTTP Monitoring — Checking a URL by sending an HTTP request and verifying the response status code and content.
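A basic HTTP check can be sketched with Python's standard library. The function names, the default expected status of 200, and the optional `must_contain` text are assumptions for the example; the fetch and the pass/fail decision are kept separate so the decision logic can be tested without a network.

```python
import urllib.error
import urllib.request


def fetch(url, timeout=10.0):
    """Return (status_code, body_text). HTTP error statuses are returned, not raised.

    Connection failures (DNS errors, refused connections, timeouts) still raise,
    and a monitor would normally treat those as a failed check.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        return err.code, err.read().decode("utf-8", errors="replace")


def is_healthy(status, body, expected_status=200, must_contain=None):
    """Pass only if the status matches and any required text is present."""
    if status != expected_status:
        return False
    return must_contain is None or must_contain in body
```

The `must_contain` argument is the same idea as keyword monitoring: a page can return 200 while showing an error message, and checking the body catches that.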

Incident — An event that causes service degradation or outage requiring response.

K-O

Keyword Monitoring — Checking that a web page contains (or does not contain) specific text. Catches content-level failures that status codes miss.

Latency — The time delay between a request and its response. Usually measured in milliseconds.

MTBF (Mean Time Between Failures) — Average operating time between the end of one failure and the start of the next. Higher is better.

MTTD (Mean Time to Detect) — Average time from an incident occurring to it being detected by monitoring.

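MTTR (Mean Time to Recovery) — Average time to restore service after an incident. This glossary measures it from incident detection to full service restoration; some teams start the clock when the failure begins, so state which convention you use.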
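The three averages can be computed from incident timestamps. This sketch assumes each incident is recorded as a (started, detected, resolved) tuple, measures MTTR from detection to resolution, and measures MTBF as the gap between one incident's resolution and the next incident's start. Other conventions exist, so treat the function as illustrative.

```python
from datetime import datetime, timedelta


def mean(deltas):
    """Average of a non-empty list of timedelta values."""
    return sum(deltas, timedelta()) / len(deltas)


def incident_metrics(incidents):
    """incidents: list of (started, detected, resolved) datetimes, sorted by start."""
    mttd = mean([detected - started for started, detected, _ in incidents])
    mttr = mean([resolved - detected for _, detected, resolved in incidents])
    # Gaps run from one incident's resolution to the next incident's start.
    gaps = [nxt[0] - cur[2] for cur, nxt in zip(incidents, incidents[1:])]
    mtbf = mean(gaps) if gaps else None  # undefined with fewer than two incidents
    return {"mttd": mttd, "mttr": mttr, "mtbf": mtbf}
```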

Multi-Region Monitoring — Checking services from multiple geographic locations to detect region-specific outages.

On-Call — A rotation where team members are designated to respond to incidents during their assigned shift, including outside business hours.

P-S

P50/P95/P99 — Percentile response times. P95 is the value that 95% of requests complete at or under; the slowest 5% take longer. P50 is the median.
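Percentiles are straightforward to compute from raw response times. The sketch below uses the nearest-rank method, which is one of several definitions; many monitoring tools interpolate between samples instead, so their numbers can differ slightly. The sample latencies are made up for the example.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


latencies_ms = [120, 80, 95, 300, 110, 105, 90, 2000, 100, 115]
print(percentile(latencies_ms, 50))  # 105
print(percentile(latencies_ms, 95))  # 2000
```

The single 2000 ms outlier does not move P50 at all but dominates P95, which is why tail percentiles are tracked alongside the median.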

Ping Monitoring — Sending ICMP ping packets to check if a host is reachable.

Port Monitoring — Checking if a specific TCP port is open and accepting connections.
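A TCP port check amounts to attempting a connection and seeing whether it succeeds. This sketch uses Python's standard `socket` module; the function name and the 3-second default timeout are assumptions for illustration.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

Note that an open port only shows that something is accepting connections. It does not confirm that the service behind it is answering correctly, which is where protocol-level checks such as HTTP monitoring come in.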

Post-Mortem — A blameless review conducted after an incident to identify root causes and preventive actions.

SLA (Service Level Agreement) — A contractual commitment to maintain a certain level of service availability.

SLI (Service Level Indicator) — The actual measured metric used to evaluate service health, for example the percentage of requests that succeed.

SLO (Service Level Objective) — An internal target for service reliability, usually set stricter than the SLA so there is a margin before a contractual breach.

SSL Monitoring — Tracking SSL/TLS certificate validity and expiration dates.
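Certificate expiry tracking can be sketched with Python's standard `ssl` module. The function names are assumptions for the example. `fetch_not_after` needs network access and uses default verification, so an already expired or otherwise invalid certificate makes the handshake raise an error, which a monitor would report as a failure in its own right.

```python
import socket
import ssl
import time


def fetch_not_after(hostname, port=443, timeout=10.0):
    """Return the certificate's notAfter string, e.g. 'Jan  5 09:34:43 2018 GMT'."""
    ctx = ssl.create_default_context()
    with socket.create_connection((hostname, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=hostname) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after, now=None):
    """Days between now (epoch seconds) and the certificate's notAfter time."""
    expires = ssl.cert_time_to_seconds(not_after)
    if now is None:
        now = time.time()
    return (expires - now) / 86400
```

A typical alert rule built on this would fire when `days_until_expiry(...)` drops below a threshold such as 14 or 30 days, leaving time to renew.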

Status Page — A public page displaying the current operational status of your services.

Synthetic Monitoring — Automated tests that simulate user interactions from external locations.

T-Z

TTFB (Time to First Byte) — Time from the request being sent to the first byte of the response being received.

Uptime — The percentage of time a service is available. The complement of downtime: uptime percentage equals 100% minus downtime percentage.

Webhook — An HTTP callback triggered by an event. Used for real-time notifications between services.

Bookmark this page. You will come back to it.
