OPERATIONS / DEVOPS / SRE TEMPLATE

SLA / SLO Definition Template

SLO targets, error budgets, alerting thresholds, SLA commitments, and escalation policies.

What's inside

| Field | Details |
|---|---|
| Service | Service or product name |
| Owner | Team / person responsible |
| Last Reviewed | |
| Next Review | |

SLO Definitions

What are we promising to ourselves (SLO) and to customers (SLA)? SLOs should be tighter than SLAs — the SLO is your internal standard, the SLA is the contractual minimum.

| Metric | SLO (Internal) | SLA (External) | Measurement | Window |
|---|---|---|---|---|
| Availability | 99.95% | 99.9% | Synthetic monitoring + real user requests | Rolling 30 days |
| Latency (p50) | < 100ms | N/A | API gateway metrics | Rolling 30 days |
| Latency (p99) | < 500ms | < 1000ms | API gateway metrics | Rolling 30 days |
| Error rate | < 0.1% | < 1% | 5xx responses / total requests | Rolling 30 days |
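As a rough illustration of how the targets in the table above could be checked against raw measurements, the sketch below computes a nearest-rank percentile and a 5xx error rate, then compares them with the internal SLO values. The names `percentile`, `error_rate_percent`, `check_slo`, and `SLO_TARGETS` are hypothetical, and the nearest-rank method is an assumption; an API gateway or monitoring system would normally supply these figures and may calculate percentiles differently.

```python
import math

# Internal SLO targets copied from the table above (hypothetical structure).
SLO_TARGETS = {"p50_ms": 100, "p99_ms": 500, "error_rate_percent": 0.1}


def percentile(samples, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def error_rate_percent(errors_5xx, total_requests):
    """5xx responses divided by total requests, as a percentage."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    return 100.0 * errors_5xx / total_requests


def check_slo(latencies_ms, errors_5xx, total_requests):
    """Return a pass/fail flag per metric; every target is a strict upper bound."""
    return {
        "p50": percentile(latencies_ms, 50) < SLO_TARGETS["p50_ms"],
        "p99": percentile(latencies_ms, 99) < SLO_TARGETS["p99_ms"],
        "error_rate": error_rate_percent(errors_5xx, total_requests)
        < SLO_TARGETS["error_rate_percent"],
    }


if __name__ == "__main__":
    # Synthetic data: latencies of 1..100 ms, 5 errors in 10,000 requests.
    print(check_slo(list(range(1, 101)), errors_5xx=5, total_requests=10_000))
```

Availability is left out because it depends on how synthetic checks and real user requests are combined, which this template does not specify.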

Error Budget

The error budget is the gap between 100% and your SLO. It's the amount of unreliability you can "spend" on shipping fast. When the budget runs out, you slow down and focus on reliability.

| SLO | Budget (30 days) | Budget Remaining | Status |
|---|---|---|---|
| 99.95% availability | 21.6 minutes downtime | X minutes | Healthy |
| < 0.1% error rate | ~30,000 errors per 30M requests | X errors | Healthy |
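The budget figures follow directly from the SLO and the window. A minimal sketch of the arithmetic, assuming a 30-day rolling window; the function names are illustrative and not part of any monitoring tool:

```python
def downtime_budget_minutes(slo_percent, window_days=30):
    """Allowed downtime in minutes for an availability SLO over the window."""
    window_minutes = window_days * 24 * 60  # 30 days = 43,200 minutes
    return window_minutes * (100.0 - slo_percent) / 100.0


def error_budget_requests(max_error_rate_percent, total_requests):
    """Number of failed requests allowed for an error-rate SLO."""
    return total_requests * max_error_rate_percent / 100.0


def budget_remaining(budget, consumed):
    """Budget left after consumption, floored at zero."""
    return max(budget - consumed, 0.0)


if __name__ == "__main__":
    # 99.95% availability over 30 days allows about 21.6 minutes of downtime.
    print(round(downtime_budget_minutes(99.95), 1))
    # A 0.1% error-rate SLO over 30M requests allows about 30,000 errors.
    print(round(error_budget_requests(0.1, 30_000_000)))
```

The "Budget Remaining" column is then the budget minus the downtime or errors already recorded in the current window.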

Alerting

| Alert | Condition | Burn Rate | Action |
|---|---|---|---|
| Slow burn | Budget consumption 2x normal rate | Budget exhausted in ~15 days | Investigate; ticket in current sprint |
| Fast burn | Budget consumption 10x normal rate | Budget exhausted in ~3 days | Page on-call; treat as incident |
| Budget exhausted | Error budget = 0 | N/A | Deploy freeze until budget recovers or SLO is adjusted |
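The thresholds above can be read as burn-rate multiples, where a burn rate of 1x spends the budget exactly over the 30-day window. The sketch below assumes burn rate is defined as the observed error rate divided by the error rate the SLO allows, and that exhaustion time is measured from a full budget; the constants and function names are hypothetical.

```python
import math

SLOW_BURN_THRESHOLD = 2.0   # 2x normal rate, from the table above
FAST_BURN_THRESHOLD = 10.0  # 10x normal rate, from the table above


def burn_rate(observed_error_rate, allowed_error_rate):
    """How many times faster than 'normal' the budget is being spent."""
    if allowed_error_rate <= 0:
        raise ValueError("allowed_error_rate must be positive")
    return observed_error_rate / allowed_error_rate


def days_to_exhaustion(rate, window_days=30):
    """Days until a full budget is spent at a constant burn rate."""
    return math.inf if rate <= 0 else window_days / rate


def classify(rate, budget_remaining_fraction):
    """Map burn rate and remaining budget to the alert names in the table."""
    if budget_remaining_fraction <= 0:
        return "budget_exhausted"  # deploy freeze
    if rate >= FAST_BURN_THRESHOLD:
        return "fast_burn"         # page on-call
    if rate >= SLOW_BURN_THRESHOLD:
        return "slow_burn"         # ticket in current sprint
    return "ok"


if __name__ == "__main__":
    rate = burn_rate(observed_error_rate=0.001, allowed_error_rate=0.0005)
    print(rate, days_to_exhaustion(rate), classify(rate, 0.8))
```

A production alert would usually evaluate the burn rate over one or more lookback windows rather than a single instantaneous value; that choice is left open here.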

SLA Commitments

If you have contractual SLAs with customers, document them here. SLAs without consequences are just marketing.

| Tier / Plan | SLA | Credit / Penalty | Measurement |
|---|---|---|---|
| Enterprise | 99.9% availability | 10% credit per 0.1% below SLA | Monthly, measured by provider monitoring |
| Business | 99.5% | N/A | |
| Free | No SLA | N/A | Best effort |
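To make the Enterprise credit rule concrete, here is one possible reading of "10% credit per 0.1% below SLA". The sketch assumes that each started 0.1 percentage point of shortfall earns a credit and that total credit is capped at 100%; the template states neither, so both are assumptions to confirm against the actual contract. `Decimal` is used so that steps such as 0.1 are counted exactly.

```python
from decimal import Decimal, ROUND_CEILING


def sla_credit_percent(
    measured_availability,
    sla=Decimal("99.9"),
    step=Decimal("0.1"),
    credit_per_step=Decimal("10"),
    cap=Decimal("100"),  # assumption: credit never exceeds the full bill
):
    """Credit (as a % of the monthly bill) for a measured monthly availability."""
    shortfall = sla - measured_availability
    if shortfall <= 0:
        return Decimal("0")
    # Assumption: a partial step counts as a full step (round up).
    steps = (shortfall / step).to_integral_value(rounding=ROUND_CEILING)
    return min(steps * credit_per_step, cap)


if __name__ == "__main__":
    # 99.7% measured against a 99.9% SLA is two steps short: 20% credit.
    print(sla_credit_percent(Decimal("99.7")))
```

If the contract counts only complete 0.1% steps, swap `ROUND_CEILING` for `ROUND_FLOOR`.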

Delete this section if you don't have contractual SLAs.

Escalation

| Condition | Action | Who |
|---|---|---|
| SLO at risk (burn rate alert) | Investigate root cause, consider deploy freeze | On-call engineer |
| SLO breached | Incident declared, postmortem required | Eng lead |
| SLA breached | Customer communication, credit processing | Eng lead + account manager |
