Disaster Recovery Plan Template
A disaster recovery plan template on one wiki page: what you protect, recovery targets, the runbook, and who is called. Copy it in and keep it current.
TL;DR. A disaster recovery plan is a set of targets, a runbook, and a contact list, all of which have been tested. Copy the body of this page into a wiki page, set RTO and RPO with the business, write the runbook so a tired engineer can follow it, and schedule a drill.
A disaster recovery plan earns its keep at 3am, when the person reading it is stressed and the systems are down. Write for that reader: numbered steps, named commands, no judgement calls. The single most common failure is a plan that was written once and never tested — by the time it is needed, the systems have changed and the steps no longer work.
What a disaster recovery plan includes
- Scope. The system this plan covers and what depends on it.
- Targets. RTO (recovery time objective: how quickly service must be restored) and RPO (recovery point objective: how much data loss is tolerable).
- The runbook. Numbered recovery steps, with commands.
- Contacts. Who is called, who declares the incident, who talks to customers.
- Test record. When it was last drilled and what broke.
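The two targets can be sanity-checked against the backup schedule with simple arithmetic. The sketch below is illustrative and not part of the template. It assumes each backup captures the state at the moment it starts and is only usable once it completes, so the worst-case data loss is the backup interval plus the backup duration. The function names and figures are hypothetical.

```python
from datetime import timedelta


def worst_case_data_loss(backup_interval: timedelta, backup_duration: timedelta) -> timedelta:
    """Assumption: a failure just before the next backup completes loses
    everything written since the previous backup started."""
    return backup_interval + backup_duration


def meets_rpo(backup_interval: timedelta, backup_duration: timedelta, rpo: timedelta) -> bool:
    """True if the backup schedule can satisfy the agreed RPO."""
    return worst_case_data_loss(backup_interval, backup_duration) <= rpo


def meets_rto(measured_restore_time: timedelta, rto: timedelta) -> bool:
    """True if the restore time measured in the last drill is within the RTO."""
    return measured_restore_time <= rto


# Hypothetical figures for illustration only.
four_hours = timedelta(hours=4)
print(meets_rpo(timedelta(hours=1), timedelta(minutes=10), rpo=four_hours))   # True
print(meets_rpo(timedelta(hours=24), timedelta(minutes=30), rpo=four_hours))  # False
print(meets_rto(timedelta(minutes=45), rto=timedelta(hours=1)))               # True
```

A nightly backup cannot meet a four-hour RPO however fast the restore is, which is the kind of mismatch worth catching before the targets are agreed with the business.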
How to use this template
- Copy the body below into a new wiki page — one per system.
- Set RTO and RPO with the business, not just engineering.
- Write the runbook as steps a tired on-call engineer can follow.
- List who is called and how.
- Schedule a drill and record the result.
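One way to hold a runbook to the "numbered steps, named commands, no judgement calls" standard is to lint it before a drill. The following is a minimal sketch, assuming the runbook has been transcribed into a list of steps. The `Step` class, the `lint_runbook` function, and the list of vague phrases are all hypothetical and would need adapting to how your team actually stores runbooks.

```python
from dataclasses import dataclass


@dataclass
class Step:
    title: str
    command: str  # exact command or console path; empty means none was given


# Phrases that push a decision onto the reader (illustrative, not exhaustive).
VAGUE_PHRASES = ("if necessary", "as appropriate", "as needed", "consider")


def lint_runbook(steps):
    """Return a list of human-readable problems, numbered by step."""
    problems = []
    for number, step in enumerate(steps, start=1):
        if not step.command.strip():
            problems.append(f"Step {number}: no command or console step named")
        lowered = step.title.lower()
        for phrase in VAGUE_PHRASES:
            if phrase in lowered:
                problems.append(f"Step {number}: vague wording '{phrase}'")
    return problems


# Hypothetical runbook; the commands are placeholders, not real tooling.
runbook = [
    Step("Confirm the outage on the status dashboard", "open <dashboard URL>"),
    Step("Restore data from the latest backup if necessary", ""),
]
for problem in lint_runbook(runbook):
    print(problem)
# Step 2: no command or console step named
# Step 2: vague wording 'if necessary'
```

A check like this does not replace a drill; it only catches steps that would obviously force a judgement call on the reader.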
The template — copy from here
Summary
- System: <name>; Owner: <team / role>
- RTO: <target time to restore>; RPO: <tolerable data loss>
- Last tested: <date>; Next test: <date>
What this protects
<The system, the data it holds, and what else breaks if it is down.>
Recovery runbook
1. <Detect and declare: how the failure is confirmed and who declares the incident.>
2. <First action: the command or console step, named exactly.>
3. <Restore data: from which backup, with the command.>
4. <Verify: how you confirm service is actually back.>
5. <Stand down: who calls the all-clear.>
Contacts
| Role | Name | Contact path |
|---|---|---|
| Incident commander | <name> | <path> |
| On-call engineer | <name> | <path> |
| Customer comms | <name> | <path> |
Dependencies and backups
- Backups: <where, how often, retention.>
- Depends on: <upstream systems, vendors.>
Test record
| Date | Scenario | What broke | Fixed |
|---|---|---|---|
| <date> | <drill> | <finding> | <yes / link> |
Common questions
What should it include? Scope, targets, a runbook, contacts, and a test record.
Disaster recovery versus business continuity? DR restores systems; business continuity keeps the business running while that happens. A combined plan covers both.
How often should it be tested? At least annually, and after any major architecture change.
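The testing rule above can be turned into an automated reminder. This is a sketch only, assuming "annually" means no more than 365 days between drills and that the date of the last major architecture change is tracked somewhere. `drill_overdue` and its parameters are hypothetical names.

```python
from datetime import date, timedelta
from typing import Optional

MAX_DRILL_AGE = timedelta(days=365)  # assumption: "at least annually"


def drill_overdue(last_tested: date, today: date,
                  last_major_change: Optional[date] = None) -> bool:
    """True if the plan needs a drill: either the last one is more than
    a year old, or the architecture changed after it was run."""
    if today - last_tested > MAX_DRILL_AGE:
        return True
    return last_major_change is not None and last_major_change > last_tested


# Hypothetical dates for illustration.
print(drill_overdue(date(2023, 1, 10), today=date(2024, 3, 1)))  # True
print(drill_overdue(date(2024, 1, 10), today=date(2024, 3, 1)))  # False
print(drill_overdue(date(2024, 1, 10), today=date(2024, 3, 1),
                    last_major_change=date(2024, 2, 1)))         # True
```

The "Last tested" field in the Summary section is the natural input for `last_tested`.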
Keep the plan in a wiki the on-call team can reach fast and search under pressure, with version history showing what changed after each drill. For a section-by-section walkthrough of a filled-in plan, see the disaster recovery plan example. Pair this with the Service Runbook and the Incident Postmortem, or browse the full template library.