How to write a postmortem in one page
How to write a postmortem: the seven-section shape, the blameless rule, why timeline comes before root cause, and where the doc has to live to get read.
TL;DR. A postmortem is one short page about one incident, written within a week of the event, blameless, and structured the same way every time: summary, impact, timeline, root cause, action items, prevention, and appendix. Timeline goes before root cause — describe what happened in order, then explain why. The doc has to live in a wiki the next on-call can find on a Tuesday afternoon, or the lessons stay with the people who were in the room and nobody else.
Most postmortems read like a Norse saga written by someone who lost the saga. (Rocket, our VP of Engineering, signs off postmortems with three words: “Make it fast. Again.” That’s the entire approval process; the rest is up to the writer.) The good news about how to write a postmortem is that the shape has been stable since Google wrote down their version a decade ago and the world copied it. The bad news is that most teams still get the order wrong — they lead with root cause, bury the timeline, skip the action items, and lose the document to whichever folder structure was in fashion that quarter. The rest of this post is the working shape, the rule that keeps it blameless, and the part nobody on the SERP writes about: where the doc lives once you stop typing.
A postmortem is one page about one incident
A postmortem is a written artifact — not a meeting, not a thread — that documents one specific incident, what happened, why, and what’s being done about it. The shape borrows from medical practice (literally: “after death,” the investigation that explains how the patient ended up on the slab) but the audience is different — your team, plus whoever joins the team next quarter and inherits the service.
A postmortem succeeds when:
- A new on-call can read it cold, six months later, and understand both what went wrong and what’s been done about it.
- The action items have owners and dates, not vibes.
- If the team disagrees about the root cause in the postmortem meeting, the document captures that disagreement rather than papering over it.
- Reading the doc once is enough; no follow-up Slack thread is required to figure out what we decided.
One short page per incident: anything shorter is a status update, and anything longer is a novel. Some incidents need five hundred words, some need fifteen hundred, and almost none need three thousand. Optimise for finished and readable, not thorough and unread.
Blameless means system-language, not person-language
The single biggest move that separates a useful postmortem from a counterproductive one is blamelessness, which sounds like an HR euphemism and is actually a rigorous discipline.
Blamelessness is how you phrase the writing, not whether you’re polite in the meeting:
- “Sarah pushed the bad config” — person-language. The document blames Sarah. The next time someone is about to push a config, they think about Sarah, hesitate, freeze, and don’t speak up about the unrelated risk they spotted.
- “The deploy pipeline accepted a config with no validation gate” — system-language. The document describes the system that allowed the bad outcome. The fix is “add a validation gate,” and Sarah is in the next meeting describing the gap she noticed but didn’t feel safe naming.
The test: read every sentence in the postmortem and ask “if a different person had been in the same seat, would the system still have allowed this outcome?” If yes, the sentence should describe the system, because the system is what you fix. If the sentence only makes sense with one specific person’s name attached, it’s blame-shaped, and you rewrite it.
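The test above is a judgment only the writer can make, but a crude pre-check can point at the sentences worth rereading. Here is a minimal Python sketch, assuming you keep a hypothetical `roster` list with the names of the people involved in the incident; `split_sentences` and `flag_person_language` are illustrative helpers, not part of any tool. A flagged sentence is not automatically blame-shaped; it just gets the test applied by hand.

```python
import re


def split_sentences(text: str) -> list[str]:
    # Naive splitter: breaks after ".", "!" or "?" followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def flag_person_language(text: str, roster: list[str]) -> list[str]:
    """Return the sentences that mention anyone on the roster by name.

    A flagged sentence is a prompt to reread it, not a verdict.
    """
    patterns = [re.compile(rf"\b{re.escape(name)}\b", re.IGNORECASE) for name in roster]
    return [s for s in split_sentences(text) if any(p.search(s) for p in patterns)]


draft = (
    "Sarah pushed the bad config. "
    "The deploy pipeline accepted a config with no validation gate."
)
for sentence in flag_person_language(draft, roster=["Sarah"]):
    print("reread:", sentence)  # prints: reread: Sarah pushed the bad config.
```

The splitter is deliberately naive (it will split after abbreviations like “e.g.”), which is acceptable for a reminder list.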
The Google SRE chapter on postmortem culture is the canonical primary source for this discipline; the PagerDuty postmortem docs expand it with operational scaffolding. Both are worth re-reading every couple of years, the way you’d re-read a runbook you wrote and forgot.
The seven sections every postmortem needs
A working postmortem is a single doc with seven sections, in this order. Same headings every time; the on-call doesn’t have to learn a new format under stress.
| Section | What it answers | Length |
|---|---|---|
| Summary | What happened, in 2–3 sentences a non-engineer can read | 30–60 words |
| Impact | Who was affected, for how long, with what numbers | A paragraph plus a fact-row |
| Timeline | What happened, in order, with timestamps | Half the doc |
| Root cause | Why it happened, with contributing factors | A paragraph or two |
| Action items | What’s being done, by whom, by when | A list with owners |
| Prevention | What systems-level change makes this class of incident less likely | A paragraph |
| Appendix | Logs, dashboards, screenshots, links | Whatever’s needed |
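Keeping the headings identical every time is easy to automate. A minimal sketch, assuming your wiki renders Markdown-style `##` headings and hides HTML comments (check what yours does); `SECTIONS` and `render_skeleton` are hypothetical names, and the prompts paraphrase the table above.

```python
# Headings and prompts paraphrase the seven-section table; the order matters.
SECTIONS = [
    ("Summary", "What happened, in 2-3 sentences a non-engineer can read (30-60 words)"),
    ("Impact", "Who was affected, for how long, with what numbers"),
    ("Timeline", "What happened, in order, with timestamps (about half the doc)"),
    ("Root cause", "Why it happened, with contributing factors"),
    ("Action items", "What is being done, by whom, by when (owners, dates, ticket links)"),
    ("Prevention", "What systems-level change makes this class of incident less likely"),
    ("Appendix", "Logs, dashboards, screenshots, links"),
]


def render_skeleton() -> str:
    """Return an empty postmortem body with the seven headings, in order."""
    lines: list[str] = []
    for heading, prompt in SECTIONS:
        lines.append(f"## {heading}")
        lines.append(f"<!-- {prompt} -->")  # hidden prompt for the writer
        lines.append("")
    return "\n".join(lines)


print(render_skeleton())
```

Generating the skeleton rather than copying an old postmortem avoids inheriting a previous incident’s stray content.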
The Summary is what a reader takes away from the doc title and the first paragraph. “On 2026-04-22, the search index went read-only for 47 minutes due to a Postgres autovacuum storm; the API stayed up but search returned stale results. Mitigated by a manual reindex; full prevention shipped in PR-12345.” That’s a summary. “On Tuesday we had an incident” is a stub.
The Impact section should carry numbers your customer-success team would recognise: “47 minutes of stale search results, 3,200 affected sessions, no data lost, 12 customer support tickets opened.” If you don’t have those numbers, the postmortem is half-written.
Timeline first, root cause second
This is the order most templates get wrong. Industry boilerplate puts Root Cause near the top because that’s the section the postmortem reader is looking for. Putting it there is a trap — it forces the writer to commit to a cause before they’ve laid out the evidence, and it forces the reader to take the cause on faith before they’ve read what happened.
The right order is timeline first, root cause second. The timeline is factual: a chronological list of events with timestamps and sources, with no interpretation. “03:47 — pager fires; alert text was ‘p99 search latency > 2s for 5m.’ 03:51 — on-call confirms via dashboard. 03:54 — on-call checks deploy log; last deploy was 14:00 the previous day.” Just what, when, and where you saw it.
The root cause comes after the timeline because by then the reader can see how you got from the symptoms to the cause. “Root cause: a long-running autovacuum on the search-snapshot table held an exclusive lock that blocked the indexer’s write-path. Contributing factor: the autovacuum threshold was inherited from the default and was never tuned for this table’s churn pattern.” Two sentences.
Three things the timeline does that root cause alone can’t:
- It captures the detection delay — the gap between when the system started misbehaving and when a human noticed. That gap is often the most fixable thing in the incident.
- It captures the wrong-turn moments — when the on-call pursued a hypothesis that turned out to be wrong. Those moments are gold for the next person, who’ll reach for the same hypothesis and lose the same hour unless the postmortem warns them.
- It anchors the discussion in evidence. Discussion in the postmortem meeting that doesn’t reference the timeline is speculation; the timeline is the agreed shared reality.
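The detection delay in particular falls out of the timeline once every entry carries a timestamp. A minimal sketch, assuming each entry is tagged with a hypothetical `kind`: `"onset"` for the first sign of misbehaviour (usually found afterwards) and `"detected"` for the moment a human was told. The date and the 03:31 onset are invented for illustration; the 03:47 and 03:51 entries echo the example timeline above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Event:
    at: datetime
    what: str
    source: str  # where you saw it: pager, dashboard, deploy log, chat
    kind: str = "fact"  # hypothetical tag: "onset", "detected", or "fact"


def detection_delay(events: list[Event]) -> timedelta:
    """Gap between the earliest 'onset' entry and the earliest 'detected' entry."""
    onset = min(e.at for e in events if e.kind == "onset")
    detected = min(e.at for e in events if e.kind == "detected")
    return detected - onset


def render(events: list[Event]) -> str:
    """Chronological, interpretation-free lines: what, when, and where you saw it."""
    ordered = sorted(events, key=lambda e: e.at)
    return "\n".join(f"{e.at:%H:%M} {e.what} (source: {e.source})" for e in ordered)


events = [
    Event(datetime(2026, 4, 22, 3, 51), "On-call confirms via dashboard", "dashboard"),
    Event(datetime(2026, 4, 22, 3, 47), "Pager fires: p99 search latency > 2s for 5m", "pager", kind="detected"),
    Event(datetime(2026, 4, 22, 3, 31), "Search p99 latency starts climbing", "dashboard, found afterwards", kind="onset"),
]
print(render(events))
print("detection delay:", detection_delay(events))  # prints: detection delay: 0:16:00
```

`min` raises `ValueError` when no entry carries the tag, which is arguably the right behaviour: a timeline with no recorded onset hasn’t answered the detection question yet.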
The Cache Catastrophe — what canon calls our standing warning about wikis-and-crumbs — is the patron incident of this rule. The original postmortem put root cause first (“chip crumb in Enter key during late-night Redis maintenance”) and the timeline second; reading it cold made the cause look obvious and the real lesson — the two-meter snack-free zone wasn’t part of the runbook before the incident — got buried. Rewriting it timeline-first surfaced the policy gap as the actual fix. “Make it fast. Again,” signed Rocket.
Where the postmortem lives once it’s written
This is the part the long-form SERP guides skip. Atlassian, PagerDuty, and Google explain what to write and who attends the meeting; they almost never name where the finished doc actually goes between the day it’s published and the day a regulator (or a new hire) needs to read it. Here’s the lesson, after watching teams email PDFs of postmortems and lose them within a quarter: a postmortem in an inbox is a postmortem you don’t have.
Concrete moves that work:
- One short page per incident in your wiki. Title it Postmortem — <service> — <date> — <summary>. Searchable by service name, by date, by symptom. The Incident Postmortem template is the importable shape; copy it once and live with it.
- Linked from the incident response playbook. Every playbook ends in Learn — postmortem. The link from the playbook to the postmortem closes the loop; the link from the postmortem back to the playbook is what lets the next on-call land in the right doc.
- Pages have to load fast. On Raccoon Page, pages load in 50–150ms depending on your network; if the postmortem takes a second longer than the dashboard the on-call is staring at, the postmortem will lose to the dashboard. Sub-second, keyboard-first loading is the same practical bar as everywhere else, and missing it is uniquely uncomfortable when the postmortem itself is about a wiki that loaded slowly.
- Indexed in a single register. A page called Postmortems that lists every one in reverse-chronological order, with the summary visible. The list is the actual artifact of your reliability discipline; the individual docs are the receipts.
- Reviewed at the QBR. Not in a “celebrate the saves” way, but in a “what shape are these incidents taking” way. The pattern across postmortems is more useful than any single one.
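The title convention and the register are mechanical enough to script. A minimal sketch with hypothetical names (`Postmortem`, `register`); the `billing` entry is a placeholder, and the `search` entry reuses the example summary from earlier in this post. Sorting by date with `reverse=True` gives the newest-first order the register needs.

```python
from dataclasses import dataclass
from datetime import date

SEP = " \u2014 "  # em dash separator, matching the title convention


@dataclass(frozen=True)
class Postmortem:
    service: str
    day: date
    summary: str

    def title(self) -> str:
        # Postmortem <sep> <service> <sep> <date> <sep> <summary>
        return SEP.join(["Postmortem", self.service, self.day.isoformat(), self.summary])


def register(docs: list[Postmortem]) -> str:
    """One line per postmortem, newest first, with the summary visible in the title."""
    newest_first = sorted(docs, key=lambda d: d.day, reverse=True)
    return "\n".join(f"- {d.title()}" for d in newest_first)


docs = [
    Postmortem("billing", date(2026, 3, 3), "placeholder entry"),
    Postmortem("search", date(2026, 4, 22), "index read-only for 47 minutes"),
]
print(register(docs))
```

ISO dates in the title also sort correctly as plain text, which helps when the wiki’s own search orders results alphabetically.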
The postmortem is part of the same operating-discipline shape as a team charter, an SLO, and a service runbook — the doc lives in the wiki, the wiki is reachable in one keystroke, and the team’s running operating system uses it or it isn’t a system.
When you don’t need a formal postmortem
Three signs the formal postmortem is the wrong artifact for your situation:
- The incident never affected a user. A failed deploy rolled back automatically before traffic hit it; an alert fired and self-healed in two minutes. A short note in Slack and a one-line action item are fine. The postmortem earns its keep when the incident taught you something.
- You’re a team of two on a side project. Two engineers sharing a Slack DM are the postmortem. The doc earns its keep when the team is large enough that the on-call isn’t always the same person and the reviewer isn’t always in the meeting.
- It’s the same incident as last week. If you’re writing the same postmortem twice, you have a system problem, not an incident problem; the second postmortem isn’t the artifact — the action item from the first one is, and it didn’t ship.
Above that bar, the formal postmortem earns its keep. Below it, you’re writing postmortem theatre. Pick the cheapest plan that fits the job — same logic, by the way, applies to wikis: our Free tier — three users, one space, a hundred pages, no card — is the right home for the first postmortem a small team writes; if and when your team grows into a regular rotation, the Team tier at $8/user/month is the honest math.
A note on Raccoon Page itself: we are a wiki, not an incident-management platform. For real-time chat during the incident, real-time co-editing of the timeline, and the PagerDuty-Statuspage-Slack mesh that runs the response, you’ll need other tools. The wiki is where the finished postmortem lives — searchable, linkable, durable.
Things people actually ask
What is a postmortem, in one sentence? A short written document that explains what happened during a specific incident, why it happened, what the impact was, and what’s being done so that class of incident is less likely next time — written within a week of the incident, blameless, and structured the same way every time.
What does blameless actually mean? That the document uses system-language — describes the processes and tools that produced the outcome — instead of person-language — describing what an individual did or didn’t do. The test: replace any name in the doc with “someone” and see if the sentence still tells you the same useful thing. If yes, blameless. If no, rewrite it.
How long should a postmortem be? One short page per incident. Most useful postmortems are 500–1,500 words; almost none usefully exceed 3,000. The length should match the complexity of the incident, not the seniority of the writer.
When should we write a postmortem? Within a week of the incident. The freshness of memory and the willingness of the team to engage decay together. Two weeks out, half the team can’t remember the order things happened in. After that, every postmortem reads like historical fiction.
What’s the difference between a postmortem and a retrospective? A postmortem is keyed to a specific incident — “the search outage of 2026-04-22.” A retrospective is keyed to a period — “the last sprint.” Different artifacts, different shapes. Don’t conflate them; the postmortem describes one event, the retrospective describes a stretch of work.
Who should attend the postmortem meeting? The on-call who responded, the engineer (or engineers) who own the affected system, an incident commander or scribe to keep the discussion structured, and a representative from any team that was affected (often: support, product). Not ten people. Five at most.
How do action items get followed through? Each action item has a named owner, a due date, and a linked ticket in your tracker. The postmortem doc is not the tracker; it links to the tracker. The reliability win is in closing the action items, which is a separate discipline from writing the postmortem.
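A small check can keep those three fields honest before the doc is published. A minimal sketch, assuming action items are captured as the hypothetical `ActionItem` records below; the owner and the `tracker.example` URL are placeholders, and `problems` only reports gaps, it does not replace the tracker.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class ActionItem:
    what: str
    owner: Optional[str] = None
    due: Optional[date] = None
    ticket_url: Optional[str] = None  # the doc links to the tracker; it is not the tracker
    done: bool = False


def problems(item: ActionItem, today: date) -> list[str]:
    """List what keeps this item from being followed through; empty means it looks complete."""
    issues: list[str] = []
    if not item.owner:
        issues.append("no named owner")
    if item.due is None:
        issues.append("no due date")
    if not item.ticket_url:
        issues.append("no linked ticket")
    if item.due is not None and not item.done and item.due < today:
        issues.append("overdue")
    return issues


items = [
    ActionItem(
        "Tune the autovacuum threshold for the search-snapshot table",
        owner="alex",  # hypothetical owner
        due=date(2026, 5, 6),
        ticket_url="https://tracker.example/SEARCH-1",  # hypothetical ticket link
    ),
    ActionItem("Add a validation gate to the deploy pipeline"),
]
for item in items:
    print(item.what, "->", problems(item, today=date(2026, 5, 1)) or "ok")
```

The first item passes; the second is reported as missing an owner, a due date, and a ticket. Running the same check at the QBR with the real date is one cheap way to spot action items that never shipped.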
What’s a root cause vs a contributing factor? The root cause is the proximate technical or process gap that produced the outcome. Contributing factors are the conditions that made the root cause more likely or harder to detect. A fatigue-plus-crumbs incident has a root cause (missing keyboard cover on a live terminal) and contributing factors (overnight on-call alone, no two-meter snack-free zone). Both belong in the doc.
Should every postmortem be public? Inside your company: yes — visibility is half of the reliability dividend. Outside your company: optional, depending on what the incident touched. Public postmortems are excellent marketing for trustworthy companies and excellent self-harm for sloppy ones; pick deliberately.
If your last postmortem is currently a Confluence page nobody links to and a Loom recording of the meeting nobody watches, the upgrade isn’t a different page; it’s a single short doc with the seven headings, in a wiki the team can search. Try the Free tier on the postmortem for your next small incident; even a near-miss, written up in that shape, can teach the team more than the last formal postmortem for a big incident did. If the on-call at the next 04:00 page can’t find it, write to us; we want to know which folder it ended up in.
Written by The Editorial Raccoon — house style for Raccoon Page. Numbers and claims pulled from product reality; jokes pulled from the Raccoon Corp canon. No raccoons were quoted in real life.