SOFTWARE ENGINEERING TEMPLATE
Solution Architecture Template
End-to-end solution architecture covering business case, system design, data model, security, integration, and implementation strategy.
Field | Details |
|---|---|
Solution | Name |
Status | Draft |
Author(s) | Names |
Reviewers | Names |
Last Updated | |
Target Date |
Summary
What are we building, why, and what does the end state look like? Keep it to a few sentences — enough for someone to decide whether to keep reading.
Context
What problem are we solving? How do things work today and why is that insufficient? What happens if we do nothing?
If this replaces an existing system, describe what works well today — not just the flaws. New systems have a habit of accidentally dropping capabilities users depend on.
Requirements
What must this solution do and what constraints must it operate within?
Functional
ID | Requirement | Priority |
|---|---|---|
R1 | What the system must do | Must Have |
R2 | | Must Have |
R3 | | Should Have |
Non-Functional
Include the NFRs that actually drive architectural decisions. Skip the ones that don't — not every system needs five-nines availability.
Concern | Target | Architectural Implication |
|---|---|---|
Latency | p99 < Xms | What this target demands (caching, read replicas, etc.) |
Throughput | X req/s | |
Availability | XX.X% | |
Recovery (RTO/RPO) | RTO: Xh / RPO: Xm |
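An availability percentage is easier to plan around once it is converted into a downtime budget. The sketch below is a minimal example that assumes availability is measured as a fraction of wall-clock time over a 30-day window. Your SLO may instead count successful requests. It also assumes that dependencies in series fail independently. The function names are illustrative.

```python
from datetime import timedelta


def downtime_budget(availability_pct: float, period: timedelta = timedelta(days=30)) -> timedelta:
    """Maximum downtime allowed in `period` for an availability target given in percent."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")
    unavailable_fraction = 1 - availability_pct / 100
    return period * unavailable_fraction  # timedelta * float rounds to microseconds


def serial_availability(*component_pcts: float) -> float:
    """Availability (percent) of a request path that needs every component to be up.

    Assumes component failures are independent, so the probabilities multiply.
    """
    result = 1.0
    for pct in component_pcts:
        result *= pct / 100
    return result * 100
```

Over 30 days, 99.9% leaves 43.2 minutes of downtime. Three independent 99.9% dependencies in series give roughly 99.7% end to end. This is why the target row's architectural implications often include redundancy for each hard dependency.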
Constraints
Non-negotiable limitation (budget, timeline, team size, technology mandate, compliance)
Existing system that cannot be changed and must be integrated with
Assumption that, if wrong, would change the design
Solution Design
The architecture at a glance. Include or link to a diagram showing major components, data flow, and system boundaries.
Decision | Choice | Rationale |
|---|---|---|
Architecture style | Monolith / Microservices / Serverless / Event-driven | Why this fits the problem and team |
Communication | Sync (HTTP/gRPC) / Async (queue) / Events | |
Primary data store | PostgreSQL / DynamoDB / etc. | |
Deployment | Single region / Multi-region / Edge |
Components
Describe each major component: what it does, what it talks to, how it fails.
Component | Responsibility | Tech | Scaling | Failure Mode |
|---|---|---|---|---|
Component name | What it does | Language / framework | How it scales | What breaks if this goes down |
Data Architecture
How data is modeled, stored, and moved. Data outlives code — get this right.
```sql
-- Core entities and relationships
-- Adapt to your data store
CREATE TABLE resources (
    id UUID PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    status VARCHAR(50) NOT NULL DEFAULT 'active',
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
```
Data Type | Store | Retention | Rationale |
|---|---|---|---|
Transactional | PostgreSQL | Indefinite | ACID, relational integrity |
Cache | Redis | TTL | Low-latency reads |
Files | S3 / GCS | Indefinite | Object storage |
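The Cache row pairs Redis with a TTL. The read path for this kind of cache commonly follows the cache-aside pattern:

- Read from the cache.
- On a miss, fall back to the primary store.
- Populate the cache with an expiry.

The sketch below is a minimal example. It uses an in-memory dictionary as a stand-in for Redis; with a real Redis client you would use its set-with-expiry operation instead. `TTLCache`, `get_resource`, and the `load_from_db` callable are hypothetical names.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """In-memory stand-in for Redis: stores each value with an absolute expiry time."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._entries: Dict[str, Tuple[float, Any]] = {}
        self._clock = clock

    def get(self, key: str) -> Any:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: evict lazily on read
            return None
        return value

    def set(self, key: str, value: Any, ttl_seconds: float) -> None:
        self._entries[key] = (self._clock() + ttl_seconds, value)


def get_resource(cache: TTLCache, resource_id: str,
                 load_from_db: Callable[[str], Any], ttl_seconds: float = 60.0) -> Any:
    """Cache-aside read: try the cache first, fall back to the primary store on a miss."""
    cached = cache.get(resource_id)
    if cached is not None:
        return cached
    value = load_from_db(resource_id)  # hypothetical loader for the primary store
    if value is not None:
        cache.set(resource_id, value, ttl_seconds)
    return value
```

Missing rows are deliberately not cached here, so a lookup for an absent ID always reaches the primary store. Whether to cache negative results is a per-system choice. The TTL bounds how stale a read can be after a write that does not invalidate the cache.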
If migrating from an existing system: describe the migration approach (dual-write, batch, strangler fig), validation strategy, and rollback plan.
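For the validation strategy during a dual-write migration, a common check is to reconcile the old and new stores before cutover. The check fingerprints each record on both sides and reports IDs that are missing on either side or whose contents differ. The sketch below is a minimal example. It assumes both stores can be read as dictionaries keyed by record ID and that records are JSON-serializable; a real reconciliation would stream and compare in batches. All names are illustrative.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List


def fingerprint(record: Dict[str, Any]) -> str:
    """Stable SHA-256 of a record; sort_keys makes field order irrelevant."""
    canonical = json.dumps(record, sort_keys=True, default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass
class ReconciliationReport:
    missing_in_new: List[str] = field(default_factory=list)
    missing_in_old: List[str] = field(default_factory=list)
    mismatched: List[str] = field(default_factory=list)

    @property
    def clean(self) -> bool:
        return not (self.missing_in_new or self.missing_in_old or self.mismatched)


def reconcile(old: Dict[str, Dict[str, Any]], new: Dict[str, Dict[str, Any]]) -> ReconciliationReport:
    """Compare every ID present in either store."""
    report = ReconciliationReport()
    for record_id in sorted(old.keys() | new.keys()):
        if record_id not in new:
            report.missing_in_new.append(record_id)
        elif record_id not in old:
            report.missing_in_old.append(record_id)
        elif fingerprint(old[record_id]) != fingerprint(new[record_id]):
            report.mismatched.append(record_id)
    return report
```

Cutover can then be gated on `report.clean`. That matches the "dual-write with reconciliation before cutover" mitigation listed under Risks. A non-empty report also gives the rollback plan a concrete trigger.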
Integrations
How this solution connects to other systems. Integration points are where architectural plans meet reality.
System | Direction | Protocol | Auth | Failure Handling |
|---|---|---|---|---|
External system | In / Out / Both | REST / gRPC / Queue | API key / OAuth / mTLS | Retry + circuit breaker / fail-open / dead-letter |
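The Failure Handling column often reads "Retry + circuit breaker". The sketch below is a minimal example of how the two combine. Retries use exponential backoff with full jitter. The breaker opens after consecutive failures and allows one trial call after a cool-down.

The example assumes a synchronous call wrapped in a Python function and is not thread-safe. The thresholds, class names, and `CircuitOpenError` are illustrative. In production, an established resilience library or a service-mesh retry policy is often the better choice.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when the breaker is open and calls are short-circuited."""


class CircuitBreaker:
    """Opens after `failure_threshold` consecutive failures; allows a trial call after `reset_timeout` seconds."""

    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T]) -> T:
        if self._opened_at is not None and self._clock() - self._opened_at < self.reset_timeout:
            raise CircuitOpenError("circuit open; failing fast")
        # Either closed, or half-open (cool-down elapsed): let this call through.
        try:
            result = fn()
        except Exception:
            self._failures += 1
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # open, or re-open after a failed trial
            raise
        self._failures = 0
        self._opened_at = None  # success closes the circuit
        return result


def call_with_retry(breaker: CircuitBreaker, fn: Callable[[], T], attempts: int = 3,
                    base_delay: float = 0.1, sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry transient failures with exponential backoff and full jitter."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return breaker.call(fn)
        except CircuitOpenError:
            raise  # never retry into an open circuit
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(random.uniform(0, base_delay * 2 ** attempt))
    raise AssertionError("unreachable")  # loop always returns or raises
```

Retries absorb brief blips. The breaker stops a struggling dependency from being hammered by those same retries. Only idempotent calls, or calls made idempotent with request keys, should be retried.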
Security
Area | Approach |
|---|---|
Authentication | How users and services prove identity |
Authorization | Permission model (RBAC, ABAC, policies) |
Encryption | At rest and in transit — algorithms, key management |
Secrets | How credentials are stored and rotated |
Input validation | Validation boundaries, sanitization |
Audit logging | What is logged, retention, tamper-proofing |
Compliance | Applicable standards (GDPR, SOC 2, HIPAA) — delete if none |
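For the tamper-proofing part of Audit logging, one common technique is to hash-chain entries. Each entry stores the hash of the previous one, so editing an entry, or inserting or removing one before the tail, breaks verification from that point on. Truncating the tail is only detectable if the latest hash is also recorded somewhere else.

The sketch below is a minimal example using SHA-256 from the Python standard library. It makes tampering detectable, not impossible, so it would normally be combined with append-only or write-once storage. The function names and event fields are illustrative.

```python
import hashlib
import json
from typing import Any, Dict, List

GENESIS = "0" * 64  # stand-in "previous hash" for the first entry


def _entry_hash(prev_hash: str, event: Dict[str, Any]) -> str:
    payload = json.dumps({"prev": prev_hash, "event": event}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append(log: List[Dict[str, Any]], event: Dict[str, Any]) -> None:
    """Append an event, linking it to the hash of the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev": prev_hash, "hash": _entry_hash(prev_hash, event)})


def verify(log: List[Dict[str, Any]]) -> bool:
    """Recompute the chain; an edited entry, or one inserted or removed before the tail, fails."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash or entry["hash"] != _entry_hash(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True
```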
Observability
Layer | Tool | What to Watch |
|---|---|---|
Metrics | Prometheus / Datadog / CloudWatch | Request rate, error rate, latency, saturation |
Logging | Structured JSON / ELK | Correlation IDs, error context |
Tracing | OpenTelemetry / Jaeger | Cross-service request flow |
Alerting | PagerDuty / Opsgenie | Thresholds, escalation, runbook links |
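For the Logging row, the sketch below is a minimal example of structured JSON logs that carry a correlation ID. It uses only Python's standard `logging`, `contextvars`, and `uuid` modules. The JSON field names, the `correlation_id` context variable, and the `handle_request` function are illustrative. A service already instrumented with OpenTelemetry might log the trace ID in the same place.

```python
import contextvars
import json
import logging
import uuid
from typing import Optional

# Correlation ID for the current request; each thread and asyncio task has its own context.
correlation_id = contextvars.ContextVar("correlation_id", default="-")


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "correlation_id": correlation_id.get(),
        }
        if record.exc_info:
            payload["error"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def handle_request(logger: logging.Logger, incoming_id: Optional[str] = None) -> str:
    """Reuse the caller's correlation ID if one was propagated, otherwise mint a new one."""
    cid = incoming_id or str(uuid.uuid4())
    token = correlation_id.set(cid)
    try:
        logger.info("processing request")
    finally:
        correlation_id.reset(token)  # don't leak the ID into unrelated work
    return cid
```

The ID would typically arrive in a request header and be forwarded on outbound calls. That lets log search join entries across services in the same way tracing joins spans.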
Infrastructure
Resource | Spec | Scaling | Est. Monthly Cost |
|---|---|---|---|
Compute | Instance type / container config | Auto-scale rules | $X,XXX |
Database | Engine, size, replicas | Connection pooling, read replicas | $X,XXX |
Cache / Queue / CDN | Configuration | | $XXX |
Total | | | $X,XXX |
Alternatives Considered
What other approaches were evaluated and why they were rejected. This saves future engineers from re-evaluating the same options.
Alternative: [Name]
Pros | Cons |
|---|---|
Why it was rejected | |
Duplicate this subsection for each alternative considered.
Repositories & Skills
Repository | Purpose | Stack | Owner |
|---|---|---|---|
repo-name | What it contains | Language / framework | Team |
Skills needed that the team may not have today:
Skill | Phase | Available? | Plan |
|---|---|---|---|
Skill area | Build / Deploy / Operate | Yes / No | Hire / train / contract |
Estimates & Plan
Work Package | Phase | Estimate | Dependencies | Owner |
|---|---|---|---|---|
Schema & data model | 1 | X days | None | |
Core service / API | 1 | X days | Schema | |
Integrations | 2 | X days | Core service | |
Frontend | 2 | X days | API | |
Testing & hardening | 3 | X days | All above | |
Deploy & migrate | 3 | X days | All above |
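Because work packages depend on each other, the overall timeline is set by the longest dependency chain (the critical path), not by the sum of all estimates. The sketch below is a minimal example that computes each package's earliest finish and the critical path using `graphlib.TopologicalSorter` from the Python standard library (3.9+).

The day counts are placeholder values, not estimates from this document. "All above" is expanded literally into explicit dependencies. The package keys are shorthand for the rows in the table.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set, Tuple

# Placeholder durations in days; the template leaves these as "X days".
ESTIMATES: Dict[str, int] = {
    "schema": 3, "core_api": 10, "integrations": 8,
    "frontend": 12, "testing": 5, "deploy": 3,
}

# Dependencies from the table, with "All above" expanded explicitly.
DEPENDS_ON: Dict[str, Set[str]] = {
    "schema": set(),
    "core_api": {"schema"},
    "integrations": {"core_api"},
    "frontend": {"core_api"},
    "testing": {"schema", "core_api", "integrations", "frontend"},
    "deploy": {"schema", "core_api", "integrations", "frontend", "testing"},
}


def critical_path(estimates: Dict[str, int], depends_on: Dict[str, Set[str]]) -> Tuple[int, List[str]]:
    """Return (total duration, work packages on the longest dependency chain)."""
    finish: Dict[str, int] = {}
    via: Dict[str, str] = {}  # the dependency that determined each package's start
    for task in TopologicalSorter(depends_on).static_order():
        start = 0
        for dep in depends_on.get(task, set()):
            if finish[dep] > start:
                start = finish[dep]
                via[task] = dep
        finish[task] = start + estimates[task]
    end = max(finish, key=lambda t: finish[t])  # package that finishes last
    path = [end]
    while path[-1] in via:
        path.append(via[path[-1]])
    return finish[end], list(reversed(path))
```

With these placeholder numbers, the critical path runs schema, core API, frontend, testing, deploy, for 33 days. Integrations finishes four days before frontend, so it has slack. Shortening it would not move the end date.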
Rollout
Phases
Phase | Scope | Exit Criteria | Target |
|---|---|---|---|
1: Foundation | Core infrastructure and data model | First flow working in staging | |
2: Build | Features, integrations, frontend | Core workflow end-to-end | |
3: Harden | Testing, security, monitoring | Load test + security review pass | |
4: Ship | Migration, traffic cutover | 100% on new system |
Rollout Strategy
Dimension | Approach |
|---|---|
Traffic migration | Feature flag / percentage rollout / blue-green / canary |
Rollback trigger | Error rate > X% / latency > Xms / data inconsistency |
Rollback mechanism | Feature flag off / redeploy previous version / DNS switch |
Parallel running | Duration and how traffic splits between old and new |
Decommission | When and how the old system is shut down |
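For a percentage rollout, users are usually assigned to the new system deterministically. That way a given user stays on the same side as the percentage grows, and raising the percentage only adds users. The sketch below is a minimal example of that bucketing together with a rollback-trigger check.

The sketch makes these assumptions:

- User IDs are strings.
- Bucketing uses SHA-256 over 10,000 buckets.
- Thresholds are supplied per project, standing in for the template's "Error rate > X% / latency > Xms".

All names are hypothetical. Feature-flag services generally implement their own variant of this bucketing.

```python
import hashlib
from dataclasses import dataclass


def rollout_bucket(user_id: str, salt: str = "new-system-rollout") -> int:
    """Map a user to a stable bucket in [0, 10000); the salt separates independent rollouts."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 10_000


def routed_to_new_system(user_id: str, rollout_pct: float) -> bool:
    """True if the user falls inside the current rollout percentage (0 to 100)."""
    return rollout_bucket(user_id) < rollout_pct * 100


@dataclass
class RollbackThresholds:
    max_error_rate: float      # fraction, e.g. 0.01 for 1%
    max_p99_latency_ms: float


def should_roll_back(errors: int, requests: int, p99_latency_ms: float,
                     thresholds: RollbackThresholds) -> bool:
    """Evaluate the rollback trigger for the traffic slice on the new system."""
    error_rate = errors / requests if requests else 0.0
    return error_rate > thresholds.max_error_rate or p99_latency_ms > thresholds.max_p99_latency_ms
```

When `should_roll_back` fires, the rollback mechanism is simply to set the percentage back to 0. Because bucketing is deterministic, re-raising the percentage later brings the same users back first.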
Risks
Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
Data migration inconsistency | Medium | High | Dual-write with reconciliation before cutover |
Performance regression at scale | Medium | High | Load test at 2x before each rollout stage |
| | Low | Medium | |
Open Questions
Question | Owner | Status |
|---|---|---|
Unresolved question | Name | Open |
| | | Open |
References
Related ADRs, product briefs, or feature specs
External resources: vendor docs, architecture examples, relevant papers
Previous architecture document (if this evolves an existing system)