Scalability

The ability to handle growing load by adding resources. Measure load with explicit load parameters (req/s, read:write ratio, payload size, fan-out) and success with percentiles (p95/p99), never averages — tail latency is what users feel.
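
A minimal sketch of why averages mislead, assuming a made-up latency sample and a simple nearest-rank percentile (the helper `percentile` and the numbers are illustrative, not from any monitoring tool):

```python
import statistics

def percentile(samples, p):
    """Nearest-rank percentile for an integer p in 1..100."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Integer ceiling of p * n / 100 gives the 1-based rank without float error.
    rank = (p * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]

# Hypothetical latencies in ms: 90 fast requests, 8 slow, 2 very slow.
latencies_ms = [20] * 90 + [300] * 8 + [2000] * 2

mean_ms = statistics.mean(latencies_ms)   # dominated by the fast majority
p50_ms = percentile(latencies_ms, 50)
p95_ms = percentile(latencies_ms, 95)
p99_ms = percentile(latencies_ms, 99)

print(f"mean={mean_ms} p50={p50_ms} p95={p95_ms} p99={p99_ms}")
```

Here the mean (82 ms) looks healthy while one request in ten takes 300 ms or more, which only the p95 and p99 values reveal.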

Vertical vs Horizontal

Vertical (scale up) — bigger machine. Simple, but it has a hard ceiling and remains a single point of failure.
Horizontal (scale out) — more machines. Near-unlimited, but forces you to handle statelessness, coordination, and consistency.

Make services stateless

Push session/state to a shared store (Redis, DynamoDB) so any node can serve any request. Statelessness is the precondition for horizontal scaling and easy autoscaling.
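
A rough sketch of the idea, assuming an in-memory class standing in for the shared store (`SharedSessionStore`, `AppNode`, and the cart example are hypothetical names, not a real Redis or DynamoDB client):

```python
class SharedSessionStore:
    """In-memory stand-in for an external store such as Redis or DynamoDB."""

    def __init__(self):
        self._data = {}

    def get(self, session_id):
        # Return a copy so callers cannot mutate stored state by accident.
        return dict(self._data.get(session_id, {}))

    def put(self, session_id, session):
        self._data[session_id] = dict(session)


class AppNode:
    """Stateless application node: holds no per-user state between requests."""

    def __init__(self, name, store):
        self.name = name
        self._store = store

    def add_to_cart(self, session_id, item):
        session = self._store.get(session_id)
        cart = session.get("cart", []) + [item]
        session["cart"] = cart
        self._store.put(session_id, session)
        return {"served_by": self.name, "cart": cart}


store = SharedSessionStore()
nodes = [AppNode("node-a", store), AppNode("node-b", store)]

# Two requests from the same session land on different nodes.
first = nodes[0].add_to_cart("session-123", "book")
second = nodes[1].add_to_cart("session-123", "pen")
print(second)
```

Because the cart lives in the store, either node can serve either request. With a real store the read-modify-write above would need an atomic update or optimistic locking, which this sketch leaves out.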

Scaling Reads

Caching — first and cheapest lever (CDN, app cache, DB query cache).
Read replicas — offload reads from the primary (mind replication lag).
CDN / edge — serve static and cacheable content close to users.
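
The caching lever can be sketched as a cache-aside read with a TTL. This is a minimal single-process illustration; `TTLCache`, `load_user_from_db`, and the injected fake clock are assumptions made for the example:

```python
import time

class TTLCache:
    """Cache-aside helper: return a fresh cached value, else load and store it."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get_or_load(self, key, loader):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]            # hit: the database is not touched
        value = loader(key)            # miss or expired: read from the source
        self._entries[key] = (now + self._ttl, value)
        return value


db_reads = []  # records every call that reaches the (pretend) database

def load_user_from_db(user_id):
    db_reads.append(user_id)
    return {"id": user_id, "name": f"user-{user_id}"}


fake_now = [0.0]  # controllable clock so the example is deterministic
cache = TTLCache(ttl_seconds=30, clock=lambda: fake_now[0])

cache.get_or_load(42, load_user_from_db)   # miss: loads from the database
cache.get_or_load(42, load_user_from_db)   # hit: served from cache
fake_now[0] = 31.0
cache.get_or_load(42, load_user_from_db)   # entry expired: loads again
print(len(db_reads))
```

Three reads cost two database calls here. The TTL bounds how stale a cached value can get, the same trade-off replication lag imposes on read replicas.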

Scaling Writes

Partitioning / sharding — split data by key (hash or range). Watch for hot keys / skew.
Async & queues — absorb spikes; do expensive work out-of-band via Messaging & Event-Driven Architecture.
Write-optimized stores — LSM-tree engines, append-only logs.
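
Hash partitioning and the hot-key problem can be sketched as follows. The shard count, key names, and the `shard_for` helper are illustrative assumptions. A stable hash from `hashlib` is used because Python's built-in `hash()` for strings varies between processes:

```python
import hashlib
from collections import Counter

def shard_for(key, num_shards):
    """Map a key to a shard with a hash that is stable across processes."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


NUM_SHARDS = 4

# Hypothetical write stream: 1000 distinct users plus one very hot key.
writes = [f"user-{i}" for i in range(1000)] + ["celebrity"] * 1000

load = Counter(shard_for(key, NUM_SHARDS) for key in writes)
hot_shard = shard_for("celebrity", NUM_SHARDS)

for shard in range(NUM_SHARDS):
    print(shard, load[shard])
```

The distinct users spread roughly evenly, but every write for the hot key lands on one shard, so that shard carries at least half of the total load. Hashing alone does not fix skew; hot keys typically need key salting or a dedicated path.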

Patterns & Tactics

Load balancing — distribute across nodes (L4/L7); health checks + autoscaling.
Back-pressure & rate limiting — slow or reject excess requests so that bursts from callers (including your own services) cannot overwhelm the system.
Bulkheads & circuit breakers — isolate failures so one slow dependency doesn’t sink everything.
Idempotency — retries are inevitable in distributed systems; make operations safe to repeat.
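
Two of the tactics above, sketched under simplifying assumptions: a single-process token bucket for rate limiting, and an idempotency-key check so a retried request is not applied twice. `TokenBucket`, `PaymentService`, and the key `order-1001` are hypothetical names for illustration:

```python
import time

class TokenBucket:
    """Allows bursts up to `capacity`; refills at `rate` tokens per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self._rate = rate
        self._capacity = capacity
        self._clock = clock
        self._tokens = float(capacity)
        self._last = clock()

    def allow(self):
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill for the time elapsed, never above capacity.
        self._tokens = min(self._capacity, self._tokens + elapsed * self._rate)
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False   # caller should reject or delay the request


class PaymentService:
    """Idempotent handler: a repeated key returns the stored result."""

    def __init__(self):
        self._results = {}   # idempotency key -> result of the first attempt
        self.charges = []    # side effects actually performed

    def charge(self, idempotency_key, amount_cents):
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        self.charges.append(amount_cents)
        result = {"status": "charged", "amount_cents": amount_cents}
        self._results[idempotency_key] = result
        return result


fake_now = [0.0]  # controllable clock so the example is deterministic
bucket = TokenBucket(rate=1, capacity=2, clock=lambda: fake_now[0])
burst = [bucket.allow() for _ in range(3)]   # third call exceeds the burst
fake_now[0] = 1.0
after_refill = bucket.allow()                # one token has been refilled

payments = PaymentService()
first = payments.charge("order-1001", 2500)
retry = payments.charge("order-1001", 2500)  # client retry after a timeout
print(burst, after_refill, payments.charges)
```

In a multi-node deployment both the bucket state and the idempotency records would have to live in a shared store, and the check-then-write in `charge` would need to be atomic.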

Watch For

Amdahl’s Law / Universal Scalability Law — the serial (contended) fraction of the work caps speedup, and coordination cost between nodes grows with node count, so throughput can actually decrease past a certain size.
The N+1 and chatty-call problems across service boundaries.
Stateful bottlenecks hiding behind “stateless” services (the shared DB).
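
A small numeric sketch of both laws, using their standard textbook forms. The coefficients `ALPHA` (contention) and `BETA` (coordination cost) are made-up values for illustration, not measurements:

```python
import math

def amdahl_speedup(n, parallel_fraction):
    """Amdahl's Law: S(N) = 1 / ((1 - p) + p / N)."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n)

def usl_throughput(n, alpha, beta):
    """Universal Scalability Law: C(N) = N / (1 + alpha*(N-1) + beta*N*(N-1))."""
    return n / (1.0 + alpha * (n - 1) + beta * n * (n - 1))

def usl_peak_nodes(alpha, beta):
    """Node count where C(N) peaks: N* = sqrt((1 - alpha) / beta)."""
    return math.sqrt((1.0 - alpha) / beta)


ALPHA = 0.05   # assumed contention (serial) fraction
BETA = 0.001   # assumed coordination (coherency) cost

peak = usl_peak_nodes(ALPHA, BETA)

for n in (1, 8, 31, 100):
    print(n,
          round(amdahl_speedup(n, 1.0 - ALPHA), 2),
          round(usl_throughput(n, ALPHA, BETA), 2))
print("peak at about", round(peak, 1), "nodes")
```

With these assumed coefficients, Amdahl's speedup flattens out below 20x no matter how many nodes are added, while the USL curve peaks near 31 nodes and is lower at 100 nodes than at 31.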
