Scalability

Scalability is the ability to handle growing load by adding resources. Describe load with explicit load parameters (req/s, read:write ratio, payload size, fan-out) and judge success by latency percentiles (p95/p99) rather than averages, because tail latency is what users feel.
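
To see why averages mislead, here is a minimal sketch that compares the mean with nearest-rank percentiles. The latency samples are invented for illustration, and percentile() is a hypothetical helper, not a library function.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of values at or below it."""
    if not samples:
        raise ValueError("samples must be non-empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

# Assumed data: 90 fast requests (20 ms) and 10 slow ones (2000 ms).
latencies_ms = [20] * 90 + [2000] * 10

mean_ms = sum(latencies_ms) / len(latencies_ms)   # 218.0
p50_ms = percentile(latencies_ms, 50)             # 20
p95_ms = percentile(latencies_ms, 95)             # 2000
p99_ms = percentile(latencies_ms, 99)             # 2000

print(f"mean={mean_ms} p50={p50_ms} p95={p95_ms} p99={p99_ms}")
```

The mean (218 ms) describes no real request: most users see 20 ms and one in ten waits two seconds, which only the percentiles reveal.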

Vertical vs Horizontal

  • Vertical (scale up) — bigger machine. Simple, but it hits a hardware ceiling and leaves a single point of failure.
  • Horizontal (scale out) — more machines. Near-unlimited, but forces you to handle statelessness, coordination, and consistency.

Make services stateless

Push session/state to a shared store (Redis, DynamoDB) so any node can serve any request. Statelessness is the precondition for horizontal scaling and easy autoscaling.
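
A minimal sketch of the idea, assuming an in-memory class stands in for the shared store. SharedSessionStore and AppNode are hypothetical names; a real deployment would call a Redis or DynamoDB client instead.

```python
class SharedSessionStore:
    """In-memory stand-in for a shared store such as Redis or DynamoDB (assumption)."""

    def __init__(self):
        self._data = {}

    def get(self, session_id):
        # Return a copy so callers cannot mutate stored state by accident.
        return dict(self._data.get(session_id, {}))

    def put(self, session_id, session):
        self._data[session_id] = dict(session)


class AppNode:
    """A stateless node: it holds no per-user state between requests."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def add_to_cart(self, session_id, item):
        # Read-modify-write against the shared store. A real store would need
        # an atomic operation or a conditional write to be safe under concurrency.
        session = self.store.get(session_id)
        session["cart"] = session.get("cart", []) + [item]
        self.store.put(session_id, session)
        return {"served_by": self.name, "cart": session["cart"]}


store = SharedSessionStore()
node_a = AppNode("node-a", store)
node_b = AppNode("node-b", store)

# The load balancer may send consecutive requests to different nodes.
first = node_a.add_to_cart("session-1", "book")
second = node_b.add_to_cart("session-1", "pen")
print(second)  # node-b sees the item that node-a stored
```

Because neither node keeps the cart locally, nodes can be added or removed without sticky sessions or state migration.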

Scaling Reads

  • Caching — first and cheapest lever (CDN, app cache, DB query cache).
  • Read replicas — offload reads from the primary (mind replication lag).
  • CDN / edge — serve static and cacheable content close to users.
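
The caching lever above is often implemented as cache-aside with a TTL. The sketch below is one possible shape, not a definitive implementation: CacheAside, load_user, and the injected clock are hypothetical, and the "database" is a plain function.

```python
import time

class CacheAside:
    """Cache-aside: check the cache first, fall back to the loader on a miss or expiry."""

    def __init__(self, load_from_db, ttl_seconds, clock=time.monotonic):
        self._load = load_from_db
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (value, expires_at)
        self.hits = 0
        self.misses = 0

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            self.hits += 1
            return entry[0]
        # Miss or expired entry: read from the source of truth and repopulate.
        self.misses += 1
        value = self._load(key)
        self._entries[key] = (value, now + self._ttl)
        return value

    def invalidate(self, key):
        # Call after a write so readers do not keep serving the old value.
        self._entries.pop(key, None)


db_reads = []

def load_user(user_id):
    """Hypothetical database read; records each call so load is visible."""
    db_reads.append(user_id)
    return {"id": user_id, "name": f"user-{user_id}"}

fake_now = [0.0]  # controllable clock so the example is deterministic
cache = CacheAside(load_user, ttl_seconds=30, clock=lambda: fake_now[0])

cache.get(1)        # miss: goes to the database
cache.get(1)        # hit: served from the cache
fake_now[0] = 31.0
cache.get(1)        # entry expired: goes to the database again
print(cache.hits, cache.misses, db_reads)
```

The TTL bounds staleness in the same way replication lag bounds it for read replicas: reads get cheaper, but they may briefly return old data.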

Scaling Writes

  • Partitioning / sharding — split data by key (hash or range). Watch for hot keys / skew.
  • Async & queues — absorb spikes; do expensive work out-of-band via Messaging & Event-Driven Architecture.
  • Write-optimized stores — LSM-tree engines, append-only logs.
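
Hash partitioning and the hot-key problem can be sketched as follows. The shard count, key names, and the skew metric (busiest shard divided by mean load) are assumptions chosen for illustration.

```python
import hashlib
from collections import Counter

def shard_for(key, num_shards):
    """Map a key to a shard with a stable hash (Python's built-in hash() is salted per process)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

def shard_load(keys, num_shards):
    """Count how many writes land on each shard."""
    counts = Counter(shard_for(k, num_shards) for k in keys)
    return [counts.get(s, 0) for s in range(num_shards)]

def skew(loads):
    """Busiest shard relative to the mean; 1.0 means perfectly even."""
    mean = sum(loads) / len(loads)
    return max(loads) / mean if mean else 0.0

NUM_SHARDS = 4

# Assumed workload 1: 10,000 writes spread over distinct keys.
even_writes = [f"user-{i}" for i in range(10_000)]
# Assumed workload 2: half of all writes hit a single hot key.
hot_writes = ["celebrity"] * 5_000 + [f"user-{i}" for i in range(5_000)]

even_loads = shard_load(even_writes, NUM_SHARDS)
hot_loads = shard_load(hot_writes, NUM_SHARDS)

print("even:", even_loads, "skew:", round(skew(even_loads), 2))
print("hot: ", hot_loads, "skew:", round(skew(hot_loads), 2))
```

Hashing spreads distinct keys evenly, but it cannot split a single hot key: every write to "celebrity" lands on the same shard, so adding shards does not relieve it.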

Patterns & Tactics

  • Load balancing — distribute across nodes (L4/L7); health checks + autoscaling.
  • Back-pressure & rate limiting — slow or reject excess work at the edge so queues stay bounded and overload does not cascade.
  • Bulkheads & circuit breakers — isolate failures so one slow dependency doesn’t sink everything.
  • Idempotency — retries are inevitable in distributed systems; make operations safe to repeat.
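
The idempotency point can be made concrete with an idempotency key. This is a sketch under stated assumptions: PaymentService, call_with_retries, and the "order-42" key are hypothetical, and the lost response is simulated with TimeoutError.

```python
class PaymentService:
    """Server side: remembers the result for each idempotency key."""

    def __init__(self):
        self._results = {}        # idempotency_key -> stored response
        self.balance = 100
        self.charges_applied = 0

    def charge(self, idempotency_key, amount):
        # A repeated key returns the stored result instead of charging again.
        # In a real system this check-and-store must be atomic.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        if amount > self.balance:
            result = {"status": "declined", "balance": self.balance}
        else:
            self.balance -= amount
            self.charges_applied += 1
            result = {"status": "ok", "balance": self.balance}
        self._results[idempotency_key] = result
        return result


def call_with_retries(operation, attempts=3):
    """Client side: retry on timeout, re-raising the last error if all attempts fail."""
    last_error = None
    for _ in range(attempts):
        try:
            return operation()
        except TimeoutError as exc:
            last_error = exc
    raise last_error


service = PaymentService()
calls = {"n": 0}

def flaky_charge():
    calls["n"] += 1
    response = service.charge("order-42", 30)
    if calls["n"] == 1:
        # The server applied the charge, but the response never reached the client.
        raise TimeoutError("response lost")
    return response

final = call_with_retries(flaky_charge)
print(final, service.charges_applied)
```

The client cannot tell a lost request from a lost response, so it retries either way; the key is what makes the second attempt harmless.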

Watch For

  • Amdahl’s Law / Universal Scalability Law — the serial fraction caps speedup (Amdahl), and coordination cost can make throughput decrease past a certain node count (USL).
  • The N+1 and chatty-call problems across service boundaries; many small round trips multiply latency, so batch or aggregate calls.
  • Stateful bottlenecks hiding behind “stateless” services (the shared DB).
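
The scalability-law bullet above can be checked numerically. The sketch uses the usual form of the Universal Scalability Law, C(N) = N / (1 + sigma(N - 1) + kappa N(N - 1)), where sigma models contention and kappa models coordination cost; setting kappa to 0 gives Amdahl's Law. The coefficient values are assumptions picked for illustration, not measurements.

```python
import math

def usl_throughput(n, sigma, kappa):
    """Relative throughput at n nodes (1.0 at n = 1) under the Universal Scalability Law."""
    return n / (1 + sigma * (n - 1) + kappa * n * (n - 1))

def peak_nodes(sigma, kappa):
    """Node count where throughput peaks; beyond it, adding nodes lowers throughput."""
    return math.sqrt((1 - sigma) / kappa)

SIGMA = 0.05    # assumed contention (serialized fraction)
KAPPA = 0.001   # assumed coordination cost between nodes

for n in (1, 10, 31, 100):
    print(n, round(usl_throughput(n, SIGMA, KAPPA), 2))

print("peak near", round(peak_nodes(SIGMA, KAPPA), 1), "nodes")

# With kappa = 0 the curve never turns down, but it is capped at 1 / sigma.
amdahl_at_1000 = usl_throughput(1000, SIGMA, 0.0)
print("Amdahl at 1000 nodes:", round(amdahl_at_1000, 2), "cap:", 1 / SIGMA)
```

With these assumed coefficients throughput peaks at roughly 31 nodes, and 100 nodes deliver less than 31 do, which is the "throughput can decrease" case.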

See Also

Large Scale Systems · CAP Theorem & Consistency · Caching · DDIA Notes