๐Ÿš€ The Visual Guide to System Design

Master System Design
One Concept at a Time

Visual explanations of distributed systems, scalability patterns, and real-world architectures. Everything you need to ace your system design interview.

25+ Core Concepts
10+ Real Designs
50+ Diagrams
๐Ÿ“ Foundation

System Design Fundamentals

Core concepts every engineer must understand before designing large-scale systems.

๐Ÿ“ˆ

Scalability

How systems handle growing amounts of work by adding resources.

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚         VERTICAL SCALING (Scale Up)      โ”‚
โ”‚  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”     โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”          โ”‚
โ”‚  โ”‚ 4 CPUโ”‚ โ”€โ”€โ†’ โ”‚  32 CPU      โ”‚          โ”‚
โ”‚  โ”‚ 8 GB โ”‚     โ”‚  256 GB RAM  โ”‚          โ”‚
โ”‚  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”˜     โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜          โ”‚
โ”‚  Simpler but has hardware limits        โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚        HORIZONTAL SCALING (Scale Out)    โ”‚
โ”‚  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ” โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ” โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ” โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”  โ”‚
โ”‚  โ”‚Serverโ”‚ โ”‚Serverโ”‚ โ”‚Serverโ”‚ โ”‚Serverโ”‚   โ”‚
โ”‚  โ”‚  1   โ”‚ โ”‚  2   โ”‚ โ”‚  3   โ”‚ โ”‚  N   โ”‚   โ”‚
โ”‚  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”˜   โ”‚
โ”‚  Complex but virtually unlimited        โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

Key Concepts

  • Vertical Scaling: Add more power (CPU, RAM) to existing machine. Simple but limited.
  • Horizontal Scaling: Add more machines. Complex (need load balancing, data sync) but unlimited.
  • Elasticity: Auto-scale based on demand (AWS Auto Scaling, K8s HPA).
  • Rule of thumb: Design for 10x current load, plan for 100x.
Core
โฑ๏ธ

Latency vs Throughput

Understanding time-per-request vs requests-per-second trade-offs.

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚          LATENCY NUMBERS (2024)          โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚  L1 cache ref .............. 1 ns       โ”‚
โ”‚  L2 cache ref .............. 4 ns       โ”‚
โ”‚  Main memory ref .......... 100 ns      โ”‚
โ”‚  SSD random read .......... 16 ฮผs       โ”‚
โ”‚  HDD random read .......... 2 ms        โ”‚
โ”‚  Same datacenter RTT ...... 0.5 ms      โ”‚
โ”‚  Cross-continent RTT ...... 150 ms      โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚  THROUGHPUT = Requests / Second          โ”‚
โ”‚                                          โ”‚
โ”‚  Single server:    ~1K-10K RPS          โ”‚
โ”‚  With caching:     ~50K-100K RPS        โ”‚
โ”‚  CDN edge:         ~1M+ RPS            โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

Key Concepts

  • Latency: Time to complete one operation. Measured in ms/ฮผs.
  • Throughput: Number of operations per unit time (QPS/RPS).
  • P99 Latency: 99th percentile โ€” worst 1% of requests. More important than average.
  • Trade-off: Batching increases throughput but adds latency.
Core
๐Ÿ”บ

CAP Theorem

In a distributed system, you can only guarantee two of three properties.

            Consistency (C)
                 โ•ฑโ•ฒ
                โ•ฑ  โ•ฒ
               โ•ฑ    โ•ฒ
          CA   ╱  Pick ╲   CP
             โ•ฑ   Two!   โ•ฒ
            โ•ฑ            โ•ฒ
           โ•ฑโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ•ฒ
   Availability (A) โ”€โ”€ Partition
                        Tolerance (P)
                   AP

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚    CP    โ”‚ MongoDB, HBase, Redis    โ”‚
โ”‚          โ”‚ Consistent but may be    โ”‚
โ”‚          โ”‚ unavailable during split โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚    AP    โ”‚ Cassandra, DynamoDB,     โ”‚
โ”‚          โ”‚ CouchDB โ€” Always         โ”‚
โ”‚          โ”‚ available, eventually    โ”‚
โ”‚          โ”‚ consistent               โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚    CA    โ”‚ Traditional RDBMS        โ”‚
โ”‚          โ”‚ (single node only)       โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

Key Concepts

  • Consistency: Every read receives the most recent write.
  • Availability: Every request receives a response (no errors).
  • Partition Tolerance: System works despite network failures between nodes.
  • Reality: Network partitions DO happen, so you really choose between CP and AP.
Core
๐Ÿ”„

Consistency Patterns

Strong, eventual, and causal consistency โ€” when to use which.

STRONG CONSISTENCY (Linearizable):
  Client โ”€โ”€write(x=5)โ”€โ”€โ†’ DB โ”€โ”€โ†’ All replicas
  Client โ”€โ”€read(x)โ”€โ”€โ”€โ”€โ†’ DB โ”€โ”€โ†’ Always returns 5
  โœ… Banking, inventory  โš ๏ธ High latency

EVENTUAL CONSISTENCY:
  Client โ”€โ”€write(x=5)โ”€โ”€โ†’ Primary
  Client โ”€โ”€read(x)โ”€โ”€โ”€โ”€โ†’ Replica โ”€โ”€โ†’ May return old value
  Eventually all replicas converge
  โœ… Social feeds, DNS  โšก Low latency

CAUSAL CONSISTENCY:
  If A causes B, everyone sees A before B
  But concurrent events can be in any order
  โœ… Chat apps, collaborative editing

When to Use

  • Strong: Financial transactions, inventory counts, booking systems.
  • Eventual: Social media feeds, analytics, DNS updates.
  • Causal: Chat messages, document collaboration.
  • Read-your-writes: User always sees their own updates immediately.
Core
๐Ÿงฎ

Back-of-Envelope Estimation

Quick calculations to estimate system capacity and requirements.

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚         POWER OF TWO CHEAT SHEET        โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
│  2^10 = 1 Thousand  (1 KB)              │
โ”‚  2^20 = 1 Million   (1 MB)              โ”‚
โ”‚  2^30 = 1 Billion   (1 GB)              โ”‚
โ”‚  2^40 = 1 Trillion  (1 TB)              โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚         TIME CONVERSIONS                โ”‚
โ”‚  1 day  = 86,400 sec โ‰ˆ 100K sec        โ”‚
โ”‚  1 month = 2.6M sec  โ‰ˆ 2.5M sec        โ”‚
โ”‚  1 year  = 31.5M sec โ‰ˆ 30M sec         โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚         COMMON ESTIMATES                โ”‚
โ”‚  Daily active users โ†’ QPS:              โ”‚
โ”‚  QPS = DAU ร— queries/day / 86400       โ”‚
โ”‚  Peak QPS = QPS ร— 2-3                  โ”‚
โ”‚                                          โ”‚
โ”‚  Storage:                               โ”‚
โ”‚  = users ร— data_per_user ร— retention   โ”‚
โ”‚                                          โ”‚
โ”‚  Bandwidth:                             โ”‚
โ”‚  = QPS ร— avg_request_size              โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

Estimation Framework

  • Step 1: Estimate DAU from total users (typically 10-30% DAU ratio).
  • Step 2: QPS = DAU ร— avg actions / 86,400 seconds.
  • Step 3: Storage = QPS ร— data per request ร— retention period.
  • Step 4: Peak QPS = Average QPS ร— 2-3x for burst traffic.
Interview
๐Ÿ›ก๏ธ

Availability & Reliability

Nines of availability and how to achieve high uptime.

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚  Level   โ”‚  Uptime %  โ”‚  Downtime/yr  โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚ 2 nines  โ”‚  99%       โ”‚  3.65 days    โ”‚
โ”‚ 3 nines  โ”‚  99.9%     โ”‚  8.77 hours   โ”‚
โ”‚ 4 nines  โ”‚  99.99%    โ”‚  52.6 min     โ”‚
โ”‚ 5 nines  โ”‚  99.999%   โ”‚  5.26 min     โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

ACHIEVING HIGH AVAILABILITY:
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚  Redundancy    โ†’ No single point of     โ”‚
โ”‚                  failure (SPOF)          โ”‚
โ”‚  Replication   โ†’ Data across regions    โ”‚
โ”‚  Failover      โ†’ Auto switch on failure โ”‚
โ”‚  Health Checks โ†’ Detect failures fast   โ”‚
โ”‚  Graceful      โ†’ Degrade, don't crash   โ”‚
โ”‚  Degradation                             โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

Key Strategies

  • Active-Passive: Standby takes over on primary failure (simple, slight downtime).
  • Active-Active: All nodes serve traffic (no downtime, complex sync).
  • SLA: Service Level Agreement โ€” contractual uptime guarantee.
  • MTTR: Mean Time To Recovery โ€” how fast you can fix failures.
Core
๐Ÿงฑ Components

Building Blocks of System Design

The essential components used to build any large-scale distributed system.

โš–๏ธ

Load Balancer

Distributes incoming traffic across multiple servers for reliability and performance.

         โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
         โ”‚  Clients โ”‚
         โ””โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”˜
              โ”‚
       โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ”€โ”€โ”€โ”
       โ”‚    LOAD      โ”‚
       โ”‚  BALANCER    โ”‚
       โ”‚  (L4 / L7)  โ”‚
       โ””โ”€โ”€โ”ฌโ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”ฌโ”€โ”€โ”˜
          โ”‚   โ”‚   โ”‚
     โ”Œโ”€โ”€โ”€โ”€โ–ผโ” โ”Œโ–ผโ”€โ”€โ” โ”Œโ–ผโ”€โ”€โ”€โ”€โ”
     โ”‚ Srv โ”‚ โ”‚Srvโ”‚ โ”‚ Srv โ”‚
     โ”‚  1  โ”‚ โ”‚ 2 โ”‚ โ”‚  3  โ”‚
     โ””โ”€โ”€โ”€โ”€โ”€โ”˜ โ””โ”€โ”€โ”€โ”˜ โ””โ”€โ”€โ”€โ”€โ”€โ”˜

ALGORITHMS:
  Round Robin โ”€โ”€โ”€ Equal distribution
  Weighted RR โ”€โ”€โ”€ Based on server capacity
  Least Conn โ”€โ”€โ”€ Route to least busy
  IP Hash โ”€โ”€โ”€โ”€โ”€โ”€ Session affinity
  Consistent โ”€โ”€โ”€ Minimize remapping
    Hash           on server changes

Key Concepts

  • L4 (Transport): Routes based on IP/port. Fast, no content inspection. (HAProxy, NLB)
  • L7 (Application): Routes based on URL, headers, cookies. Smart but slower. (Nginx, ALB)
  • Health Checks: Periodically ping servers, remove unhealthy ones from pool.
  • Global LB: DNS-based, routes to nearest datacenter (Cloudflare, Route53).
Infrastructure
๐Ÿ’พ

Caching

Store frequently accessed data in fast storage to reduce latency and database load.

CACHE-ASIDE (Lazy Loading):
  App โ”€โ”€readโ”€โ”€โ†’ Cache โ”€โ”€missโ”€โ”€โ†’ DB
                  โ”‚               โ”‚
                  โ”‚    โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
                  โ”‚    โ”‚ load into cache
                  โ—€โ”€โ”€โ”€โ”€โ”˜

WRITE-THROUGH:
  App โ”€โ”€writeโ”€โ”€โ†’ Cache โ”€โ”€writeโ”€โ”€โ†’ DB
  (Consistent but higher write latency)

WRITE-BEHIND (Write-Back):
  App โ”€โ”€writeโ”€โ”€โ†’ Cache โ”€โ”€asyncโ”€โ”€โ†’ DB
  (Fast writes but risk of data loss)

CACHE HIERARCHY:
  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” < 1ms  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
  โ”‚ L1: App โ”‚โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ†’โ”‚Browserโ”‚
  โ”‚  Memory โ”‚         โ”‚ Cache โ”‚
  โ””โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”˜         โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
       โ”‚ 1-5ms
  โ”Œโ”€โ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ”€โ”         โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
  โ”‚L2: Redisโ”‚โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ†’โ”‚  CDN  โ”‚
  โ”‚Memcachedโ”‚         โ”‚ Edge  โ”‚
  โ””โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”˜         โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
       โ”‚ 5-50ms
  โ”Œโ”€โ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ”€โ”
  โ”‚L3: DB   โ”‚
  โ”‚  Query  โ”‚
  โ”‚  Cache  โ”‚
  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

Key Concepts

  • TTL: Time-to-live โ€” auto-expire stale data. Balance freshness vs hit rate.
  • Eviction: LRU (Least Recently Used), LFU (Least Frequently Used), FIFO.
  • Cache Stampede: Many requests hit DB when cache expires. Fix: lock or stagger TTL.
  • Invalidation: Hardest problem! Event-driven invalidation > TTL-based.
Performance
๐ŸŒ

Content Delivery Network (CDN)

Geographically distributed servers that cache content close to end users.

Without CDN:
  User (Tokyo) โ”€โ”€โ”€โ”€โ”€โ”€โ”€500msโ”€โ”€โ”€โ”€โ”€โ”€โ†’ Origin (US)

With CDN:
  User (Tokyo) โ”€โ”€20msโ”€โ”€โ†’ CDN Edge (Tokyo)
                              โ”‚ cache miss
                              โ–ผ
                         Origin (US)
                              โ”‚
                    CDN caches response
                              โ”‚
  Next User โ”€โ”€โ”€โ”€20msโ”€โ”€โ”€โ†’ CDN Edge โœ“ (cached!)

PUSH CDN vs PULL CDN:
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚   PUSH   โ”‚ You upload to CDN upfront    โ”‚
│          │ For static, known content    │
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚   PULL   โ”‚ CDN fetches on first request โ”‚
โ”‚          โ”‚ Best for dynamic content     โ”‚
โ”‚          โ”‚ Risk: slow first request     โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

Key Concepts

  • Edge Locations: 200+ PoPs worldwide (Cloudflare, CloudFront, Akamai).
  • Static Content: Images, CSS, JS, videos โ€” perfect for CDN.
  • Dynamic Content: API responses with short TTL or edge computing.
  • Cache Invalidation: Purge by URL, tag, or wildcard when content updates.
Infrastructure
๐Ÿ—„๏ธ

Database Selection

Choosing the right database for your use case โ€” SQL vs NoSQL and beyond.

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚          DATABASE DECISION TREE           โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚                                           โ”‚
โ”‚  Need ACID transactions?                 โ”‚
โ”‚    YES โ†’ SQL (PostgreSQL, MySQL)         โ”‚
โ”‚    NO  โ†“                                 โ”‚
โ”‚                                           โ”‚
โ”‚  Need flexible schema?                   โ”‚
โ”‚    YES โ†’ Document DB (MongoDB)           โ”‚
โ”‚    NO  โ†“                                 โ”‚
โ”‚                                           โ”‚
โ”‚  Key-value lookups at massive scale?     โ”‚
โ”‚    YES โ†’ DynamoDB, Redis, Cassandra      โ”‚
โ”‚    NO  โ†“                                 โ”‚
โ”‚                                           โ”‚
โ”‚  Complex relationships / graph queries?  โ”‚
โ”‚    YES โ†’ Neo4j, Neptune                  โ”‚
โ”‚    NO  โ†“                                 โ”‚
โ”‚                                           โ”‚
โ”‚  Time-series / IoT data?                 โ”‚
โ”‚    YES โ†’ InfluxDB, TimescaleDB           โ”‚
โ”‚    NO  โ†“                                 โ”‚
โ”‚                                           โ”‚
โ”‚  Full-text search?                       โ”‚
โ”‚    YES โ†’ Elasticsearch, OpenSearch       โ”‚
โ”‚                                           โ”‚
โ”‚  Vector similarity (AI/ML)?              โ”‚
โ”‚    YES โ†’ Pinecone, pgvector, Milvus     โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

Key Concepts

  • SQL: Strong consistency, ACID, complex queries. Scale with read replicas + sharding.
  • NoSQL: Flexible schema, horizontal scaling, eventual consistency. Types: document, key-value, wide-column, graph.
  • NewSQL: Best of both โ€” CockroachDB, Spanner, TiDB. Distributed SQL.
  • Polyglot Persistence: Use different DBs for different parts of the system.
Data
๐Ÿ“Š

Database Sharding & Replication

Scaling databases horizontally through partitioning and replication strategies.

REPLICATION (Read Scaling):
  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”   sync/async   โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
  โ”‚Primary โ”‚โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ†’  โ”‚ Replica โ”‚ โ† reads
  โ”‚(writes)โ”‚โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ†’  โ”‚ Replica โ”‚ โ† reads
  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜                 โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

SHARDING (Write Scaling):
  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
  โ”‚        Shard Router            โ”‚
  โ””โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”˜
      โ”‚        โ”‚        โ”‚
  โ”Œโ”€โ”€โ”€โ–ผโ”€โ”€โ” โ”Œโ”€โ”€โ–ผโ”€โ”€โ”€โ” โ”Œโ”€โ”€โ–ผโ”€โ”€โ”€โ”
  โ”‚Shard 1โ”‚ โ”‚Shard 2โ”‚ โ”‚Shard 3โ”‚
  โ”‚ A-H   โ”‚ โ”‚ I-P   โ”‚ โ”‚ Q-Z   โ”‚
  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

SHARDING STRATEGIES:
  Range-based  โ†’ user_id 1-1M, 1M-2M
  Hash-based   โ†’ hash(user_id) % N
  Geo-based    โ†’ by region/country
  Directory    โ†’ lookup table maps keyโ†’shard

Key Concepts

  • Hotspot: One shard gets disproportionate traffic. Fix: better partition key or consistent hashing.
  • Cross-shard queries: Expensive! Design schema to avoid them.
  • Rebalancing: Adding/removing shards. Consistent hashing minimizes data movement.
  • Vitess: MySQL sharding middleware (used by YouTube, Slack).
Data
๐Ÿ“จ

Message Queues & Event Streaming

Decouple services with asynchronous communication for reliability and scalability.

MESSAGE QUEUE (Point-to-Point):
  Producer ──→ ┌────────────┐ ──→ Consumer
               │   Queue    │
  Producer ──→ │ m3  m2  m1 │ ──→ Consumer
               └────────────┘

EVENT STREAMING (Pub/Sub):
  Publisher ──→ ┌──────────────┐
                │    Topic     │ ──→ Consumer Group A
                │  e3  e2  e1  │ ──→ Consumer Group B
                └──────────────┘ ──→ Consumer Group C

WHEN TO USE WHAT:
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ RabbitMQ    โ”‚ Task queues, RPC, routingโ”‚
โ”‚ Kafka       โ”‚ Event streaming, logs,   โ”‚
โ”‚             โ”‚ CDC, high throughput     โ”‚
โ”‚ SQS         โ”‚ Simple cloud queues      โ”‚
โ”‚ Redis Pub/  โ”‚ Real-time notifications  โ”‚
โ”‚   Sub       โ”‚ (no persistence)         โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

Key Concepts

  • At-least-once: Messages may be delivered multiple times. Consumers must be idempotent.
  • Exactly-once: Hard to achieve. Kafka supports with transactions.
  • Dead Letter Queue: Failed messages go to DLQ for investigation.
  • Backpressure: Slow consumer? Queue grows. Solution: scale consumers or rate limit producers.
Infrastructure
๐Ÿ”Œ

API Design

REST, GraphQL, gRPC โ€” choosing the right API paradigm for your system.

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚          โ”‚   REST   โ”‚ GraphQL  โ”‚  gRPC    โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚ Protocol โ”‚  HTTP    โ”‚  HTTP    โ”‚ HTTP/2   โ”‚
โ”‚ Format   โ”‚  JSON    โ”‚  JSON    โ”‚ Protobuf โ”‚
โ”‚ Contract โ”‚  Loose   โ”‚  Schema  โ”‚ Strict   โ”‚
โ”‚ Caching  โ”‚  Easy    โ”‚  Hard    โ”‚  Hard    โ”‚
โ”‚ Learning โ”‚  Easy    โ”‚  Medium  โ”‚  Hard    โ”‚
โ”‚ Speed    โ”‚  Good    โ”‚  Good    โ”‚  Fastest โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚ Best For โ”‚ Public   โ”‚ Flexible โ”‚ Internal โ”‚
โ”‚          โ”‚ APIs,    โ”‚ frontend โ”‚ micro-   โ”‚
โ”‚          โ”‚ CRUD     โ”‚ queries  โ”‚ services โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

REST Example:
  GET  /api/users/123
  POST /api/users     { "name": "Alex" }
  PUT  /api/users/123 { "name": "Alex B" }

gRPC: typically 5-10x faster than REST/JSON for service-to-service calls (binary Protobuf, HTTP/2 multiplexing)

Key Concepts

  • Versioning: URL (/v2/users) or header (Accept: v2). Never break clients.
  • Rate Limiting: Token bucket or sliding window. Return 429 when exceeded.
  • Pagination: Cursor-based > offset-based for large datasets.
  • Idempotency: PUT/DELETE should be idempotent. Use idempotency keys for POST.
Design
๐Ÿ”—

Consistent Hashing

Distribute data across nodes with minimal redistribution when nodes change.

HASH RING:
              Node A
               โ•ฑโ•ฒ
              โ•ฑ  โ•ฒ
         โ—‹  โ•ฑ    โ•ฒ  โ—‹ โ† keys
           โ•ฑ      โ•ฒ
    Node D โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€ Node B
           โ•ฒ      โ•ฑ
         โ—‹  โ•ฒ    โ•ฑ  โ—‹
              โ•ฒ  โ•ฑ
               โ•ฒโ•ฑ
             Node C

Traditional Hashing:
  hash(key) % N โ†’ Add/remove node =
  ALL keys remapped! โŒ

Consistent Hashing:
  Keys mapped to ring position
  Only K/N keys remapped when
  nodes change โœ…

VIRTUAL NODES:
  Each physical node โ†’ 100-200 virtual nodes
  Ensures even distribution
  Node A: A-1, A-2, ... A-150 on ring

Key Concepts

  • Problem: hash(key) % N breaks when N changes โ€” all data reshuffled.
  • Solution: Hash ring where only neighbors are affected by node changes.
  • Virtual Nodes: Each server gets multiple points on ring for better balance.
  • Used by: DynamoDB, Cassandra, Memcached, CDN routing.
Algorithm
๐Ÿ”€

Proxy & API Gateway

Forward proxy, reverse proxy, and API gateway patterns for routing and security.

FORWARD PROXY (client-side):
  Client โ”€โ”€โ†’ Proxy โ”€โ”€โ†’ Internet โ”€โ”€โ†’ Server
  (VPN, anonymity, content filtering)

REVERSE PROXY (server-side):
  Client โ”€โ”€โ†’ Reverse โ”€โ”€โ†’ Server 1
              Proxy  โ”€โ”€โ†’ Server 2
                     โ”€โ”€โ†’ Server 3
  (Load balancing, SSL, caching, security)

API GATEWAY:
  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
  โ”‚         API GATEWAY             โ”‚
  โ”‚  โ”Œโ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”  โ”‚
  โ”‚  โ”‚Auth โ”‚Rate  โ”‚Route โ”‚Trans-โ”‚  โ”‚
  โ”‚  โ”‚     โ”‚Limit โ”‚ing   โ”‚form  โ”‚  โ”‚
  โ”‚  โ””โ”€โ”€โ”ฌโ”€โ”€โ”ดโ”€โ”€โ”ฌโ”€โ”€โ”€โ”ดโ”€โ”€โ”ฌโ”€โ”€โ”€โ”ดโ”€โ”€โ”ฌโ”€โ”€โ”€โ”˜  โ”‚
  โ”‚     โ”‚     โ”‚      โ”‚      โ”‚      โ”‚
  โ””โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”˜
        โ–ผ     โ–ผ      โ–ผ      โ–ผ
     User  Order  Payment  Notif
     Svc    Svc    Svc     Svc

Key Concepts

  • Nginx: Most popular reverse proxy. Also serves static files, SSL termination.
  • API Gateway: Kong, AWS API Gateway โ€” auth, rate limiting, monitoring in one place.
  • Service Mesh: Istio/Envoy โ€” proxy sidecar per service for observability.
  • BFF: Backend-for-Frontend โ€” separate gateway per client type (web, mobile).
Infrastructure
๐Ÿ—๏ธ Architecture

Design Patterns & Architectures

Battle-tested patterns for building reliable, scalable distributed systems.

๐Ÿ”ฒ

Microservices Architecture

Breaking a monolith into independently deployable services.

MONOLITH:                  MICROSERVICES:
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”     โ”Œโ”€โ”€โ”€โ”€โ”€โ” โ”Œโ”€โ”€โ”€โ”€โ”€โ” โ”Œโ”€โ”€โ”€โ”€โ”€โ”
โ”‚   All Code   โ”‚     โ”‚User โ”‚ โ”‚Orderโ”‚ โ”‚Pay  โ”‚
โ”‚   One Deploy โ”‚ โ”€โ”€โ†’ โ”‚ Svc โ”‚ โ”‚ Svc โ”‚ โ”‚ Svc โ”‚
โ”‚   One DB     โ”‚     โ””โ”€โ”€โ”ฌโ”€โ”€โ”˜ โ””โ”€โ”€โ”ฌโ”€โ”€โ”˜ โ””โ”€โ”€โ”ฌโ”€โ”€โ”˜
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜        โ”‚      โ”‚      โ”‚
                     โ”Œโ”€โ”€โ–ผโ”€โ”€โ”โ”Œโ”€โ”€โ–ผโ”€โ”€โ”โ”Œโ”€โ”€โ–ผโ”€โ”€โ”
                     โ”‚Usersโ”‚โ”‚Orderโ”‚โ”‚Pay  โ”‚
                     โ”‚ DB  โ”‚โ”‚ DB  โ”‚โ”‚ DB  โ”‚
                     โ””โ”€โ”€โ”€โ”€โ”€โ”˜โ””โ”€โ”€โ”€โ”€โ”€โ”˜โ””โ”€โ”€โ”€โ”€โ”€โ”˜

COMMUNICATION:
  Synchronous:  REST / gRPC (request-response)
  Asynchronous: Kafka / RabbitMQ (events)

SERVICE DISCOVERY:
  Client โ”€โ”€โ†’ Service Registry โ”€โ”€โ†’ Instance
  (Consul, Eureka, K8s DNS)

Key Principles

  • Single Responsibility: Each service owns one business capability.
  • Database per Service: No shared databases โ€” communicate via APIs/events.
  • Independent Deploy: Change one service without affecting others.
  • Trade-off: Operational complexity (networking, monitoring, debugging) increases significantly.
Architecture
โšก

Event-Driven Architecture

Services communicate through events for loose coupling and real-time processing.

EVENT SOURCING:
  Commands โ”€โ”€โ†’ Event Store โ”€โ”€โ†’ Projections
                  โ”‚
  OrderCreated    โ”‚โ”€โ”€โ†’ Read Model (SQL)
  ItemAdded       โ”‚โ”€โ”€โ†’ Analytics
  OrderPaid       โ”‚โ”€โ”€โ†’ Notifications
  OrderShipped    โ”‚โ”€โ”€โ†’ Search Index

CQRS (Command Query Responsibility Segregation):
  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”         โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
  โ”‚ COMMAND โ”‚โ”€โ”€writeโ”€โ”€โ†’โ”‚ Write Model โ”‚
  โ”‚  API    โ”‚         โ”‚ (Event Store)โ”‚
  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜         โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”˜
                             โ”‚ events
  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”         โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ”€โ”€โ”€โ”
  โ”‚  QUERY  โ”‚โ—€โ”€โ”€readโ”€โ”€โ”‚ Read Model  โ”‚
  โ”‚  API    โ”‚         โ”‚(Materialized)โ”‚
  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜         โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

SAGA PATTERN (Distributed Transactions):
  Order โ†’ Payment โ†’ Inventory โ†’ Shipping
    โ”‚        โ”‚          โ”‚
    โ—€โ”€โ”€ Compensate โ—€โ”€โ”€ Rollback (on failure)

Key Concepts

  • Event Sourcing: Store events, not state. Replay to rebuild any point in time.
  • CQRS: Separate read and write models. Optimize each independently.
  • Saga: Coordinate distributed transactions with compensating actions.
  • Choreography vs Orchestration: Events react (loose) vs central coordinator (controlled).
Architecture
๐Ÿšฆ

Rate Limiting

Protect your system from abuse and ensure fair usage across clients.

TOKEN BUCKET:
  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
  โ”‚  Bucket (max: 10)    โ”‚
  โ”‚  โ—โ—โ—โ—โ—โ—โ—โ—‹โ—‹โ—‹          โ”‚ โ† refill 1/sec
  โ”‚                      โ”‚
  โ”‚  Request arrives:    โ”‚
  โ”‚    Token available?  โ”‚
  โ”‚    YES โ†’ Process โœ…  โ”‚
  โ”‚    NO  โ†’ Reject 429 โŒโ”‚
  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

SLIDING WINDOW LOG:
  |โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€ 1 minute window โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€|
  |  req  req  req  req  req  req   |
  |  โ†‘                          โ†‘   |
  timestamp              timestamp  |
  Count requests in window.
  Reject if count > limit.

DISTRIBUTED RATE LIMITING:
  Client โ”€โ”€โ†’ API Gateway โ”€โ”€โ†’ Redis
              โ”‚                โ”‚
              โ”‚  INCR key     โ”‚
              โ”‚  EXPIRE 60s   โ”‚
              โ”‚  if count > N โ”‚
              โ”‚    โ†’ 429      โ”‚

Algorithms

  • Token Bucket: Allows bursts, smooth rate. Used by AWS, Stripe.
  • Sliding Window: Precise counting, memory intensive for high traffic.
  • Fixed Window: Simple but allows 2x burst at window boundaries.
  • Leaky Bucket: Processes at fixed rate, queues excess. Good for smoothing.
Reliability
๐Ÿ”Œ

Circuit Breaker & Resilience

Prevent cascading failures across services with fault tolerance patterns.

CIRCUIT BREAKER STATES:
  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”  failures > threshold  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”
  โ”‚ CLOSED โ”‚ โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ†’ โ”‚ OPEN โ”‚
  โ”‚(normal)โ”‚                        โ”‚(fail โ”‚
  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜                        โ”‚fast) โ”‚
       โ–ฒ                            โ””โ”€โ”€โ”ฌโ”€โ”€โ”€โ”˜
       โ”‚                               โ”‚
       โ”‚    success                    โ”‚ timeout
       โ”‚                               โ”‚
  โ”Œโ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”                    โ”Œโ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ”€โ”€โ”€โ”
  โ”‚  CLOSED  โ”‚ โ—€โ”€โ”€ success โ”€โ”€โ”€โ”€โ”€ โ”‚HALF-OPEN โ”‚
  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜                    โ”‚(test one)โ”‚
                   failure โ”€โ”€โ†’    โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
                   back to OPEN

RESILIENCE PATTERNS:
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ Retry          โ”‚ Exp backoff + jitter โ”‚
โ”‚ Timeout        โ”‚ Don't wait forever   โ”‚
โ”‚ Bulkhead       โ”‚ Isolate failures     โ”‚
โ”‚ Fallback       โ”‚ Cached/default value โ”‚
โ”‚ Circuit Breakerโ”‚ Fail fast on errors  โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

Key Concepts

  • Cascading Failure: Service A โ†’ B โ†’ C. If C is slow, A and B pile up and crash.
  • Bulkhead: Isolate thread pools per dependency. One failure doesn't exhaust all resources.
  • Retry with Jitter: Exponential backoff + random jitter prevents thundering herd.
  • Libraries: Resilience4j (Java), Polly (.NET), Hystrix (deprecated).
Reliability
๐ŸŒ

Distributed Consensus

How distributed nodes agree on state โ€” Paxos, Raft, and leader election.

RAFT CONSENSUS:
  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”   vote request   โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
  โ”‚Followerโ”‚ โ—€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€  โ”‚Candidateโ”‚
  โ””โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”˜                  โ””โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”˜
       โ”‚                           โ”‚
       โ”‚   majority votes          โ”‚
       โ”‚                      โ”Œโ”€โ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ”
       โ”‚  โ—€โ”€โ”€ heartbeat โ”€โ”€โ”€โ”€โ”€ โ”‚ Leader โ”‚
       โ”‚                      โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
       โ”‚
  Leader sends log entries to followers
  Committed when majority acknowledge

LEADER ELECTION:
  1. Follower timeout โ†’ becomes Candidate
  2. Requests votes from all nodes
  3. Majority votes โ†’ becomes Leader
  4. Sends heartbeats to maintain leadership

SPLIT BRAIN:
  Network split โ†’ two leaders!
  Solution: Quorum (majority) required
  5 nodes โ†’ need 3 to agree (survives 2 failures)

Key Concepts

  • Raft: Understandable consensus. Used in etcd, CockroachDB, TiKV.
  • Quorum: Majority agreement. W + R > N for strong consistency.
  • Vector Clocks: Track causality across distributed nodes.
  • Gossip Protocol: Nodes share state via random peer communication (Cassandra).
Distributed
๐ŸŽฏ

Probabilistic Data Structures

Space-efficient structures for approximate membership, counting, and cardinality.

BLOOM FILTER:
  "Is X in the set?"
  ┌─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┐
  │0│1│0│1│0│0│1│0│1│0│0│1│  bit array
  └─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┘
    ↑   ↑       ↑       ↑
    h1  h2      h3      h1 (hash functions)

  "Maybe in set" → could be false positive
  "Definitely not" → NEVER false negative

  Use case: Check if a username is taken
  before hitting the database. 10M users in 12MB!

COUNT-MIN SKETCH:
  "Approximately how many times did X occur?"
  Used for: top-K, frequency estimation

HYPERLOGLOG:
  "How many unique items?" (cardinality)
  Count 1 BILLION unique items in 12 KB!
  Used by: Redis PFCOUNT, analytics

Key Concepts

  • Bloom Filter: Fast membership test. Used in databases (skip disk reads), CDNs, spam filters.
  • HyperLogLog: Count unique visitors with 0.81% error in 12KB. Redis built-in.
  • Count-Min Sketch: Frequency estimation in streaming data. Overestimates, never under.
  • Trade-off: Accuracy for space/time. Perfect when approximate is good enough.
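A minimal Bloom filter makes the bullets concrete. This is an illustrative sketch with an arbitrary bit-array size; real deployments size m and k from the expected item count and the target false-positive rate.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: k hash positions per item in an m-bit array.

    "Maybe present" can be a false positive; "absent" is always correct.
    """
    def __init__(self, m=1024, k=3):
        self.m, self.k, self.bits = m, k, 0

    def _positions(self, item):
        # Derive k independent positions by salting one hash function.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        return all(self.bits >> pos & 1 for pos in self._positions(item))

bf = BloomFilter()
bf.add("alice")
print(bf.might_contain("alice"))  # → True (added items are always found)
print(bf.might_contain("bob"))    # almost certainly False at this fill level
```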
Algorithm
๐Ÿข Real World

Real-World System Designs

End-to-end designs of popular systems you'll encounter in interviews.

🔗

Design URL Shortener

Like bit.ly — convert long URLs to short ones with analytics.

┌──────┐    POST /shorten     ┌───────────┐
│Client│ ──────────────────→  │ API Server│
└──────┘  {"url":"long..."}   └─────┬─────┘
                                    │
    1. Generate short ID (Base62)   │
    2. Store mapping                │
                              ┌─────▼──────┐
                              │ Database   │
                              │ short→long │
                              │ + metadata │
                              └─────┬──────┘
                                    │
┌──────┐    GET /abc123       ┌─────▼──────┐
│Client│ ──────────────────→  │ Cache      │
└──────┘    301 Redirect      │ (Redis)    │
                              └────────────┘

ID GENERATION:
  Base62: [a-zA-Z0-9] → 62^7 = 3.5 trillion
  Snowflake ID → timestamp + machine + seq
  MD5/SHA → hash + take first 7 chars
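The Base62 step can be sketched directly (the alphabet ordering here is one arbitrary choice; any fixed permutation of [a-zA-Z0-9] works):

```python
import string

# 62 symbols: 0-9, a-z, A-Z (ordering is an arbitrary but fixed choice).
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase

def base62_encode(n):
    """Encode a numeric ID (e.g. auto-increment or Snowflake) as Base62."""
    if n == 0:
        return ALPHABET[0]
    out = []
    while n:
        n, rem = divmod(n, 62)
        out.append(ALPHABET[rem])
    return "".join(reversed(out))

print(base62_encode(125))             # → 21
print(len(base62_encode(62**7 - 1)))  # → 7 (7 chars cover ~3.5 trillion IDs)
```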

Key Decisions

  • 301 vs 302: 301 (permanent) = cached by browser. 302 (temp) = always hits server (better for analytics).
  • Read-heavy: 100:1 read/write ratio. Cache aggressively in Redis.
  • Custom aliases: Check uniqueness, maintain a reserved-word blacklist.
  • Analytics: Log click events to Kafka → aggregate in analytics pipeline.
Interview Classic
💬

Design Chat System

Like WhatsApp/Slack — real-time messaging with groups, media, and presence.

┌──────┐  WebSocket  ┌──────────────────┐
│User A│ ◀─────────→ │  Chat Server     │
└──────┘             │  (WS Gateway)    │
                     └────────┬─────────┘
┌──────┐  WebSocket          │
│User B│ ◀─────────→ ┌───────▼───────┐
└──────┘             │ Message Queue │
                     │   (Kafka)     │
                     └───────┬───────┘
                             │
              ┌──────────────┼──────────┐
              ▼              ▼          ▼
        ┌──────────┐  ┌──────────┐ ┌─────────┐
        │ Message  │  │ Presence │ │  Push   │
        │   DB     │  │  Service │ │Notific. │
        │(Cassandra│  │ (Redis)  │ │ (FCM/   │
        │  / HBase)│  └──────────┘ │  APNS)  │
        └──────────┘               └─────────┘

MESSAGE FLOW:
  1. User A sends via WebSocket
  2. Server publishes to Kafka topic
  3. Recipient's chat server consumes
  4. If online → deliver via WebSocket
  5. If offline → push notification + store

Key Decisions

  • WebSocket: Full-duplex, persistent connection for real-time. Fallback: long polling.
  • Message ordering: Sequence IDs per conversation. Cassandra: partition by chat_id, cluster by timestamp.
  • Group chat: Fan-out on write (small groups) vs fan-out on read (large channels).
  • E2E Encryption: Signal Protocol — keys on devices, server can't read messages.
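The message-ordering bullet can be sketched with per-conversation sequence IDs. This is a toy in-memory stand-in; a real deployment would use an atomic counter such as Redis INCR keyed by chat ID.

```python
from collections import defaultdict

class SequenceAllocator:
    """Per-conversation monotonically increasing sequence numbers.

    In production this would be an atomic counter (e.g. Redis INCR on a
    key like "seq:<chat_id>"); a dict stands in for it here.
    """
    def __init__(self):
        self._next = defaultdict(int)

    def allocate(self, chat_id):
        self._next[chat_id] += 1
        return self._next[chat_id]

seq = SequenceAllocator()
msgs = [(seq.allocate("chat42"), text) for text in ["hi", "hello", "bye"]]
print(msgs)  # → [(1, 'hi'), (2, 'hello'), (3, 'bye')]
# Receivers sort by sequence number and drop duplicates they have seen.
```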
Interview Classic
📱

Design News Feed

Like Twitter/Instagram — generate and serve personalized content feeds.

FAN-OUT ON WRITE (Push Model):
  User posts ──→ Write to all followers' feeds
  ┌──────┐
  │Post  │──→ Feed cache (follower 1)
  │      │──→ Feed cache (follower 2)
  │      │──→ Feed cache (follower N)
  └──────┘
  ✅ Fast reads  ❌ Celebrity problem (millions)

FAN-OUT ON READ (Pull Model):
  User opens app ──→ Query all followed users
  ┌──────┐
  │Reader│──→ Get posts from user A, B, C...
  │      │──→ Merge + Sort + Return top N
  └──────┘
  ✅ No celebrity problem  ❌ Slow reads

HYBRID (Twitter's approach):
  Regular users → fan-out on write
  Celebrities (>10K followers) → fan-out on read
  Merge both at read time
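The hybrid read path is a merge of two sorted streams (a toy sketch; the post tuples and limit are invented for illustration):

```python
import heapq

def read_feed(precomputed, celebrity_posts, limit=3):
    """Hybrid fan-out: merge the user's precomputed feed (fan-out on
    write) with celebrity posts pulled at read time (fan-out on read).

    Posts are (timestamp, post_id) tuples, returned newest first.
    """
    merged = heapq.merge(
        sorted(precomputed, reverse=True),
        sorted(celebrity_posts, reverse=True),
        reverse=True,  # both inputs are sorted descending
    )
    return list(merged)[:limit]

feed = read_feed(
    precomputed=[(100, "friend_post"), (90, "friend_post_2")],
    celebrity_posts=[(95, "celebrity_post")],
)
print(feed)  # → [(100, 'friend_post'), (95, 'celebrity_post'), (90, 'friend_post_2')]
```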

Key Decisions

  • Feed Storage: Pre-computed in Redis (user_id → list of post_ids). Limit to 800 items.
  • Ranking: ML model scores posts by relevance, recency, engagement, relationship.
  • Cache: Feed cache + content cache + social graph cache.
  • Pagination: Cursor-based (last_seen_id) to handle real-time insertions.
Interview Classic
🔔

Design Notification System

Multi-channel notifications — push, SMS, email at scale with preferences.

┌──────────┐     ┌───────────────────────┐
│ Services │────→│  Notification Service │
│(triggers)│     └───────────┬───────────┘
└──────────┘                 │
                    ┌────────▼────────┐
                    │  Message Queue  │
                    │    (Kafka)      │
                    └──┬─────┬─────┬──┘
                       │     │     │
                  ┌────▼─┐ ┌─▼──┐ ┌▼────┐
                  │Push  │ │SMS │ │Email│
                  │Worker│ │Wrkr│ │Wrkr │
                  └──┬───┘ └─┬──┘ └──┬──┘
                     │       │      │
                  ┌──▼──┐  ┌─▼──┐ ┌─▼─────┐
                  │APNS/│  │Twil│ │SES/   │
                  │FCM  │  │io  │ │Sendgrd│
                  └─────┘  └────┘ └───────┘

FEATURES:
  ✓ User preferences (channel, frequency)
  ✓ Rate limiting (max 3 push/hour)
  ✓ Template engine (personalization)
  ✓ Analytics (delivered, opened, clicked)
  ✓ Retry with exponential backoff

Key Decisions

  • Priority Queue: Urgent (OTP, security) → high priority. Marketing → low priority.
  • Deduplication: Idempotency key prevents duplicate notifications.
  • User Preferences: Channel preferences, quiet hours, frequency caps.
  • Delivery Tracking: Sent → Delivered → Opened → Clicked funnel.
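The idempotency-key bullet can be sketched with an in-memory seen-set (in production this would typically be a Redis key with a TTL so the set doesn't grow forever):

```python
class NotificationDeduper:
    """Drop notifications whose idempotency key was already processed.

    A plain set stands in for what would be shared storage (e.g. Redis)
    in a real multi-worker deployment.
    """
    def __init__(self):
        self._seen = set()

    def should_send(self, idempotency_key):
        if idempotency_key in self._seen:
            return False  # duplicate trigger: suppress
        self._seen.add(idempotency_key)
        return True

dedupe = NotificationDeduper()
print(dedupe.should_send("order-123:shipped"))  # → True  (first delivery)
print(dedupe.should_send("order-123:shipped"))  # → False (duplicate suppressed)
```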
Interview Classic
🔍

Design Search System

Full-text search with autocomplete, ranking, and real-time indexing.

INDEXING PIPELINE:
  Data Source ──→ Crawler/CDC ──→ Processor
                                     │
                              ┌──────▼──────┐
                              │  Tokenize   │
                              │  Normalize  │
                              │  Stem/Lemma │
                              └──────┬──────┘
                                     │
                              ┌──────▼──────┐
                              │ Inverted    │
                              │ Index       │
                              │ (Elastic-   │
                              │  search)    │
                              └─────────────┘

INVERTED INDEX:
  "distributed" → [doc1, doc5, doc9]
  "system"      → [doc1, doc3, doc5]
  "design"      → [doc1, doc2, doc7]

  Search "distributed system":
  → Intersection: [doc1, doc5] ← results!
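The intersection query above is a few lines of code (toy documents chosen to reproduce the example):

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

docs = {
    "doc1": "distributed system design",
    "doc3": "system architecture",
    "doc5": "distributed system",
}
index = build_inverted_index(docs)

# AND query = intersection of the two posting lists.
result = index["distributed"] & index["system"]
print(sorted(result))  # → ['doc1', 'doc5']
```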

SEARCH FLOW:
  Query → Parse → Search Index → Rank → Return
           │                       │
        Spell     TF-IDF, BM25, PageRank,
        correct   personalization, freshness

Key Concepts

  • Inverted Index: Maps terms to documents. Core of all search engines.
  • BM25: Industry-standard ranking algorithm. Considers term frequency and document length.
  • Autocomplete: Trie data structure + top-K queries by frequency.
  • Typeahead: Prefix search on pre-computed suggestions, cached aggressively.
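The trie-based autocomplete bullet, as a toy sketch (frequencies and query strings are invented; production systems precompute and cache the top-k per prefix rather than walking the trie on every request):

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.freq = 0  # > 0 marks the end of a stored query string

class Autocomplete:
    """Trie-based typeahead: walk to the prefix node, collect the
    completions below it, and return the top-k by query frequency."""
    def __init__(self):
        self.root = TrieNode()

    def add(self, word, freq=1):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.freq += freq

    def top_k(self, prefix, k=2):
        node = self.root
        for ch in prefix:          # walk down to the prefix node
            if ch not in node.children:
                return []
            node = node.children[ch]
        found = []
        def walk(n, word):         # collect every completion in the subtree
            if n.freq:
                found.append((n.freq, word))
            for ch, child in n.children.items():
                walk(child, word + ch)
        walk(node, prefix)
        return [w for _, w in sorted(found, reverse=True)[:k]]

ac = Autocomplete()
for term, f in [("system design", 50), ("system", 30), ("sysadmin", 5)]:
    ac.add(term, f)
print(ac.top_k("sys"))  # → ['system design', 'system']
```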
Interview Classic
🤖

Design AI Agent Platform

Multi-tenant AI agent orchestration — reasoning, planning, and tool execution.

┌─────────────┐
│ Slack/Teams │──→ API Gateway (Auth, Rate Limit)
└──────┬──────┘          │
       │          Session Manager (Redis)
       │                 │
       │       ┌─────────▼───────────┐
       │       │  REASONING ENGINE   │
       │       │  ┌───────────────┐  │
       │       │  │ Planning (LLM)│  │→ Decompose
       │       │  │ Execution Eng │  │→ Run tools
       │       │  │ Observation   │  │→ Evaluate
       │       │  └───────────────┘  │
       │       └────┬──────────┬─────┘
       │            │          │
       │   ┌────────▼────┐  ┌──▼──────────┐
       │   │Tool Registry│  │State Manager│
       │   │(per tenant) │  │(Redis + PG) │
       │   └────┬────────┘  └─────────────┘
       │        │
       │  ┌─────┼──────┬──────────┐
       │  ▼     ▼      ▼          ▼
       │ SNOW  Jira  Salesforce  Okta
       └────────────────────────────────┘

Key Decisions

  • ReAct Pattern: Reason → Act → Observe → loop until done.
  • Plugin System: Each tenant configures their own connectors with isolated credentials.
  • Model Routing: Fast model for simple tasks, powerful model for complex reasoning.
  • Guardrails: Permission enforcement, output validation, cost controls per tenant.
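The ReAct loop can be sketched with stubbed planning and tool calls. Everything below, including the `jira_lookup` tool name, is a placeholder standing in for the real LLM and connector calls.

```python
def react_loop(goal, plan_step, run_tool, is_done, max_steps=5):
    """Reason → Act → Observe until the goal is met or steps run out.

    `plan_step` is the LLM call that picks the next tool and arguments,
    `run_tool` executes it, and `is_done` evaluates the latest observation.
    """
    history = []
    for _ in range(max_steps):
        thought, tool, args = plan_step(goal, history)   # Reason
        observation = run_tool(tool, args)               # Act
        history.append((thought, tool, observation))     # Observe
        if is_done(goal, history):
            break
    return history

# Stub "LLM" and tool registry for illustration only.
def plan_step(goal, history):
    return ("look up the ticket", "jira_lookup", {"id": "OPS-1"})

def run_tool(tool, args):
    return {"jira_lookup": "ticket OPS-1 is resolved"}[tool]

def is_done(goal, history):
    return "resolved" in history[-1][-1]

steps = react_loop("check ticket status", plan_step, run_tool, is_done)
print(len(steps))  # → 1 (the stubbed goal is met after one tool call)
```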
Hot Topic 🔥
📋 Quick Reference

System Design Cheat Sheet

Everything you need at a glance — the ultimate quick reference.

🎯 System Design Interview Framework (RESHADED)

1

Requirements

Functional & non-functional. Ask clarifying questions. Scope the problem. Don't assume.

2

Estimation

DAU, QPS, storage, bandwidth. Back-of-envelope math. Know your powers of 2.
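A worked back-of-envelope example (the traffic numbers are assumptions, picked only to show the arithmetic):

```python
# Assumed workload: 10M daily active users, 20 requests per user per day.
dau = 10_000_000
requests_per_user = 20
seconds_per_day = 86_400

avg_qps = dau * requests_per_user / seconds_per_day
peak_qps = avg_qps * 2  # common rule of thumb: peak is roughly 2x average

print(round(avg_qps))   # → 2315
print(round(peak_qps))  # → 4630
```

At ~2.3K average QPS a handful of app servers suffices; the point of the exercise is the order of magnitude, not the exact figure.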

3

Storage Schema

Data model, SQL vs NoSQL choice, key entities and relationships.

4

High-Level Design

Draw the architecture. Main components, data flow, APIs. Keep it simple first.

5

API Design

Define key API endpoints. REST/gRPC. Request/response schemas.

6

Deep Dive

Pick 2-3 interesting components. Discuss trade-offs, alternatives, edge cases.

7

Evaluate

Bottlenecks, single points of failure, scaling strategies, monitoring.

8

Discuss Trade-offs

Every decision has trade-offs. Show you understand the "why" behind choices.

🗄️ Database Quick Pick

  Need                   Choose               Example
  ACID Transactions      PostgreSQL           Payments, Orders
  High Write Throughput  Cassandra            IoT, Time-series
  Flexible Schema        MongoDB              CMS, Catalogs
  Cache / Sessions       Redis                Sessions, Leaderboards
  Full-Text Search       Elasticsearch        Product Search
  Graph Queries          Neo4j                Social Networks
  Global Scale SQL       Spanner/CockroachDB  Multi-region apps
  Vector Search (AI)     Pinecone/pgvector    RAG, Similarity

⚡ Scaling Strategies

  Problem                  Solution
  Too many reads           Cache (Redis) + CDN + Read replicas
  Too many writes          Sharding + Message queue + Batch writes
  Single point of failure  Redundancy + Failover + Multi-AZ
  Slow API responses       Async processing + Pagination + Compression
  Data growing too fast    Data partitioning + Archival + TTL
  Cross-region latency     Multi-region deploy + CDN + Edge computing
  Thundering herd          Rate limiting + Circuit breaker + Backpressure

🔢 Numbers Every Engineer Should Know

~1ms Redis GET
~5ms SQL Simple Query
~0.1ms SSD Random Read
~150ms Cross-continent RTT
~1-3s LLM Response
1M+ Redis Ops/sec
10K Single Server RPS
86,400 Seconds in a Day