Real-Time Notification & Approval System

"Design a system where AI agents request approvals from humans. Determine the approval chain, notify in real-time, track state, escalate overdue requests, and integrate with Slack, Email, and Microsoft Teams."

Table of Contents

  1. Clarifying Questions & Scope
  2. Back-of-Envelope Estimation
  3. High-Level Architecture
  4. Deep Dive 1: Rules Engine & Chain Builder
  5. Deep Dive 2: Async Notification Delivery
  6. Deep Dive 3: Escalation Scheduler
  7. Scaling & ML
  8. Cheat Sheet

1 Clarifying Questions & Scope

Dimension           Clarification                 Assumption
──────────────────  ────────────────────────────  ─────────────────────────────────────────────────────────
Approval Types      What kinds of approvals?      Discount, access, expense, change management, escalations
Chain Complexity    Single or multi-level?        Multi-level: sequential, parallel, or conditional chains
Channels            Which notification channels?  Slack, Email, Microsoft Teams (with action buttons)
SLA                 Escalation rules?             Remind at 75% SLA, escalate at 100%, auto-action at 150%
Delivery Guarantee  Can we lose notifications?    At-least-once delivery, idempotent processing

2 Back-of-Envelope Estimation

Scale Numbers

  • 100K approval requests/day across all tenants
  • 250K notifications/day (each approval triggers ~2.5 notifications avg)
  • <1 second notification delivery latency
  • 500K state changes/day (submitted, pending, reminded, approved, etc.)
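  • Peak: ~12,500 approvals/hour (~3.5/second), assuming most of the daily volume lands in an 8-hour business window (the 24-hour average alone is ~4,200/hour)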
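
As a quick sanity check, the daily figures above can be converted to average per-second rates. This is a minimal sketch that assumes load is spread evenly over 24 hours, so it understates business-hours peaks; the variable names are illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Daily volumes taken from the scale numbers above.
daily = {
    "approvals": 100_000,
    "notifications": 250_000,
    "state_changes": 500_000,
}


def avg_per_second(per_day: int) -> float:
    """Average rate, assuming load is spread evenly over 24 hours."""
    return per_day / SECONDS_PER_DAY


rates = {name: round(avg_per_second(count), 2) for name, count in daily.items()}
notifications_per_approval = daily["notifications"] / daily["approvals"]

for name, rate in rates.items():
    print(f"{name}: ~{rate}/s average")
print(f"notifications per approval: {notifications_per_approval}")
```

Even with a generous peak multiplier these rates are small, which suggests the design effort belongs in delivery guarantees and SLA timing rather than raw throughput.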

3 High-Level Architecture

  NOTIFICATION & APPROVAL SYSTEM
  ═══════════════════════════════════════════════════════════════════

  ┌──────────┐    ┌──────────────┐    ┌──────────────┐    ┌─────────────┐
  │ Approval │───>│ Rules Engine │───>│ Chain Builder │───>│    Kafka    │
  │ Request  │    │ (who needs   │    │ (sequential/  │    │   Topics   │
  │ from AI  │    │  to approve) │    │  parallel)    │    │            │
  └──────────┘    └──────────────┘    └──────────────┘    └──────┬──────┘
                                                                │
                         ┌──────────────────────────────────────┘
                         │
            ┌────────────┼────────────┬────────────────┐
            v            v            v                v
     ┌────────────┐ ┌──────────┐ ┌──────────┐  ┌────────────┐
     │ Slack      │ │ Email    │ │ Teams    │  │ Escalation │
     │ Worker     │ │ Worker   │ │ Worker   │  │ Scheduler  │
     │ (Bot API)  │ │ (SES)    │ │(Graph API│  │ (5-min     │
     │            │ │          │ │ + Webhook│  │  sweeps)   │
     └─────┬──────┘ └────┬─────┘ └────┬─────┘  └─────┬──────┘
           │              │            │              │
           └──────────────┴────────────┴──────────────┘
                                │
                    ┌───────────v───────────┐
                    │   State Tracker       │
                    │   Redis (hot state)   │
                    │   + PostgreSQL (log)  │
                    └───────────────────────┘

4 Deep Dive 1: Rules Engine & Chain Builder

Configurable Approval Rules

The Rules Engine determines WHO needs to approve and in what ORDER. Rules are configurable per tenant and per approval type.

Example: Approval Tiers (Discount and DB Access)

  DISCOUNT APPROVAL RULES (SaaS Sales)
  ═══════════════════════════════════════════

  Discount <10%         → Auto-approved for the Sales Rep (no human approver)
  Discount 10% to <20%  → Sales Manager approval
  Discount 20% to <30%  → Sales Director + Finance Manager (parallel)
  Discount ≥30%         → VP Sales + CFO (sequential)

  DB ACCESS APPROVAL RULES (IT)
  ═══════════════════════════════════════════

  Read-only access  → Team Lead approval
  Read-write access → Team Lead → DB Admin (sequential)
  Admin access      → Team Lead → DB Admin → CISO (sequential)
  Production access → Team Lead → DB Admin → CISO → CTO (sequential)

Rule Storage
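
One way the tiers above could be stored is as data rows keyed by tenant and approval type, so that a tenant can change thresholds without a code deploy. The sketch below is an assumption, not a prescribed schema: the `Rule` fields, the `acme` tenant, the role names, and the half-open ranges (lower bound inclusive, upper bound exclusive) are all illustrative. The in-memory list stands in for a database table fronted by the Redis cache (5-minute TTL) mentioned in the cheat sheet.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    tenant_id: str
    approval_type: str
    min_value: float             # inclusive lower bound
    max_value: Optional[float]   # exclusive upper bound; None means unbounded
    approvers: tuple[str, ...]   # role names; empty tuple means auto-approve
    mode: str                    # "sequential" or "parallel"


# Hypothetical rows mirroring the discount tiers above.
RULES = [
    Rule("acme", "discount", 0, 10, (), "sequential"),
    Rule("acme", "discount", 10, 20, ("sales_manager",), "sequential"),
    Rule("acme", "discount", 20, 30, ("sales_director", "finance_manager"), "parallel"),
    Rule("acme", "discount", 30, None, ("vp_sales", "cfo"), "sequential"),
]


def match_rule(tenant_id: str, approval_type: str, value: float,
               rules: list = RULES) -> Optional[Rule]:
    """Return the first rule whose tenant, type, and value range match."""
    for rule in rules:
        if rule.tenant_id != tenant_id or rule.approval_type != approval_type:
            continue
        below_max = rule.max_value is None or value < rule.max_value
        if value >= rule.min_value and below_max:
            return rule
    return None  # no rule configured; the caller decides the default


rule = match_rule("acme", "discount", 25)
print(rule.approvers, rule.mode)
```

Returning `None` for an unmatched request keeps the "no rule configured" case explicit, so the caller can fail closed rather than silently auto-approve.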

Chain Builder Output

  Example Chain for "30% discount on $50K deal":

  Step 1: VP Sales (Sarah)      ──sequential──>
  Step 2: CFO (Mike)            ──sequential──>
  Step 3: Legal Review (auto)   ──parallel with──>
  Step 3: Revenue Ops (Janet)

  Chain stored as ordered list of approval steps.
  Each step: approver_id, type (sequential/parallel), SLA_hours, escalation_to
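
The ordered list described above could be modeled roughly as follows. This sketch assumes an extra `order` field (not in the original field list) so that steps sharing an order run in parallel; the approver ids, SLA hours, and escalation targets are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    order: int          # assumption: steps with the same order run in parallel
    approver_id: str
    type: str           # "sequential" or "parallel"
    sla_hours: int
    escalation_to: str


# The example chain above, with hypothetical SLA and escalation values.
CHAIN = [
    Step(1, "vp_sales_sarah", "sequential", 4, "ceo"),
    Step(2, "cfo_mike", "sequential", 4, "ceo"),
    Step(3, "legal_auto", "parallel", 8, "general_counsel"),
    Step(3, "revenue_ops_janet", "parallel", 8, "cfo_mike"),
]


def active_steps(chain: list, approved: set) -> list:
    """Return the steps awaiting action: the lowest-order group not fully approved."""
    for order in sorted({step.order for step in chain}):
        pending = [s for s in chain
                   if s.order == order and s.approver_id not in approved]
        if pending:
            return pending
    return []  # every step approved; the chain is complete


print([s.approver_id for s in active_steps(CHAIN, set())])
print([s.approver_id for s in active_steps(CHAIN, {"vp_sales_sarah", "cfo_mike"})])
```

Notifications go only to the approvers returned by `active_steps`, so later approvers are not pinged until the earlier steps finish.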

5 Deep Dive 2: Async Notification Delivery

Kafka Topic Per Channel
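
A topic per channel lets each worker pool scale and fail independently, so a Slack outage need not hold up email. The routing step might look like the sketch below. The topic names and the `publish` callable are assumptions standing in for a real Kafka producer; keying by approval id relies on Kafka's default behavior of sending records with the same key to the same partition, which preserves per-approval ordering.

```python
import json
from typing import Callable

# Hypothetical topic names, one per channel.
TOPIC_BY_CHANNEL = {
    "slack": "notifications.slack",
    "email": "notifications.email",
    "teams": "notifications.teams",
}


def route(notification: dict,
          publish: Callable[[str, bytes, bytes], None]) -> str:
    """Publish a notification to its channel topic and return the topic name."""
    channel = notification["channel"]
    if channel not in TOPIC_BY_CHANNEL:
        raise ValueError(f"unsupported channel: {channel}")
    topic = TOPIC_BY_CHANNEL[channel]
    key = str(notification["approval_id"]).encode()   # same approval, same partition
    value = json.dumps(notification, sort_keys=True).encode()
    publish(topic, key, value)
    return topic


# Usage with a list standing in for the producer.
sent = []
route({"approval_id": 4821, "step_id": 1, "channel": "slack"},
      lambda topic, key, value: sent.append((topic, key, value)))
print(sent[0][0])
```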

Idempotent Delivery
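
At-least-once delivery means a worker can see the same message twice, so each send should be guarded by a dedup key. The cheat sheet gives the key as approval_id + step_id + channel. The sketch below uses an in-memory class as a stand-in for a shared store; with Redis, the claim would typically be a single `SET key value NX EX ttl`. The class, function names, and 24-hour TTL are assumptions.

```python
import time


class DedupStore:
    """In-memory stand-in for a shared store with set-if-absent and expiry."""

    def __init__(self):
        self._expires = {}

    def claim(self, key: str, ttl_seconds: float, now: float = None) -> bool:
        """Return True only for the first caller to claim an unexpired key."""
        now = time.time() if now is None else now
        expiry = self._expires.get(key)
        if expiry is not None and expiry > now:
            return False
        self._expires[key] = now + ttl_seconds
        return True

    def release(self, key: str) -> None:
        self._expires.pop(key, None)


def dedup_key(approval_id, step_id, channel) -> str:
    return f"{approval_id}:{step_id}:{channel}"


def deliver_once(msg: dict, store: DedupStore, send) -> bool:
    """Send msg unless an identical delivery was already claimed."""
    key = dedup_key(msg["approval_id"], msg["step_id"], msg["channel"])
    if not store.claim(key, ttl_seconds=24 * 3600):
        return False                 # duplicate; skip silently
    try:
        send(msg)
    except Exception:
        store.release(key)           # let a retry claim the key again
        raise
    return True


store, outbox = DedupStore(), []
msg = {"approval_id": 4821, "step_id": 1, "channel": "slack"}
print(deliver_once(msg, store, outbox.append), deliver_once(msg, store, outbox.append))
```

With this exact key, a later reminder for the same step and channel would also be suppressed, so a real key would likely need one more component such as the notification kind.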

Rich Notifications with Action Buttons

Slack Notification Example

Approval Request #4821
Type: Discount Approval
Requester: AI Sales Agent (on behalf of John Smith)
Details: 25% discount on Acme Corp deal ($50,000 ARR)
Justification: Competitive pressure from Competitor X, multi-year commitment
SLA: 4 hours remaining

[Approve] [Reject] [Request More Info] [Delegate]
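
The message above could be expressed as a Slack Block Kit payload along these lines. This is a sketch: the `section`, `actions`, and `button` block shapes follow Block Kit, while the function name, `action_id` values, and `block_id` format are assumptions chosen for illustration.

```python
def build_slack_blocks(req: dict) -> list[dict]:
    """Build Block Kit blocks for an approval request with four action buttons."""
    summary = (
        f"*Approval Request #{req['id']}*\n"
        f"*Type:* {req['type']}\n"
        f"*Requester:* {req['requester']}\n"
        f"*Details:* {req['details']}\n"
        f"*Justification:* {req['justification']}\n"
        f"*SLA:* {req['sla_remaining']}"
    )
    # (label, hypothetical action_id, optional Block Kit button style)
    buttons = [
        ("Approve", "approve", "primary"),
        ("Reject", "reject", "danger"),
        ("Request More Info", "request_info", None),
        ("Delegate", "delegate", None),
    ]
    elements = []
    for label, action_id, style in buttons:
        button = {
            "type": "button",
            "text": {"type": "plain_text", "text": label},
            "action_id": action_id,
            "value": str(req["id"]),   # echoed back in the interaction payload
        }
        if style:
            button["style"] = style
        elements.append(button)
    return [
        {"type": "section", "text": {"type": "mrkdwn", "text": summary}},
        {"type": "actions", "block_id": f"approval_{req['id']}", "elements": elements},
    ]


blocks = build_slack_blocks({
    "id": 4821,
    "type": "Discount Approval",
    "requester": "AI Sales Agent (on behalf of John Smith)",
    "details": "25% discount on Acme Corp deal ($50,000 ARR)",
    "justification": "Competitive pressure from Competitor X, multi-year commitment",
    "sla_remaining": "4 hours remaining",
})
print(len(blocks[1]["elements"]))
```

Carrying the approval id in each button's `value` lets the interaction handler find the request without parsing the message text.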

Notification State Tracking

  NOTIFICATION LIFECYCLE
  ═══════════════════════════════════

  queued → sent → delivered → read → acted-upon
    │        │        │         │         │
    │        │        │         │         └─ User clicked Approve/Reject
    │        │        │         └─ Read signal, where the channel provides one (e.g. email open tracking)
    │        │        └─ Channel confirmed delivery (e.g. Slack API returned ok, SES delivery event)
    │        └─ Worker picked from Kafka, API call succeeded
    └─ Published to Kafka topic

  Each transition logged in PostgreSQL with timestamp.
  Redis holds current state for fast lookup.
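
Because delivery is at-least-once and channel callbacks can arrive late or twice, the stored state should only ever move forward. A minimal sketch, assuming the five lifecycle states above and a hypothetical `advance` helper:

```python
LIFECYCLE = ["queued", "sent", "delivered", "read", "acted-upon"]
RANK = {state: i for i, state in enumerate(LIFECYCLE)}


def advance(current: str, incoming: str) -> str:
    """Return the state to store: move forward only, ignore stale or duplicate events."""
    if incoming not in RANK:
        raise ValueError(f"unknown state: {incoming}")
    return incoming if RANK[incoming] > RANK[current] else current


state = "queued"
for event in ["sent", "sent", "acted-upon", "delivered"]:
    state = advance(state, event)
print(state)
```

Skipping states is allowed on purpose: a user may click Approve on a channel that never reports a read.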

6 Deep Dive 3: Escalation Scheduler

SLA Stages

SLA %  Action                                       Example (4-hour SLA)
─────  ───────────────────────────────────────────  ──────────────────────────────────────────────────────────────
75%    Reminder notification to current approver    At 3 hours: "Reminder: Approval #4821 needs your action"
100%   Escalate to next level (approver's manager)  At 4 hours: Notify manager, mark as escalated
150%   Auto-action (approve/reject per policy)      At 6 hours: Auto-approve if policy allows, else escalate to VP
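
The stage times follow directly from the SLA length. A small sketch of the arithmetic, with hypothetical stage names and an arbitrary example start time:

```python
from datetime import datetime, timedelta, timezone

# Fractions of the SLA at which each stage fires, from the table above.
STAGES = (("remind", 0.75), ("escalate", 1.00), ("auto_action", 1.50))


def sla_deadlines(created_at: datetime, sla_hours: float) -> dict:
    """Return the absolute time at which each SLA stage is due."""
    return {
        name: created_at + timedelta(hours=sla_hours * fraction)
        for name, fraction in STAGES
    }


created = datetime(2024, 1, 15, 9, 0, tzinfo=timezone.utc)
deadlines = sla_deadlines(created, sla_hours=4)
for name, due in deadlines.items():
    print(name, due.isoformat())
```

For the 4-hour SLA this gives offsets of 3, 4, and 6 hours, matching the example column.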

Scheduler Implementation
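
The cheat sheet describes a 5-minute sweep over a Redis sorted set. A sketch of that idea follows, with an in-memory class standing in for the sorted set (score = due timestamp, member = task id); in Redis the two methods would correspond roughly to `ZADD` and to `ZRANGEBYSCORE` followed by `ZREM`. The `approval_id:stage` member format and all names are assumptions.

```python
import bisect


class DeadlineIndex:
    """In-memory stand-in for a sorted set ordered by due timestamp."""

    def __init__(self):
        self._items = []   # sorted list of (due_ts, member)

    def add(self, due_ts: float, member: str) -> None:
        bisect.insort(self._items, (due_ts, member))

    def pop_due(self, now_ts: float) -> list:
        """Remove and return members whose due time has passed, earliest first."""
        due = [member for ts, member in self._items if ts <= now_ts]
        self._items = [(ts, m) for ts, m in self._items if ts > now_ts]
        return due


def sweep(index: DeadlineIndex, now_ts: float, handle) -> int:
    """Run one sweep: hand every due (approval_id, stage) to the handler."""
    members = index.pop_due(now_ts)
    for member in members:
        approval_id, stage = member.split(":")
        handle(approval_id, stage)
    return len(members)


# Approval 4821 created at t=0 with a 4-hour SLA (times in seconds).
index = DeadlineIndex()
index.add(3 * 3600, "4821:remind")
index.add(4 * 3600, "4821:escalate")
index.add(6 * 3600, "4821:auto_action")

fired = []
sweep(index, now_ts=3 * 3600 + 300, handle=lambda a, s: fired.append((a, s)))
print(fired)
```

A 5-minute sweep interval means an action can fire up to 5 minutes late, which is small next to hour-scale SLAs. The handler should still re-check the approval's current state, since the request may have been resolved after its deadlines were scheduled.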

State Machine

  APPROVAL STATE MACHINE
  ═══════════════════════════════════════════════════════

                    ┌──────────┐
                    │SUBMITTED │
                    └────┬─────┘
                         │ rules engine determines chain
                         v
                    ┌──────────┐
              ┌─────│ PENDING  │─────┐
              │     └────┬─────┘     │
              │          │           │
         75% SLA    user action   100% SLA
              │          │           │
              v          │           v
        ┌──────────┐     │     ┌───────────┐
        │REMINDED  │     │     │ ESCALATED │
        └────┬─────┘     │     └─────┬─────┘
             │           │           │
             └─────┬─────┘     user action
                   │                 │
            ┌──────┴──────┐          │
            v             v          v
      ┌──────────┐  ┌──────────┐  ┌──────────┐
      │ APPROVED │  │ REJECTED │  │ EXPIRED  │
      └──────────┘  └──────────┘  │(auto-act)│
                                  └──────────┘
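
The diagram can be encoded as a transition table so that invalid moves are rejected in one place. This sketch reflects one reading of the diagram combined with the SLA stages: a reminded request can still escalate at 100%, and an escalated request expires at 150%. The event names are hypothetical.

```python
# (current state, event) -> next state
TRANSITIONS = {
    ("SUBMITTED", "chain_built"): "PENDING",
    ("PENDING", "sla_75"): "REMINDED",
    ("PENDING", "sla_100"): "ESCALATED",
    ("REMINDED", "sla_100"): "ESCALATED",    # assumption: reminders do not stop escalation
    ("ESCALATED", "sla_150"): "EXPIRED",     # assumption: auto-action at 150% of SLA
}
# A user decision is accepted from any non-terminal waiting state.
for waiting in ("PENDING", "REMINDED", "ESCALATED"):
    TRANSITIONS[(waiting, "approve")] = "APPROVED"
    TRANSITIONS[(waiting, "reject")] = "REJECTED"

TERMINAL = {"APPROVED", "REJECTED", "EXPIRED"}


def next_state(state: str, event: str) -> str:
    """Return the next state, or raise ValueError for a transition not in the table."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"invalid transition: {state} on {event}") from None


state = "SUBMITTED"
for event in ["chain_built", "sla_75", "sla_100", "approve"]:
    state = next_state(state, event)
print(state)
```

Terminal states have no outgoing entries, so a late click on an already-approved request fails loudly instead of changing the outcome.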

OOO (Out of Office) Handling
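
The cheat sheet summarizes this as calendar integration plus auto-delegation. One possible shape for the delegation step is sketched below: follow each out-of-office approver's delegate until someone available is found, guarding against delegation loops. The `is_ooo` callable stands in for a calendar lookup, and the function name, `delegates` mapping, and hop limit are all assumptions.

```python
from typing import Callable, Optional


def resolve_approver(approver_id: str,
                     is_ooo: Callable[[str], bool],
                     delegates: dict,
                     max_hops: int = 5) -> Optional[str]:
    """Follow delegations until an available approver is found.

    Returns None when the chain dead-ends, loops, or exceeds max_hops,
    in which case the caller would escalate instead.
    """
    seen = set()
    current = approver_id
    while is_ooo(current):
        seen.add(current)
        delegate = delegates.get(current)
        if delegate is None or delegate in seen or len(seen) > max_hops:
            return None
        current = delegate
    return current


# Usage: Sarah and Mike are out; Sarah delegates to Mike, Mike to Janet.
out_of_office = {"sarah", "mike"}
delegates = {"sarah": "mike", "mike": "janet"}
print(resolve_approver("sarah", out_of_office.__contains__, delegates))
```

Resolving the approver before the notification is sent avoids burning SLA time on someone who cannot respond.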

7 Scaling & ML

Scaling Strategies

ML Enhancements

8 Cheat Sheet

Notification & Approval — Key Numbers

  • 100K approvals/day, 250K notifications/day
  • <1s notification delivery latency
  • Kafka topic per channel (Slack, Email, Teams)
  • Idempotent delivery with dedup key: approval_id + step_id + channel
  • Rules Engine: configurable per tenant, cached in Redis (TTL 5 min)
  • Chain types: sequential, parallel, conditional
  • SLA stages: remind 75%, escalate 100%, auto-action 150%
  • 5-min escalation sweep using Redis sorted set
  • State machine: submitted → pending → [reminded → escalated] → approved/rejected/expired
  • OOO handling with calendar integration and auto-delegation