Multi-Tenant Plugin/Connector Platform

"Design a platform for customers to connect their business systems (ServiceNow, Jira, Salesforce, SAP, etc.) to an AI agent. Different systems, APIs, authentication methods, and data schemas."

Table of Contents

  1. Interviewer Signals
  2. Clarifying Questions & Scope
  3. Back-of-Envelope Estimation
  4. High-Level Architecture
  5. Key Design Decisions
  6. Deep Dive 1: Template vs Instance
  7. Deep Dive 2: Middleware Chain
  8. Deep Dive 3: Sliding Window Rate Limiter
  9. Deep Dive 4: Credential Management
  10. Cheat Sheet

1 Interviewer Signals

  Signal               | What They Want to See
  ---------------------+-----------------------------------------------------------------
  Abstraction          | Can you design a unified interface over heterogeneous systems?
  Multi-tenancy        | How do you isolate tenant data, credentials, and rate limits?
  Extensibility        | How easy is it to add a new connector type (e.g., Workday)?
  Resilience           | How do you handle external API failures, rate limits, timeouts?
  Security             | How are credentials stored, rotated, and scoped?
  Operational Maturity | Monitoring, alerting, debugging: can you operate this at scale?

2 Clarifying Questions & Scope

  Dimension        | Clarification                     | Assumption
  -----------------+----------------------------------+-------------------------------------------------------------------
  Connector Count  | How many systems to support?     | 10 connector types initially (ServiceNow, Jira, Salesforce, etc.)
  Instances        | How many instances per customer? | 5-15 connectors per customer, 350 customers = ~3,500 instances
  Operations       | Read-only or read-write?         | Both: read data + create/update records
  Auth Methods     | What auth types?                 | OAuth 2.0, API Key, Basic Auth, mTLS, SAML
  Rate Limits      | External API limits?             | Per-tenant, per-connector. Must respect external system limits.

3 Back-of-Envelope Estimation

Scale Numbers

  • 3,500 connector instances (350 customers x 10 avg)
  • 350K API calls/day to external systems
  • 1K credential rotations/day (OAuth token refreshes)
  • Peak: ~15 API calls/second
  • Avg response time target: <2s (including external API latency)
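The per-second and per-instance figures follow from the daily totals above. A small Python check (the ~15/s peak is the estimate's own assumption, so it is an input here rather than a derived result):

```python
SECONDS_PER_DAY = 86_400

customers = 350
avg_connectors_per_customer = 10
api_calls_per_day = 350_000
credential_refreshes_per_day = 1_000
peak_calls_per_second = 15  # assumed peak, taken from the estimate above

instances = customers * avg_connectors_per_customer
avg_calls_per_second = api_calls_per_day / SECONDS_PER_DAY
peak_to_avg = peak_calls_per_second / avg_calls_per_second
calls_per_instance_per_day = api_calls_per_day / instances
refreshes_per_instance_per_day = credential_refreshes_per_day / instances

print(f"instances:              {instances}")                           # 3500
print(f"avg calls/s:            {avg_calls_per_second:.2f}")            # 4.05
print(f"peak / avg:             {peak_to_avg:.1f}x")                    # 3.7x
print(f"calls/instance/day:     {calls_per_instance_per_day:.0f}")      # 100
print(f"refreshes/instance/day: {refreshes_per_instance_per_day:.2f}")  # 0.29
```

So the assumed peak is about 3.7x the daily average, and a typical instance makes only ~100 external calls a day. At this volume the per-(tenant, connector) rate limits matter when traffic concentrates on one busy instance, not because of aggregate load.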

4 High-Level Architecture

  PLUGIN/CONNECTOR PLATFORM
  ═══════════════════════════════════════════════════════════════════

  AI Agent Request
       │
       v
  ┌──────────┐     ┌─────────────────────────────────────────────┐
  │ Gateway  │────>│          RUNTIME ENGINE                     │
  │ (Auth,   │     │  ┌─────────────────────────────────────┐    │
  │  Route)  │     │  │     MIDDLEWARE CHAIN                │    │
  └──────────┘     │  │                                     │    │
                   │  │  Auth → RateLimit → Transform →     │    │
                   │  │  Execute → Retry → Log              │    │
                   │  │                                     │    │
                   │  └─────────────────────────────────────┘    │
                   └─────────────────────┬───────────────────────┘
                                         │
              ┌──────────────────────────┼──────────────────────┐
              │                          │                      │
       ┌──────v──────┐          ┌────────v────────┐     ┌──────v──────┐
       │  Registry   │          │ Config Store    │     │   Vault     │
       │ (Templates  │          │ (Per-tenant     │     │(Credentials │
       │ + Instances)│          │  settings)      │     │  secrets)   │
       └─────────────┘          └─────────────────┘     └─────────────┘

                   EXTERNAL SYSTEMS              CROSS-CUTTING
              ┌────────────────────┐        ┌─────────────────────┐
              │ ServiceNow │ Jira  │        │ Metrics │ Tracing   │
              │ Salesforce │ SAP   │        │ Alerts  │ Audit Log │
              │ Workday    │ etc.  │        └─────────────────────┘
              └────────────────────┘

5 Key Design Decisions

  Template vs Instance model
    Choice: Separate template (blueprint) from instance (runtime config).
    Why:    Like Docker Image vs Container. One "ServiceNow connector" template, many customer instances.

  Middleware chain pattern
    Choice: Ordered chain of composable middleware.
    Why:    Each concern (auth, rate limit, transform) is isolated and testable. Easy to add new middleware.

  Credential storage
    Choice: HashiCorp Vault, with secrets stored under per-tenant paths.
    Why:    Credentials are never stored in the DB. Per-tenant isolation. Audit trail. OAuth tokens are
            refreshed automatically. Customer-issued SaaS credentials are kept as static (KV) secrets;
            Vault's dynamic secrets apply only to backends Vault can mint credentials for (e.g. databases).

  Rate limiting
    Choice: Sliding window per (tenant, connector), implemented with a Redis ZSET.
    Why:    Respects external API limits. No burst spikes at window boundaries.

  Schema transformation
    Choice: Declarative field mappings in JSON.
    Why:    Customers map their custom fields without code: "status" → "ticket_state", "assignee" → "owner".

6 Deep Dive 1: Template vs Instance

Analogy: Docker Image vs Container

A Template is like a Docker Image — it defines WHAT a connector can do. An Instance is like a Container — it's a running configuration for a specific tenant with their credentials and custom mappings.

Template (Blueprint)

  CONNECTOR TEMPLATE: ServiceNow
  ═══════════════════════════════════════════

  {
    "template_id": "servicenow-v2",
    "name": "ServiceNow ITSM Connector",
    "version": "2.3.1",
    "auth_types": ["oauth2", "basic_auth"],
    "base_url_pattern": "https://{instance}.service-now.com",
    "capabilities": [
      "read_tickets",
      "create_ticket",
      "update_ticket",
      "list_groups",
      "get_user",
      "search_kb_articles"
    ],
    "api_version": "v2",
    "rate_limit_default": 500,  // requests/minute
    "required_fields": ["instance_name"],
    "optional_fields": ["custom_table_prefix"]
  }

Instance (Tenant Configuration)

  CONNECTOR INSTANCE: Acme Corp's ServiceNow
  ═══════════════════════════════════════════

  {
    "instance_id": "inst-acme-snow-001",
    "tenant_id": "acme-corp",
    "template_id": "servicenow-v2",
    "config": {
      "instance_name": "acmecorp",
      "base_url": "https://acmecorp.service-now.com"
    },
    "credential_ref": "vault://acme-corp/servicenow/oauth",
    "field_mappings": {
      "short_description": "title",
      "assignment_group": "team",
      "u_custom_field_1": "business_unit",
      "u_location_code": "office_location"
    },
    "rate_limit_override": 300,  // Acme's ServiceNow plan limit
    "status": "active",
    "health_check_interval": 60  // seconds
  }
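The field_mappings above read as external field name → internal field name (ServiceNow's short_description becomes title). Assuming that direction, a TRANSFORM step could apply the mapping both ways as sketched below. The function names and the sample record values are hypothetical, unmapped fields pass through unchanged, and type coercion is omitted.

```python
from typing import Any, Dict


def to_internal(record: Dict[str, Any], field_mappings: Dict[str, str]) -> Dict[str, Any]:
    """Response path: rename external fields to internal names. Unmapped keys pass through."""
    return {field_mappings.get(key, key): value for key, value in record.items()}


def to_external(payload: Dict[str, Any], field_mappings: Dict[str, str]) -> Dict[str, Any]:
    """Request path: rename internal fields back to the external system's names."""
    inverse = {internal: external for external, internal in field_mappings.items()}
    if len(inverse) != len(field_mappings):
        # Two external fields map to the same internal name, so the reverse is ambiguous.
        raise ValueError("field_mappings must be one-to-one to invert")
    return {inverse.get(key, key): value for key, value in payload.items()}


acme_mappings = {
    "short_description": "title",
    "assignment_group": "team",
    "u_custom_field_1": "business_unit",
    "u_location_code": "office_location",
}

snow_record = {"short_description": "VPN down", "u_location_code": "NYC-3", "priority": "2"}
internal = to_internal(snow_record, acme_mappings)
# {'title': 'VPN down', 'office_location': 'NYC-3', 'priority': '2'}
round_trip = to_external(internal, acme_mappings)  # equal to snow_record again
```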

Registry
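A minimal sketch of what the registry does, assuming it only needs to store shared templates and per-tenant instances and to turn a (tenant, instance, capability) lookup into the effective runtime config. Class and method names are hypothetical. The effective limit is rate_limit_override when set, else the template's rate_limit_default, and filling {instance} from instance_name is an assumed convention.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class Template:
    template_id: str
    base_url_pattern: str                 # e.g. "https://{instance}.service-now.com"
    capabilities: FrozenSet[str]
    rate_limit_default: int               # requests/minute
    required_fields: Tuple[str, ...] = ()


@dataclass(frozen=True)
class Instance:
    instance_id: str
    tenant_id: str
    template_id: str
    config: Dict[str, str]
    credential_ref: str
    rate_limit_override: Optional[int] = None
    status: str = "active"


class Registry:
    """Hypothetical in-memory registry: templates are shared, instances belong to one tenant."""

    def __init__(self) -> None:
        self._templates: Dict[str, Template] = {}
        self._instances: Dict[str, Instance] = {}

    def add_template(self, tpl: Template) -> None:
        self._templates[tpl.template_id] = tpl

    def add_instance(self, inst: Instance) -> None:
        tpl = self._templates[inst.template_id]  # KeyError for an unknown template
        missing = [f for f in tpl.required_fields if f not in inst.config]
        if missing:
            raise ValueError(f"{inst.instance_id}: missing required fields {missing}")
        self._instances[inst.instance_id] = inst

    def resolve(self, tenant_id: str, instance_id: str, capability: str) -> dict:
        inst = self._instances.get(instance_id)
        # Another tenant's instance is reported as "not found" so its existence isn't leaked.
        if inst is None or inst.tenant_id != tenant_id:
            raise LookupError(f"no instance {instance_id} for tenant {tenant_id}")
        if inst.status != "active":
            raise RuntimeError(f"{instance_id} is {inst.status}")
        tpl = self._templates[inst.template_id]
        if capability not in tpl.capabilities:
            raise ValueError(f"{tpl.template_id} does not support {capability}")
        base_url = inst.config.get("base_url") or tpl.base_url_pattern.format(
            instance=inst.config["instance_name"])
        limit = (inst.rate_limit_override if inst.rate_limit_override is not None
                 else tpl.rate_limit_default)
        return {"base_url": base_url, "rate_limit_per_min": limit,
                "credential_ref": inst.credential_ref}


reg = Registry()
reg.add_template(Template("servicenow-v2", "https://{instance}.service-now.com",
                          frozenset({"read_tickets", "create_ticket", "update_ticket"}),
                          500, ("instance_name",)))
reg.add_instance(Instance("inst-acme-snow-001", "acme-corp", "servicenow-v2",
                          {"instance_name": "acmecorp"},
                          "vault://acme-corp/servicenow/oauth", rate_limit_override=300))
cfg = reg.resolve("acme-corp", "inst-acme-snow-001", "create_ticket")
# cfg["base_url"] == "https://acmecorp.service-now.com", cfg["rate_limit_per_min"] == 300
```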

7 Deep Dive 2: Middleware Chain

  REQUEST FLOW THROUGH MIDDLEWARE CHAIN
  ═══════════════════════════════════════════════════════

  Incoming Request
       │
  ┌────v────┐  Inject credentials from Vault. Handle OAuth
  │  AUTH   │  token refresh automatically. mTLS cert loading.
  └────┬────┘
       │
  ┌────v────────┐  Check sliding window. Per (tenant, connector).
  │ RATE LIMIT  │  429 if exceeded. Queue if near limit.
  └────┬────────┘
       │
  ┌────v──────────┐  Map internal schema → external API schema.
  │  TRANSFORM    │  Apply customer's field_mappings. Type coercion.
  └────┬──────────┘
       │
  ┌────v────────┐  HTTP call to external system. Connection pooling.
  │  EXECUTE    │  Timeout: 30s. Circuit breaker per instance.
  └────┬────────┘
       │
  ┌────v────┐  Exponential backoff: 1s, 2s, 4s. Max 3 retries.
  │  RETRY  │  Only on 429, 503, 504. NOT on 400, 401, 404.
  └────┬────┘
       │
  ┌────v────┐  Full audit trail. Request/response (sanitized).
  │   LOG   │  Latency, status code, tenant, connector, operation.
  └────┬────┘
       │
       v
  Response to AI Agent
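One way to compose this chain, sketched in Python under the assumption that each middleware is a function taking the request and the next handler. In this "onion" reading, LOG is the outermost layer so it sees the final status, and RETRY wraps EXECUTE directly so only the HTTP call is repeated; both are drawn after EXECUTE in the diagram because that is where their work happens. All middleware bodies are hypothetical placeholders that only record their names.

```python
from typing import Any, Callable, Dict, List

Request = Dict[str, Any]
Response = Dict[str, Any]
Handler = Callable[[Request], Response]
Middleware = Callable[[Request, Handler], Response]


def build_chain(middlewares: List[Middleware], terminal: Handler) -> Handler:
    """Wrap `terminal` so that middlewares[0] is the outermost layer."""
    handler = terminal
    for mw in reversed(middlewares):
        handler = _bind(mw, handler)
    return handler


def _bind(mw: Middleware, nxt: Handler) -> Handler:
    # A helper (not an inline lambda in the loop) so each layer captures its own mw/nxt.
    return lambda req: mw(req, nxt)


# Placeholder middlewares: each records its name so the execution order is visible.
def log_step(req: Request, nxt: Handler) -> Response:
    req["trace"].append("log:start")
    resp = nxt(req)  # latency and status would be recorded here
    req["trace"].append(f"log:end status={resp['status']}")
    return resp

def auth_step(req: Request, nxt: Handler) -> Response:
    req["trace"].append("auth")
    req["headers"] = {"Authorization": "Bearer <token from Vault>"}
    return nxt(req)

def rate_limit_step(req: Request, nxt: Handler) -> Response:
    req["trace"].append("rate_limit")
    if req.get("over_limit"):
        return {"status": 429}  # short-circuit: nothing downstream runs
    return nxt(req)

def transform_step(req: Request, nxt: Handler) -> Response:
    req["trace"].append("transform")
    return nxt(req)

def retry_step(req: Request, nxt: Handler) -> Response:
    req["trace"].append("retry")
    return nxt(req)  # backoff loop omitted in this sketch

def execute(req: Request) -> Response:
    req["trace"].append("execute")
    return {"status": 200}


chain = build_chain([log_step, auth_step, rate_limit_step, transform_step, retry_step], execute)
ok = {"trace": []}
chain(ok)
# ok["trace"]: log:start, auth, rate_limit, transform, retry, execute, log:end status=200
```

Because each layer decides whether to call nxt, a rejection (the 429 from RATE LIMIT) naturally skips everything downstream while LOG still records it.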

Middleware Details
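A sketch of the RETRY and circuit-breaker rules given in the diagram and the cheat sheet: backoff 1s/2s/4s, at most 3 retries, only on 429/503/504; open after 5 consecutive failures, half-open after 30s. The send, sleep, and clock callables are injected so the logic can be exercised without real HTTP calls or waiting; all names here are hypothetical.

```python
import time
from typing import Callable, Optional

RETRYABLE_STATUSES = {429, 503, 504}  # never 400/401/404
BACKOFF_SECONDS = (1, 2, 4)           # wait before retry 1, 2, 3


def call_with_retry(send: Callable[[], int],
                    sleep: Callable[[float], None] = time.sleep) -> int:
    """Call send() (returns an HTTP status); retry at most 3 times on retryable statuses.

    Caution: a 504 on a create may already have been applied upstream, so
    retrying non-idempotent writes needs an idempotency guard.
    """
    status = send()
    for delay in BACKOFF_SECONDS:
        if status not in RETRYABLE_STATUSES:
            break
        sleep(delay)
        status = send()
    return status


class CircuitBreaker:
    """Per-instance breaker: opens after 5 consecutive failures, half-open after 30 s."""

    def __init__(self, threshold: int = 5, cooldown_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.threshold = threshold
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None  # None while closed
        self.probe_in_flight = False

    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.cooldown_s:
            return "half_open"
        return "open"

    def allow(self) -> bool:
        """Closed: allow. Open: fail fast. Half-open: allow a single probe call."""
        current = self.state()
        if current == "closed":
            return True
        if current == "half_open" and not self.probe_in_flight:
            self.probe_in_flight = True
            return True
        return False

    def record(self, success: bool) -> None:
        """Report an allowed call's outcome (count 5xx/timeouts as failures, not 4xx)."""
        self.probe_in_flight = False
        if success:
            self.failures, self.opened_at = 0, None
            return
        self.failures += 1
        # A failed half-open probe re-opens immediately; otherwise open at the threshold.
        if self.opened_at is not None or self.failures >= self.threshold:
            self.opened_at = self.clock()
```

The breaker sits around EXECUTE: when allow() returns False the request fails fast without touching the external system, which also keeps RETRY from hammering an instance that is already down.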

8 Deep Dive 3: Sliding Window Rate Limiter

Redis ZSET Algorithm

  SLIDING WINDOW RATE LIMITER (Redis ZSET)
  ═══════════════════════════════════════════════════════

  Key: rate_limit:{tenant_id}:{connector_id}
  Score: timestamp (Unix ms)
  Member: unique request ID

  ALGORITHM (per request; steps 1-4 must run atomically, e.g. as one Lua script,
  or two concurrent requests can both see count = limit - 1 and both be admitted):
  ─────────────────────────────────────────────────────
  1. ZREMRANGEBYSCORE key 0 {now_ms - 60000}  // Remove requests older than 60s
  2. count = ZCARD key                        // Count requests in window
  3. IF count >= limit: REJECT (429)          // Full: reject WITHOUT recording, so a
                                              // retrying client can't extend its own lockout
     ELSE: ZADD key {now_ms} {request_id}     // Record this request and ALLOW
  4. EXPIRE key 120                           // TTL cleanup safety net

  EXAMPLE (limit = 5 requests/minute):
  ─────────────────────────────────────────────────────
  Time    Action          ZSET Size    Result
  00:00   Request A       1            ALLOWED
  00:15   Request B       2            ALLOWED
  00:30   Request C       3            ALLOWED
  00:45   Request D       4            ALLOWED
  00:50   Request E       5            ALLOWED
  00:55   Request F       5            REJECTED (429, not recorded)
  01:05   Request G       5            ALLOWED (A expired at 01:00)
  01:20   Request H       5            ALLOWED (B expired at 01:15)
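The same algorithm as an in-memory Python sketch, with a sorted list standing in for one ZSET key, followed by a Lua form that would make the check-then-add atomic inside Redis. A request is recorded only when it is allowed, which is what the example table assumes: F is rejected, and G is admitted at 01:05 once A leaves the window. The class name is hypothetical, and the Lua string is illustrative and not executed here; with redis-py it could be loaded via Redis.register_script.

```python
import bisect
from typing import List

WINDOW_MS = 60_000


class SlidingWindowLog:
    """In-memory stand-in for one key rate_limit:{tenant_id}:{connector_id}."""

    def __init__(self, limit: int, window_ms: int = WINDOW_MS) -> None:
        self.limit = limit
        self.window_ms = window_ms
        self.scores: List[int] = []  # sorted request timestamps, like ZSET scores

    def try_acquire(self, now_ms: int) -> bool:
        # ZREMRANGEBYSCORE key 0 (now - window): drop scores <= cutoff (inclusive, as in Redis)
        cutoff = now_ms - self.window_ms
        del self.scores[:bisect.bisect_right(self.scores, cutoff)]
        # ZCARD: a full window rejects without recording this request
        if len(self.scores) >= self.limit:
            return False
        # ZADD key now_ms request_id (in Redis the member must be a unique request ID)
        bisect.insort(self.scores, now_ms)
        return True


# Atomic Redis version of the same steps (illustrative; not executed here).
# KEYS[1] = rate-limit key; ARGV = now_ms, window_ms, limit, request_id.
SLIDING_WINDOW_LUA = """
local now = tonumber(ARGV[1])
local window = tonumber(ARGV[2])
local limit = tonumber(ARGV[3])
redis.call('ZREMRANGEBYSCORE', KEYS[1], 0, now - window)
if redis.call('ZCARD', KEYS[1]) >= limit then
  return 0
end
redis.call('ZADD', KEYS[1], now, ARGV[4])
redis.call('EXPIRE', KEYS[1], 120)
return 1
"""

# Replay the example table (limit = 5/minute); times are seconds after 00:00.
limiter = SlidingWindowLog(limit=5)
results = [limiter.try_acquire(s * 1000) for s in (0, 15, 30, 45, 50, 55, 65, 80)]
# [True, True, True, True, True, False, True, True]
```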

Why Sliding Window over Fixed Window?

Fixed window problem: Limit is 100/min. User sends 100 requests at 0:59, then 100 more at 1:01. That's 200 requests in 2 seconds — the external API sees a burst and throttles us.

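Sliding window: At every decision the limiter counts only the requests from the trailing 60 seconds, so no 60-second span can exceed the limit and there is no burst at window boundaries. The external API never sees more than the rate configured for it.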
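The boundary problem can be replayed directly: 100 requests at 0:59 and 100 more at 1:01 against a 100/minute limit. Both limiters below are minimal hypothetical sketches (a per-minute counter and a timestamp log); only the counting rule differs.

```python
from collections import deque
from typing import Dict, List


def fixed_window_allowed(times_s: List[float], limit: int, window_s: int = 60) -> int:
    """Count admitted requests when each calendar minute has its own counter."""
    counts: Dict[int, int] = {}
    allowed = 0
    for t in times_s:
        bucket = int(t // window_s)
        if counts.get(bucket, 0) < limit:
            counts[bucket] = counts.get(bucket, 0) + 1
            allowed += 1
    return allowed


def sliding_window_allowed(times_s: List[float], limit: int, window_s: int = 60) -> int:
    """Count admitted requests when each decision looks at the trailing window."""
    log: deque = deque()
    allowed = 0
    for t in times_s:
        while log and log[0] <= t - window_s:
            log.popleft()
        if len(log) < limit:
            log.append(t)
            allowed += 1
    return allowed


burst = [59.0] * 100 + [61.0] * 100  # 100 at 0:59, 100 at 1:01
fixed_total = fixed_window_allowed(burst, 100)      # 200: batches land in different minutes
sliding_total = sliding_window_allowed(burst, 100)  # 100: the 1:01 batch still sees 0:59
```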

9 Deep Dive 4: Credential Management

Vault-Based Architecture
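A minimal sketch of tenant scoping for credential_ref values such as vault://acme-corp/servicenow/oauth, assuming the reference encodes tenant/connector/name and maps onto a per-tenant Vault path. The function name and the tenants/... path layout are hypothetical. Refusing a reference whose tenant differs from the instance's tenant is an application-level check; a Vault ACL policy that limits each tenant's access to its own path prefix could enforce the same boundary server-side.

```python
from urllib.parse import urlparse


def resolve_credential_path(credential_ref: str, tenant_id: str) -> str:
    """Turn vault://{tenant}/{connector}/{name} into a Vault path, enforcing tenant scope.

    The "tenants/..." layout is a hypothetical convention for this sketch.
    """
    parsed = urlparse(credential_ref)
    if parsed.scheme != "vault":
        raise ValueError(f"not a vault reference: {credential_ref}")
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) != 2 or any(p in (".", "..") for p in parts):
        raise ValueError(f"expected vault://tenant/connector/name, got {credential_ref}")
    if parsed.netloc != tenant_id:
        # An instance may only read its own tenant's secrets.
        raise PermissionError(f"{credential_ref} is outside tenant {tenant_id}")
    connector, name = parts
    return f"tenants/{tenant_id}/{connector}/{name}"


path = resolve_credential_path("vault://acme-corp/servicenow/oauth", "acme-corp")
# 'tenants/acme-corp/servicenow/oauth'
```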

OAuth 2.0 Flow Detail

  OAUTH TOKEN LIFECYCLE
  ═══════════════════════════════════════════

  1. Customer configures connector in UI
     → Redirects to external system's OAuth consent screen
     → Receives authorization code

  2. Backend exchanges code for access_token + refresh_token
     → Stores both in Vault: vault://acme-corp/servicenow/oauth
     → Sets expires_at metadata

  3. At runtime (each API call):
     → Auth middleware reads token from Vault
     → If expires_at < now + 5min:
        → Use refresh_token to get new access_token
        → Store the new access_token (and refresh_token, if the provider rotated it) in Vault
        → Use new token for request
     → Inject Authorization: Bearer {token}

  4. If refresh fails (token revoked):
     → Mark instance as "auth_failed"
     → Notify customer: "Please re-authenticate ServiceNow"
     → Fail new requests for this instance fast with a clear re-authentication error (don't pass raw upstream 401s through to users)
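A sketch of step 3's refresh-ahead logic and step 4's failure handling, assuming an injected refresh callback (which would call the provider's token endpoint) and a dict standing in for Vault; both, and all class names, are hypothetical. It adds one detail the steps leave implicit: a per-credential lock, so concurrent requests that all see a near-expiry token trigger a single refresh instead of racing, which matters when a provider rotates refresh tokens on use.

```python
import threading
import time
from dataclasses import dataclass
from typing import Callable, Dict

REFRESH_MARGIN_S = 5 * 60  # refresh when less than 5 minutes remain


@dataclass
class Token:
    access_token: str
    refresh_token: str
    expires_at: float  # Unix seconds


class RefreshFailed(Exception):
    """Raised by the refresh callback when the provider rejects the refresh_token."""


class TokenManager:
    """Hypothetical auth-middleware helper; `store` stands in for Vault."""

    def __init__(self, store: Dict[str, Token], refresh: Callable[[str], Token],
                 clock: Callable[[], float] = time.time) -> None:
        self.store = store      # credential_ref -> Token
        self.refresh = refresh  # exchanges a refresh_token for a new Token
        self.clock = clock
        self.status: Dict[str, str] = {}
        self._locks: Dict[str, threading.Lock] = {}
        self._guard = threading.Lock()

    def _needs_refresh(self, token: Token) -> bool:
        return token.expires_at < self.clock() + REFRESH_MARGIN_S

    def bearer_header(self, ref: str) -> Dict[str, str]:
        if self.status.get(ref) == "auth_failed":
            raise PermissionError(f"{ref}: re-authentication required")
        token = self.store[ref]
        if self._needs_refresh(token):
            with self._guard:
                lock = self._locks.setdefault(ref, threading.Lock())
            with lock:                   # one refresh per credential at a time
                token = self.store[ref]  # a concurrent caller may have refreshed already
                if self._needs_refresh(token):
                    try:
                        token = self.refresh(token.refresh_token)
                    except RefreshFailed:
                        self.status[ref] = "auth_failed"  # step 4: notify customer, stop
                        raise PermissionError(f"{ref}: re-authentication required") from None
                    self.store[ref] = token  # also persists a rotated refresh_token
        return {"Authorization": f"Bearer {token.access_token}"}


refresh_calls = []

def fake_refresh(refresh_token: str) -> Token:
    refresh_calls.append(refresh_token)
    return Token("new-access", "new-refresh", expires_at=1_000 + 3_600)

vault = {"vault://acme-corp/servicenow/oauth": Token("old-access", "old-refresh", expires_at=1_060)}
tm = TokenManager(vault, fake_refresh, clock=lambda: 1_000)
header = tm.bearer_header("vault://acme-corp/servicenow/oauth")
# {'Authorization': 'Bearer new-access'}: only 60 s were left, so it refreshed first
```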

10 Cheat Sheet

Plugin/Connector Platform — Key Numbers

  • 3,500 connector instances (350 customers x 10 avg)
  • 350K API calls/day, 1K credential rotations/day
  • Template vs Instance = Docker Image vs Container
  • Middleware chain: Auth → RateLimit → Transform → Execute → Retry → Log
  • Sliding window rate limiter: Redis ZSET, per (tenant, connector)
  • Credentials in Vault only, never in DB or config
  • OAuth auto-refresh when <5 min remaining
  • Circuit breaker (per instance): opens after 5 consecutive failures, half-open after 30s
  • Retry: exponential backoff 1s/2s/4s, only on 429/503/504
  • Declarative field mappings in JSON (no code needed per customer)