"Design a platform for customers to connect their business systems (ServiceNow, Jira, Salesforce, SAP, etc.) to an AI agent. Different systems, APIs, authentication methods, and data schemas."
| Signal | What They Want to See |
|---|---|
| Abstraction | Can you design a unified interface over heterogeneous systems? |
| Multi-tenancy | How do you isolate tenant data, credentials, and rate limits? |
| Extensibility | How easy is it to add a new connector type (e.g., Workday)? |
| Resilience | How do you handle external API failures, rate limits, timeouts? |
| Security | How are credentials stored, rotated, and scoped? |
| Operational Maturity | Monitoring, alerting, debugging — can you operate this at scale? |
| Dimension | Clarification | Assumption |
|---|---|---|
| Connector Count | How many systems to support? | 10 connector types initially (ServiceNow, Jira, Salesforce, etc.) |
| Instances | How many instances per customer? | 5-15 connectors per customer (~10 on average); 350 customers × ~10 = ~3,500 instances |
| Operations | Read-only or read-write? | Both: read data + create/update records |
| Auth Methods | What auth types? | OAuth 2.0, API Key, Basic Auth, mTLS, SAML |
| Rate Limits | External API limits? | Per-tenant, per-connector. Must respect external system limits. |
```
PLUGIN/CONNECTOR PLATFORM
═══════════════════════════════════════════════════════════════════

  AI Agent Request
       │
       v
  ┌──────────┐     ┌─────────────────────────────────────────────┐
  │ Gateway  │────>│               RUNTIME ENGINE                │
  │ (Auth,   │     │  ┌───────────────────────────────────────┐  │
  │  Route)  │     │  │           MIDDLEWARE CHAIN            │  │
  └──────────┘     │  │                                       │  │
                   │  │  Auth → RateLimit → Transform →       │  │
                   │  │  Execute → Retry → Log                │  │
                   │  │                                       │  │
                   │  └───────────────────────────────────────┘  │
                   └──────────────────────┬──────────────────────┘
                                          │
                ┌─────────────────────────┼─────────────────────────┐
                │                         │                         │
         ┌──────v──────┐         ┌────────v────────┐         ┌──────v──────┐
         │  Registry   │         │  Config Store   │         │    Vault    │
         │ (Templates  │         │  (Per-tenant    │         │(Credentials │
         │ + Instances)│         │   settings)     │         │  secrets)   │
         └─────────────┘         └─────────────────┘         └─────────────┘

  EXTERNAL SYSTEMS                     CROSS-CUTTING
  ┌─────────────────────────┐          ┌────────────────────────┐
  │ ServiceNow  │ Jira      │          │ Metrics  │ Tracing     │
  │ Salesforce  │ SAP       │          │ Alerts   │ Audit Log   │
  │ Workday     │ etc.      │          └────────────────────────┘
  └─────────────────────────┘
```
| Decision | Choice | Why |
|---|---|---|
| Template vs Instance model | Separate template (blueprint) from instance (runtime config) | Like Docker Image vs Container. One "ServiceNow connector" template, many customer instances. |
| Middleware chain pattern | Ordered chain of composable middleware | Each concern (auth, rate-limit, transform) is isolated and testable. Easy to add new middleware. |
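| Credential storage | HashiCorp Vault, one secret path per (tenant, connector instance) | Never store credentials in the application DB. Per-tenant isolation via Vault policies. Audit trail of every secret read. Rotation is driven by the external system (e.g., OAuth refresh); Vault dynamic secrets don't apply here, since Vault cannot mint credentials for a customer's ServiceNow or Salesforce account. |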
| Rate limiting | Sliding window per (tenant, connector) | Respects external API limits. No burst spikes at window boundaries. Redis ZSET implementation. |
| Schema transformation | Declarative field mappings in JSON | Customers map their custom fields to the platform's canonical schema without code, e.g. `short_description` → `title`, `u_custom_field_1` → `business_unit`. |
A Template is like a Docker Image — it defines WHAT a connector can do. An Instance is like a Container — it's a running configuration for a specific tenant with their credentials and custom mappings.
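The template/instance split can be sketched as two types: an immutable template shared by all tenants and a per-tenant instance that references it. This is a minimal sketch under assumptions: the class names, `validate`, `effective_rate_limit`, and `supports` are hypothetical, and the capability list is abbreviated from the ServiceNow template below.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ConnectorTemplate:
    """Blueprint shared by every tenant: what a connector type can do (hypothetical type)."""
    template_id: str
    auth_types: tuple[str, ...]
    capabilities: frozenset[str]
    rate_limit_default: int              # requests/minute
    required_fields: tuple[str, ...] = ()


@dataclass
class ConnectorInstance:
    """One tenant's runtime configuration of a template (hypothetical type)."""
    instance_id: str
    tenant_id: str
    template: ConnectorTemplate
    config: dict[str, str]
    credential_ref: str                  # pointer into Vault, never the secret itself
    rate_limit_override: Optional[int] = None

    def validate(self) -> None:
        # Reject instances that omit fields the template declares as required.
        missing = [f for f in self.template.required_fields if f not in self.config]
        if missing:
            raise ValueError(f"{self.instance_id}: missing required fields {missing}")

    @property
    def effective_rate_limit(self) -> int:
        # A tenant override (e.g. the customer's plan limit) wins over the template default.
        if self.rate_limit_override is not None:
            return self.rate_limit_override
        return self.template.rate_limit_default

    def supports(self, capability: str) -> bool:
        return capability in self.template.capabilities


snow_template = ConnectorTemplate(
    template_id="servicenow-v2",
    auth_types=("oauth2", "basic_auth"),
    capabilities=frozenset({"read_tickets", "create_ticket", "update_ticket"}),
    rate_limit_default=500,
    required_fields=("instance_name",),
)
acme_snow = ConnectorInstance(
    instance_id="inst-acme-snow-001",
    tenant_id="acme-corp",
    template=snow_template,
    config={"instance_name": "acmecorp"},
    credential_ref="vault://acme-corp/servicenow/oauth",
    rate_limit_override=300,
)
acme_snow.validate()
print(acme_snow.effective_rate_limit)  # 300
```

Freezing the template mirrors the Docker-image analogy: instances vary, the blueprint does not.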
```
CONNECTOR TEMPLATE: ServiceNow
═══════════════════════════════════════════
{
  "template_id": "servicenow-v2",
  "name": "ServiceNow ITSM Connector",
  "version": "2.3.1",
  "auth_types": ["oauth2", "basic_auth"],
  "base_url_pattern": "https://{instance}.service-now.com",
  "capabilities": [
    "read_tickets",
    "create_ticket",
    "update_ticket",
    "list_groups",
    "get_user",
    "search_kb_articles"
  ],
  "api_version": "v2",
  "rate_limit_default": 500,        // requests/minute
  "required_fields": ["instance_name"],
  "optional_fields": ["custom_table_prefix"]
}
```
```
CONNECTOR INSTANCE: Acme Corp's ServiceNow
═══════════════════════════════════════════
{
  "instance_id": "inst-acme-snow-001",
  "tenant_id": "acme-corp",
  "template_id": "servicenow-v2",
  "config": {
    "instance_name": "acmecorp",
    "base_url": "https://acmecorp.service-now.com"
  },
  "credential_ref": "vault://acme-corp/servicenow/oauth",
  "field_mappings": {
    "short_description": "title",
    "assignment_group": "team",
    "u_custom_field_1": "business_unit",
    "u_location_code": "office_location"
  },
  "rate_limit_override": 300,       // Acme's ServiceNow plan limit
  "status": "active",
  "health_check_interval": 60       // seconds
}
```
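Applying `field_mappings` is a pure dictionary rename, so it fits in a few lines. A sketch assuming the mapping is keyed by the external field name (as in Acme's instance) and is one-to-one, so it can be inverted for create/update calls; `to_internal` and `to_external` are hypothetical names, and the type coercion step of TRANSFORM is omitted.

```python
def to_internal(record: dict, mappings: dict[str, str]) -> dict:
    """External API record -> platform's canonical fields.

    Assumes mappings are keyed by external field name; unmapped fields pass through.
    """
    return {mappings.get(name, name): value for name, value in record.items()}


def to_external(payload: dict, mappings: dict[str, str]) -> dict:
    """Canonical fields -> external API payload, for create/update calls.

    Inverts the mapping, which assumes it is one-to-one.
    """
    inverse = {internal: external for external, internal in mappings.items()}
    return {inverse.get(name, name): value for name, value in payload.items()}


acme_mappings = {
    "short_description": "title",
    "assignment_group": "team",
    "u_custom_field_1": "business_unit",
    "u_location_code": "office_location",
}

snow_record = {"short_description": "VPN down", "u_location_code": "NYC-3", "priority": "2"}
print(to_internal(snow_record, acme_mappings))
# {'title': 'VPN down', 'office_location': 'NYC-3', 'priority': '2'}
```

Passing unmapped fields through unchanged is a design choice; a stricter platform could drop or reject them instead.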
```
REQUEST FLOW THROUGH MIDDLEWARE CHAIN
═══════════════════════════════════════════════════════
  Incoming Request
       │
  ┌────v────┐        Inject credentials from Vault. Refresh OAuth tokens
  │  AUTH   │        automatically. Load mTLS client certs.
  └────┬────┘
       │
  ┌────v────────┐    Check sliding window per (tenant, connector).
  │ RATE LIMIT  │    429 if exceeded. Queue if near limit.
  └────┬────────┘
       │
  ┌────v──────────┐  Map internal schema → external API schema.
  │  TRANSFORM    │  Apply customer's field_mappings. Type coercion.
  └────┬──────────┘
       │
  ┌────v────────┐    HTTP call to external system. Connection pooling.
  │  EXECUTE    │    Timeout: 30s. Circuit breaker per instance.
  └────┬────────┘
       │
  ┌────v────┐        Re-runs EXECUTE with exponential backoff: 1s, 2s, 4s.
  │  RETRY  │        Max 3 retries. Only on 429, 503, 504 (honor Retry-After).
  └────┬────┘        NOT on 400, 401, 404. Retry writes only if idempotent.
       │
  ┌────v────┐        Full audit trail. Request/response (sanitized).
  │   LOG   │        Latency, status code, tenant, connector, operation.
  └────┬────┘
       │
       v
  Response to AI Agent
```
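One way to realize this chain is onion-style composition, sketched below under assumptions: every name (`build_chain`, `make_auth`, `make_retry`, `make_log`, `Request`, `Response`) is hypothetical, and RATE LIMIT and TRANSFORM are left out for brevity. The diagram's sequence maps to nesting: LOG is outermost so it records the final outcome, and RETRY sits directly around EXECUTE so it can re-invoke it.

```python
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Request:
    tenant_id: str
    connector_id: str
    operation: str
    headers: dict[str, str] = field(default_factory=dict)


@dataclass
class Response:
    status: int
    body: dict = field(default_factory=dict)


Handler = Callable[[Request], Response]
Middleware = Callable[[Request, Handler], Response]

RETRYABLE = {429, 503, 504}  # never 400, 401, 404


def _wrap(mw: Middleware, nxt: Handler) -> Handler:
    return lambda req: mw(req, nxt)


def build_chain(middlewares: list[Middleware], execute: Handler) -> Handler:
    """Nest middlewares so middlewares[0] is outermost and `execute` is innermost."""
    handler = execute
    for mw in reversed(middlewares):
        handler = _wrap(mw, handler)
    return handler


def make_auth(get_token: Callable[[Request], str]) -> Middleware:
    def auth(req: Request, nxt: Handler) -> Response:
        req.headers["Authorization"] = f"Bearer {get_token(req)}"
        return nxt(req)
    return auth


def make_retry(max_retries: int = 3, base_delay_s: float = 1.0,
               sleep: Callable[[float], None] = time.sleep) -> Middleware:
    def retry(req: Request, nxt: Handler) -> Response:
        resp = nxt(req)
        for attempt in range(max_retries):
            if resp.status not in RETRYABLE:
                break
            sleep(base_delay_s * 2 ** attempt)  # 1s, 2s, 4s
            resp = nxt(req)
        return resp
    return retry


def make_log(sink: list) -> Middleware:
    def log(req: Request, nxt: Handler) -> Response:
        start = time.monotonic()
        resp = nxt(req)
        sink.append({
            "tenant": req.tenant_id,
            "connector": req.connector_id,
            "operation": req.operation,
            "status": resp.status,
            "latency_ms": (time.monotonic() - start) * 1000,
        })
        return resp
    return log


# Usage: a fake EXECUTE that fails twice with 503, then succeeds.
statuses = iter([503, 503, 200])
sleeps: list[float] = []
audit: list[dict] = []
handler = build_chain(
    [make_log(audit), make_auth(lambda req: "tok-123"), make_retry(sleep=sleeps.append)],
    lambda req: Response(status=next(statuses)),
)
resp = handler(Request("acme-corp", "inst-acme-snow-001", "read_tickets"))
print(resp.status, sleeps)  # 200 [1.0, 2.0]
```

Injecting `sleep` keeps the retry policy testable without real waits. As written, the sketch retries any operation; non-idempotent writes would need an extra guard.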
```
SLIDING WINDOW RATE LIMITER (Redis ZSET)
═══════════════════════════════════════════════════════
Key:    rate_limit:{tenant_id}:{connector_id}
Score:  timestamp (Unix ms)
Member: unique request ID

ALGORITHM (per request; run all steps atomically, e.g. as one Lua script):
─────────────────────────────────────────────────────
1. ZADD key {now_ms} {request_id}            // Add this request
2. ZREMRANGEBYSCORE key 0 {now_ms - 60000}   // Remove requests older than 60s
3. count = ZCARD key                         // Count requests in window
4. IF count > limit:
       ZREM key {request_id}                 // Rejected requests must not use quota
       REJECT (429)                          // Over limit
   ELSE: ALLOW                               // Within limit
5. EXPIRE key 120                            // TTL cleanup safety net

EXAMPLE (limit = 5 requests/minute):
─────────────────────────────────────────────────────
Time    Action       Count (step 3)   Result
00:00   Request A    1                ALLOWED
00:15   Request B    2                ALLOWED
00:30   Request C    3                ALLOWED
00:45   Request D    4                ALLOWED
00:50   Request E    5                ALLOWED
00:55   Request F    6                REJECTED (429), F removed again
01:05   Request G    5                ALLOWED (A expired at 01:00)
01:20   Request H    5                ALLOWED (B expired at 01:15)
```
Fixed window problem: Limit is 100/min. User sends 100 requests at 0:59, then 100 more at 1:01. That's 200 requests in 2 seconds — the external API sees a burst and throttles us.
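Sliding window: each check counts exactly the requests in the trailing 60 seconds, so no 60-second span, including one that straddles a minute boundary, ever contains more than the limit. In the scenario above, the second batch of 100 is rejected until the first batch ages out.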
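A minimal in-memory sketch of the ZSET algorithm, replaying the example timeline (including removal of the rejected request, which the table's 01:05 row depends on). It assumes each key's timestamps arrive in non-decreasing order, so a deque per key can stand in for the ZSET; `SlidingWindowLimiter` and `allow` are hypothetical names. In production the same steps would run inside Redis as a single Lua script.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """In-memory stand-in for the Redis ZSET algorithm, one deque per key.

    Assumes each key's timestamps arrive in non-decreasing order, so the
    oldest entries are always at the left end of the deque.
    """

    def __init__(self, limit: int, window_ms: int = 60_000) -> None:
        self.limit = limit
        self.window_ms = window_ms
        self.entries: defaultdict[str, deque] = defaultdict(deque)

    def allow(self, key: str, now_ms: int, request_id: str) -> bool:
        window = self.entries[key]
        window.append((now_ms, request_id))              # 1. ZADD
        cutoff = now_ms - self.window_ms
        while window and window[0][0] <= cutoff:         # 2. ZREMRANGEBYSCORE 0 cutoff
            window.popleft()
        if len(window) > self.limit:                     # 3-4. ZCARD > limit
            window.pop()                                 # ZREM the rejected request
            return False
        return True


limiter = SlidingWindowLimiter(limit=5)
key = "rate_limit:acme-corp:inst-acme-snow-001"
timeline = [(0, "A"), (15, "B"), (30, "C"), (45, "D"),
            (50, "E"), (55, "F"), (65, "G"), (80, "H")]  # seconds
results = {rid: limiter.allow(key, sec * 1000, rid) for sec, rid in timeline}
print([rid for rid, ok in results.items() if not ok])  # ['F']
```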
```
OAUTH TOKEN LIFECYCLE
═══════════════════════════════════════════
1. Customer configures connector in UI
   → Redirects to external system's OAuth consent screen
   → Receives authorization code
2. Backend exchanges code for access_token + refresh_token
   → Stores both in Vault: vault://acme-corp/servicenow/oauth
   → Sets expires_at metadata
3. At runtime (each API call):
   → Auth middleware reads token from Vault
   → If expires_at < now + 5min:
       → Take a per-instance lock so only one request refreshes
       → Use refresh_token to get new access_token
       → Store new token in Vault (plus the new refresh_token,
         if the provider rotates it)
       → Use new token for request
   → Inject Authorization: Bearer {token}
4. If refresh fails (token revoked):
   → Mark instance as "auth_failed"
   → Notify customer: "Please re-authenticate ServiceNow"
   → Stop calling the external API for this instance; return a clear
     auth_failed error to the agent instead of raw upstream errors
```
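Steps 3 and 4 can be sketched with a lock and a re-check, so concurrent requests trigger a single refresh (important when the provider rotates refresh tokens). This is a sketch under assumptions: `TokenManager`, `Token`, `RefreshFailed`, and the `load`/`store`/`refresh` hooks are hypothetical stand-ins for Vault reads/writes and the provider's token endpoint, the clock is injectable for testing, and customer notification is omitted. A `threading.Lock` only covers one process; multiple runtime replicas would need a distributed lock.

```python
import threading
import time
from dataclasses import dataclass
from typing import Callable

REFRESH_SKEW_S = 300  # refresh if the token expires within 5 minutes


@dataclass
class Token:
    access_token: str
    refresh_token: str
    expires_at: float  # Unix seconds


class RefreshFailed(Exception):
    """Raised by the (hypothetical) refresh hook when the provider rejects the refresh_token."""


class TokenManager:
    """Per-instance token access; load/store/refresh are hypothetical hooks."""

    def __init__(self, load: Callable[[], Token], store: Callable[[Token], None],
                 refresh: Callable[[str], Token],
                 clock: Callable[[], float] = time.time) -> None:
        self._load, self._store = load, store
        self._refresh, self._clock = refresh, clock
        self._lock = threading.Lock()
        self.status = "active"

    def _needs_refresh(self, tok: Token) -> bool:
        return tok.expires_at < self._clock() + REFRESH_SKEW_S

    def auth_header(self) -> dict[str, str]:
        if self.status == "auth_failed":
            raise RefreshFailed("instance needs re-authentication")
        tok = self._load()
        if self._needs_refresh(tok):
            with self._lock:              # one refresh at a time per instance
                tok = self._load()        # another thread may have refreshed already
                if self._needs_refresh(tok):
                    try:
                        tok = self._refresh(tok.refresh_token)
                    except RefreshFailed:
                        self.status = "auth_failed"  # step 4: stop using this instance
                        raise
                    self._store(tok)      # also persists a rotated refresh_token
        return {"Authorization": f"Bearer {tok.access_token}"}


vault = {"acme-corp/servicenow/oauth": Token("old-access", "r1", expires_at=1_100)}
refreshed_with: list[str] = []


def fake_refresh(refresh_token: str) -> Token:
    refreshed_with.append(refresh_token)
    return Token("new-access", "r2", expires_at=1_000 + 3_600)


mgr = TokenManager(
    load=lambda: vault["acme-corp/servicenow/oauth"],
    store=lambda tok: vault.update({"acme-corp/servicenow/oauth": tok}),
    refresh=fake_refresh,
    clock=lambda: 1_000,
)
print(mgr.auth_header())  # {'Authorization': 'Bearer new-access'}
```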