"Design a search system that indexes Confluence, SharePoint, Slack, Google Drive, and ServiceNow KB. Users ask natural language queries. Must enforce per-user access permissions. Multi-tenant architecture with <500ms retrieval latency."
| Dimension | Clarification | Assumption |
|---|---|---|
| Document Volume | How many docs per customer? | 100K - 10M documents per customer |
| Freshness | How quickly must new content be searchable? | Fresh within minutes (near real-time) |
| Access Control | How are permissions enforced? | ACL-based permissions per document/space |
| Tenant Isolation | How strict is data isolation? | Strict isolation — zero cross-tenant leakage |
| Answer Generation | Return docs or generate answers? | Generate answers with citations to source docs |
| Metric | Target |
|---|---|
| Retrieval Latency | <500ms (vector + keyword + rerank) |
| End-to-End (with answer) | <3 seconds (including LLM generation) |
| Ingestion Throughput | 10K documents/minute per connector |
| Freshness SLA | <5 minutes from source update |
| Permission Accuracy | 100% (zero unauthorized access) |
DATA SOURCES         INGESTION PIPELINE                                           STORAGE

+------------+
| Confluence |--+
+------------+  |    +------------+    +---------+    +----------+    +---------+    +-------------------+
| SharePoint |--+    | Connectors |    | Extract |    | Chunk    |    | Embed   |    | Vector DB         |
+------------+  +--->| REST /     |--->| Text    |--->| 500-1K   |--->| ada-002 |--->| (Pinecone/Qdrant) |
| Slack      |--+    | Graph API  |    | (Tika)  |    | tokens,  |    +---------+    +-------------------+
+------------+  |    | Webhooks   |    +---------+    | 100-tok  |
| Google Drv |--+    | + Polling  |                   | overlap  |------------------>+-------------------+
+------------+  |    +------------+                   +----------+                   | Elasticsearch     |
| ServiceNow |--+                                                                    | (BM25 keyword)    |
+------------+                                                                       +-------------------+

                                                                                     +-------------------+
                                                                                     | Metadata Store    |
                                                                                     | (PostgreSQL)      |
                                                                                     +-------------------+
EACH CHUNK STORES:
text | source_url | author | timestamp | section_title | ACL_metadata | tenant_id
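For concreteness, the chunk record above and the sliding-window chunking from the pipeline (500-1K tokens with 100-token overlap) might be sketched as follows; all names here are illustrative, not part of any library:

```python
from dataclasses import dataclass

# Illustrative chunk record mirroring the fields listed above.
@dataclass
class Chunk:
    text: str
    source_url: str
    author: str
    timestamp: str            # ISO-8601 timestamp from the source system
    section_title: str
    acl_metadata: list[str]   # principals (users/groups) allowed to read it
    tenant_id: str            # hard isolation key, filtered on every query

def chunk_tokens(tokens: list[str], size: int = 500,
                 overlap: int = 100) -> list[list[str]]:
    """Sliding-window chunking: windows of `size` tokens, with `overlap`
    tokens shared between consecutive windows so context isn't cut cold."""
    step = size - overlap
    return [tokens[i:i + size]
            for i in range(0, max(len(tokens) - overlap, 1), step)]
```

A 1,000-token document yields three chunks here, each sharing its first 100 tokens with the tail of the previous chunk.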
USER QUERY FLOW
+--------+    +-------+    +----------+    +------------------+    +------------+
| User   |--->| NLU   |--->| Query    |--->| Hybrid Retrieval |--->| Permission |
| Query  |    | Intent|    | Expansion|    |                  |    | Filter     |
+--------+    +-------+    | Synonyms |    | Vector (cosine)  |    +-----+------+
                           +----------+    | + BM25 (keyword) |          |
                                           +------------------+          v
                                                                   +------------+
+--------+    +----------+    +---------+                          | Re-rank    |
| User   |<---| Answer   |<---| LLM     |<-------------------------| Cross-enc  |
|Response|    | + Cites  |    | Generate|                          | Top 10     |
+--------+    +----------+    +---------+                          +------------+
Primary: Webhooks for real-time updates (<1 min latency). Each source sends change events to our ingestion queue.
Fallback: Polling every 5 minutes to catch missed webhooks. Reconciliation job compares source timestamps vs. our last-indexed timestamps. This guarantees no document is missed even if webhooks fail silently.
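A minimal sketch of that reconciliation pass, assuming two hypothetical lookups: `source_ts` built from the connector's listing API and `indexed_ts` from our metadata store:

```python
def find_stale_docs(source_ts: dict[str, float],
                    indexed_ts: dict[str, float]) -> list[str]:
    """Return doc IDs updated at the source after our last index, or never
    indexed at all (e.g. a webhook that was dropped silently). These get
    re-enqueued for ingestion."""
    return [doc_id for doc_id, ts in source_ts.items()
            if ts > indexed_ts.get(doc_id, 0.0)]
```

Because the comparison treats "never indexed" as timestamp 0, a document whose create-webhook was lost is caught on the next polling cycle.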
Query: "How do I reset my password?"
Matches document titled: "Credential Recovery Procedures"
BM25 would miss this entirely — no keyword overlap. Vector search captures semantic similarity.
Query: "VPN-2847 error code"
Matches document containing: "Error VPN-2847: Certificate expired"
Vector search might return generic VPN docs. BM25 nails the exact error code.
Combine results from both retrieval methods using RRF:
RRF Score Formula:
─────────────────────────────────────────────────────

    score(doc) = 1/(k + rank_vector) + 1/(k + rank_keyword)

where k = 60 (standard constant)

Example:

┌──────────┬─────────────┬───────────┬───────────┐
│ Document │ Vector Rank │ BM25 Rank │ RRF Score │
├──────────┼─────────────┼───────────┼───────────┤
│ Doc A    │ 1           │ 5         │ 0.0318    │
│ Doc B    │ 3           │ 2         │ 0.0320    │ ← Winner
│ Doc C    │ 2           │ 8         │ 0.0308    │
│ Doc D    │ 10          │ 1         │ 0.0307    │
└──────────┴─────────────┴───────────┴───────────┘

Doc B ranks best overall — good in BOTH systems.
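The fusion step can be sketched in a few lines of Python; `rrf_fuse` is a hypothetical helper (not a library call), shown with the ranks from the example:

```python
# Reciprocal Rank Fusion over the two ranked lists. Ranks are 1-based,
# matching the example table.
def rrf_fuse(vector_ranks: dict[str, int],
             bm25_ranks: dict[str, int],
             k: int = 60) -> dict[str, float]:
    """score(doc) = 1/(k + rank_vector) + 1/(k + rank_keyword).
    A doc missing from one list simply contributes nothing for that list."""
    scores: dict[str, float] = {}
    for ranks in (vector_ranks, bm25_ranks):
        for doc, rank in ranks.items():
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return scores

# The worked example: vector ranks vs. BM25 ranks for docs A-D.
scores = rrf_fuse({"A": 1, "B": 3, "C": 2, "D": 10},
                  {"A": 5, "B": 2, "C": 8, "D": 1})
winner = max(scores, key=scores.get)  # "B" (strong in BOTH lists)
```

Note that RRF only needs ranks, not raw scores, which sidesteps the problem of calibrating cosine similarities against BM25 scores.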
CRITICAL REQUIREMENT: A user must NEVER see content they don't have access to in the source system. This is a compliance and trust requirement — a single violation can lose a customer. Permission filtering is not optional; it is the #1 priority.
PERMISSION FILTERING FLOW
─────────────────────────────────────────────────────────────
Query: "How to configure SSO?" User: jane@acme.com
┌──────────────────┐
│ Hybrid Retrieval │ → Top 200 results (no permission check yet)
└────────┬─────────┘
│
┌────────v─────────┐
│ Resolve User ACL │ → jane@acme.com is in: [engineering, sso-admins, all-staff]
│ (Redis cache 5m) │
└────────┬─────────┘
│
┌────────v─────────┐
│ Filter by ACL │ → 200 results → 47 accessible → Top 10
│ chunk.acl ∩ user │
└────────┬─────────┘
│
┌────────v─────────┐
│ Cross-Encoder │ → Re-rank top 10 → Final 5 for answer generation
│ Re-rank │
└────────┬─────────┘
│
┌────────v─────────┐
│ LLM Answer Gen │ → "To configure SSO, follow these steps... [1][2][3]"
│ with Citations │
└──────────────────┘
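The "Filter by ACL" step above can be sketched as follows. Group resolution (the Redis-cached lookup) is stubbed out as a plain set, and `RetrievedChunk` is an illustrative type whose `acl` field holds the principals allowed to read the chunk in the source system:

```python
from dataclasses import dataclass

@dataclass
class RetrievedChunk:
    doc_id: str
    score: float
    acl: frozenset[str]   # principals allowed to read this chunk

def filter_by_acl(results: list[RetrievedChunk],
                  user_principals: set[str],
                  top_k: int = 10) -> list[RetrievedChunk]:
    """Keep only chunks whose ACL intersects the user's resolved principals
    (chunk.acl ∩ user), then keep the top_k highest-scoring survivors."""
    allowed = [r for r in results if r.acl & user_principals]
    return sorted(allowed, key=lambda r: r.score, reverse=True)[:top_k]
```

Filtering after retrieval but before re-ranking keeps the expensive cross-encoder off content the user could never see anyway.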
| Metric | Target | How Measured |
|---|---|---|
| MRR (Mean Reciprocal Rank) | >0.65 | Position of first relevant result in top-10 |
| NDCG@10 | >0.70 | Graded relevance of top-10 results |
| Answer Accuracy | >90% | Human eval of LLM-generated answers (weekly sample) |
| Citation Accuracy | >95% | Do citations actually support the generated answer? |
| Permission Accuracy | 100% | Automated audit: compare returned results against source ACLs |
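The automated permission audit in the last row could be sketched like this: for a sample of served results, re-check every returned doc against a fresh pull of the source-of-truth ACLs. All three input mappings are hypothetical names for illustration:

```python
def audit_permissions(served: dict[str, list[str]],
                      user_principals: dict[str, set[str]],
                      source_acl: dict[str, set[str]]) -> list[tuple[str, str]]:
    """Return (user, doc_id) pairs where a served doc is NOT readable by that
    user according to the source ACLs. The target is always an empty list."""
    violations: list[tuple[str, str]] = []
    for user, doc_ids in served.items():
        principals = user_principals.get(user, set())
        for doc_id in doc_ids:
            if not (source_acl.get(doc_id, set()) & principals):
                violations.append((user, doc_id))
    return violations
```

Any non-empty result is a page-the-on-call event, since the target for permission accuracy is 100%, not a percentage to trend.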