Enterprise Knowledge Graph

"Design a knowledge graph that models org structure, systems, relationships, and permissions. The AI agent uses this graph for context-aware decision making."

Table of Contents

  1. Clarifying Questions & Scope
  2. Back-of-Envelope Estimation
  3. High-Level Architecture
  4. Deep Dive 1: Graph Schema
  5. Deep Dive 2: Sync Pipeline
  6. Deep Dive 3: Agent Integration
  7. Scaling & ML
  8. Cheat Sheet

1 Clarifying Questions & Scope

Dimension       Clarification                       Assumption
──────────────  ──────────────────────────────────  ──────────────────────────────────────────────────
Data Sources    Where does graph data come from?    HR systems, AD/LDAP, ITSM, CMDB, ticketing systems
Entity Types    What kinds of nodes?                Person, Team, Role, Application, Permission, Document, Ticket
Query Patterns  What questions does the agent ask?  "Who manages X?", "Who has access to Y?", "What team owns Z?"
Freshness       How current must the graph be?      Real-time for org changes, daily for CMDB/app data
Scale           How large?                          50K nodes/customer, 500K edges/customer, 350 customers

2 Back-of-Envelope Estimation

Scale Numbers

  • 17.5M nodes total (350 customers x 50K nodes avg)
  • 175M edges total (350 customers x 500K edges avg)
  • 1M graph queries/day across all tenants
  • <50ms for 2-hop traversal queries
  • Graph size per tenant: ~500 MB (50K nodes, 500K edges)
  • Total graph storage: ~175 GB
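
The totals above follow directly from the per-tenant assumptions. A quick sketch of the arithmetic in Python, which also derives two figures the original does not state (average query rate and implied bytes per graph element), assuming decimal units (1 GB = 1000 MB) and load spread evenly over the day:

```python
# Back-of-envelope check of the scale numbers above.
# Inputs are the document's own assumptions; avg_qps and bytes_per_element
# are derived here and are not figures from the original.
CUSTOMERS = 350
NODES_PER_TENANT = 50_000
EDGES_PER_TENANT = 500_000
QUERIES_PER_DAY = 1_000_000
MB_PER_TENANT = 500

total_nodes = CUSTOMERS * NODES_PER_TENANT            # 17.5M
total_edges = CUSTOMERS * EDGES_PER_TENANT            # 175M
total_storage_gb = CUSTOMERS * MB_PER_TENANT / 1000   # decimal GB

# Average load if queries were spread evenly over 24 hours (peaks will be higher).
avg_qps = QUERIES_PER_DAY / 86_400

# Implied storage budget per node or edge, properties and indexes included.
bytes_per_element = MB_PER_TENANT * 1_000_000 / (NODES_PER_TENANT + EDGES_PER_TENANT)

print(f"{total_nodes:,} nodes, {total_edges:,} edges, {total_storage_gb:.0f} GB")
print(f"~{avg_qps:.1f} queries/s average, ~{bytes_per_element:.0f} bytes per element")
```

The average rate is low (roughly a dozen queries per second), so under these assumptions the <50ms latency target is likely to be the harder constraint than throughput.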

3 High-Level Architecture

  ENTERPRISE KNOWLEDGE GRAPH
  ═══════════════════════════════════════════════════════════════════

  DATA SOURCES                SYNC PIPELINE               GRAPH STORE
  ┌──────────┐              ┌──────────────┐           ┌──────────────┐
  │ HR System│──webhook──-->│              │           │              │
  │ (Workday)│              │   Change     │           │   Neo4j      │
  ├──────────┤              │   Detection  │           │              │
  │ AD/LDAP  │──webhook──-->│      +       │──────────>│   Nodes:     │
  ├──────────┤              │   Conflict   │           │   Person     │
  │ ITSM     │──polling──-->│   Resolution │           │   Team       │
  │ (CMDB)   │              │      +       │           │   Role       │
  ├──────────┤              │   Validation │           │   Application│
  │ Ticketing│──polling──-->│              │           │   Permission │
  └──────────┘              └──────────────┘           │   Document   │
                                                       │   Ticket     │
                                                       └──────┬───────┘
                                                              │
                                                    ┌─────────v────────┐
                                                    │   Query Engine   │
                                                    │   Cypher queries │
                                                    │   Cached paths   │
                                                    └─────────┬────────┘
                                                              │
                                                    ┌─────────v────────┐
                                                    │   AI Agent       │
                                                    │   Context-aware  │
                                                    │   decisions      │
                                                    └──────────────────┘

4 Deep Dive 1: Graph Schema

Node Types

Node Type    Key Properties                                                 Source System
───────────  ─────────────────────────────────────────────────────────────  ──────────────────────
Person       name, email, employee_id, department, location, title, status  HR (Workday), AD/LDAP
Team         name, team_id, type (engineering/ops/support), size            HR, ServiceNow
Role         name, role_id, level (viewer/editor/admin), scope              IAM, AD
Application  name, app_id, type (SaaS/internal), criticality, owner_team    CMDB
Permission   permission_id, type (read/write/admin), scope, expiry          IAM, AD, App-specific
Document     title, doc_id, type, created_by, space, last_modified          Confluence, SharePoint
Ticket       ticket_id, type, status, priority, assignee, created_date      ServiceNow, Jira

Edge Types (Relationships)

Edge Type      From → To                  Properties
─────────────  ─────────────────────────  ─────────────────────────────────
MANAGES        Person → Person            since_date
MEMBER_OF      Person → Team              role_in_team, since_date
HAS_ROLE       Person → Role              granted_date, granted_by
HAS_ACCESS_TO  Person/Role → Application  access_level, granted_date, expiry
OWNS           Team → Application         ownership_type (primary/secondary)
CREATED_BY     Document/Ticket → Person   created_date
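
The edge table doubles as a validation rule for the sync pipeline: an edge is only well-formed if its endpoint labels match one of the allowed pairs. A minimal sketch in Python, where the `Edge` class and `is_valid_edge` helper are hypothetical names introduced for illustration:

```python
from dataclasses import dataclass

# Allowed (from label, to label) pairs per edge type, copied from the edge table.
EDGE_SCHEMA = {
    "MANAGES": {("Person", "Person")},
    "MEMBER_OF": {("Person", "Team")},
    "HAS_ROLE": {("Person", "Role")},
    "HAS_ACCESS_TO": {("Person", "Application"), ("Role", "Application")},
    "OWNS": {("Team", "Application")},
    "CREATED_BY": {("Document", "Person"), ("Ticket", "Person")},
}


@dataclass(frozen=True)
class Edge:
    """Hypothetical edge record: type plus the labels of its two endpoints."""
    edge_type: str
    from_label: str
    to_label: str


def is_valid_edge(edge: Edge) -> bool:
    """True if the edge type is known and connects an allowed label pair."""
    allowed = EDGE_SCHEMA.get(edge.edge_type, set())
    return (edge.from_label, edge.to_label) in allowed


print(is_valid_edge(Edge("HAS_ACCESS_TO", "Role", "Application")))  # True
print(is_valid_edge(Edge("MEMBER_OF", "Team", "Person")))           # False: wrong direction
```

Checking direction at write time catches reversed edges (a Team pointing at a Person with MEMBER_OF, for instance) before they can skew traversals.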

Example Graph Path

  EXAMPLE: "Who can approve Jane's access to Salesforce?"
  ═══════════════════════════════════════════════════════

  (Jane)──MEMBER_OF──>(Engineering Team)
    ^                       │
    │                       └──OWNS──>(Internal Tools)
    │
    └──MANAGES───(Sarah Chen - Manager)
                      │
                      └──HAS_ROLE──>(Approver Role)
                                         │
                                         └──HAS_ACCESS_TO──>(Salesforce)

  Traversal: Jane ← MANAGES ← Sarah → HAS_ROLE → Approver → HAS_ACCESS_TO → Salesforce
             (MANAGES points from manager to report, so it is followed in reverse)
  Answer: "Sarah Chen (Jane's manager) can approve Salesforce access.
           She has the Approver role with admin-level access."

  ANOTHER EXAMPLE: "What systems does the Security team own?"
  ═══════════════════════════════════════════════════════

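  (Security Team)──OWNS──>(Okta)
        ^           └──>(CrowdStrike)
        │           └──>(Vault)
        │           └──>(PagerDuty)
        │
        ├──MEMBER_OF───(Alice - Lead)
        ├──MEMBER_OF───(Bob - Engineer)
        └──MEMBER_OF───(Carol - Analyst)

  Traversal: Security Team → OWNS → Application
  Answer: "The Security team owns Okta, CrowdStrike, Vault, and PagerDuty."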
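
The approval example can be sketched as a traversal over an in-memory edge list. This is a Python stand-in for the graph store, not the Neo4j implementation. It assumes MANAGES points from the manager to the report, so finding Jane's manager means following that edge in reverse. The `find_approvers` helper is a hypothetical name.

```python
from collections import defaultdict

# Edges as (source, edge_type, target); names mirror the first example above.
EDGES = [
    ("Jane", "MEMBER_OF", "Engineering Team"),
    ("Engineering Team", "OWNS", "Internal Tools"),
    ("Sarah Chen", "MANAGES", "Jane"),  # assumption: manager -> report
    ("Sarah Chen", "HAS_ROLE", "Approver Role"),
    ("Approver Role", "HAS_ACCESS_TO", "Salesforce"),
]

outgoing = defaultdict(list)  # (source, edge_type) -> [targets]
incoming = defaultdict(list)  # (target, edge_type) -> [sources]
for src, edge_type, dst in EDGES:
    outgoing[(src, edge_type)].append(dst)
    incoming[(dst, edge_type)].append(src)


def find_approvers(person: str, application: str) -> list[str]:
    """Managers of `person` who hold a role with access to `application`."""
    approvers = []
    for manager in incoming[(person, "MANAGES")]:      # hop 1: reverse MANAGES
        for role in outgoing[(manager, "HAS_ROLE")]:   # hop 2: manager's roles
            if application in outgoing[(role, "HAS_ACCESS_TO")]:  # hop 3
                approvers.append(manager)
                break  # one qualifying role is enough
    return approvers


print(find_approvers("Jane", "Salesforce"))  # ['Sarah Chen']
```

Keeping a reverse index (`incoming`) alongside the forward one is what makes "who manages X?" as cheap as "whom does X manage?". A graph database provides both directions natively.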

5 Deep Dive 2: Sync Pipeline

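Dual Sync Strategy

Webhooks deliver changes in real time from sources that support them (HR, AD/LDAP); sources without webhooks (ITSM/CMDB, ticketing) are polled. A nightly batch sync covers completeness.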

Conflict Resolution

Source of Truth Hierarchy

When two sources disagree about the same fact, the source system of record wins:

  • Org structure (manager, department): HR system (Workday) is truth
  • Group memberships: AD/LDAP is truth
  • Application ownership: CMDB is truth
  • Permissions: IAM system is truth

If Workday says Jane reports to Sarah but ServiceNow says Jane reports to Mike, Workday wins. Always.
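
The hierarchy above amounts to a lookup from fact category to system of record. A minimal sketch in Python, where the category keys, source identifiers, and the `resolve` helper are illustrative names rather than anything from the original design:

```python
from typing import Optional

# System of record per fact category, from the hierarchy above.
# Keys and source identifiers are illustrative, not from the original.
SOURCE_OF_TRUTH = {
    "org_structure": "workday",
    "group_membership": "ad_ldap",
    "app_ownership": "cmdb",
    "permissions": "iam",
}


def resolve(category: str, claims: dict[str, str]) -> Optional[str]:
    """Return the value asserted by the system of record for `category`.

    `claims` maps source name -> claimed value. If the system of record
    made no claim, return None: the original does not say what should
    happen then, so this sketch declines to pick a winner.
    """
    return claims.get(SOURCE_OF_TRUTH[category])


# Workday and ServiceNow disagree about Jane's manager; Workday wins.
print(resolve("org_structure", {"workday": "Sarah", "servicenow": "Mike"}))  # Sarah
```

A static table like this keeps the rule auditable: changing which system wins for a category is a one-line change, not logic scattered through the pipeline.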

Change Detection

6 Deep Dive 3: Agent Integration

How the AI Agent Uses the Graph

1 Context Queries

When a user starts a conversation, the agent enriches context by querying the graph:

  User: jane@acme.com starts a conversation

  Agent queries graph:
  ─────────────────────────────────────────
  MATCH (p:Person {email: "jane@acme.com"})
  OPTIONAL MATCH (p)-[:MEMBER_OF]->(t:Team)
  // MANAGES points from manager to report, so Jane's manager is the source node
  OPTIONAL MATCH (m:Person)-[:MANAGES]->(p)
  OPTIONAL MATCH (p)-[:HAS_ACCESS_TO]->(a:Application)
  RETURN p, t, m, a

  Result enriches conversation context:
  "Jane is in Engineering, managed by Sarah,
   has access to Jira, GitHub, Salesforce."
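
Because the query chains several OPTIONAL MATCH clauses, it returns one row per combination of team, manager, and application, so the rows have to be collapsed before they become a summary. A minimal sketch in Python, assuming each row arrives as a dict of plain names keyed by the query's variables (the row shape and `build_context` are hypothetical; a real driver returns node objects):

```python
def build_context(rows: list[dict]) -> str:
    """Collapse cross-product rows into a one-line context summary."""
    if not rows:
        return "No graph record found for this user."

    def distinct(key: str) -> list[str]:
        # Preserve first-seen order while dropping duplicates and nulls.
        seen: list[str] = []
        for row in rows:
            value = row.get(key)
            if value is not None and value not in seen:
                seen.append(value)
        return seen

    person = rows[0]["p"]
    parts = []
    teams, managers, apps = distinct("t"), distinct("m"), distinct("a")
    if teams:
        parts.append("is in " + ", ".join(teams))
    if managers:
        parts.append("managed by " + ", ".join(managers))
    if apps:
        parts.append("has access to " + ", ".join(apps))
    if not parts:
        return f"{person} has no known relationships in the graph."
    return f"{person} " + ", ".join(parts) + "."


# Three rows: one team x one manager x three applications.
rows = [
    {"p": "Jane", "t": "Engineering", "m": "Sarah", "a": app}
    for app in ("Jira", "GitHub", "Salesforce")
]
print(build_context(rows))
# Jane is in Engineering, managed by Sarah, has access to Jira, GitHub, Salesforce.
```

An alternative is to aggregate inside the query itself (for example with `collect(DISTINCT ...)` in the RETURN clause), which keeps the result to a single row per person.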

2 Approval Routing (Traverse MANAGES)

3 Permission Checking (Traverse HAS_ACCESS_TO)

4 Smart Suggestions

7 Scaling & ML

Scaling Strategies

ML Enhancements

8 Cheat Sheet

Enterprise Knowledge Graph — Key Numbers

  • 17.5M nodes, 175M edges across 350 tenants
  • 1M graph queries/day, <50ms for 2-hop traversal
  • 7 node types: Person, Team, Role, Application, Permission, Document, Ticket
  • 6 edge types: MANAGES, MEMBER_OF, HAS_ROLE, HAS_ACCESS_TO, OWNS, CREATED_BY
  • Neo4j with per-tenant partitioning
  • Webhooks (real-time) + nightly batch (completeness)
  • Conflict resolution: source system of record wins
  • Agent uses graph for: context, approval routing, permission checks, suggestions
  • Graph embeddings (Node2Vec) for "similar people" queries
  • Link prediction for access recommendations