AI Ticket Triage System

"Design a system that routes IT tickets to the correct assignment group (out of hundreds) with >95% accuracy. Handle small data per customer and learn across organizations."

Why It Matters

Companies have 50-500 assignment groups. Manual triage takes 5-15 minutes per ticket. Misroutes add DAYS to resolution. The key challenge: each customer has too few tickets to train a good model alone. The answer is COLLECTIVE LEARNING.

Table of Contents

  1. Clarifying Questions & Scope
  2. Key Insight: The Small Data Problem
  3. Back-of-Envelope Estimation
  4. High-Level Architecture
  5. Deep Dive 1: Feature Engineering
  6. Deep Dive 2: Collective Learning
  7. Deep Dive 3: Confidence Routing
  8. Example Output
  9. Scaling & Feedback Loop
  10. Cheat Sheet

1 Clarifying Questions & Scope

Dimension | Clarification | Assumption
--------- | ------------- | ----------
Assignment Groups | How many groups per customer? | 50-500 groups per customer
Ticket Volume | Daily ticket volume per customer? | 100-10K tickets/day per customer
Cross-Customer Learning | Can we learn patterns across orgs? | Yes — Collective Learning (share weights, not data)
Input Fields | What ticket data is available? | All fields: short desc, description, category, subcategory, priority, department, location
Low Confidence | What happens when model is unsure? | Low-confidence tickets routed to human for manual triage

2 Key Insight: The Small Data Problem

THE MAIN CHALLENGE IS SMALL DATA. A single organization with 2,000 employees might have only 1,000 training examples across 50 groups. That's just 20 tickets per group on average — far too few for any ML model to learn reliable patterns. The solution: COLLECTIVE LEARNING across all customers.

Think of it this way: individually, each customer has too little data. But collectively, 350 customers generate millions of tickets. The challenge is learning shared patterns (like "password" relates to "access management") while respecting that each customer's group names and routing rules are different.

3 Back-of-Envelope Estimation

Scale Numbers

  • 350 customers x 1,000 tickets/day avg = 350K tickets/day (~4 tickets/sec average)
  • Peak load: ~15 tickets/second (roughly 4x the average)
  • Inference latency: <100ms (classification, NOT generation)
  • Nightly retraining per customer (~35 min total pipeline)
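
The scale numbers above reduce to a few lines of arithmetic. A quick sanity script (the 4x peak-to-average ratio is an assumption, chosen to land near the quoted ~15 tickets/sec):

```python
# Back-of-envelope check for the ticket-triage scale numbers.
CUSTOMERS = 350
AVG_TICKETS_PER_DAY = 1_000          # per customer
PEAK_TO_AVG = 4                      # assumed burstiness factor

daily_total = CUSTOMERS * AVG_TICKETS_PER_DAY      # 350,000 tickets/day
avg_per_sec = daily_total / 86_400                 # ~4 tickets/sec average
peak_per_sec = avg_per_sec * PEAK_TO_AVG           # ~16 tickets/sec peak
yearly_total = daily_total * 365                   # ~127.75M tickets/year

print(daily_total, round(avg_per_sec, 1), round(peak_per_sec), yearly_total)
```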

4 High-Level Architecture

  TICKET TRIAGE PIPELINE — 4 LAYERS
  ═══════════════════════════════════════════════════════════════════

  LAYER 1: FEATURE EXTRACTION
  ┌──────────────────────────────────────────────────────────────┐
  │  Incoming Ticket                                             │
  │  ┌─────────┬────────────┬──────┬─────────┬────────┬───────┐ │
  │  │ Short   │ Description│ Cat  │ SubCat  │Priority│ Dept  │ │
  │  │ Desc    │            │      │         │        │ + Loc │ │
  │  └────┬────┴─────┬──────┴──┬───┴────┬────┴───┬────┴───┬───┘ │
  │       └──────────┴─────────┴────────┴────────┴────────┘     │
  │                         ALL FIELDS                           │
  └──────────────────────────────┬───────────────────────────────┘
                                 │
  LAYER 2: BERT ENCODER          v
  ┌──────────────────────────────────────────────────────────────┐
  │  [SHORT] Cannot connect to VPN [DESC] Getting timeout error  │
  │  when trying to access corporate VPN from home [CAT] Network │
  │  [SUBCAT] VPN [PRIORITY] P2 [DEPT] Engineering [LOC] Remote  │
  │                                                              │
  │  Pre-trained BERT (shared across ALL customers)              │
  │  + Fine-tuned classification head (PER customer)             │
  └──────────────────────────────┬───────────────────────────────┘
                                 │
  LAYER 3: CONFIDENCE ROUTER     v
  ┌──────────────────────────────────────────────────────────────┐
  │  ┌──────────┐  ┌─────────────┐  ┌──────────────────────┐    │
  │  │ >0.95    │  │ 0.70-0.95   │  │ <0.70                │    │
  │  │AUTO-ROUTE│  │FLAG+SUGGEST │  │MANUAL TRIAGE         │    │
  │  │ (60-70%) │  │ (20-25%)    │  │ (5-10%)              │    │
  │  └──────────┘  └─────────────┘  └──────────────────────┘    │
  └──────────────────────────────┬───────────────────────────────┘
                                 │
  LAYER 4: FEEDBACK LOOP         v
  ┌──────────────────────────────────────────────────────────────┐
  │  Human corrections → Labeled data → Nightly retrain          │
  │  Misroutes tracked → Per-group accuracy dashboard            │
  └──────────────────────────────────────────────────────────────┘

5 Deep Dive 1: Feature Engineering

Use ALL Fields — Not Just Description

Classical ML approaches failed because they used only 1-2 fields (typically just the short description). The breakthrough insight is that ALL fields matter, especially structured fields like department and location.

Input Format

  TOKENIZED INPUT TO BERT:

  [SHORT] Cannot connect to VPN
  [DESC] Getting timeout error when trying to access corporate VPN
         from home office. Started after laptop update yesterday.
  [CAT] Network
  [SUBCAT] VPN
  [PRIORITY] P2
  [DEPT] Engineering
  [LOC] Remote
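
Building that tagged string is a simple serialization step ahead of the tokenizer. A minimal sketch; the field names and special tokens follow the format above, while the helper itself is illustrative:

```python
def serialize_ticket(ticket: dict) -> str:
    """Concatenate all ticket fields into one tagged string for the encoder.

    Special tokens like [SHORT] let the model learn per-field attention;
    they would be added to the tokenizer vocabulary during pre-training.
    """
    field_tags = [
        ("short_description", "[SHORT]"),
        ("description",       "[DESC]"),
        ("category",          "[CAT]"),
        ("subcategory",       "[SUBCAT]"),
        ("priority",          "[PRIORITY]"),
        ("department",        "[DEPT]"),
        ("location",          "[LOC]"),
    ]
    parts = []
    for field, tag in field_tags:
        value = (ticket.get(field) or "").strip()
        if value:  # skip empty fields rather than emit bare tags
            parts.append(f"{tag} {value}")
    return " ".join(parts)

ticket = {
    "short_description": "Cannot connect to VPN",
    "category": "Network",
    "subcategory": "VPN",
    "priority": "P2",
    "department": "Engineering",
    "location": "Remote",
}
print(serialize_ticket(ticket))
# [SHORT] Cannot connect to VPN [CAT] Network [SUBCAT] VPN [PRIORITY] P2 [DEPT] Engineering [LOC] Remote
```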

CRITICAL EXAMPLE — Why ALL Fields Matter:

The exact same ticket text "Cannot connect to VPN" routes to DIFFERENT groups depending on context:

Engineering + Remote → routes to "Network Security" (they manage VPN certificates for remote engineers)

Sales + London → routes to "EMEA Desktop Support" (regional team handles office connectivity)

Without department and location, you'd route both to the same group — and be wrong 50% of the time.

Feature Importance Ranking

Feature | Impact | Why
------- | ------ | ---
Short Description | HIGH | Core intent signal — what the user needs
Department | HIGH | Determines which team variant handles it
Location | HIGH | Regional routing (APAC vs EMEA vs Americas)
Category / SubCategory | MEDIUM | Pre-classification signal (if available)
Description | MEDIUM | Additional context, but noisy and verbose
Priority | LOW-MEDIUM | Some groups only handle P1s (e.g., "Major Incident")

6 Deep Dive 2: Collective Learning

The Problem

  THE SMALL DATA PROBLEM
  ═══════════════════════════════════════════════════

  Single Customer (Acme Corp):
  ┌─────────────────────────────────────────────────┐
  │  50 assignment groups                            │
  │  × 20 tickets per group (average)                │
  │  = 1,000 total training examples                 │
  │                                                  │
  │  That's like trying to teach someone 50 topics   │
  │  with only 20 flashcards each. NOT ENOUGH.       │
  └─────────────────────────────────────────────────┘

  All Customers Combined:
  ┌─────────────────────────────────────────────────┐
  │  350 customers × 1,000 tickets/day               │
  │  × 365 days = 127.75 MILLION tickets/year        │
  │                                                  │
  │  PLENTY of data to learn that "password"          │
  │  relates to "access management" concepts.         │
  └─────────────────────────────────────────────────┘

The Solution: 3-Stage Training

1 Pre-train BERT on ALL customers' data

The shared BERT base learns universal IT patterns across all 350 customers. It learns that "password" relates to "access", "VPN" relates to "network", "printer" relates to "hardware". These are universal IT concepts that transfer across organizations.

2 Fine-tune per customer with customer-specific classification head

Each customer gets their own classification head (final layers) that maps the shared representations to THEIR specific assignment groups:

  SHARED BERT BASE (trained on ALL customers)
  ┌──────────────────────────────────────────────┐
  │  "password" → [access_concept_vector]         │
  │  "VPN"      → [network_concept_vector]        │
  │  "printer"  → [hardware_concept_vector]       │
  └──────────────────────┬───────────────────────┘
                         │
          ┌──────────────┼──────────────┐
          │              │              │
  ┌───────v──────┐ ┌────v─────────┐ ┌──v──────────────┐
  │ Acme Head    │ │ Beta Head    │ │ Gamma Head       │
  │              │ │              │ │                  │
  │ "password" → │ │ "password" → │ │ "password" →     │
  │ Identity Team│ │ IAM Group    │ │ Access Mgmt Team │
  └──────────────┘ └──────────────┘ └──────────────────┘
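
The split can be sketched end to end with a toy encoder standing in for BERT (bag-of-words over a tiny shared vocabulary) and nearest-centroid heads standing in for the fine-tuned classification layers. All names and example tickets are illustrative:

```python
import math

# Toy stand-in for the shared BERT base: a bag-of-words encoder over a
# vocabulary learned from ALL customers. One encoder, every tenant.
VOCAB = ["password", "reset", "vpn", "timeout", "printer", "access", "forgot"]

def shared_encode(text: str) -> list[float]:
    tokens = text.lower().split()
    vec = [float(tokens.count(w)) for w in VOCAB]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

class CustomerHead:
    """Per-customer classification head: maps the shared representation to
    THAT customer's assignment groups. Small, cheap to retrain nightly."""
    def __init__(self, weights: dict[str, list[float]]):
        self.weights = weights  # one weight vector per assignment group

    def predict(self, embedding: list[float]) -> tuple[str, float]:
        scores = {g: sum(w * e for w, e in zip(wv, embedding))
                  for g, wv in self.weights.items()}
        mx = max(scores.values())
        exps = {g: math.exp(s - mx) for g, s in scores.items()}
        total = sum(exps.values())
        best = max(exps, key=exps.get)
        return best, exps[best] / total  # (group, softmax confidence)

def fit_head(examples: dict[str, list[str]]) -> CustomerHead:
    """Toy 'fine-tuning': each group's weights are the centroid of its
    example embeddings (a nearest-centroid classifier)."""
    return CustomerHead({
        g: [sum(col) / len(texts) for col in
            zip(*(shared_encode(t) for t in texts))]
        for g, texts in examples.items()
    })

# Same shared encoder, different heads: "password" maps to each
# customer's own group name, as in the diagram above.
acme = fit_head({"Identity Team": ["password reset", "password expired"],
                 "Network Ops":   ["vpn down", "vpn timeout"]})
beta = fit_head({"IAM Group":     ["password locked out"],
                 "Infra":         ["server vpn outage"]})

emb = shared_encode("forgot my password")
print(acme.predict(emb)[0])  # Identity Team
print(beta.predict(emb)[0])  # IAM Group
```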

3 Transfer learning for brand-new customers (0 tickets)

New customer onboards with zero historical data. Use the shared BERT base + a generic classification head trained on similar-sized companies. Within 100 tickets of feedback, the customer-specific head starts outperforming the generic one.

Privacy Guarantee

We share WEIGHTS, not DATA. No customer ever sees another customer's tickets. The shared BERT base is trained on aggregated patterns — it learns that "password" relates to access concepts, not that "John from Acme" had a password issue. This is the same principle behind federated learning.

7 Deep Dive 3: Confidence Routing

Three-Band Routing

Band | Confidence | Action | % of Tickets
---- | ---------- | ------ | ------------
AUTO-ROUTE | >0.95 | Route immediately, no human review | 60-70%
FLAG | 0.70-0.95 | Suggest group, human confirms/corrects | 20-25%
MANUAL | <0.70 | Route to manual triage queue | 5-10%
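
The three-band policy is a small pure function. The thresholds below are the defaults from the table; per-customer tuning would override them:

```python
def route(confidence: float, auto_threshold: float = 0.95,
          flag_threshold: float = 0.70) -> str:
    """Map a calibrated confidence score to one of the three routing bands.

    Thresholds are per-customer knobs: more groups, or a higher cost of
    misrouting, argue for raising them.
    """
    if confidence > auto_threshold:
        return "AUTO_ROUTE"  # route immediately, no human in the loop
    if confidence >= flag_threshold:
        return "FLAG"        # suggest a group, human confirms/corrects
    return "MANUAL"          # manual triage queue

print(route(0.98))  # AUTO_ROUTE
print(route(0.82))  # FLAG
print(route(0.61))  # MANUAL
```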

Why Variable Thresholds?

Thresholds are tuned per customer. A customer with 500 groups needs higher confidence than one with 50 groups (more room for confusion). A customer in healthcare needs higher confidence than one in retail (higher cost of misroute). Result: 96% accuracy on auto-routed tickets across the board.

Confidence Calibration

Raw softmax scores from a neural classifier are typically overconfident, which makes fixed thresholds meaningless. Calibrate on held-out, human-verified tickets (for example with temperature scaling) so that a reported 0.95 really corresponds to roughly 95% accuracy before applying the band thresholds.
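
One standard calibration technique is temperature scaling: divide the logits by a scalar T fitted on held-out labels. A minimal sketch with toy, made-up validation data; a grid search stands in for the usual gradient-based fit:

```python
import math

def softmax(logits, temperature=1.0):
    scaled = [z / temperature for z in logits]
    mx = max(scaled)
    exps = [math.exp(z - mx) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def fit_temperature(val_logits, val_labels):
    """Grid-search T to minimize negative log-likelihood on held-out,
    human-corrected tickets. T > 1 softens overconfident predictions."""
    def nll(t):
        return -sum(math.log(softmax(z, t)[y])
                    for z, y in zip(val_logits, val_labels))
    candidates = [0.5 + 0.1 * i for i in range(41)]  # 0.5 .. 4.5
    return min(candidates, key=nll)

# Toy overconfident model: large logit gaps, but some predictions wrong.
val_logits = [[4.0, 0.0, 0.0], [3.5, 0.2, 0.0],
              [3.0, 2.8, 0.1], [4.2, 0.1, 0.3]]
val_labels = [0, 0, 1, 2]

T = fit_temperature(val_logits, val_labels)
print(T > 1.0)  # True: calibration softens the confidences
```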

8 Example Output

Ticket | Predicted Group | Confidence | Action
------ | --------------- | ---------- | ------
"Password reset for SAP" | Identity & Access Mgmt | 0.98 | AUTO-ROUTE
"Laptop screen flickering" | Desktop Support - HQ | 0.96 | AUTO-ROUTE
"Need access to Salesforce" | SaaS Provisioning | 0.82 | FLAG (suggest)
"Application running slow" | App Support? Infra? | 0.61 | MANUAL TRIAGE
"New hire setup for Tokyo" | APAC Onboarding | 0.93 | FLAG (suggest)

9 Scaling & Feedback Loop

Model Architecture & Serving

Serving splits along the same line as training: one shared BERT base held in memory per inference node, with the lightweight per-customer head selected per request by tenant ID. Because each head is only the final layers, inference stays within the <100ms budget. Heads retrain nightly per customer; the shared base is updated weekly from aggregated, weights-only learning.

Data Flywheel

  THE DATA FLYWHEEL
  ═══════════════════════════════════════════════════

  ┌──────────────┐     ┌──────────────┐     ┌──────────────┐
  │  Ticket      │────>│  ML Model    │────>│  Auto-Route  │
  │  Submitted   │     │  Predicts    │     │  or Flag     │
  └──────────────┘     └──────────────┘     └──────┬───────┘
         ^                                         │
         │                                         v
  ┌──────┴───────┐     ┌──────────────┐     ┌──────────────┐
  │  Nightly     │<────│  Labeled     │<────│  Human       │
  │  Retrain     │     │  Data Store  │     │  Correction  │
  └──────────────┘     └──────────────┘     └──────────────┘

  Every correction makes the model smarter.
  More accuracy → more auto-routes → less human work → faster resolution.
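
The correction path in the flywheel is just an append to the labeled data store. A sketch; the record schema and names are illustrative:

```python
from datetime import datetime, timezone

def record_outcome(ticket_id, predicted, final, confidence, store):
    """Append one feedback record to the labeled data store. A human
    correction (final != predicted) is the highest-value training label."""
    store.append({
        "ticket_id": ticket_id,
        "predicted": predicted,
        "label": final,                        # ground truth for retraining
        "was_correction": final != predicted,
        "confidence": confidence,
        "ts": datetime.now(timezone.utc).isoformat(),
    })

labeled_store = []
record_outcome("T-1001", "Desktop Support", "Hardware Team", 0.81, labeled_store)
record_outcome("T-1002", "IAM Group", "IAM Group", 0.97, labeled_store)

corrections = [r for r in labeled_store if r["was_correction"]]
print(len(corrections))  # 1 misroute feeds the nightly retrain
```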

Monitoring Metrics

Metric | Example Value | Action Trigger
------ | ------------- | --------------
Overall Accuracy | 96.2% | Alert if drops below 94%
Auto-Route Rate | 67% | Investigate if drops below 55%
Per-Group Accuracy (worst) | "EMEA Infra": 89% | Flag groups below 90% for review
Common Misroute Pair | "Desktop Support" ↔ "Hardware" | Consider merging groups or adding features
New Group Detection | 3 new groups this month | Auto-trigger retraining with new labels
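
The trigger column maps directly onto simple alert rules. A sketch using the thresholds from the table; the metric names and `check_alerts` helper are illustrative:

```python
def check_alerts(metrics: dict) -> list[str]:
    """Evaluate the monitoring thresholds from the dashboard table."""
    alerts = []
    if metrics["overall_accuracy"] < 0.94:
        alerts.append("ACCURACY_DROP: overall accuracy below 94%")
    if metrics["auto_route_rate"] < 0.55:
        alerts.append("AUTO_ROUTE_DROP: auto-route rate below 55%")
    for group, acc in metrics["per_group_accuracy"].items():
        if acc < 0.90:
            alerts.append(f"GROUP_REVIEW: {group} accuracy below 90%")
    if metrics["new_groups_detected"] > 0:
        alerts.append("RETRAIN: new assignment groups need labels")
    return alerts

metrics = {
    "overall_accuracy": 0.962,
    "auto_route_rate": 0.67,
    "per_group_accuracy": {"EMEA Infra": 0.89, "Desktop Support": 0.97},
    "new_groups_detected": 3,
}
for alert in check_alerts(metrics):
    print(alert)
# GROUP_REVIEW: EMEA Infra accuracy below 90%
# RETRAIN: new assignment groups need labels
```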

10 Cheat Sheet

AI Ticket Triage — Key Numbers

  • 350K tickets/day, 15 tickets/sec peak
  • <100ms inference (classification, not generation)
  • Collective Learning: shared BERT base + per-customer head
  • Use ALL fields: short desc + desc + cat + subcat + priority + dept + location
  • 3-band confidence: >0.95 auto, 0.70-0.95 flag, <0.70 manual
  • 96% accuracy on auto-routed tickets
  • 60-70% of tickets auto-routed (no human needed)
  • Nightly retraining per customer, weekly shared base update
  • New customer: transfer learning from shared base, effective within 100 tickets
  • Share WEIGHTS not DATA (privacy preserved)