"Design a memory system with three layers: short-term (current conversation), working (intermediate computation results), and long-term (user preferences, learned patterns across sessions)."
| Memory Layer | Storage | TTL | Access Pattern |
|---|---|---|---|
| Short-term | Redis | 24 hours | Sequential reads, append-only during conversation |
| Working | Redis | 1 hour | Random reads/writes during task execution |
| Long-term | PostgreSQL + pgvector | Configurable (90 days to forever) | Semantic search, structured queries |
3-LAYER MEMORY ARCHITECTURE
═══════════════════════════════════════════════════════════════════
┌─────────────────────────────────────────────────────────────┐
│                   CONTEXT WINDOW MANAGER                     │
│  Assembles the optimal context for each LLM call:            │
│    1. System prompt (fixed)                                   │
│    2. Relevant long-term memories (semantic search)           │
│    3. Working memory (current task state)                     │
│    4. Recent conversation (last N turns)                      │
│    5. Current user message                                    │
└───────┬──────────────────┬──────────────────┬───────────────┘
        │                  │                  │
┌───────v──────┐   ┌───────v──────┐  ┌────────v────────────┐
│  SHORT-TERM  │   │   WORKING    │  │      LONG-TERM      │
│              │   │              │  │                     │
│    Redis     │   │    Redis     │  │     PostgreSQL      │
│  TTL: 24hr   │   │   TTL: 1hr   │  │     + pgvector      │
│              │   │              │  │                     │
│ Conversation │   │ Intermediate │  │  User preferences   │
│   messages   │   │   results    │  │ Interaction history │
│  (ordered)   │   │ (key-value)  │  │  Learned patterns   │
│              │   │              │  │   Semantic search   │
│    ~8 GB     │   │   ~16.8 GB   │  │      ~3.78 TB       │
└──────────────┘   └──────────────┘  └─────────────────────┘
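The context window manager's fixed assembly order (1–5) can be sketched as a small function. This is an illustrative sketch, not the actual implementation: all names are hypothetical, and token counting is approximated by whitespace splitting where a real system would use the model's tokenizer.

```python
# Hypothetical sketch of the context window manager's assembly order.
# Token counting here is a whitespace approximation; a real system
# would use the model's tokenizer.

def count_tokens(text: str) -> int:
    return len(text.split())

def assemble_context(system_prompt, long_term_memories, working_memory,
                     recent_turns, user_message, budget=200):
    """Build the prompt in priority order 1-5; when the budget is
    exceeded, drop the oldest conversation turns first."""
    fixed = [system_prompt] + long_term_memories + [working_memory, user_message]
    used = sum(count_tokens(part) for part in fixed)
    kept_turns = []
    # Walk turns newest-first, keeping as many as the budget allows.
    for turn in reversed(recent_turns):
        cost = count_tokens(turn)
        if used + cost > budget:
            break
        kept_turns.append(turn)
        used += cost
    kept_turns.reverse()  # restore chronological order
    return "\n".join([system_prompt, *long_term_memories,
                      working_memory, *kept_turns, user_message])
```

The key design point this captures: the fixed parts (system prompt, retrieved memories, task state, user message) always make it in, and recent conversation absorbs whatever budget remains.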
SHORT-TERM MEMORY (Redis)
═══════════════════════════════════════════
Key: conv:{conversation_id}
Type: Redis List (ordered messages)
TTL: 24 hours
LPUSH conv:abc123 {
  "role": "user",
  "content": "I need VPN access",
  "timestamp": "2026-03-15T10:00:00Z",
  "tokens": 8
}

LPUSH conv:abc123 {
  "role": "assistant",
  "content": "I'll help you with VPN access. Which office?",
  "timestamp": "2026-03-15T10:00:02Z",
  "tokens": 14,
  "tool_calls": []
}
LRANGE conv:abc123 0 -1  → Full conversation history (newest first, since LPUSH prepends)
LRANGE conv:abc123 0 9   → Last 10 messages
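The LPUSH/LRANGE pattern above maps onto a couple of small helpers. This is a sketch using redis-py call shapes (`lpush`, `lrange`, `expire`); the client is injected so the functions work against any object exposing those methods, and the function names are illustrative.

```python
# Sketch of the short-term layer. Assumes a redis-py-style client
# (lpush/lrange/expire); helper names are illustrative.
import json
import time

CONV_TTL_SECONDS = 24 * 3600  # the 24-hour short-term TTL

def append_message(client, conversation_id, role, content, tokens):
    key = f"conv:{conversation_id}"
    message = json.dumps({
        "role": role,
        "content": content,
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "tokens": tokens,
    })
    client.lpush(key, message)             # newest message at the head
    client.expire(key, CONV_TTL_SECONDS)   # refresh TTL on every append

def last_n_messages(client, conversation_id, n=10):
    # LPUSH puts newest first, so indexes 0..n-1 are the most recent n.
    raw = client.lrange(f"conv:{conversation_id}", 0, n - 1)
    return [json.loads(m) for m in raw]
```

Refreshing the TTL on each append means the 24-hour window is measured from the last message, not the first, so an active conversation never expires mid-session.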
When a conversation exceeds the context budget, older messages are summarized: the oldest turns are replaced with a single compact summary message, while the most recent turns are kept verbatim.
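One way to sketch that budget check: keep the newest turns verbatim and collapse everything older into one summary entry. This is illustrative only; `summarize` is a stand-in for an LLM summarization call, and the split point (half the budget for verbatim turns) is an assumed policy, not one stated above.

```python
# Illustrative budget-driven summarization. `summarize` is a
# placeholder for an LLM call; the half-budget split is an assumption.

def summarize(messages):
    # Placeholder: a real system would call an LLM here.
    return "Summary of %d earlier messages." % len(messages)

def enforce_budget(messages, budget):
    """messages: list of dicts with 'content' and 'tokens', oldest first."""
    total = sum(m["tokens"] for m in messages)
    if total <= budget:
        return messages
    kept, spent = [], 0
    # Keep the newest messages that fit within half the budget.
    for m in reversed(messages):
        if spent + m["tokens"] > budget // 2:
            break
        kept.append(m)
        spent += m["tokens"]
    kept.reverse()
    older = messages[:len(messages) - len(kept)]
    # Assumed fixed token cost for the summary entry.
    summary = {"role": "system", "content": summarize(older), "tokens": 10}
    return [summary] + kept
```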
When an agent executes a multi-step workflow, it needs to remember intermediate results. Working memory stores these temporarily.
Step 1: Look up user in HR system → Working memory stores: { "manager": "Sarah Chen", "manager_id": "SC-4521" }
Step 2: Query approval system with manager_id → Working memory stores: { "pending_approvals": [{ "id": "APR-892", "type": "VPN Access", "status": "pending" }] }
Step 3: Compose answer using both results from working memory: "Your manager is Sarah Chen. She has one pending approval for you: VPN Access request (APR-892)."
WORKING MEMORY (Redis Hash)
═══════════════════════════════════════════
Key: work:{conversation_id}:{task_id}
Type: Redis Hash
TTL: 1 hour
HSET work:abc123:task-001 "step_1_result" '{
  "manager": "Sarah Chen",
  "manager_id": "SC-4521",
  "source": "workday_api",
  "retrieved_at": "2026-03-15T10:00:01Z"
}'

HSET work:abc123:task-001 "step_2_result" '{
  "pending_approvals": [...],
  "source": "approval_system",
  "retrieved_at": "2026-03-15T10:00:03Z"
}'

HSET work:abc123:task-001 "plan" '{
  "steps": ["lookup_manager", "check_approvals", "compose_answer"],
  "current_step": 2,
  "status": "in_progress"
}'
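The HSET examples above translate into a small read/write layer, where step 3 composes its answer entirely from values written by steps 1 and 2. This is a sketch assuming a redis-py-style client (`hset`, `hget`, `expire`); function names are illustrative.

```python
# Sketch of working memory on a Redis hash. Assumes a redis-py-style
# client; field names mirror the HSET examples, everything else is
# illustrative.
import json

WORK_TTL_SECONDS = 3600  # working memory lives 1 hour

def store_step_result(client, conversation_id, task_id, step, result):
    key = f"work:{conversation_id}:{task_id}"
    client.hset(key, f"step_{step}_result", json.dumps(result))
    client.expire(key, WORK_TTL_SECONDS)

def load_step_result(client, conversation_id, task_id, step):
    raw = client.hget(f"work:{conversation_id}:{task_id}", f"step_{step}_result")
    return json.loads(raw) if raw else None

def compose_answer(client, conversation_id, task_id):
    # Step 3 reads both earlier results back out of working memory.
    manager = load_step_result(client, conversation_id, task_id, 1)
    approvals = load_step_result(client, conversation_id, task_id, 2)
    pending = approvals["pending_approvals"]
    return (f"Your manager is {manager['manager']}. She has "
            f"{len(pending)} pending approval for you: "
            f"{pending[0]['type']} request ({pending[0]['id']}).")
```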
LONG-TERM MEMORY (PostgreSQL + pgvector)
═══════════════════════════════════════════
Observation: User asked about VPN 3 times in the last month (March 1, March 8, March 14).
Stored as: "User has recurring VPN connectivity issues. Previous solutions: certificate renewal (March 1), DNS cache flush (March 8)."
Used when: Next time user mentions VPN, agent proactively says: "I see you've had VPN issues before. Last time, flushing the DNS cache resolved it. Would you like to try that first?"
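Retrieval from the long-term layer is a pgvector similarity query. The sketch below only builds the SQL string and parameters, so it can be shown without a live database; the table and column names (`memories`, `embedding`, `user_id`, `content`) are assumed, not a known schema, and `<=>` is pgvector's cosine-distance operator.

```python
# Hedged sketch of the long-term retrieval query. Schema names are
# assumptions; %s placeholders follow psycopg's parameter style.

def build_memory_query(user_id, query_embedding, limit=5):
    sql = (
        "SELECT content, created_at, "
        "embedding <=> %s::vector AS distance "
        "FROM memories "
        "WHERE user_id = %s "
        "ORDER BY distance "
        "LIMIT %s"
    )
    # pgvector accepts the '[x, y, ...]' text form for vector input.
    params = (str(list(query_embedding)), user_id, limit)
    return sql, params
```

Filtering by `user_id` before the distance ordering keeps one user's memories from surfacing in another user's context.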
IMPORTANT: Store summaries, not raw conversations. Raw messages contain too much noise and PII. Summarization extracts the useful signal: preferences, patterns, resolved issues, key decisions. This also dramatically reduces storage costs.
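The write path that rule implies can be sketched as: raw messages stay in Redis, and only a distilled record reaches long-term storage. Everything here is illustrative; `extract_summary` is a crude stand-in for an LLM summarization pass, and the record shape is assumed.

```python
# Illustrative write path to long-term memory: store the distilled
# pattern, drop the raw message bodies (and their PII).

def extract_summary(messages, topic):
    # Stand-in for an LLM summarization call.
    dates = [m["timestamp"][:10] for m in messages]
    return (f"User has recurring {topic} issues "
            f"({len(messages)} mentions: {', '.join(dates)}).")

def build_long_term_record(user_id, messages, topic):
    return {
        "user_id": user_id,
        "kind": "pattern",
        "content": extract_summary(messages, topic),  # summary, not raw text
        # Raw message bodies are deliberately not copied into the record.
    }
```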