Copilot, Memory, Context
Memory and context layer for production copilots
Stateful memory and context retrieval for a copilot that talks to thousands of users a day.
Client
Vertical SaaS copilot, US
Duration
7 weeks
Year
2026
The problem
A copilot was stuffing entire chat histories and large documents into every prompt. Costs spiked, latency drifted, and the model lost the thread on long conversations. There was no separation between what the user just said, what the user said last week, and what the product knew about them.
The solution
Designed a three-tier memory layer: short-term working memory in a turn buffer, long-term semantic memory in a vector store keyed by user and tenant, and episodic memory as a structured summary written back to DynamoDB at the end of each session. A context builder Lambda assembles the context window for each turn, scoring candidates on recency and relevance and packing them into a fixed token budget, with explicit slots for the user profile, tools, and retrieved knowledge. An eval pipeline regression-tests context quality against a frozen evaluation set.
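The packing step of the context builder can be sketched as a greedy fill under a token budget. This is a minimal illustration, not the production Lambda. The slot names, the `Candidate` type, the 4-characters-per-token estimate, the 0.7/0.3 relevance/recency weights, and the one-week recency half-life are all hypothetical. Relevance is assumed to be precomputed upstream, for example as vector-store similarity for semantic memory. Here "turn" stands in for working memory, "semantic" for vector-store hits, and "episodic" for the DynamoDB session summary.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Candidate:
    slot: str           # hypothetical slots: "profile", "tools", "turn", "semantic", "episodic"
    text: str
    age_seconds: float  # time since this item was written to memory
    relevance: float    # assumed precomputed similarity to the current query, in [0, 1]


def estimate_tokens(text: str) -> int:
    # Rough heuristic (assumption): about 4 characters per token.
    return max(1, math.ceil(len(text) / 4))


def score(c: Candidate, half_life_s: float = 7 * 24 * 3600, w_rel: float = 0.7) -> float:
    # Blend relevance with exponential recency decay; weights are illustrative.
    recency = 0.5 ** (c.age_seconds / half_life_s)
    return w_rel * c.relevance + (1 - w_rel) * recency


def build_context(
    candidates: list[Candidate],
    budget_tokens: int,
    pinned_slots: tuple[str, ...] = ("profile", "tools"),
) -> tuple[list[Candidate], int]:
    # Pinned slots go first, in the order given; everything else is ranked by score.
    pinned = [c for s in pinned_slots for c in candidates if c.slot == s]
    ranked = sorted(
        (c for c in candidates if c.slot not in pinned_slots), key=score, reverse=True
    )
    chosen: list[Candidate] = []
    used = 0
    for c in pinned + ranked:
        cost = estimate_tokens(c.text)
        # Skip items that would overflow the budget; smaller ones may still fit.
        if used + cost <= budget_tokens:
            chosen.append(c)
            used += cost
    return chosen, used


DAY = 24 * 3600
items = [
    Candidate("profile", "Plan: Pro. Timezone: US/Eastern.", 0, 1.0),
    Candidate("tools", "search_orders(customer_id)", 0, 1.0),
    Candidate("turn", "User: where is my last order?", 5, 0.9),
    Candidate("semantic", "Old note about billing preferences " * 20, 30 * DAY, 0.2),
    Candidate("episodic", "Last session: asked about a delayed shipment.", 2 * DAY, 0.8),
]
ctx, used = build_context(items, budget_tokens=60)
print([c.slot for c in ctx], used)  # → ['profile', 'tools', 'turn', 'episodic'] 35
```

Pinning the profile and tools slots guarantees they are present whenever they fit, while the large, stale semantic hit is dropped rather than truncated. A real builder would more likely count tokens with the model's own tokenizer than with a character heuristic.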
Outcomes
−47 percent
Token spend per session
+22 points
User-rated answer quality
180 ms
P95 context build time