Copilot, Memory, Context

Memory and context layer for production copilots

Stateful memory and context retrieval for a copilot that talks to thousands of users a day.

Client

Vertical SaaS copilot, US

Duration

7 weeks

Year

2026

The problem

The client's copilot was stuffing entire chat histories and large documents into every prompt. Costs spiked, latency crept up, and the model lost the thread in long conversations. Nothing separated what the user had just said, what they had said last week, and what the product knew about them.

The solution

Designed a three-tier memory layer: short-term working memory in a turn buffer, long-term semantic memory in a vector store keyed by user and tenant, and episodic memory as a structured summary written back to DynamoDB after each session. A context builder Lambda assembles a per-turn context window from a budget plus recency and relevance scores, with explicit slots for user profile, tools, and retrieved knowledge. An eval pipeline regression-tests context quality on a frozen set.
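
As a rough illustration of the context builder idea, the sketch below ranks candidate items by a blend of relevance and recency and fills each slot until its budget runs out. Everything in it is an assumption made for illustration and not taken from the project: the names (Candidate, estimate_tokens, score, build_context), the per-slot token budgets, the 0.7 relevance weight, the 72-hour recency half-life, and the four-characters-per-token estimate.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    slot: str          # hypothetical slot names: "profile", "tools", "knowledge"
    text: str
    relevance: float   # similarity in [0, 1], e.g. as returned by a kNN query
    age_hours: float   # time since the item was written


def estimate_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: roughly 4 characters per token.
    return max(1, len(text) // 4)


def score(c: Candidate, half_life_hours: float = 72.0,
          relevance_weight: float = 0.7) -> float:
    # Recency decays exponentially: an item one half-life old scores 0.5.
    recency = 0.5 ** (c.age_hours / half_life_hours)
    return relevance_weight * c.relevance + (1 - relevance_weight) * recency


def build_context(candidates: list[Candidate],
                  slot_budgets: dict[str, int]) -> dict[str, list[str]]:
    # Greedy fill: take the highest-scoring items first and skip any item
    # that no longer fits in the remaining budget of its slot.
    context: dict[str, list[str]] = {slot: [] for slot in slot_budgets}
    remaining = dict(slot_budgets)
    for c in sorted(candidates, key=score, reverse=True):
        if c.slot not in remaining:
            continue  # unknown slot: ignore the item
        cost = estimate_tokens(c.text)
        if cost <= remaining[c.slot]:
            context[c.slot].append(c.text)
            remaining[c.slot] -= cost
    return context
```

Giving each slot its own budget keeps one large retrieved document from crowding out the user profile or tool descriptions. A production version would likely use the model's real tokenizer and tune the weights against the frozen eval set.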

Architecture

Copilot memory and context build

Stack

AWS Bedrock, DynamoDB, OpenSearch (kNN), S3, Lambda, Step Functions

Outcomes

−47 percent

Token spend per session

+22 points

User-rated answer quality

180 ms

P95 context build time
