Agent Memory Systems
Master AI agent memory patterns including context window management, RAG, episodic memory, and persistent storage for stateful agents.
Agent Memory Systems
Memory transforms stateless AI calls into stateful agents. The right memory strategy depends on your context window budget, task duration, and whether the agent needs to remember across sessions.
Context Window Management
The simplest memory — keep recent conversation in context. The challenge is fitting relevant information into the available window.
Sliding Window: Keep the last N turns. Oldest messages are dropped first.
System: You are a helpful assistant. Keep the last 5 messages in context.
Conversation history (last 5 turns):
User: What's the capital of France?
Assistant: Paris.
User: What's its population?
Assistant: About 2.1 million.
User: What about the metro area?
Assistant: Over 13 million.
User: What's the best time to visit?
Assistant:
Window size depends on your context budget. For a 128K model, a window of 50 turns costs roughly 15-20K tokens. For an 8K model, keep it to 5-10 turns.
Summarization: Periodically summarize older context into a condensed form.
When the conversation exceeds 10 messages:
1. Summarize the key facts and decisions from messages 1-10
2. Replace those messages with: [Summary: {summary}]
3. Continue with the current conversation
Current summary: The user is planning a trip to Paris and wants
information about attractions, transportation, and accommodation.
Target budget: Keep summaries under 500 tokens.
Selective Retention: Keep important facts while discarding trivial details.
Maintain a "key facts" section that updates after each exchange:
- User's name: {name}
- Current project: {project}
- Important preferences: {preferences}
- Decisions made: {decisions}
- Pending action items: {items}
Discard: greetings, confirmations, small talk, off-topic tangents.
Update the key facts section every 3 messages.
Hybrid approach: Combine window + summary for best results.
Maintain two memory sections:
1. Recent (last 10 messages) — verbatim, no summarization
2. Long-term summary — condensed version of everything before the window
When composing a response, use both sections as context.
Update the long-term summary when it grows beyond 1000 tokens.
Memory budget calculation:
| Model Context | Window Size | Summary Budget | Total Budget |
|---|---|---|---|
| 8K tokens | 10 turns (~4K) | 500 tokens | 4.5K (56%) |
| 32K tokens | 25 turns (~10K) | 1K tokens | 11K (34%) |
| 128K tokens | 50 turns (~20K) | 2K tokens | 22K (17%) |
RAG as Memory
Retrieve relevant information from a knowledge base when needed. Best for factual recall that doesn't fit in the context window.
Before answering, search your knowledge base for relevant information:
1. Convert the user's question into a search query
2. Retrieve the top 3 most relevant documents
3. Use the retrieved information to answer
Retrieved documents:
{document_1}
{document_2}
{document_3}
User question: {question}
Real-world example — documentation chatbot with version awareness:
Tools:
- search_docs(query: string, product_version: string) → [doc_chunks]
User: "How do I set up authentication in v3?"
1. search_docs("authentication setup", "v3.0")
→ Retrieved 3 chunks about auth configuration
2. Answer using those chunks, noting the version.
If the user asks about a different version later,
retrieve from that version's docs instead.
Cache invalidation: RAG memory can become stale. Include freshness checks.
For each retrieved document, check its last_updated date.
If the document is older than 30 days, add a caveat:
"This information was last updated on {date}. Verify against current docs."
Best for: Factual Q&A, documentation assistants, customer support with knowledge bases, any scenario with more information than fits in context.
Episodic Memory
Store and recall specific past interactions — what worked, what didn't, user preferences.
You have interacted with this user before. Here are relevant past interactions:
Past session 1 (2 days ago):
- Project: React dashboard with real-time data
- Preferred: TypeScript, Zustand for state, Tailwind CSS
- Disliked: Redux, class components, verbose comments
Past session 2 (yesterday):
- Asked about chart libraries
- Recommended: Recharts for simplicity
- User chose: Nivo for animation support
Use this history to tailor your recommendations and tone.
When a user returns after a previous session:
1. Greet them by name and reference their last project
2. Ask if they want to continue where they left off
3. Suggest next steps based on previous work
4. Do not assume — ask if their priorities have changed
Session merging: When relevant facts accumulate across sessions, merge them into a single profile.
Merge new information from this session into the user's profile:
- New facts learned this session: {new_facts}
- Contradictions with stored facts: {contradictions}
- Resolve contradictions by asking the user which is correct
Best for: Personal assistants, coding companions, any system that benefits from knowing user history.
Persistent Memory
Long-term storage using external databases, knowledge graphs, or files. Survives across sessions and even across different agent instances.
Knowledge Graph: Store entities and their relationships.
Maintain a knowledge graph of user information:
- Entities: people, projects, preferences, facts, dates
- Relationships: "works on", "prefers", "located in", "mentioned in"
When new information arrives:
1. Extract entities and relationships from the conversation
2. Check if entities already exist in the graph
3. Update existing entities or create new ones
4. Resolve conflicts by asking the user
Graph state (current session):
- User: Bruce → works on → Prompt Genius
- User: Bruce → prefers → TypeScript
- Project: Prompt Genius → uses → Next.js
- Project: Prompt Genius → uses → Cloudflare Pages
File-based memory: Save and load from structured files. Simpler than a graph, good for small-scale use.
You have access to a persistent memory file: {user_id}_memory.json
On session start:
1. Load {user_id}_memory.json
2. Parse the JSON into your working context
During the session, track:
- New facts learned (with timestamps)
- Changes to existing facts
- Action items created or completed
- User feedback on your responses
On session end:
1. Update {user_id}_memory.json with new information
2. Do not overwrite — merge changes carefully
3. Keep the file under 100KB to avoid token waste
Example schema:
{
"user_name": "Bruce",
"projects": [{"name": "Prompt Genius", "tech_stack": ["Next.js", "Cloudflare"]}],
"preferences": {"verbosity": "concise", "tone": "direct"},
"session_history": [{"date": "2026-05-01", "summary": "Worked on routing config"}],
"action_items": [{"task": "Deploy v2.1", "status": "pending"}]
}
Best for: Long-term assistants, personal knowledge bases, applications where users expect the system to remember them across days or weeks.
Privacy & Security
Memory storage introduces data risks. Address them explicitly.
- Data retention policy — Define how long memories are stored. Provide a "forget me" command.
- User consent — Ask before storing personal information. "Would you like me to remember your preferred framework?"
- Memory scrubbing — Allow users to view, edit, or delete stored memories.
- Cross-session leakage — Ensure one user's memories don't leak into another user's session. Use strict user_id isolation.
- Sensitive data — Never store passwords, API keys, or personal identifiable information in memory unless encrypted.
Forget command: "Forget everything about {topic}"
If the user asks to forget, remove all related entities and facts from storage.
Memory Budget Planning
Choose your strategy based on available context window:
| Memory Approach | Token Cost per Turn | Setup Complexity | Cross-Session |
|---|---|---|---|
| Sliding window | ~50-100 tokens (per message) | None | No |
| Summarization | ~200-500 tokens (periodic) | Low | No |
| RAG retrieval | ~500-1500 tokens (per retrieval) | Medium | Yes |
| Episodic (summary injection) | ~300-800 tokens (session start) | Medium | Yes |
| Knowledge graph | ~200-500 tokens (query results) | High | Yes |
| File-based JSON | ~500-2000 tokens (full load) | Low | Yes |
For an 8K model, stick to sliding window + optional summarization. For 128K+ models, you can afford RAG plus episodic memory.
Hybrid Architecture Example
A customer support agent using all 4 memory types:
Memory layers (checked in order):
1. Context window (last 10 messages) — recent conversation state
→ "User just said the payment failed"
2. Episodic memory (past sessions)
→ "User had the same issue 3 days ago. Resolution was card expiration."
3. RAG (knowledge base)
→ "Payment failure docs: check card, check balance, check network"
4. Persistent memory (user profile)
→ "User prefers email confirmations, not SMS"
Decision flow:
- Use context window for immediate replies
- Check episodic if user says "this happened before"
- Query RAG for technical answers
- Reference persistent memory for personalization
Best Practices
- Layer your memory - Use context window for recent, RAG for facts, persistent storage for long-term.
- Be selective - More memory isn't always better; noise hurts quality.
- Update proactively - Refresh stored information when it becomes stale.
- Respect privacy - Give users control over what's stored and for how long.
- Handle misses - When memory is empty or irrelevant, fall back to general knowledge.
- Budget proactively - Calculate token costs before choosing a strategy, not after.
Related Articles
Business RFP Template: ChatGPT Guide for Proposals
Master creating effective Request for Proposal documents with ChatGPT. Learn to structure RFPs, define requirements, and evaluate vendor responses professionally.
1990s Grunge & Early Digital SREF Codes
Alternative culture and early digital aesthetics with high gloss magazines, authentic 90s film, and raw grunge energy.
Product Mockup Prompts: E-commerce Photography
Create professional e-commerce product photography and packaging mockups with Nano Banana. Master lighting and composition.