Agent Memory Architecture: Working, Long-term, and Shared Memory Trade-offs
A systematic comparison of the three categories of agent memory -- working, long-term, and shared -- covering storage media, lifecycle, retrieval methods, typical frameworks, and design patterns, with a focus on the engineering of agent personalization and multi-agent collaboration.
An agent without memory is a "goldfish" -- every conversation starts from zero, unable to build true continuity. Agent memory architecture determines whether a system can learn across sessions, share knowledge between agents, and maintain consistency in long-running tasks. This article systematically compares three categories of agent memory -- working memory, long-term memory, and shared memory -- and the design trade-offs in different scenarios.
Why Memory Is Core to Agent Systems
LLM inference is stateless: each request is processed independently, and the model sees only what the prompt contains. Agent systems therefore need to solve three fundamental problems:
1. Context continuity within a single session. In a multi-turn conversation, earlier messages determine how later ones should be read. Without "working memory," the agent cannot resolve pronouns, elliptical follow-ups, or references to earlier turns.
2. Cross-session user memory. A customer service agent should remember a user's previous issue next time; a personal assistant agent should remember the user's preferences and past decisions. Cross-session "long-term memory" is the foundation of agent personalization.
3. Knowledge sharing between multiple agents. In multi-agent systems, knowledge retrieved by one agent should be reusable by others; experience learned by one agent should accumulate into a shared knowledge base.
These three needs correspond to three memory architectures: working memory (short-term), long-term memory (cross-session), and shared memory (cross-agent). They differ in storage media, retrieval method, and lifecycle.
Category 1: Working Memory
Working memory stores the current session's context -- the most basic form of agent memory.
```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Literal


@dataclass
class Message:
    role: Literal["user", "assistant", "system", "tool"]
    content: str
    timestamp: datetime = field(default_factory=datetime.now)
    metadata: dict = field(default_factory=dict)  # e.g. tool_call_id for tool messages

    @property
    def tokens(self) -> int:
        # Rough estimate (~4 characters per token); use the model's tokenizer for exact counts.
        return len(self.content) // 4


class WorkingMemory:
    KEEP_RECENT = 4  # number of messages kept verbatim after compression

    def __init__(self, max_tokens: int = 8000, max_messages: int = 50):
        self.messages: list[Message] = []
        self.max_tokens = max_tokens
        self.max_messages = max_messages
        self.summary: str = ""

    def add(self, message: Message) -> None:
        self.messages.append(message)
        self._maybe_compress()

    def get_context(self) -> list[dict]:
        """Return the rolling summary (if any) followed by the retained messages."""
        context = []
        if self.summary:
            context.append({
                "role": "system",
                "content": f"[Conversation summary so far] {self.summary}",
            })
        for msg in self.messages:
            context.append({"role": msg.role, "content": msg.content})
        return context

    def total_tokens(self) -> int:
        return sum(m.tokens for m in self.messages) + (len(self.summary) // 4)

    def _maybe_compress(self) -> None:
        over_budget = self.total_tokens() > self.max_tokens
        too_many = len(self.messages) > self.max_messages
        if not (over_budget or too_many):
            return
        old = self.messages[:-self.KEEP_RECENT]
        if not old:
            return  # nothing older than the retained tail to fold into the summary
        self.summary = self._summarize(old, self.summary)
        self.messages = self.messages[-self.KEEP_RECENT:]

    def _summarize(self, old_messages: list[Message], prev_summary: str) -> str:
        # Placeholder: a real implementation would ask an LLM to fold old_messages
        # into prev_summary. As written, the old messages are dropped and the
        # previous summary is returned unchanged.
        return prev_summary
```
Working memory design points:
- Window size: typically 4K-32K tokens, limited by the LLM context window
- Compression strategy: when the window is exceeded, compress into "summary plus recent N turns"
- Structured storage: not just strings, preserve role, metadata, and tool_call_id
- Lifecycle: destroyed at session end, not persisted
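The "summary plus recent N turns" strategy can be sketched as a pure function with the summarizer passed in. This is a minimal illustration, not a reference implementation: `compress`, `naive_summarize`, and the 4-characters-per-token estimate are assumptions, and `naive_summarize` is a truncating stand-in for what would normally be an LLM call.

```python
from typing import Callable

Turn = dict  # assumed shape: {"role": str, "content": str}


def estimate_tokens(text: str) -> int:
    """Rough heuristic: about 4 characters per token."""
    return len(text) // 4


def compress(
    turns: list,
    summary: str,
    summarize: Callable[[str, list], str],
    max_tokens: int = 200,
    keep_recent: int = 4,
) -> tuple:
    """Fold older turns into the summary once the token budget is exceeded."""
    total = estimate_tokens(summary) + sum(estimate_tokens(t["content"]) for t in turns)
    if total <= max_tokens or len(turns) <= keep_recent:
        return summary, turns  # within budget, or nothing old enough to fold
    old, recent = turns[:-keep_recent], turns[-keep_recent:]
    return summarize(summary, old), recent


def naive_summarize(prev: str, old: list) -> str:
    """Stand-in for an LLM call: keeps the first 40 characters of each old turn."""
    lines = [f'{t["role"]}: {t["content"][:40]}' for t in old]
    return " | ".join(([prev] if prev else []) + lines)


turns = [{"role": "user", "content": f"message {i} " + "x" * 200} for i in range(6)]
summary, kept = compress(turns, "", naive_summarize, max_tokens=200)
print(len(kept), summary[:15])
```

Passing the summarizer as a parameter keeps the budget logic testable without a model in the loop; the information loss discussed below happens entirely inside whatever `summarize` does.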
Working memory pain points:
- Window bottleneck: long conversations quickly exhaust it
- Information loss: summaries inevitably lose detail
- Session discontinuity: switching devices or starting a new session erases everything
Category 2: Long-term Memory
Long-term memory stores cross-session user preferences, history, and knowledge, making the agent truly "know" the user.
```python
import hashlib
from datetime import datetime


class LongTermMemory:
    # embedding_model and vector_store are injected. This class assumes they expose
    # async aembed(), upsert(), search(), and update() with Mongo-style filter operators.
    def __init__(self, embedding_model, vector_store):
        self.embedding_model = embedding_model
        self.vector_store = vector_store
        self.embeddings_dim = 1536  # must match the embedding model's output size

    async def remember(
        self,
        content: str,
        user_id: str,
        memory_type: str = "fact",
        importance: float = 0.5,
        metadata: dict | None = None,
    ) -> str:
        embedding = await self.embedding_model.aembed(content)
        now = datetime.now().isoformat()
        # hashlib gives a digest that is stable across processes;
        # the built-in hash() of a str is salted per interpreter run.
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()[:8]
        memory_id = f"{user_id}:{now}:{digest}"
        await self.vector_store.upsert(
            id=memory_id,
            vector=embedding,
            payload={
                # Caller metadata goes first so it cannot overwrite the core fields below.
                **(metadata or {}),
                "content": content,
                "user_id": user_id,
                "memory_type": memory_type,
                "importance": importance,
                "created_at": now,
                "access_count": 0,
            },
        )
        return memory_id

    async def recall(
        self,
        query: str,
        user_id: str,
        top_k: int = 5,
        memory_types: list[str] | None = None,
        min_importance: float = 0.0,
    ) -> list[dict]:
        query_emb = await self.embedding_model.aembed(query)
        # Always scope the search to one user so memories never leak across users.
        filters = {
            "user_id": user_id,
            "importance": {"$gte": min_importance},
        }
        if memory_types:
            filters["memory_type"] = {"$in": memory_types}
        results = await self.vector_store.search(
            vector=query_emb,
            top_k=top_k,
            filter=filters,
        )
        # Track how often each memory is retrieved (useful for later pruning or ranking).
        for r in results:
            await self.vector_store.update(
                r["id"],
                {"$inc": {"access_count": 1}},
            )
        return results
```
Long-term memory design patterns:
- Fact memory: static user information ("My name is Alex", "I live in Shanghai")
- Preference memory: user preferences ("I like concise answers")
- Episodic memory: significant events ("Bought tickets last week")
- Skill memory: user behavior patterns ("Usually asks in Chinese")
Key challenges in long-term memory:
- When to write: when does a short-term memory "graduate" to long-term? Too-frequent writes pollute; too-infrequent writes lose data
- Deduplication and merging: when the same fact appears multiple times, merge rather than store duplicates
- Expiration: some memories become outdated ("tomorrow's meeting" is meaningless the day after)
- Conflict resolution: user preferences can change ("I used to like coffee, now I like tea") -- update rather than append
The Mem0 framework specifically addresses these challenges: it provides LLM-driven memory extraction, merging, and conflict resolution.
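Independent of any framework, the write-time challenges listed above (deduplication, conflict resolution, expiration) can be sketched with a small keyed store. This is a simplified illustration under a strong assumption: facts have already been normalized into `(slot, value)` pairs, a step that in practice would usually need an LLM or other extractor. `UserMemoryStore`, `MemoryRecord`, `slot`, and the returned status strings are all hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class MemoryRecord:
    slot: str                 # hypothetical normalized key, e.g. "favorite_drink"
    value: str
    updated_at: datetime
    expires_at: Optional[datetime] = None
    mentions: int = 1         # how many times the same fact was observed


class UserMemoryStore:
    def __init__(self):
        self._records = {}    # (user_id, slot) -> MemoryRecord

    @staticmethod
    def _expired(record: MemoryRecord, now: datetime) -> bool:
        return record.expires_at is not None and record.expires_at <= now

    def write(self, user_id: str, slot: str, value: str, now: datetime,
              ttl: Optional[timedelta] = None) -> str:
        key = (user_id, slot)
        expires_at = now + ttl if ttl else None
        existing = self._records.get(key)
        if existing is None or self._expired(existing, now):
            self._records[key] = MemoryRecord(slot, value, now, expires_at)
            return "created"
        if existing.value.strip().lower() == value.strip().lower():
            # Deduplication: the same fact again, so merge instead of appending.
            existing.mentions += 1
            existing.updated_at = now
            if expires_at:
                existing.expires_at = expires_at
            return "merged"
        # Conflict: the newer statement replaces the older one.
        self._records[key] = MemoryRecord(slot, value, now, expires_at)
        return "replaced"

    def read(self, user_id: str, now: datetime) -> list:
        return [r for (uid, _), r in self._records.items()
                if uid == user_id and not self._expired(r, now)]

    def purge_expired(self, now: datetime) -> int:
        stale = [k for k, r in self._records.items() if self._expired(r, now)]
        for k in stale:
            del self._records[k]
        return len(stale)


t0 = datetime(2024, 1, 1, 9, 0)
store = UserMemoryStore()
store.write("u1", "favorite_drink", "coffee", t0)
store.write("u1", "favorite_drink", "tea", t0)  # preference changed: update, not append
store.write("u1", "next_meeting", "standup", t0, ttl=timedelta(days=1))
print([r.value for r in store.read("u1", t0 + timedelta(days=2))])
```

"Latest write wins" is only one possible conflict policy; a system that needs history could archive the replaced record instead of discarding it.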
Category 3: Shared Memory
Shared memory addresses knowledge sharing between multiple agents. Knowledge retrieved by one agent should be usable by another; experience learned by one agent should be queryable by others.
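```python
from datetime import datetime


class SharedMemory:
    # embedding_model is assumed to expose async aembed(), as in LongTermMemory;
    # vector_store is assumed to support upsert() and filtered search().
    def __init__(self, embedding_model, vector_store, namespace: str = "shared"):
        self.embedding_model = embedding_model
        self.vector_store = vector_store
        self.namespace = namespace
        self.access_log: list[dict] = []

    async def publish(
        self,
        content: str,
        source_agent: str,
        knowledge_type: str = "general",
        confidence: float = 0.8,
        tags: list[str] | None = None,
    ) -> str:
        embedding = await self.embedding_model.aembed(content)
        now = datetime.now().isoformat()
        doc_id = f"{self.namespace}:{source_agent}:{now}"
        await self.vector_store.upsert(
            id=doc_id,
            vector=embedding,
            payload={
                "content": content,
                "namespace": self.namespace,  # stored so query() can filter on it
                "source_agent": source_agent,
                "knowledge_type": knowledge_type,
                "confidence": confidence,
                "tags": tags or [],
                "created_at": now,
                "access_count": 0,
            },
        )
        return doc_id

    async def query(
        self,
        query: str,
        requesting_agent: str,
        top_k: int = 5,
        min_confidence: float = 0.6,
    ) -> list[dict]:
        query_emb = await self.embedding_model.aembed(query)
        results = await self.vector_store.search(
            vector=query_emb,
            top_k=top_k,
            filter={
                "namespace": self.namespace,
                "confidence": {"$gte": min_confidence},
            },
        )
        requested_at = datetime.now().isoformat()
        # Record who asked for what, for auditability.
        self.access_log.append({
            "requested_by": requesting_agent,
            "requested_at": requested_at,
            "query": query,
            "result_ids": [r["id"] for r in results],
        })
        for r in results:
            r["_meta"] = {
                "requested_by": requesting_agent,
                "requested_at": requested_at,
            }
        return results
```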
Use cases for shared memory:
- Research agent swarms: documents found by a research agent should be directly citable by a writing agent
- Customer service agent swarms: logistics info checked by an order agent should be queryable by after-sales agents
- Code agent swarms: API patterns designed by an architect agent should be reusable by engineer agents
- Cross-session learning: experience learned by today's agent should be usable by tomorrow's agent
Shared memory design principles:
- Source tracking: every piece of knowledge must be tagged with its source agent for auditability
- Confidence labeling: agent-generated knowledge carries a confidence score ("I'm 80% sure"); low-confidence knowledge should not be trusted by other agents
- Access control: some memories are only accessible to specific agents (sensitive business data)
- Conflict detection: multiple agents may produce contradictory knowledge; an arbitration mechanism is required
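The four principles above can be illustrated with a small in-process board. This is a sketch under stated assumptions rather than a prescribed design: `SharedBoard`, `SharedEntry`, the `topic` key used to decide that two claims are about the same thing, and the "highest confidence wins" arbitration rule are all hypothetical choices. A production system would likely need semantic matching to detect conflicts and a stronger arbitration policy.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SharedEntry:
    topic: str            # hypothetical key identifying what the claim is about
    claim: str
    source_agent: str     # source tracking
    confidence: float     # confidence labeling
    allowed_agents: frozenset = frozenset()  # empty set means readable by every agent


class SharedBoard:
    def __init__(self, min_confidence: float = 0.6):
        self.entries = []
        self.min_confidence = min_confidence

    def publish(self, entry: SharedEntry) -> None:
        self.entries.append(entry)

    def visible_to(self, agent: str) -> list:
        """Access control plus a confidence floor."""
        return [
            e for e in self.entries
            if e.confidence >= self.min_confidence
            and (not e.allowed_agents or agent in e.allowed_agents)
        ]

    def conflicts(self) -> dict:
        """Topics for which agents have published differing claims."""
        by_topic = {}
        for e in self.entries:
            by_topic.setdefault(e.topic, []).append(e)
        return {t: es for t, es in by_topic.items() if len({e.claim for e in es}) > 1}

    def arbitrate(self, topic: str) -> Optional[SharedEntry]:
        """Simplest possible policy: the most confident claim wins."""
        candidates = [e for e in self.entries if e.topic == topic]
        return max(candidates, key=lambda e: e.confidence, default=None)


board = SharedBoard(min_confidence=0.6)
board.publish(SharedEntry("order-42/status", "shipped", "order_agent", 0.9))
board.publish(SharedEntry("order-42/status", "processing", "chat_agent", 0.5))
board.publish(SharedEntry("customer-7/card", "ends 1234", "billing_agent", 0.95,
                          frozenset({"billing_agent"})))
print([e.claim for e in board.visible_to("after_sales_agent")])
print(list(board.conflicts()))
```

Because every entry carries its `source_agent`, the outcome of arbitration can be traced back to the agent that produced the winning claim.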
Comparison of the Three Memory Categories
| Dimension | Working Memory | Long-term Memory | Shared Memory |
|---|---|---|---|
| Storage media | RAM | Vector database | Vector database plus metadata |
| Lifecycle | Single session | Cross-session / permanent | Cross-agent / permanent |
| Retrieval | Sequential / FIFO | Semantic search | Semantic search plus source filtering |
| Write timing | Real-time | Asynchronous (event-triggered) | Asynchronous (agent-initiated) |
| Typical frameworks | LangChain Memory | Mem0, Letta, Zep | Zep, Mem0, OpenMemory |
| Capacity | 4K-32K tokens | Millions of entries | Millions of entries |
| Consistency | Strong | Weak (needs merging) | Weak (needs arbitration) |
| Privacy | Not persistent | User isolation | Role-based access control |
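To make the layering concrete, here is one way the three categories might be combined into a single prompt at request time. It is a minimal sketch: `build_context`, the bracketed section labels, and the `max_items` cap are illustrative assumptions, and the inputs are expected to be the already-retrieved outputs of each memory layer.

```python
def build_context(
    system_prompt: str,
    working_messages: list,
    user_memories: list,
    shared_knowledge: list,
    max_items: int = 3,
) -> list:
    """Assemble chat messages from long-term, shared, and working memory."""
    context = [{"role": "system", "content": system_prompt}]

    # Long-term memory: facts and preferences retrieved for this user.
    if user_memories:
        lines = "\n".join(f"- {m}" for m in user_memories[:max_items])
        context.append({"role": "system", "content": f"[Known about this user]\n{lines}"})

    # Shared memory: knowledge published by other agents, with its source kept visible.
    if shared_knowledge:
        lines = "\n".join(
            f"- {k['content']} (source: {k['source_agent']})"
            for k in shared_knowledge[:max_items]
        )
        context.append({"role": "system", "content": f"[Shared knowledge]\n{lines}"})

    # Working memory: the current session's turns, in order.
    context.extend(working_messages)
    return context


ctx = build_context(
    "You are a support agent.",
    [{"role": "user", "content": "Where is my order?"}],
    ["Prefers concise answers"],
    [{"content": "Order 42 shipped", "source_agent": "order_agent"}],
)
print([m["role"] for m in ctx])
```

Capping the number of injected items keeps retrieved memories from crowding the session's own turns out of the context window.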
Implementation Path
- Phase 1: Implement working memory, solving the in-session context problem.
- Phase 2: Introduce long-term memory, storing user preferences and facts.
- Phase 3: Implement memory merging and deduplication, avoiding long-term memory pollution.
- Phase 4: Build shared memory in multi-agent systems.
- Phase 5: Establish memory auditing and expiration cleanup procedures.
- Phase 6: Add memory visualization, letting users view, modify, and delete their own memories.
Summary
Agent memory is not a single "context window" -- it is a layered memory system. Working memory maintains in-session context, long-term memory accumulates user preferences and facts, and shared memory enables multi-agent collaboration. The three differ in storage media, lifecycle, and retrieval methods, and cannot all be solved by one approach.
Frameworks like Mem0, Letta, Zep, and OpenMemory provide out-of-the-box implementations. When choosing, consider data ownership, privacy compliance, retrieval accuracy, and performance overhead.
Reference tools: Mem0 (LLM-driven long-term memory framework), Zep (production-grade long-term memory), Letta (stateful agent framework), memvid (video-encoded memory), and OpenMemory (local-first shared memory) cover the engineering implementations of the three memory categories.
Projects in this article
- Mem0 (59.8k ⭐): a long-term memory layer for AI agents, supporting cross-session memory management and personalized context retrieval.
- Zep (4.7k ⭐): an AI agent memory management platform providing long-term memory, context management, and conversation history understanding through knowledge graph technology.
- Letta (23.6k ⭐): formerly MemGPT, an open-source framework for building stateful AI agents with advanced reasoning and transparent long-term memory. It allows you to visually test, debug, and observe agents.
- MemVid (15.7k ⭐): a long-term memory layer for AI agents that uses video encoding for lightweight single-file storage, replacing complex RAG pipelines with instant retrieval.
- OpenMemory (4.3k ⭐): a local persistent memory store for LLM applications including Claude Desktop, GitHub Copilot, Codex, and more. It provides durable context memory capabilities for AI agents.