Agent Memory Architecture: Working, Long-term, and Shared Memory Trade-offs

An agent without memory is a "goldfish": every conversation starts from zero, so it can never build real continuity. Agent memory architecture determines whether a system can learn across sessions, share knowledge between agents, and maintain consistency in long-running tasks. This article systematically compares three categories of agent memory -- working memory, long-term memory, and shared memory -- and the design trade-offs each one involves.

Why Memory Is Core to Agent Systems

An LLM call is stateless: each request is handled independently, and nothing carries over unless the caller resends the history. Agent systems, however, need to solve three fundamental problems:

1. Context continuity within a single session. In multi-turn conversations between user and agent, earlier messages determine later responses. Without "working memory," the agent cannot resolve pronouns, elided details, or references to earlier turns.

2. Cross-session user memory. A customer service agent should remember a user's previous issue next time; a personal assistant agent should remember the user's preferences and past decisions. Cross-session "long-term memory" is the foundation of agent personalization.

3. Knowledge sharing between multiple agents. In multi-agent systems, knowledge retrieved by one agent should be reusable by others; experience learned by one agent should accumulate into a shared knowledge base.

These three needs correspond to three memory architectures: working memory (short-term), long-term memory (cross-session), and shared memory (cross-agent). They differ in storage medium, retrieval method, and lifecycle.

Category 1: Working Memory

Working memory stores the current session's context -- the most basic form of agent memory.

from dataclasses import dataclass, field
from typing import Literal
from datetime import datetime

@dataclass
class Message:
    role: Literal["user", "assistant", "system", "tool"]
    content: str
    timestamp: datetime = field(default_factory=datetime.now)
    metadata: dict = field(default_factory=dict)
    
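    @property
    def tokens(self) -> int:
        # Rough estimate (~4 characters per token for English text);
        # use the model's tokenizer when exact counts matter.
        return len(self.content) // 4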

class WorkingMemory:
    def __init__(self, max_tokens: int = 8000, max_messages: int = 50):
        self.messages: list[Message] = []
        self.max_tokens = max_tokens
        self.max_messages = max_messages
        self.summary: str = ""
    
    def add(self, message: Message) -> None:
        self.messages.append(message)
        self._maybe_compress()
    
    def get_context(self) -> list[dict]:
        context = []
        if self.summary:
            context.append({
                "role": "system",
                "content": f"[Conversation summary so far] {self.summary}",
            })
        for msg in self.messages:
            context.append({
                "role": msg.role,
                "content": msg.content,
            })
        return context
    
    def total_tokens(self) -> int:
        return sum(m.tokens for m in self.messages) + (len(self.summary) // 4)
    
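    def _maybe_compress(self, keep_recent: int = 4) -> None:
        over_budget = (
            self.total_tokens() > self.max_tokens
            or len(self.messages) > self.max_messages
        )
        if not over_budget or len(self.messages) <= keep_recent:
            return
        # Fold everything except the most recent turns into the running summary.
        old = self.messages[:-keep_recent]
        self.summary = self._summarize(old, self.summary)
        self.messages = self.messages[-keep_recent:]
    
    def _summarize(self, old_messages: list[Message], prev_summary: str) -> str:
        # Placeholder: a real implementation would call an LLM here. This naive
        # fallback appends a truncated transcript so evicted turns are not
        # silently discarded; note that it grows without bound.
        lines = [f"{m.role}: {m.content[:200]}" for m in old_messages]
        return "\n".join(filter(None, [prev_summary, *lines]))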

Working memory design points:

Window size: typically 4K-32K tokens, limited by the LLM context window
Compression strategy: when the window is exceeded, compress into "summary plus recent N turns"
Structured storage: keep role, metadata, and tool_call_id rather than flat strings (in the Message class above, tool_call_id would live in metadata)
Lifecycle: destroyed at session end, not persisted
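
The window-size and compression points above can be illustrated with a small standalone sketch. It is not part of the WorkingMemory class: the function names `estimate_tokens` and `trim_to_budget` and the 4-characters-per-token heuristic are assumptions made for illustration, and a real system would likely use the model's own tokenizer.

```python
def estimate_tokens(text: str) -> int:
    """Rough heuristic: about 4 characters per token (an assumption)."""
    return max(1, len(text) // 4)


def trim_to_budget(
    messages: list[dict], max_tokens: int
) -> tuple[list[dict], list[dict]]:
    """Split messages into (evicted, kept), keeping the newest turns that fit.

    Walks backwards from the most recent message and stops as soon as the
    next message would exceed the budget, so the kept turns stay contiguous.
    """
    kept: list[dict] = []
    used = 0
    for msg in reversed(messages):
        cost = estimate_tokens(msg["content"])
        if used + cost > max_tokens:
            break
        kept.append(msg)
        used += cost
    kept.reverse()
    evicted = messages[: len(messages) - len(kept)]
    return evicted, kept


history = [
    {"role": "user", "content": "a" * 40},       # ~10 tokens
    {"role": "assistant", "content": "b" * 40},  # ~10 tokens
    {"role": "user", "content": "c" * 40},       # ~10 tokens
]
evicted, kept = trim_to_budget(history, max_tokens=20)
```

The evicted messages are the ones a summarizer would fold into the running summary, which gives the "summary plus recent N turns" shape described above.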

Working memory pain points:

Window bottleneck: long conversations quickly exhaust it
Information loss: summaries inevitably lose detail
Session discontinuity: switching devices or starting a new session erases everything

Category 2: Long-term Memory

Long-term memory stores cross-session user preferences, history, and knowledge, making the agent truly "know" the user.

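import hashlib
from datetime import datetime

class LongTermMemory:
    def __init__(self, embedding_model, vector_store):
        # Both collaborators are injected; their async methods (aembed, upsert,
        # search, update) are assumed interfaces, not a specific library's API.
        self.embedding_model = embedding_model
        self.vector_store = vector_store
        self.embeddings_dim = 1536  # must match the embedding model's output size
    
    async def remember(
        self,
        content: str,
        user_id: str,
        memory_type: str = "fact",
        importance: float = 0.5,
        metadata: dict | None = None,
    ) -> str:
        embedding = await self.embedding_model.aembed(content)
        # hashlib instead of the built-in hash(): str hashes are salted per
        # process, so hash(content) would not be stable across restarts.
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()[:8]
        memory_id = f"{user_id}:{datetime.now().isoformat()}:{digest}"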
        
        await self.vector_store.upsert(
            id=memory_id,
            vector=embedding,
            payload={
                "content": content,
                "user_id": user_id,
                "memory_type": memory_type,
                "importance": importance,
                "created_at": datetime.now().isoformat(),
                "access_count": 0,
                **(metadata or {}),
            },
        )
        return memory_id
    
    async def recall(
        self,
        query: str,
        user_id: str,
        top_k: int = 5,
        memory_types: list[str] | None = None,
        min_importance: float = 0.0,
    ) -> list[dict]:
        query_emb = await self.embedding_model.aembed(query)
        
        filters = {
            "user_id": user_id,
            "importance": {"$gte": min_importance},
        }
        if memory_types:
            filters["memory_type"] = {"$in": memory_types}
        
        results = await self.vector_store.search(
            vector=query_emb,
            top_k=top_k,
            filter=filters,
        )
        
        for r in results:
            await self.vector_store.update(
                r["id"],
                {"$inc": {"access_count": 1}},
            )
        
        return results
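
The class above depends on a vector store that accepts metadata filters. As a rough illustration of what that assumed interface has to do, here is a toy in-memory stand-in. `InMemoryVectorStore`, `_cosine`, and `_matches` are hypothetical names, and the `$gte`, `$in`, and `$inc` operators simply mirror the ones used in the code above; no particular vector database is implied.

```python
import asyncio
import math


class InMemoryVectorStore:
    """Toy stand-in for the vector_store interface assumed above."""

    def __init__(self) -> None:
        self._rows: dict[str, dict] = {}

    # Parameter names `id` and `filter` shadow builtins on purpose, to match
    # the keyword arguments used by the calling code.
    async def upsert(self, id: str, vector: list[float], payload: dict) -> None:
        self._rows[id] = {"id": id, "vector": vector, "payload": dict(payload)}

    async def update(self, id: str, ops: dict) -> None:
        # Only the "$inc" operator is supported in this sketch.
        payload = self._rows[id]["payload"]
        for key, delta in ops.get("$inc", {}).items():
            payload[key] = payload.get(key, 0) + delta

    async def search(self, vector: list[float], top_k: int, filter: dict) -> list[dict]:
        hits = [
            {"id": row["id"], "score": _cosine(vector, row["vector"]), **row["payload"]}
            for row in self._rows.values()
            if _matches(row["payload"], filter)
        ]
        hits.sort(key=lambda h: h["score"], reverse=True)
        return hits[:top_k]


def _cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def _matches(payload: dict, conditions: dict) -> bool:
    """Support plain equality plus the "$gte" and "$in" operators."""
    for key, cond in conditions.items():
        value = payload.get(key)
        if isinstance(cond, dict):
            if "$gte" in cond and not (value is not None and value >= cond["$gte"]):
                return False
            if "$in" in cond and value not in cond["$in"]:
                return False
        elif value != cond:
            return False
    return True


async def demo() -> list[dict]:
    store = InMemoryVectorStore()
    await store.upsert("m1", [1.0, 0.0], {"user_id": "u1", "importance": 0.9, "content": "likes tea"})
    await store.upsert("m2", [0.0, 1.0], {"user_id": "u1", "importance": 0.2, "content": "asked about refunds"})
    await store.upsert("m3", [1.0, 0.0], {"user_id": "u2", "importance": 0.9, "content": "other user"})
    # Only u1's memories with importance >= 0.5 are eligible.
    return await store.search([1.0, 0.1], top_k=5, filter={"user_id": "u1", "importance": {"$gte": 0.5}})


results = asyncio.run(demo())
```

The filter runs before ranking, which is what keeps one user's memories from leaking into another user's results regardless of how similar the vectors are.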

Long-term memory design patterns:

Fact memory: static user information ("My name is Alex", "I live in Shanghai")
Preference memory: user preferences ("I like concise answers")
Episodic memory: significant events ("Bought tickets last week")
Skill memory: user behavior patterns ("Usually asks in Chinese")

Key challenges in long-term memory:

When to write: when does a short-term memory "graduate" to long-term? Writing too often pollutes the store with noise; writing too rarely loses information
Deduplication and merging: when the same fact appears multiple times, merge rather than store duplicates
Expiration: some memories become outdated ("tomorrow's meeting" is meaningless the day after)
Conflict resolution: user preferences can change ("I used to like coffee, now I like tea") -- update rather than append

The Mem0 framework specifically addresses these challenges: it provides LLM-driven memory extraction, merging, and conflict resolution.

Category 3: Shared Memory

Shared memory addresses knowledge sharing between multiple agents. Knowledge retrieved by one agent should be usable by another; experience learned by one agent should be queryable by others.

from datetime import datetime

class SharedMemory:
    def __init__(self, embedding_model, vector_store, namespace: str = "shared"):
        # embedding_model is injected (same assumed aembed interface as in
        # LongTermMemory) instead of relying on an undefined global embed().
        self.embedding_model = embedding_model
        self.vector_store = vector_store
        self.namespace = namespace
        self.access_log: list[dict] = []
    
    async def publish(
        self,
        content: str,
        source_agent: str,
        knowledge_type: str = "general",
        confidence: float = 0.8,
        tags: list[str] | None = None,
    ) -> str:
        embedding = await self.embedding_model.aembed(content)
        doc_id = f"{self.namespace}:{source_agent}:{datetime.now().isoformat()}"
        
        await self.vector_store.upsert(
            id=doc_id,
            vector=embedding,
            payload={
                "content": content,
                # Stored so that query() can actually filter on it.
                "namespace": self.namespace,
                "source_agent": source_agent,
                "knowledge_type": knowledge_type,
                "confidence": confidence,
                "tags": tags or [],
                "created_at": datetime.now().isoformat(),
                "access_count": 0,
            },
        )
        return doc_id
    
    async def query(
        self,
        query: str,
        requesting_agent: str,
        top_k: int = 5,
        min_confidence: float = 0.6,
    ) -> list[dict]:
        query_emb = await self.embedding_model.aembed(query)
        results = await self.vector_store.search(
            vector=query_emb,
            top_k=top_k,
            filter={
                "namespace": self.namespace,
                "confidence": {"$gte": min_confidence},
            },
        )
        
        requested_at = datetime.now().isoformat()
        for r in results:
            r["_meta"] = {
                "requested_by": requesting_agent,
                "requested_at": requested_at,
            }
            # Record who read what, so access can be audited later.
            self.access_log.append(
                {"doc_id": r["id"], "requested_by": requesting_agent, "requested_at": requested_at}
            )
        
        return results

Use cases for shared memory:

Research agent swarms: documents found by a research agent should be directly citable by a writing agent
Customer service agent swarms: logistics info checked by an order agent should be queryable by after-sales agents
Code agent swarms: API patterns designed by an architect agent should be reusable by engineer agents
Cross-session learning: experience learned by today's agent should be usable by tomorrow's agent

Shared memory design principles:

Source tracking: every piece of knowledge must be tagged with its source agent for auditability
Confidence labeling: agent-generated knowledge carries a confidence score ("I'm 80% sure"); low-confidence knowledge should not be trusted by other agents
Access control: some memories are only accessible to specific agents (sensitive business data)
Conflict detection: multiple agents may produce contradictory knowledge; an arbitration mechanism is required
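
The principles above can be combined in a small arbitration routine. This is a minimal sketch under stated assumptions: the `SharedFact` fields, the `topic` key used to group competing claims, and the rule "highest confidence wins, ties go to the newer fact" are all illustrative choices, not a prescribed mechanism.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class SharedFact:
    topic: str                 # hypothetical key that groups competing claims
    content: str
    source_agent: str          # source tracking, kept for auditability
    confidence: float
    created_at: datetime
    # Empty set means the fact is readable by every agent.
    allowed_agents: set[str] = field(default_factory=set)

    def readable_by(self, agent: str) -> bool:
        return not self.allowed_agents or agent in self.allowed_agents


def arbitrate(
    facts: list[SharedFact], requesting_agent: str, min_confidence: float = 0.6
) -> dict[str, SharedFact]:
    """Pick one winning fact per topic among those the agent may read."""
    winners: dict[str, SharedFact] = {}
    for fact in facts:
        if fact.confidence < min_confidence or not fact.readable_by(requesting_agent):
            continue
        best = winners.get(fact.topic)
        # Higher confidence wins; ties go to the more recent fact.
        if best is None or (fact.confidence, fact.created_at) > (best.confidence, best.created_at):
            winners[fact.topic] = fact
    return winners


facts = [
    SharedFact("order-42/status", "shipped", "order_agent", 0.9, datetime(2024, 1, 2)),
    SharedFact("order-42/status", "pending", "billing_agent", 0.7, datetime(2024, 1, 3)),
    SharedFact("order-42/margin", "18%", "finance_agent", 0.95, datetime(2024, 1, 1), {"finance_agent"}),
    SharedFact("order-42/eta", "maybe Friday", "order_agent", 0.4, datetime(2024, 1, 3)),
]
visible = arbitrate(facts, requesting_agent="after_sales_agent")
finance_view = arbitrate(facts, requesting_agent="finance_agent")
```

Here the after-sales agent sees only the higher-confidence order status: the margin is hidden by access control and the ETA is filtered out as low-confidence, while the finance agent can also read the margin.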

Comparison of the Three Memory Categories

Dimension	Working Memory	Long-term Memory	Shared Memory
Storage media	RAM	Vector database	Vector database plus metadata
Lifecycle	Single session	Cross-session / permanent	Cross-agent / permanent
Retrieval	Sequential / FIFO	Semantic search	Semantic search plus source filtering
Write timing	Real-time	Asynchronous (event-triggered)	Asynchronous (agent-initiated)
Typical frameworks	LangChain Memory	Mem0, Letta, Zep	Zep, Mem0, OpenMemory
Capacity	4K-32K tokens	Millions of entries	Millions of entries
Consistency	Strong	Weak (needs merging)	Weak (needs arbitration)
Privacy	Not persistent	User isolation	Role-based access control
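
To make the comparison concrete, the sketch below shows one way the three layers might feed a single prompt. `build_prompt_context`, its parameters, and the bracketed section labels are assumptions for illustration; the inputs stand for whatever the working, long-term, and shared memory components return.

```python
def build_prompt_context(
    working: list[dict],
    long_term: list[str],
    shared: list[dict],
    max_memories: int = 3,
) -> list[dict]:
    """Combine the three memory layers into one chat-style message list."""
    context: list[dict] = []
    if long_term:
        facts = "\n".join(f"- {m}" for m in long_term[:max_memories])
        context.append({"role": "system", "content": f"[What we know about the user]\n{facts}"})
    if shared:
        notes = "\n".join(
            f"- {s['content']} (source: {s['source_agent']})" for s in shared[:max_memories]
        )
        context.append({"role": "system", "content": f"[Shared knowledge]\n{notes}"})
    # Working memory goes last so the most recent turns sit closest to the reply.
    context.extend(working)
    return context


context = build_prompt_context(
    working=[{"role": "user", "content": "Where is my order?"}],
    long_term=["Prefers concise answers"],
    shared=[{"content": "Order 42 shipped on Monday", "source_agent": "order_agent"}],
)
```

Capping each retrieved layer with `max_memories` keeps long-term and shared memory from crowding the session's own turns out of the context window.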

Implementation Path

Phase 1: Implement working memory, solving the in-session context problem.
Phase 2: Introduce long-term memory, storing user preferences and facts.
Phase 3: Implement memory merging and deduplication, avoiding long-term memory pollution.
Phase 4: Build shared memory in multi-agent systems.
Phase 5: Establish memory auditing and expiration cleanup procedures.
Phase 6: Add memory visualization, letting users view, modify, and delete their own memories.

Summary

Agent memory is not a single "context window" -- it is a layered memory system. Working memory maintains in-session context, long-term memory accumulates user preferences and facts, and shared memory enables multi-agent collaboration. The three differ in storage media, lifecycle, and retrieval methods, and cannot all be solved by one approach.

Frameworks like Mem0, Letta, Zep, and OpenMemory provide out-of-the-box implementations. When choosing, consider data ownership, privacy compliance, retrieval accuracy, and performance overhead.

Reference tools: Mem0 (LLM-driven long-term memory framework), Zep (production-grade long-term memory), Letta (stateful agent framework), memvid (video-encoded memory), and OpenMemory (local-first shared memory) cover the engineering implementations of the three memory categories.
