# Agent Memory

> Configure persistent agent memory, context compression, archives, and conversation history

Swarms agents have disk-backed persistent memory through the `Conversation` class. When `persistent_memory=True` (the default), each agent writes its active interaction log to a `MEMORY.md` file and reloads that file when another process starts an agent with the same `agent_name`. Set `persistent_memory=False` for ephemeral agents that should keep no on-disk state.

Use this page to understand:

* How `MEMORY.md` is created, loaded, and updated
* How context compression keeps memory within the model context window
* How archived transcripts preserve raw chat history
* How to inspect, compact, export, or disable memory in code
* When to use persistent memory versus RAG-based long-term memory

<Note>
  Persistent memory is keyed by `agent_name`. Reusing the same `agent_name` (with `persistent_memory=True`) resumes the same memory across process restarts. Changing the name starts a separate memory folder; setting `persistent_memory=False` keeps the agent fully ephemeral.
</Note>

## The `persistent_memory` flag

`persistent_memory` is the top-level switch that controls whether the agent reads from and writes to `MEMORY.md`.

<ParamField path="persistent_memory" type="bool" default="True">
  Enables disk-backed persistent memory. When `True`, the agent creates `MEMORY.md` on first run, preloads it on subsequent runs, and writes new turns through to disk. When `False`, the agent runs fully in-process: no `MEMORY.md`, no `archive/`, and fresh state on every run.

  ```python theme={null}
  from swarms import Agent

  # Persistent agent (default behavior).
  # On first run it creates MEMORY.md. On subsequent runs it picks up
  # where it left off — the model sees the prior conversation as a
  # system preamble.
  persistent_agent = Agent(
      agent_name="ResearchAssistant",
      agent_description="Remembers context across sessions",
      model_name="gpt-4.1",
      max_loops=1,
      persistent_memory=True,  # default — state survives restarts
  )

  # Ephemeral agent — no disk writes, no preload, fresh every run.
  ephemeral_agent = Agent(
      agent_name="OneShotAgent",
      model_name="gpt-4.1",
      max_loops=1,
      persistent_memory=False,
  )
  ```
</ParamField>

## Memory stack

An agent can use several memory layers at the same time:

| Layer                            | Purpose                                                   | Persistence         |
| -------------------------------- | --------------------------------------------------------- | ------------------- |
| `conversation_history`           | In-memory messages for the current run                    | Current process     |
| `MEMORY.md`                      | Active user, agent, and tool interaction log              | Disk-backed         |
| `archive/history_<timestamp>.md` | Raw transcripts saved before compaction                   | Disk-backed         |
| `Conversation.compact()`         | Replaces raw active history with a summary                | Disk-backed summary |
| `ContextCompressor`              | Automatically calls compaction near the context limit     | Runtime behavior    |
| `long_term_memory`               | Optional vector database for external knowledge retrieval | Depends on database |

`MEMORY.md` is not the same thing as RAG. Persistent memory records the agent's own interaction history. RAG retrieves knowledge from external documents or a vector database.

## Disk layout

Agent memory lives under the workspace directory:

```text theme={null}
$WORKSPACE_DIR/agents/{agent_name}/
|-- MEMORY.md
`-- archive/
    |-- history_2026-04-20_14-30-45.md
    |-- history_2026-04-20_16-12-08.md
    `-- ...
```

<ParamField path="agent_name" type="str" default="swarm-worker-01">
  Stable name used to identify the agent's memory folder.

  ```python theme={null}
  from swarms import Agent

  agent = Agent(
      agent_name="ResearchAgent",
      model_name="claude-sonnet-4-6",
  )

  # Memory path:
  # $WORKSPACE_DIR/agents/ResearchAgent/MEMORY.md
  ```
</ParamField>
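
To inspect the layout above from a script, the paths can be derived with the standard library alone. This is a minimal sketch, not part of the Swarms API: `memory_dir` and `list_archives` are hypothetical helpers, and it assumes the workspace root is read from a `WORKSPACE_DIR` environment variable (the `agent_workspace` fallback is purely illustrative).

```python
from __future__ import annotations

import os
from pathlib import Path


def memory_dir(agent_name: str, workspace_dir: str | None = None) -> Path:
    """Return the folder where this page says an agent's memory lives."""
    # Assumption: the workspace root comes from the WORKSPACE_DIR environment
    # variable; the "agent_workspace" fallback is only for illustration.
    root = workspace_dir or os.environ.get("WORKSPACE_DIR", "agent_workspace")
    return Path(root) / "agents" / agent_name


def list_archives(agent_name: str, workspace_dir: str | None = None) -> list[Path]:
    """List archived transcripts, oldest first.

    The history_<timestamp>.md names sort chronologically as plain strings.
    A missing archive folder simply yields an empty list.
    """
    archive = memory_dir(agent_name, workspace_dir) / "archive"
    return sorted(archive.glob("history_*.md"))


folder = memory_dir("ResearchAgent", workspace_dir="/tmp/workspace")
print(folder / "MEMORY.md")
print(list_archives("ResearchAgent", workspace_dir="/tmp/workspace"))
```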

### Key design points

* The folder is keyed by `agent_name`, not by `id`.
* `MEMORY.md` is append-only during normal operation; only compaction rewrites it.
* Every `conversation.add(role, content)` writes to in-memory history and to disk.
* Compression archives the current `MEMORY.md` before replacing it with a compact summary.
* The agent's static `system_prompt`, `rules`, and constructor configuration are not repeatedly appended to `MEMORY.md`.

## Lifecycle

### 1. File creation

On first construction of an agent with a new `agent_name`, Swarms creates:

```text theme={null}
$WORKSPACE_DIR/agents/{agent_name}/MEMORY.md
```

The file starts with a small header and an interaction log section:

```markdown theme={null}
# Agent Memory

**Conversation:** ResearchAgent_id_<uuid>_conversation
**Created:** 2026-04-20T18:33:12

---

## Interaction Log
```

If the file already exists, Swarms leaves it in place.
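
The create-if-missing behavior can be sketched with the standard library. This is an illustration of the idea rather than the Swarms implementation: `ensure_memory_file` is a hypothetical helper, and the conversation name passed in is a placeholder.

```python
import tempfile
from datetime import datetime
from pathlib import Path


def ensure_memory_file(path: Path, conversation_name: str) -> bool:
    """Create MEMORY.md with a header if it is missing.

    Returns True when the file was created, False when it already existed.
    """
    if path.exists():
        return False  # existing memory is left in place
    path.parent.mkdir(parents=True, exist_ok=True)
    created = datetime.now().isoformat(timespec="seconds")
    header = (
        "# Agent Memory\n\n"
        f"**Conversation:** {conversation_name}\n"
        f"**Created:** {created}\n\n"
        "---\n\n"
        "## Interaction Log\n"
    )
    path.write_text(header, encoding="utf-8")
    return True


with tempfile.TemporaryDirectory() as workspace:
    memory_md = Path(workspace) / "agents" / "ResearchAgent" / "MEMORY.md"
    print(ensure_memory_file(memory_md, "ResearchAgent_conversation"))  # True
    print(ensure_memory_file(memory_md, "ResearchAgent_conversation"))  # False
```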

### 2. Preload on construction

During `Conversation.__init__`, Swarms reads the existing `MEMORY.md` and injects it into `conversation_history` as a single `System` message.

The resulting prompt order is:

```text theme={null}
[0] System: <system_prompt>
[1] User: <rules>                       # if provided
[2] User: <custom_rules_prompt>         # if provided
[3] System: [Persistent Memory - MEMORY.md]
            ... full MEMORY.md contents ...
```

The preload is added directly to the in-memory `conversation_history`, so it is not written back to disk. When `return_history_as_string()` builds the prompt, the model sees the system prompt, rules, persistent memory, and current task, in that order.
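
The ordering can be modeled as a plain list of messages. The sketch below is illustrative only: `build_initial_history` and `MEMORY_TAG` are hypothetical names, and the dictionaries are a simplified stand-in for whatever message structure `Conversation` uses internally.

```python
from typing import Dict, List, Optional

MEMORY_TAG = "[Persistent Memory - MEMORY.md]"


def build_initial_history(
    system_prompt: str,
    rules: Optional[str] = None,
    custom_rules_prompt: Optional[str] = None,
    memory_md_text: Optional[str] = None,
) -> List[Dict[str, str]]:
    """Assemble startup messages in the order described on this page."""
    history = [{"role": "System", "content": system_prompt}]
    if rules:
        history.append({"role": "User", "content": rules})
    if custom_rules_prompt:
        history.append({"role": "User", "content": custom_rules_prompt})
    if memory_md_text:
        # The whole file becomes one System message, not a replay of old turns.
        history.append(
            {"role": "System", "content": f"{MEMORY_TAG}\n{memory_md_text}"}
        )
    return history


history = build_initial_history(
    system_prompt="You are a research assistant.",
    rules="Cite sources.",
    memory_md_text="## Interaction Log\n### User - 2026-04-20T18:35:04\n...",
)
print([message["role"] for message in history])  # ['System', 'User', 'System']
```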

### 3. Write-through on new messages

Every `conversation.add(role, content)` call:

1. Appends the message to `conversation_history`
2. Appends a timestamped block to `MEMORY.md`

The on-disk format looks like this:

```markdown theme={null}
### User - 2026-04-20T18:35:04
Research cloud database options for low-latency analytics.

---

### ResearchAgent - 2026-04-20T18:35:21
I recommend evaluating BigQuery, ClickHouse Cloud, and AlloyDB...

---
```

Disk writes are serialized with a per-conversation lock. Construction-time messages such as system prompts and rules are suppressed from disk so static identity does not get duplicated on every restart.
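
A write-through log with a lock can be sketched in a few lines. `WriteThroughLog` is a hypothetical class written for this page, not the `Conversation` implementation; it only mirrors the two steps and the on-disk block format shown above.

```python
import tempfile
import threading
from datetime import datetime
from pathlib import Path


class WriteThroughLog:
    """Keeps messages in memory and appends each one to a Markdown file."""

    def __init__(self, memory_md_path: Path):
        self.memory_md_path = memory_md_path
        self.conversation_history = []
        self._lock = threading.Lock()  # one lock per log instance

    def add(self, role: str, content: str) -> None:
        timestamp = datetime.now().isoformat(timespec="seconds")
        block = f"### {role} - {timestamp}\n{content}\n\n---\n\n"
        with self._lock:
            # 1. Append to the in-memory history.
            self.conversation_history.append({"role": role, "content": content})
            # 2. Append the timestamped block to the file on disk.
            with self.memory_md_path.open("a", encoding="utf-8") as handle:
                handle.write(block)


with tempfile.TemporaryDirectory() as workspace:
    log = WriteThroughLog(Path(workspace) / "MEMORY.md")
    log.add("User", "Research cloud database options.")
    log.add("ResearchAgent", "I recommend evaluating BigQuery...")
    print(log.memory_md_path.read_text(encoding="utf-8"))
```

Holding the lock across both steps keeps the in-memory order and the on-disk order identical when several threads call `add` at once.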

## Context compression

Without compression, a long-running agent could eventually exceed the model's context window. Swarms can attach a `ContextCompressor` that summarizes the current transcript and compacts the active memory.

<ParamField path="context_compression" type="bool" default="True">
  Enables automatic compression when memory approaches the configured context limit.

  ```python theme={null}
  from swarms import Agent

  agent = Agent(
      agent_name="ResearchAgent",
      model_name="claude-sonnet-4-6",
      max_loops=5,
      context_compression=True,
  )
  ```
</ParamField>

### When compression runs

Compression can run when all of these are true:

* `context_compression=True`
* The token usage of `short_memory.return_history_as_string()` is greater than or equal to `threshold * context_length`
* The agent is at the top of a loop iteration

The default threshold is `0.9`, so compression starts when the active prompt reaches about 90% of the context window.

Compression works for both `max_loops="auto"` and integer `max_loops` runs; the loop mode does not matter, only the `context_compression` flag does.
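
The trigger condition reduces to a single comparison. The function below is a hypothetical sketch of that check, not Swarms code; it takes the token count as an input because how tokens are counted is not covered here, and the 128,000-token context length is an example value.

```python
def should_compress(
    token_count: int,
    context_length: int,
    threshold: float = 0.9,
    enabled: bool = True,
) -> bool:
    """Return True when the prompt has reached the compression threshold."""
    return enabled and token_count >= threshold * context_length


# With a 128,000-token window and the default threshold of 0.9,
# compression becomes eligible at roughly 115,200 prompt tokens.
print(should_compress(120_000, 128_000))                  # True
print(should_compress(100_000, 128_000))                  # False
print(should_compress(100_000, 128_000, threshold=0.75))  # True (limit is 96,000)
print(should_compress(120_000, 128_000, enabled=False))   # False
```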

### What compression does

When compression fires:

1. The current transcript is summarized with an LLM call.
2. `Conversation.compact(summary=...)` is called.
3. The current `MEMORY.md` is copied to `archive/history_<timestamp>.md`.
4. The active `MEMORY.md` is deleted and recreated with a fresh header.
5. `conversation_history` is rebuilt with the system prompt, rules, and custom rules.
6. The summary is appended as one `System` message to both memory and `MEMORY.md`.

After compaction, active memory is small again:

```text theme={null}
conversation_history:
  [0] System: <system_prompt>
  [1] User: <rules>                     # if provided
  [2] User: <custom_rules_prompt>       # if provided
  [3] System: [Compressed Memory Summary] ...<summary>

MEMORY.md:
  # Agent Memory
  ...
  ## Interaction Log
  ### System - <timestamp>
  [Compressed Memory Summary] ...<summary>

archive/history_<previous-timestamp>.md:
  Full pre-compaction transcript
```

On the next process restart, Swarms loads the compact summary from `MEMORY.md` instead of the raw pre-compaction transcript. The archive keeps the full transcript available without filling the active context window.
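
The on-disk half of this flow (archive, reset, re-seed) can be sketched with the standard library. This is an assumption-laden illustration, not `Conversation.compact()` itself: the `compact` function and `HEADER` constant are hypothetical, the header is abbreviated, the summary is passed in rather than produced by an LLM call, and the in-memory rebuild is omitted.

```python
import shutil
import tempfile
from datetime import datetime
from pathlib import Path

HEADER = "# Agent Memory\n\n---\n\n## Interaction Log\n"  # abbreviated header


def compact(memory_md: Path, summary: str) -> Path:
    """Archive MEMORY.md, reset it, and seed it with a summary.

    Returns the path of the archived transcript.
    """
    archive_dir = memory_md.parent / "archive"
    archive_dir.mkdir(exist_ok=True)
    stamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
    archived = archive_dir / f"history_{stamp}.md"
    shutil.copy2(memory_md, archived)  # keep the raw transcript
    now = datetime.now().isoformat(timespec="seconds")
    # Overwriting stands in for "delete and recreate with a fresh header",
    # followed by appending the summary as a single System entry.
    memory_md.write_text(
        HEADER + f"### System - {now}\n[Compressed Memory Summary] {summary}\n\n---\n\n",
        encoding="utf-8",
    )
    return archived


with tempfile.TemporaryDirectory() as workspace:
    memory_md = Path(workspace) / "MEMORY.md"
    memory_md.write_text(HEADER + "### User - t0\nA long raw turn...\n\n---\n\n", encoding="utf-8")
    archived = compact(memory_md, "User asked about cloud databases.")
    print(archived.name)
    print(memory_md.read_text(encoding="utf-8"))
```

Because the archive name has one-second resolution, two compactions within the same second would collide in this sketch; a production version would need a tie-breaker.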

## Configure compression

Compression is enabled by default, so passing `context_compression=True` explicitly is optional:

```python theme={null}
from swarms import Agent

agent = Agent(
    agent_name="ResearchAgent",
    model_name="claude-sonnet-4-6",
    max_loops=5,
    context_compression=True,
)
```

Disable compression when you want the active `MEMORY.md` to remain un-compacted:

```python theme={null}
from swarms import Agent

agent = Agent(
    agent_name="StaticAgent",
    model_name="claude-sonnet-4-6",
    max_loops="auto",
    context_compression=False,
)
```

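When compression is enabled, the agent attaches a `ContextCompressor(threshold=0.9)`. You can replace it after construction to tune the threshold, summarizer model, temperature, or summary length. The leading underscore marks `_context_compressor` as a private attribute, so this approach may break between releases: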

```python theme={null}
from swarms import Agent
from swarms.agents.context_compressor import ContextCompressor

agent = Agent(
    agent_name="ResearchAgent",
    model_name="claude-sonnet-4-6",
    max_loops=5,
    context_compression=True,
)

agent._context_compressor = ContextCompressor(
    threshold=0.75,
    summarizer_model="claude-haiku-4-5",
    summarizer_temperature=0.1,
    summarizer_max_tokens=3000,
)
```

## Access memory in code

The `Conversation` object is available as `agent.short_memory`.

```python theme={null}
from swarms import Agent

agent = Agent(
    agent_name="ResearchAgent",
    model_name="claude-sonnet-4-6",
)

agent.run("Research low-latency data warehouse options.")
agent.run("Narrow the recommendation to GCP.")

# Path to the active on-disk memory file
print(agent.short_memory.memory_md_path)

# Full prompt-ready history
print(agent.short_memory.return_history_as_string())

# Structured message list
messages = agent.short_memory.to_dict()
print(messages)

# Last response content
print(agent.short_memory.get_final_message_content())
```

### Manual compaction

You can compact memory yourself at any time:

```python theme={null}
agent.short_memory.compact(
    summary=(
        "Researched cloud data warehouses. "
        "The user prefers GCP for latency and operations reasons. "
        "Shortlist: BigQuery, AlloyDB, and ClickHouse Cloud."
    )
)
```

Manual compaction follows the same archive, wipe, and re-seed flow as automatic compression.

### Export and load conversations

`MEMORY.md` is the active persistent memory file. You can also export or load conversation history in other formats:

```python theme={null}
# Save conversation snapshots
agent.short_memory.export(force=True)
agent.short_memory.save_as_json(force=True)
agent.short_memory.save_as_yaml(force=True)

# Load a prior exported conversation
agent.short_memory.load("conversation_agent-123.json")
```

### Search memory

Use built-in search helpers for quick inspection:

```python theme={null}
results = agent.short_memory.search("GCP")
matches = agent.short_memory.search_keyword_in_conversation("latency")
```

## Disable disk-backed memory

The clean way to keep an agent fully in-process is `persistent_memory=False`. Nothing is preloaded, nothing is written to `MEMORY.md`, and no `archive/` directory is created:

```python theme={null}
from swarms import Agent

agent = Agent(
    agent_name="EphemeralAgent",
    model_name="gpt-4.1",
    persistent_memory=False,
)

agent.run("This updates conversation_history but does not write to MEMORY.md.")
```

If you have already constructed a persistent agent and want to stop further disk writes for the rest of the run, you can also clear `memory_md_path`:

```python theme={null}
agent.short_memory.memory_md_path = None
```

This stops future writes but does not retroactively delete `MEMORY.md`. Neither approach disables `conversation_history`, which always tracks the current run in memory.

## Persistent memory vs RAG

Use `MEMORY.md` for the agent's own interaction history. Use RAG when the agent needs to retrieve information from documents, databases, or external knowledge stores.

```python theme={null}
from swarms import Agent
from swarms_memory import ChromaDB

vector_db = ChromaDB(
    output_dir="agent_memory",
    docs_folder="knowledge_base",
)

agent = Agent(
    agent_name="KnowledgeAgent",
    model_name="claude-sonnet-4-6",
    long_term_memory=vector_db,
    rag_every_loop=False,
    max_loops=1,
)

response = agent.run(
    "Summarize what our renewable energy documents say about storage."
)
```

<ParamField path="long_term_memory" type="BaseVectorDatabase" default="None">
  Vector database used for document retrieval.
</ParamField>

<ParamField path="rag_every_loop" type="bool" default="False">
  Query long-term memory on every loop iteration instead of only at the beginning.
</ParamField>

<ParamField path="memory_chunk_size" type="int" default="2000">
  Chunk size used when processing memory documents for retrieval.
</ParamField>

## Best practices

* Use stable, descriptive `agent_name` values for agents that should remember previous work.
* Keep `context_compression=True` for autonomous or long-running agents.
* Tune `ContextCompressor.threshold` lower for agents with large tool outputs or long responses.
* Compact manually after major milestones to preserve the important state and reduce prompt size.
* Use RAG for external knowledge. Do not rely on `MEMORY.md` as a document database.
* Set `persistent_memory=False` for privacy-sensitive or one-off agents that should not write a transcript. Clearing `memory_md_path` on an already constructed agent only stops further writes.

## Why it works this way

### Why key memory by `agent_name`?

`id` values can change between process starts. `agent_name` is user-controlled and stable, so it gives the agent a durable identity.

### Why preload memory as one `System` message?

The model needs to understand that the content is prior memory, not a current user request. A single system-level memory preamble is compact and less ambiguous than replaying old turns as active messages.

### Why wipe `MEMORY.md` during compaction?

If compaction only appended a summary, the next run would load both the summary and the raw transcript it summarizes. Wiping the active file keeps the working context small, while `archive/` preserves the raw log.

## Next steps

<CardGroup cols={2}>
  <Card title="Agent Configuration" icon="sliders" href="/agents/agent-configuration">
    Configure core agent parameters such as `agent_name`, `max_loops`, and context limits.
  </Card>

  <Card title="Conversation API" icon="comments" href="/api/conversation">
    Explore the underlying `Conversation` class and its export, load, and search helpers.
  </Card>
</CardGroup>