Core Concepts
Understanding how the Memory Engine works
Memory Types
The system distinguishes between two primary types of memory:
Semantic Memory
Semantic memories store factual knowledge extracted from content. These are useful for retrieving information like user preferences, project details, or general facts.
- Example: "The user prefers to write code in TypeScript."
- Storage: Stored as atomic facts with associated vector embeddings.
Procedural Memory
Procedural memories track the history of an agent's execution. They record the steps an agent took, the actions performed, and the outcomes.
- Example: "Step 1: Searched for
auth.ts-> Step 2: Found login function." - Storage: Stored with metadata like
stepNumber,action,taskObjective, andcontext.
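For illustration, the two shapes might be modeled like this (a hedged sketch; apart from the metadata fields listed above, the field names are assumptions, not the engine's actual schema):

```typescript
// Illustrative shapes only — besides stepNumber, action, taskObjective,
// and context (named above), field names are assumptions, not the real schema.
interface SemanticMemory {
  id: string;
  fact: string;           // atomic fact, e.g. "The user prefers TypeScript."
  embedding: number[];    // vector embedding used for semantic search
}

interface ProceduralMemory {
  id: string;
  content: string;        // e.g. "Step 1: Searched for auth.ts"
  stepNumber: number;     // position in the agent's execution trace
  action: string;         // what the agent did at this step
  taskObjective: string;  // the goal the agent was pursuing
  context?: string;       // surrounding context captured with the step
}
```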
Fact Extraction
When a new memory is created (e.g., via `POST /memory`), the content is not just stored as-is. The system uses an LLM to parse the raw text and extract atomic facts.
Why atomic facts? Raw text often contains noise or multiple distinct pieces of information. Breaking it down into atomic statements improves retrieval accuracy. For example, a paragraph about a meeting might be broken down into:
- "Meeting held on Oct 12th."
- "Team decided to use React."
- "Next sprint starts Monday."
Each fact is embedded and stored separately, linking back to the original source memory.
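A hedged sketch of that pipeline; `extractFactsWithLLM`, `saveSourceMemory`, `saveFact`, and `embed` are hypothetical helpers standing in for internal calls:

```typescript
// Hypothetical helpers — names and signatures are assumptions.
declare function extractFactsWithLLM(content: string): Promise<string[]>;
declare function embed(text: string): Promise<number[]>;
declare function saveSourceMemory(content: string, userId: string): Promise<string>;
declare function saveFact(f: {
  fact: string;
  vector: number[];
  sourceId: string;
  userId: string;
}): Promise<void>;

async function createMemory(content: string, userId: string): Promise<string> {
  // 1. An LLM parses the raw text into atomic factual statements,
  //    e.g. ["Meeting held on Oct 12th.", "Team decided to use React."].
  const facts = await extractFactsWithLLM(content);

  // 2. The original content is kept as the source memory.
  const sourceId = await saveSourceMemory(content, userId);

  // 3. Each fact is embedded and stored separately, linked back to the source.
  for (const fact of facts) {
    const vector = await embed(fact);
    await saveFact({ fact, vector, sourceId, userId });
  }
  return sourceId;
}
```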
Deduplication
To prevent bloating the database with identical information, the engine uses content hashing.
- A normalized hash is generated from the content of every incoming memory.
- If a memory with the same hash already exists for the same `userId` (and optional `agentId`/`runId`), the system detects it as a duplicate.
- Duplicates are skipped, but the system returns the existing memory ID.
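A minimal sketch of this check, assuming the normalization means lowercasing plus whitespace collapsing and the hash is SHA-256 (both are assumptions; the engine's exact rules may differ):

```typescript
import { createHash } from "node:crypto";

// Assumed normalization: lowercase and collapse whitespace, so trivially
// different copies of the same text produce the same hash. SHA-256 is also
// an assumption; the engine's actual algorithm may differ.
function contentHash(content: string): string {
  const normalized = content.toLowerCase().replace(/\s+/g, " ").trim();
  return createHash("sha256").update(normalized).digest("hex");
}

// Duplicates are scoped per owner: the same content under a different
// userId (or agentId/runId) is not treated as a duplicate.
function dedupKey(hash: string, userId: string, agentId?: string, runId?: string): string {
  return [userId, agentId ?? "", runId ?? "", hash].join(":");
}
```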
Vector Search & RAG
Embeddings
The engine uses an embedding model (e.g., `openai/text-embedding-3-small`) to convert text, such as extracted facts, into high-dimensional vectors. These vectors represent the meaning of the text.
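For example, producing an embedding with the official `openai` npm package (an assumption; the engine may route through a different provider layer, but the shape of the call is the same):

```typescript
import OpenAI from "openai";

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Convert a fact into a high-dimensional vector capturing its meaning.
async function embedFact(fact: string): Promise<number[]> {
  const res = await client.embeddings.create({
    model: "text-embedding-3-small",
    input: fact,
  });
  return res.data[0].embedding;
}
```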
Semantic Search
When you search or ask a question, your query is also converted into a vector. The system queries Qdrant to find vectors that are mathematically close to your query vector. This allows finding relevant information even if the wording is different.
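A sketch of that query using `@qdrant/js-client-rest`; the `memories` collection name and the `userId` payload filter are illustrative assumptions:

```typescript
import { QdrantClient } from "@qdrant/js-client-rest";

const qdrant = new QdrantClient({ url: "http://localhost:6333" });

// Find the stored facts whose vectors sit closest to the query vector,
// scoped to one user via a payload filter.
async function searchMemories(queryVector: number[], userId: string) {
  return qdrant.search("memories", {
    vector: queryVector,
    limit: 10,
    filter: { must: [{ key: "userId", match: { value: userId } }] },
  });
}
```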
Retrieval-Augmented Generation (RAG)
The `ask` and `answer` endpoints perform a full RAG flow:
- Retrieve: Find relevant memories using vector search.
- Rerank (Optional): Use an LLM to score the relevance of retrieved memories to the specific question, filtering out less relevant results.
- Generate: Feed the top-ranked memories as context to an LLM to generate a natural language answer.
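Putting the three steps together, a hedged sketch of the flow behind `ask` (reusing the helpers sketched above; `rerank` and `generate` stand in for LLM calls whose prompts are internal details):

```typescript
// Hypothetical signatures: embedFact and searchMemories mirror the sketches
// above; rerank and generate are assumptions standing in for internal LLM calls.
declare function embedFact(fact: string): Promise<number[]>;
declare function searchMemories(
  vector: number[],
  userId: string
): Promise<{ payload?: Record<string, unknown> | null }[]>;
declare function rerank(question: string, memories: string[]): Promise<string[]>;
declare function generate(question: string, context: string[]): Promise<string>;

async function ask(question: string, userId: string): Promise<string> {
  // 1. Retrieve: embed the question and run a vector search over stored facts.
  const queryVector = await embedFact(question);
  const hits = await searchMemories(queryVector, userId);
  const memories = hits.map((h) => String(h.payload?.fact ?? ""));

  // 2. Rerank (optional): an LLM scores each memory's relevance to the question.
  const topRanked = await rerank(question, memories);

  // 3. Generate: answer in natural language using the top memories as context.
  return generate(question, topRanked);
}
```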