Overview
Satori provides persistent memory for AI applications through a combination of vector embeddings, semantic search, and intelligent context injection. This page explains the core mechanisms that make memory work.
Memory Lifecycle
1. Semantic search for relevant memories
When a user sends a message, Satori converts it into a vector embedding and searches for similar memories using cosine similarity. The search returns the 5 most relevant memories based on semantic meaning, not just keyword matching.
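A minimal sketch of this step, assuming a hypothetical Satori client that exposes a memories.search method (the real SDK surface may differ):

```typescript
// Sketch only: "satori" and the method/option names below are illustrative,
// not the official SDK API. Satori handles embedding the query.
const relevantMemories = await satori.memories.search({
  userId: "user_123",
  query: userMessage, // the incoming chat message
  limit: 5,           // return the 5 most similar memories
});
```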
2. Context injection into system prompt
Retrieved memories are formatted and injected into the system prompt:
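One way to do the formatting, reusing relevantMemories from step 1 (the prompt wording is entirely up to you; the content field is an assumed shape):

```typescript
// Assumes each memory object exposes a `content` string.
const memoryContext = relevantMemories
  .map((m) => `- ${m.content}`)
  .join("\n");

const systemPrompt = `You are a helpful assistant.

Relevant information about this user from previous conversations:
${memoryContext}`;
```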
3. LLM processes with context
The language model receives both the current message and relevant historical context, allowing it to provide personalized responses.
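For example, with the OpenAI Node SDK the injected context simply travels in the system message (the model name is illustrative; systemPrompt and userMessage come from the previous steps):

```typescript
import OpenAI from "openai";

const openai = new OpenAI();

const completion = await openai.chat.completions.create({
  model: "gpt-4o", // any chat model works; the memory context rides in the system message
  messages: [
    { role: "system", content: systemPrompt },
    { role: "user", content: userMessage },
  ],
});
```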
4. Automatic memory storage
When the LLM detects important information, it calls the add_memory tool:
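Only the add_memory name comes from Satori; here is a sketch of how such a tool could be declared with OpenAI-style function calling (the parameter schema is an assumption):

```typescript
const tools = [
  {
    type: "function" as const,
    function: {
      name: "add_memory",
      description: "Save an important fact about the user for future conversations.",
      parameters: {
        type: "object",
        properties: {
          content: {
            type: "string",
            description: "The fact to remember, written as a complete sentence.",
          },
        },
        required: ["content"],
      },
    },
  },
];
// Pass `tools` to chat.completions.create(); when the model emits an
// add_memory tool call, forward its arguments to Satori's memory API.
```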
Vector Embeddings Explained
Embeddings are numerical representations of text that capture semantic meaning. Similar concepts have similar embeddings.
How Embeddings Work
When you save a memory like “I love TypeScript”, Satori does three things (see the sketch after this list):
- Sends the text to OpenAI’s embedding model
- Receives a 1536-dimensional vector (array of numbers)
- Stores both the text and vector in PostgreSQL with pgvector
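A sketch of that flow using the OpenAI embeddings API and a pgvector-backed table (the table and column names are assumptions for illustration, not Satori's actual schema):

```typescript
import OpenAI from "openai";
import { Client } from "pg";

const openai = new OpenAI();
const db = new Client();
await db.connect();

// 1. Ask OpenAI for an embedding of the memory text.
const text = "I love TypeScript";
const res = await openai.embeddings.create({
  model: "text-embedding-3-small", // returns 1536-dimensional vectors
  input: text,
});
const embedding = res.data[0].embedding; // number[] of length 1536

// 2. Store both the text and the vector in a pgvector column.
await db.query(
  "INSERT INTO memories (content, embedding) VALUES ($1, $2::vector)",
  [text, JSON.stringify(embedding)]
);
```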
Embeddings capture meaning, not just keywords. “I prefer TS” and “TypeScript is my favorite” will have similar embeddings even though they share no common words.
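Similarity between two embeddings is measured with cosine similarity, the same metric the search step uses. A small helper to make the idea concrete (in production the comparison happens inside PostgreSQL via pgvector, not in application code):

```typescript
// Cosine similarity: close to 1 means very similar meaning, close to 0 means unrelated.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```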
Semantic Search vs Keyword Search
Keyword Search
Query: “programming languages”
Matches:
- “I like programming languages”
- “Programming languages are fun”
- “I prefer TypeScript” ❌
- “Python is my favorite” ❌
Semantic Search
Query: “programming languages”
Matches:
- “I like programming languages”
- “I prefer TypeScript” ✅
- “Python is my favorite” ✅
- “I’m learning Rust” ✅
Search Parameters
When searching for memories, you can control the results with three parameters (a usage sketch follows this list):
- Query: the natural language query to search for. This is converted to an embedding and compared against stored memories.
- Limit: the maximum number of memories to return. Range: 1-100.
- Similarity threshold: the minimum similarity score (0-1) for a memory to be considered relevant. Higher values mean stricter matching.
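Put together, a search call might look like this (again assuming the hypothetical memories.search method; option names are illustrative):

```typescript
const results = await satori.memories.search({
  userId: "user_123",
  query: "what programming languages does the user like?",
  limit: 10,       // return at most 10 memories (1-100)
  threshold: 0.75, // keep only memories with similarity >= 0.75
});
```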
Context Injection Patterns
There are two main approaches to using memory context.
Pattern 1: Pre-fetch and Inject (Recommended)
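In this pattern the application always searches memory first, injects the results into the system prompt, and then calls the model. A condensed sketch, reusing the hypothetical Satori client and the OpenAI SDK from the lifecycle above:

```typescript
// 1. Pre-fetch: search memory with the user's message.
const memories = await satori.memories.search({
  userId,
  query: userMessage,
  limit: 5,
});

// 2. Inject: format whatever came back into the system prompt.
const context = memories.map((m) => `- ${m.content}`).join("\n");
const systemPrompt = `You are a helpful assistant.\n\nKnown about this user:\n${context}`;

// 3. Generate: the model always receives the context.
const reply = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [
    { role: "system", content: systemPrompt },
    { role: "user", content: userMessage },
  ],
});
```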
Pros: Reliable, predictable, works with all models
Pattern 2: Tool-based Search (Advanced)
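Here, instead of pre-fetching, you expose memory search to the model as a tool and let it decide when to look something up. This saves a search on messages that need no context, but it depends on the model reliably choosing to call the tool. A sketch of the tool definition (the name search_memory and the schema are illustrative):

```typescript
const searchMemoryTool = {
  type: "function" as const,
  function: {
    name: "search_memory",
    description: "Search the user's long-term memory for relevant facts.",
    parameters: {
      type: "object",
      properties: {
        query: { type: "string", description: "What to look for." },
      },
      required: ["query"],
    },
  },
};
// When the model calls search_memory, run the query against Satori,
// return the results in a tool message, and let the model continue.
```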
Memory Storage Format
Each memory is stored with rich metadata (an illustrative record follows this list):
- Unique identifier (UUID) for the memory
- The actual text content of the memory
- 1536-dimensional vector representation of the content
- User identifier for memory isolation
- Tenant identifier (API key owner)
- Optional custom metadata for filtering and organization
- ISO 8601 timestamp of when the memory was created
- ISO 8601 timestamp of the last update
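As a TypeScript shape, one way to picture such a record (field names are illustrative, not Satori's actual schema):

```typescript
interface StoredMemory {
  id: string;          // UUID
  content: string;     // the memory text
  embedding: number[]; // 1536-dimensional vector
  userId: string;      // isolates memories per end user
  tenantId: string;    // API key owner
  metadata?: Record<string, unknown>; // optional custom fields for filtering
  createdAt: string;   // ISO 8601
  updatedAt: string;   // ISO 8601
}
```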
Performance Considerations
Embedding Generation
- Latency: ~50-100ms per embedding
- Cost: $0.00002 per 1K tokens (very cheap)
- Caching: Consider caching embeddings for frequently searched queries (a minimal sketch follows)
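A minimal in-memory cache, assuming you generate query embeddings yourself with the OpenAI SDK (adapt the keying and eviction strategy to your traffic):

```typescript
import OpenAI from "openai";

const openai = new OpenAI();
const embeddingCache = new Map<string, number[]>();

async function getQueryEmbedding(query: string): Promise<number[]> {
  const cached = embeddingCache.get(query);
  if (cached) return cached; // skip the ~50-100ms round trip for repeat queries

  const res = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: query,
  });
  const embedding = res.data[0].embedding;
  embeddingCache.set(query, embedding);
  return embedding;
}
```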
Vector Search
- Latency: ~10-50ms for typical datasets
- Scalability: IVFFlat index performs well up to millions of vectors
- Optimization: Adjust the lists parameter in the index for your dataset size (see the sketch below)
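With pgvector, lists is set when the IVFFlat index is created; the pgvector documentation suggests roughly rows / 1000 as a starting point for tables up to about a million rows. A sketch using the pg client (table and column names are assumptions):

```typescript
import { Client } from "pg";

const db = new Client();
await db.connect();

// Build (or rebuild) the IVFFlat index with a lists value sized for the table.
await db.query(`
  CREATE INDEX IF NOT EXISTS memories_embedding_idx
  ON memories
  USING ivfflat (embedding vector_cosine_ops)
  WITH (lists = 100)
`);
```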
Best Practices
Write clear system prompts
Explicitly tell the LLM when to save memories:
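For example, a system prompt can spell out exactly when to use the add_memory tool (the wording is illustrative):

```typescript
const systemPrompt = `You are a helpful assistant with long-term memory.

When the user shares a lasting preference, fact, or goal (for example
"I prefer TypeScript" or "my deadline is in March"), call the add_memory
tool with that fact written as a complete sentence. Do not save small talk
or details that only matter for the current message.`;
```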
Pre-fetch context for every message
Always fetch relevant context before calling the LLM, even if you think there might not be relevant memories:
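Even when the search comes back empty, the flow stays the same; you simply inject nothing. A small sketch with the hypothetical client from earlier:

```typescript
const memories = await satori.memories.search({
  userId,
  query: userMessage,
  limit: 5,
});

// An empty result is fine: the prompt just carries no extra context.
const context =
  memories.length > 0
    ? `\n\nKnown about this user:\n${memories.map((m) => `- ${m.content}`).join("\n")}`
    : "";

const systemPrompt = `You are a helpful assistant.${context}`;
```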
Use descriptive memory content
Store memories in complete sentences that make sense out of context:
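The difference in practice (addMemory is a hypothetical stand-in for however you store memories; only the content strings matter):

```typescript
// Hypothetical helper so the examples type-check.
declare function addMemory(content: string): Promise<void>;

// Good: self-contained sentences that still make sense months later.
await addMemory("The user prefers TypeScript over JavaScript for backend services.");
await addMemory("The user is allergic to peanuts.");

// Bad: depends on context that won't exist at retrieval time.
await addMemory("likes it");
await addMemory("yes to the second option");
```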
Monitor memory quality
Periodically review stored memories to ensure quality:
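One lightweight approach is to dump a user's memories on a schedule and scan them for vague, duplicate, or stale entries (the list method shown here is hypothetical; use whatever listing call the SDK provides):

```typescript
// Hypothetical listing call -- adjust to the SDK's actual API.
const memories = await satori.memories.list({ userId: "user_123", limit: 100 });

for (const memory of memories) {
  console.log(`${memory.createdAt}  ${memory.content}`);
}
// Review the output for duplicates, vague entries ("likes it"), or facts
// that are no longer true, then delete or rewrite them.
```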