# How It Works

> Learn how Satori uses embeddings and semantic search to power AI memory

## Overview

Satori provides persistent memory for AI applications through a combination of vector embeddings, semantic search, and intelligent context injection. This page explains the core mechanisms that make memory work.

## Memory Lifecycle

```mermaid theme={null}
graph LR
    UserInput[User Input] --> Search[Semantic Search]
    Search --> Context[Inject Context]
    Context --> LLM[LLM Processing]
    LLM --> Decision{Important Info?}
    Decision -->|Yes| Save[Save Memory]
    Decision -->|No| Response[Stream Response]
    Save --> Response
    Response --> User[User Sees Response]
```

**1. Semantic search.** When a user sends a message, Satori converts it into a vector embedding and searches for similar memories using cosine similarity.

```typescript theme={null}
const memoryContext = await getContext(config, userMessage, { limit: 5 });
```

This finds the 5 most relevant memories based on semantic meaning, not just keyword matching.

**2. Inject context.** Retrieved memories are formatted and injected into the system prompt:

```typescript theme={null}
system: `You are a helpful assistant with memory.

What you know about this user:
${memoryContext}

Use add_item to save important information.`
```

**3. LLM processing.** The language model receives both the current message and the relevant historical context, which lets it give personalized responses.

**4. Save memory.** When the LLM detects important information, it calls the `add_item` tool:

```typescript theme={null}
// Called automatically by the LLM as a tool
add_item({
  memory: "User prefers TypeScript over JavaScript"
})
```

## Vector Embeddings Explained

Embeddings are numerical representations of text that capture semantic meaning. Similar concepts have similar embeddings.

### How Embeddings Work

When you save a memory like "I love TypeScript", Satori:

1. Sends the text to OpenAI's embedding model
2. Receives a 1536-dimensional vector (an array of numbers)
3. Stores both the text and the vector in PostgreSQL with pgvector

Later, when searching for "programming preferences", the query is also converted to a vector and compared against the stored vectors using cosine similarity.

```typescript theme={null}
// Illustrative values only

// Memory: "I love TypeScript"
// Embedding: [0.023, -0.145, 0.891, ..., 0.234] (1536 numbers)

// Query: "What languages does the user like?"
// Query embedding: [0.019, -0.142, 0.887, ..., 0.229]

// Cosine similarity: 0.94 (very similar)
// This memory will be returned as relevant
```

**Technical details:**

* **Model**: `text-embedding-3-small` (OpenAI)
* **Dimensions**: 1536
* **Similarity metric**: Cosine similarity
* **Index type**: IVFFlat (pgvector)
* **Storage**: PostgreSQL with the pgvector extension

Embeddings capture meaning, not just keywords. "I prefer TS" and "TypeScript is my favorite" will have similar embeddings even though they share no words.

## Semantic Search vs Keyword Search

**Keyword search**

**Query:** "programming languages"

**Matches:**

* "I like programming languages"
* "Programming languages are fun"

**Misses:**

* "I prefer TypeScript" ❌
* "Python is my favorite" ❌

**Semantic search (Satori)**

**Query:** "programming languages"

**Matches:**

* "I like programming languages"
* "I prefer TypeScript" ✅
* "Python is my favorite" ✅
* "I'm learning Rust" ✅

## Search Parameters

When searching for memories, you can control the results with these parameters:

**`query`** (string): The natural language query to search for. It is converted to an embedding and compared against stored memories.

**`limit`** (number): Maximum number of memories to return. Range: 1-100.

```typescript theme={null}
// Get the top 5 most relevant memories
const context = await getContext(config, query, { limit: 5 });
```

**`threshold`** (number): Minimum similarity score (0-1) for a memory to be considered relevant. Default: 0.7. Higher values mean stricter matching.
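To make `limit` and `threshold` concrete, here is an in-memory sketch that ranks stored memories by cosine similarity, drops anything below the threshold, and keeps the best `limit` results. It is only an illustration: Satori runs this search inside PostgreSQL with pgvector, and `StoredMemory`, `cosineSimilarity`, and `rankMemories` are hypothetical names, not part of Satori's API. Toy 3-dimensional vectors stand in for real 1536-dimensional embeddings.

```typescript
// Hypothetical shape of a stored memory for this sketch (not Satori's schema).
interface StoredMemory {
  content: string;
  embedding: number[];
}

// Cosine similarity: dot(a, b) / (|a| * |b|), ranging from -1 to 1.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same dimension");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Similarity is undefined for zero vectors; treat them as "no match".
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Keep memories scoring >= threshold, sorted best first, at most `limit` of them.
function rankMemories(
  queryEmbedding: number[],
  memories: StoredMemory[],
  limit: number,
  threshold: number,
): { content: string; score: number }[] {
  return memories
    .map((m) => ({ content: m.content, score: cosineSimilarity(queryEmbedding, m.embedding) }))
    .filter((r) => r.score >= threshold)
    .sort((x, y) => y.score - x.score)
    .slice(0, limit);
}

// Usage with toy vectors
const memories: StoredMemory[] = [
  { content: "User prefers TypeScript", embedding: [0.9, 0.1, 0.0] },
  { content: "User is learning Rust", embedding: [0.7, 0.3, 0.1] },
  { content: "User lives in Berlin", embedding: [0.0, 0.2, 0.9] },
];
const results = rankMemories([1, 0.2, 0], memories, 5, 0.7);
console.log(results.map((r) => r.content));
// ["User prefers TypeScript", "User is learning Rust"]
```

Raising the threshold to 0.99 in this example would keep only the TypeScript memory (score about 0.996), while the Berlin memory (score about 0.04) is filtered out at any reasonable threshold.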
```typescript theme={null}
// Only return very similar memories
const context = await getContext(config, query, { threshold: 0.85 });
```

Start with the default threshold (0.7) and adjust based on your needs. Lower values (such as 0.6) cast a wider net; higher values (such as 0.85) are more precise.

## Context Injection Patterns

There are two main approaches to using memory context:

### Pattern 1: Pre-fetch and Inject (Recommended)

```typescript theme={null}
import { streamText } from 'ai';
import { openai } from '@ai-sdk/openai';

// Assumes getContext, config, userMessage, messages, and tools are already in scope

// Fetch memories before the LLM call
const memoryContext = await getContext(config, userMessage);

// Inject them into the system prompt
const result = await streamText({
  model: openai('gpt-4o'),
  system: `You are a helpful assistant.

What you know about this user:
${memoryContext}`,
  messages,
  tools,
});
```

**Pros:** Reliable, predictable, works with all models

### Pattern 2: Tool-based Search (Advanced)

```typescript theme={null}
import { tool } from 'ai';
import { z } from 'zod';

// Let the LLM decide when to search.
// Assumes memoryTools, config, and a Satori `client` are already in scope.
const tools = {
  ...memoryTools(config),
  search_memory: tool({
    description: 'Search for relevant memories',
    // `parameters` is the AI SDK v4 option name; AI SDK v5 renamed it to `inputSchema`
    parameters: z.object({ query: z.string() }),
    execute: async ({ query }) => {
      return await client.searchMemories(query);
    },
  }),
};
```

This pattern is less reliable because the LLM may not call the search tool when it should. Use Pattern 1 for production applications.
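Pattern 1 depends on turning retrieved memories into prompt text. This page does not document the exact string `getContext` returns, so the sketch below uses two hypothetical helpers, `formatMemoryContext` and `buildSystemPrompt` (not part of Satori's API), to show one reasonable format: a bulleted list, with a fallback line when nothing relevant was found so the prompt never contains an empty section.

```typescript
// Hypothetical helper: renders memories as a bulleted list for the system prompt.
function formatMemoryContext(memories: string[]): string {
  // Drop blank entries so they don't produce empty bullets.
  const cleaned = memories.map((m) => m.trim()).filter((m) => m.length > 0);
  if (cleaned.length === 0) {
    return "No relevant memories found.";
  }
  return cleaned.map((m) => `- ${m}`).join("\n");
}

// Hypothetical helper: builds the Pattern 1 system prompt from formatted context.
function buildSystemPrompt(memoryContext: string): string {
  return `You are a helpful assistant.

What you know about this user:
${memoryContext}`;
}

// Usage
const prompt = buildSystemPrompt(
  formatMemoryContext([
    "User prefers TypeScript over JavaScript",
    "   ",
    "User is learning Rust",
  ]),
);
console.log(prompt);
```

An explicit "no memories" line tells the model that memory was checked and came back empty, which is less ambiguous than a heading followed by nothing.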
## Memory Storage Format

Each memory is stored with the following fields:

* **ID**: Unique identifier (UUID) for the memory
* **Content**: The actual text content of the memory
* **Embedding**: 1536-dimensional vector representation of the content
* **User ID**: User identifier used for memory isolation
* **Tenant ID**: Tenant identifier (the API key owner)
* **Metadata**: Optional custom metadata for filtering and organization
* **Created at**: ISO 8601 timestamp of when the memory was created
* **Updated at**: ISO 8601 timestamp of the last update

Example metadata:

```typescript theme={null}
{
  tags: ['preference', 'language'],
  category: 'programming',
  importance: 'high'
}
```

## Performance Considerations

### Embedding Generation

* **Latency**: \~50-100ms per embedding
* **Cost**: \$0.00002 per 1K tokens (\$0.02 per 1M tokens)
* **Caching**: Consider caching embeddings for frequently searched queries

### Vector Search

* **Latency**: \~10-50ms for typical datasets
* **Scalability**: An IVFFlat index performs well up to millions of vectors
* **Optimization**: Adjust the index's `lists` parameter to your dataset size

```sql theme={null}
-- Optimize for larger datasets
CREATE INDEX memories_embedding_idx
ON memories
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 1000); -- Increase for more vectors
```

For datasets under 100K memories, the default configuration performs well without any tuning.
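The `lists` tuning advice can be captured in a small helper. This is a sketch: `recommendedLists` and `ivfflatIndexSql` are hypothetical names, and the heuristic (rows / 1000 for up to 1M rows, sqrt(rows) above that) is pgvector's published starting point, not a Satori-specific rule. IVFFlat builds its clusters from the rows that exist when the index is created, so create or rebuild the index after loading data.

```typescript
// Hypothetical helper: pgvector's starting-point heuristic for IVFFlat `lists`.
function recommendedLists(rowCount: number): number {
  if (!Number.isFinite(rowCount) || rowCount < 0) {
    throw new Error("rowCount must be a non-negative number");
  }
  const lists =
    rowCount <= 1_000_000
      ? Math.floor(rowCount / 1000) // up to 1M rows: rows / 1000
      : Math.floor(Math.sqrt(rowCount)); // above 1M rows: sqrt(rows)
  return Math.max(1, lists); // an IVFFlat index needs at least one list
}

// Hypothetical helper: renders the CREATE INDEX statement for a given row count.
function ivfflatIndexSql(rowCount: number): string {
  const lists = recommendedLists(rowCount);
  return `CREATE INDEX memories_embedding_idx ON memories USING ivfflat (embedding vector_cosine_ops) WITH (lists = ${lists});`;
}

console.log(recommendedLists(50_000)); // 50
console.log(recommendedLists(1_000_000)); // 1000
console.log(recommendedLists(4_000_000)); // 2000
console.log(ivfflatIndexSql(1_000_000));
```

Treat the result as a starting value and measure recall and latency on your own data before settling on it.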
## Best Practices

**Give clear save instructions.** Explicitly tell the LLM when to save memories:

```typescript theme={null}
system: `Save memories when the user:
- Shares preferences or opinions
- Provides personal information
- Mentions important dates or events
- Expresses goals or intentions`
```

**Always fetch context first.** Fetch relevant context before every LLM call, even if you think there might not be relevant memories:

```typescript theme={null}
// Always do this
const context = await getContext(config, userMessage);
```

**Write self-contained memories.** Store memories as complete sentences that make sense out of context:

```typescript theme={null}
// Good
"User prefers TypeScript over JavaScript for type safety"

// Bad: too vague
"prefers TS"
```

**Audit memories periodically.** Review stored memories to ensure quality:

```typescript theme={null}
const allMemories = await client.getAllMemories();
console.log('Total memories:', allMemories.length);
```

## Next Steps

* Learn how API keys and tenant isolation work
* Understand multi-tenant data separation
* See complete integration examples
* Explore the search API in detail