
Overview

Satori provides persistent memory for AI applications through a combination of vector embeddings, semantic search, and intelligent context injection. This page explains the core mechanisms that make memory work.

Memory Lifecycle

1. Semantic search for relevant memories

When a user sends a message, Satori converts it into a vector embedding and searches for similar memories using cosine similarity.
const context = await getMemoryContext(config, userMessage, { limit: 5 });
This finds the 5 most relevant memories based on semantic meaning, not just keyword matching.

2. Context injection into system prompt

Retrieved memories are formatted and injected into the system prompt:
system: `You are a helpful assistant with memory.

What you know about this user:
${memoryContext}

Use add_memory to save important information.`

3. LLM processes with context

The language model receives both the current message and relevant historical context, allowing it to provide personalized responses.

4. Automatic memory storage

When the LLM detects important information, it calls the add_memory tool:
// LLM automatically calls this
add_memory({
  memory: "User prefers TypeScript over JavaScript"
})
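
The add_memory tool itself ships with memoryTools(config) (shown later under Context Injection Patterns), so you do not define it yourself. As a rough sketch of the mechanism, a tool like this can be built with the AI SDK's tool helper; the saveMemory helper and the exact schema below are assumptions for illustration, not Satori's actual implementation:
import { tool } from 'ai';
import { z } from 'zod';

// Assumed persistence helper: embeds the text and writes it to storage
declare function saveMemory(text: string): Promise<void>;

// Rough sketch only: the real tool is provided by memoryTools(config)
const add_memory = tool({
  description: 'Save an important fact about the user to long-term memory',
  parameters: z.object({
    memory: z.string().describe('A complete, self-contained sentence to remember'),
  }),
  execute: async ({ memory }) => {
    await saveMemory(memory);
    return { saved: true };
  },
});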

Vector Embeddings Explained

Embeddings are numerical representations of text that capture semantic meaning. Similar concepts have similar embeddings.

How Embeddings Work

When you save a memory like “I love TypeScript”, Satori:
  1. Sends the text to OpenAI’s embedding model
  2. Receives a 1536-dimensional vector (array of numbers)
  3. Stores both the text and vector in PostgreSQL with pgvector
Later, when searching for “programming preferences”, the query is also converted to a vector and compared using cosine similarity.
Embeddings capture meaning, not just keywords. “I prefer TS” and “TypeScript is my favorite” will have similar embeddings even though they share no common words.
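
To make the comparison concrete, here is a minimal sketch of cosine similarity itself. This is illustrative only: in practice pgvector computes it inside PostgreSQL (via vector_cosine_ops) rather than in application code.
// Illustrative only: pgvector performs this comparison in the database
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// A score near 1 means the query and the stored memory are semantically close;
// memories are ranked by this score and the top results are returned.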

Keyword Search

Query: “programming languages”

Matches:
  • “I like programming languages”
  • “Programming languages are fun”
Misses:
  • “I prefer TypeScript” ❌
  • “Python is my favorite” ❌

Semantic Search

Query: “programming languages”

Matches:
  • “I like programming languages”
  • “I prefer TypeScript” ✅
  • “Python is my favorite” ✅
  • “I’m learning Rust” ✅

Search Parameters

When searching for memories, you can control the results:
query (string, required)
The natural language query to search for. This is converted to an embedding and compared against stored memories.

limit (number, default: 10)
Maximum number of memories to return. Range: 1-100.
// Get top 5 most relevant memories
const context = await getMemoryContext(config, query, { limit: 5 });

threshold (number, default: 0.7)
Minimum similarity score (0-1) for a memory to be considered relevant. Higher values = stricter matching.
// Only return very similar memories
const context = await getMemoryContext(config, query, {
  threshold: 0.85
});
Start with the default threshold (0.7) and adjust based on your needs. Lower values (e.g. 0.6) cast a wider net; higher values (e.g. 0.85) are more precise.
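
Both options can be combined in a single call, assuming the options object accepts the two keys together as shown in the snippets above:
// Wider net, but capped at the 5 best matches
const context = await getMemoryContext(config, query, {
  limit: 5,
  threshold: 0.6,
});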

Context Injection Patterns

There are two main approaches to using memory context:

Pattern 1: Pre-fetch Context (Recommended)

// Fetch memories before LLM call
const memoryContext = await getMemoryContext(config, userMessage);

// Inject into system prompt
const result = await streamText({
  model: openai('gpt-4o'),
  system: `You are a helpful assistant.
  
What you know about this user:
${memoryContext}`,
  messages,
  tools,
});
Pros: Reliable, predictable, works with all models

Pattern 2: Tool-based Search (Advanced)

// Let the LLM decide when to search
const tools = {
  ...memoryTools(config),
  search_memory: tool({
    description: 'Search for relevant memories',
    parameters: z.object({ query: z.string() }),
    execute: async ({ query }) => {
      return await client.searchMemories(query);
    },
  }),
};
This pattern is less reliable because the LLM may not always call the search tool when needed. Use Pattern 1 for production applications.

Memory Storage Format

Each memory is stored with rich metadata:
id (string)
Unique identifier (UUID) for the memory

content (string)
The actual text content of the memory

embedding (number[])
1536-dimensional vector representation of the content

userId (string)
User identifier for memory isolation

clerkUserId (string)
Tenant identifier (API key owner)

metadata (object)
Optional custom metadata for filtering and organization:
{
  tags: ['preference', 'language'],
  category: 'programming',
  importance: 'high'
}

createdAt (timestamp)
ISO 8601 timestamp of when the memory was created

updatedAt (timestamp)
ISO 8601 timestamp of the last update
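
For reference, these fields correspond to a record shape roughly like the following TypeScript interface (a sketch derived from the list above, not a type exported by Satori):
// Sketch of the stored record, derived from the fields above
interface MemoryRecord {
  id: string;                         // UUID
  content: string;                    // the memory text
  embedding: number[];                // 1536-dimensional vector
  userId: string;                     // end-user isolation
  clerkUserId: string;                // tenant (API key owner)
  metadata?: Record<string, unknown>; // e.g. { tags, category, importance }
  createdAt: string;                  // ISO 8601
  updatedAt: string;                  // ISO 8601
}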

Performance Considerations

Embedding Generation

  • Latency: ~50-100ms per embedding
  • Cost: $0.00002 per 1K tokens (very cheap)
  • Caching: Consider caching embeddings for frequently searched queries (see the sketch after the index example below)

Vector Search

  • Latency: ~10-50ms for typical datasets
  • Scalability: IVFFlat index performs well up to millions of vectors
  • Optimization: Adjust the lists parameter in the index for your dataset size
-- Optimize for larger datasets
CREATE INDEX memories_embedding_idx 
ON memories 
USING ivfflat (embedding vector_cosine_ops) 
WITH (lists = 1000);  -- Increase for more vectors
For datasets under 100K memories, the default configuration performs excellently without any tuning.
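
The caching suggestion above can be as simple as an in-memory map keyed by query text. A minimal sketch, assuming the AI SDK's embed helper and OpenAI's text-embedding-3-small model (the model name is an assumption; any model producing the 1536-dimensional vectors described earlier works the same way):
import { embed } from 'ai';
import { openai } from '@ai-sdk/openai';

// Minimal sketch: cache embeddings for frequently searched queries
const embeddingCache = new Map<string, number[]>();

async function getQueryEmbedding(query: string): Promise<number[]> {
  const cached = embeddingCache.get(query);
  if (cached) return cached; // skip the ~50-100ms embedding round trip

  const { embedding } = await embed({
    model: openai.embedding('text-embedding-3-small'), // assumed model choice
    value: query,
  });
  embeddingCache.set(query, embedding);
  return embedding;
}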

Best Practices

Explicitly tell the LLM when to save memories:
system: `Save memories when the user:
- Shares preferences or opinions
- Provides personal information
- Mentions important dates or events
- Expresses goals or intentions`
Always fetch relevant context before calling the LLM, even if you think there might not be relevant memories:
// Always do this
const context = await getMemoryContext(config, userMessage);
Store memories in complete sentences that make sense out of context:
// Good
"User prefers TypeScript over JavaScript for type safety"

// Bad
"prefers TS"  // Too vague
Periodically review stored memories to ensure quality:
const allMemories = await client.getAllMemories();
console.log('Total memories:', allMemories.length);

Next Steps