Overview
Satori provides persistent memory for AI applications through a combination of vector embeddings, semantic search, and intelligent context injection. This page explains the core mechanisms that make memory work.
Memory Lifecycle
1. Semantic search for relevant memories
When a user sends a message, Satori converts it into a vector embedding and searches for similar memories using cosine similarity. The search returns the 5 most relevant memories based on semantic meaning, not just keyword matching.
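A minimal sketch of this step, assuming a hypothetical Satori client that exposes a memories.search method (the real SDK surface may differ):

```typescript
// Sketch only: "satori" and the method/option names below are illustrative,
// not the official SDK API. Satori handles embedding the query.
const relevantMemories = await satori.memories.search({
  userId: "user_123",
  query: userMessage, // the incoming chat message
  limit: 5,           // return the 5 most similar memories
});
```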
2. Context injection into system prompt
Retrieved memories are formatted and injected into the system prompt:
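One way to do the formatting, reusing relevantMemories from step 1 (the prompt wording is entirely up to you; the content field is an assumed shape):

```typescript
// Assumes each memory object exposes a `content` string.
const memoryContext = relevantMemories
  .map((m) => `- ${m.content}`)
  .join("\n");

const systemPrompt = `You are a helpful assistant.

Relevant information about this user from previous conversations:
${memoryContext}`;
```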
3. LLM processes with context
The language model receives both the current message and relevant historical context, allowing it to provide personalized responses.
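For example, with the OpenAI Node SDK the injected context simply travels in the system message (the model name is illustrative; systemPrompt and userMessage come from the previous steps):

```typescript
import OpenAI from "openai";

const openai = new OpenAI();

const completion = await openai.chat.completions.create({
  model: "gpt-4o", // any chat model works; the memory context rides in the system message
  messages: [
    { role: "system", content: systemPrompt },
    { role: "user", content: userMessage },
  ],
});
```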
4. Automatic memory storage
When the LLM detects important information, it calls the add_memory tool:
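Only the add_memory name comes from Satori; here is a sketch of how such a tool could be declared with OpenAI-style function calling (the parameter schema is an assumption):

```typescript
const tools = [
  {
    type: "function" as const,
    function: {
      name: "add_memory",
      description: "Save an important fact about the user for future conversations.",
      parameters: {
        type: "object",
        properties: {
          content: {
            type: "string",
            description: "The fact to remember, written as a complete sentence.",
          },
        },
        required: ["content"],
      },
    },
  },
];
// Pass `tools` to chat.completions.create(); when the model emits an
// add_memory tool call, forward its arguments to Satori's memory API.
```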
Vector Embeddings Explained
Embeddings are numerical representations of text that capture semantic meaning. Similar concepts have similar embeddings.
How Embeddings Work
When you save a memory like “I love TypeScript”, Satori does three things (see the sketch after this list):
- Sends the text to OpenAI’s embedding model
- Receives a 1536-dimensional vector (array of numbers)
- Stores both the text and vector in PostgreSQL with pgvector
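A sketch of that flow using the OpenAI embeddings API and a pgvector-backed table (the table and column names are assumptions for illustration, not Satori's actual schema):

```typescript
import OpenAI from "openai";
import { Client } from "pg";

const openai = new OpenAI();
const db = new Client();
await db.connect();

// 1. Ask OpenAI for an embedding of the memory text.
const text = "I love TypeScript";
const res = await openai.embeddings.create({
  model: "text-embedding-3-small", // returns 1536-dimensional vectors
  input: text,
});
const embedding = res.data[0].embedding; // number[] of length 1536

// 2. Store both the text and the vector in a pgvector column.
await db.query(
  "INSERT INTO memories (content, embedding) VALUES ($1, $2::vector)",
  [text, JSON.stringify(embedding)]
);
```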
Embeddings capture meaning, not just keywords. “I prefer TS” and “TypeScript is my favorite” will have similar embeddings even though they share no common words.
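Similarity between two embeddings is measured with cosine similarity, the same metric the search step uses. A small helper to make the idea concrete (in production the comparison happens inside PostgreSQL via pgvector, not in application code):

```typescript
// Cosine similarity: close to 1 means very similar meaning, close to 0 means unrelated.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```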
Semantic Search vs Keyword Search
Keyword Search
Query: “programming languages”
Matches:
- “I like programming languages”
- “Programming languages are fun”
- “I prefer TypeScript” ❌
- “Python is my favorite” ❌
Semantic Search
Query: “programming languages”
Matches:
- “I like programming languages”
- “I prefer TypeScript” ✅
- “Python is my favorite” ✅
- “I’m learning Rust” ✅
Search Parameters
When searching for memories, you can control the results with three parameters (a usage sketch follows this list):
- Query: the natural language query to search for. This is converted to an embedding and compared against stored memories.
- Limit: the maximum number of memories to return. Range: 1-100.
- Similarity threshold: the minimum similarity score (0-1) for a memory to be considered relevant. Higher values mean stricter matching.
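Put together, a search call might look like this (again assuming the hypothetical memories.search method; option names are illustrative):

```typescript
const results = await satori.memories.search({
  userId: "user_123",
  query: "what programming languages does the user like?",
  limit: 10,       // return at most 10 memories (1-100)
  threshold: 0.75, // keep only memories with similarity >= 0.75
});
```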
Context Injection Patterns
There are two main approaches to using memory context.
Pattern 1: Pre-fetch and Inject (Recommended)
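In this pattern the application always searches memory first, injects the results into the system prompt, and then calls the model. A condensed sketch, reusing the hypothetical Satori client and the OpenAI SDK from the lifecycle above:

```typescript
// 1. Pre-fetch: search memory with the user's message.
const memories = await satori.memories.search({
  userId,
  query: userMessage,
  limit: 5,
});

// 2. Inject: format whatever came back into the system prompt.
const context = memories.map((m) => `- ${m.content}`).join("\n");
const systemPrompt = `You are a helpful assistant.\n\nKnown about this user:\n${context}`;

// 3. Generate: the model always receives the context.
const reply = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [
    { role: "system", content: systemPrompt },
    { role: "user", content: userMessage },
  ],
});
```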
Pros: Reliable, predictable, works with all models
Pattern 2: Tool-based Search (Advanced)
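Here, instead of pre-fetching, you expose memory search to the model as a tool and let it decide when to look something up. This saves a search on messages that need no context, but it depends on the model reliably choosing to call the tool. A sketch of the tool definition (the name search_memory and the schema are illustrative):

```typescript
const searchMemoryTool = {
  type: "function" as const,
  function: {
    name: "search_memory",
    description: "Search the user's long-term memory for relevant facts.",
    parameters: {
      type: "object",
      properties: {
        query: { type: "string", description: "What to look for." },
      },
      required: ["query"],
    },
  },
};
// When the model calls search_memory, run the query against Satori,
// return the results in a tool message, and let the model continue.
```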
Memory Storage Format
Each memory is stored with rich metadata (an illustrative record follows this list):
- Unique identifier (UUID) for the memory
- The actual text content of the memory
- 1536-dimensional vector representation of the content
- User identifier for memory isolation
- Tenant identifier (API key owner)
- Optional custom metadata for filtering and organization
- ISO 8601 timestamp of when the memory was created
- ISO 8601 timestamp of the last update
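As a TypeScript shape, one way to picture such a record (field names are illustrative, not Satori's actual schema):

```typescript
interface StoredMemory {
  id: string;          // UUID
  content: string;     // the memory text
  embedding: number[]; // 1536-dimensional vector
  userId: string;      // isolates memories per end user
  tenantId: string;    // API key owner
  metadata?: Record<string, unknown>; // optional custom fields for filtering
  createdAt: string;   // ISO 8601
  updatedAt: string;   // ISO 8601
}
```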
Performance Considerations
Embedding Generation
- Latency: ~50-100ms per embedding
- Cost: $0.00002 per 1K tokens (very cheap)
- Caching: Consider caching embeddings for frequently searched queries (a minimal sketch follows)
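A minimal in-memory cache, assuming you generate query embeddings yourself with the OpenAI SDK (adapt the keying and eviction strategy to your traffic):

```typescript
import OpenAI from "openai";

const openai = new OpenAI();
const embeddingCache = new Map<string, number[]>();

async function getQueryEmbedding(query: string): Promise<number[]> {
  const cached = embeddingCache.get(query);
  if (cached) return cached; // skip the ~50-100ms round trip for repeat queries

  const res = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: query,
  });
  const embedding = res.data[0].embedding;
  embeddingCache.set(query, embedding);
  return embedding;
}
```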
Vector Search
- Latency: ~10-50ms for typical datasets
- Scalability: IVFFlat index performs well up to millions of vectors
- Optimization: Adjust the lists parameter in the index for your dataset size (see the sketch below)
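With pgvector, lists is set when the IVFFlat index is created; the pgvector documentation suggests roughly rows / 1000 as a starting point for tables up to about a million rows. A sketch using the pg client (table and column names are assumptions):

```typescript
import { Client } from "pg";

const db = new Client();
await db.connect();

// Build (or rebuild) the IVFFlat index with a lists value sized for the table.
await db.query(`
  CREATE INDEX IF NOT EXISTS memories_embedding_idx
  ON memories
  USING ivfflat (embedding vector_cosine_ops)
  WITH (lists = 100)
`);
```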
Best Practices
Write clear system prompts
Explicitly tell the LLM when to save memories:
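For example, a system prompt can spell out exactly when to use the add_memory tool (the wording is illustrative):

```typescript
const systemPrompt = `You are a helpful assistant with long-term memory.

When the user shares a lasting preference, fact, or goal (for example
"I prefer TypeScript" or "my deadline is in March"), call the add_memory
tool with that fact written as a complete sentence. Do not save small talk
or details that only matter for the current message.`;
```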
Pre-fetch context for every message
Always fetch relevant context before calling the LLM, even if you think there might not be relevant memories:
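Even when the search comes back empty, the flow stays the same; you simply inject nothing. A small sketch with the hypothetical client from earlier:

```typescript
const memories = await satori.memories.search({
  userId,
  query: userMessage,
  limit: 5,
});

// An empty result is fine: the prompt just carries no extra context.
const context =
  memories.length > 0
    ? `\n\nKnown about this user:\n${memories.map((m) => `- ${m.content}`).join("\n")}`
    : "";

const systemPrompt = `You are a helpful assistant.${context}`;
```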
Use descriptive memory content
Store memories in complete sentences that make sense out of context:
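The difference in practice (addMemory is a hypothetical stand-in for however you store memories; only the content strings matter):

```typescript
// Hypothetical helper so the examples type-check.
declare function addMemory(content: string): Promise<void>;

// Good: self-contained sentences that still make sense months later.
await addMemory("The user prefers TypeScript over JavaScript for backend services.");
await addMemory("The user is allergic to peanuts.");

// Bad: depends on context that won't exist at retrieval time.
await addMemory("likes it");
await addMemory("yes to the second option");
```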
Monitor memory quality
Periodically review stored memories to ensure quality:
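One lightweight approach is to dump a user's memories on a schedule and scan them for vague, duplicate, or stale entries (the list method shown here is hypothetical; use whatever listing call the SDK provides):

```typescript
// Hypothetical listing call -- adjust to the SDK's actual API.
const memories = await satori.memories.list({ userId: "user_123", limit: 100 });

for (const memory of memories) {
  console.log(`${memory.createdAt}  ${memory.content}`);
}
// Review the output for duplicates, vague entries ("likes it"), or facts
// that are no longer true, then delete or rewrite them.
```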