# How It Works

> Learn how Satori uses embeddings and semantic search to power AI memory

## Overview

Satori provides persistent memory for AI applications by combining vector embeddings, semantic search, and context injection. This page explains how each of these mechanisms works.

## Memory Lifecycle

```mermaid
graph LR
    UserInput[User Input] --> Search[Semantic Search]
    Search --> Context[Inject Context]
    Context --> LLM[LLM Processing]
    LLM --> Decision{Important Info?}
    Decision -->|Yes| Save[Save Memory]
    Decision -->|No| Response[Stream Response]
    Save --> Response
    Response --> User[User Sees Response]
```

<Steps>
  <Step title="Semantic search for relevant memories">
    When a user sends a message, Satori converts it into a vector embedding and searches for similar memories using cosine similarity.

    ```typescript
    const context = await getContext(config, userMessage, { limit: 5 });
    ```

    This returns up to 5 of the most relevant memories, ranked by semantic meaning rather than keyword overlap.
  </Step>

  <Step title="Context injection into system prompt">
    Retrieved memories are formatted and injected into the system prompt:

    ```typescript
    system: `You are a helpful assistant with memory.

    What you know about this user:
    ${memoryContext}

    Use add_item to save important information.`
    ```
  </Step>

  <Step title="LLM processes with context">
    The language model receives both the current message and relevant historical context, allowing it to provide personalized responses.
  </Step>

  <Step title="Automatic memory storage">
    When the LLM detects important information, it calls the `add_item` tool:

    ```typescript
    // LLM automatically calls this
    add_item({
      memory: "User prefers TypeScript over JavaScript"
    })
    ```
  </Step>
</Steps>
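
The formatting in step 2 can be sketched as a small helper. This is a minimal illustration rather than Satori's actual implementation: `formatMemoryContext` and `buildSystemPrompt` are hypothetical names, and the sketch assumes the retrieved memories are available as plain strings.

```typescript
// Hypothetical helpers for illustration; not part of the Satori SDK.

/** Render retrieved memories as a bulleted block, or a placeholder when empty. */
function formatMemoryContext(memories: string[]): string {
  if (memories.length === 0) {
    return "(no stored memories yet)";
  }
  return memories.map((m) => `- ${m}`).join("\n");
}

/** Build the system prompt from step 2 out of a list of memories. */
function buildSystemPrompt(memories: string[]): string {
  return [
    "You are a helpful assistant with memory.",
    "",
    "What you know about this user:",
    formatMemoryContext(memories),
    "",
    "Use add_item to save important information.",
  ].join("\n");
}

console.log(buildSystemPrompt(["User prefers TypeScript over JavaScript"]));
```

Handling the empty case explicitly keeps the prompt well-formed for new users who have no memories yet.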

## Vector Embeddings Explained

Embeddings are numerical representations of text that capture semantic meaning. Similar concepts have similar embeddings.

### How Embeddings Work

<Tabs>
  <Tab title="Concept">
    When you save a memory like "I love TypeScript", Satori:

    1. Sends the text to OpenAI's embedding model
    2. Receives a 1536-dimensional vector (array of numbers)
    3. Stores both the text and vector in PostgreSQL with pgvector

    Later, when searching for "programming preferences", the query is also converted to a vector and compared against the stored vectors using cosine similarity.
  </Tab>

  <Tab title="Example">
    ```typescript
    // Memory: "I love TypeScript"
    // Embedding: [0.023, -0.145, 0.891, ..., 0.234] (1536 numbers)

    // Query: "What languages does the user like?"
    // Query embedding: [0.019, -0.142, 0.887, ..., 0.229]

    // Cosine similarity: 0.94 (very similar!)
    // This memory will be returned as relevant
    ```
  </Tab>

  <Tab title="Technical Details">
    * **Model**: `text-embedding-3-small` (OpenAI)
    * **Dimensions**: 1536
    * **Similarity metric**: Cosine similarity
    * **Index type**: IVFFlat (pgvector)
    * **Storage**: PostgreSQL with pgvector extension
  </Tab>
</Tabs>

<Info>
  Embeddings capture meaning, not just keywords. "I prefer TS" and "TypeScript is my favorite" will have similar embeddings even though they share no common words.
</Info>
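
For reference, cosine similarity is the dot product of two vectors divided by the product of their lengths. Per the technical details above, vectors are stored and indexed with pgvector, so the function below is only a sketch to make the metric concrete. `cosineSimilarity` is a hypothetical helper, and the 3-dimensional vectors are made-up stand-ins for real 1536-dimensional embeddings.

```typescript
/** Cosine similarity between two equal-length vectors: dot(a, b) / (|a| * |b|). */
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; treat it as unrelated to everything.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy vectors (illustrative values only, not real embeddings).
const languagesQuery = [0.8, 0.2, 0.4];
const typescriptMemory = [0.9, 0.1, 0.3];
const weatherMemory = [-0.2, 0.9, -0.1];

// The memory pointing in nearly the same direction as the query scores higher.
console.log(
  cosineSimilarity(languagesQuery, typescriptMemory) >
    cosineSimilarity(languagesQuery, weatherMemory)
); // true
```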

## Semantic Search vs Keyword Search

<CardGroup cols={2}>
  <Card title="Keyword Search" icon="text">
    **Query:** "programming languages"

    **Matches:**

    * "I like programming languages"
    * "Programming languages are fun"

    **Misses:**

    * "I prefer TypeScript" ❌
    * "Python is my favorite" ❌
  </Card>

  <Card title="Semantic Search" icon="brain">
    **Query:** "programming languages"

    **Matches:**

    * "I like programming languages"
    * "I prefer TypeScript" ✅
    * "Python is my favorite" ✅
    * "I'm learning Rust" ✅
  </Card>
</CardGroup>
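
To see why keyword search misses these memories, here is a deliberately naive keyword matcher. `keywordMatch` is a hypothetical helper written for this comparison, not something Satori exposes; it requires every query word to appear in the memory text.

```typescript
/** Naive keyword match: true only if every query word appears in the text. */
function keywordMatch(query: string, text: string): boolean {
  const toWords = (s: string) => s.toLowerCase().split(/\W+/).filter(Boolean);
  const textWords = new Set(toWords(text));
  return toWords(query).every((word) => textWords.has(word));
}

const sampleMemories = [
  "I like programming languages",
  "Programming languages are fun",
  "I prefer TypeScript",
  "Python is my favorite",
];

// Only memories that literally contain both "programming" and "languages" match.
const keywordHits = sampleMemories.filter((m) =>
  keywordMatch("programming languages", m)
);
console.log(keywordHits);
```

The last two memories are about programming languages but share no words with the query, which is the gap that embedding-based search closes.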

## Search Parameters

When searching for memories, you can control the results:

<ParamField query="query" type="string" required>
  The natural language query to search for. This is converted to an embedding and compared against stored memories.
</ParamField>

<ParamField query="limit" type="number" default="10">
  Maximum number of memories to return. Range: 1-100.

  ```typescript
  // Get top 5 most relevant memories
  const context = await getContext(config, query, { limit: 5 });
  ```
</ParamField>

<ParamField query="threshold" type="number" default="0.7">
  Minimum similarity score (0-1) for a memory to be considered relevant. Higher values mean stricter matching.

  ```typescript
  // Only return very similar memories
  const context = await getContext(config, query, { 
    threshold: 0.85 
  });
  ```
</ParamField>

<Tip>
  Start with the default threshold (0.7) and adjust based on your results. Lower values (such as 0.6) cast a wider net; higher values (such as 0.85) return fewer, more precise matches.
</Tip>
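
Conceptually, `threshold` and `limit` act as a filter followed by a top-k cut. The sketch below shows that logic on results that have already been scored. `ScoredMemory` and `selectMemories` are hypothetical names, and treating the threshold as inclusive (`>=`) is an assumption rather than documented behavior.

```typescript
// Hypothetical types and helper for illustration only.
interface ScoredMemory {
  content: string;
  score: number; // similarity to the query, between 0 and 1
}

/** Keep memories at or above the threshold, best first, capped at `limit`. */
function selectMemories(
  scored: ScoredMemory[],
  options: { limit?: number; threshold?: number } = {}
): ScoredMemory[] {
  const { limit = 10, threshold = 0.7 } = options; // defaults from the docs above
  return scored
    .filter((m) => m.score >= threshold) // filter() copies, so the input is not mutated
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}

const scoredExamples: ScoredMemory[] = [
  { content: "I prefer TypeScript", score: 0.94 },
  { content: "I'm learning Rust", score: 0.72 },
  { content: "My dog is named Max", score: 0.31 },
  { content: "Python is my favorite", score: 0.88 },
];

console.log(selectMemories(scoredExamples).map((m) => m.content));
console.log(selectMemories(scoredExamples, { threshold: 0.85 }).map((m) => m.content));
```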

## Context Injection Patterns

There are two main approaches to using memory context:

### Pattern 1: Pre-fetch and Inject (Recommended)

```typescript
// Fetch memories before LLM call
const memoryContext = await getContext(config, userMessage);

// Inject into system prompt
const result = await streamText({
  model: openai('gpt-4o'),
  system: `You are a helpful assistant.
  
What you know about this user:
${memoryContext}`,
  messages,
  tools,
});
```

<Check>
  **Pros:** Reliable, predictable, works with all models
</Check>

### Pattern 2: Tool-based Search (Advanced)

```typescript
// Let the LLM decide when to search
const tools = {
  ...memoryTools(config),
  search_memory: tool({
    description: 'Search for relevant memories',
    parameters: z.object({ query: z.string() }),
    execute: async ({ query }) => {
      return await client.searchMemories(query);
    },
  }),
};
```

<Warning>
  This pattern is less reliable because the LLM may not always call the search tool when needed. Use Pattern 1 for production applications.
</Warning>

## Memory Storage Format

Each memory is stored with rich metadata:

<ResponseField name="id" type="string">
  Unique identifier (UUID) for the memory
</ResponseField>

<ResponseField name="content" type="string">
  The actual text content of the memory
</ResponseField>

<ResponseField name="embedding" type="number[]">
  1536-dimensional vector representation of the content
</ResponseField>

<ResponseField name="userId" type="string">
  User identifier for memory isolation
</ResponseField>

<ResponseField name="clerkUserId" type="string">
  Tenant identifier (API key owner)
</ResponseField>

<ResponseField name="metadata" type="object">
  Optional custom metadata for filtering and organization

  ```typescript
  {
    tags: ['preference', 'language'],
    category: 'programming',
    importance: 'high'
  }
  ```
</ResponseField>

<ResponseField name="createdAt" type="timestamp">
  ISO 8601 timestamp of when the memory was created
</ResponseField>

<ResponseField name="updatedAt" type="timestamp">
  ISO 8601 timestamp of the last update
</ResponseField>
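
Taken together, these fields could be modeled with a TypeScript interface like the one below. This is a sketch inferred from the field list above, not a type exported by the SDK. Representing timestamps as ISO 8601 strings and typing `metadata` as a loose record are assumptions, and the example values (including the UUID and identifiers) are placeholders.

```typescript
/** Hypothetical shape of a stored memory, mirroring the documented fields. */
interface MemoryRecord {
  id: string; // UUID
  content: string;
  embedding: number[]; // 1536 numbers
  userId: string;
  clerkUserId: string;
  metadata?: Record<string, unknown>; // optional custom metadata
  createdAt: string; // ISO 8601 (assumed to be serialized as a string)
  updatedAt: string; // ISO 8601 (assumed to be serialized as a string)
}

const exampleMemory: MemoryRecord = {
  id: "123e4567-e89b-12d3-a456-426614174000", // placeholder UUID
  content: "User prefers TypeScript over JavaScript",
  embedding: new Array<number>(1536).fill(0), // placeholder vector
  userId: "user_123", // placeholder
  clerkUserId: "tenant_abc", // placeholder
  metadata: { tags: ["preference", "language"], category: "programming" },
  createdAt: "2024-01-15T10:30:00.000Z",
  updatedAt: "2024-01-15T10:30:00.000Z",
};

console.log(exampleMemory.content, exampleMemory.embedding.length);
```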

## Performance Considerations

### Embedding Generation

* **Latency**: \~50-100ms per embedding
* **Cost**: \$0.00002 per 1K tokens (very cheap)
* **Caching**: Consider caching embeddings for frequently searched queries
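
One way to implement the caching suggestion is an in-memory wrapper around whatever function produces embeddings. This is a sketch under stated assumptions: `Embedder` and `withEmbeddingCache` are hypothetical names, the cache lives in a single process, and eviction is simple first-in-first-out.

```typescript
// Hypothetical types and helper for illustration only.
type Embedder = (text: string) => Promise<number[]>;

/** Wrap an embedder so repeated queries reuse the first (possibly in-flight) result. */
function withEmbeddingCache(embed: Embedder, maxEntries = 1000): Embedder {
  const cache = new Map<string, Promise<number[]>>();
  return (text: string) => {
    const cached = cache.get(text);
    if (cached) return cached;

    // Map preserves insertion order, so the first key is the oldest entry.
    if (cache.size >= maxEntries) {
      const oldest = cache.keys().next().value;
      if (oldest !== undefined) cache.delete(oldest);
    }

    const pending = embed(text).catch((err) => {
      cache.delete(text); // do not keep failed requests in the cache
      throw err;
    });
    cache.set(text, pending);
    return pending;
  };
}

// Usage with a stand-in embedder; a real one would call the embedding API.
let embedCalls = 0;
const fakeEmbed: Embedder = async (text) => {
  embedCalls++;
  return [text.length];
};
const cachedEmbed = withEmbeddingCache(fakeEmbed);
const firstRequest = cachedEmbed("programming preferences");
const secondRequest = cachedEmbed("programming preferences"); // served from cache
```

Caching the promise rather than the resolved vector means concurrent requests for the same query share one embedding call.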

### Vector Search

* **Latency**: \~10-50ms for typical datasets
* **Scalability**: IVFFlat index performs well up to millions of vectors
* **Optimization**: Adjust the `lists` parameter in the index for your dataset size

```sql
-- Optimize for larger datasets
CREATE INDEX memories_embedding_idx 
ON memories 
USING ivfflat (embedding vector_cosine_ops) 
WITH (lists = 1000);  -- Increase for more vectors
```

<Tip>
  For datasets under 100K memories, the default configuration typically performs well without tuning.
</Tip>
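
As a starting point for `lists`, the pgvector documentation suggests `rows / 1000` for tables of up to 1M rows and `sqrt(rows)` beyond that. The helper below only encodes that rule of thumb. `suggestedLists` is a hypothetical name, and the result is a starting value to benchmark against your own data, not a guarantee.

```typescript
/**
 * Starting value for the IVFFlat `lists` parameter, following pgvector's
 * rule of thumb: rows / 1000 up to 1M rows, sqrt(rows) above that.
 */
function suggestedLists(rowCount: number): number {
  if (!Number.isFinite(rowCount) || rowCount < 0) {
    throw new Error("rowCount must be a non-negative number");
  }
  if (rowCount <= 1_000_000) {
    return Math.max(1, Math.round(rowCount / 1000));
  }
  return Math.round(Math.sqrt(rowCount));
}

console.log(suggestedLists(100_000)); // 100
console.log(suggestedLists(1_000_000)); // 1000
console.log(suggestedLists(4_000_000)); // 2000
```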

## Best Practices

<AccordionGroup>
  <Accordion title="Write clear system prompts">
    Explicitly tell the LLM when to save memories:

    ```typescript
    system: `Save memories when the user:
    - Shares preferences or opinions
    - Provides personal information
    - Mentions important dates or events
    - Expresses goals or intentions`
    ```
  </Accordion>

  <Accordion title="Pre-fetch context for every message">
    Always fetch relevant context before calling the LLM, even when you don't expect any stored memories to be relevant:

    ```typescript
    // Always do this
    const context = await getContext(config, userMessage);
    ```
  </Accordion>

  <Accordion title="Use descriptive memory content">
    Store memories in complete sentences that make sense out of context:

    ```typescript
    // Good
    "User prefers TypeScript over JavaScript for type safety"

    // Bad
    "prefers TS"  // Too vague
    ```
  </Accordion>

  <Accordion title="Monitor memory quality">
    Periodically review stored memories to ensure quality:

    ```typescript
    const allMemories = await client.getAllMemories();
    console.log('Total memories:', allMemories.length);
    ```
  </Accordion>
</AccordionGroup>
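
Beyond counting memories, a review pass might flag entries that are too short to stand alone or that duplicate an earlier entry. The sketch below assumes each memory exposes `id` and `content` as described under Memory Storage Format. `StoredMemory`, `auditMemories`, and the three-word minimum are illustrative choices, not Satori features.

```typescript
// Hypothetical types and helper for illustration only.
interface StoredMemory {
  id: string;
  content: string;
}

interface AuditReport {
  total: number;
  tooShort: string[]; // ids of memories with fewer than `minWords` words
  duplicates: string[]; // ids of memories repeating earlier content
}

/** Flag memories that are very short or exact (case-insensitive) duplicates. */
function auditMemories(memories: StoredMemory[], minWords = 3): AuditReport {
  const seen = new Set<string>();
  const report: AuditReport = { total: memories.length, tooShort: [], duplicates: [] };

  for (const memory of memories) {
    const normalized = memory.content.trim().toLowerCase();
    const wordCount = normalized.split(/\s+/).filter(Boolean).length;
    if (wordCount < minWords) report.tooShort.push(memory.id);
    if (seen.has(normalized)) report.duplicates.push(memory.id);
    else seen.add(normalized);
  }
  return report;
}

const auditReport = auditMemories([
  { id: "a", content: "User prefers TypeScript over JavaScript for type safety" },
  { id: "b", content: "prefers TS" },
  { id: "c", content: "user prefers typescript over javascript for type safety" },
]);
console.log(auditReport);
```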

## Next Steps

<CardGroup cols={2}>
  <Card title="Authentication" icon="key" href="/concepts/authentication">
    Learn how API keys and tenant isolation work
  </Card>

  <Card title="Memory Isolation" icon="shield" href="/concepts/memory-isolation">
    Understand multi-tenant data separation
  </Card>

  <Card title="Integration Guide" icon="code" href="/guides/vercel-ai-sdk">
    See complete integration examples
  </Card>

  <Card title="API Reference" icon="book" href="/api-reference/memory/search">
    Explore the search API in detail
  </Card>
</CardGroup>
