Overview

Omi’s chat system is a sophisticated agentic AI pipeline that enables users to have intelligent conversations about their recorded memories, calendar events, health data, and more. This document provides a complete technical understanding of how questions flow through the system.

  • Classifies: Determines if context is needed
  • Routes: Simple, Agentic, or Persona path
  • Tools: 22+ integrated data sources
  • Retrieves: Vector search & metadata filters
  • Cites: Links to source conversations
  • Streams: Real-time thinking & response

System Architecture Diagram

The Three Routing Paths

Path 1: No Context Conversation

When triggered: Simple greetings, general advice, brainstorming questions.
Classification criteria (from the requires_context() function):
  • Greetings: “Hi”, “Hello”, “How are you?”
  • General knowledge: “What’s the capital of France?”
  • Advice without personal context: “Tips for productivity”
Processing:
# Location: backend/utils/retrieval/graph.py
def no_context_conversation(state: ChatState, config: RunnableConfig):
    # Direct LLM call without tool access
    # Fast response, no memory retrieval
    ...
This path provides the fastest responses since no external data retrieval is needed.
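
A minimal sketch of what this path might look like, assuming a LangChain ChatOpenAI client and ignoring the actual ChatState handling (the model choice is an assumption, not taken from the codebase):
# Hypothetical sketch of the no-context path, not the actual implementation
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4.1")  # assumption: model used for simple replies

def no_context_reply(messages: list) -> str:
    """Answer directly from the chat history: no tools, no retrieval."""
    # messages in OpenAI format, e.g. [{"role": "user", "content": "Hi"}]
    return llm.invoke(messages).content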

Classification Logic

The requires_context() function determines the routing path:
# Location: backend/utils/retrieval/graph.py
def requires_context(messages: list) -> bool:
    """
    Uses gpt-4.1-mini for fast classification.

    Returns True if question needs:
    - Personal memories/conversations
    - Calendar/email/health data
    - Temporal context
    - User-specific information
    """

The Agentic Tool System

How Tool Calling Works

The LangGraph ReAct agent follows this cycle (a minimal sketch follows the steps):
  1. Receive question - The system prompt provides tool descriptions, the user's timezone, and citation instructions
  2. Decide tools - The LLM autonomously decides which tool(s) to call based on question intent
  3. Execute tools - Tool calls are executed and results are returned to the agent
  4. Synthesize or continue - The agent synthesizes a response OR makes additional tool calls if more context is needed
  5. Generate answer - The final answer is generated with proper [1][2] citations linking to source conversations
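
A minimal sketch of this loop using LangGraph's prebuilt ReAct agent; the tool list, model, and system prompt here are placeholders rather than Omi's actual wiring:
# Hypothetical sketch of the agentic loop with LangGraph's prebuilt ReAct agent
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent

agent = create_react_agent(
    model=ChatOpenAI(model="gpt-5.1"),  # assumption: main agent model
    tools=[],                           # the dynamically loaded tools described below
    prompt="You are Omi. Cite sources as [1][2]. User timezone: America/Los_Angeles.",
)

result = agent.invoke({"messages": [("user", "What did I discuss yesterday?")]})
print(result["messages"][-1].content)   # final answer after the tool-calling loop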

Available Tools (22+)

Tools are loaded dynamically based on the user's enabled integrations and installed apps.
Core tools for retrieving the user's conversations and extracted memories:
Tool | Purpose | Key Parameters
get_conversations_tool | Retrieve by date range | start_date, end_date, limit, include_transcript
vector_search_conversations_tool | Semantic search | query, start_date, end_date, limit
search_conversations_tool | Full-text keyword search | query, limit, offset
get_memories_tool | Personal facts about user | limit, offset
Manage tasks and to-dos extracted from conversations.
Tool | Purpose
get_action_items_tool | Retrieve pending tasks
create_action_item_tool | Create new task
update_action_item_tool | Mark complete/update
Full CRUD operations on user’s Google Calendar.
Tool | Purpose
get_calendar_events_tool | Fetch events by date/person
create_calendar_event_tool | Create meetings with attendees
update_calendar_event_tool | Modify existing events
delete_calendar_event_tool | Cancel meetings
Connect to external services for richer context.
Tool | Service | Purpose
get_gmail_messages_tool | Gmail | Search emails
get_whoop_sleep_tool | Whoop | Sleep data
get_whoop_recovery_tool | Whoop | Recovery scores
get_whoop_workout_tool | Whoop | Workout history
search_notion_pages_tool | Notion | Search workspace
get_twitter_tweets_tool | Twitter/X | Recent tweets
get_github_pull_requests_tool | GitHub | Open PRs
get_github_issues_tool | GitHub | Open issues
perplexity_search_tool | Perplexity | Web search
Third-party apps can define custom tools that become available when users enable them.
# Location: backend/utils/retrieval/tools/app_tools.py
def load_app_tools(uid: str) -> List[Callable]:
    """
    Loads tools from user's enabled apps.
    Each app can define chat_tools in its configuration.
    """
See Chat Tools for Apps to learn how to build custom tools.

Safety Guards

# Maximum 10 tool calls per question (prevents runaway loops)
# Maximum 500K tokens in context (prevents context overflow)
# 30-second timeout per external API call
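
The limits above come from the source; one way such guards might be enforced is sketched below (illustrative only, not Omi's code):
# Illustrative guard helpers: a tool-call budget and a per-call timeout
import asyncio

MAX_TOOL_CALLS = 10          # per question
EXTERNAL_CALL_TIMEOUT = 30   # seconds per external API call

class ToolBudget:
    def __init__(self, limit: int = MAX_TOOL_CALLS):
        self.limit = limit
        self.used = 0

    def spend(self) -> None:
        self.used += 1
        if self.used > self.limit:
            raise RuntimeError("tool call budget exceeded for this question")

async def call_external_api(coro):
    """Run an external API call with a hard timeout."""
    return await asyncio.wait_for(coro, timeout=EXTERNAL_CALL_TIMEOUT)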

Vector Search Deep Dive

Configuration

Setting | Value
Database | Pinecone (serverless)
Embedding Model | text-embedding-3-large (OpenAI)
Vector Dimensions | 3,072
Namespace | "ns1"
Vector ID Format | {uid}-{conversation_id}

What Gets Embedded vs Stored as Metadata

Data | Embedded? | Metadata?
Title | Yes | No
Overview/Summary | Yes | No
Action Items | Yes | No
Full Transcript | No (too large) | No
People Mentioned | No | Yes
Topics | No | Yes
Entities | No | Yes
Dates Mentioned | No | Yes
created_at | No | Yes (Unix timestamp)

Vector Creation (Write Path)

# Location: backend/utils/conversations/process_conversation.py
# Triggered after conversation processing completes

def save_structured_vector(uid: str, conversation: Conversation):
    """
    1. Generate embedding from conversation.structured
       (title + overview + action_items + events)
    2. Extract metadata via LLM (people, topics, entities, dates)
    3. Upsert to Pinecone with metadata
    """
Vectors are created ONCE during initial processing, not on every edit. Reprocessed conversations do NOT create new vectors.
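
A rough sketch of this write path, assuming the Pinecone and OpenAI Python SDKs; the index name, metadata fields, and simplified signature are placeholders:
# Illustrative write path: embed the structured summary, upsert with metadata
from openai import OpenAI
from pinecone import Pinecone

openai_client = OpenAI()
index = Pinecone().Index("conversations")  # assumption: index name

def save_structured_vector(uid: str, conversation_id: str, structured_text: str, metadata: dict):
    embedding = openai_client.embeddings.create(
        model="text-embedding-3-large", input=structured_text
    ).data[0].embedding                    # 3,072 dimensions
    index.upsert(
        vectors=[{
            "id": f"{uid}-{conversation_id}",
            "values": embedding,
            "metadata": {"uid": uid, **metadata},  # people, topics, entities, created_at
        }],
        namespace="ns1",
    )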

Vector Query (Read Path)

# Location: backend/database/vector_db.py

def query_vectors(query: str, uid: str, starts_at: int, ends_at: int, k: int):
    """
    1. Embed query using text-embedding-3-large
    2. Query Pinecone with uid filter and optional date range
    3. Return top-k conversation IDs ranked by similarity
    """

def query_vectors_by_metadata(uid, vector, dates_filter, people, topics, entities, dates, limit):
    """
    Advanced query with metadata filters.
    Includes fallback: if no results with filters, retries without them.
    """

Memories System

Memories are distinct from Conversations. They are structured facts extracted about the user over time.

Memory Categories

Category | Examples
interesting | Hobbies, opinions, stories
system | Preferences, habits
manual | User-defined facts

Extraction Rules

# From backend/utils/llm/chat.py

# Maximum 15 words per memory
# Must pass "shareability test" - worth telling someone
# Max 2 interesting + 2 system memories per conversation
# NO duplicate/near-duplicate facts
# NO mundane details (eating, sleeping, commuting)
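
One way these rules could be encoded as a structured-output schema; this is a hypothetical model, not the actual extraction code in backend/utils/llm/chat.py:
# Illustrative memory schema enforcing the word and category limits
from typing import Literal, List
from pydantic import BaseModel, field_validator

class Memory(BaseModel):
    category: Literal["interesting", "system"]
    content: str

    @field_validator("content")
    @classmethod
    def max_fifteen_words(cls, value: str) -> str:
        if len(value.split()) > 15:
            raise ValueError("memories must be at most 15 words")
        return value

class ExtractedMemories(BaseModel):
    memories: List[Memory]

    @field_validator("memories")
    @classmethod
    def category_caps(cls, memories: List[Memory]) -> List[Memory]:
        for category in ("interesting", "system"):
            if sum(m.category == category for m in memories) > 2:
                raise ValueError(f"at most 2 '{category}' memories per conversation")
        return memories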

Memory Retrieval in Chat

# Tool: get_memories_tool
# Returns formatted list of known facts about user
# Used when questions ask "What do you know about me?"

Chat Sessions & Context

Session Structure

# Location: backend/database/chat.py

# Chat sessions group related messages
# Each session tracks:
# - message_ids: List of message IDs
# - file_ids: Uploaded files for this session
# - openai_thread_id: For file-based chat
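
Shaped roughly like the following; the field names are taken from the list above, and the stored document likely carries more fields:
# Illustrative session shape based on the fields listed above
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ChatSession:
    id: str
    message_ids: List[str] = field(default_factory=list)  # messages in this session
    file_ids: List[str] = field(default_factory=list)     # files uploaded to this session
    openai_thread_id: Optional[str] = None                # set when using file-based chat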

Context Window

# Last 10 messages included in context
# Enables follow-up questions without re-stating context
# Older messages summarized or excluded

Citation System

The LLM generates citations in [1][2] format:
# Citation rules:
# - No space before citation: "discussed this[1]" not "discussed this [1]"
# - Citations map to conversation IDs
# - Post-processing extracts citations → memories_id field
# - Frontend displays linked conversation cards
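
A sketch of the post-processing step, assuming citation indices map positionally onto the list of retrieved conversation IDs:
# Illustrative citation extraction: map [1][2] markers to conversation IDs
import re

def extract_cited_ids(answer: str, retrieved_ids: list) -> list:
    """Return the conversation IDs referenced by [n] citations, in order, without duplicates."""
    cited = []
    for number in re.findall(r"\[(\d+)\]", answer):
        idx = int(number) - 1                  # citations are 1-based
        if 0 <= idx < len(retrieved_ids) and retrieved_ids[idx] not in cited:
            cited.append(retrieved_ids[idx])
    return cited

# extract_cited_ids("discussed this[1] and that[2]", ["conv_456", "conv_789"])
# -> ["conv_456", "conv_789"]  (stored on the message as memories_id)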

System Prompt Structure

The main system prompt includes:
# Location: backend/utils/llm/chat.py - _get_agentic_qa_prompt()

# 1. Current datetime in user's timezone
# 2. Tool usage instructions
# 3. DateTime formatting rules for tool calls
# 4. Conversation retrieval strategies (5-step strategy)
# 5. Citation format instructions
# 6. Memory extraction guidelines

DateTime Formatting Rules

Critical for correct tool behavior. All dates must use ISO format with timezone.
# Good: "2024-01-19T00:00:00-08:00"
# Bad: "yesterday", "last week" (must be converted)

# The system prompt instructs the LLM to convert relative
# references to absolute ISO timestamps before tool calls
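
For example, converting "yesterday" into absolute ISO bounds in the user's timezone (an illustrative helper, not part of the codebase):
# Illustrative conversion of a relative reference to absolute ISO bounds
from datetime import datetime, time, timedelta
from zoneinfo import ZoneInfo

def yesterday_bounds(tz_name: str = "America/Los_Angeles") -> tuple:
    tz = ZoneInfo(tz_name)
    yesterday = datetime.now(tz).date() - timedelta(days=1)
    start = datetime.combine(yesterday, time.min, tzinfo=tz)
    end = datetime.combine(yesterday, time(23, 59, 59), tzinfo=tz)
    return start.isoformat(), end.isoformat()

# -> ("2024-01-19T00:00:00-08:00", "2024-01-19T23:59:59-08:00") for a PST user on Jan 20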

Conversation Retrieval Strategy

The system prompt guides the LLM through a 5-step strategy:
  1. Assess the question - Determine type (temporal, topic, person, etc.)
  2. Choose primary tool - get_conversations for date-based, vector_search for topic-based
  3. Apply filters - Use start_date/end_date when temporal bounds are known
  4. Request transcripts - Only when detailed content is needed
  5. Cite sources - Always cite conversations used in the answer

LLM Models Used

Model | Use Case | Location
gpt-4.1-mini | Fast classification, date extraction | requires_context(), filters
gpt-4.1 | Medium complexity, initial QA | QA with RAG context
gpt-5.1 | Agentic workflows with tool calling | Main chat agent
text-embedding-3-large | Vector embeddings (3,072 dims) | Pinecone queries
Gemini Flash 1.5 | Persona responses | Via OpenRouter
Claude 3.5 Sonnet | Persona responses | Via OpenRouter

Streaming Response Format

The backend streams responses in Server-Sent Events (SSE) format:
think: Searching conversations        # Tool call indicator
data: Yesterday you discussed...      # Response text chunks
done: {base64 encoded JSON}           # Final message with metadata
The Flutter app parses these to show:
  • Loading indicators with tool names
  • Streaming response text
  • Final message with linked memories
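
On the backend side, producing this stream might look roughly like the following illustrative generator (the real handler lives in backend/routers/chat.py):
# Illustrative stream generator in the think:/data:/done: format shown above
import base64
import json

def stream_chat_response(tool_label: str, text_chunks, final_message: dict):
    yield f"think: Searching {tool_label}\n\n"      # tool call indicator
    for chunk in text_chunks:
        yield f"data: {chunk}\n\n"                  # response text chunks
    payload = base64.b64encode(json.dumps(final_message).encode()).decode()
    yield f"done: {payload}\n\n"                    # final message with metadata

# for event in stream_chat_response("conversations", ["Yesterday you discussed..."],
#                                   {"memories_id": ["conv_456"]}):
#     print(event, end="")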

Key File Locations

Component | File Path
Chat Router | backend/routers/chat.py
LangGraph Router | backend/utils/retrieval/graph.py
Agentic System | backend/utils/retrieval/agentic.py
Tools Directory | backend/utils/retrieval/tools/
Conversation Tools | backend/utils/retrieval/tools/conversation_tools.py
Memory Tools | backend/utils/retrieval/tools/memory_tools.py
Calendar Tools | backend/utils/retrieval/tools/calendar_tools.py
App Tools Loader | backend/utils/retrieval/tools/app_tools.py
LLM Clients | backend/utils/llm/clients.py
Chat Prompts | backend/utils/llm/chat.py
Vector Database | backend/database/vector_db.py

Example: Question Flow

User asks: “What did I discuss with John yesterday about the project?”

Classification

requires_context() returns TRUE (temporal + person + topic reference).
Route to: agentic_context_dependent_conversation

Agent Decides Tools

System prompt provides: current datetime, tool descriptions.
Agent thinks: “Need conversations from yesterday about project with John”
Agent calls: vector_search_conversations_tool
  • query: “John project discussion”
  • start_date: “2024-01-19T00:00:00-08:00”
  • end_date: “2024-01-19T23:59:59-08:00”

Tool Execution

  1. Embed query → [0.012, -0.034, 0.056, ...]
  2. Query Pinecone with uid filter + date range
  3. Fetch full conversations from Firestore
  4. Format for LLM context

Response Generation

The LLM synthesizes an answer with citations:
“Yesterday you discussed the Q1 roadmap with John[1]. He mentioned the frontend refactoring is ahead of schedule[1][2]…”

Post-Processing

  1. Extract citations → memories_id: ["conv_456", "conv_789"]
  2. Save message to Firestore
  3. Stream final response with linked conversation cards