Overview

Omi’s chat system is a sophisticated agentic AI pipeline that enables users to have intelligent conversations about their recorded memories, calendar events, health data, and more. This document provides a complete technical understanding of how questions flow through the system.

  • Classifies: Determines if context is needed
  • Routes: Simple, Agentic, or Persona path
  • Tools: 22+ integrated data sources
  • Retrieves: Vector search & metadata filters
  • Cites: Links to source conversations
  • Streams: Real-time thinking & response

System Architecture Diagram

The Three Routing Paths

Path 1: No Context Conversation

When triggered: Simple greetings, general advice, brainstorming questions.
Classification criteria (from the requires_context() function):
  • Greetings: “Hi”, “Hello”, “How are you?”
  • General knowledge: “What’s the capital of France?”
  • Advice without personal context: “Tips for productivity”
Processing:
# Location: backend/utils/retrieval/graph.py
def no_context_conversation(state: ChatState, config: RunnableConfig):
    # Direct LLM call without tool access
    # Fast response, no memory retrieval
    ...
This path provides the fastest responses since no external data retrieval is needed.
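
A minimal sketch of what this path might look like, assuming a LangChain ChatOpenAI client and ignoring the actual ChatState handling (the model choice is an assumption, not taken from the codebase):
# Hypothetical sketch of the no-context path, not the actual implementation
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4.1")  # assumption: model used for simple replies

def no_context_reply(messages: list) -> str:
    """Answer directly from the chat history: no tools, no retrieval."""
    # messages in OpenAI format, e.g. [{"role": "user", "content": "Hi"}]
    return llm.invoke(messages).content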

Classification Logic

The requires_context() function determines the routing path:
# Location: backend/utils/retrieval/graph.py
def requires_context(messages: list) -> bool:
    """
    Uses gpt-4.1-mini for fast classification.

    Returns True if question needs:
    - Personal memories/conversations
    - Calendar/email/health data
    - Temporal context
    - User-specific information
    """

The Agentic Tool System

How Tool Calling Works

The LangGraph ReAct agent follows this cycle (a minimal sketch follows the steps):
  1. Receive question - The system prompt provides tool descriptions, the user's timezone, and citation instructions
  2. Decide tools - The LLM autonomously decides which tool(s) to call based on question intent
  3. Execute tools - Tool calls are executed and results are returned to the agent
  4. Synthesize or continue - The agent synthesizes a response OR makes additional tool calls if more context is needed
  5. Generate answer - The final answer is generated with proper [1][2] citations linking to source conversations
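
A minimal sketch of this loop using LangGraph's prebuilt ReAct agent; the tool list, model, and system prompt here are placeholders rather than Omi's actual wiring:
# Hypothetical sketch of the agentic loop with LangGraph's prebuilt ReAct agent
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent

agent = create_react_agent(
    model=ChatOpenAI(model="gpt-5.1"),  # assumption: main agent model
    tools=[],                           # the dynamically loaded tools described below
    prompt="You are Omi. Cite sources as [1][2]. User timezone: America/Los_Angeles.",
)

result = agent.invoke({"messages": [("user", "What did I discuss yesterday?")]})
print(result["messages"][-1].content)   # final answer after the tool-calling loop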

Available Tools (22+)

Tools are loaded dynamically based on the user's enabled integrations and installed apps.
Core tools for retrieving the user's conversations and extracted memories:
Tool | Purpose | Key Parameters
get_conversations_tool | Retrieve by date range | start_date, end_date, limit, include_transcript
vector_search_conversations_tool | Semantic search | query, start_date, end_date, limit
search_conversations_tool | Full-text keyword search | query, limit, offset
get_memories_tool | Personal facts about user | limit, offset
Manage tasks and to-dos extracted from conversations.
Tool | Purpose
get_action_items_tool | Retrieve pending tasks
create_action_item_tool | Create new task
update_action_item_tool | Mark complete/update
Full CRUD operations on user’s Google Calendar.
Tool | Purpose
get_calendar_events_tool | Fetch events by date/person
create_calendar_event_tool | Create meetings with attendees
update_calendar_event_tool | Modify existing events
delete_calendar_event_tool | Cancel meetings
Connect to external services for richer context.
Tool | Service | Purpose
get_gmail_messages_tool | Gmail | Search emails
get_whoop_sleep_tool | Whoop | Sleep data
get_whoop_recovery_tool | Whoop | Recovery scores
get_whoop_workout_tool | Whoop | Workout history
search_notion_pages_tool | Notion | Search workspace
get_twitter_tweets_tool | Twitter/X | Recent tweets
get_github_pull_requests_tool | GitHub | Open PRs
get_github_issues_tool | GitHub | Open issues
perplexity_search_tool | Perplexity | Web search
Third-party apps can define custom tools that become available when users enable them.
# Location: backend/utils/retrieval/tools/app_tools.py
def load_app_tools(uid: str) -> List[Callable]:
    """
    Loads tools from user's enabled apps.
    Each app can define chat_tools in its configuration.
    """
See Chat Tools for Apps to learn how to build custom tools.

Safety Guards

# Maximum 10 tool calls per question (prevents runaway loops)
# Maximum 500K tokens in context (prevents context overflow)
# 30-second timeout per external API call
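
The limits above come from the source; one way such guards might be enforced is sketched below (illustrative only, not Omi's code):
# Illustrative guard helpers: a tool-call budget and a per-call timeout
import asyncio

MAX_TOOL_CALLS = 10          # per question
EXTERNAL_CALL_TIMEOUT = 30   # seconds per external API call

class ToolBudget:
    def __init__(self, limit: int = MAX_TOOL_CALLS):
        self.limit = limit
        self.used = 0

    def spend(self) -> None:
        self.used += 1
        if self.used > self.limit:
            raise RuntimeError("tool call budget exceeded for this question")

async def call_external_api(coro):
    """Run an external API call with a hard timeout."""
    return await asyncio.wait_for(coro, timeout=EXTERNAL_CALL_TIMEOUT)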

Vector Search Deep Dive

Configuration

Setting | Value
Database | Pinecone (serverless)
Embedding Model | text-embedding-3-large (OpenAI)
Vector Dimensions | 3,072
Namespace | "ns1"
Vector ID Format | {uid}-{conversation_id}

What Gets Embedded vs Stored as Metadata

Data | Embedded? | Metadata?
Title | Yes | No
Overview/Summary | Yes | No
Action Items | Yes | No
Full Transcript | No (too large) | No
People Mentioned | No | Yes
Topics | No | Yes
Entities | No | Yes
Dates Mentioned | No | Yes
created_at | No | Yes (Unix timestamp)

Vector Creation (Write Path)

# Location: backend/utils/conversations/process_conversation.py
# Triggered after conversation processing completes

def save_structured_vector(uid: str, conversation: Conversation):
    """
    1. Generate embedding from conversation.structured
       (title + overview + action_items + events)
    2. Extract metadata via LLM (people, topics, entities, dates)
    3. Upsert to Pinecone with metadata
    """
Vectors are created ONCE during initial processing, not on every edit. Reprocessed conversations do NOT create new vectors.
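
A rough sketch of this write path, assuming the Pinecone and OpenAI Python SDKs; the index name, metadata fields, and simplified signature are placeholders:
# Illustrative write path: embed the structured summary, upsert with metadata
from openai import OpenAI
from pinecone import Pinecone

openai_client = OpenAI()
index = Pinecone().Index("conversations")  # assumption: index name

def save_structured_vector(uid: str, conversation_id: str, structured_text: str, metadata: dict):
    embedding = openai_client.embeddings.create(
        model="text-embedding-3-large", input=structured_text
    ).data[0].embedding                    # 3,072 dimensions
    index.upsert(
        vectors=[{
            "id": f"{uid}-{conversation_id}",
            "values": embedding,
            "metadata": {"uid": uid, **metadata},  # people, topics, entities, created_at
        }],
        namespace="ns1",
    )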

Vector Query (Read Path)

# Location: backend/database/vector_db.py

def query_vectors(query: str, uid: str, starts_at: int, ends_at: int, k: int):
    """
    1. Embed query using text-embedding-3-large
    2. Query Pinecone with uid filter and optional date range
    3. Return top-k conversation IDs ranked by similarity
    """

def query_vectors_by_metadata(uid, vector, dates_filter, people, topics, entities, dates, limit):
    """
    Advanced query with metadata filters.
    Includes fallback: if no results with filters, retries without them.
    """

Memories System

Memories are distinct from Conversations. They are structured facts extracted about the user over time.

Memory Categories

Category | Examples
interesting | Hobbies, opinions, stories
system | Preferences, habits
manual | User-defined facts

Extraction Rules

# From backend/utils/llm/chat.py

# Maximum 15 words per memory
# Must pass "shareability test" - worth telling someone
# Max 2 interesting + 2 system memories per conversation
# NO duplicate/near-duplicate facts
# NO mundane details (eating, sleeping, commuting)
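
One way these rules could be encoded as a structured-output schema; this is a hypothetical model, not the actual extraction code in backend/utils/llm/chat.py:
# Illustrative memory schema enforcing the word and category limits
from typing import Literal, List
from pydantic import BaseModel, field_validator

class Memory(BaseModel):
    category: Literal["interesting", "system"]
    content: str

    @field_validator("content")
    @classmethod
    def max_fifteen_words(cls, value: str) -> str:
        if len(value.split()) > 15:
            raise ValueError("memories must be at most 15 words")
        return value

class ExtractedMemories(BaseModel):
    memories: List[Memory]

    @field_validator("memories")
    @classmethod
    def category_caps(cls, memories: List[Memory]) -> List[Memory]:
        for category in ("interesting", "system"):
            if sum(m.category == category for m in memories) > 2:
                raise ValueError(f"at most 2 '{category}' memories per conversation")
        return memories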

Memory Retrieval in Chat

# Tool: get_memories_tool
# Returns formatted list of known facts about user
# Used when questions ask "What do you know about me?"

Chat Sessions & Context

Session Structure

# Location: backend/database/chat.py

# Chat sessions group related messages
# Each session tracks:
# - message_ids: List of message IDs
# - file_ids: Uploaded files for this session
# - openai_thread_id: For file-based chat
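
Shaped roughly like the following; the field names are taken from the list above, and the stored document likely carries more fields:
# Illustrative session shape based on the fields listed above
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ChatSession:
    id: str
    message_ids: List[str] = field(default_factory=list)  # messages in this session
    file_ids: List[str] = field(default_factory=list)     # files uploaded to this session
    openai_thread_id: Optional[str] = None                # set when using file-based chat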

Context Window

# Last 10 messages included in context
# Enables follow-up questions without re-stating context
# Older messages summarized or excluded

Citation System

The LLM generates citations in [1][2] format:
# Citation rules:
# - No space before citation: "discussed this[1]" not "discussed this [1]"
# - Citations map to conversation IDs
# - Post-processing extracts citations → memories_id field
# - Frontend displays linked conversation cards
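
A sketch of the post-processing step, assuming citation indices map positionally onto the list of retrieved conversation IDs:
# Illustrative citation extraction: map [1][2] markers to conversation IDs
import re

def extract_cited_ids(answer: str, retrieved_ids: list) -> list:
    """Return the conversation IDs referenced by [n] citations, in order, without duplicates."""
    cited = []
    for number in re.findall(r"\[(\d+)\]", answer):
        idx = int(number) - 1                  # citations are 1-based
        if 0 <= idx < len(retrieved_ids) and retrieved_ids[idx] not in cited:
            cited.append(retrieved_ids[idx])
    return cited

# extract_cited_ids("discussed this[1] and that[2]", ["conv_456", "conv_789"])
# -> ["conv_456", "conv_789"]  (stored on the message as memories_id)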

System Prompt Structure

The main system prompt includes:
# Location: backend/utils/llm/chat.py - _get_agentic_qa_prompt()

# 1. Current datetime in user's timezone
# 2. Tool usage instructions
# 3. DateTime formatting rules for tool calls
# 4. Conversation retrieval strategies (5-step strategy)
# 5. Citation format instructions
# 6. Memory extraction guidelines

DateTime Formatting Rules

Critical for correct tool behavior. All dates must use ISO format with timezone.
# Good: "2024-01-19T00:00:00-08:00"
# Bad: "yesterday", "last week" (must be converted)

# The system prompt instructs the LLM to convert relative
# references to absolute ISO timestamps before tool calls
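
For example, converting "yesterday" into absolute ISO bounds in the user's timezone (an illustrative helper, not part of the codebase):
# Illustrative conversion of a relative reference to absolute ISO bounds
from datetime import datetime, time, timedelta
from zoneinfo import ZoneInfo

def yesterday_bounds(tz_name: str = "America/Los_Angeles") -> tuple:
    tz = ZoneInfo(tz_name)
    yesterday = datetime.now(tz).date() - timedelta(days=1)
    start = datetime.combine(yesterday, time.min, tzinfo=tz)
    end = datetime.combine(yesterday, time(23, 59, 59), tzinfo=tz)
    return start.isoformat(), end.isoformat()

# -> ("2024-01-19T00:00:00-08:00", "2024-01-19T23:59:59-08:00") for a PST user on Jan 20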

Conversation Retrieval Strategy

The system prompt guides the LLM through a 5-step strategy:
  1. Assess the question - Determine type (temporal, topic, person, etc.)
  2. Choose primary tool - get_conversations for date-based, vector_search for topic-based
  3. Apply filters - Use start_date/end_date when temporal bounds are known
  4. Request transcripts - Only when detailed content is needed
  5. Cite sources - Always cite conversations used in the answer

LLM Models Used

Model | Use Case | Location
gpt-4.1-mini | Fast classification, date extraction | requires_context(), filters
gpt-4.1 | Medium complexity, initial QA | QA with RAG context
gpt-5.1 | Agentic workflows with tool calling | Main chat agent
text-embedding-3-large | Vector embeddings (3,072 dims) | Pinecone queries
Gemini Flash 1.5 | Persona responses | Via OpenRouter
Claude 3.5 Sonnet | Persona responses | Via OpenRouter

Streaming Response Format

The backend streams responses in Server-Sent Events (SSE) format:
think: Searching conversations        # Tool call indicator
data: Yesterday you discussed...      # Response text chunks
done: {base64 encoded JSON}           # Final message with metadata
The Flutter app parses these to show:
  • Loading indicators with tool names
  • Streaming response text
  • Final message with linked memories
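
On the backend side, producing this stream might look roughly like the following illustrative generator (the real handler lives in backend/routers/chat.py):
# Illustrative stream generator in the think:/data:/done: format shown above
import base64
import json

def stream_chat_response(tool_label: str, text_chunks, final_message: dict):
    yield f"think: Searching {tool_label}\n\n"      # tool call indicator
    for chunk in text_chunks:
        yield f"data: {chunk}\n\n"                  # response text chunks
    payload = base64.b64encode(json.dumps(final_message).encode()).decode()
    yield f"done: {payload}\n\n"                    # final message with metadata

# for event in stream_chat_response("conversations", ["Yesterday you discussed..."],
#                                   {"memories_id": ["conv_456"]}):
#     print(event, end="")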

Key File Locations

Component | File Path
Chat Router | backend/routers/chat.py
LangGraph Router | backend/utils/retrieval/graph.py
Agentic System | backend/utils/retrieval/agentic.py
Tools Directory | backend/utils/retrieval/tools/
Conversation Tools | backend/utils/retrieval/tools/conversation_tools.py
Memory Tools | backend/utils/retrieval/tools/memory_tools.py
Calendar Tools | backend/utils/retrieval/tools/calendar_tools.py
App Tools Loader | backend/utils/retrieval/tools/app_tools.py
LLM Clients | backend/utils/llm/clients.py
Chat Prompts | backend/utils/llm/chat.py
Vector Database | backend/database/vector_db.py

Example: Question Flow

User asks: “What did I discuss with John yesterday about the project?”

Classification

requires_context() returns TRUE (temporal + person + topic reference).
Route to: agentic_context_dependent_conversation

Agent Decides Tools

System prompt provides: current datetime, tool descriptions.
Agent thinks: “Need conversations from yesterday about project with John”
Agent calls: vector_search_conversations_tool
  • query: “John project discussion”
  • start_date: “2024-01-19T00:00:00-08:00”
  • end_date: “2024-01-19T23:59:59-08:00”

Tool Execution

  1. Embed query → [0.012, -0.034, 0.056, ...]
  2. Query Pinecone with uid filter + date range
  3. Fetch full conversations from Firestore
  4. Format for LLM context

Response Generation

The LLM synthesizes an answer with citations:
“Yesterday you discussed the Q1 roadmap with John[1]. He mentioned the frontend refactoring is ahead of schedule[1][2]…”

Post-Processing

  1. Extract citations → memories_id: ["conv_456", "conv_789"]
  2. Save message to Firestore
  3. Stream final response with linked conversation cards