
Overview

Omi uses a dual-collection architecture for storing user data:

Conversations

Primary storage for recorded interactions: transcripts, audio, and structured summaries.

Memories

Secondary storage for facts and learnings extracted from conversations.

This separation allows efficient retrieval of full conversation context alongside quick access to key facts about the user.

Architecture Diagram

Firestore Structure

users/
├── {uid}/
│   ├── conversations/                    # PRIMARY - Recorded interactions
│   │   └── {conversation_id}/
│   │       ├── id
│   │       ├── created_at
│   │       ├── started_at
│   │       ├── finished_at
│   │       ├── source
│   │       ├── language
│   │       ├── status
│   │       ├── structured
│   │       ├── transcript_segments
│   │       ├── geolocation
│   │       ├── photos/ (subcollection)
│   │       ├── audio_files
│   │       ├── apps_results
│   │       ├── discarded
│   │       ├── visibility
│   │       ├── is_locked
│   │       └── data_protection_level
│   │
│   ├── memories/                         # SECONDARY - Extracted facts
│   │   └── {memory_id}/
│   │       ├── id
│   │       ├── uid
│   │       ├── conversation_id
│   │       ├── content
│   │       ├── category
│   │       ├── tags
│   │       ├── visibility
│   │       ├── created_at
│   │       ├── updated_at
│   │       ├── reviewed
│   │       ├── user_review
│   │       ├── scoring
│   │       └── data_protection_level
│   │
│   └── action_items/                     # Standalone action items
│       └── {action_item_id}/
│           ├── description
│           ├── completed
│           ├── conversation_id
│           ├── created_at
│           ├── due_at
│           └── completed_at

Part 1: Storing Conversations

Processing Flow

1. API Request: the app sends a POST request to /v1/conversations with transcript data.
2. Processing: process_conversation() in utils/conversations/process_conversation.py handles the core logic.
3. Structure Extraction: an LLM extracts the title, overview, action items, and events from the transcript.
4. Storage: upsert_conversation() in database/conversations.py saves the conversation to Firestore.
5. Vector Embedding: the conversation is embedded and stored in Pinecone for semantic search.
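The flow above can be sketched end to end. Only process_conversation() and upsert_conversation() are names from this document; the helper functions, signatures, and field values below are illustrative stand-ins, not the actual implementation.

```python
def extract_structure(transcript: str) -> dict:
    """Stand-in for the LLM call that extracts title, overview, etc. (step 3)."""
    return {
        "title": "Team standup",
        "overview": "Discussed sprint progress and blockers.",
        "action_items": [{"description": "Send the report", "completed": False}],
        "events": [],
    }

def upsert_conversation(uid: str, conversation: dict) -> None:
    """Stand-in for the Firestore write in database/conversations.py (step 4)."""
    print(f"saved conversation {conversation['id']} for {uid}")

def process_conversation(uid: str, conversation_id: str, transcript: str) -> dict:
    """Sketch of the overall flow handled after POST /v1/conversations."""
    conversation = {
        "id": conversation_id,
        "structured": extract_structure(transcript),  # step 3: LLM extraction
        "status": "completed",
    }
    upsert_conversation(uid, conversation)            # step 4: Firestore write
    # Step 5 (vector embedding) runs in a background thread; see Part 4.
    return conversation
```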

Conversation Model Fields

| Field | Type | Description |
|-------|------|-------------|
| id | string | Unique conversation identifier |
| created_at | datetime | When the conversation record was created |
| started_at | datetime | When the actual conversation started |
| finished_at | datetime | When the conversation ended |
| source | enum | Source device (omi, phone, desktop, openglass, etc.) |
| language | string | Language code of the conversation |
| status | enum | Processing status: in_progress, processing, completed, failed |
| structured | object | Extracted structured information (see below) |
| transcript_segments | array | List of transcript segments |
| geolocation | object | Location data (latitude, longitude, address) |
| photos | array | Photos captured during the conversation |
| audio_files | array | Audio file references |
| apps_results | array | Results from summarization apps |
| external_data | object | Data from external integrations |
| discarded | boolean | Whether the conversation was marked as low-quality |
| visibility | enum | private, shared, or public |
| is_locked | boolean | Whether the conversation is locked from editing |
| data_protection_level | string | standard or enhanced (encrypted) |

Structured Information

The structured field contains LLM-extracted information:
| Field | Type | Description |
|-------|------|-------------|
| title | string | Short descriptive title for the conversation |
| overview | string | Summary of key points discussed |
| emoji | string | Emoji representing the conversation |
| category | enum | Category (personal, work, health, etc.) |
| action_items | array | Tasks or to-dos mentioned |
| events | array | Calendar events to be created |

Transcript Segments

Each segment in transcript_segments includes:
| Field | Type | Description |
|-------|------|-------------|
| text | string | Transcribed text content |
| speaker | string | Speaker label (e.g., "SPEAKER_00") |
| start | float | Start time in seconds |
| end | float | End time in seconds |
| is_user | boolean | Whether spoken by the device owner |
| person_id | string | ID of the identified person (if matched) |
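For concreteness, a single transcript_segments entry might look like the following (all values are illustrative):

```python
# Example transcript segment; field names match the table above.
segment = {
    "text": "Let's schedule the review for Friday.",
    "speaker": "SPEAKER_00",
    "start": 12.4,      # seconds from the start of the conversation
    "end": 15.1,
    "is_user": True,    # spoken by the device owner
    "person_id": None,  # set when the speaker matches a known person
}
```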

Action Items

Action items are stored both inline (in structured.action_items) and in a standalone collection:
| Field | Type | Description |
|-------|------|-------------|
| description | string | The action item text |
| completed | boolean | Whether the item is done |
| created_at | datetime | When extracted |
| due_at | datetime | Optional due date |
| completed_at | datetime | When marked complete |
| conversation_id | string | Source conversation |

Events

Calendar events extracted from conversations:
| Field | Type | Description |
|-------|------|-------------|
| title | string | Event title |
| description | string | Event description |
| start | datetime | Start date/time |
| duration | integer | Duration in minutes |
| created | boolean | Whether added to calendar |
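Putting the pieces together, a populated structured object (with inline action items and events) might look like this — all values are illustrative, not taken from real data:

```python
# Example of the `structured` field with one action item and one event.
structured = {
    "title": "Quarterly planning",
    "overview": "Reviewed Q3 goals and assigned owners for each initiative.",
    "emoji": "📅",
    "category": "work",
    "action_items": [
        {
            "description": "Draft the Q3 roadmap",
            "completed": False,
            "created_at": "2024-06-01T10:00:00Z",
            "due_at": None,
            "completed_at": None,
            "conversation_id": "conv_123",
        },
    ],
    "events": [
        {
            "title": "Roadmap review",
            "description": "Walk through the Q3 roadmap with the team",
            "start": "2024-06-07T15:00:00Z",
            "duration": 30,          # minutes
            "created": False,        # not yet added to the calendar
        },
    ],
}
```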

Part 2: Extracting & Storing Memories

Memories are facts about the user extracted from conversations. They represent learnings, preferences, habits, and other personal information.

Memory Extraction Process

During process_conversation(), the system:

1. Analyze Transcript: reviews the conversation transcript for personal information.
2. Extract Facts: identifies facts worth remembering about the user (~15 words each).
3. Store with Link: saves each fact to the memories collection with a link back to the source conversation.

Memory Model Fields

| Field | Type | Description |
|-------|------|-------------|
| id | string | Unique memory identifier |
| uid | string | User ID |
| conversation_id | string | Source conversation (links back) |
| content | string | The actual fact/learning (max ~15 words) |
| category | enum | interesting, system, or manual |
| tags | array | Categorization tags |
| visibility | string | private or public |
| created_at | datetime | When the memory was created |
| updated_at | datetime | Last modification time |
| reviewed | boolean | Whether the user has reviewed it |
| user_review | boolean | User's approval (true/false/null) |
| edited | boolean | Whether the user edited the content |
| scoring | string | Ranking score for retrieval |
| manually_added | boolean | Whether the user created it manually |
| is_locked | boolean | Prevents automatic deletion |
| app_id | string | Source app (if from an integration) |
| data_protection_level | string | Encryption level |
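The table above can be mirrored as a simple dataclass. The real model lives in backend/models/memories.py; this stdlib sketch only shows the shape and defaults one might expect, and is not the actual implementation:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

@dataclass
class Memory:
    """Sketch of a memory record; field names follow the table above."""
    id: str
    uid: str
    conversation_id: str              # links back to the source conversation
    content: str                      # the fact itself, ~15 words max
    category: str = "interesting"     # interesting | system | manual
    tags: list = field(default_factory=list)
    visibility: str = "private"
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    updated_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    reviewed: bool = False
    user_review: Optional[bool] = None
    edited: bool = False
    scoring: Optional[str] = None
    manually_added: bool = False
    is_locked: bool = False
    app_id: Optional[str] = None
    data_protection_level: str = "standard"
```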

Memory Categories

Interesting

Notable facts about the user: hobbies, opinions, stories

System

Preferences and patterns: work habits, sleep schedule

Manual

User-created memories: explicitly added facts.

Legacy categories (core, hobbies, lifestyle, interests, habits, work, skills, learnings, other) are automatically mapped to the new primary categories for backward compatibility.

Memory Extraction Rules

The system follows these guidelines when extracting memories:
  • Maximum ~15 words per memory
  • Must pass the “shareability test” - would this be worth telling someone?
  • Maximum 2 interesting + 2 system memories per conversation
  • No duplicate or near-duplicate facts
  • Skip mundane details (eating, sleeping, commuting)
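These rules can be sketched as a post-processing filter over LLM-proposed facts. The function name and signature are assumptions, and exact-match deduplication here is only a stand-in for whatever near-duplicate detection the real system uses:

```python
def filter_memories(candidates, existing):
    """Illustrative filter applying the extraction rules above.

    candidates: (category, content) pairs proposed by the LLM.
    existing:   contents of already-stored memories.
    """
    limits = {"interesting": 2, "system": 2}      # max per conversation
    seen = {c.lower() for c in existing}
    counts = {"interesting": 0, "system": 0}
    kept = []
    for category, content in candidates:
        if len(content.split()) > 15:             # ~15 words max
            continue
        if content.lower() in seen:               # no duplicates
            continue
        if counts.get(category, 0) >= limits.get(category, 0):
            continue                              # per-category cap reached
        counts[category] = counts.get(category, 0) + 1
        seen.add(content.lower())
        kept.append((category, content))
    return kept
```

A real implementation would also apply the "shareability test" and mundane-detail checks, which need an LLM judgment rather than a string rule.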

Part 3: Data Protection & Encryption

Both conversations and memories support encryption for sensitive data.

Standard Protection Level

Data is stored as plaintext with no encryption. This is the default for most users.
  • Fastest read/write performance
  • Data visible in Firestore console
  • Suitable for general use

Implementation

# Conversations: database/conversations.py
def _prepare_conversation_for_write(conversation_data, data_protection_level):
    if data_protection_level == 'enhanced':
        # Encrypt transcript_segments before storage
        ...

def _prepare_conversation_for_read(conversation_data, data_protection_level):
    if data_protection_level == 'enhanced':
        # Decrypt transcript_segments after retrieval
        ...

Part 4: Vector Embeddings

Conversations are also stored as vector embeddings in Pinecone for semantic search.

What Gets Embedded

| Data | Embedded? | Stored in Metadata? |
|------|-----------|---------------------|
| Title | Yes | No |
| Overview | Yes | No |
| Action Items | Yes | No |
| Full Transcript | No (too large) | No |
| People Mentioned | No | Yes |
| Topics | No | Yes |
| Entities | No | Yes |
| created_at | No | Yes |

Vector Creation

Vectors are created in a background thread after conversation processing:
# utils/conversations/process_conversation.py
threading.Thread(
    target=save_structured_vector,
    args=(uid, conversation)
).start()
The save_structured_vector() function:
  1. Generates embedding from conversation.structured (title + overview + action_items + events)
  2. Extracts metadata via LLM (people, topics, entities, dates)
  3. Upserts to Pinecone with metadata filters
Vectors are created ONCE during initial processing. Reprocessed conversations do NOT update their vectors.
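The two inputs to the upsert — the text that gets embedded and the metadata that does not — can be sketched like this. The helper names build_embedding_input() and build_metadata() are hypothetical; the real logic lives in save_structured_vector():

```python
def build_embedding_input(structured: dict) -> str:
    """Concatenate the fields that get embedded: title, overview,
    action items, and events (the full transcript is excluded)."""
    parts = [structured.get("title", ""), structured.get("overview", "")]
    parts += [item["description"] for item in structured.get("action_items", [])]
    parts += [event["title"] for event in structured.get("events", [])]
    return "\n".join(p for p in parts if p)

def build_metadata(created_at: str, people: list, topics: list, entities: list) -> dict:
    """Fields stored alongside the vector for filtered search, not embedded."""
    return {
        "people": people,
        "topics": topics,
        "entities": entities,
        "created_at": created_at,
    }
```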

Key Code Locations

| Component | File Path |
|-----------|-----------|
| Conversation Model | backend/models/conversation.py |
| Memory Model | backend/models/memories.py |
| Process Conversation | backend/utils/conversations/process_conversation.py |
| Database - Conversations | backend/database/conversations.py |
| Database - Memories | backend/database/memories.py |
| Router - Conversations | backend/routers/conversations.py |
| Router - Memories | backend/routers/memories.py |
| Vector Database | backend/database/vector_db.py |

API Endpoints

Conversations

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | /v1/conversations | Process and store a new conversation |
| GET | /v1/conversations | List user's conversations |
| GET | /v1/conversations/{id} | Get a specific conversation |
| PATCH | /v1/conversations/{id}/title | Update conversation title |
| DELETE | /v1/conversations/{id} | Delete a conversation |

Memories

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | /v3/memories | Create a manual memory |
| GET | /v3/memories | List user's memories |
| PATCH | /v3/memories/{id} | Edit a memory |
| DELETE | /v3/memories/{id} | Delete a memory |
| PATCH | /v3/memories/{id}/visibility | Change memory visibility |