Overview
Omi uses a dual-collection architecture for storing user data:
Conversations: Primary storage for recorded interactions (transcripts, audio, structured summaries).
Memories: Secondary storage for facts and learnings extracted from conversations.
This separation allows efficient retrieval of full conversation context as well as quick access to key facts about the user.
Architecture Diagram
Firestore Structure
```
users/
├── {uid}/
│   ├── conversations/          # PRIMARY - Recorded interactions
│   │   └── {conversation_id}/
│   │       ├── id
│   │       ├── created_at
│   │       ├── started_at
│   │       ├── finished_at
│   │       ├── source
│   │       ├── language
│   │       ├── status
│   │       ├── structured
│   │       ├── transcript_segments
│   │       ├── geolocation
│   │       ├── photos/ (subcollection)
│   │       ├── audio_files
│   │       ├── apps_results
│   │       ├── discarded
│   │       ├── visibility
│   │       ├── is_locked
│   │       └── data_protection_level
│   │
│   ├── memories/               # SECONDARY - Extracted facts
│   │   └── {memory_id}/
│   │       ├── id
│   │       ├── uid
│   │       ├── conversation_id
│   │       ├── content
│   │       ├── category
│   │       ├── tags
│   │       ├── visibility
│   │       ├── created_at
│   │       ├── updated_at
│   │       ├── reviewed
│   │       ├── user_review
│   │       ├── scoring
│   │       └── data_protection_level
│   │
│   └── action_items/           # Standalone action items
│       └── {action_item_id}/
│           ├── description
│           ├── completed
│           ├── conversation_id
│           ├── created_at
│           ├── due_at
│           └── completed_at
```
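The tree above maps directly onto Firestore document paths. A minimal sketch of path construction (these helper functions are illustrative, not part of the codebase):

```python
def conversation_path(uid: str, conversation_id: str) -> str:
    """Firestore document path for a conversation."""
    return f"users/{uid}/conversations/{conversation_id}"

def memory_path(uid: str, memory_id: str) -> str:
    """Firestore document path for a memory."""
    return f"users/{uid}/memories/{memory_id}"

def action_item_path(uid: str, action_item_id: str) -> str:
    """Firestore document path for a standalone action item."""
    return f"users/{uid}/action_items/{action_item_id}"

print(conversation_path("u1", "c42"))  # users/u1/conversations/c42
```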
Part 1: Storing Conversations
Processing Flow
API Request
The app sends a POST request to /v1/conversations with transcript data
Processing
process_conversation() in utils/conversations/process_conversation.py handles the logic
Structure Extraction
LLM extracts title, overview, action items, and events from the transcript
Storage
upsert_conversation() in database/conversations.py saves to Firestore
Vector Embedding
Conversation is embedded and stored in Pinecone for semantic search
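The five steps above can be sketched end to end. This is a simplified outline, not the real implementation: the stub extractor stands in for the LLM call, and the Firestore/Pinecone writes are noted as comments only.

```python
from dataclasses import dataclass, field

@dataclass
class Structured:
    title: str = ""
    overview: str = ""
    action_items: list = field(default_factory=list)
    events: list = field(default_factory=list)

def extract_structure(transcript: str) -> Structured:
    # Stand-in for the LLM extraction step; the real logic lives in
    # utils/conversations/process_conversation.py.
    first_line = transcript.splitlines()[0] if transcript else ""
    return Structured(title=first_line[:40], overview=transcript[:100])

def process_conversation(uid: str, transcript: str) -> dict:
    structured = extract_structure(transcript)      # 3. structure extraction
    conversation = {
        "uid": uid,
        "status": "completed",
        "structured": structured,
        "transcript_segments": [{"text": transcript}],
    }
    # 4. upsert_conversation(conversation)          -> Firestore (omitted)
    # 5. save_structured_vector(uid, conversation)  -> Pinecone (omitted)
    return conversation

convo = process_conversation("u1", "Planning the Q3 roadmap with the team")
print(convo["structured"].title)
```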
Conversation Model Fields
| Field | Type | Description |
| --- | --- | --- |
| id | string | Unique conversation identifier |
| created_at | datetime | When the conversation record was created |
| started_at | datetime | When the actual conversation started |
| finished_at | datetime | When the conversation ended |
| source | enum | Source device (omi, phone, desktop, openglass, etc.) |
| language | string | Language code of the conversation |
| status | enum | Processing status: in_progress, processing, completed, failed |
| structured | object | Extracted structured information (see below) |
| transcript_segments | array | List of transcript segments |
| geolocation | object | Location data (latitude, longitude, address) |
| photos | array | Photos captured during conversation |
| audio_files | array | Audio file references |
| apps_results | array | Results from summarization apps |
| external_data | object | Data from external integrations |
| discarded | boolean | Whether conversation was marked as low-quality |
| visibility | enum | private, shared, or public |
| is_locked | boolean | Whether conversation is locked from editing |
| data_protection_level | string | standard or enhanced (encrypted) |
The structured field contains LLM-extracted information:
| Field | Type | Description |
| --- | --- | --- |
| title | string | Short descriptive title for the conversation |
| overview | string | Summary of key points discussed |
| emoji | string | Emoji representing the conversation |
| category | enum | Category (personal, work, health, etc.) |
| action_items | array | Tasks or to-dos mentioned |
| events | array | Calendar events to be created |
Transcript Segments
Each segment in transcript_segments includes:
| Field | Type | Description |
| --- | --- | --- |
| text | string | Transcribed text content |
| speaker | string | Speaker label (e.g., "SPEAKER_00") |
| start | float | Start time in seconds |
| end | float | End time in seconds |
| is_user | boolean | Whether spoken by the device owner |
| person_id | string | ID of identified person (if matched) |
Action Items
Action items are stored both inline (in structured.action_items) and in a standalone collection:
| Field | Type | Description |
| --- | --- | --- |
| description | string | The action item text |
| completed | boolean | Whether the item is done |
| created_at | datetime | When extracted |
| due_at | datetime | Optional due date |
| completed_at | datetime | When marked complete |
| conversation_id | string | Source conversation |
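The dual storage can be sketched as a fan-out step: each inline action item from the structured summary is also written as a standalone record with a back-reference to its source conversation. The helper below is an assumption for illustration, not code from the repository.

```python
from datetime import datetime, timezone

def fan_out_action_items(conversation_id: str, structured: dict) -> list[dict]:
    """Copy structured.action_items into standalone records with a back-link."""
    now = datetime.now(timezone.utc)
    return [
        {
            "description": item["description"],
            "completed": False,
            "conversation_id": conversation_id,   # link back to the source
            "created_at": now,
            "due_at": item.get("due_at"),
            "completed_at": None,
        }
        for item in structured.get("action_items", [])
    ]

records = fan_out_action_items(
    "c42", {"action_items": [{"description": "Email the report"}]}
)
print(records[0]["conversation_id"])  # c42
```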
Events
Calendar events extracted from conversations:
| Field | Type | Description |
| --- | --- | --- |
| title | string | Event title |
| description | string | Event description |
| start | datetime | Start date/time |
| duration | integer | Duration in minutes |
| created | boolean | Whether added to calendar |
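Since duration is stored in minutes rather than as an explicit end time, computing the calendar end time is a small derivation, sketched here:

```python
from datetime import datetime, timedelta

def event_end(start: datetime, duration_minutes: int) -> datetime:
    """End time for an extracted event (duration is stored in minutes)."""
    return start + timedelta(minutes=duration_minutes)

start = datetime(2024, 6, 1, 14, 0)
print(event_end(start, 45))  # 2024-06-01 14:45:00
```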
Part 2: Storing Memories
Memories are facts about the user extracted from conversations. They represent learnings, preferences, habits, and other personal information.
During process_conversation(), the system:
Analyze Transcript
Reviews the conversation transcript for personal information
Extract Facts
Identifies facts worth remembering about the user (~15 words max)
Store with Link
Saves to memories collection with a link back to the source conversation
Memory Model Fields
| Field | Type | Description |
| --- | --- | --- |
| id | string | Unique memory identifier |
| uid | string | User ID |
| conversation_id | string | Source conversation (links back) |
| content | string | The actual fact/learning (max ~15 words) |
| category | enum | interesting, system, or manual |
| tags | array | Categorization tags |
| visibility | string | private or public |
| created_at | datetime | When memory was created |
| updated_at | datetime | Last modification time |
| reviewed | boolean | Whether user has reviewed |
| user_review | boolean | User's approval (true/false/null) |
| edited | boolean | Whether user edited the content |
| scoring | string | Ranking score for retrieval |
| manually_added | boolean | Whether user created manually |
| is_locked | boolean | Prevent automatic deletion |
| app_id | string | Source app (if from integration) |
| data_protection_level | string | Encryption level |
Memory Categories
Interesting: Notable facts about the user, such as hobbies, opinions, and stories.
System: Preferences and patterns, such as work habits and sleep schedule.
Manual: User-created memories, i.e., facts the user explicitly added.
Legacy categories (core, hobbies, lifestyle, interests, habits, work, skills, learnings, other) are automatically mapped to the new primary categories for backward compatibility.
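The backward-compatibility mapping can be sketched as a lookup table. Which legacy value lands in which primary bucket is an assumption here, chosen for illustration; the authoritative mapping lives in the backend.

```python
# Illustrative mapping only -- the authoritative table lives in the backend.
LEGACY_CATEGORY_MAP = {
    "core": "interesting",
    "hobbies": "interesting",
    "lifestyle": "interesting",
    "interests": "interesting",
    "habits": "system",
    "work": "system",
    "skills": "interesting",
    "learnings": "interesting",
    "other": "interesting",
}

def normalize_category(category: str) -> str:
    """Map a legacy memory category onto the current primary set."""
    if category in ("interesting", "system", "manual"):
        return category
    return LEGACY_CATEGORY_MAP.get(category, "interesting")

print(normalize_category("hobbies"))  # interesting
```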
The system follows these guidelines when extracting memories:
Maximum ~15 words per memory
Must pass the "shareability test": would this be worth telling someone?
Maximum 2 interesting + 2 system memories per conversation
No duplicate or near-duplicate facts
Skip mundane details (eating, sleeping, commuting)
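The mechanical parts of these guidelines (word cap, dedupe, per-category cap) can be sketched as a post-processing filter. This is a sketch only; in practice some checks, including the "shareability test", are enforced in the extraction prompt rather than in code.

```python
def filter_memories(candidates: list[dict], max_words: int = 15,
                    per_category_cap: int = 2) -> list[dict]:
    """Apply the extraction guidelines: word cap, dedupe, per-category cap."""
    seen: set[str] = set()
    counts: dict[str, int] = {}
    kept = []
    for mem in candidates:
        content = mem["content"].strip()
        key = content.lower()
        if len(content.split()) > max_words:
            continue                          # too long
        if key in seen:
            continue                          # duplicate or near-duplicate
        cat = mem.get("category", "interesting")
        if counts.get(cat, 0) >= per_category_cap:
            continue                          # over the per-conversation cap
        seen.add(key)
        counts[cat] = counts.get(cat, 0) + 1
        kept.append(mem)
    return kept

out = filter_memories([
    {"content": "Enjoys rock climbing on weekends", "category": "interesting"},
    {"content": "enjoys rock climbing on weekends", "category": "interesting"},
    {"content": "Prefers meetings before noon", "category": "system"},
])
print(len(out))  # 2
```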
Part 3: Data Protection & Encryption
Both conversations and memories support encryption for sensitive data.
Standard Protection Level
No encryption; data is stored as plaintext. This is the default for most users.
Fastest read/write performance
Data visible in Firestore console
Suitable for general use
Enhanced Protection Level
AES encryption for sensitive fields, providing an additional layer of security. Encrypted fields:
Conversations: transcript_segments (the actual transcript text)
Memories: content (the memory text)
Enhanced encryption adds processing overhead to read/write operations.
Implementation
```python
# Conversations: database/conversations.py
def _prepare_conversation_for_write(conversation_data, data_protection_level):
    if data_protection_level == 'enhanced':
        # Encrypt transcript_segments before storage
        ...

def _prepare_conversation_for_read(conversation_data, data_protection_level):
    if data_protection_level == 'enhanced':
        # Decrypt transcript_segments after retrieval
        ...
```
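The wrap/unwrap pattern can be made concrete with a reversible stand-in cipher. Base64 is used below purely so the example runs; it is not encryption, and the real implementation uses AES as described above.

```python
import base64

def _encrypt(text: str) -> str:
    # Stand-in for AES encryption, purely to make the example runnable.
    return base64.b64encode(text.encode()).decode()

def _decrypt(token: str) -> str:
    return base64.b64decode(token.encode()).decode()

def prepare_for_write(conversation: dict, level: str) -> dict:
    """Encrypt transcript text when the user opted into enhanced protection."""
    if level != "enhanced":
        return conversation
    out = dict(conversation)
    out["transcript_segments"] = [
        {**seg, "text": _encrypt(seg["text"])}
        for seg in conversation.get("transcript_segments", [])
    ]
    return out

def prepare_for_read(conversation: dict, level: str) -> dict:
    """Reverse of prepare_for_write."""
    if level != "enhanced":
        return conversation
    out = dict(conversation)
    out["transcript_segments"] = [
        {**seg, "text": _decrypt(seg["text"])}
        for seg in conversation.get("transcript_segments", [])
    ]
    return out

convo = {"transcript_segments": [{"text": "secret plan"}]}
stored = prepare_for_write(convo, "enhanced")
restored = prepare_for_read(stored, "enhanced")
print(restored["transcript_segments"][0]["text"])  # secret plan
```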
Part 4: Vector Embeddings
Conversations are also stored as vector embeddings in Pinecone for semantic search.
What Gets Embedded
| Data | Embedded? | Stored in Metadata? |
| --- | --- | --- |
| Title | Yes | No |
| Overview | Yes | No |
| Action Items | Yes | No |
| Full Transcript | No (too large) | No |
| People Mentioned | No | Yes |
| Topics | No | Yes |
| Entities | No | Yes |
| created_at | No | Yes |
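Per the table, the text that gets embedded is built from the structured summary, while searchable attributes go into metadata. A sketch under assumptions (the newline-joined concatenation format and helper names are illustrative):

```python
def build_embedding_input(structured: dict) -> str:
    """Concatenate the fields that get embedded: title, overview, action items."""
    parts = [structured.get("title", ""), structured.get("overview", "")]
    parts += [item["description"] for item in structured.get("action_items", [])]
    return "\n".join(p for p in parts if p)

def build_metadata(people: list[str], topics: list[str],
                   entities: list[str], created_at: str) -> dict:
    """Attributes stored alongside the vector for filtered semantic search."""
    return {"people": people, "topics": topics,
            "entities": entities, "created_at": created_at}

text = build_embedding_input({
    "title": "Q3 planning",
    "overview": "Discussed roadmap priorities.",
    "action_items": [{"description": "Draft the roadmap doc"}],
})
print(text)
```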
Vector Creation
Vectors are created in a background thread after conversation processing:
```python
# utils/conversations/process_conversation.py
threading.Thread(
    target=save_structured_vector,
    args=(uid, conversation),
).start()
```
The save_structured_vector() function:
Generates embedding from conversation.structured (title + overview + action_items + events)
Extracts metadata via LLM (people, topics, entities, dates)
Upserts to Pinecone with metadata filters
Vectors are created ONCE during initial processing. Reprocessed conversations do NOT update their vectors.
Key Code Locations
| Component | File Path |
| --- | --- |
| Conversation Model | backend/models/conversation.py |
| Memory Model | backend/models/memories.py |
| Process Conversation | backend/utils/conversations/process_conversation.py |
| Database - Conversations | backend/database/conversations.py |
| Database - Memories | backend/database/memories.py |
| Router - Conversations | backend/routers/conversations.py |
| Router - Memories | backend/routers/memories.py |
| Vector Database | backend/database/vector_db.py |
API Endpoints
Conversations
| Method | Endpoint | Description |
| --- | --- | --- |
| POST | /v1/conversations | Process and store a new conversation |
| GET | /v1/conversations | List user's conversations |
| GET | /v1/conversations/{id} | Get specific conversation |
| PATCH | /v1/conversations/{id}/title | Update conversation title |
| DELETE | /v1/conversations/{id} | Delete a conversation |
Memories
| Method | Endpoint | Description |
| --- | --- | --- |
| POST | /v3/memories | Create a manual memory |
| GET | /v3/memories | List user's memories |
| PATCH | /v3/memories/{id} | Edit a memory |
| DELETE | /v3/memories/{id} | Delete a memory |
| PATCH | /v3/memories/{id}/visibility | Change memory visibility |