Overview
Omi uses a dual-collection architecture for storing user data:
Conversations
Primary storage for recorded interactions: transcripts, audio, and structured summaries
Memories
Secondary storage for extracted facts/learnings FROM conversations
Part 1: Storing Conversations
Processing Flow
API Request
The app sends a POST request to `/v1/conversations` with transcript data.
Processing
`process_conversation()` in `utils/conversations/process_conversation.py` handles the processing logic.
Structure Extraction
An LLM extracts the title, overview, action items, and events from the transcript.
Storage
`upsert_conversation()` in `database/conversations.py` saves the conversation to Firestore.
Vector Embedding
The conversation is embedded and stored in Pinecone for semantic search.
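The five steps above can be sketched as one orchestration function. This is a minimal illustration and not the code in `process_conversation.py`. The callables `extract_structure`, `upsert`, and `embed` are hypothetical stand-ins for the LLM call, `upsert_conversation()`, and the Pinecone write. They are passed in so the ordering can be exercised without Firestore or Pinecone.

```python
import threading
from typing import Callable


def process_conversation_sketch(
    uid: str,
    conversation: dict,
    extract_structure: Callable[[str], dict],  # hypothetical LLM wrapper
    upsert: Callable[[str, dict], None],       # stands in for upsert_conversation()
    embed: Callable[[str, dict], None],        # stands in for the Pinecone write
) -> threading.Thread:
    """Run the storage steps in order; the embedding step runs on a background thread."""
    # Structure Extraction: derive title, overview, action items, events from the transcript.
    transcript = " ".join(seg["text"] for seg in conversation["transcript_segments"])
    conversation["structured"] = extract_structure(transcript)

    # Storage: persist the conversation document.
    upsert(uid, conversation)

    # Vector Embedding: done off the request path, after processing.
    worker = threading.Thread(target=embed, args=(uid, conversation), daemon=True)
    worker.start()
    return worker


calls = []
worker = process_conversation_sketch(
    "user-1",
    {"id": "c1", "transcript_segments": [{"text": "Buy milk"}, {"text": "tomorrow"}]},
    extract_structure=lambda text: {"title": "Groceries", "overview": text},
    upsert=lambda uid, conv: calls.append(("upsert", conv["structured"]["title"])),
    embed=lambda uid, conv: calls.append(("embed", conv["id"])),
)
worker.join()
print(calls)  # [('upsert', 'Groceries'), ('embed', 'c1')]
```

Deferring the embedding to a thread mirrors the background-thread behaviour described in Part 4. Storage completes first, so the conversation is persisted even if embedding fails.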
Conversation Model Fields
| Field | Type | Description |
|---|---|---|
| id | string | Unique conversation identifier |
| created_at | datetime | When the conversation record was created |
| started_at | datetime | When the actual conversation started |
| finished_at | datetime | When the conversation ended |
| source | enum | Source device (omi, phone, desktop, openglass, etc.) |
| language | string | Language code of the conversation |
| status | enum | Processing status: in_progress, processing, completed, failed |
| structured | object | Extracted structured information (see below) |
| transcript_segments | array | List of transcript segments |
| geolocation | object | Location data (latitude, longitude, address) |
| photos | array | Photos captured during conversation |
| audio_files | array | Audio file references |
| apps_results | array | Results from summarization apps |
| external_data | object | Data from external integrations |
| discarded | boolean | Whether conversation was marked as low-quality |
| visibility | enum | private, shared, or public |
| is_locked | boolean | Whether conversation is locked from editing |
| data_protection_level | string | standard or enhanced (encrypted) |
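For orientation, a subset of these fields can be mirrored as a typed record. This is a simplified standard-library sketch, not the actual model in `backend/models/conversation.py`. The real model may use a different modeling library and has more fields. The enum members come from the table. The defaults are assumptions, except `standard`, which Part 3 describes as the usual protection level.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class ConversationStatus(str, Enum):
    in_progress = "in_progress"
    processing = "processing"
    completed = "completed"
    failed = "failed"


class Visibility(str, Enum):
    private = "private"
    shared = "shared"
    public = "public"


@dataclass
class ConversationSketch:
    id: str
    started_at: datetime
    finished_at: datetime
    source: str = "omi"                                           # assumed default
    language: Optional[str] = None
    status: ConversationStatus = ConversationStatus.in_progress  # assumed default
    structured: dict = field(default_factory=dict)
    transcript_segments: list = field(default_factory=list)
    discarded: bool = False
    visibility: Visibility = Visibility.private                   # assumed default
    is_locked: bool = False
    data_protection_level: str = "standard"
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    def duration_seconds(self) -> float:
        """Length of the conversation itself (finished_at - started_at)."""
        return (self.finished_at - self.started_at).total_seconds()


start = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
conv = ConversationSketch(id="c1", started_at=start, finished_at=start.replace(minute=30))
print(conv.status.value, conv.duration_seconds())  # in_progress 1800.0
```

Keeping `created_at` separate from `started_at` lets the record's creation time differ from when the conversation actually took place.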
Structured Information
The `structured` field contains LLM-extracted information:
| Field | Type | Description |
|---|---|---|
| title | string | Short descriptive title for the conversation |
| overview | string | Summary of key points discussed |
| emoji | string | Emoji representing the conversation |
| category | enum | Category (personal, work, health, etc.) |
| action_items | array | Tasks or to-dos mentioned |
| events | array | Calendar events to be created |
Transcript Segments
Each segment in `transcript_segments` includes:
| Field | Type | Description |
|---|---|---|
| text | string | Transcribed text content |
| speaker | string | Speaker label (e.g., `SPEAKER_00`) |
| start | float | Start time in seconds |
| end | float | End time in seconds |
| is_user | boolean | Whether spoken by the device owner |
| person_id | string | ID of identified person (if matched) |
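Each segment carries its own speaker label and timing, so a readable transcript or per-speaker talk time can be derived directly from `transcript_segments`. The helpers below are hypothetical and do not come from the Omi codebase. They assume segments are plain dicts with the fields in the table. Labeling the device owner as "You" is an illustrative choice.

```python
from collections import defaultdict
from typing import Dict, List


def format_transcript(segments: List[dict]) -> str:
    """Render segments as 'Speaker: text' lines, ordered by start time."""
    lines = []
    for seg in sorted(segments, key=lambda s: s["start"]):
        label = "You" if seg.get("is_user") else seg["speaker"]
        lines.append(f"{label}: {seg['text']}")
    return "\n".join(lines)


def talk_time(segments: List[dict]) -> Dict[str, float]:
    """Total seconds spoken per speaker label (sum of end - start)."""
    totals: Dict[str, float] = defaultdict(float)
    for seg in segments:
        totals[seg["speaker"]] += seg["end"] - seg["start"]
    return dict(totals)


segments = [
    {"text": "Sounds good.", "speaker": "SPEAKER_01", "start": 4.0, "end": 5.5, "is_user": False},
    {"text": "Lunch on Friday?", "speaker": "SPEAKER_00", "start": 0.0, "end": 2.0, "is_user": True},
]
print(format_transcript(segments))
# You: Lunch on Friday?
# SPEAKER_01: Sounds good.
print(talk_time(segments))  # {'SPEAKER_01': 1.5, 'SPEAKER_00': 2.0}
```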
Action Items
Action items are stored both inline (in `structured.action_items`) and in a standalone collection:
| Field | Type | Description |
|---|---|---|
| description | string | The action item text |
| completed | boolean | Whether the item is done |
| created_at | datetime | When extracted |
| due_at | datetime | Optional due date |
| completed_at | datetime | When marked complete |
| conversation_id | string | Source conversation |
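Storing action items in two places means each inline item needs a standalone copy that points back to its source conversation. Below is a minimal sketch of that fan-out. It assumes inline items are dicts with at least a `description` key and that standalone records use the fields in the table. The function name is hypothetical, and writing to Firestore is left out.

```python
from datetime import datetime, timezone
from typing import Optional


def to_standalone_action_items(
    conversation_id: str,
    inline_items: list,
    now: Optional[datetime] = None,
) -> list:
    """Copy structured.action_items into standalone records that link back via conversation_id."""
    now = now or datetime.now(timezone.utc)
    records = []
    for item in inline_items:
        records.append({
            "description": item["description"],
            "completed": bool(item.get("completed", False)),
            "created_at": now,                         # extraction time
            "due_at": item.get("due_at"),              # optional; None when absent
            "completed_at": item.get("completed_at"),  # None until marked complete
            "conversation_id": conversation_id,        # link to the source conversation
        })
    return records


extracted_at = datetime(2024, 5, 1, tzinfo=timezone.utc)
docs = to_standalone_action_items("c1", [{"description": "Email Sam the slides"}], now=extracted_at)
print(docs[0]["conversation_id"], docs[0]["completed"], docs[0]["due_at"])  # c1 False None
```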
Events
Calendar events extracted from conversations:
| Field | Type | Description |
|---|---|---|
| title | string | Event title |
| description | string | Event description |
| start | datetime | Start date/time |
| duration | integer | Duration in minutes |
| created | boolean | Whether added to calendar |
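Events store `duration` in minutes rather than an end timestamp, so a calendar integration has to derive the end time itself. The sketch below assumes `start` is a timezone-aware `datetime`. The `Event` class is illustrative and not the backend model.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Event:
    title: str
    description: str
    start: datetime
    duration: int          # minutes, per the table above
    created: bool = False  # set once the event has been added to a calendar

    @property
    def end(self) -> datetime:
        """End time derived from start + duration."""
        return self.start + timedelta(minutes=self.duration)


standup = Event("Standup", "Daily sync", datetime(2024, 3, 4, 9, 45, tzinfo=timezone.utc), 30)
print(standup.end.isoformat())  # 2024-03-04T10:15:00+00:00
```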
Part 2: Extracting & Storing Memories
Memories are facts about the user extracted from conversations. They represent learnings, preferences, habits, and other personal information.
Memory Extraction Process
During `process_conversation()`, the system:
Analyze Transcript
Reviews the conversation transcript for personal information
Extract Facts
Identifies facts worth remembering about the user (~15 words max)
Store with Link
Saves to the `memories` collection with a link back to the source conversation
Memory Model Fields
| Field | Type | Description |
|---|---|---|
| id | string | Unique memory identifier |
| uid | string | User ID |
| conversation_id | string | Source conversation (links back) |
| content | string | The actual fact/learning (max ~15 words) |
| category | enum | interesting, system, or manual |
| tags | array | Categorization tags |
| visibility | string | private or public |
| created_at | datetime | When the memory was created |
| updated_at | datetime | Last modification time |
| reviewed | boolean | Whether the user has reviewed it |
| user_review | boolean (nullable) | User’s approval: true, false, or null if no verdict yet |
| edited | boolean | Whether the user edited the content |
| scoring | string | Ranking score for retrieval |
| manually_added | boolean | Whether the user created it manually |
| is_locked | boolean | Whether the memory is protected from automatic deletion |
| app_id | string | Source app (if from an integration) |
| data_protection_level | string | Encryption level: standard or enhanced (encrypted) |
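The three extraction steps can be sketched together with the fields in this table. `extract_facts` is a hypothetical stand-in for the LLM call, and an in-memory list replaces the Firestore `memories` collection. Only the `conversation_id` back-link and the ~15-word guideline come from the description. Treating the word limit as a hard cutoff and defaulting `visibility` to `private` are assumptions.

```python
import uuid
from datetime import datetime, timezone
from typing import Callable

MAX_WORDS = 15  # "~15 words max"; treated here as a hard cutoff (assumption)


def extract_and_store_memories(
    uid: str,
    conversation_id: str,
    transcript: str,
    extract_facts: Callable[[str], list],  # hypothetical LLM call -> [(content, category), ...]
    memories: list,                        # in-memory stand-in for the memories collection
) -> list:
    """Analyze transcript -> extract facts -> store each one linked to its conversation."""
    now = datetime.now(timezone.utc)
    stored = []
    for content, category in extract_facts(transcript):
        if len(content.split()) > MAX_WORDS:
            continue  # skip facts that break the length guideline
        memory = {
            "id": str(uuid.uuid4()),
            "uid": uid,
            "conversation_id": conversation_id,  # back-link to the source conversation
            "content": content,
            "category": category,
            "visibility": "private",            # assumed default
            "created_at": now,
            "updated_at": now,
            "reviewed": False,
            "user_review": None,                # no verdict yet
            "edited": False,
            "manually_added": False,
            "is_locked": False,
        }
        memories.append(memory)
        stored.append(memory)
    return stored


collection: list = []
new = extract_and_store_memories(
    "user-1", "c1", "(transcript text)",
    extract_facts=lambda text: [
        ("Plays competitive chess on weekends", "interesting"),
        ("Mentioned a very long rambling story about " + "nothing " * 20, "interesting"),
    ],
    memories=collection,
)
print(len(new), new[0]["conversation_id"])  # 1 c1
```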
Memory Categories
Interesting
Notable facts about the user: hobbies, opinions, stories
System
Preferences and patterns: work habits, sleep schedule
Manual
User-created memories: explicitly added facts
Legacy categories (`core`, `hobbies`, `lifestyle`, `interests`, `habits`, `work`, `skills`, `learnings`, `other`) are automatically mapped to the new primary categories for backward compatibility.
Memory Extraction Rules
The system follows these guidelines when extracting memories:
- Maximum ~15 words per memory
- Must pass the “shareability test”: would this be worth telling someone?
- Maximum 2 `interesting` + 2 `system` memories per conversation
- No duplicate or near-duplicate facts
- Skip mundane details (eating, sleeping, commuting)
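The mechanical rules in this list can be applied as a post-filter over LLM candidates. This is a minimal sketch that assumes candidates arrive as `(content, category)` pairs in priority order. The near-duplicate check is a deliberately simple stand-in, since the source does not say how near-duplicates are detected: it compares text after lowercasing and stripping punctuation. The shareability test and the "skip mundane details" rule require judgment, so they are left to the extraction prompt.

```python
import string

MAX_WORDS = 15                                        # "~15 words per memory"
PER_CATEGORY_LIMIT = {"interesting": 2, "system": 2}  # per conversation
_PUNCT = str.maketrans("", "", string.punctuation)


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, collapse whitespace."""
    return " ".join(text.lower().translate(_PUNCT).split())


def apply_extraction_rules(candidates, existing_contents=()):
    """Keep (content, category) pairs that satisfy length, duplicate, and per-category caps."""
    seen = {normalize(c) for c in existing_contents}
    counts = {}
    kept = []
    for content, category in candidates:
        if len(content.split()) > MAX_WORDS:
            continue  # too long
        key = normalize(content)
        if key in seen:
            continue  # duplicate of an existing or already-kept memory
        if counts.get(category, 0) >= PER_CATEGORY_LIMIT.get(category, 0):
            continue  # over the per-conversation cap; unknown categories get a cap of 0
        counts[category] = counts.get(category, 0) + 1
        seen.add(key)
        kept.append((content, category))
    return kept


candidates = [
    ("Restores vintage motorcycles", "interesting"),
    ("Restores vintage motorcycles!", "interesting"),  # near-duplicate
    ("Ran a marathon in Berlin", "interesting"),
    ("Grew up in Lisbon", "interesting"),              # third interesting: over the cap
    ("Works best late at night", "system"),
]
print(apply_extraction_rules(candidates))
```

The duplicate check runs before the cap check, so a rejected duplicate never uses up one of the two slots for its category.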
Part 3: Data Protection & Encryption
Both conversations and memories support encryption for sensitive data. Each record's `data_protection_level` field selects one of two levels: `standard` or `enhanced`.
Standard Protection Level
No encryption; data is stored as plaintext. This is the default for most users.
- Fastest read/write performance
- Data visible in Firestore console
- Suitable for general use
Implementation
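The cipher used for the enhanced level is not described in this section, so the sketch below shows only the routing. String fields pass through a pluggable `encrypt`/`decrypt` pair when `data_protection_level` is `enhanced`, and `standard` records are stored unchanged. The function names and the per-field approach are assumptions. The string-reversal helpers exist only to make the example runnable. They are NOT encryption, and a real deployment would plug in an authenticated cipher.

```python
from typing import Callable

Transform = Callable[[str], str]


def _transform_fields(doc: dict, fields: list, fn: Transform) -> dict:
    """Return a copy of doc with fn applied to the named string fields of 'enhanced' records."""
    out = dict(doc)
    if out.get("data_protection_level", "standard") != "enhanced":
        return out  # standard: stored as plaintext, visible in the Firestore console
    for name in fields:
        if isinstance(out.get(name), str):
            out[name] = fn(out[name])
    return out


def prepare_for_storage(doc: dict, fields: list, encrypt: Transform) -> dict:
    return _transform_fields(doc, fields, encrypt)


def read_from_storage(doc: dict, fields: list, decrypt: Transform) -> dict:
    return _transform_fields(doc, fields, decrypt)


def toy_encrypt(text: str) -> str:
    """Reversible placeholder for demonstration only; this is NOT encryption."""
    return text[::-1]


def toy_decrypt(text: str) -> str:
    return text[::-1]


memory = {"content": "Allergic to peanuts", "data_protection_level": "enhanced"}
stored = prepare_for_storage(memory, ["content"], toy_encrypt)
restored = read_from_storage(stored, ["content"], toy_decrypt)
print(restored["content"])  # Allergic to peanuts
```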
Part 4: Vector Embeddings
Conversations are also stored as vector embeddings in Pinecone for semantic search.
What Gets Embedded
| Data | Embedded? | Stored in Metadata? |
|---|---|---|
| Title | Yes | No |
| Overview | Yes | No |
| Action Items | Yes | No |
| Full Transcript | No (too large) | No |
| People Mentioned | No | Yes |
| Topics | No | Yes |
| Entities | No | Yes |
| created_at | No | Yes |
Vector Creation
Vectors are created in a background thread after conversation processing. The `save_structured_vector()` function:
- Generates an embedding from `conversation.structured` (title + overview + action_items + events)
- Extracts metadata via an LLM (people, topics, entities, dates)
- Upserts the vector to Pinecone along with its metadata, which is used for filtering
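The sketch below shows how the embedding input and metadata could be assembled before the Pinecone upsert. Several details are assumptions for illustration: the newline-joined text layout, the `embed` callable, the ID scheme, and the metadata keys. Following the descriptions above, only the structured summary is embedded, people, topics, entities, and `created_at` go into metadata, and the full transcript is left out. The resulting dict uses the `id`/`values`/`metadata` shape that Pinecone upserts accept.

```python
from datetime import datetime, timezone


def build_embedding_text(structured: dict) -> str:
    """Concatenate the embedded parts of conversation.structured."""
    parts = [structured.get("title", ""), structured.get("overview", "")]
    parts += [item["description"] for item in structured.get("action_items", [])]
    parts += [event["title"] for event in structured.get("events", [])]
    return "\n".join(p for p in parts if p)


def build_vector_record(uid, conversation_id, structured, created_at, extracted, embed):
    """Assemble an {id, values, metadata} record for upserting."""
    return {
        "id": f"{uid}-{conversation_id}",  # assumed ID scheme
        "values": embed(build_embedding_text(structured)),
        "metadata": {
            "uid": uid,
            "conversation_id": conversation_id,
            "created_at": int(created_at.timestamp()),  # numeric, so range filters can apply
            "people": extracted.get("people", []),
            "topics": extracted.get("topics", []),
            "entities": extracted.get("entities", []),
        },
    }


structured = {
    "title": "Trip planning",
    "overview": "Discussed flights to Tokyo.",
    "action_items": [{"description": "Book flights"}],
    "events": [{"title": "Flight check-in"}],
}
record = build_vector_record(
    "u1", "c1", structured,
    created_at=datetime(2024, 1, 1, tzinfo=timezone.utc),
    extracted={"people": ["Ana"], "topics": ["travel"]},
    embed=lambda text: [float(len(text))],  # placeholder embedding, not a real model
)
print(record["metadata"]["created_at"])  # 1704067200
```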
Key Code Locations
| Component | File Path |
|---|---|
| Conversation Model | backend/models/conversation.py |
| Memory Model | backend/models/memories.py |
| Process Conversation | backend/utils/conversations/process_conversation.py |
| Database - Conversations | backend/database/conversations.py |
| Database - Memories | backend/database/memories.py |
| Router - Conversations | backend/routers/conversations.py |
| Router - Memories | backend/routers/memories.py |
| Vector Database | backend/database/vector_db.py |
API Endpoints
Conversations
| Method | Endpoint | Description |
|---|---|---|
| POST | /v1/conversations | Process and store a new conversation |
| GET | /v1/conversations | List user’s conversations |
| GET | /v1/conversations/{id} | Get specific conversation |
| PATCH | /v1/conversations/{id}/title | Update conversation title |
| DELETE | /v1/conversations/{id} | Delete a conversation |
Memories
| Method | Endpoint | Description |
|---|---|---|
| POST | /v3/memories | Create a manual memory |
| GET | /v3/memories | List user’s memories |
| PATCH | /v3/memories/{id} | Edit a memory |
| DELETE | /v3/memories/{id} | Delete a memory |
| PATCH | /v3/memories/{id}/visibility | Change memory visibility |
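The following client-side sketch uses only the standard library. Several details are assumptions not documented here: the base URL, the bearer-token header, and the JSON body shapes for the title, memory, and visibility requests. The request schemas in `backend/routers/conversations.py` and `backend/routers/memories.py` are authoritative. The helper builds `urllib.request.Request` objects without sending them, so they can be inspected offline.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "https://api.example.com"  # hypothetical host; substitute your deployment's
TOKEN = "TOKEN"                        # placeholder credential


def build_request(method: str, path: str, body: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) an authenticated JSON request."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(BASE_URL + path, data=data, method=method)
    req.add_header("Authorization", f"Bearer {TOKEN}")  # assumed auth scheme
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


# Conversations (v1)
list_req = build_request("GET", "/v1/conversations")
rename_req = build_request("PATCH", "/v1/conversations/c1/title", {"title": "Weekly sync"})  # assumed body
# Memories (v3)
memory_req = build_request("POST", "/v3/memories", {"content": "Prefers window seats"})     # assumed body
share_req = build_request("PATCH", "/v3/memories/m1/visibility", {"visibility": "public"})  # assumed body

print(list_req.get_method(), list_req.full_url)  # GET https://api.example.com/v1/conversations
# Sending is a separate step, e.g. urllib.request.urlopen(rename_req)
```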