Welcome to the Omi backend! This document provides a comprehensive overview of Omi’s architecture and code, guiding you through its key components, functionalities, and how it all works together to power a unique and intelligent AI assistant experience.
`/listen` endpoint in `routers/transcribe.py`).
- `/v1/memories`: When the conversation session ends, the Omi app sends a POST request to the `/v1/memories` endpoint in `routers/memories.py`.
- `create_memory` (`routers/memories.py`): The `create_memory` function in this file receives the request and performs basic validation on the data.
- `process_memory` (`utils/memories/process_memory.py`): The `create_memory` function delegates the core memory processing logic to the `process_memory` function. This function is where the real magic happens! It extracts structured information from the conversation, including:
  - `title`: A short, descriptive title.
  - `overview`: A concise summary of the main points.
  - `category`: A relevant category to organize memories (work, personal, etc.).
  - `action_items`: Any tasks or to-dos mentioned.
  - `events`: Events that might need to be added to a calendar.
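The extracted fields above can be pictured as a small data model. Here is a minimal stdlib-only sketch; the dataclass and the `validate_memory` helper are illustrative stand-ins for whatever models and validation the real `create_memory`/`process_memory` code uses:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class StructuredMemory:
    """Illustrative shape of the data process_memory extracts from a transcript."""
    title: str                      # short, descriptive title
    overview: str                   # concise summary of the main points
    category: str                   # e.g. "work", "personal"
    action_items: List[str] = field(default_factory=list)  # tasks/to-dos mentioned
    events: List[str] = field(default_factory=list)        # potential calendar events

def validate_memory(m: StructuredMemory) -> bool:
    """Basic validation in the spirit of create_memory: require a title and overview."""
    return bool(m.title.strip()) and bool(m.overview.strip())

memory = StructuredMemory(
    title="Standup notes",
    overview="Discussed sprint progress and blockers.",
    category="work",
    action_items=["Follow up on API bug"],
)
```

A memory missing its `title` or `overview` would fail this basic check before any further processing.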
### `database/memories.py`: The Memory Guardian 🛡️
- `upsert_memory`: Creates or updates a memory document in Firestore, ensuring efficient storage and handling of updates.
- `get_memory`: Retrieves a specific memory by its ID.
- `get_memories`: Fetches a list of memories for a user, allowing for filtering, pagination, and optional inclusion of discarded memories.
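The semantics of these three functions can be sketched without a live Firestore instance. Below, a plain dict stands in for a user's Firestore collection (a hypothetical stand-in; the real `upsert_memory` writes documents with merge-style update semantics):

```python
from typing import Dict, List, Optional

# A dict keyed by memory id stands in for a user's Firestore "memories" collection.
_store: Dict[str, dict] = {}

def upsert_memory(memory: dict) -> None:
    """Create the document, or merge new fields into an existing one."""
    doc = _store.setdefault(memory["id"], {})
    doc.update(memory)  # merge semantics, like Firestore's set(..., merge=True)

def get_memory(memory_id: str) -> Optional[dict]:
    """Retrieve a specific memory by its id."""
    return _store.get(memory_id)

def get_memories(limit: int = 10, offset: int = 0,
                 include_discarded: bool = False) -> List[dict]:
    """Filtering, pagination, and optional inclusion of discarded memories."""
    docs = [d for d in _store.values()
            if include_discarded or not d.get("discarded", False)]
    return docs[offset:offset + limit]

upsert_memory({"id": "m1", "title": "First"})
upsert_memory({"id": "m1", "discarded": True})   # merges into the same document
upsert_memory({"id": "m2", "title": "Second"})
```

Note how the second `upsert_memory` call updates `m1` in place rather than creating a duplicate, and how `get_memories` hides discarded memories by default.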
### `database/vector_db.py`: The Embedding Expert 🌲
- `upsert_vector`: Adds or updates a memory embedding in Pinecone.
- `upsert_vectors`: Efficiently adds or updates multiple embeddings.
- `query_vectors`: Performs a similarity search to find memories relevant to a user query.
- `delete_vector`: Removes a memory embedding.
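Under the hood, `query_vectors` is a nearest-neighbour search over embeddings. A pure-Python illustration of the idea (Pinecone does this at scale over real embeddings; the 3-d vectors and ids below are made up):

```python
import math
from typing import Dict, List, Tuple

def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def query_vectors(index: Dict[str, List[float]], query: List[float],
                  top_k: int = 2) -> List[Tuple[str, float]]:
    """Return the top_k memory ids most similar to the query embedding."""
    scored = [(mem_id, cosine(vec, query)) for mem_id, vec in index.items()]
    return sorted(scored, key=lambda p: p[1], reverse=True)[:top_k]

# Toy 3-d "embeddings"; real ones come from an embedding model and live in Pinecone.
index = {
    "meeting-notes": [0.9, 0.1, 0.0],
    "grocery-list":  [0.0, 0.9, 0.4],
    "standup":       [0.8, 0.2, 0.1],
}
results = query_vectors(index, [1.0, 0.0, 0.0])
```

The query vector points almost entirely along the first axis, so the two work-related memories rank above the grocery list.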
### `utils/llm.py`: The AI Maestro 🧠
`llm.py` leverages OpenAI's `ChatOpenAI` model (specifically `gpt-4o` in the code, but you can use other models) for language understanding, generation, and reasoning. It also uses the `OpenAIEmbeddings` model to generate vector embeddings for memories and user queries.

Why `llm.py` is Essential:
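One job `llm.py` orchestrates is structured extraction: build a prompt asking the model for JSON, call the model, and parse the reply. A stdlib-only sketch of the plumbing around the call; the prompt wording and both helper names here are hypothetical, and the actual `ChatOpenAI(...).invoke(...)` call is elided:

```python
import json

FIELDS = ["title", "overview", "category", "action_items", "events"]

def build_extraction_prompt(transcript: str) -> str:
    """Ask the model to return the structured memory fields as JSON."""
    return (
        "Extract the following fields from the conversation as JSON "
        f"with keys {FIELDS}:\n\n{transcript}"
    )

def parse_llm_reply(reply: str) -> dict:
    """Parse the model's JSON reply, defaulting any missing fields."""
    data = json.loads(reply)
    return {key: data.get(key, [] if key in ("action_items", "events") else "")
            for key in FIELDS}

# In llm.py the prompt would be sent to gpt-4o via ChatOpenAI; here we parse
# a canned reply to show the round trip without a network call.
canned = '{"title": "Team sync", "overview": "Planning.", "category": "work"}'
memory = parse_llm_reply(canned)
```

Defaulting missing keys keeps downstream code simple even when the model omits a field.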
### `utils/other/storage.py`: The Cloud Storage Manager ☁️
- `upload_profile_audio(file_path: str, uid: str)`: Uploads a speech profile recording to the cloud storage bucket named by the `BUCKET_SPEECH_PROFILES` environment variable, stored under the user's ID (`uid`).
- `get_profile_audio_if_exists(uid: str) -> str`: Retrieves the user's stored speech profile audio, returning `None` if the profile does not exist.

The `upload_profile_audio` function is called when a user uploads a new speech profile recording through the `/v3/upload-audio` endpoint (defined in `routers/speech_profile.py`). The `get_profile_audio_if_exists` function is used to retrieve a user's speech profile when needed, for example during speaker identification in real-time transcription or post-processing.
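A sketch of how such per-user uploads are typically organized. The path convention, helper names, and fallback bucket name below are hypothetical; the real code uses the Google Cloud Storage client against the bucket from `BUCKET_SPEECH_PROFILES`:

```python
import os

def profile_blob_path(uid: str, filename: str = "speech_profile.wav") -> str:
    """Namespace each user's speech profile under their uid in the bucket."""
    return f"{uid}/{filename}"

def speech_profiles_bucket() -> str:
    """Bucket name comes from the env var; the fallback here is made up."""
    return os.environ.get("BUCKET_SPEECH_PROFILES", "speech-profiles-dev")

# upload_profile_audio would then do roughly:
#   bucket = storage_client.bucket(speech_profiles_bucket())
#   bucket.blob(profile_blob_path(uid)).upload_from_filename(file_path)
path = profile_blob_path("user-123")
```

Keying blobs by `uid` makes the later lookup in `get_profile_audio_if_exists` a single, predictable path check.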
### `database/redis_db.py`: The Data Speedster 🚀
The `database/redis_db.py` module handles Omi's interactions with Redis, which is primarily used for caching, managing user settings, and storing user speech profiles.

Data Stored and Retrieved from Redis:
- `store_user_speech_profile`, `get_user_speech_profile`: For storing and retrieving speech profiles.
- `store_user_speech_profile_duration`, `get_user_speech_profile_duration`: For managing speech profile durations.
- `enable_plugin`, `disable_plugin`, `get_enabled_plugins`: For handling plugin enable/disable states.
- `get_plugin_reviews`: Retrieves reviews for a plugin.
- `cache_user_name`, `get_cached_user_name`: For caching user names.

Why Redis is Important:
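These helpers follow a simple get/set pattern. A sketch written against any Redis-like client; the dict-backed `FakeRedis` stub below stands in for `redis.Redis`, and the key format is illustrative, not the one the real module uses:

```python
from typing import Optional

class FakeRedis:
    """Minimal stand-in for redis.Redis supporting get/set."""
    def __init__(self):
        self._data = {}

    def set(self, key: str, value: str) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

r = FakeRedis()

def cache_user_name(uid: str, name: str) -> None:
    """Cache the user's display name under a per-user key."""
    r.set(f"users:{uid}:name", name)

def get_cached_user_name(uid: str) -> Optional[str]:
    """Return the cached name, or None on a cache miss."""
    return r.get(f"users:{uid}:name")

cache_user_name("user-123", "Ada")
```

Because the helpers only depend on `get`/`set`, the same code runs against a real Redis connection in production and a stub in tests.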
### `routers/transcribe.py`: The Real-Time Transcription Engine 🎙️
- `/listen` Endpoint: The Omi app initiates a WebSocket connection with the backend at the `/listen` endpoint, which is defined in the `websocket_endpoint` function of `routers/transcribe.py`.
- `process_audio_dg` Function: The `process_audio_dg` function (found in `utils/stt/streaming.py`) manages the interaction with Deepgram and configures various Deepgram options, including:
  - `punctuate`: Automatically adds punctuation to the transcribed text.
  - `no_delay`: Minimizes latency for real-time feedback.
  - `language`: Sets the language for transcription.
  - `interim_results`: (Set to `False` in the code) Controls whether to send interim (partial) transcription results or only final results.
  - `diarize`: Enables speaker diarization (identifying different speakers in the audio).
  - `encoding`, `sample_rate`: Set the audio encoding and sample rate for compatibility with Deepgram.

The app streams audio chunks over the WebSocket to the `/listen` endpoint. The `websocket_endpoint` function receives the audio chunks and immediately forwards them to Deepgram using the `process_audio_dg` function. The `no_delay` option in Deepgram and the efficient handling of data in the backend are essential for minimizing delays.
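Pulled together, the options above amount to a configuration like the following. This is a plain-dict sketch of the settings `process_audio_dg` hands to Deepgram's live-transcription API; the `language`, `encoding`, and `sample_rate` values are example values, and the real code passes them through the Deepgram SDK rather than a dict:

```python
# Illustrative Deepgram live-transcription settings, mirroring the list above.
deepgram_options = {
    "punctuate": True,        # auto-punctuation in the transcript
    "no_delay": True,         # minimize latency for real-time feedback
    "language": "en",         # transcription language (example value)
    "interim_results": False, # only final results, as in the code
    "diarize": True,          # identify different speakers
    "encoding": "linear16",   # audio encoding (example value)
    "sample_rate": 16000,     # sample rate in Hz (example value)
}
```

The latency-critical pair is `no_delay=True` with `interim_results=False`: the backend waits for final transcripts but asks Deepgram not to buffer them.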
- `routers/transcribe.py`: Manages real-time audio transcription using Deepgram, sending the transcribed text back to the Omi app for display.
- `routers/workflow.py`, `routers/screenpipe.py`: Define API endpoints for external integrations to trigger memory creation.