Memory Post-Processing
This document outlines the post-processing workflow for memories in the Omi application.
Process Overview
- Post-processing request initiated
- Request handled by routers/postprocessing.py
- Audio pre-processed and stored
- FAL.ai WhisperX transcription performed
- Transcript post-processed
- Speech profile matching for speaker identification
- Memory updated and reprocessed
- Optional emotional analysis
Detailed Steps
1. Post-Processing Request
- Omi App sends a POST request to /v1/memories/{memory_id}/post-processing
- Request includes:
  - Audio recording for post-processing
  - Flag for emotional analysis
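As a rough illustration, the request URL could be assembled on the client side as sketched below. The endpoint path comes from the step above; the base URL, the emotional_feedback query parameter name, and the helper function are assumptions made for this sketch, not confirmed parts of the Omi API.

```python
from urllib.parse import quote, urlencode

# Hypothetical base URL; replace with the actual backend host.
BASE_URL = "https://api.example.com"


def build_postprocessing_url(memory_id: str, emotional_feedback: bool = False) -> str:
    """Build the post-processing URL for a given memory.

    The query parameter name `emotional_feedback` is an assumption.
    """
    # Escape the memory ID so it cannot break out of its path segment.
    path = f"/v1/memories/{quote(memory_id, safe='')}/post-processing"
    query = urlencode({"emotional_feedback": str(emotional_feedback).lower()})
    return f"{BASE_URL}{path}?{query}"


url = build_postprocessing_url("abc123", emotional_feedback=True)
print(url)
```

The audio recording itself would travel in the request body (for example as a multipart file upload), which is omitted here.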
2. Request Handling
- The postprocess_memory function in routers/postprocessing.py processes the request
- Retrieves existing memory data from Firebase Firestore using database/memories.py
3. Pre-Processing and Storage
User Permission Check
- Checks if user allows audio storage (database/users.py)
- If permitted, audio uploaded to memories_recordings_bucket in Google Cloud Storage
Audio Upload for Processing
- Audio uploaded to postprocessing_audio_bucket in Google Cloud Storage
- Handled by utils/other/storage.py
Cleanup
- Background thread started to delete uploaded audio after set time (e.g., 5 minutes)
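The delayed cleanup can be sketched with the standard library's threading.Timer. This is a minimal illustration, not the code in utils/other/storage.py: the schedule_deletion helper and the stand-in delete function are hypothetical, and the real implementation would call the Google Cloud Storage client to remove the object.

```python
import threading
from typing import Callable


def schedule_deletion(
    delete_fn: Callable[[str], None],
    file_path: str,
    delay_seconds: float = 300.0,
) -> threading.Timer:
    """Run delete_fn(file_path) on a background thread after delay_seconds."""
    timer = threading.Timer(delay_seconds, delete_fn, args=(file_path,))
    timer.daemon = True  # do not keep the process alive just for cleanup
    timer.start()
    return timer


# Demo with a stand-in delete function and a very short delay.
deleted = []
timer = schedule_deletion(deleted.append, "user123/recording.wav", delay_seconds=0.05)
timer.join()  # wait for the timer thread; only needed in this demo
print(deleted)
```

Because the timer runs on its own thread, the request handler can return without waiting for the deletion to happen.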
4. FAL.ai WhisperX Transcription
- The fal_whisperx function in utils/stt/pre_recorded.py sends audio to FAL.ai
- WhisperX model performs high-quality transcription and speaker diarization
- Returns list of transcribed words with speaker labels
5. Transcript Post-Processing
The fal_postprocessing function in utils/stt/pre_recorded.py:
- Cleans transcript data
- Groups words into segments based on speaker and timing
- Converts segments to TranscriptSegment objects
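The grouping step can be illustrated with a simplified sketch. The Segment class below is a stand-in for TranscriptSegment, and the word dictionary keys, the max_gap threshold, and the cleaning rule are all assumptions; the actual logic in fal_postprocessing may differ.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Segment:
    """Simplified stand-in for TranscriptSegment (fields are assumptions)."""
    speaker: str
    start: float
    end: float
    text: str


def group_words(words: List[Dict], max_gap: float = 1.0) -> List[Segment]:
    """Merge consecutive words that share a speaker and are close in time."""
    segments: List[Segment] = []
    for word in words:
        text = word["text"].strip()
        if not text:
            continue  # basic cleaning: skip empty tokens
        last = segments[-1] if segments else None
        same_speaker = last is not None and last.speaker == word["speaker"]
        if same_speaker and word["start"] - last.end <= max_gap:
            # Extend the current segment with this word.
            last.text += " " + text
            last.end = word["end"]
        else:
            # Speaker changed or the pause was too long: start a new segment.
            segments.append(Segment(word["speaker"], word["start"], word["end"], text))
    return segments


words = [
    {"speaker": "SPEAKER_00", "start": 0.0, "end": 0.4, "text": "Hello"},
    {"speaker": "SPEAKER_00", "start": 0.5, "end": 0.9, "text": "there"},
    {"speaker": "SPEAKER_01", "start": 1.0, "end": 1.3, "text": "Hi"},
    {"speaker": "SPEAKER_01", "start": 4.0, "end": 4.5, "text": "Bye"},
]
segments = group_words(words)
for seg in segments:
    print(seg)
```

In this example the first two words merge into one segment, while the last two stay separate because the pause between them exceeds max_gap.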
6. Speech Profile Matching
get_speech_profile_matching_predictions in utils/stt/speech_profile.py:
- Downloads user's speech profile and known people profiles
- Uses SpeechBrain model to compare speaker embeddings
- Updates segments with is_user and person_id flags
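Comparing speaker embeddings is commonly done with cosine similarity, and the sketch below shows that idea on plain Python lists. It assumes the embeddings have already been computed (by the SpeechBrain model in the real pipeline); the match_speaker helper, the 0.7 threshold, and the choice of cosine similarity are assumptions for illustration, not the confirmed behavior of get_speech_profile_matching_predictions.

```python
import math
from typing import Dict, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_speaker(
    segment_embedding: Sequence[float],
    user_embedding: Sequence[float],
    people_embeddings: Dict[str, Sequence[float]],
    threshold: float = 0.7,  # assumed cutoff, chosen for illustration
) -> Tuple[bool, Optional[str]]:
    """Return (is_user, person_id) for one segment embedding."""
    if cosine_similarity(segment_embedding, user_embedding) >= threshold:
        return True, None
    best_id, best_score = None, threshold
    for person_id, embedding in people_embeddings.items():
        score = cosine_similarity(segment_embedding, embedding)
        if score >= best_score:
            best_id, best_score = person_id, score
    return False, best_id


user = [1.0, 0.0, 0.0]
people = {"person_1": [0.0, 1.0, 0.0]}
print(match_speaker([0.9, 0.1, 0.0], user, people))
print(match_speaker([0.1, 0.9, 0.0], user, people))
print(match_speaker([0.0, 0.0, 1.0], user, people))
```

A segment that matches neither the user nor any known person keeps is_user as False and person_id as None.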
7. Memory Update and Reprocessing
- Memory object updated with improved transcript and speaker identification
- Updated data saved to Firebase Firestore
- If FAL.ai transcription successful:
  - process_memory in utils/memories/process_memory.py re-processes memory
  - Re-extracts structured data (title, overview, etc.)
  - Re-generates embeddings
  - Updates memory in vector database
8. Emotional Analysis (Optional)
If requested:
- process_user_emotion function called asynchronously
- Uses Hume API to analyze user's emotions in the recording
- Can trigger notifications based on detected emotions
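One way to run the optional analysis without blocking the response is to hand it to a background thread. The sketch below shows only that dispatch pattern: maybe_analyze_emotion and the stand-in analyze callable are hypothetical, and no Hume API call is made or modeled here.

```python
import threading
from typing import Callable, Optional


def maybe_analyze_emotion(
    emotional_feedback: bool,
    analyze: Callable[[str], None],
    audio_path: str,
) -> Optional[threading.Thread]:
    """Start analyze(audio_path) on a background thread if requested."""
    if not emotional_feedback:
        return None
    thread = threading.Thread(target=analyze, args=(audio_path,), daemon=True)
    thread.start()
    return thread


# Demo with a stand-in analysis function that just records its input.
analyzed = []
thread = maybe_analyze_emotion(True, analyzed.append, "user123/recording.wav")
if thread is not None:
    thread.join()  # only for the demo; a request handler would not wait
print(analyzed)
```

When the flag is not set, the helper returns None and no thread is started.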