This document outlines the real-time audio transcription process in the Omi application.
## `/listen` Endpoint

Real-time transcription begins at the `/listen` WebSocket endpoint, defined in `routers/transcribe.py`. Its `websocket_endpoint` function sets up the connection and delegates to `_websocket_util` to manage it.
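A minimal sketch of this hand-off, assuming a FastAPI router; the query parameters and the exact `_websocket_util` signature are illustrative assumptions, not Omi's actual code:

```python
from fastapi import APIRouter, WebSocket

router = APIRouter()

@router.websocket("/listen")
async def websocket_endpoint(
    websocket: WebSocket,
    uid: str,
    language: str = "en",
    sample_rate: int = 8000,
    codec: str = "opus",
    channels: int = 1,
):
    # Delegate connection management to _websocket_util (next section).
    await _websocket_util(websocket, uid, language, sample_rate, codec, channels)
```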
## `_websocket_util` Function

The `_websocket_util` function manages the connection. It uses `utils/other/storage.py` to retrieve the user's speech profile from Google Cloud Storage, then runs two concurrent tasks (sketched below):

- `receive_audio`: receives audio chunks from the client and forwards them to Deepgram.
- `send_heartbeat`: sends periodic messages to keep the connection alive.
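A hedged sketch of that structure, assuming asyncio tasks over a FastAPI WebSocket; `dg_socket`, the heartbeat payload, and the argument threading are assumptions for illustration:

```python
import asyncio
from fastapi import WebSocket, WebSocketDisconnect

async def receive_audio(websocket: WebSocket, dg_socket) -> None:
    # Relay raw audio chunks from the client straight to Deepgram.
    try:
        while True:
            chunk = await websocket.receive_bytes()
            dg_socket.send(chunk)
    except WebSocketDisconnect:
        pass  # client closed the stream

async def send_heartbeat(websocket: WebSocket, interval: float = 30.0) -> None:
    # Periodic pings keep the client (and any proxies) from timing out.
    while True:
        await asyncio.sleep(interval)
        await websocket.send_json({"type": "ping"})

async def _websocket_util(websocket: WebSocket, uid: str, *stream_args) -> None:
    await websocket.accept()
    dg_socket = process_audio_dg(on_message, *stream_args)  # see next section
    await asyncio.gather(
        receive_audio(websocket, dg_socket),
        send_heartbeat(websocket),
    )
```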
## `process_audio_dg` Function

The `process_audio_dg` function in `utils/stt/streaming.py` authenticates with Deepgram using the `DEEPGRAM_API_KEY` environment variable, opens a live transcription stream, and registers an `on_message` callback for handling transcripts. The stream is configured with the following options:

| Option | Value | Description |
|---|---|---|
| language | Variable | Audio language |
| sample_rate | 8000 or 16000 Hz | Audio sample rate |
| codec | Opus or Linear16 | Audio codec |
| channels | Variable | Number of audio channels |
| punctuate | True | Automatic punctuation |
| no_delay | True | Low-latency transcription |
| endpointing | 100 | Sentence boundary detection |
| interim_results | False | Only final transcripts sent |
| smart_format | True | Enhanced transcript formatting |
| profanity_filter | False | No profanity filtering |
| diarize | True | Speaker identification |
| filler_words | False | Remove filler words |
| multichannel | channels > 1 | Enable if multiple channels |
| model | "nova-2-general" | Deepgram model selection |
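Assuming the backend talks to Deepgram through the official Python SDK (v3), the table above maps onto a `LiveOptions` object roughly as follows; the function signature is an illustrative guess at `process_audio_dg`, not its verbatim source:

```python
import os
from deepgram import DeepgramClient, LiveOptions, LiveTranscriptionEvents

def process_audio_dg(on_message, language: str, sample_rate: int,
                     codec: str, channels: int):
    # Authenticate with the API key from the environment.
    client = DeepgramClient(os.environ["DEEPGRAM_API_KEY"])
    connection = client.listen.live.v("1")
    # Route transcript events to the on_message callback.
    connection.on(LiveTranscriptionEvents.Transcript, on_message)
    connection.start(LiveOptions(
        model="nova-2-general",
        language=language,
        sample_rate=sample_rate,
        encoding="opus" if codec == "opus" else "linear16",
        channels=channels,
        multichannel=channels > 1,
        punctuate=True,
        no_delay=True,
        endpointing=100,
        interim_results=False,
        smart_format=True,
        profanity_filter=False,
        diarize=True,
        filler_words=False,
    ))
    return connection
```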
## `on_message` Callback

The `on_message` callback receives raw transcript data from Deepgram and converts it into transcript segments with the following fields:

| Field | Description |
|---|---|
| speaker | Speaker label (e.g., "SPEAKER_00") |
| start | Segment start time (seconds) |
| end | Segment end time (seconds) |
| text | Combined, punctuated text |
| is_user | Boolean indicating whether the segment is from the user |
| person_id | ID of the matched person from user profiles (if applicable) |
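A hedged sketch of such a handler, following the shape of Deepgram's live results (`result.channel.alternatives[0].words`); the segment-merging and profile-matching details are simplified assumptions:

```python
def on_message(self, result, **kwargs):
    words = result.channel.alternatives[0].words
    segments = []
    for word in words:
        speaker = f"SPEAKER_{word.speaker:02d}"  # present because diarize=True
        if segments and segments[-1]["speaker"] == speaker:
            # Same speaker as the previous word: extend the current segment.
            segments[-1]["text"] += f" {word.punctuated_word}"
            segments[-1]["end"] = word.end
        else:
            segments.append({
                "speaker": speaker,
                "start": word.start,
                "end": word.end,
                "text": word.punctuated_word,
                "is_user": False,   # set by speech-profile matching, not shown
                "person_id": None,  # filled in when a saved profile matches
            })
    # The segments are then sent back to the client over the WebSocket.
```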
This overview provides a comprehensive understanding of Omi's real-time transcription process, which can be adapted when integrating alternative audio transcription services.