Overview
Omi allows you to stream audio bytes from your DevKit directly to your backend or any external service. This enables custom audio processing like:
Custom Speech Recognition
Use your own ASR models instead of Omi’s default transcription
Voice Activity Detection
Implement custom VAD logic for specialized use cases
Audio Analysis
Extract features, spectrograms, or embeddings in real-time
Cloud Storage
Store raw audio for later processing or compliance
Technical Specifications
| Specification | Value |
|---|---|
| HTTP Method | POST |
| Content-Type | application/octet-stream |
| Audio Format | Raw PCM16 (16-bit signed, little-endian) |
| Bytes per Sample | 2 |
| Sample Rate | 16,000 Hz (DevKit1 v1.0.4+, DevKit2) or 8,000 Hz (DevKit1 v1.0.2) |
| Channels | Mono (1 channel) |
The sample rate is passed as a query parameter so your endpoint can handle different device versions.
Setup Guide
Create Your Endpoint
Create a webhook that accepts POST requests with binary audio data. Your endpoint should:
- Accept the `application/octet-stream` content type
- Read `sample_rate` and `uid` from the query parameters
- Process the raw bytes (buffer, save, or analyze)
- Return 200 OK quickly to avoid timeouts
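A minimal handler sketch (FastAPI and the `/audio` route name are illustrative choices; only the POST method, `application/octet-stream` body, and `sample_rate`/`uid` query parameters are fixed by Omi):

```python
# Minimal sketch of a webhook that receives raw PCM16 audio chunks from Omi.
from fastapi import BackgroundTasks, FastAPI, Request

app = FastAPI()

def process_chunk(pcm16: bytes, sample_rate: int, uid: str) -> None:
    # Buffer, save, or analyze the raw bytes here.
    print(f"uid={uid} sample_rate={sample_rate} received {len(pcm16)} bytes")

@app.post("/audio")
async def receive_audio(request: Request, background_tasks: BackgroundTasks,
                        sample_rate: int = 16000, uid: str = ""):
    pcm16 = await request.body()  # raw PCM16, 16-bit signed, little-endian, mono
    background_tasks.add_task(process_chunk, pcm16, sample_rate, uid)
    return {"status": "ok"}       # respond quickly to avoid timeouts
```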
Configure in Omi App
- Open the Omi App
- Go to Settings → Developer Mode
- Scroll to Realtime audio bytes
- Enter your webhook URL
- Set the Every x seconds field (e.g., `10` for 10-second chunks)
Test Your Integration
Start speaking while wearing your Omi device. Audio bytes should arrive at your webhook at the configured interval.
Working with Audio Bytes
Converting to WAV
The received bytes are raw PCM16 audio. To create a playable WAV file, prepend a WAV header:
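A minimal sketch using Python's standard wave module (the output filename is a placeholder):

```python
import wave

def pcm16_to_wav(pcm16: bytes, sample_rate: int, path: str = "chunk.wav") -> None:
    """Wrap raw PCM16 mono bytes in a WAV container."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)           # mono
        wav_file.setsampwidth(2)           # 16-bit = 2 bytes per sample
        wav_file.setframerate(sample_rate) # use the sample_rate query parameter
        wav_file.writeframes(pcm16)
```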
Accumulating Chunks
If you need continuous audio (not chunked), accumulate bytes across requests:
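One possible sketch, keyed by the `uid` query parameter (the in-memory buffer is an assumption; a real service might use Redis or files instead):

```python
from collections import defaultdict

# Per-user audio buffers. Raw PCM16 chunks can simply be concatenated,
# since every chunk shares the same sample rate and channel count.
audio_buffers: dict[str, bytearray] = defaultdict(bytearray)

def accumulate(uid: str, pcm16: bytes) -> bytes:
    """Append the latest chunk and return the full audio received so far."""
    audio_buffers[uid].extend(pcm16)
    return bytes(audio_buffers[uid])
```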
Example: Save to Google Cloud Storage
A complete example that saves audio files to Google Cloud Storage.
Create GCS Bucket
Follow the Saving Audio Guide steps 1-5 to create a bucket with proper permissions.
Fork the Example Repository
Clone and Deploy
Clone the repository and deploy it to your preferred cloud provider (GCP, AWS, DigitalOcean) or run it locally with ngrok. The repository includes a Dockerfile for easy deployment.
Set Environment Variables
Configure these environment variables during deployment:
| Variable | Description |
|---|---|
| `GOOGLE_APPLICATION_CREDENTIALS_JSON` | GCP service account credentials (base64 encoded) |
| `GCS_BUCKET_NAME` | Your GCS bucket name |
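As an illustration of how a deployed service might read these variables (a sketch, not the example repository's actual code), assuming the google-cloud-storage client library:

```python
import base64
import json
import os

from google.cloud import storage

# Decode the base64-encoded service account JSON and build a GCS client.
creds_info = json.loads(
    base64.b64decode(os.environ["GOOGLE_APPLICATION_CREDENTIALS_JSON"])
)
client = storage.Client.from_service_account_info(creds_info)
bucket = client.bucket(os.environ["GCS_BUCKET_NAME"])
```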
Configure Omi App
Set the endpoint in Developer Settings → Realtime audio bytes:
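For example, the value you enter might look like this (hypothetical host and path):

```
https://your-service.example.com/audio
```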
Verify
Audio files should now appear in your GCS bucket every X seconds (based on your configured interval).
Processing Ideas
Custom Speech Recognition
Feed audio to your own ASR models for specialized vocabulary or languages:
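A sketch assuming the open-source openai-whisper package and the 16 kHz sample rate that newer DevKits stream; any ASR engine can be substituted:

```python
import numpy as np
import whisper  # pip install openai-whisper (an illustrative ASR choice)

model = whisper.load_model("base")

def transcribe_chunk(pcm16: bytes) -> str:
    # Convert PCM16 bytes to float32 samples in [-1, 1], the format Whisper expects.
    audio = np.frombuffer(pcm16, dtype=np.int16).astype(np.float32) / 32768.0
    result = model.transcribe(audio, fp16=False)
    return result["text"]
```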
Voice Activity Detection
Detect speech vs. silence for custom endpointing:
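A sketch assuming the py-webrtcvad package; WebRTC VAD accepts 10/20/30 ms frames of mono PCM16 at 8, 16, 32, or 48 kHz:

```python
import webrtcvad  # pip install webrtcvad (an illustrative VAD choice)

vad = webrtcvad.Vad(2)  # aggressiveness 0-3

def speech_ratio(pcm16: bytes, sample_rate: int = 16000) -> float:
    """Fraction of 30 ms frames that contain speech."""
    frame_bytes = int(sample_rate * 0.03) * 2  # 30 ms of 16-bit samples
    frames = [pcm16[i:i + frame_bytes]
              for i in range(0, len(pcm16) - frame_bytes + 1, frame_bytes)]
    if not frames:
        return 0.0
    voiced = sum(vad.is_speech(frame, sample_rate) for frame in frames)
    return voiced / len(frames)
```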
Audio Embeddings
Extract embeddings for speaker identification or audio similarity:
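A crude sketch that summarizes MFCC statistics with librosa as a stand-in embedding; a production system would use a dedicated speaker-embedding model:

```python
import librosa  # pip install librosa (an illustrative feature extractor)
import numpy as np

def embed_chunk(pcm16: bytes, sample_rate: int = 16000) -> np.ndarray:
    """Return a fixed-size vector summarizing the chunk (toy embedding)."""
    audio = np.frombuffer(pcm16, dtype=np.int16).astype(np.float32) / 32768.0
    mfcc = librosa.feature.mfcc(y=audio, sr=sample_rate, n_mfcc=20)
    # Mean and std over time give a simple 40-dimensional summary vector.
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])
```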
Real-time Sentiment
Analyze emotional tone from audio features:
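A placeholder sketch that computes simple prosodic features with NumPy; mapping such features to sentiment requires a trained model, which is beyond this guide:

```python
import numpy as np

def prosodic_features(pcm16: bytes) -> dict:
    """Toy features (energy, zero-crossing rate) often used as sentiment inputs."""
    audio = np.frombuffer(pcm16, dtype=np.int16).astype(np.float32) / 32768.0
    rms = float(np.sqrt(np.mean(audio ** 2)))                   # loudness proxy
    zcr = float(np.mean(np.abs(np.diff(np.sign(audio)))) / 2)   # noisiness proxy
    return {"rms_energy": rms, "zero_crossing_rate": zcr}
```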
Best Practices
Respond Quickly
Return 200 OK immediately and process asynchronously; slow responses may cause timeouts.
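One framework-agnostic pattern, sketched with the standard library (the handler name is illustrative): hand each chunk to a worker queue and return right away.

```python
import queue
import threading

work_queue: queue.Queue = queue.Queue()

def worker() -> None:
    # Slow work (ASR, uploads, analysis) runs here, off the request path.
    while True:
        pcm16 = work_queue.get()
        # ... process pcm16 ...
        work_queue.task_done()

threading.Thread(target=worker, daemon=True).start()

def handle_request(pcm16: bytes) -> dict:
    """Called from the webhook handler: enqueue and respond immediately."""
    work_queue.put(pcm16)
    return {"status": "ok"}
```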
Handle Missing Data
Network issues may cause gaps. Design your processing to handle incomplete audio.
Buffer Appropriately
Choose chunk interval based on your use case. Larger chunks = fewer requests but higher latency.
Monitor Usage
Audio streaming generates significant data. Monitor storage and bandwidth costs.