Listen + Pusher Pipeline — Sequence Diagrams

Last updated: 2026-03-16 (PR #5624 E2E testing)

These diagrams document the real behavior observed during E2E testing with live services (backend, pusher, Deepgram, embedding API). Update them whenever the pipeline changes.

1. Connection + Streaming + Transcription

2. Conversation Lifecycle (Silence Timeout Path)

This is the normal path when the client stays connected but stops speaking.

3. Disconnect Path

This is what happens when the WebSocket (WS) connection closes.

4. Speaker ID Lifecycle (2-Session Flow)

Speaker identification requires two sessions: one to store the embedding, one to match against it.
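The matching step in the second session can be sketched as a nearest-neighbor lookup by cosine distance. This is a minimal illustration, not the code in transcribe.py: the function names `cosine_distance` and `best_match`, the dictionary of stored embeddings, and the strict `<` comparison are all assumptions. Only the 0.45 cutoff comes from the Timing Constants table.

```python
import math

# Cosine distance cutoff listed under Timing Constants.
SPEAKER_MATCH_THRESHOLD = 0.45


def cosine_distance(a, b):
    """Return 1 - cosine similarity for two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("embedding has zero length")
    return 1.0 - dot / (norm_a * norm_b)


def best_match(stored, query, threshold=SPEAKER_MATCH_THRESHOLD):
    """Hypothetical helper: find the closest stored embedding.

    stored maps person_id -> embedding (saved in an earlier session).
    Returns (person_id, distance) if the closest embedding is under
    the threshold, otherwise None.
    """
    best_id, best_dist = None, None
    for person_id, embedding in stored.items():
        dist = cosine_distance(embedding, query)
        if best_dist is None or dist < best_dist:
            best_id, best_dist = person_id, dist
    # Assumption: a match requires distance strictly below the cutoff.
    if best_dist is not None and best_dist < threshold:
        return best_id, best_dist
    return None
```

With an empty `stored` dictionary (the first session, before any embedding exists) `best_match` returns None, which mirrors why a second session is needed before a match can occur.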

5. Private Cloud Sync (Audio Upload)

This path runs when private_cloud_sync_enabled is set for the user.

6. Event Wire Protocol

Server → Client (JSON over WS text frames)

| Type | Format | Example |
|------|--------|---------|
| Transcripts | JSON array | `[{id, text, speaker, speaker_id, is_user, start, end}, ...]` |
| Events | JSON object | `{type: "...", ...}` |
| Keepalive | Plain text | `"ping"` (not JSON; filter before parsing) |
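Because the three frame kinds share one text channel, a client has to tell them apart before acting on them. The sketch below shows one way to do that, assuming the keepalive arrives as the bare text `ping`; the function name `classify_frame` and the returned labels are illustrative, not part of the pipeline.

```python
import json


def classify_frame(raw):
    """Classify one WS text frame received from the server.

    Returns a (kind, payload) tuple where kind is one of
    "keepalive", "transcripts", "event", or "unknown".
    """
    # The keepalive is plain text, so check for it before json.loads.
    if raw.strip() == "ping":
        return ("keepalive", None)
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return ("unknown", raw)
    # Transcripts arrive as a JSON array of segment objects.
    if isinstance(data, list):
        return ("transcripts", data)
    # Events arrive as a JSON object carrying a "type" field.
    if isinstance(data, dict) and "type" in data:
        return ("event", data)
    return ("unknown", data)
```

Checking for the keepalive first avoids a `json.JSONDecodeError` on every ping.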

Event Types

| Event | Fields | When |
|-------|--------|------|
| `service_status` | `{type, status: "ready"}` | After WS connect, services initialized |
| `memory_processing_started` | `{type}` | Conversation sent to pusher for LLM |
| `memory_created` | `{type, memory: {id, structured: {title, overview, ...}}}` | LLM processing complete |
| `speaker_label_suggestion` | `{type, person_id, person_name, distance, segments}` | Speaker matched via embedding |
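A client-side handler for these events might look like the following sketch. The function name `describe_event` and the returned strings are hypothetical; the event type names and field names are taken from the table above, and any field not listed there is deliberately not read.

```python
def describe_event(event):
    """Return a short human-readable summary of a server event dict."""
    event_type = event.get("type")
    if event_type == "service_status":
        return f"service status: {event.get('status')}"
    if event_type == "memory_processing_started":
        return "memory processing started"
    if event_type == "memory_created":
        # The title lives under memory.structured per the Fields column.
        structured = event.get("memory", {}).get("structured", {})
        return f"memory created: {structured.get('title')}"
    if event_type == "speaker_label_suggestion":
        name = event.get("person_name")
        distance = event.get("distance")
        return f"speaker suggestion: {name} (distance {distance})"
    # Unknown types are reported rather than raised, so new event
    # types do not break an older client.
    return f"unhandled event type: {event_type}"
```

For example, `describe_event({"type": "service_status", "status": "ready"})` returns `"service status: ready"`.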

Client → Server

| Type | Format | Notes |
|------|--------|-------|
| Audio | Binary frames | PCM16LE bytes |
| Silence keepalive | `b'\x00' * 320` | Resets last_activity_time but NOT finished_at |
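To make the keepalive frame concrete, the sketch below builds it and computes how much audio it represents. The 320-byte length and the PCM16LE encoding come from the table; the 16 kHz mono sample rate is an assumption that this document does not state, and `silence_duration_ms` is a hypothetical helper.

```python
# 320 zero bytes, as listed in the Client -> Server table.
SILENCE_FRAME = b"\x00" * 320

# PCM16LE uses two bytes per sample.
BYTES_PER_SAMPLE = 2


def silence_duration_ms(frame, sample_rate_hz=16000):
    """Duration of a mono PCM16LE frame in milliseconds.

    sample_rate_hz defaults to 16 kHz, which is an assumption here.
    """
    samples = len(frame) // BYTES_PER_SAMPLE
    return samples * 1000 / sample_rate_hz
```

Under the 16 kHz assumption the frame holds 160 samples, so each keepalive carries 10 ms of silence. It is sent as a binary frame, like regular audio.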

7. Timing Constants

| Constant | Value | Location | Purpose |
|----------|-------|----------|---------|
| conversation_timeout | 120s (min) | transcribe.py | Silence before lifecycle triggers |
| last_activity_time timeout | 90s | transcribe.py | WS inactivity disconnect |
| SPEAKER_SAMPLE_MIN_AGE | 120s | pusher.py:39 | Wait before extracting embedding |
| SPEAKER_SAMPLE_PROCESS_INTERVAL | 15s | pusher.py:40 | Queue poll interval |
| lifecycle_manager poll | 5s | transcribe.py:1683 | Check finished_at interval |
| Pusher audio batch | 60s | pusher | GCS upload batch size |
| Speaker match threshold | 0.45 | transcribe.py | Cosine distance cutoff |
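The interaction between the two timeouts is easier to see in a toy model. The sketch below is a simplified illustration, not the logic in transcribe.py: the `SessionClock` class and its method names are hypothetical, and it assumes that speech updates both timestamps while a silence keepalive updates only last_activity_time. In the real pipeline the conversation check also runs on a 5s poll, so the lifecycle could fire up to 5s after the threshold.

```python
from dataclasses import dataclass

CONVERSATION_TIMEOUT_S = 120.0   # silence before lifecycle triggers
WS_INACTIVITY_TIMEOUT_S = 90.0   # WS inactivity disconnect


@dataclass
class SessionClock:
    """Toy model of the two timestamps named in this document."""
    last_activity_time: float = 0.0
    finished_at: float = 0.0

    def on_speech(self, now):
        # Assumption: real speech advances both timestamps.
        self.last_activity_time = now
        self.finished_at = now

    def on_silence_keepalive(self, now):
        # Keepalive resets last_activity_time but NOT finished_at.
        self.last_activity_time = now

    def ws_timed_out(self, now):
        return now - self.last_activity_time >= WS_INACTIVITY_TIMEOUT_S

    def conversation_timed_out(self, now):
        return now - self.finished_at >= CONVERSATION_TIMEOUT_S
```

In this model, a client that speaks at t=0 and then sends keepalives every 30s stays connected at t=120, yet the conversation still times out at t=120, which corresponds to the silence timeout path. A client that sends nothing after t=0 hits the WS inactivity timeout at t=90 first.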