Streaming - Websocket
Bodhi’s Streaming API delivers low-latency, real-time speech recognition in 12+ Indian languages; built for fast, accurate voice experiences.
This script demonstrates how to stream audio data from a .wav file to an ASR server in real-time using WebSocket. It sends audio chunks and handles transcription responses asynchronously.
To integrate with Bodhi, here’s a streamlined process:
WebSocket Connection: Connect to the server using the WebSocket URI, including the required authentication headers (x-api-key and x-customer-id).
If you don’t have an API key and Customer ID yet, sign up on Bodhi by following the sign-up instructions.
Audio Configuration: Once the WebSocket connection is established, send a configuration message containing the sample_rate, a unique transaction_id, and the model you wish to use. A list of available models is provided in the Bodhi documentation.
Audio Data Transmission: Send audio chunks (e.g., 300 ms each) over the WebSocket until the end of the file (EOF) is reached. Once all chunks are sent, send the signal {"eof": 1} to indicate that the transmission is complete. This lets the Bodhi server finalize the transcription, so the client receives the final results after the entire audio has been processed.
Receive and Process Responses: The server returns both partial and complete transcription responses. Use partial responses for real-time feedback, but always rely on the complete response for final processing, since it carries the finalized transcription.
Final Output: Collect and display the transcription results at the end, formatted for readability.
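The configuration, chunking, and EOF steps above can be sketched as follows. This is a minimal illustration rather than Bodhi's reference client: the flat JSON layout of the configuration message, the use of raw PCM bytes for audio, the realtime pacing flag, and the function names build_config and send_audio are all assumptions. Only the field names sample_rate, transaction_id, and model and the {"eof": 1} signal come from the steps above.

```python
import asyncio
import json
import uuid
import wave


def build_config(sample_rate, model):
    """Build the configuration message sent once the connection is open."""
    # Field names come from the steps above; the flat JSON layout is an assumption.
    return json.dumps({
        "sample_rate": sample_rate,
        "transaction_id": str(uuid.uuid4()),  # unique per stream
        "model": model,
    })


async def send_audio(ws, wav_file, model, chunk_ms=300, realtime=True):
    """Send the config, then the audio in chunks, then the EOF signal.

    `ws` is any object with an async `send` method, for example an open
    connection from the `websockets` package.
    """
    with wave.open(wav_file, "rb") as wf:
        sample_rate = wf.getframerate()
        await ws.send(build_config(sample_rate, model))
        frames_per_chunk = max(1, sample_rate * chunk_ms // 1000)
        while True:
            data = wf.readframes(frames_per_chunk)
            if not data:  # end of file reached
                break
            await ws.send(data)  # raw PCM bytes (assumed wire format)
            if realtime:
                await asyncio.sleep(chunk_ms / 1000)  # pace like a live source
    # Tell the server that no more audio will follow.
    await ws.send(json.dumps({"eof": 1}))
```

Passing realtime=False skips the pacing delay, which is convenient for offline tests. Responses should be read concurrently in a separate task so that partial results arrive while audio is still being sent.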
The following Python script demonstrates how to stream audio data to Bodhi's WebSocket-based transcription service.
📦 Required Dependencies
Install the necessary Python packages using pip:
Error Handling: Anticipate and manage issues such as connection interruptions, authentication failures, invalid audio files, and server-side errors. For a full list of Bodhi’s error codes and their meanings, see the error codes documentation.
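One way to approach two of these failure classes is sketched below: validating the audio file before streaming, and retrying transient connection failures with exponential backoff. The helper names validate_wav and with_retries, the choice of retryable exception types, and the backoff values are illustrative assumptions, not part of Bodhi's API. Authentication failures and other non-transient errors are deliberately not retried.

```python
import asyncio
import wave


def validate_wav(wav_file):
    """Return the sample rate of a WAV file, or raise ValueError if it is unusable."""
    try:
        with wave.open(wav_file, "rb") as wf:
            if wf.getnframes() == 0:
                raise ValueError("WAV file contains no audio frames")
            return wf.getframerate()
    except (wave.Error, EOFError) as exc:
        # wave.Error: not a valid RIFF/WAVE file; EOFError: truncated or empty file.
        raise ValueError(f"Invalid WAV file: {exc}") from exc


async def with_retries(action, retry_on=(ConnectionError, asyncio.TimeoutError),
                       attempts=3, base_delay=0.5):
    """Run an async action, retrying only the listed transient errors."""
    for attempt in range(1, attempts + 1):
        try:
            return await action()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))
```

Any exception type not listed in retry_on propagates on the first failure, which is the desired behavior for invalid credentials or malformed requests.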
WebSocket Audio Client
For a detailed breakdown of each response field and its significance, refer to the Bodhi API response documentation. It helps you interpret the data returned from the Bodhi API.
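As a rough illustration of separating partial from complete responses, the sketch below keeps the latest partial text for live display and accumulates complete segments for the final transcript. The JSON field names "type" (with the values "partial" and "complete") and "text" are assumptions made for this example, as is the class name TranscriptCollector. Check them against the actual response structure before relying on this.

```python
import json


class TranscriptCollector:
    """Track the latest partial text and accumulate complete segments."""

    def __init__(self):
        self.partial = ""    # most recent interim text, for live display
        self.complete = []   # finalized segments, in arrival order

    def handle(self, raw_message):
        """Parse one server message and update state; returns the parsed dict."""
        msg = json.loads(raw_message)
        text = msg.get("text", "")          # assumed field name
        if msg.get("type") == "complete":   # assumed field name and value
            self.complete.append(text)
            self.partial = ""               # the segment is finalized
        else:
            self.partial = text
        return msg

    def final_text(self):
        """Join the finalized segments into a single transcript string."""
        return " ".join(t for t in self.complete if t)
```

Only the complete segments feed final_text, which matches the guidance to use complete responses for final processing.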
The script covers authentication via environment variables, the WebSocket connection, sending audio chunks, receiving partial and complete transcriptions, and error handling with final output formatting.
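Authentication via environment variables might look like the sketch below. The header names x-api-key and x-customer-id come from the connection step; the environment variable names BODHI_API_KEY and BODHI_CUSTOMER_ID and the helper build_auth_headers are hypothetical and chosen only for illustration.

```python
import os


def build_auth_headers(env=os.environ):
    """Read credentials from the environment and return the auth headers."""
    # The environment variable names are assumptions; use whatever your deployment defines.
    api_key = env.get("BODHI_API_KEY")
    customer_id = env.get("BODHI_CUSTOMER_ID")
    missing = [name for name, value in
               (("BODHI_API_KEY", api_key), ("BODHI_CUSTOMER_ID", customer_id))
               if not value]
    if missing:
        # Fail early with a clear message instead of an opaque handshake error.
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {"x-api-key": api_key, "x-customer-id": customer_id}
```

The returned dictionary can be passed to your WebSocket client's custom-header option when connecting. The exact keyword for that option depends on the client library and its version.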