Bodhi Docs

Streaming - Websocket

Bodhi’s Streaming API delivers low-latency, real-time speech recognition in 12+ Indian languages, built for fast, accurate voice experiences.

Last updated 5 days ago

This page shows how to stream audio data from a .wav file to the Bodhi ASR server in real time over WebSocket. The client sends audio chunks and handles transcription responses asynchronously.

To integrate with Bodhi, follow these steps:

  • WebSocket Connection: Connect to the server using the WebSocket URI, including necessary authentication headers (x-api-key and x-customer-id).

export CUSTOMER_ID=<Your Bodhi Customer ID>
export API_KEY=<Your Bodhi API Key>

If you don’t have your API key and Customer ID, please sign up on Bodhi by following the instructions here.
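As a minimal sketch of the authentication step, the helper below reads the two environment variables exported above and builds the header dictionary. The function name build_auth_headers and the choice to raise RuntimeError are illustrative assumptions, not part of the Bodhi API; only the variable names and header names come from this page.

```python
import os


def build_auth_headers(env=None):
    """Return the Bodhi auth headers, or raise if a credential is missing.

    `env` defaults to the process environment; passing a plain dict makes
    the helper easy to exercise without touching real credentials.
    """
    env = os.environ if env is None else env
    key = env.get("API_KEY")
    cid = env.get("CUSTOMER_ID")
    if not key or not cid:
        raise RuntimeError("Set API_KEY and CUSTOMER_ID environment variables.")
    # Header names are the ones listed in the WebSocket Connection step.
    return {"x-api-key": key, "x-customer-id": cid}
```

Failing early with a clear message keeps a missing credential from surfacing later as an opaque connection error.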

  • Audio Configuration: Once the WebSocket connection is established, send a configuration message containing the sample_rate, a unique transaction_id, and the model you wish to use. A list of available models can be found here ↗️

{
   "config": {
      "sample_rate": 8000,
      "transaction_id": "89f48568-ebc7-4a63-830b-fa69fde03044",
      "model": "hi-banking-v2-8khz"
   }
}
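The configuration message can be generated rather than written by hand. The sketch below assumes a hypothetical helper named build_config; it produces the same three fields shown above and fills in a fresh UUID when no transaction_id is supplied.

```python
import json
import uuid


def build_config(sample_rate, model, transaction_id=None):
    """Serialize the configuration message sent right after connecting."""
    return json.dumps(
        {
            "config": {
                "sample_rate": sample_rate,
                # A new UUID per session keeps transaction IDs unique.
                "transaction_id": transaction_id or str(uuid.uuid4()),
                "model": model,
            }
        }
    )
```

Serializing with json.dumps also avoids hand-written JSON mistakes such as trailing commas.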

  • Audio Data Transmission: Send audio in chunks (e.g., 300 ms each) over the WebSocket until the end of the file (EOF) is reached. After the last chunk, send the signal {"eof": 1} to indicate that transmission is complete. This lets the Bodhi server finalize the transcription, so the client receives the final results for the entire audio.

  • Receive and Process Responses: The server responds to the audio you send with partial and complete transcription responses. Use partial responses for real-time feedback, but rely on complete responses for final processing, since only they carry the finalized transcription.

  • Final Output: Collect and display the transcription results at the end, formatted for readability.
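The transmission and response steps above can be sketched without a live connection. The two helpers below are illustrative assumptions (the names iter_audio_messages and collect_final_text are hypothetical): the first yields fixed-duration audio chunks followed by the EOF signal, and the second keeps only complete responses. The "type", "text", and "eos" fields mirror the sample client further down this page; the "partial" type value used in testing is assumed, not confirmed here.

```python
import json
import wave


def iter_audio_messages(wav_file, chunk_ms=300):
    """Yield raw audio chunks of about chunk_ms each, then the EOF signal."""
    # Frames per chunk = sample rate * chunk duration in seconds.
    frames_per_chunk = max(1, wav_file.getframerate() * chunk_ms // 1000)
    while chunk := wav_file.readframes(frames_per_chunk):
        yield chunk
    yield json.dumps({"eof": 1})


def collect_final_text(messages):
    """Return the text of every complete response, in arrival order."""
    finals = []
    for raw in messages:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip frames that are not JSON
        if not isinstance(data, dict):
            continue  # skip JSON values that are not objects
        if data.get("type") == "complete" and data.get("text"):
            finals.append(data["text"])
        if data.get("eos"):
            break  # the server signalled the end of the stream
    return finals
```

In a real client, each value from iter_audio_messages would be passed to the WebSocket's send call, and the received frames would be fed to logic like collect_final_text as they arrive.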

🔧 Sample Python Client for Bodhi Streaming API

The following Python script demonstrates how to stream audio data to Bodhi's WebSocket-based transcription service.

📦 Required Dependencies

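Install the necessary Python package using pip. The script below passes extra_headers to websockets.connect, which is the parameter name used by the library's legacy asyncio client. Releases from 14.0 onward default to a newer client that expects additional_headers instead, so pinning an earlier release is the simplest way to run the script unchanged:

pip install "websockets<14"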

bodhi-client.py
import asyncio, websockets, wave, json, os, uuid, ssl, sys

transcriptions = []


async def receive(ws):
    try:
        while True:
            msg = await ws.recv()
            try:
                data = json.loads(msg)
                print(data)
                # Store only finalized segments; partial results are just printed.
                if data.get("type") == "complete":
                    transcriptions.append(data.get("text") or "")

                if data.get("eos"):
                    break
            except json.JSONDecodeError:
                print("Received non-JSON message:", msg)
    except websockets.exceptions.ConnectionClosed as e:
        print(f"WebSocket closed: {e}")
    except Exception as e:
        print(f"Error in receive(): {e}")


async def send(ws, wf):
    try:
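        rate = wf.getframerate()
        buf = int(rate * 0.1)  # number of frames in one 100 ms chunk
        while chunk := wf.readframes(buf):
            await ws.send(chunk)
            await asyncio.sleep(0.1)  # pace the upload at roughly real time
        await ws.send('{"eof": 1}')  # tell the server no more audio is coming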
    except Exception as e:
        print(f"Error in send(): {e}")


async def run(uri, file):
    key, cid = os.getenv("API_KEY"), os.getenv("CUSTOMER_ID")
    if not key or not cid:
        print("Set API_KEY and CUSTOMER_ID environment variables.")
        return

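    headers = {"x-api-key": key, "x-customer-id": cid}
    # NOTE: the last two settings below turn off TLS hostname and certificate
    # checks. Treat that as a quick-test shortcut only; removing both lines
    # restores the default verification.
    ssl_ctx = ssl.create_default_context()
    ssl_ctx.check_hostname = False
    ssl_ctx.verify_mode = ssl.CERT_NONE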

    try:
        wf = wave.open(file, "rb")
    except FileNotFoundError:
        print(f"Audio file not found: {file}")
        return
    except wave.Error as e:
        print(f"Invalid WAV file: {e}")
        return

    try:
        async with websockets.connect(
            uri, extra_headers=headers, ssl=ssl_ctx if uri.startswith("wss") else None
        ) as ws:
            await ws.send(
                json.dumps(
                    {
                        "config": {
                            "sample_rate": wf.getframerate(),
                            "transaction_id": str(uuid.uuid4()),
                            "model": "hi-banking-v2-8khz",
                        }
                    }
                )
            )
            await asyncio.gather(send(ws, wf), receive(ws))

            print("\n" + "-" * 80)
            print(f"Transcription: {', '.join(transcriptions)}")
            print("-" * 80 + "\n")
    except websockets.exceptions.InvalidStatusCode as e:
        print(f"Failed to connect (status code): {e.status_code}")
    except Exception as e:
        print(f"Connection error: {e}")
    finally:
        wf.close()


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("Usage: python bodhi-client.py path_to_audio.wav")
    else:
        asyncio.run(run("wss://bodhi.navana.ai", sys.argv[1]))

Error Handling: Anticipate and manage issues such as connection interruptions, authentication failures, invalid audio files, and server-side errors. For a full list of Bodhi’s error codes and their meanings, see Error Codes.
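One common way to cope with connection interruptions is to retry with exponential backoff. The sketch below is a generic pattern, not something prescribed by Bodhi: the helper name with_retries, the default of three attempts, and the choice to retry only on OSError (which includes dropped or refused connections) are all assumptions. Authentication failures and invalid audio files are better reported immediately, since retrying will not fix them.

```python
import asyncio


async def with_retries(make_attempt, attempts=3, base_delay=0.5, retry_on=(OSError,)):
    """Await make_attempt() up to `attempts` times, backing off between failures."""
    for attempt in range(1, attempts + 1):
        try:
            return await make_attempt()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            # Delay doubles each time: base_delay, 2 * base_delay, ...
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))
```

Here make_attempt is any zero-argument coroutine function, for example one that opens the WebSocket and runs a full streaming session. Whether a retried session should reuse its transaction_id is not stated on this page, so generating a new one per attempt is the conservative choice.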

WebSocket Audio Client

For a deeper understanding of the response structure, refer to this section. It breaks down each field and its significance, helping you interpret the data returned from the Bodhi API.

The sample client above covers:

  • Authentication via environment variables
  • WebSocket connection
  • Sending audio chunks
  • Receiving partial and complete transcription
  • Error handling and final output formatting