
Streaming - Websocket

Bodhi’s Streaming API delivers low-latency, real-time speech recognition in 12+ Indian languages, built for fast, accurate voice experiences.


Requirements

  • Customer ID

  • API Key

Both can be found by creating an account.

SDK Example

To transcribe audio from a live stream using one of the Bodhi SDKs, follow these steps.

Install the SDK

Open a terminal, go to the directory where you'd like to set up your project, and install the Bodhi SDK.

# Set environment variables
# export CUSTOMER_ID=<bodhi_customer_id>
# export API_KEY=<bodhi_api_key>


# Install the Bodhi Python SDK
pip install bodhi-sdk

Common Setup

import os

from bodhi import (
    BodhiClient,
    TranscriptionResponse,
    LiveTranscriptionEvents,
)

# Read the credentials exported as environment variables above
API_KEY = os.environ["API_KEY"]
CUSTOMER_ID = os.environ["CUSTOMER_ID"]

client = BodhiClient(api_key=API_KEY, customer_id=CUSTOMER_ID)

# Register the event handler callbacks (defined below)
client.on(LiveTranscriptionEvents.Transcript, on_transcript)
client.on(LiveTranscriptionEvents.UtteranceEnd, on_utterance_end)
client.on(LiveTranscriptionEvents.SpeechStarted, on_speech_started)
client.on(LiveTranscriptionEvents.Error, on_error)
client.on(LiveTranscriptionEvents.Close, on_close)

Event handler callbacks receive real-time updates during a transcription session. These optional asynchronous callbacks let you respond to events as they happen; you can implement only the ones relevant to your use case, as none are required:

  • on_transcript – Triggered whenever a transcript is available.

  • on_utterance_end – Triggered when the system detects the end of a spoken utterance.

  • on_speech_started – Triggered each time the user starts speaking after the completion of a previous utterance, marking the start of a new speech segment.

  • on_error – Triggered if any error occurs during processing.

  • on_close – Triggered when the WebSocket connection is closed.

async def on_transcript(response: TranscriptionResponse):
    print(f"Transcript: {response.text}")

async def on_utterance_end(response: TranscriptionResponse):
    print(f"UtteranceEnd: {response}")

async def on_speech_started(response: TranscriptionResponse):
    print(f"SpeechStarted: {response}")

async def on_error(e: Exception):
    print(f"Error: {str(e)}")

async def on_close():
    print("WebSocket connection closed.")

Transcribing from a Live Stream

import asyncio
import wave

config = TranscriptionConfig(model="hi-banking-v2-8khz")

# Start streaming connection

# Mock a live audio stream by reading a local file in chunks
# Replace this with the audio stream coming from your telephony provider
with wave.open("loan.wav", "rb") as wf:
    sample_rate = wf.getframerate()
    config.sample_rate = sample_rate
    await client.start_connection(config=config)

    REALTIME_RESOLUTION = 0.02  # 20ms
    byte_rate = sample_rate * wf.getsampwidth() * wf.getnchannels()
    data = wf.readframes(wf.getnframes())
    audio_cursor = 0

    while len(data):
        i = int(byte_rate * REALTIME_RESOLUTION)
        chunk, data = data[:i], data[i:]
        await client.send_audio_stream(chunk)
        audio_cursor += REALTIME_RESOLUTION
        await asyncio.sleep(REALTIME_RESOLUTION)

# Close connection and get final transcription result
result = await client.close_connection()
print("Final result:", result)

Transcribing from a Remote File URL

config = TranscriptionConfig(model="hi-banking-v2-8khz")
audio_url = "https://bodhi.navana.ai/audios/loan.wav"

result = await client.transcribe_remote_url(audio_url, config=config)
print("Final result:", result)

Transcribing from a Local File

import os

config = TranscriptionConfig(model="hi-banking-v2-8khz")
audio_file = os.path.join(os.path.dirname(__file__), "loan.wav")

result = await client.transcribe_local_file(audio_file, config=config)
print("Final result:", result)

Non-SDK Example

Establish connection

Connect to the server using the WebSocket URI, including necessary authentication headers (x-api-key and x-customer-id).

# Set environment variables
# export CUSTOMER_ID=<bodhi_customer_id>
# export API_KEY=<bodhi_api_key>

import asyncio, json, os, ssl, sys, uuid, wave
import aiohttp

api_key = os.environ["API_KEY"]
customer_id = os.environ["CUSTOMER_ID"]
uri = "<bodhi_websocket_uri>"  # WebSocket endpoint of the streaming API
filepath = "loan.wav"          # local audio file used to simulate a live stream
ssl_context = ssl.create_default_context()

request_headers = {
    "x-api-key": api_key,
    "x-customer-id": customer_id,
}
chunk_duration_ms = 100

connector = aiohttp.TCPConnector(ssl=ssl_context if uri.startswith("wss://") else None)

async with aiohttp.ClientSession(connector=connector, headers=request_headers) as session:
    try:
        async with session.ws_connect(uri) as ws:
            wf = wave.open(filepath, "rb")
            channels, sample_width, sample_rate, num_samples, _, _ = wf.getparams()
            print(
                f"Channels = {channels}, Sample Rate = {sample_rate} Hz, Sample width = {sample_width} bytes",
                file=sys.stderr,
            )

    except aiohttp.WSServerHandshakeError as e:
        print(f"WebSocket handshake failed with status code: {e.status}", file=sys.stderr)
        if e.status == 401:
            print("Invalid API key or customer ID.", file=sys.stderr)
        elif e.status == 402:
            print("Insufficient balance.", file=sys.stderr)
        elif e.status == 403:
            print("Customer has been deactivated", file=sys.stderr)
    except aiohttp.ClientConnectionError as e:
        print(f"Connection error: {str(e)}", file=sys.stderr)
    except Exception as e:
        print(f"An error occurred: {str(e)}", file=sys.stderr)
        
        import traceback
        print("Full error traceback:", file=sys.stderr)
        print(traceback.format_exc(), file=sys.stderr)

After this block of code succeeds, you have created a persistent connection with the WebSocket server.

Configure connection

Now, you need to send a configuration containing the sample_rate, a unique transaction_id, and the model you wish to use. A list of available models can be found here.

config_msg = json.dumps(
    {
        "config": {
            "sample_rate": sample_rate,
            "transaction_id": str(uuid.uuid4()),
            "model": "hi-banking-v2-8khz",
        }
    }
)
await ws.send_str(config_msg)

8000 Hz (8 kHz) is the default sample rate for telephone channels. All Bodhi models are specifically optimised for 8 kHz audio files.
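
If your source audio is at a different sample rate, resample it to 8 kHz before streaming. Below is a minimal, hypothetical sketch using the standard-library audioop module (deprecated since Python 3.11 and removed in 3.13; any resampling library works); the file name loan_44k.wav is only an example.

import audioop
import wave

TARGET_RATE = 8000  # Bodhi models are optimised for 8 kHz audio

with wave.open("loan_44k.wav", "rb") as wf:  # example: a 44.1 kHz source file
    pcm = wf.readframes(wf.getnframes())
    # Downsample the raw PCM bytes from the source rate to 8 kHz
    pcm_8k, _ = audioop.ratecv(
        pcm, wf.getsampwidth(), wf.getnchannels(),
        wf.getframerate(), TARGET_RATE, None,
    )

# pcm_8k can now be chunked and streamed with sample_rate=8000 in the config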

Send and Receive Audio

For real-time transcription to take place, you need to send audio and receive transcription responses at the same time.

Send Audio

async def send_audio(ws, wf, buffer_size, interval_seconds):
    while True:
        data = wf.readframes(buffer_size)
        if not data:
            break
        await ws.send_bytes(data)
        await asyncio.sleep(interval_seconds)
    # Send EOF JSON message
    EOF_MESSAGE = '{"eof": 1}'
    await ws.send_str(EOF_MESSAGE)

To send audio, we iteratively loop through the bytes in the audio file and send the data to the server. After sending a chunk of data, we sleep for some time (interval_seconds) to simulate real-time streaming. Finally, we indicate to the server that we will not be sending any more data by sending the eof signal. This helps Bodhi process and finalize the transcription response, ensuring that the client receives the final results after the entire audio is processed.
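
For instance, assuming the default 8 kHz, 16-bit, mono telephony audio and 100 ms chunks, the pacing works out as follows:

# Worked example of the chunking and pacing math (assumes 8 kHz, 16-bit, mono audio)
sample_rate = 8000                                         # frames per second
chunk_duration_ms = 100                                    # send one chunk every 100 ms
buffer_size = int(sample_rate * chunk_duration_ms / 1000)  # 800 frames per chunk
bytes_per_chunk = buffer_size * 2 * 1                      # 1600 bytes (2-byte samples, 1 channel)
interval_seconds = chunk_duration_ms / 1000.0              # sleep 0.1 s between sends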

Receive Transcriptions

async def receive_transcription(ws):
    complete_sentences = []
    async for msg in ws:
        if msg.type == aiohttp.WSMsgType.TEXT:
            try:
                response_data = json.loads(msg.data)

                call_id = response_data.get("call_id")
                segment_id = response_data.get("segment_id")
                transcript_type = response_data.get("type")
                transcript_text = response_data.get("text")
                end_of_stream = response_data.get("eos", False)

                if transcript_type == "complete" and transcript_text != "":
                    complete_sentences.append(transcript_text)

                print(
                    f"Received data: Call_id={call_id}, "
                    f"Segment_id={segment_id}, "
                    f"EOS={end_of_stream}, "
                    f"Type={transcript_type}, "
                    f"Text={transcript_text}"
                )

                if end_of_stream:
                    print("Complete transcript: ", ", ".join(complete_sentences))
                    break

            except json.JSONDecodeError:
                print(f"Received a non-JSON response: {msg.data}")

        elif msg.type == aiohttp.WSMsgType.ERROR:
            print(f"WebSocket error: {ws.exception()}")
            break
        elif msg.type == aiohttp.WSMsgType.CLOSED:
            break

Run them together

buffer_size = int(sample_rate * chunk_duration_ms / 1000)
interval_seconds = chunk_duration_ms / 1000.0

send_task = asyncio.create_task(send_audio(ws, wf, buffer_size, interval_seconds))
recv_task = asyncio.create_task(receive_transcription(ws))

await asyncio.gather(send_task, recv_task)

After sending the audio data, you will receive partial and complete transcription responses from the server. You can process partial responses for real-time feedback; however, a partial transcript may have changed by the time you receive the complete transcript, because as the server receives more audio context the model may revise its earlier hypothesis to improve accuracy.
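
For example, you might keep an in-progress caption that is overwritten by every non-complete response and committed only when a complete response arrives. This is a minimal sketch; it assumes interim responses carry the current hypothesis in the same text field (see the response structure page for the exact fields and type values).

def update_caption(response_data, committed, live_caption):
    # committed: list of finalised sentences; live_caption: current in-progress text
    text = response_data.get("text", "")
    if response_data.get("type") == "complete":
        if text:
            committed.append(text)  # finalised text will not change any more
        live_caption = ""           # reset the in-progress caption
    else:
        live_caption = text         # interim hypothesis; may still be revised
    print(" ".join(committed), live_caption)
    return committed, live_caption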

Next Steps

  1. Check the feature overview to see how to improve accuracy, background handling, and more.

  2. Check the detailed response structure to see what information each response contains.

  3. Check the error responses to ensure they are being handled appropriately.

This script demonstrates how to stream audio data from a .wav file to an ASR server in real time using WebSocket. This is a good way to get comfortable with the API before connecting it up for live calling. You can learn more about the connection lifecycle here.

All code snippets in this document come from a Python demo. These snippets are meant for educational purposes; please refer to the demo for the latest and complete code.

The code also illustrates a list of errors that can occur during the WebSocket connection. Read more here.
