These optional asynchronous callbacks let you respond to real-time events during a transcription session. You can implement only the ones relevant to your use case; none are required. A registration sketch follows the list.
on_transcript – Triggered whenever a partial or complete transcript is available.
on_utterance_end – Triggered when the system detects the end of a spoken utterance.
on_speech_started – Triggered each time the user starts speaking after the completion of a previous utterance, marking the start of a new speech segment.
on_error – Triggered if any error occurs during processing.
on_close – Triggered when the WebSocket connection is closed.
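For example, here is a minimal sketch of wiring up two of these callbacks. The BodhiClient class name and the keyword-argument registration style are assumptions for illustration; check the SDK documentation for the exact constructor.

```python
# Hypothetical callback implementations; each receives the event payload.
async def on_transcript(response):
    # Fires for both partial and complete transcripts.
    print("Transcript:", response)

async def on_error(error):
    print("Transcription error:", error)

# Assumed registration style: pass only the callbacks you need;
# events without a registered callback are simply ignored.
client = BodhiClient(
    on_transcript=on_transcript,
    on_error=on_error,
)
```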
```python
import asyncio
import wave

config = TranscriptionConfig(model="hi-banking-v2-8khz")

# Mock a live audio stream by reading a local file in chunks.
# Replace this with the stream of audio coming from your telephony provider.
with wave.open("loan.wav", "rb") as wf:
    sample_rate = wf.getframerate()
    config.sample_rate = sample_rate

    # Start streaming connection
    await client.start_connection(config=config)

    REALTIME_RESOLUTION = 0.02  # 20 ms
    byte_rate = sample_rate * wf.getsampwidth() * wf.getnchannels()
    data = wf.readframes(wf.getnframes())
    audio_cursor = 0

    # Send the audio in 20 ms chunks, pausing between sends to
    # simulate real-time streaming.
    while len(data):
        i = int(byte_rate * REALTIME_RESOLUTION)
        chunk, data = data[:i], data[i:]
        await client.send_audio_stream(chunk)
        audio_cursor += REALTIME_RESOLUTION
        await asyncio.sleep(REALTIME_RESOLUTION)

# Close connection and get final transcription result
result = await client.close_connection()
print("Final result:", result)
```
8000 Hz (8 kHz) is the default sample rate for telephone channels. All Bodhi models are specifically optimised for 8 kHz audio.
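As a concrete check of the chunk-size arithmetic in the loop above, consider 8 kHz, 16-bit, mono audio; the values below follow directly from those assumptions.

```python
sample_rate = 8000          # 8 kHz telephony audio
sample_width = 2            # 16-bit samples = 2 bytes each
channels = 1                # mono
REALTIME_RESOLUTION = 0.02  # 20 ms

byte_rate = sample_rate * sample_width * channels  # 16,000 bytes/sec
chunk_size = int(byte_rate * REALTIME_RESOLUTION)  # 320 bytes per 20 ms chunk
```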
Send and Receive Audio
For real-time transcription to take place, you need to send audio and receive transcription responses at the same time.
Send Audio
```python
import asyncio

async def send_audio(ws, wf, buffer_size, interval_seconds):
    while True:
        data = wf.readframes(buffer_size)
        if not data:
            break
        await ws.send_bytes(data)
        await asyncio.sleep(interval_seconds)

    # Send EOF JSON message to signal that no more audio will follow
    EOF_MESSAGE = '{"eof": 1}'
    await ws.send_str(EOF_MESSAGE)
```
To send audio, we iteratively loop through the bytes of the audio file and send them to the server. After sending each chunk, we sleep for interval_seconds to simulate real-time streaming. Finally, we indicate to the server that no more data is coming by sending the eof signal. This helps Bodhi process and finalize the transcription response, ensuring that the client receives the final results after the entire audio has been processed.
After sending the audio data, you will receive partial and complete transcription responses from the server. You can process partial responses for real-time feedback; however, a partial transcript may change by the time you receive the complete transcript: as the server receives more audio context, the model may revise its earlier hypothesis to improve accuracy. A sketch of the receiving side, and of running it alongside send_audio, follows.
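The snippets above use an aiohttp-style WebSocket interface (send_bytes / send_str), so this receiving sketch assumes the same; the response field names ("type", "text") are illustrative assumptions rather than the documented schema.

```python
import asyncio
import json

async def receive_transcripts(ws):
    # Iterate over incoming WebSocket messages until the server closes
    # the connection after processing the EOF signal.
    async for msg in ws:
        response = json.loads(msg.data)
        # "type" and "text" are assumed field names; consult the
        # response reference for the actual schema.
        if response.get("type") == "complete":
            print("Complete:", response.get("text"))
        else:
            print("Partial:", response.get("text"))

# Send and receive concurrently over the same connection, as real-time
# transcription requires. ws and wf are assumed to be open already.
await asyncio.gather(
    send_audio(ws, wf, buffer_size=320, interval_seconds=0.02),
    receive_transcripts(ws),
)
```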
Next Steps
Check the feature overview to learn how to improve accuracy, handle background audio, and more.
This script demonstrates how to stream audio data from a .wav file to an ASR server in real time using a WebSocket. It is a good way to get comfortable with the API before connecting it up for live calling. You can learn more about this connection lifecycle.
All code snippets in this document come from the following Python demo. These snippets are meant for educational purposes; please refer to the demo link for the latest and complete code.
The code also illustrates a list of errors that can occur during a WebSocket connection. Read more.
Now, you need to send a configuration containing the sample_rate, a unique transaction_id, and the model you wish to use. A list of available models can be found here. A sketch of this configuration message follows.
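Here is a minimal sketch of that configuration message, assuming it is sent as JSON over the same WebSocket. The flat field layout and the example values are assumptions built from the field names in the text, not the documented wire format.

```python
import json
import uuid

# Assumed JSON layout using the field names mentioned above.
config_message = json.dumps({
    "sample_rate": 8000,                  # match your audio source
    "transaction_id": str(uuid.uuid4()),  # unique per session
    "model": "hi-banking-v2-8khz",
})
await ws.send_str(config_message)
```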
Check the response reference to see what information each transcription message contains.
Check the list of possible errors to ensure they are being handled appropriately; a defensive-handling sketch follows.
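To make the error discussion concrete, here is a minimal sketch of defensive handling around the connection, assuming an aiohttp client session; the specific exception types worth catching depend on your stack and the server's error responses.

```python
import aiohttp

async def run_session(url):
    async with aiohttp.ClientSession() as session:
        try:
            async with session.ws_connect(url) as ws:
                ...  # send config, then stream audio and receive transcripts
        except aiohttp.WSServerHandshakeError as e:
            # Handshake failures typically mean a bad URL or credentials.
            print("Handshake rejected:", e)
        except aiohttp.ClientConnectionError as e:
            # Covers drops and refusals during the session.
            print("Connection error:", e)
```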