Getting Started
Below is an introduction to Bodhi ASR Streaming features. Please refer to the corresponding documentation for more details.
To start using Bodhi, sign up on our . Upon successful sign up, you will receive an email verification to set up your account.
Upon successful signup to the dashboard, you will receive an API key which allows you to access our ASR services. Your active API key and Customer ID will be displayed in your account dashboard and can be copied for use.
Once your API key is set up, you can start using Bodhi's streaming ASR services with the ASR models listed below.
Bengali
bn-general-v2-8khz
bn-banking-v2-8khz
English (en-IN)
en-general-v2-8khz
en-banking-v2-8khz
Hindi
hi-general-v2-8khz
hi-banking-v2-8khz
Kannada
kn-general-v2-8khz
kn-banking-v2-8khz
Malayalam
ml-general-v2-8khz
ml-banking-v2-8khz
Marathi
mr-general-v2-8khz
mr-banking-v2-8khz
Tamil
ta-general-v2-8khz
ta-banking-v2-8khz
Telugu
te-general-v2-8khz
te-banking-v2-8khz
Gujarati
gu-general-v2-8khz
gu-banking-v2-8khz
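The model names above follow a visible pattern: language code, domain, version, sample rate. As a minimal sketch, assuming that pattern holds for every listed model, a client could build and check model names like this. The helper name `model_name` and the constants are illustrative and not part of Bodhi's API.

```python
# Language codes and domains copied from the model list above.
LANGUAGES = {
    "bn": "Bengali",
    "en": "English (en-IN)",
    "hi": "Hindi",
    "kn": "Kannada",
    "ml": "Malayalam",
    "mr": "Marathi",
    "ta": "Tamil",
    "te": "Telugu",
    "gu": "Gujarati",
}
DOMAINS = ("general", "banking")


def model_name(lang_code, domain="general"):
    """Build a model name such as 'hi-banking-v2-8khz' (hypothetical helper)."""
    if lang_code not in LANGUAGES:
        raise ValueError(f"unknown language code: {lang_code!r}")
    if domain not in DOMAINS:
        raise ValueError(f"unknown domain: {domain!r}")
    return f"{lang_code}-{domain}-v2-8khz"


# Every combination listed above, useful for checking a name before connecting.
SUPPORTED_MODELS = {model_name(lang, dom) for lang in LANGUAGES for dom in DOMAINS}
```

Checking a name against `SUPPORTED_MODELS` before connecting is one way to catch a misspelled model on the client side.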
call_id (string)
Unique identifier associated with every streaming connection
segment_id (string)
Integer associated with every speech segment during the entire active socket connection
eos (bool)
Marks the end of the streaming connection when "eos" is true.
type (string)
Possible values: "partial" | "complete"
partial
Partial transcript corresponding to every streaming audio chunk
complete
Complete/final transcript generated for each speech segment
Generated once per segment_id i.e., when the speech segment end is reached
text (string)
The transcript that has been processed thus far.
segment_meta (object)
tokens: Array of strings representing individual text pieces (or "tokens") recognized from the audio. Tokens may include words or parts of words.
timestamps: Array of numerical values indicating when each token was detected in the segment/sentence (in seconds). Each timestamp aligns with the tokens array, so the i-th timestamp represents the time at which the i-th token was spoken. Useful for measuring latency.
start_time: Starting point (in seconds) of the current segment in the overall audio timeline.
confidence: Segment-level confidence. Float between 0 and 1. Currently supported for all languages except Telugu, Odia and English.
words: Array of word level objects (only populated when type is complete).
word: The recognised word.
confidence: Float value between 0.0 and 1.0 representing the model’s confidence in the recognized word. Currently supported for all languages except Telugu, Odia and English.
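The fields above can be consumed roughly as follows. This is a sketch, assuming each server message is a single JSON object with the documented fields at the top level; the exact nesting is not confirmed here, and the sample message is hypothetical.

```python
import json


def handle_message(raw, finals):
    """Process one server message; return True once the stream has ended."""
    msg = json.loads(raw)
    if msg.get("type") == "complete":
        # A complete transcript is generated once per segment_id.
        finals[msg["segment_id"]] = msg.get("text", "")
    return bool(msg.get("eos", False))


def token_times(segment_meta):
    """Pair each token with its timestamp (seconds within the segment)."""
    tokens = segment_meta.get("tokens", [])
    timestamps = segment_meta.get("timestamps", [])
    return list(zip(tokens, timestamps))


# Hypothetical message built only from the fields documented above.
sample = json.dumps({
    "call_id": "example-call",
    "segment_id": "1",
    "eos": False,
    "type": "complete",
    "text": "hello world",
    "segment_meta": {
        "tokens": ["hello", "world"],
        "timestamps": [0.2, 0.6],
        "start_time": 0.0,
    },
})
```

Partial messages are ignored by `handle_message` in this sketch; a live-captioning client would display them instead and overwrite them when the complete transcript for the same segment_id arrives.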
To ensure optimal compatibility and performance with our audio processing system, please adhere to the following audio stream requirements:
Encoding/Bit Depth: 16-bit PCM (2 bytes per sample), providing high-quality audio representation.
Minimum Sample Rate: The audio must have a sample rate of at least 8000Hz.
Fixed Streaming Rate: Audio packets should be streamed at a fixed chunk duration (chunk_duration_ms) between 50 and 500 ms, ensuring consistent data flow. We recommend using 100 ms as shown in the example script.
Channels: Audio must be single-channel (Mono) to ensure compatibility with our processing pipeline.
Speakers: Initially, support is provided for a single speaker per channel. However, support for multiple speakers on a single channel is under development and will be announced soon.
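The requirements above translate directly into a chunk size in bytes: sample rate × chunk duration × 2 bytes per sample, for one channel. The sketch below works through that arithmetic and splits a raw PCM buffer into fixed-size chunks. The function names are illustrative and not taken from the example script.

```python
BYTES_PER_SAMPLE = 2  # 16-bit PCM
MIN_SAMPLE_RATE_HZ = 8000


def chunk_size_bytes(sample_rate_hz, chunk_duration_ms=100):
    """Bytes per mono 16-bit PCM chunk of the given duration."""
    if sample_rate_hz < MIN_SAMPLE_RATE_HZ:
        raise ValueError("sample rate must be at least 8000 Hz")
    if not 50 <= chunk_duration_ms <= 500:
        raise ValueError("chunk duration must be between 50 and 500 ms")
    samples = sample_rate_hz * chunk_duration_ms // 1000
    return samples * BYTES_PER_SAMPLE


def iter_chunks(pcm, size):
    """Yield consecutive chunks of `size` bytes; the last one may be shorter."""
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]


def format_problems(channels, sample_width_bytes, sample_rate_hz):
    """Return a list of ways the audio format violates the requirements."""
    problems = []
    if channels != 1:
        problems.append("audio must be mono")
    if sample_width_bytes != BYTES_PER_SAMPLE:
        problems.append("audio must be 16-bit PCM")
    if sample_rate_hz < MIN_SAMPLE_RATE_HZ:
        problems.append("sample rate must be at least 8000 Hz")
    return problems
```

At 8000 Hz with the recommended 100 ms chunks, each packet is 800 samples, or 1600 bytes. For a WAV file, the three arguments to `format_problems` can be read with the standard library `wave` module (`getnchannels()`, `getsampwidth()`, `getframerate()`).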
400 - Bad Request
Error occurs due to issues with the WebSocket config message sent by the client after connection. This includes:
Invalid JSON format.
Missing or invalid (non-UUID) transaction_id.
Requesting a model that is not available.
Since config validation occurs after the WebSocket connection is established, the server sends a JSON response through the socket before closing the connection.
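The causes of a 400 listed above can also be checked on the client before the config is sent. The following is a client-side sketch only, not the server's response format. It assumes the config is a JSON object with a `transaction_id` key and a `model` key; the `model` key name and the absence of other fields are assumptions, so consult the config reference for the full schema.

```python
import json
import uuid


def build_config(model, transaction_id=None):
    """Return a JSON config string, raising ValueError on invalid input."""
    if not isinstance(model, str) or not model:
        raise ValueError("model must be a non-empty string")
    if transaction_id is None:
        # Generate a fresh UUID when the caller does not supply one.
        tid = str(uuid.uuid4())
    else:
        # uuid.UUID raises ValueError for strings that are not valid UUIDs.
        tid = str(uuid.UUID(transaction_id))
    # Key names are assumptions; only transaction_id is named in the text above.
    return json.dumps({"transaction_id": tid, "model": model})
```

Because the string is produced by `json.dumps`, it is valid JSON by construction, which covers the first cause in the list.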
401 - Unauthorised
Error occurs if the x-customer-id or x-api-key header is incorrect.
402 - Insufficient Balance
Error occurs if the account does not have enough credits to process the ASR request.
403 - Inactive Customer
Error occurs if your account has been deactivated.
500 - Internal Server Error
Error occurs if the server encounters an unexpected internal issue or panic during processing.
503 - Service Unavailable
Error occurs if the server is temporarily unable to handle the request or if it is unreachable.
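A client can map the codes above to a simple handling policy. The sketch below treats 500 and 503 as retryable and the 4xx codes as requiring a fix on the caller's side. That split is an assumption based on the descriptions above, not a documented Bodhi recommendation, and the backoff values are illustrative.

```python
ERRORS = {
    400: "Bad Request",
    401: "Unauthorised",
    402: "Insufficient Balance",
    403: "Inactive Customer",
    500: "Internal Server Error",
    503: "Service Unavailable",
}

# Assumption: only server-side failures are worth retrying unchanged.
RETRYABLE = {500, 503}


def describe_error(code):
    """Return 'code - name' for a documented code, or a generic label."""
    name = ERRORS.get(code, "Unknown Error")
    return f"{code} - {name}"


def should_retry(code):
    """True when reconnecting with the same request might succeed."""
    return code in RETRYABLE


def backoff_delays(attempts, base=1.0, cap=30.0):
    """Exponential backoff delays in seconds, capped at `cap`."""
    return [min(cap, base * 2 ** i) for i in range(attempts)]
```

With this policy, a 401 or 402 surfaces immediately so that credentials or credits can be corrected, while a 503 triggers a reconnect after each delay from `backoff_delays`.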