Getting Started
Below is an introduction to the Bodhi ASR Streaming features. Please refer to the corresponding documentation for more details.
Signing Up
To start using Bodhi, sign up on our website. After you sign up, you will receive a verification email to set up your account.
Accessing Your API Key
After you sign up, you will receive an API key that gives you access to our ASR services. Your active API key and Customer ID are displayed in your account dashboard, where you can copy them for use.
Once your API key is set up, you can start using Bodhi's streaming ASR services with the ASR models listed below.
Available ASR Models
Bengali
bn-general-v2-8khz
bn-banking-v2-8khz
English (en-IN)
en-general-v2-8khz
en-banking-v2-8khz
Hindi
hi-general-v2-8khz
hi-banking-v2-8khz
Kannada
kn-general-v2-8khz
kn-banking-v2-8khz
Malayalam
ml-general-v2-8khz
ml-banking-v2-8khz
Marathi
mr-general-v2-8khz
mr-banking-v2-8khz
Tamil
ta-general-v2-8khz
ta-banking-v2-8khz
Telugu
te-general-v2-8khz
te-banking-v2-8khz
Gujarati
gu-general-v2-8khz
gu-banking-v2-8khz
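The model names above follow a regular pattern: a language code, a domain (general or banking), and the suffix v2-8khz. The following is a minimal sketch of a lookup helper built on that pattern. The names LANGUAGES, DOMAINS, and model_name are hypothetical and not part of any Bodhi SDK; only the resulting identifiers come from the list above.

```python
# Language codes and domains as they appear in the model list above.
LANGUAGES = {
    "bn": "Bengali",
    "en": "English (en-IN)",
    "hi": "Hindi",
    "kn": "Kannada",
    "ml": "Malayalam",
    "mr": "Marathi",
    "ta": "Tamil",
    "te": "Telugu",
    "gu": "Gujarati",
}
DOMAINS = ("general", "banking")


def model_name(language_code: str, domain: str = "general") -> str:
    """Build a model identifier (hypothetical helper, not a Bodhi API)."""
    if language_code not in LANGUAGES:
        raise ValueError(f"unsupported language code: {language_code!r}")
    if domain not in DOMAINS:
        raise ValueError(f"unsupported domain: {domain!r}")
    return f"{language_code}-{domain}-v2-8khz"


# Every identifier listed in this section, generated from the pattern.
ALL_MODELS = [model_name(code, domain) for code in LANGUAGES for domain in DOMAINS]
```

Validating the name on the client side catches typos before a connection is opened, but the list in this section remains the reference for which models exist.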
Description of Response
call_id (string)
Unique identifier associated with every streaming connection
segment_id (string)
Unique identifier associated with every speech segment during the entire active socket connection
eos (bool)
Marks the end of the streaming connection when "eos" is true.
type (string)
partial
Partial transcript generated for every streamed audio chunk (for example, one per 100 ms of audio if the streaming audio packet size is 100 ms)
complete
Complete/final transcript generated for each speech segment
Generated once per segment_id, that is, when the end of the speech segment is reached
text (string)
The transcript that has been processed thus far.
segment_meta (object)
tokens: Array of strings representing individual text pieces (or "tokens") recognized from the audio. Tokens may include words or parts of words.
timestamps: Array of numerical values indicating when each token was detected in the segment/sentence (in seconds). Each timestamp aligns with the tokens array, so the i-th timestamp represents the time at which the i-th token was spoken. Useful for measuring latency.
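The fields described above can be read from each response as sketched below. This assumes the response arrives as a JSON text message with those field names at the top level; the sample message and its values are invented for illustration and are not real service output. The function name handle_message is hypothetical.

```python
import json


def handle_message(raw: str) -> dict:
    """Parse one response message into a flat summary (hypothetical helper)."""
    msg = json.loads(raw)
    meta = msg.get("segment_meta") or {}
    tokens = meta.get("tokens", [])
    timestamps = meta.get("timestamps", [])
    return {
        "call_id": msg.get("call_id"),
        "segment_id": msg.get("segment_id"),
        # "complete" marks the final transcript of a speech segment;
        # "partial" transcripts may still change.
        "is_final": msg.get("type") == "complete",
        "end_of_stream": bool(msg.get("eos", False)),
        "text": msg.get("text", ""),
        # The i-th timestamp belongs to the i-th token.
        "timed_tokens": list(zip(tokens, timestamps)),
    }


# Illustrative message only; all values are made up.
SAMPLE_MESSAGE = json.dumps({
    "call_id": "example-call",
    "segment_id": "example-segment",
    "eos": False,
    "type": "complete",
    "text": "hello world",
    "segment_meta": {
        "tokens": ["hello", "world"],
        "timestamps": [0.12, 0.48],
    },
})

summary = handle_message(SAMPLE_MESSAGE)
```

A client would typically display partial transcripts as they arrive, replace them with the complete transcript for the same segment_id, and stop reading once eos is true.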
Audio Stream Requirements
To ensure optimal compatibility and performance with our audio processing system, please adhere to the following audio stream requirements:
Encoding/Bit Depth: 16-bit PCM, that is, 2 bytes per sample.
Minimum Sample Rate: The audio must have a sample rate of at least 8000Hz.
Fixed Streaming Rate: Audio packets should be streamed at a fixed size (chunk_duration_ms) between 50 and 500 ms, ensuring consistent data flow. We recommend using 100 ms as shown in the example script.
Channels: Audio must be single-channel (Mono) to ensure compatibility with our processing pipeline.
Speakers: Currently, only a single speaker per channel is supported. Support for multiple speakers on a single channel is under development and will be announced soon.
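The requirements above determine how many bytes each audio packet contains. The sketch below works through that arithmetic for 16-bit mono PCM and splits a buffer into fixed-size packets. The function names chunk_size_bytes and iter_chunks are hypothetical helpers, not part of the Bodhi example script.

```python
from typing import Iterator

SAMPLE_WIDTH_BYTES = 2  # 16-bit PCM
CHANNELS = 1            # mono


def chunk_size_bytes(sample_rate_hz: int = 8000, chunk_duration_ms: int = 100) -> int:
    """Return the packet size in bytes for one chunk of audio."""
    if sample_rate_hz < 8000:
        raise ValueError("sample rate must be at least 8000 Hz")
    if not 50 <= chunk_duration_ms <= 500:
        raise ValueError("chunk duration must be between 50 and 500 ms")
    samples_per_chunk = sample_rate_hz * chunk_duration_ms // 1000
    return samples_per_chunk * SAMPLE_WIDTH_BYTES * CHANNELS


def iter_chunks(pcm: bytes, size: int) -> Iterator[bytes]:
    """Yield consecutive packets of `size` bytes; the last one may be shorter."""
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]


# One second of silence at 8000 Hz, split into 100 ms packets.
size = chunk_size_bytes(8000, 100)   # 800 samples * 2 bytes = 1600 bytes
silence = bytes(8000 * SAMPLE_WIDTH_BYTES)
packets = list(iter_chunks(silence, size))
```

At the recommended 100 ms and the minimum 8000 Hz sample rate, each packet is 1600 bytes, so one second of audio yields ten packets.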
Error Codes
401 - Unauthorised
Error occurs if the x-customer-id or x-api-key header is incorrect
402 - Insufficient Balance
Error occurs if the account does not have enough credits to process the ASR request
403 - Inactive Customer
Error occurs if your account has been deactivated.
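The error codes above can be mapped to readable messages on the client side, as in the sketch below. It assumes the code is available to the client as an integer; how it is surfaced depends on the client library in use. The helpers auth_headers and describe_error are hypothetical; only the header names and the three codes come from this section.

```python
# Codes and meanings as listed in this section.
ERROR_CODES = {
    401: "Unauthorised: the x-customer-id or x-api-key header is incorrect",
    402: "Insufficient Balance: not enough credits to process the ASR request",
    403: "Inactive Customer: the account has been deactivated",
}


def auth_headers(customer_id: str, api_key: str) -> dict:
    """Build the two authentication headers named in this section."""
    return {"x-customer-id": customer_id, "x-api-key": api_key}


def describe_error(status_code: int) -> str:
    """Return a readable message for a documented code, or a generic fallback."""
    return ERROR_CODES.get(status_code, f"Unexpected error code: {status_code}")
```

For example, describe_error(402) returns the insufficient-balance message, which a client could log before prompting the user to add credits.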