Getting Started

Below is an introduction to Bodhi's streaming ASR features. Please refer to the corresponding documentation for more details.


Signing Up

To start using Bodhi, sign up on our website. After you sign up, you will receive a verification email to set up your account.

Accessing Your API Key

After you sign up, you will receive an API key that gives you access to our ASR services. Your active API key and Customer ID are displayed in your account dashboard, where you can copy them for use.

Once your API key is set up, you can start using Bodhi's streaming ASR services with the ASR models listed below.


Available ASR Models

Language           General Model         Banking Model
Bengali            bn-general-v2-8khz    bn-banking-v2-8khz
English (en-IN)    en-general-v2-8khz    en-banking-v2-8khz
Hindi              hi-general-v2-8khz    hi-banking-v2-8khz
Kannada            kn-general-v2-8khz    kn-banking-v2-8khz
Malayalam          ml-general-v2-8khz    ml-banking-v2-8khz
Marathi            mr-general-v2-8khz    mr-banking-v2-8khz
Tamil              ta-general-v2-8khz    ta-banking-v2-8khz
Telugu             te-general-v2-8khz    te-banking-v2-8khz
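
The model names in the table follow a regular pattern (language code, domain, then "v2-8khz"). As a minimal sketch, a client could build the name from a language and a domain instead of hard-coding each string. The function name model_name and the dictionary LANGUAGE_CODES are hypothetical helpers for illustration, not part of any Bodhi SDK; only the resulting model strings come from the table.

```python
# Language codes as they appear in the model names listed in the table.
LANGUAGE_CODES = {
    "Bengali": "bn",
    "English (en-IN)": "en",
    "Hindi": "hi",
    "Kannada": "kn",
    "Malayalam": "ml",
    "Marathi": "mr",
    "Tamil": "ta",
    "Telugu": "te",
}

# The two model families listed in the table.
DOMAINS = ("general", "banking")


def model_name(language: str, domain: str = "general") -> str:
    """Return the model identifier for a language and domain from the table.

    Hypothetical helper: it only reproduces the naming pattern of the
    listed models and does not check availability with the service.
    """
    if language not in LANGUAGE_CODES:
        raise ValueError(f"Unsupported language: {language!r}")
    if domain not in DOMAINS:
        raise ValueError(f"Unsupported domain: {domain!r}")
    return f"{LANGUAGE_CODES[language]}-{domain}-v2-8khz"
```

For example, model_name("Hindi", "banking") yields "hi-banking-v2-8khz", matching the table entry.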


Description of Response

{
  "call_id": "<unique_call_id>",
  "segment_id": "<segment_id>",
  "eos": false,
  "type": "partial",
  "text": "<transcripts>"
}

Key: Description

  • call_id (string): Unique identifier associated with every streaming connection.

  • segment_id (string): Unique identifier associated with every speech segment during the entire active socket connection.

  • text (string): The transcript. Its content depends on the "type" field of the response:

    • partial: Partial transcript for every streamed audio chunk (for example, one per 100 ms of audio if the streaming packet size is 100 ms).

    • complete: Complete/final transcript for a speech segment. It is generated once per segment_id, i.e., when the end of the speech segment is reached.

  • eos (bool): Marks the end of the streaming connection when "eos" is true.
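
The response fields above can be handled as in the following minimal sketch. It assumes each message arrives as one JSON string shaped like the example response, and that final transcripts carry "type": "complete" (inferred from the description above, since the sample only shows "partial"). The function name collect_transcripts is hypothetical, and how messages are received (for instance, from a socket client) is left out.

```python
import json
from typing import Dict, Iterable, Tuple


def collect_transcripts(messages: Iterable[str]) -> Tuple[Dict[str, str], str]:
    """Consume JSON response messages until "eos" is true.

    Returns a dict mapping segment_id to its complete transcript, plus the
    most recent partial transcript that was not yet finalized.
    Assumption: "type" is either "partial" or "complete".
    """
    finals: Dict[str, str] = {}
    latest_partial = ""
    for raw in messages:
        msg = json.loads(raw)
        if msg.get("type") == "complete":
            # One final transcript is expected per segment_id.
            finals[msg["segment_id"]] = msg.get("text", "")
            latest_partial = ""
        elif msg.get("type") == "partial":
            # Each partial replaces the previous one for display purposes.
            latest_partial = msg.get("text", "")
        if msg.get("eos"):
            # The service marks the end of the streaming connection.
            break
    return finals, latest_partial
```

Keeping only the latest partial is a design choice for live display; an application that needs every intermediate hypothesis could append partials to a list instead.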


Audio Stream Requirements

To ensure optimal compatibility and performance with our audio processing system, please adhere to the following audio stream requirements:

  • Encoding/Bit Depth: 16-bit PCM (2 bytes per sample).

  • Minimum Sample Rate: The audio must have a sample rate of at least 8000 Hz.

  • Fixed Streaming Rate: Audio packets should be streamed at a fixed size (chunk_duration_ms) between 50 and 500 ms, ensuring consistent data flow. We recommend using 100 ms as shown in the example script.

  • Channels: Audio must be single-channel (Mono) to ensure compatibility with our processing pipeline.

  • Speakers: Currently, only a single speaker per channel is supported. Support for multiple speakers on a single channel is under development and will be announced soon.
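
To make the requirements above concrete, the sketch below computes the packet size in bytes for a given chunk duration and reads a WAV file in fixed-size chunks after checking that it is mono 16-bit PCM. It uses only Python's standard wave module. The function names chunk_size_bytes and iter_wav_chunks are hypothetical, and sending the chunks to the service is not shown.

```python
import wave

BYTES_PER_SAMPLE = 2  # 16-bit PCM
MIN_SAMPLE_RATE_HZ = 8000


def chunk_size_bytes(sample_rate_hz: int, chunk_duration_ms: int = 100) -> int:
    """Return the size in bytes of one mono 16-bit PCM packet."""
    if sample_rate_hz < MIN_SAMPLE_RATE_HZ:
        raise ValueError("Sample rate must be at least 8000 Hz")
    if not 50 <= chunk_duration_ms <= 500:
        raise ValueError("chunk_duration_ms must be between 50 and 500")
    return sample_rate_hz * BYTES_PER_SAMPLE * chunk_duration_ms // 1000


def iter_wav_chunks(path, chunk_duration_ms: int = 100):
    """Yield fixed-duration chunks of raw PCM bytes from a WAV file.

    The last chunk may be shorter than the others.
    """
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1:
            raise ValueError("Audio must be single-channel (mono)")
        if wav.getsampwidth() != BYTES_PER_SAMPLE:
            raise ValueError("Audio must be 16-bit PCM")
        # Validates the sample rate and the chunk duration.
        chunk_size_bytes(wav.getframerate(), chunk_duration_ms)
        frames_per_chunk = wav.getframerate() * chunk_duration_ms // 1000
        while True:
            data = wav.readframes(frames_per_chunk)
            if not data:
                break
            yield data
```

With the recommended 100 ms chunks at 8000 Hz, each packet is 8000 × 2 × 0.1 = 1600 bytes.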
