Getting Started

Below is an introduction to Bodhi ASR Streaming features. Please refer to the corresponding documentation for more details.


Signing Up

To start using Bodhi, sign up on our website. After you sign up, you will receive a verification email to set up your account.

Accessing Your API Key

After you sign up, you will receive an API key that grants access to our ASR services. Your active API key and Customer ID are displayed in your account dashboard, where you can copy them for use.
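
As a minimal sketch of how these two credentials might be packaged for a request, the snippet below builds the x-customer-id and x-api-key headers named in the Error Codes section. The function name and the environment variable names (BODHI_CUSTOMER_ID, BODHI_API_KEY) are assumptions made for illustration, not part of the Bodhi API.

```python
import os
from typing import Dict


def build_auth_headers(customer_id: str, api_key: str) -> Dict[str, str]:
    """Return the authentication headers as a plain dictionary.

    The header names come from the Error Codes section; everything else
    in this sketch is hypothetical.
    """
    if not customer_id or not api_key:
        raise ValueError("both customer_id and api_key are required")
    return {"x-customer-id": customer_id, "x-api-key": api_key}


if __name__ == "__main__":
    # Hypothetical environment variable names; use whatever your deployment provides.
    headers = build_auth_headers(
        os.environ.get("BODHI_CUSTOMER_ID", "demo-customer"),
        os.environ.get("BODHI_API_KEY", "demo-key"),
    )
    print(sorted(headers))
```

Reading the credentials from the environment keeps them out of source control; how the headers are attached depends on the client library you use.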

Once your API key is set up, you can start using Bodhi's streaming ASR services with the ASR models listed below.


Available ASR Models

Language           General Model         Banking Model

Bengali            bn-general-v2-8khz    bn-banking-v2-8khz
English (en-IN)    en-general-v2-8khz    en-banking-v2-8khz
Hindi              hi-general-v2-8khz    hi-banking-v2-8khz
Kannada            kn-general-v2-8khz    kn-banking-v2-8khz
Malayalam          ml-general-v2-8khz    ml-banking-v2-8khz
Marathi            mr-general-v2-8khz    mr-banking-v2-8khz
Tamil              ta-general-v2-8khz    ta-banking-v2-8khz
Telugu             te-general-v2-8khz    te-banking-v2-8khz
Gujarati           gu-general-v2-8khz    gu-banking-v2-8khz
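
Every model name listed above follows the same pattern: a two-letter language code, the domain (general or banking), and the suffix v2-8khz. The helper below is a small sketch that builds a model name from that pattern and rejects combinations that are not in the table. The function and constant names are illustrative only.

```python
# Language codes taken from the model table above.
LANGUAGE_CODES = {
    "Bengali": "bn",
    "English (en-IN)": "en",
    "Hindi": "hi",
    "Kannada": "kn",
    "Malayalam": "ml",
    "Marathi": "mr",
    "Tamil": "ta",
    "Telugu": "te",
    "Gujarati": "gu",
}
DOMAINS = ("general", "banking")


def model_name(language: str, domain: str = "general") -> str:
    """Build a model identifier such as 'hi-banking-v2-8khz'."""
    if language not in LANGUAGE_CODES:
        raise ValueError(f"unsupported language: {language!r}")
    if domain not in DOMAINS:
        raise ValueError(f"unsupported domain: {domain!r}")
    return f"{LANGUAGE_CODES[language]}-{domain}-v2-8khz"


if __name__ == "__main__":
    print(model_name("Hindi", "banking"))
```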


Description of Response

{
  "call_id": "<unique_call_id>",
  "segment_id": "<segment_id>",
  "eos": false,
  "type": "partial",
  "text": "<transcripts>",
  "segment_meta": {
    "tokens":[],
    "timestamps":[]
  }
}

Key: Description

call_id (string): Unique identifier associated with every streaming connection.

segment_id (string): Unique identifier associated with every speech segment during the entire active socket connection.

eos (bool): Marks the end of the streaming connection when "eos" is true.

type (string): Either partial or complete.

  • partial: Partial transcript generated for every streamed audio chunk (for example, one per 100 ms chunk if the streaming audio packet size is 100 ms).

  • complete: Complete/final transcript for a speech segment. Generated once per segment_id, i.e., when the end of the speech segment is reached.

text (string): The transcript that has been processed thus far.

segment_meta (object):

  • tokens: Array of strings representing individual text pieces (or "tokens") recognized from the audio. Tokens may include words or parts of words.

  • timestamps: Array of numerical values indicating when each token was detected in the segment/sentence (in seconds). The two arrays are aligned, so the i-th timestamp is the time at which the i-th token was spoken. Useful for measuring latency.
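
The following sketch shows one way a client might interpret a single response message, assuming it arrives as a JSON string with the fields described above. It separates partial from complete transcripts and pairs each token with its timestamp. The function name and the shape of the returned dictionary are assumptions for illustration.

```python
import json
from typing import Any, Dict


def parse_response(raw: str) -> Dict[str, Any]:
    """Decode one response message and align tokens with timestamps."""
    msg = json.loads(raw)
    meta = msg.get("segment_meta") or {}
    tokens = meta.get("tokens", [])
    timestamps = meta.get("timestamps", [])
    return {
        "call_id": msg.get("call_id"),
        "segment_id": msg.get("segment_id"),
        # "complete" is emitted once per segment_id; everything else is partial.
        "is_final": msg.get("type") == "complete",
        "eos": bool(msg.get("eos", False)),
        "text": msg.get("text", ""),
        # The i-th timestamp belongs to the i-th token.
        "aligned": list(zip(tokens, timestamps)),
    }


if __name__ == "__main__":
    sample = json.dumps({
        "call_id": "c1", "segment_id": "s1", "eos": False,
        "type": "complete", "text": "hello world",
        "segment_meta": {"tokens": ["hello", "world"], "timestamps": [0.12, 0.48]},
    })
    print(parse_response(sample))
```

A client would typically overwrite the displayed text on each partial message and commit it once a complete message for the same segment_id arrives.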


Audio Stream Requirements

To ensure optimal compatibility and performance with our audio processing system, please adhere to the following audio stream requirements:

  • Encoding/Bit Depth: 16-bit PCM (2 bytes per sample).

  • Minimum Sample Rate: The audio must have a sample rate of at least 8000 Hz.

  • Fixed Streaming Rate: Audio packets should be streamed at a fixed chunk duration (chunk_duration_ms) between 50 and 500 ms, ensuring consistent data flow. We recommend using 100 ms as shown in the example script.

  • Channels: Audio must be single-channel (Mono) to ensure compatibility with our processing pipeline.

  • Speakers: Currently, a single speaker per channel is supported. Support for multiple speakers on a single channel is under development and will be announced soon.
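
To make the packet-size requirement concrete, the sketch below computes the number of bytes in one chunk and splits a raw PCM buffer into fixed-size packets. For 16-bit mono audio at 8000 Hz and a 100 ms chunk, this gives 8000 × 0.1 × 2 = 1600 bytes per packet. The function names are illustrative, and the handling of the final short packet is an assumption, since the requirements above do not specify it.

```python
from typing import Iterator


def chunk_size_bytes(sample_rate_hz: int = 8000,
                     chunk_duration_ms: int = 100,
                     bytes_per_sample: int = 2,
                     channels: int = 1) -> int:
    """Return the size in bytes of one audio packet."""
    if sample_rate_hz < 8000:
        raise ValueError("sample rate must be at least 8000 Hz")
    if not 50 <= chunk_duration_ms <= 500:
        raise ValueError("chunk duration must be between 50 and 500 ms")
    samples_per_chunk = sample_rate_hz * chunk_duration_ms // 1000
    return samples_per_chunk * bytes_per_sample * channels


def iter_chunks(pcm: bytes, size: int) -> Iterator[bytes]:
    """Yield consecutive packets of `size` bytes; the last one may be shorter."""
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]


if __name__ == "__main__":
    size = chunk_size_bytes()          # 1600 bytes for 8 kHz, 100 ms, 16-bit mono
    silence = b"\x00" * 4000           # 250 ms of silent audio
    print([len(c) for c in iter_chunks(silence, size)])
```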


Error Codes

401 - Unauthorised

Error occurs if the x-customer-id or x-api-key header is incorrect.

402 - Insufficient Balance

Error occurs if the account does not have enough credits to process the ASR request.

403 - Inactive Customer

Error occurs if your account has been deactivated.
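
A client can map these codes to actionable messages. The sketch below uses only the three codes documented above. How the code reaches your application (for example, as a status on a rejected connection) depends on your client library and is not assumed here. The names used are illustrative.

```python
# Codes and meanings taken from the Error Codes list above.
ERROR_MEANINGS = {
    401: "Unauthorised: check the x-customer-id and x-api-key headers.",
    402: "Insufficient Balance: the account lacks credits for the ASR request.",
    403: "Inactive Customer: the account has been deactivated.",
}


def describe_error(status_code: int) -> str:
    """Return a human-readable explanation for a documented error code."""
    return ERROR_MEANINGS.get(
        status_code, f"Undocumented error code: {status_code}"
    )


if __name__ == "__main__":
    for code in (401, 402, 403, 500):
        print(code, describe_error(code))
```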

