Advanced Features

Use the features below to improve accuracy, debug latency, and more.

🔍 Context Biasing (Hotwords)

You can boost recognition of important or uncommon phrases by specifying hotwords during the request.

Using Hotwords

Define your hotwords as a JSON array. You can specify a higher "boosting score" to give extra emphasis to longer phrases (recommended!). The default score is 1.5, which should be sufficient for single words.

curl --location 'https://bodhi.navana.ai/api/transcribe' \
--header 'x-customer-id: <customer_id>' \
--header 'x-api-key: <api_key>' \
--form 'transaction_id=<uuid>' \
--form 'audio_file=@"<audio_file_path>"' \
--form 'model="hi-banking-v2-8khz"' \
--form 'hotwords="[{\"phrase\":\"बोधी\"},{\"phrase\":\"स्पीच रिकग्निशन\",\"score\":4.5}]"'
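
For reference, the same request can be assembled in Python, letting `json.dumps` handle the quote escaping that the curl example does by hand. The endpoint and headers are as documented above; the actual `requests` call is left in comments since it needs real credentials and an audio file:

```python
import json
import uuid

# Build the hotwords form field with json.dumps so quoting/escaping
# is handled automatically (the curl example escapes quotes manually).
hotwords = json.dumps([
    {"phrase": "बोधी"},                           # uses the default score (1.5)
    {"phrase": "स्पीच रिकग्निशन", "score": 4.5},   # longer phrase, higher boost
], ensure_ascii=False)

form_data = {
    "transaction_id": str(uuid.uuid4()),
    "model": "hi-banking-v2-8khz",
    "hotwords": hotwords,
}

# To send the request (requires the `requests` package and valid credentials):
# import requests
# with open("<audio_file_path>", "rb") as f:
#     resp = requests.post(
#         "https://bodhi.navana.ai/api/transcribe",
#         headers={"x-customer-id": "<customer_id>", "x-api-key": "<api_key>"},
#         data=form_data,
#         files={"audio_file": f},
#     )

print(form_data["hotwords"])
```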

Best Practices

| Best Practice | Description |
| --- | --- |
| ✅ Use uncommon words | Target domain-specific or rare phrases like "बोधी स्पीच रिकग्निशन" |
| ✅ Use local script | Always write in Devanagari (e.g. बोधी, not bodhi) |
| ✅ Avoid punctuation | Remove quotes, commas, and periods |
| ✅ Use higher scores for longer phrases | e.g. "बोधी स्पीच रिकग्निशन" -> 2.5 vs "बोधी" -> 1.5 |

Avoid copying hotwords from other providers without validation. Bodhi may already support commonly spoken Hindi words natively.

Warnings

  • Avoid very short particles like "का", "की", "ए", etc.

  • Don’t boost every word in a sentence — only uncommon or error-prone segments.

  • Multi-word phrases work better for commonly missed expressions; individual tokens work better for rare single words.

  • Avoid boosting words that are already recognized correctly.
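
The practices above can be folded into a small helper that cleans phrases before submission. This is a heuristic sketch, not an official Bodhi utility: the particle blocklist, punctuation handling, and score-per-word scheme are all illustrative assumptions.

```python
import json
import string

# Particles too short/common to boost safely (an illustrative list,
# not an official Bodhi blocklist).
SHORT_PARTICLES = {"का", "की", "के", "ए", "से", "में"}

def build_hotwords(phrases, base_score=1.5, per_extra_word=1.0):
    """Clean phrases per the best practices above and assign scores.

    Strips ASCII punctuation, drops very short particles, and raises
    the score for multi-word phrases. The scoring scheme here is an
    assumption, not Bodhi's official recommendation.
    """
    entries = []
    for phrase in phrases:
        cleaned = phrase.translate(str.maketrans("", "", string.punctuation)).strip()
        if not cleaned or cleaned in SHORT_PARTICLES:
            continue  # skip empty strings and short particles
        n_words = len(cleaned.split())
        score = base_score + per_extra_word * (n_words - 1)
        entry = {"phrase": cleaned}
        if n_words > 1:
            entry["score"] = score  # only override the default for longer phrases
        entries.append(entry)
    return json.dumps(entries, ensure_ascii=False)

print(build_hotwords(["बोधी", "का", "स्पीच रिकग्निशन,"]))
```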


🔢 Parse Numbers into Numerals

Bodhi supports converting spoken number words into actual digits using the parse_number flag in the form values.

This is useful when transcribing sentences that include monetary values, phone numbers, addresses, or quantities — especially for use cases like banking, insurance, and logistics.

curl --location 'https://bodhi.navana.ai/api/transcribe' \
--header 'x-customer-id: <customer_id>' \
--header 'x-api-key: <api_key>' \
--form 'transaction_id="<uuid>"' \
--form 'audio_file=@"<audio_file_path>"' \
--form 'model="hi-banking-v2-8khz"' \
--form 'parse_number="True"'

🧾 Example

| Mode | Output |
| --- | --- |
| Without parse_number | "घर बनाने के लिए मुझे पच्चीस लाख का लोन चाहिए" |
| With parse_number: True | "घर बनाने के लिए मुझे 2500000 का लोन चाहिए" |
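
The conversion in this example follows the Indian numbering system: पच्चीस ("twenty-five") × लाख (1,00,000) = 25,00,000. A toy sketch of that expansion is below; the two lookup tables cover only this example, while Bodhi's actual parser runs server-side and handles far more vocabulary and patterns:

```python
# Minimal sketch of the number-word expansion shown above. This mirrors
# the *output* of parse_number for this one example only.
UNITS = {"पच्चीस": 25}          # "twenty-five"
MULTIPLIERS = {"लाख": 100_000}  # "lakh" in the Indian numbering system

def expand(words):
    """Expand a sequence of unit/multiplier number words into digits."""
    value = 0
    current = 0
    for w in words:
        if w in UNITS:
            current = UNITS[w]
        elif w in MULTIPLIERS:
            value += current * MULTIPLIERS[w]
            current = 0
    return value + current

print(expand(["पच्चीस", "लाख"]))  # prints 2500000 (25 * 100000)
```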


🌐 Language Support

This feature is currently available for:

  • Hindi (hi)

  • Malayalam (ml)

  • Kannada (kn)

  • Gujarati (gu)

  • Marathi (mr)

Want support for another language? Reach out to support@navanatech.in


📦 Aux Metadata

Set aux: True in your form values to receive server-side diagnostic metadata along with your transcript response.

This is useful for logging, benchmarking, or correlating timestamps across systems.

curl --location 'https://bodhi.navana.ai/api/transcribe' \
--header 'x-customer-id: <customer_id>' \
--header 'x-api-key: <api_key>' \
--form 'transaction_id="<uuid>"' \
--form 'audio_file=@"<audio_file_path>"' \
--form 'model="hi-banking-v2-8khz"' \
--form 'aux="true"'

📘 What You Get

When enabled, each final transcript message will include an aux_info block:

"aux_info": {
        "request_time": 0.273680048,
        "received_request_time": "2025-05-19T09:44:50.975311686Z",
        "segments_meta": [
            {
                "tokens": [
                    " घ",
                    "र",
                    " बना",
                    "ने",
                    " के",
                    " लिए",
                    " मुझे",
                    " प",
                    "च",
                    "्",
                    "च",
                    "ी",
                    "स",
                    " लाख",
                    " का",
                    " ल",
                    "ो",
                    "न",
                    " चाहिए"
                ],
                "timestamps": [
                    1,
                    1.16,
                    1.4399999,
                    1.7199999,
                    1.8399999,
                    2,
                    2.24,
                    2.44,
                    2.48,
                    2.6399999,
                    2.6799998,
                    2.72,
                    2.76,
                    2.9199998,
                    3.12,
                    3.28,
                    3.32,
                    3.4399998,
                    3.72
                ],
                "start_time": 0,
                "end_time": 3.72,
                "text": " घर बनाने के लिए मुझे पच्चीस लाख का लोन चाहिए",
                "confidence": 0.8847437
            }
        ],
        "confidence": 0.8847437 
    }
| Field | Description |
| --- | --- |
| request_time (float) | Total time in seconds that the server spent handling this request (excluding network transfer delays). |
| received_request_time (timestamp) | The UTC timestamp at which the server received the initial WebSocket connection or request. |
| segments_meta (array of objects) | Detailed view of all segment objects (transcripts separated by silences) recognized in the provided audio file. Per-segment fields are listed below. |
| confidence (float) | Confidence score (between 0 and 1) for the model's prediction for the entire audio. This is the average of all segment confidences, and is omitted when the model predicts no text for the audio. |

Each segment object contains:

  • tokens: Array of strings representing the individual text pieces (or "tokens") recognized in the segment. Tokens may be whole words or parts of words.

  • timestamps: Array of numbers indicating when each token was detected in the segment (in seconds). Each timestamp aligns with the tokens array, so the i-th timestamp is the time at which the i-th token was spoken. Useful for measuring latency.

  • start_time: Starting point (in seconds) of the segment in the overall audio timeline.

  • end_time: Ending point (in seconds) of the segment in the overall audio timeline.

  • text: Transcription belonging to the current segment.

  • confidence: Confidence score (float between 0 and 1) for the model's prediction for this segment.
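
A short sketch of consuming these fields, using a trimmed copy of the sample aux_info above. The 0.7 confidence threshold is an arbitrary choice for illustration, not a recommended cutoff:

```python
# Consume an aux_info block: compute each segment's duration and flag
# low-confidence segments for review.
aux_info = {
    "request_time": 0.273680048,
    "segments_meta": [
        {
            "start_time": 0,
            "end_time": 3.72,
            "text": " घर बनाने के लिए मुझे पच्चीस लाख का लोन चाहिए",
            "confidence": 0.8847437,
        }
    ],
    "confidence": 0.8847437,
}

for seg in aux_info["segments_meta"]:
    duration = seg["end_time"] - seg["start_time"]
    flag = "LOW" if seg["confidence"] < 0.7 else "OK"  # 0.7 is an arbitrary example threshold
    print(f"{duration:.2f}s  conf={seg['confidence']:.2f}  [{flag}] {seg['text'].strip()}")
```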

This can help you:

  • Profile server-side performance

  • Track session start times

  • Debug slow or idle sessions

  • Assess how confident the model is in its predictions
