Refer to the features below for improving accuracy, increasing background noise resilience and more.
🔍 Context Biasing (Hotwords)
You can boost recognition of important or uncommon phrases by specifying hotwords during the WebSocket request.
Using Hotwords
Define your hotwords as a JSON array. You can specify a higher "boosting score" to give extra emphasis to longer phrases (recommended). The default score is currently 1.5, which should be sufficient for single words.
Target domain-specific or rare phrases like "बोधी स्पीच रिकग्निशन"
✅ Use local script: always write in Devanagari (e.g. बोधी, not bodhi).
✅ Avoid punctuation: remove quotes, commas, and periods.
✅ Use higher scores for longer phrases: e.g. "बोधी स्पीच रिकग्निशन" -> 2.5 vs "बोधी" -> 1.5.
Avoid copying hotwords from other providers without validation. Bodhi may already support commonly spoken Hindi words natively.
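The guidelines above can be sketched as a small helper that cleans each phrase and assigns a score. This is a minimal sketch, not Bodhi's documented schema: the entry keys `phrase` and `score` are hypothetical names chosen for illustration, and applying 2.5 to every multi-word phrase is an assumption extrapolated from the single example above. Check the API reference for the actual field names before sending.

```python
import json
import string

DEFAULT_SCORE = 1.5  # default score for single words, per the text above
PHRASE_SCORE = 2.5   # example score for a longer phrase, per the text above

# ASCII punctuation, curly quotes, and the Devanagari danda marks.
_PUNCTUATION = string.punctuation + "“”‘’।॥"
_STRIP_TABLE = str.maketrans("", "", _PUNCTUATION)


def clean_hotword(phrase: str) -> str:
    """Remove punctuation and collapse repeated or trailing whitespace."""
    return " ".join(phrase.translate(_STRIP_TABLE).split())


def build_hotwords(phrases):
    """Return a list of hotword entries with a score based on phrase length.

    The "phrase"/"score" keys are hypothetical, not a confirmed schema.
    """
    entries = []
    for phrase in phrases:
        text = clean_hotword(phrase)
        if not text:
            continue  # nothing left after cleaning
        score = PHRASE_SCORE if len(text.split()) > 1 else DEFAULT_SCORE
        entries.append({"phrase": text, "score": score})
    return entries


hotwords = build_hotwords(['"बोधी स्पीच रिकग्निशन "', "बोधी,"])
print(json.dumps(hotwords, ensure_ascii=False, indent=2))
```

Passing `ensure_ascii=False` keeps the Devanagari text readable in the serialized JSON instead of escaping it.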
Warnings
Avoid very short particles like "का", "की", "ए", etc.
Don’t boost every word in a sentence — only uncommon or error-prone segments.
Multi-word phrases work better for commonly missed word sequences; individual tokens work better for rare words.
Avoid boosting words that already work as is.
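The warnings above could be applied as a pre-filter on a candidate list. The sketch below is illustrative only: the particle set contains just the examples named above and is not exhaustive, and `recognized_ok` is a hypothetical set of words you have already verified Bodhi transcribes correctly without boosting.

```python
# Short particles named in the warnings above; extend this set for your language.
SHORT_PARTICLES = frozenset({"का", "की", "ए"})


def select_hotwords(candidates, recognized_ok=frozenset(), particles=SHORT_PARTICLES):
    """Keep only candidates that are worth boosting.

    Drops empty strings, duplicates, short particles, and words that
    already transcribe correctly (recognized_ok is supplied by the caller).
    """
    selected = []
    seen = set()
    for candidate in candidates:
        text = " ".join(candidate.split())  # normalize whitespace
        if not text or text in seen:
            continue
        if text in particles or text in recognized_ok:
            continue
        seen.add(text)
        selected.append(text)
    return selected


print(select_hotwords(["का", "बोधी", "नमस्ते", "बोधी"], recognized_ok={"नमस्ते"}))
```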
🧠 Confidence Scoring
Bodhi returns both segment-level and word-level confidence for all finalized results. You can use these scores to decide whether to act on a response. For instance, in extremely noisy surroundings, a confidence threshold can reduce unintended transcriptions.
Segment Confidence
Word-level Confidence
Recommended Usage
Filter out weak segments: depending on the language, the confidence threshold typically falls in the range 0.65 to 0.75.
Tune per language and per use case: call center and dictation use cases need different thresholds, and the threshold can also vary by language.
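A client-side filter along these lines might look like the sketch below. The response shape is an assumption: each finalized segment is modeled as a dict with hypothetical `text` and `confidence` keys, and the language code and threshold values are placeholders within the 0.65 to 0.75 range mentioned above, not recommended settings.

```python
DEFAULT_THRESHOLD = 0.70  # placeholder midpoint of the 0.65 to 0.75 range

# Hypothetical per-language overrides; tune these on your own audio.
LANGUAGE_THRESHOLDS = {"hi": 0.70}


def threshold_for(language, overrides=LANGUAGE_THRESHOLDS):
    """Look up the threshold for a language, falling back to the default."""
    return overrides.get(language, DEFAULT_THRESHOLD)


def keep_confident_segments(segments, language, overrides=LANGUAGE_THRESHOLDS):
    """Return only segments whose confidence meets the language threshold.

    Segments without a "confidence" value are treated as unreliable and dropped.
    """
    threshold = threshold_for(language, overrides)
    return [s for s in segments if s.get("confidence", 0.0) >= threshold]


segments = [
    {"text": "first segment", "confidence": 0.91},
    {"text": "background noise", "confidence": 0.42},
]
for segment in keep_confident_segments(segments, "hi"):
    print(segment["text"])
```

Keeping the thresholds in a per-language mapping makes it straightforward to maintain separate tables for, say, call center and dictation deployments.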
🧹 Partial Result Exclusion
To avoid reacting to incomplete guesses, use the exclude_partial flag in the configuration. The client will only receive complete transcripts.
🔢 Parse Numbers into Numerals
Bodhi supports converting spoken number words into actual digits using the parse_number flag in the WebSocket config.
This is useful when transcribing sentences that include monetary values, phone numbers, addresses, or quantities — especially for use cases like banking, insurance, and logistics.
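As a rough illustration, the `exclude_partial` and `parse_number` flags could be set together when building the configuration message. Only the two flag names come from this page; treating them as top-level booleans is an assumption, and any other fields your connection requires are passed through `extra` unchanged.

```python
import json


def build_config(exclude_partial=True, parse_number=True, **extra):
    """Build a config dict containing the two flags described above.

    Assumes both flags are booleans at the top level of the config;
    confirm the exact placement against the API reference.
    """
    config = dict(extra)  # other required fields, supplied by the caller
    config["exclude_partial"] = bool(exclude_partial)
    config["parse_number"] = bool(parse_number)
    return config


payload = json.dumps(build_config(), ensure_ascii=False)
print(payload)
```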