# Response Structure

Each response from the Bodhi API is a JSON object with the structure shown below. The Field Descriptions table that follows explains what each field means.

```json
{
  "call_id": "<uuid>",
  "segment_id": <int>,
  "eos": <boolean>,
  "type": "<string>",
  "text": "<string>",
  "segment_meta": {
    "tokens": [],
    "timestamps": [],
    "start_time": <float>,
    "confidence": <float>,
    "words": [
      {
        "word": "<string>",
        "confidence": <float>
      }
    ]
  }
}
```
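As an illustration of the structure above, the following is a minimal Python sketch that parses one response into typed objects. The class names (`Word`, `SegmentMeta`, `BodhiResponse`) and the `parse_response` helper are hypothetical and not part of any Bodhi SDK. The sketch also assumes that every message carries all top-level keys and that a missing `segment_meta` entry can be treated as empty.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class Word:
    word: str
    confidence: float


@dataclass
class SegmentMeta:
    tokens: List[str] = field(default_factory=list)
    timestamps: List[float] = field(default_factory=list)
    start_time: float = 0.0
    confidence: float = 0.0
    words: List[Word] = field(default_factory=list)


@dataclass
class BodhiResponse:
    call_id: str
    segment_id: int
    eos: bool
    type: str  # "partial" or "complete"
    text: str
    segment_meta: SegmentMeta


def parse_response(raw: str) -> BodhiResponse:
    """Parse one JSON message into a BodhiResponse (hypothetical helper)."""
    data = json.loads(raw)
    # Assumption: treat a missing or null segment_meta as an empty object.
    meta = data.get("segment_meta") or {}
    return BodhiResponse(
        call_id=data["call_id"],
        segment_id=int(data["segment_id"]),
        eos=bool(data["eos"]),
        type=data["type"],
        text=data["text"],
        segment_meta=SegmentMeta(
            tokens=list(meta.get("tokens", [])),
            timestamps=[float(t) for t in meta.get("timestamps", [])],
            start_time=float(meta.get("start_time", 0.0)),
            confidence=float(meta.get("confidence", 0.0)),
            words=[
                Word(w["word"], float(w["confidence"]))
                for w in meta.get("words", [])
            ],
        ),
    )
```

Typed objects like these make it easier to catch a missing or misspelled key early, at the cost of rejecting messages that omit a top-level field.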

**Field Descriptions**

<table><thead><tr><th width="203.84979248046875">Key</th><th>Description</th></tr></thead><tbody><tr><td><strong>call_id</strong> (string)</td><td>Unique identifier (UUID) associated with every streaming connection.</td></tr><tr><td><strong>segment_id</strong> (int)</td><td>Integer identifying each speech segment for the duration of the active socket connection.</td></tr><tr><td><strong>eos</strong> (bool)</td><td>When "eos" is true, marks the end of the streaming connection.</td></tr><tr><td><strong>type</strong> (string)</td><td><p>Possible values: "partial" | "complete"</p><p><strong>partial</strong></p><ul><li>Partial transcript corresponding to every streaming audio chunk</li></ul><p><strong>complete</strong></p><ul><li><p>Complete/final transcript generated for each speech segment</p><ul><li>Generated once per segment_id, i.e., when the end of the speech segment is reached</li></ul></li></ul></td></tr><tr><td><strong>text</strong> (string)</td><td>The transcript that has been processed thus far.</td></tr><tr><td><strong>segment_meta</strong> (object)</td><td><ul><li><strong>tokens</strong>: Array of strings representing the individual text pieces ("tokens") recognized from the audio. Tokens may be words or parts of words.</li></ul><ul><li><strong>timestamps</strong>: Array of numbers (in seconds) indicating when each token was detected within the segment/sentence. The array is aligned with the tokens array, so the i-th timestamp is the time at which the i-th token was spoken. Useful for measuring latency.</li></ul><ul><li><strong>start_time</strong>: Starting point (in seconds) of the current segment in the overall audio timeline.</li></ul><ul><li><strong>confidence</strong>: Segment-level confidence. Float between 0.0 and 1.0.</li></ul><ul><li><p><strong>words</strong>: Array of word-level objects (only populated when type is "complete").</p><ul><li><strong>word</strong>: The recognized word.</li><li><strong>confidence</strong>: Float between 0.0 and 1.0 representing the model's confidence in the recognized word.</li></ul></li></ul></td></tr></tbody></table>
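The field semantics above can be sketched in Python as a few small helpers that work on messages already decoded into dictionaries. All function names and the `0.5` confidence threshold are hypothetical, chosen only for illustration. The sketch assumes that the `text` of a "complete" message is the final transcript for its `segment_id`, and that a message with `eos` set to true may still carry a transcript, so it is processed before the loop stops.

```python
from typing import Dict, Iterable, List, Tuple


def token_times(message: dict) -> List[Tuple[str, float]]:
    """Pair the i-th token with the i-th timestamp (in seconds)."""
    meta = message.get("segment_meta") or {}
    return list(zip(meta.get("tokens", []), meta.get("timestamps", [])))


def low_confidence_words(message: dict, threshold: float = 0.5) -> List[str]:
    """Return words whose confidence is below the (hypothetical) threshold.

    The words array is only populated on "complete" messages, so this
    returns an empty list for partial messages.
    """
    meta = message.get("segment_meta") or {}
    return [w["word"] for w in meta.get("words", []) if w["confidence"] < threshold]


def collect_transcript(messages: Iterable[dict]) -> List[str]:
    """Collect the final text of each segment, ordered by segment_id.

    Partial messages are skipped; only "complete" messages are stored.
    Processing stops after the first message with eos set to true.
    """
    finals: Dict[int, str] = {}
    for msg in messages:
        if msg.get("type") == "complete":
            finals[msg["segment_id"]] = msg["text"]
        if msg.get("eos"):
            break
    return [finals[segment_id] for segment_id in sorted(finals)]
```

In a live client, the same logic would typically run inside the socket's message loop, with partial messages used to update an on-screen preview and complete messages appended to the stored transcript.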

