
Latency settings

Balance speed and accuracy in your Realtime transcription by adjusting latency settings.

Configuration options

Configure real-time latency with the following parameters:

  • max_delay: Maximum delay in seconds (0.7-4.0, default: 4.0) between a word being spoken and its final transcript being delivered
  • max_delay_mode: fixed or flexible (default: flexible). Controls whether results may be held back until a spoken entity (number, date, currency) is complete, so that it can be formatted correctly
  • enable_partials: Boolean (default: false). Set to true to receive partial transcripts for faster feedback

Add these parameters to your StartRecognition message:

{
  "transcription_config": {
    "max_delay": 0.7,
    "max_delay_mode": "flexible",
    "enable_partials": true,
    "language": "en",
    "operating_point": "enhanced"
  }
}
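
As a rough illustration, the fragment above could be assembled and range-checked in Python before it is merged into a StartRecognition message. The helper name build_transcription_config is hypothetical, and the sketch covers only the transcription_config part; any other fields your StartRecognition message needs are out of scope here.

```python
import json

# Allowed values taken from the parameter list above.
MAX_DELAY_RANGE = (0.7, 4.0)
MAX_DELAY_MODES = ("fixed", "flexible")


def build_transcription_config(max_delay=4.0, max_delay_mode="flexible",
                               enable_partials=False, language="en",
                               operating_point="enhanced"):
    """Hypothetical helper: validate latency settings and return the fragment."""
    low, high = MAX_DELAY_RANGE
    if not low <= max_delay <= high:
        raise ValueError(f"max_delay must be between {low} and {high} seconds")
    if max_delay_mode not in MAX_DELAY_MODES:
        raise ValueError(f"max_delay_mode must be one of {MAX_DELAY_MODES}")
    return {
        "transcription_config": {
            "max_delay": max_delay,
            "max_delay_mode": max_delay_mode,
            "enable_partials": bool(enable_partials),
            "language": language,
            "operating_point": operating_point,
        }
    }


# Reproduce the low-latency example shown above.
fragment = build_transcription_config(max_delay=0.7, enable_partials=True)
print(json.dumps(fragment, indent=2))
```

Validating on the client side is a design choice, not a requirement: it surfaces an out-of-range max_delay before the session starts.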

Speed vs. accuracy trade-offs

Choose the right max_delay setting for your use case:

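| Setting  | Accuracy Impact  | Recommended Use Cases                           |
|----------|------------------|-------------------------------------------------|
| 0.7-1.5s | < 5% degradation | Conversational AI, voice assistants             |
| 2.0s     | ~1% degradation  | Live captioning, broadcast media                |
| 4.0s     | No degradation   | Highest accuracy needs with partial transcripts |

WARNING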

Lower latency settings trade some accuracy for speed. Test thoroughly with your specific audio.
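
If the setting is chosen in code, the table above can be encoded as a small lookup. This is only a sketch: the names TRADE_OFFS and recommended_max_delay are hypothetical, and the rows simply restate the table rather than adding new guidance.

```python
# Rows copied from the trade-off table above:
# (min seconds, max seconds, accuracy impact, use cases).
TRADE_OFFS = [
    (0.7, 1.5, "< 5% degradation", ("conversational ai", "voice assistants")),
    (2.0, 2.0, "~1% degradation", ("live captioning", "broadcast media")),
    (4.0, 4.0, "No degradation", ("highest accuracy",)),
]


def recommended_max_delay(use_case):
    """Return (min_seconds, max_seconds, accuracy_impact) for a table use case."""
    key = use_case.strip().lower()
    for low, high, impact, cases in TRADE_OFFS:
        if key in cases:
            return low, high, impact
    raise KeyError(f"no table row for use case: {use_case!r}")


low, high, impact = recommended_max_delay("Live captioning")
print(f"max_delay {low}-{high}s, expected impact: {impact}")
```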

Partial transcripts

Get preliminary results faster while waiting for final, more accurate transcripts.

How partial transcripts work

  • Delivered in under 500 ms, compared with up to your configured max_delay for final transcripts
  • Updated continuously as more speech context becomes available
  • Enabled with enable_partials: true in your configuration

Limitations

  • Accuracy is typically 10-25% lower than final transcripts
  • Punctuation and capitalization may be incorrect
  • Confidence scores are not meaningful and should be ignored
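
One way a client might handle this is to treat each partial as provisional text that the next partial or final replaces. The sketch below is Python and assumes, for illustration only, that your message handler calls on_partial and on_final with the transcript text; the class and method names are hypothetical and are not part of the API.

```python
class LiveTranscript:
    """Keeps committed finals separate from the latest provisional partial."""

    def __init__(self):
        self.finals = []   # text that will not change again
        self.partial = ""  # latest provisional text, may still be revised

    def on_partial(self, text):
        # Each partial supersedes the previous one; it is never appended.
        self.partial = text

    def on_final(self, text):
        # A final commits the text and clears the provisional partial.
        self.finals.append(text)
        self.partial = ""

    def display(self):
        # Brackets mark provisional text so a UI can style it differently.
        parts = list(self.finals)
        if self.partial:
            parts.append(f"[{self.partial}]")
        return " ".join(parts)


view = LiveTranscript()
view.on_partial("I am 30")
print(view.display())  # [I am 30]
view.on_final("I am 35.")
print(view.display())  # I am 35.
```

Keeping partials out of the committed list means a later correction never leaves stale text on screen.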

Numeral formatting

Improve transcript readability with properly formatted numbers, dates, and currencies.

Flexible mode

When using max_delay_mode: "flexible" (default):

  • System waits until an entity (number, date, currency) is fully spoken
  • Ensures proper formatting of complex numerical expressions
  • Slightly increases latency only when entities are detected

Fixed mode

For applications with strict latency requirements:

  • Set max_delay_mode: "fixed" to enforce consistent timing
  • System won't wait for entities to complete before returning results

WARNING

Fixed mode reduces accuracy and readability of numbers, currencies, and dates.

Example output comparison

Finals only (default)

With only final transcripts (default configuration):

(Final): I am 35.

Partials with flexible mode

With enable_partials: true and max_delay_mode: "flexible":

(Partial): I
(Partial): I am
(Partial): I am third
(Partial): I am 30
(Final): I am 35.

Note how the system corrects "30" to "35" in the final transcript.

Partials with fixed mode

With enable_partials: true and max_delay_mode: "fixed":

(Partial): I
(Final): I am
(Partial): third
(Final): 30
(Partial): five
(Final): five.

Final output: "I am 30 five." Because "thirty" and "five" are finalized separately, the number is not formatted as "35".
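
To make the comparison concrete, the two event streams above can be replayed in a few lines of Python. The (kind, text) tuples mirror the notation used in the examples and are an illustrative stand-in, not the wire format; final_text is a hypothetical helper.

```python
# Event streams copied from the flexible-mode and fixed-mode examples above.
FLEXIBLE = [
    ("Partial", "I"), ("Partial", "I am"), ("Partial", "I am third"),
    ("Partial", "I am 30"), ("Final", "I am 35."),
]
FIXED = [
    ("Partial", "I"), ("Final", "I am"), ("Partial", "third"),
    ("Final", "30"), ("Partial", "five"), ("Final", "five."),
]


def final_text(events):
    """Join only the finals; partials are provisional and are discarded."""
    return " ".join(text for kind, text in events if kind == "Final")


print(final_text(FLEXIBLE))  # I am 35.
print(final_text(FIXED))     # I am 30 five.
```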