Deepgram Developer Documentation

Key APIs

Speech-to-Text API — Real-time (WebSocket) and batch (REST) transcription powered by the Nova model. Supports word-level timestamps, diarization, punctuation, and language detection.
Text-to-Speech API — Synthesize natural-sounding speech with the Aura models through the /v1/speak endpoint, over REST or streaming.
Voice Agent API — A single WebSocket API that orchestrates STT → LLM → TTS in one pipeline, reducing latency and integration complexity.
Audio Intelligence API — Adds capabilities like summarization, topic detection, sentiment analysis, and intent recognition on top of transcription.
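
As a rough illustration of the Text-to-Speech entry above, the following Python sketch builds (but does not send) a REST request for speech synthesis. The /v1/speak path, the aura-asteria-en voice name, and the JSON body shape are assumptions not stated in this summary, and build_speak_request is a hypothetical helper; check the current API reference before relying on any of them.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoint; not taken from this summary.
SPEAK_URL = "https://api.deepgram.com/v1/speak"


def build_speak_request(api_key: str, text: str, model: str = "aura-asteria-en") -> Request:
    """Build a POST request asking the service to synthesize `text`.

    The default voice model name is an assumption for illustration only.
    """
    url = f"{SPEAK_URL}?{urlencode({'model': model})}"
    body = json.dumps({"text": text}).encode("utf-8")
    return Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": "application/json",
        },
    )


# Build the request only; nothing is sent over the network here.
request = build_speak_request("YOUR_API_KEY", "Hello from the docs.")
print(request.full_url)
```

Passing the request to urllib.request.urlopen would send it; under the assumptions above, the response body would be the synthesized audio bytes.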

Quick Example (STT via cURL)

curl -X POST "https://api.deepgram.com/v1/listen" \
  -H "Authorization: Token YOUR_API_KEY" \
  -H "Content-Type: audio/wav" \
  --data-binary @audio.wav
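
The request above returns JSON. A minimal Python sketch for pulling the transcript out of that response might look like the following. The results → channels → alternatives → transcript layout is an assumption about the response shape, the sample payload is made up for illustration, and extract_transcript is a hypothetical helper.

```python
import json


def extract_transcript(response_body: str) -> str:
    """Return the first channel's top transcript, or "" if it is absent.

    Assumes the layout results.channels[0].alternatives[0].transcript.
    """
    payload = json.loads(response_body)
    channels = payload.get("results", {}).get("channels", [])
    if not channels:
        return ""
    alternatives = channels[0].get("alternatives", [])
    if not alternatives:
        return ""
    return alternatives[0].get("transcript", "")


# Illustrative payload only; real responses carry more fields.
sample = json.dumps(
    {
        "results": {
            "channels": [
                {"alternatives": [{"transcript": "hello world", "confidence": 0.99}]}
            ]
        }
    }
)
print(extract_transcript(sample))
```

Returning an empty string for a missing field keeps callers from crashing on partial or error responses; stricter code could raise instead.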

Key Parameters

model — Model identifier, such as nova-2, flux, or a custom model ID
language — BCP-47 language code (e.g., en-US, es, de)
diarize — Enable speaker diarization (true/false)
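punctuate — Add punctuation and capitalization automatically (true/false)
Streaming — Real-time transcription uses a WebSocket connection (wss://api.deepgram.com/v1/listen) rather than a query parameter on the REST endpoint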
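
The request options listed above are passed as query-string values on the /v1/listen URL. The sketch below shows one way to assemble such a request in Python without sending it. build_listen_request is a hypothetical helper, and serializing booleans as lowercase true/false is an assumption based on the true/false values shown in the list.

```python
from urllib.parse import urlencode
from urllib.request import Request

LISTEN_URL = "https://api.deepgram.com/v1/listen"


def build_listen_request(
    api_key: str,
    audio: bytes,
    content_type: str = "audio/wav",
    **params: object,
) -> Request:
    """Build a POST request for batch transcription with query parameters."""
    # Serialize Python booleans as lowercase "true"/"false" (assumed format).
    query = {
        name: (str(value).lower() if isinstance(value, bool) else value)
        for name, value in params.items()
    }
    url = f"{LISTEN_URL}?{urlencode(query)}" if query else LISTEN_URL
    return Request(
        url,
        data=audio,
        method="POST",
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": content_type,
        },
    )


# Build the request only; the audio bytes here are a placeholder.
request = build_listen_request(
    "YOUR_API_KEY",
    b"placeholder-audio-bytes",
    model="nova-2",
    language="en-US",
    diarize=True,
    punctuate=True,
)
print(request.full_url)
```

In real use, the audio argument would be the bytes read from a file such as audio.wav, and the request would be sent with urllib.request.urlopen.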

SDKs & Integrations

Official SDKs are available for Python, Node.js, Go, .NET, and Rust. A native integration is available for Amazon Connect.