Deepgram

The most accurate and cost-effective real-time APIs for speech-to-text, text-to-speech, and voice agents

Free tier available·All audiences·Powered by Deepgram·API available

Key strengths

  • Real-time and batch speech-to-text with industry-leading accuracy
  • Unified Voice Agent API combining STT, TTS, and LLM orchestration
  • Available both cloud-hosted and self-hosted for enterprise compliance
  • Multilingual support across 10+ languages including Flux conversational STT
  • Cost-effective at scale with enterprise-grade reliability

Free tier + paid plans
San Francisco, USA
Founded 2015
Self-hostable

Deepgram Developer Documentation

Key APIs

  • Speech-to-Text API — Real-time (WebSocket) and batch (REST) transcription powered by the Nova model. Supports word-level timestamps, diarization, punctuation, and language detection.
  • Text-to-Speech API — Synthesize natural-sounding speech with the Aura voice models through REST or streaming endpoints.
  • Voice Agent API — A single unified WebSocket API that orchestrates STT → LLM → TTS in one pipeline, reducing latency and integration complexity.
  • Audio Intelligence API — Adds capabilities like summarization, topic detection, sentiment analysis, and intent recognition on top of transcription.

Quick Example (STT via cURL)

curl -X POST "https://api.deepgram.com/v1/listen" \
  -H "Authorization: Token YOUR_API_KEY" \
  -H "Content-Type: audio/wav" \
  --data-binary @audio.wav
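
A request like the one above returns JSON. The sketch below pulls the transcript and per-word timings out of such a response. The field layout (results.channels[].alternatives[] carrying transcript and words) and the sample values are assumptions made for illustration, not something this page states, and the helper names are hypothetical; check them against the API reference before relying on them.

```python
import json

# Hypothetical sample response; the layout and values are assumed for illustration.
SAMPLE_RESPONSE = json.dumps({
    "results": {
        "channels": [{
            "alternatives": [{
                "transcript": "hello world",
                "confidence": 0.98,
                "words": [
                    {"word": "hello", "start": 0.08, "end": 0.32, "speaker": 0},
                    {"word": "world", "start": 0.40, "end": 0.72, "speaker": 0},
                ],
            }]
        }]
    }
})


def extract_transcript(response_text):
    """Return (transcript, words) from the first channel's first alternative."""
    payload = json.loads(response_text)
    channels = payload.get("results", {}).get("channels", [])
    if not channels or not channels[0].get("alternatives"):
        return "", []
    best = channels[0]["alternatives"][0]
    return best.get("transcript", ""), best.get("words", [])


def format_words(words):
    """Render one line per word: speaker label (if present), time span, text."""
    lines = []
    for w in words:
        speaker = w.get("speaker")
        label = f"speaker {speaker}" if speaker is not None else "speaker ?"
        lines.append(f"[{label}] {w['start']:.2f}-{w['end']:.2f}s {w['word']}")
    return lines


transcript, words = extract_transcript(SAMPLE_RESPONSE)
print(transcript)
print("\n".join(format_words(words)))
```

Returning an empty transcript for an unexpected payload, rather than raising, keeps the caller's error handling in one place. The speaker label would only be meaningful when diarization was requested.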

Key Parameters

  • model — Choose nova-2, flux, or custom model identifiers
  • language — BCP-47 language code (e.g., en-US, es, de)
  • diarize — Enable speaker diarization (true/false)
  • punctuate — Automatically insert punctuation (true/false)
  • stream — Real-time streaming via WebSocket
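
As a rough illustration of how the parameters above might be combined, the sketch below builds (but does not send) a batch request to the endpoint from the cURL example. It assumes the parameters are passed as URL query parameters and that booleans are written as true/false; build_listen_request is a hypothetical helper, not part of any official SDK. The stream entry is left out because the sketch covers the REST path only.

```python
import urllib.parse
import urllib.request

LISTEN_URL = "https://api.deepgram.com/v1/listen"  # endpoint from the cURL example


def build_listen_request(audio, api_key, **options):
    """Build, without sending, a batch transcription request.

    Assumption: options travel as URL query parameters. Booleans are
    written as "true"/"false"; options set to None are dropped.
    """
    query = {}
    for name, value in options.items():
        if value is None:
            continue
        query[name] = str(value).lower() if isinstance(value, bool) else str(value)
    url = LISTEN_URL
    if query:
        url += "?" + urllib.parse.urlencode(query)
    return urllib.request.Request(
        url,
        data=audio,
        method="POST",
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": "audio/wav",
        },
    )


request = build_listen_request(
    b"RIFF....",  # placeholder bytes; real code would read audio.wav
    "YOUR_API_KEY",
    model="nova-2",
    language="en-US",
    diarize=True,
    punctuate=True,
)
print(request.full_url)
```

Passing the returned object to urllib.request.urlopen would send it; that step needs a valid API key and real audio, so it is omitted here.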

SDKs & Integrations

Official SDKs available for Python, Node.js, Go, .NET, and Rust. Native integration available for Amazon Connect.