Deepgram
The most accurate and cost-effective real-time APIs for speech-to-text, text-to-speech, and voice agents
Free tier available · All audiences · Powered by Deepgram · API available
Key strengths
- Real-time and batch speech-to-text with industry-leading accuracy
- Unified Voice Agent API combining STT, TTS, and LLM orchestration
- Available both cloud-hosted and self-hosted for enterprise compliance
- Multilingual support across 10+ languages, including Flux conversational STT
- Cost-effective at scale with enterprise-grade reliability
Free tier + paid plans
San Francisco, USA
Founded 2015
Self-hostable
Deepgram Developer Documentation
Key APIs
- Speech-to-Text API — Real-time (WebSocket) and batch (REST) transcription powered by the Nova model. Supports word-level timestamps, diarization, punctuation, and language detection.
- Text-to-Speech API — Synthesize natural-sounding speech with the Aura voice models through the REST or streaming speak endpoints.
- Voice Agent API — A single unified WebSocket API that orchestrates STT → LLM → TTS in one pipeline, reducing latency and integration complexity.
- Audio Intelligence API — Adds capabilities like summarization, topic detection, sentiment analysis, and intent recognition on top of transcription.
Quick Example (STT via cURL)
curl -X POST "https://api.deepgram.com/v1/listen" \
-H "Authorization: Token YOUR_API_KEY" \
-H "Content-Type: audio/wav" \
--data-binary @audio.wav
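For readers working in Python rather than the shell, the same request can be sketched with only the standard library. This is a minimal illustration, not the official SDK: the helper name build_listen_request is hypothetical, and the code only constructs the request object so that it can be inspected before anything is sent.

```python
import urllib.request

# Endpoint taken from the cURL example above.
DEEPGRAM_LISTEN_URL = "https://api.deepgram.com/v1/listen"


def build_listen_request(
    audio_bytes: bytes,
    api_key: str,
    content_type: str = "audio/wav",
) -> urllib.request.Request:
    """Build (but do not send) a POST request mirroring the cURL call.

    build_listen_request is a hypothetical helper for illustration;
    it is not part of any Deepgram SDK.
    """
    return urllib.request.Request(
        DEEPGRAM_LISTEN_URL,
        data=audio_bytes,  # raw audio body, like --data-binary @audio.wav
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": content_type,
        },
        method="POST",
    )
```

To actually send it, one would read the audio file as bytes, pass the resulting request to urllib.request.urlopen, and decode the response body, which is assumed here to be JSON.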
Key Parameters
- model — Choose nova-2, flux, or custom model identifiers
- language — BCP-47 language code (e.g., en-US, es, de)
- diarize — Enable speaker diarization (true/false)
- punctuate — Auto punctuation insertion
- stream — Real-time streaming via WebSocket
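As a rough sketch of how these parameters could be attached to a batch request, the snippet below appends them to the listen endpoint as URL query parameters. The helper name build_listen_url is hypothetical, and the lowercase true/false encoding for booleans is an assumption based on the parameter descriptions above. The stream entry is left out, since real-time streaming runs over a WebSocket connection rather than a single REST call.

```python
from urllib.parse import urlencode


def build_listen_url(
    base_url: str = "https://api.deepgram.com/v1/listen",
    **params,
) -> str:
    """Return base_url with the given options encoded as a query string.

    build_listen_url is a hypothetical helper for illustration only.
    """
    query = {}
    for name, value in params.items():
        if value is None:
            continue  # skip options the caller left unset
        if isinstance(value, bool):
            # Assumption: booleans are sent as lowercase "true"/"false".
            value = "true" if value else "false"
        query[name] = value
    return f"{base_url}?{urlencode(query)}" if query else base_url


url = build_listen_url(
    model="nova-2",
    language="en-US",
    diarize=True,
    punctuate=True,
)
print(url)
```

The resulting URL can replace the bare endpoint in the cURL example; the headers and audio body stay the same.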
SDKs & Integrations
Official SDKs are available for Python, Node.js, Go, .NET, and Rust. A native integration is available for Amazon Connect.
