Cartesia

Free tier

Architecting AI that learns and interacts like humans: ultra-low-latency voice AI

Free tier available · All audiences · Powered by Cartesia · API available

Key strengths

Ultra-low-latency, real-time voice models built on State Space Models (SSMs)
Full-stack voice platform: STT (Ink), TTS (Sonic), and voice agents (Line)
Flexible deployment: cloud, on-premise, and on-device
Enterprise-grade compliance with in-region data residency support
Pioneer of the Mamba and H-Net architectures for efficient large-scale inference

Free tier + paid plans

Self-hostable


Developer Documentation

API Access

Cartesia provides a REST/streaming API for Sonic (TTS) and Ink (STT). Authenticate with your API key and start making requests immediately:

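# Streams raw 16-bit PCM at 16 kHz into output.pcm; the model version and the voice are separate fields
curl -N -X POST https://api.cartesia.ai/tts/bytes \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Cartesia-Version: 2025-04-16" \
  -H "Content-Type: application/json" \
  -d '{"model_id": "sonic-3.5", "transcript": "Hello, how can I help you?", "voice": {"mode": "id", "id": "YOUR_VOICE_ID"}, "output_format": {"container": "raw", "encoding": "pcm_s16le", "sample_rate": 16000}}' \
  > output.pcm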
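The request above can be wrapped in a small shell function so the JSON body is built in one place. This is a minimal sketch, not Cartesia's own client: it assumes the /tts/bytes endpoint, the Cartesia-Version value, and the field names model_id, transcript, voice, and output_format, and it reads the key from a CARTESIA_API_KEY environment variable. tts_payload and cartesia_tts are hypothetical helper names, and the sketch does no JSON escaping, so it only accepts single-line text without double quotes or backslashes.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; endpoint, version header, and field names are assumptions.

# Build the JSON body for a Sonic request (no JSON escaping is done here,
# so text containing double quotes or backslashes is rejected).
tts_payload() {
  local text="$1" voice_id="$2" model_id="${3:-sonic-3.5}"
  case "$text" in
    *'"'* | *'\'*) echo "tts_payload: unsupported character in text" >&2; return 1 ;;
  esac
  printf '{"model_id":"%s","transcript":"%s","voice":{"mode":"id","id":"%s"},"output_format":{"container":"raw","encoding":"pcm_s16le","sample_rate":16000}}' \
    "$model_id" "$text" "$voice_id"
}

# POST the payload and save the raw 16-bit PCM response to a file.
cartesia_tts() {
  local text="$1" voice_id="$2" out="$3" body
  [ -n "${CARTESIA_API_KEY:-}" ] || { echo "cartesia_tts: set CARTESIA_API_KEY" >&2; return 1; }
  body=$(tts_payload "$text" "$voice_id") || return 1
  # -f makes HTTP errors return a non-zero status; -N disables output buffering
  curl -sS -f -N -X POST https://api.cartesia.ai/tts/bytes \
    -H "Authorization: Bearer ${CARTESIA_API_KEY}" \
    -H "Cartesia-Version: 2025-04-16" \
    -H "Content-Type: application/json" \
    --data-binary "$body" -o "$out"
}
```

Usage: `cartesia_tts "Hello, how can I help you?" "YOUR_VOICE_ID" hello.pcm`. Keeping payload construction in its own function makes it easy to inspect the exact body before any network call is made.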

SDKs & Integration

Official SDKs available for Python, Node.js, and other major languages.
Line (voice agents) integrates with existing telephony and enterprise systems via standard interfaces.
Supports streaming audio output for real-time, low-latency pipelines.
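One rough way to check streaming latency from the command line is to compare curl's time to first byte with its total transfer time: when audio is streamed, the first bytes should arrive well before the request finishes. This sketch assumes the /tts/bytes endpoint and the Cartesia-Version value; measure_ttfb is a hypothetical helper, and the caller supplies a JSON body in whatever shape the API expects.

```shell
#!/usr/bin/env bash
# Hypothetical helper; endpoint and version header are assumptions.

# Print time to first byte and total time for one TTS request.
measure_ttfb() {
  local body="$1"
  [ -n "${CARTESIA_API_KEY:-}" ] || { echo "measure_ttfb: set CARTESIA_API_KEY" >&2; return 1; }
  # -o /dev/null discards the audio; -w prints curl's own timing variables
  curl -sS -f -o /dev/null -X POST https://api.cartesia.ai/tts/bytes \
    -H "Authorization: Bearer ${CARTESIA_API_KEY}" \
    -H "Cartesia-Version: 2025-04-16" \
    -H "Content-Type: application/json" \
    --data-binary "$body" \
    -w 'first byte: %{time_starttransfer}s, total: %{time_total}s\n'
}
```

Network conditions and region affect both numbers, so a few repeated runs give a fairer picture than a single measurement.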

Deployment Options

Cloud: Use the hosted API through regional endpoints, with in-region processing for data residency compliance.
On-premise: Deploy in your own VPC or customer environment for full infrastructure control.
On-device: Edge deployment on mobile, PC, and robotics platforms with fully private, offline inference.
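When the same scripts need to target either the hosted cloud API or a self-hosted deployment, one option is to read the base URL from the environment and fall back to the public endpoint. CARTESIA_BASE_URL and cartesia_base_url are hypothetical names; any regional or on-premise hostname would come from your Cartesia account or your own deployment, not from this sketch.

```shell
#!/usr/bin/env bash
# Hypothetical helper: resolve the API base URL for the current deployment target.
cartesia_base_url() {
  # Fall back to the public cloud endpoint when no override is set
  local url="${CARTESIA_BASE_URL:-https://api.cartesia.ai}"
  # Strip one trailing slash so callers can append paths such as /tts/bytes
  printf '%s\n' "${url%/}"
}
```

Usage: `curl "$(cartesia_base_url)/tts/bytes" ...`, with `CARTESIA_BASE_URL` exported only on hosts that should talk to a different endpoint.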

Key Parameters

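model_id: Sonic model version to synthesize with (e.g., sonic-3.5)
voice: Voice to speak with, referenced by its voice ID (e.g., {"mode": "id", "id": "YOUR_VOICE_ID"})
output_format: Container, encoding, and sample rate, e.g., raw pcm_s16le at 16000 Hz, or mp3
language: Language code of the input text for multilingual synthesis (e.g., en)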
stream: Boolean flag for real-time streaming responses
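For raw PCM output, the encoding and sample rate fix how many bytes arrive per second of audio, which helps when sizing buffers or estimating how much audio a streamed chunk holds: bytes per second = sample_rate × bytes_per_sample × channels, so 16-bit (2-byte) mono at 16 kHz is 16000 × 2 × 1 = 32,000 bytes per second. The sketch below assumes mono output by default; pcm_duration_ms is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: duration in milliseconds of a raw PCM byte count.
# Defaults assume pcm_s16le (2 bytes per sample), 16 kHz, mono.
pcm_duration_ms() {
  local bytes="$1" sample_rate="${2:-16000}" bytes_per_sample="${3:-2}" channels="${4:-1}"
  # Integer arithmetic: multiply first to avoid truncating sub-second durations
  echo $(( bytes * 1000 / (sample_rate * bytes_per_sample * channels) ))
}
```

For example, `pcm_duration_ms 3200` prints 100, so a 3,200-byte chunk of 16 kHz 16-bit mono audio holds 100 ms of speech; `pcm_duration_ms 176400 44100 4 1` covers 32-bit float samples (4 bytes) at 44.1 kHz and prints 1000.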