Cartesia

Architecting AI that learns and interacts like humans — ultra-low latency voice AI

Free tier available · All audiences · Powered by Cartesia · API available

Key strengths

  • Ultra-low latency real-time voice models built on State Space Models (SSMs)
  • Full-stack voice platform: STT (Ink), TTS (Sonic), and voice agents (Line)
  • Flexible deployment: cloud, on-premise, and on-device
  • Enterprise-grade compliance with in-region data residency support
  • Pioneer of Mamba & H-Net architectures for efficient large-scale inference
Free tier + paid plans
Self-hostable

Developer Documentation

API Access

Cartesia provides a REST/streaming API for Sonic (TTS) and Ink (STT). Pass your API key as a bearer token in the Authorization header. The example below sends a text-to-speech request:

curl -X POST https://api.cartesia.ai/tts/stream \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello, how can I help you?", "voice_id": "sonic-3.5", "output_format": "pcm_16000"}' \
  --output hello.pcm
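
The same request can be sketched in Python using only the standard library. The endpoint, headers, and JSON fields are copied from the curl example above. The helper names build_tts_request and stream_tts are hypothetical and not part of any official SDK, and the assumption that the response body is raw audio bytes should be checked against the official API reference.

```python
import json
import urllib.request

# Endpoint and field names are taken from the curl example; verify them
# against the official API reference before relying on this sketch.
TTS_STREAM_URL = "https://api.cartesia.ai/tts/stream"


def build_tts_request(text, api_key, voice_id="sonic-3.5", output_format="pcm_16000"):
    """Build (but do not send) the POST request shown in the curl example."""
    payload = {"text": text, "voice_id": voice_id, "output_format": output_format}
    return urllib.request.Request(
        TTS_STREAM_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def stream_tts(request, chunk_size=4096):
    """Send the request and yield response bytes in chunks as they arrive.

    Assumption: the response body is raw audio data.
    """
    with urllib.request.urlopen(request) as response:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            yield chunk
```

To use it, call build_tts_request("Hello, how can I help you?", "YOUR_API_KEY") and iterate over stream_tts(request), writing each chunk to a file or an audio sink. Keeping request construction separate from sending makes the payload easy to inspect without touching the network.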

SDKs & Integration

  • Official SDKs available for Python, Node.js, and other major languages.
  • Line (voice agents) integrates with existing telephony and enterprise systems via standard interfaces.
  • Supports streaming audio output for real-time, low-latency pipelines.
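
As a rough illustration of consuming streamed audio, the sketch below wraps an iterable of raw PCM chunks in a WAV container using Python's standard wave module. It assumes that pcm_16000 means 16 kHz, 16-bit, mono PCM, which the listing does not state. The function name pcm_chunks_to_wav is hypothetical.

```python
import io
import wave


def pcm_chunks_to_wav(chunks, destination, sample_rate=16000, sample_width=2, channels=1):
    """Write an iterable of raw PCM byte chunks to a WAV file or file-like object.

    Defaults assume 16 kHz, 16-bit (2 bytes per sample), mono audio.
    Returns the number of complete audio frames written.
    """
    frame_size = sample_width * channels
    total_bytes = 0
    with wave.open(destination, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)
        wav_file.setframerate(sample_rate)
        for chunk in chunks:
            # writeframesraw appends bytes without rewriting the header each time;
            # the header is patched once when the file is closed.
            wav_file.writeframesraw(chunk)
            total_bytes += len(chunk)
    return total_bytes // frame_size
```

The destination can be a file path or a seekable binary object such as io.BytesIO. Because chunks are written as they arrive, the same loop could instead feed a playback device in a real-time pipeline.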

Deployment Options

  • Cloud: Deploy via regional API endpoints (in-region processing for data residency compliance).
  • On-premise: Deploy in your own VPC or customer environment for full infrastructure control.
  • On-device: Edge deployment on mobile, PC, and robotics with fully private, offline inference.

Key Parameters

  • voice_id: Select from Sonic model versions (e.g., sonic-3.5)
  • output_format: Audio encoding, e.g., pcm_16000, mp3
  • language: Language of the input text and synthesized speech (multiple languages are supported)
  • stream: Boolean flag for real-time streaming responses
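
One way to keep these parameters together is a small dataclass that serializes to a JSON request body. This is a sketch only: the class name TTSParams, the text field (taken from the curl example), the default values, and the language code format are assumptions, not documented behavior.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class TTSParams:
    """Request fields named in the Key Parameters list, plus the input text."""

    text: str
    voice_id: str = "sonic-3.5"       # example value from the list above
    output_format: str = "pcm_16000"  # the list also mentions "mp3"
    language: Optional[str] = None    # hypothetical code such as "en"; omitted when None
    stream: bool = True               # assumed default; the listing does not give one

    def to_json(self) -> str:
        """Serialize to a JSON body, dropping parameters that were left unset."""
        if not self.text.strip():
            raise ValueError("text must not be empty")
        if not isinstance(self.stream, bool):
            raise TypeError("stream must be a boolean")
        body = {key: value for key, value in asdict(self).items() if value is not None}
        return json.dumps(body)
```

For example, TTSParams(text="Hello", output_format="mp3", stream=False).to_json() produces a body with text, voice_id, output_format, and stream, and leaves out language because it was not set.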