Cartesia

Builds real-time voice AI models on a state space model architecture for ultra-low-latency text-to-speech and speech-to-text.

Updated May 2026

Overview

Website: cartesia.ai
Ownership: Private
Segment: Audio & Speech

Product overview

Cartesia develops Sonic TTS models (e.g., Sonic-3, Turbo) for ultra-low-latency speech synthesis (40-90ms time-to-first-audio) and Ink STT models. Both families are built for real-time applications on an efficient state space model (SSM) architecture rather than on transformers. The models power conversational AI agents, customer support, content creation, and gaming, and are used by over 10,000 customers, including Quora, Cresta, Rasa, and Forethought. Cartesia's SSMs enable on-device operation, better long-context handling, and lower compute costs than competitors such as OpenAI TTS.

Revenue model

Subscription tiers (Free $0, Pro $4/mo, Startup $39/mo, Scale $239/mo, Enterprise custom) with included credits plus usage-based billing: 1 credit/character for TTS (Sonic), 1 credit/second for STT (Ink), $0.014-$0.06/min telephony; 20% savings on yearly plans.
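As a rough illustration of how the usage-based rates above add up, here is a minimal sketch in Python. Only the rates (1 credit per TTS character, 1 credit per STT second, $0.014-$0.06 per telephony minute) come from this profile. The function names are hypothetical, and counting characters with len() is an assumption about how billing counts them. The credits included in each tier are not stated here, so the sketch stops at credits consumed rather than dollars owed.

```python
from __future__ import annotations

# Rates as stated in this profile; everything else is illustrative.
TTS_CREDITS_PER_CHAR = 1
STT_CREDITS_PER_SECOND = 1
TELEPHONY_USD_PER_MIN = (0.014, 0.06)  # low and high ends of the quoted range


def credits_used(tts_text: str, stt_seconds: int) -> int:
    """Credits consumed by synthesizing tts_text and transcribing stt_seconds of audio.

    Assumption: one character equals one Python code point (len()).
    """
    return len(tts_text) * TTS_CREDITS_PER_CHAR + stt_seconds * STT_CREDITS_PER_SECOND


def telephony_cost_range(minutes: float) -> tuple[float, float]:
    """Lowest and highest telephony cost in USD for the given call minutes."""
    low, high = TELEPHONY_USD_PER_MIN
    return (round(minutes * low, 2), round(minutes * high, 2))


if __name__ == "__main__":
    print(credits_used("Hello, world!", 30))
    print(telephony_cost_range(100))
```

Under these assumptions, a 13-character TTS request plus 30 seconds of transcription consumes 43 credits, and 100 telephony minutes cost between $1.40 and $6.00.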

Moat

Cartesia's key competitive moat is its proprietary state space model (SSM) architecture, invented by its founders at the Stanford AI Lab. The company has scaled that architecture into voice models such as Sonic 2.0, which offers 90ms latency, fine-grained controllability for voice cloning and editing, and efficient on-device deployment, outperforming transformer-based rivals in speed, quality, and real-time multimodal capabilities. This first-mover technical lead, combined with an API infrastructure offering 99.9% uptime and enterprise compliance, creates high switching costs for customers that rely on its ultra-low-latency, customizable TTS.

Headwinds

State space models may not prove superior to transformers at scale, and the company faces intense competition from well-funded foundation model labs.

Stack lineage

What powers Cartesia

L4 Models

Together AI
Cartesia runs real-time voice AI on Together AI's GPU infrastructure.

What Cartesia powers

No relationships recorded yet.