The AI Stack

Baseten

AI inference platform for deploying and scaling open-source and custom models at production scale.

Updated April 2026

Overview

Website
baseten.co
Founded
2019
Headquarters
San Francisco, California
Segment
Model Distribution & Serving

Product overview

Baseten is a platform for training, deploying, and serving AI models, including open-source LLMs, custom fine-tuned models, and pre-optimized Model APIs, built on optimized inference engines such as TensorRT-LLM for high throughput and low latency. Customers include AI startups such as Abridge, Cursor, Clay, and Sourcegraph, which use it for production workloads like transcription, voice agents, and coding assistants. It differentiates on multi-cloud GPU scheduling, scale-to-zero autoscaling, compound AI chains, and the benchmark results it publishes against alternatives such as vLLM.

Revenue model

Usage-based, pay-as-you-go pricing: Model APIs are priced per million tokens (e.g., GPT OSS 120B at $0.10 input / $0.50 output), while dedicated deployments are billed per minute of GPU compute (e.g., H100 at $0.10833/min, A100 at $0.06667/min). Plans: Basic (free, pay-as-you-go), Pro (volume discounts), and Enterprise (custom SLAs, self-hosting).
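As a rough illustration of the billing arithmetic above, a small Python helper using the example rates quoted in this profile (actual Baseten pricing may differ and changes over time):

```python
# Sketch of usage-based billing arithmetic. Rates are the examples
# quoted in this profile, not an authoritative price list.

TOKEN_RATES = {  # USD per million tokens: (input, output)
    "gpt-oss-120b": (0.10, 0.50),
}

GPU_RATES_PER_MIN = {  # USD per minute of dedicated GPU compute
    "H100": 0.10833,
    "A100": 0.06667,
}

def api_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of a Model API call at per-million-token rates."""
    rate_in, rate_out = TOKEN_RATES[model]
    return (input_tokens * rate_in + output_tokens * rate_out) / 1_000_000

def dedicated_cost(gpu: str, minutes: float) -> float:
    """Cost of a dedicated deployment billed per minute of GPU time."""
    return GPU_RATES_PER_MIN[gpu] * minutes

# 1M input + 1M output tokens on GPT OSS 120B:
print(round(api_cost("gpt-oss-120b", 1_000_000, 1_000_000), 2))  # 0.6
# One hour on a dedicated H100:
print(round(dedicated_cost("H100", 60), 2))  # 6.5
```

The split mirrors the two billing modes: token-metered shared endpoints versus per-minute dedicated GPUs, where scale-to-zero determines how many billable minutes accrue.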

Moat

Baseten's key competitive moat is its hardware-software co-design for AI inference: the company reports up to 225% better cost-performance on NVIDIA Blackwell GPUs via Google Cloud A4 VMs, serving 5x more requests at lower latency and cost than rival stacks. This is reinforced by multi-cloud scalability, fast cold starts (e.g., ~15 seconds for Stable Diffusion), scale-to-zero autoscaling that eliminates idle GPU spend, and open-source tooling like Truss, which together create high switching costs for customers with optimized production deployments.
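Truss, mentioned above, packages a model as a plain Python class plus a config file. A minimal sketch of that interface follows; the class structure matches Truss's documented `Model` convention, but the uppercase "model" here is a stand-in placeholder, not a real workload:

```python
# model/model.py -- minimal Truss-style model packaging (sketch).
# Real projects are scaffolded with `truss init` and deployed
# with `truss push`; this shows only the class contract.

class Model:
    def __init__(self, **kwargs):
        # Truss passes config and secrets via kwargs; unused here.
        self._model = None

    def load(self):
        # Runs once at container startup: load weights here so the
        # cold-start cost is paid a single time, not per request.
        self._model = lambda text: text.upper()  # placeholder "model"

    def predict(self, model_input):
        # Called per request with the deserialized request body.
        return {"output": self._model(model_input["text"])}
```

A companion `config.yaml` declares resources (GPU type, replicas) and dependencies; keeping the serving contract this small is part of what lowers the barrier to moving custom models onto the platform.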

Headwinds

Faces intense competition from hyperscale cloud providers and other model-serving platforms in a rapidly commoditizing inference market.
