Inception
Develops diffusion-based large language models that generate text 5-10x faster than autoregressive LLMs.
Updated April 2026
Overview
- Website: inceptionlabs.ai
- Headquarters: Palo Alto, CA, United States
- Segment: Specialized & Emerging
Product overview
Inception Labs builds the Mercury series of diffusion LLMs, including Mercury 2 and Mercury Edit 2, offering 128K context, reasoning, tool use, and structured outputs through an OpenAI-compatible API. These models serve developers and enterprises in coding, real-time voice, agentic workflows, and customer support, with users including Zed, Viant, Wispr Flow, and Skyvern. They stand out by generating tokens in parallel via diffusion, reaching 1,000+ tokens/sec on NVIDIA GPUs at roughly 10x lower cost than autoregressive LLMs.
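Because the API is OpenAI-compatible, a Mercury request uses the standard chat-completions wire format. The sketch below builds such a request body; the model identifier "mercury" is an assumption for illustration, not a confirmed name.

```python
import json

def mercury_chat_request(prompt: str, model: str = "mercury") -> str:
    """Build the JSON body for an OpenAI-compatible /v1/chat/completions call.

    The model name "mercury" is an assumed identifier for illustration;
    an actual integration would use the name published by the provider.
    """
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return json.dumps(body)

print(mercury_chat_request("Write a binary search in Python."))
```

In practice this body would be POSTed to the provider's endpoint with an API key, or sent via any OpenAI-compatible client by overriding the base URL.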
Revenue model
- Usage-based API pricing: $0.25 per 1M input tokens ($0.025 cached), $0.75 per 1M output tokens for Mercury models
- Free tier: 10M tokens
- Developer plan: generous limits and priority support
- Enterprise: custom SLAs and volume pricing
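As a sanity check on the published rates, a small cost estimator (function name and structure are ours, rates copied from above):

```python
def mercury_cost_usd(input_tokens: int, output_tokens: int,
                     cached_input_tokens: int = 0) -> float:
    """Estimate Mercury API cost from the published per-1M-token rates."""
    INPUT_RATE = 0.25 / 1_000_000    # $0.25 per 1M input tokens
    CACHED_RATE = 0.025 / 1_000_000  # $0.025 per 1M cached input tokens
    OUTPUT_RATE = 0.75 / 1_000_000   # $0.75 per 1M output tokens
    return (input_tokens * INPUT_RATE
            + cached_input_tokens * CACHED_RATE
            + output_tokens * OUTPUT_RATE)

# Example: 2M fresh input tokens + 1M output tokens
# = 2 * $0.25 + 1 * $0.75 = $1.25
print(f"${mercury_cost_usd(2_000_000, 1_000_000):.2f}")
```

At these rates, output tokens dominate cost at a 3:1 ratio over fresh input, and cached input is 10x cheaper than fresh input.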
Moat
Inception's key competitive moat is its proprietary diffusion-based large language models, the Mercury series, which deliver significantly faster inference and lower compute costs than traditional autoregressive LLMs. The founders pioneered breakthroughs in diffusion modeling, FlashAttention, and Direct Preference Optimization, and this research IP creates high barriers to entry. The moat is reinforced by rapid integration into enterprise platforms such as AWS Bedrock and by real-time performance advantages in coding, voice, and search applications.
Headwinds
The diffusion-based LLM approach is unproven at scale and faces intense competition from established autoregressive models.