
Mercury 2

Fastest reasoning LLM built for instant production AI

💡 Mercury, from Inception Labs, is the first commercial diffusion LLM. It is up to 10x faster than traditional autoregressive models, offering comparable or superior quality on coding and reasoning tasks.

"Traditional LLMs are like a typist hitting one key at a time; Mercury 2 is like a master editor who sketches a rough draft and polishes the entire page simultaneously."

Hype: 8/10 · Utility: 9/10 · Votes: 14


Mercury 2: Bringing Diffusion to Text Generation for 10x Faster Speeds

2026-02-26 | https://www.producthunt.com/products/mercury-2

Mercury 2 Chat Interface

Gemini Interpretation: This is the Inception chat interface, featuring a minimalist dark design with a "Diffusion Effect" toggle and a "Mercury 2" model selector. The style is similar to Perplexity, focusing on a clean and fast user experience.


30-Second Quick Take

What is it?: Inception Labs has built an LLM that takes the road less traveled. Instead of spitting out tokens one by one, it works like an image diffusion model—generating a "draft" and refining multiple tokens simultaneously. The result: speeds exceeding 1,000 tokens/sec, making it 13x faster than Claude Haiku and 15x faster than GPT-5 Mini.

Is it worth your attention?: Yes, but it depends on your use case.

Why?:

  • If you're building AI Agents, real-time voice assistants, or code completion tools where latency is a dealbreaker, this is likely your most cost-effective choice.
  • If you need the "smartest" model for complex reasoning or long-form writing, Mercury 2 isn't the top pick—its intelligence is roughly on par with Claude Haiku, not the "Opus" or "Pro" tier.
  • The architectural innovation is fascinating; developers should keep a close eye on the Diffusion LLM direction.

How does it compare?

Speed Comparison

Gemini Interpretation: This benchmark chart visually demonstrates the massive gap between Mercury 2 (1009 t/s), Claude Haiku 4.5 (89 t/s), and GPT-5 Mini (71 t/s), labeled as ">5x faster."

| vs | Mercury 2 | Claude 4.5 Haiku | GPT 5.2 Mini |
|---|---|---|---|
| Core Difference | Diffusion architecture, parallel generation | Autoregressive, one token at a time | Autoregressive, one token at a time |
| Speed | 1,196 t/s | 89 t/s | 71 t/s |
| Latency | 1.7 seconds | 23.4 seconds | N/A |
| Output Price | $0.75/M | $5.00/M | N/A |
| Intelligence | Medium (AIME 91.1) | Comparable | Comparable |
| Advantage | Fast, cheap | Mature ecosystem, stable | OpenAI ecosystem |

Three Questions That Matter

Is this for me?

  • Target Audience: Developers and companies building AI applications, especially those sensitive to latency and cost.
  • Is this you?: If you are doing any of the following, Mercury 2 is directly relevant to you:
    • Building AI Agents that require rapid, iterative LLM calls.
    • Creating real-time voice assistants where users can't wait 20 seconds for a reply.
    • Developing code completion or editor plugins that need instant feedback.
    • Handling large-scale batch processing where inference cost is the core concern.
  • Use Cases:
    • Agent Loops → Mercury 2’s 1.7s latency vs. 14-23s for others determines whether the product is even usable.
    • Real-time Dialogue/Voice → Use this.
    • Code Completion/Refactoring → Use this (already integrated with ProxyAI, Kilo Code, etc.).
    • Deep Analysis/Long-form Writing → Not ideal; stick with Claude Opus or GPT-5.
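The agent-loop point above is easy to make concrete with back-of-envelope arithmetic. The sketch below uses the article's own latency figures (1.7s vs. 23.4s per call) and assumes a 20-call agent task, which is an illustrative number, not a benchmark:

```python
# Back-of-envelope sketch of why per-call latency dominates agent loops.
# Latencies are the article's figures; the 20-call task size is assumed.

def agent_task_wall_time(calls: int, latency_per_call_s: float) -> float:
    """Total wall-clock time for a sequential agent loop of `calls` LLM calls."""
    return calls * latency_per_call_s

CALLS = 20  # an assumed multi-step agent task

mercury = agent_task_wall_time(CALLS, 1.7)   # Mercury 2 end-to-end latency
haiku = agent_task_wall_time(CALLS, 23.4)    # Claude Haiku 4.5 latency

print(f"Mercury 2: {mercury:.0f}s, Haiku: {haiku:.0f}s")  # 34s vs 468s
```

Thirty-four seconds is a usable product; nearly eight minutes is not, which is the whole argument in one number.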

Is it useful for me?

| Dimension | Benefit | Cost |
|---|---|---|
| Time | Agent loops are 10x faster; code completion is nearly instant | Learning a new API (though it is OpenAI-compatible) |
| Money | Output cost is 1/7th of Claude Haiku and 1/4th of Gemini Flash | Pay-as-you-go, $0.75 per million output tokens |
| Effort | No more choosing between "fast" or "smart"—if you need speed, this is it | Diffusion prompting techniques may differ slightly from traditional models |

ROI Judgment: If your scenario involves calling an LLM dozens of times per task (agents, search, code), the ROI for Mercury 2 is massive. You get 5-13x the speed at 4-7x lower cost. If you only call an API occasionally for a single chat, the difference is negligible.
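The per-task cost side of that ROI claim can be sketched with the list prices from the comparison table. The 50K-output-token task size below is an assumption for illustration, not a figure from the article:

```python
# Hedged cost sketch using the article's list prices ($ per 1M output tokens).
# The 50K-output-token agent task is an assumed workload, not a measurement.

PRICE_PER_M = {"Mercury 2": 0.75, "Claude Haiku 4.5": 5.00}

def task_output_cost(output_tokens: int, price_per_m: float) -> float:
    """Dollar cost of generating `output_tokens` at a given $/1M-token rate."""
    return output_tokens / 1_000_000 * price_per_m

tokens = 50_000
for model, price in PRICE_PER_M.items():
    print(f"{model}: ${task_output_cost(tokens, price):.4f}")
# Mercury 2 ≈ $0.0375 per task vs Haiku ≈ $0.25 — the ~7x gap cited above.
```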

Why will I love it?

The "Wow" Factor:

  • The feel of speed: Moving from "waiting for an answer" to "instant response" is a qualitative shift in user experience.
  • Cost disruption: Tasks that used to cost dollars to run via an Agent now cost cents.

What people are saying:

"The speed numbers are absurd. Around 1,000 tokens per second with end-to-end latency of 1.7 seconds. That's an order of magnitude faster." — @RuiDiaoX

Real User Feedback:

Positive: "Impressive inference speed from Inception Labs' diffusion LLMs. Diffusion LLMs are a fascinating alternative to conventional autoregressive LLMs. Well done!" — @AndrewYNg (Andrew Ng, 1224 likes)

Positive: "After trying Mercury, it's hard to go back. We are excited to roll out Mercury to support all of our voice agents." — Customer Feedback (Inception Labs Website)

Watching: "This is a very promising approach, and I hope they build larger, more capable models. If they achieve the performance of a Qwen3.5 34B, it would enable TurboTokens on home PCs." — @TeksEdge


For Indie Hackers & Developers

Tech Stack

  • Core Architecture: Diffusion Large Language Model (dLLM), distinct from traditional autoregressive models.
  • How it works: Starting from noise, it uses a Transformer network to denoise in multiple steps, modifying multiple tokens at once—similar to how Midjourney generates images, but for text.
  • GPU: Runs on NVIDIA Blackwell GPUs.
  • API Format: Provided via Inception API, 128K context window.
  • Feature Support: Tool calling (function calling), JSON structured output.

Diffusion vs Autoregressive Comparison

Gemini Interpretation: On the left, an autoregressive LLM requires 75 iterations to generate code; on the right, the Inception Diffusion LLM completes the same task in just 14 iterations, a 5x+ efficiency boost.

Core Implementation

Simply put: Traditional LLMs are like typists hitting one key at a time. Mercury 2 is like an editor who quickly writes a rough draft and then polishes all necessary parts simultaneously. Because each step processes multiple tokens in parallel, the effective work done per neural network inference far exceeds autoregressive models.

This isn't just optimization (like better GPUs or model compression); it's a fundamental change in the path. Diffusion has already proven itself in image and video generation (Midjourney, Sora); Inception is now bringing it to language.
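The iteration-count intuition can be captured in a deliberately toy model. This is purely a counting illustration under an assumed tokens-per-step figure; Mercury's actual denoising sampler is far more involved:

```python
import math

# Toy iteration-count model: an autoregressive decoder needs one forward
# pass per token, while an idealized parallel-refinement decoder updates
# up to `tokens_per_step` positions per denoising step. The value 6 is an
# assumption chosen to mirror the ~5x gap in the article's figure.

def autoregressive_steps(n_tokens: int) -> int:
    """One forward pass per generated token."""
    return n_tokens

def diffusion_steps(n_tokens: int, tokens_per_step: int) -> int:
    """Idealized parallel refinement: several tokens settled per step."""
    return math.ceil(n_tokens / tokens_per_step)

n = 75  # token count from the article's code-generation figure
print(autoregressive_steps(n))  # 75 iterations
print(diffusion_steps(n, 6))    # 13 iterations
```

Fewer neural-network passes per output is the entire speed story: each pass costs roughly the same, so dividing the pass count divides the latency.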

Open Source Status

  • Model is not open source; available via API only.
  • Third-party SDK: https://github.com/hamzaamjad/mercury-client
  • Existing Integrations: ProxyAI, Buildglare, Kilo Code, browser-use.
  • Paper: https://arxiv.org/abs/2506.17298
  • Difficulty to replicate: Extremely high. Requires deep background in diffusion research and massive GPU resources. The founding team includes the Stanford professor who helped invent diffusion models.

Business Model

  • Monetization: API usage-based billing.
  • Pricing: $0.25/M input tokens, $0.75/M output tokens.
  • Blended Price: ~$0.38/M tokens (extremely cheap).
  • User Base: Not disclosed, but already integrated into several dev tools.
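The ~$0.38/M blended figure is consistent with a roughly 3:1 input-to-output token mix; that ratio is our assumption, since the article states only the blended number:

```python
# Reconstructing the ~$0.38/M "blended" price from the list prices.
# The 3:1 input:output token mix is an assumed workload ratio.

INPUT_PRICE = 0.25   # $ per 1M input tokens
OUTPUT_PRICE = 0.75  # $ per 1M output tokens

def blended_price(input_ratio: float, output_ratio: float) -> float:
    """Weighted average $/1M-token price for a given traffic mix."""
    total = input_ratio + output_ratio
    return (INPUT_PRICE * input_ratio + OUTPUT_PRICE * output_ratio) / total

print(blended_price(3, 1))  # 0.375, i.e. ~$0.38/M
```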

Big Tech Risk

This is the big question. Google is rumored to be working on "Gemini Diffusion." If Google releases a diffusion model that is both fast and flagship-smart, Inception's space could shrink. However:

  1. Inception is the first to bring diffusion LLMs to commercial scale, giving them a first-mover advantage.
  2. The founding team are academic authorities in this specific field.
  3. Investors include Microsoft, NVIDIA, and Databricks—giants choosing to invest rather than build it themselves (yet).
  4. The risk remains: if they can't scale intelligence beyond the "Haiku" level, speed alone might not be enough long-term.

For Product Managers

Pain Point Analysis

  • Problem Solved: Slow LLM inference and high costs, which severely limit the deployment of Agents, real-time voice, and code completion.
  • Severity: High. Many companies want to build AI Agents, but the latency makes for a poor user experience. An Agent task might require 20+ LLM calls; waiting 10-20 seconds for each is unacceptable for users.

User Persona

  • Core Users: AI application developers, Agent platforms, Voice AI companies.
  • Secondary Users: Code editor/IDE companies, search engines.
  • Scenarios: Agent loops, real-time voice dialogue, auto-completion, large-scale text processing.

Feature Breakdown

| Feature | Type | Description |
|---|---|---|
| Ultra-fast inference (1,000+ t/s) | Core | The primary advantage of the diffusion architecture |
| Low cost ($0.75/M output) | Core | 4-7x cheaper than competitors |
| Reasoning capability (AIME 91.1) | Core | Comparable to Haiku-level models |
| Function calling | Core | Essential for Agent scenarios |
| JSON structured output | Core | Developer-friendly |
| 128K context | Nice-to-have | Sufficient but not industry-leading |

Competitive Differentiation

| vs | Mercury 2 | Groq (Llama) | Cerebras | SambaNova |
|---|---|---|---|---|
| Core Difference | Architectural innovation | Hardware acceleration (LPU) | Hardware acceleration | Hardware acceleration |
| Speed Source | Parallel generation | Specialized chips | Wafer-scale chips | Custom processors |
| Intelligence | Medium | Model-dependent | Model-dependent | Model-dependent |
| Pricing | $0.75/M output | Varies | Varies | Varies |
| Uniqueness | Innovation at the model level | Runs any model | Runs any model | Runs any model |

Key Takeaways

  1. "Speed as a Feature": Instead of competing on pure intelligence, they hit the extreme on the speed dimension to find a differentiated entry point.
  2. Academic to Commercial Path: A clear trajectory from paper → open research → commercial API.
  3. Simple Pricing: Only two price points (input/output) with no complex tiers, lowering the barrier to decision-making.

For Tech Bloggers

Founder Story

  • Founders: Stefano Ermon (Stanford Professor), Aditya Grover (UCLA Professor), Volodymyr Kuleshov (Cornell Professor).
  • Background: The trio has collaborated for over 10 years and did early research on core AI technologies such as diffusion models, FlashAttention, and DPO (Direct Preference Optimization). Ermon himself was involved in the invention of diffusion models—the tech behind Midjourney and Sora.
  • The Mission: Academia proved that diffusion could crush traditional methods in image and video; they want to replicate that success in text.
  • Timeline: Founded in 2024 → Stealth debut Feb 2025 → $50M funding Nov 2025 → Mercury 2 release Feb 2026.

Discussion Angles

  • Angle 1 - "Fast enough, but smart enough?": Mercury 2 targets the Haiku tier, not Opus or GPT-5. It's perfect for speed-reliant tasks, but can it handle high-level complexity?
  • Angle 2 - "Diffusion vs. Autoregressive: The Future?": This is a battle of technical philosophies. If diffusion LLMs can reach flagship intelligence while maintaining speed, the industry landscape will be rewritten.
  • Angle 3 - "The Inventors Step In": The founders are the literal inventors of the tech. A story of researchers commercializing their own breakthrough has natural appeal.
  • Angle 4 - "The Andrew Ng + Andrej Karpathy Signal": When two of AI's most influential figures bet on the same horse, it's a strong market signal.

Buzz Data

  • PH Ranking: 14 votes (Low, as the product is B2B/Developer focused, not for general PH consumers).
  • Twitter/X: Founder's tweet got 3,753 likes; Andrew Ng's retweet got 1,224 likes.
  • HN Discussion: Dedicated thread (item?id=47144464).
  • Media Coverage: Extensive reporting by Bloomberg, TechCrunch, Yahoo Finance, eWeek, InfoWorld, and The Decoder.

Content Suggestions

  • Tech Deep Dive: How diffusion models are applied to text generation.
  • Founder Profile: The journey from Stanford labs to a $50M startup.
  • Market Analysis: The "Inference Speed Race" of 2026.

For Early Adopters

Pricing Analysis

| Tier | Price | Features | Is it enough? |
|---|---|---|---|
| API | $0.25/M input, $0.75/M output | Full model capability, 128K context, function calling, JSON output | Perfectly sufficient for speed-sensitive apps |

Currently API-only with no explicit free tier mentioned. However, the price is incredibly low—generating 1 million output tokens (roughly ten novels' worth of text) costs only $0.75.

Getting Started Guide

  • Setup Time: 10-15 minutes.
  • Learning Curve: Low (if you've used OpenAI's API).
  • Steps:
    1. Apply for an API key at https://www.inceptionlabs.ai/.
    2. Install the SDK: pip install mercury-client.
    3. Call it like OpenAI; it supports function calling and JSON mode.
    4. Note: Diffusion prompts may need slight adjustments compared to GPT prompts.
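Since the API is described as OpenAI-compatible, a call can be sketched in the familiar chat-completions shape. The base URL and model id below are placeholders we made up for illustration; check Inception's documentation for the real values:

```python
# Sketch of an OpenAI-compatible request. The base URL and model id are
# HYPOTHETICAL placeholders, not confirmed values from Inception's docs.
from typing import Any

ASSUMED_BASE_URL = "https://api.inceptionlabs.ai/v1"  # hypothetical
ASSUMED_MODEL = "mercury-2"                           # hypothetical

def build_chat_request(prompt: str, json_mode: bool = False) -> dict[str, Any]:
    """Build an OpenAI-style chat.completions payload."""
    payload: dict[str, Any] = {
        "model": ASSUMED_MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }
    if json_mode:
        # JSON structured output, which the article says Mercury 2 supports
        payload["response_format"] = {"type": "json_object"}
    return payload

req = build_chat_request("Summarize diffusion LLMs in one sentence.")
# Send with any OpenAI-compatible client, e.g.:
#   from openai import OpenAI
#   client = OpenAI(base_url=ASSUMED_BASE_URL, api_key="...")
#   client.chat.completions.create(**req)
```

The low switching cost claimed above comes from exactly this: if your code already builds OpenAI-style payloads, only the base URL and model name change.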

Pitfalls & Complaints

  1. Verbosity: Mercury 2 tends to generate very long outputs. In evaluations, it generated 69M tokens where other models averaged 15M. You may need to explicitly prompt for brevity.
  2. Niche Robustness: It might be less stable than mature autoregressive models on extremely niche or highly specialized reasoning tasks.
  3. Fine-tuning Path: If you need to fine-tune, the process for diffusion models differs from traditional methods, and support is currently unclear.
  4. Ecosystem Lock-in: Only one API provider (Inception Labs), unlike the rich third-party toolchains for OpenAI or Anthropic.
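For the verbosity pitfall, one plausible mitigation is to pair an explicit brevity instruction with a hard output cap. This is a generic sketch using standard OpenAI-compatible parameter names, not an Inception-documented fix:

```python
# Hedged mitigation for verbose outputs: a brevity system message plus a
# hard max_tokens cap. Parameter names follow the OpenAI-compatible
# convention; effectiveness on a diffusion model is untested here.

def with_brevity(payload: dict, max_tokens: int = 512) -> dict:
    """Return a copy of a chat payload with brevity constraints added."""
    capped = dict(payload)  # shallow copy; original payload is untouched
    capped["max_tokens"] = max_tokens  # hard stop on output length
    capped["messages"] = [
        {"role": "system", "content": "Answer concisely; no preamble."},
        *payload.get("messages", []),
    ]
    return capped

base = {"model": "mercury-2",  # hypothetical model id
        "messages": [{"role": "user", "content": "Explain diffusion."}]}
print(with_brevity(base)["max_tokens"])  # 512
```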

Security & Privacy

  • Data Storage: Cloud-based API.
  • Privacy Policy: Refer to Inception Labs' specific terms.
  • Security Audits: No public information available yet.

Alternatives

| Alternative | Advantage | Disadvantage |
|---|---|---|
| Claude 4.5 Haiku | Mature ecosystem, brand trust | 13x slower, 7x more expensive |
| GPT 5.2 Mini | OpenAI ecosystem, rich tools | 15x slower |
| Groq + Llama | Choice of models | Hardware acceleration, not architectural innovation |
| Gemini 3 Flash | Google ecosystem, multimodal | 4x more expensive, slower |

For Investors

Market Analysis

  • Sector Size: AI inference market projected at $106.1B in 2025 → $255B by 2030 (19.2% CAGR).
  • Growth Rate: Inference costs are dropping 10x annually.
  • Drivers: The explosion of AI Agents requires massive low-latency inference; by 2026, inference cost will be the primary competitive factor.
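The quoted 19.2% CAGR checks out against the stated endpoints ($106.1B in 2025 to $255B in 2030, five years of growth):

```python
# Verifying the quoted CAGR from the article's market-size endpoints.
start, end, years = 106.1, 255.0, 5  # $B, 2025 -> 2030

cagr = (end / start) ** (1 / years) - 1
print(f"{cagr:.1%}")  # 19.2%
```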

Competitive Landscape

| Tier | Players | Positioning |
|---|---|---|
| Leaders | OpenAI, Anthropic, Google | All-rounders, leading in intelligence |
| Speed Tier | Groq, Cerebras, SambaNova | Hardware acceleration for existing models |
| Architectural Innovation | Inception Labs (Mercury 2) | Diffusion LLM, speed at the model level |

Timing Analysis

  • Why Now?: Agents are the hottest application direction for 2026, but latency and cost are the main bottlenecks. Mercury 2 hits this pain point perfectly.
  • Tech Maturity: Diffusion LLMs have academic backing; Mercury 2 is the first commercial-grade implementation.
  • Market Readiness: Developers are used to API models, making switching costs low. The challenge is educating the market on the "Diffusion LLM" concept.

Team Background

  • Founders: Stefano Ermon (Stanford), Aditya Grover (UCLA), Volodymyr Kuleshov (Cornell)—three top-tier professors.
  • Core Contributions: Inventors of diffusion models, Flash Attention, DPO, etc.
  • Track Record: Extremely high academic citations; among the most influential researchers in AI.

Funding Status

  • Total Raised: $56 Million.
  • Lead Investor: Menlo Ventures.
  • Participants: Mayfield, Innovation Endeavors, NVentures (NVIDIA), M12 (Microsoft), Snowflake Ventures, Databricks Ventures.
  • Angel Investors: Andrew Ng, Andrej Karpathy.
  • Total Investors: 13.

Conclusion

Mercury 2 is a genuine architectural innovation, not just a patch on old tech. Its speed advantage is overwhelming, but its intelligence remains at the Haiku level. The key to its future is whether it can scale up in capability.

| User Type | Recommendation |
|---|---|
| Developers | ✅ If you're building Agents or real-time apps, this is a must-try. Familiar API, low switching cost. |
| Product Managers | ✅ Watch this space. Higher speed and lower cost could unlock product forms that were previously impossible. |
| Bloggers | ✅ Great material. Founder story + technical rivalry + big-name backing. |
| Early Adopters | ✅ Worth a try; the API is very cheap. Just don't expect it to replace Claude Opus for complex tasks. |
| Investors | ✅ Top-tier team, great sector, perfect timing. The risk lies in Big Tech adopting similar architectures. |

Resource Links

| Resource | Link |
|---|---|
| Official Website | https://www.inceptionlabs.ai/ |
| Blog | https://www.inceptionlabs.ai/blog/introducing-mercury-2 |
| Artificial Analysis | https://artificialanalysis.ai/models/mercury-2 |
| Paper | https://arxiv.org/abs/2506.17298 |
| Python SDK | https://github.com/hamzaamjad/mercury-client |
| HN Discussion | https://news.ycombinator.com/item?id=47144464 |
| ProductHunt | https://www.producthunt.com/products/mercury-2 |
| Founder's Twitter | https://twitter.com/StefanoErmon |

2026-02-26 | Trend-Tracker v7.3

Data source: ProductHunt | Feb 26, 2026