
Step 3.5 Flash

Frontier open-source MoE model built for OpenClaw agents

💡 Step 3.5 Flash is a high-efficiency open-source Mixture-of-Experts (MoE) model developed by StepFun. It features 196B total parameters with only 11B active during inference, achieving a remarkable 'intelligence density' that rivals massive closed-source models. Specifically optimized for AI Agent scenarios, it excels in mathematical reasoning (AIME 2025) and coding tasks (LiveCodeBench) while supporting local deployment on high-end consumer hardware like the Mac Studio M4 Max.

"It's the 'Special Ops' of AI models—a lean, highly trained 11B-parameter team that executes complex missions with the precision of a massive army, yet fits right inside your local workstation."

30-Second Verdict
What is it: An ultra-efficient open-source LLM from StepFun, built on MoE architecture and specifically optimized for Agents.
Worth attention: Yes, highly. It is one of the most 'parameter-efficient' models in the open-source world, offering top-tier math and coding reasoning under an Apache 2.0 license.
Hype: 8/10 | Utility: 9/10 | Votes: 101

Product Profile
Full Analysis Report

Step 3.5 Flash: The Masterpiece of "Doing More with Less" in Open Source

2026-03-06 | ProductHunt | GitHub | Official Blog

Step 3.5 Flash Intelligence Density Comparison

This scatter plot is the most convincing evidence for Step 3.5 Flash: the horizontal axis represents total parameters, and the vertical axis represents comprehensive performance. Step 3.5 Flash uses 196B parameters (with only 11B active) to achieve scores comparable to 100B-level closed-source models, leading the pack in "intelligence density."


30-Second Quick Judgment

What is it: An open-source LLM released by StepFun that uses a MoE architecture to compress 196B parameters into 11B active parameters during runtime. It is specifically optimized for Agent scenarios and can run on a Mac Studio M4 Max.

Is it worth watching? Yes. It is one of the most "parameter-efficient" players in the open-source field. It scored 97.3 on AIME 2025 math reasoning (tied for first with GLM-4.7) and 86.4 on LiveCodeBench. Add an Apache 2.0 license and a free API on OpenRouter, and if you are developing Agents or want to escape API fees, this is a serious contender.


Three Questions That Matter

Is it relevant to me?

Target Users:

  • AI Agent developers (who need reliable tool calling)
  • Independent devs/startups sensitive to API costs
  • Companies focused on data privacy wanting local deployment
  • Open-source contributors and AI researchers

You are the target user if:

  • Your monthly API bill exceeds $100 and you want a free alternative.
  • You are building coding agents/automation tools and need fast inference.
  • You don't want to send your code to third-party APIs and prefer running locally.
  • You are researching MoE architectures or RL training methods.

Who is this NOT for?

  • Casual users just looking for chat/writing (Claude or GPT are more intuitive).
  • Users without high-end hardware (M4 Max / DGX Spark); it won't run well.
  • Those needing multimodal capabilities (image/video); Step 3.5 Flash is text-only.

Is it useful to me?

| Dimension | Benefit | Cost |
|---|---|---|
| Time | Agent task inference at 100-350 tok/s, several times faster than most local models | Setup and debugging takes half a day to a full day |
| Money | Zero API fees for self-hosting; free trial on OpenRouter | High hardware barrier: M4 Max ~$4,000+ or DGX Spark |
| Effort | 256K context window handles long code files at once | Tool calling compatibility has some "gotchas" to navigate |

ROI Judgment: If you already have the right hardware or your team spends >$500/month on APIs, switching is a bargain. If you're just using Claude/GPT for small personal projects, it's not worth the hassle.
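Assuming the ~$4,000 hardware figure and the $500/month API spend from the judgment above, the break-even arithmetic is simple (a deliberately rough sketch that ignores electricity and setup time):

```python
def breakeven_months(hardware_cost: float, monthly_api_spend: float) -> float:
    """Months until self-hosted hardware pays for itself versus API fees.
    Ignores electricity, depreciation, and setup labor for simplicity."""
    return hardware_cost / monthly_api_spend

# Figures from this report: ~$4,000 Mac Studio vs. a $500/month API bill.
months = breakeven_months(4000, 500)  # -> 8.0
```

At a $500/month spend the machine pays for itself in eight months; at $100/month it takes over three years, which is why the casual-use case doesn't pencil out.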

Is it well-received?

The Highlights:

  • Incredible Parameter Efficiency: Activating only 11B out of 196B parameters to match DeepSeek V3.2 (685B) provides that "punching above its weight" satisfaction.
  • Truly Local: Tested at ~44 tok/s on Mac Ultra; MLX Q6.5 quantization maintains 96.95% token accuracy.
  • Genuine Open Source: They aren't just releasing weights; the Steptron training framework, SFT data, and RLVR code are all planned for open source.

What users are saying:

"first local LLM in the 200B parameter range that is usable with a CLI harness, best experience with a local LLM doing agentic coding" -- HackerNews User

"pretty fast and smart enough to handle most things" -- @hung-truong Blog Review

"StepFun is further expanding the boundaries of open source. Besides the final and base models, they've open-sourced the Steptron training framework and base-midtrain models." -- @dddanielwang

The Complaints:

When used with OpenClaw, it "seems to freeze up a lot and is generally unreliable" -- @hung-truong

Tool calling isn't perfect out of the box and is incompatible with frameworks like Claude Code -- NVIDIA Developer Forums


For Independent Developers

Tech Stack

  • Architecture: Sparse Mixture of Experts (MoE)
    • 196B total parameters, only 11B activated per token.
    • 288 routed experts per layer + 1 shared expert (always active).
    • Top-8 expert selection.
    • 45 layers, hidden size 4096, vocabulary of 128,896.
  • Inference Acceleration: 3-way Multi-Token Prediction (MTP-3)
    • Uses MTP in both training and inference (rare).
    • Predicts 4 tokens in a single forward pass.
    • Real-world 100-300 tok/s, peaking at 350 tok/s for coding tasks.
  • Context: 256K tokens, 3:1 Sliding Window Attention.
  • Quantized Deployment: Supports GGUF/INT4; MLX Q6.5 runs on Mac Ultra.
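The routing scheme above can be illustrated with a toy sketch (this is not StepFun's code; the real model uses 288 routed experts plus one always-on shared expert at hidden size 4096 — the sketch shrinks everything so it runs anywhere):

```python
import numpy as np

def moe_forward(hidden, gate_w, expert_ws, shared_w, top_k=8):
    """Toy sparse-MoE layer: softmax over expert logits, keep the top_k
    experts, mix their outputs by renormalized gate weights, and always
    add the shared expert's output (the "routed + shared" pattern)."""
    logits = hidden @ gate_w                       # one logit per expert
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    chosen = np.argsort(probs)[-top_k:]            # top_k expert indices
    weights = probs[chosen] / probs[chosen].sum()  # renormalize over chosen
    routed = sum(w * (hidden @ expert_ws[i]) for i, w in zip(chosen, weights))
    return routed + hidden @ shared_w, chosen

# Tiny demo: 32 experts, 16-dim hidden state; only 8 experts do any work.
rng = np.random.default_rng(0)
h = rng.standard_normal(16)
out, chosen = moe_forward(
    h,
    rng.standard_normal((16, 32)),      # gate projection
    rng.standard_normal((32, 16, 16)),  # 32 expert weight matrices
    rng.standard_normal((16, 16)),      # shared expert
)
```

Only the eight selected expert matrices are ever multiplied, which is the mechanism behind 196B total parameters costing just 11B of compute per token.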

Core Implementation

Step 3.5 Flash's core innovation is "intelligence density"—using a scalable RL framework to continuously self-improve Agent capabilities. It integrates Python code execution within Chain-of-Thought reasoning, lifting its AIME 2025 score to 99.8 (up from 97.3 in the standard setting). It also features the DockSmith + Session-Router system, covering Agent scenarios across 50K environments, 15K repos, and 20+ programming languages.
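The "Python inside the chain of thought" idea reduces to a harness that spots code in the model's reasoning trace, executes it, and feeds the results back into the context. A minimal, hypothetical sketch of the execution step (real harnesses sandbox the `exec`; this one does not):

```python
import contextlib
import io
import re

def run_python_blocks(trace: str) -> list[str]:
    """Execute each ```python fenced block found in a reasoning trace
    and return the captured stdout of each block — the generic pattern
    behind tool-integrated reasoning, where the model emits code and the
    harness returns the results for the next reasoning step."""
    outputs = []
    for block in re.findall(r"```python\n(.*?)```", trace, re.DOTALL):
        buf = io.StringIO()
        with contextlib.redirect_stdout(buf):
            exec(block, {})  # fresh namespace per block; NO sandboxing here
        outputs.append(buf.getvalue().strip())
    return outputs

trace = "Let me verify the count.\n```python\nprint(2**10)\n```\nSo 1024."
results = run_python_blocks(trace)  # -> ["1024"]
```

This is the generic pattern, not StepFun's actual implementation; their harness and prompt format are not documented here.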

Open Source Status

  • Weights released under Apache 2.0 (unrestricted commercial use), available on HuggingFace.
  • Beyond the weights, the Steptron training framework, SFT data, and RLVR code are all planned for open source.

Business Model

  • Monetization: Free open-source model → Paid API platform → Enterprise deployment services.
  • API Pricing: $0.10/M input tokens, $0.30/M output tokens (5x cheaper than Gemini 3.1 Flash-Lite).
  • OpenRouter Free Trial: Currently offering free API quotas.
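At these list prices the gap compounds quickly. A quick cost sketch (prices as quoted above; the traffic volumes are made up for illustration):

```python
PRICES = {  # USD per 1M tokens: (input, output), figures from this report
    "step-3.5-flash": (0.10, 0.30),
    "gemini-3-flash": (0.50, 3.00),
}

def monthly_cost(model: str, input_m_tokens: float, output_m_tokens: float) -> float:
    """Monthly API bill in USD for a given token volume (in millions)."""
    p_in, p_out = PRICES[model]
    return input_m_tokens * p_in + output_m_tokens * p_out

# Illustrative workload: 100M input + 20M output tokens per month.
step = monthly_cost("step-3.5-flash", 100, 20)  # 100*0.10 + 20*0.30 = 16.0
gem = monthly_cost("gemini-3-flash", 100, 20)   # 100*0.50 + 20*3.00 = 110.0
```

For this workload the bill is $16 versus $110 — roughly the 5-7x gap the pricing table implies, and it scales linearly with volume.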

Giant Risk

There is risk, but it's not fatal. Google Gemini 3 Flash ($0.50/$3.00) and GPT-5.3 Instant are direct competitors, but Step 3.5 Flash differentiates itself by being: 1) Fully open and locally deployable; 2) Extremely parameter-efficient; 3) Apache 2.0 unrestricted commercial use. However, open-source giants like Qwen 3.5 and DeepSeek V3.2 are also in the S-Tier, making for fierce competition.


For Product Managers

Pain Point Analysis

  • Problem Solved: High cost of closed APIs + weak logic in open models + the need for fast, smart models in Agent scenarios.
  • Severity: High frequency + essential need. Enterprises and devs spend hundreds to thousands monthly on APIs, and sending data to third parties poses privacy risks.

User Persona

  • Core Users: AI Agent developers, startup CTOs, active open-source contributors.
  • Extended Users: AI researchers (studying MoE and RL), cost-conscious SMEs.
  • Use Cases: Code generation/review, automated testing, Agent orchestration, long document analysis.

Feature Breakdown

| Feature | Type | Description |
|---|---|---|
| Code Reasoning (LCB 86.4) | Core | Superior performance in coding tasks |
| Math Reasoning (AIME 97.3) | Core | Frontier logic and mathematical capabilities |
| Agent Tool Calling | Core | Optimized specifically for agentic workflows |
| 256K Long Context | Core | Handles large codebases/documents |
| Local Deployment | Nice-to-have | Requires high-end hardware |
| Deep Research | Nice-to-have | Scores 65.27%, close to OpenAI/Gemini Deep Research |

Competitive Differentiation

| Dimension | Step 3.5 Flash | Gemini 3 Flash | DeepSeek V3.2 | Qwen 3.5 |
|---|---|---|---|---|
| Parameters | 196B / 11B active | Closed | 685B / 37B active | 397B |
| Open Source | Apache 2.0 | Closed | MIT | Open |
| API Price | $0.10/$0.30 | $0.50/$3.00 | Open/Self-host | Open/Self-host |
| Math (AIME) | 97.3 | - | 89.3 | - |
| Code (LCB) | 86.4 | - | - | 83.6 |
| Local Run | M4 Max capable | No | Needs larger gear | Needs larger gear |

Key Takeaways

  1. "Intelligence Density" Positioning: Don't compete on parameter count; compete on intelligence per parameter. This narrative is very clever.
  2. Differentiation through Openness: Open-sourcing the training recipe, not just weights, builds deep community trust.
  3. Binding with Agent Platforms (OpenClaw): Joint promotion of model + platform creates a synergistic ecosystem.

For Tech Bloggers

Founder Story

  • Founder: Jiang Daxin.
  • Background: Former Microsoft Global VP with years of deep expertise at Microsoft.
  • Motivation: Deeply moved by the appearance of ChatGPT, he felt the AGI window was now and left Microsoft to start his own venture.
  • Luxury Team:
    • Qi Yin (Co-founder of Megvii) joined as Chairman.
    • Zhang Xiangyu (Co-author of ResNet) as Chief Scientist.
    • Zhu Yibo (ex-Microsoft/ByteDance/Google) as CTO.
    • Jiao Binxing (ex-Microsoft Bing core team lead) as Head of Data.
  • Mission: "Step into Intelligence, 10x every person's potential."

Controversies / Discussion Angles

  • Benchmark Padding?: Some on HackerNews have questioned benchmark cherry-picking and called for third-party verification.
  • Open vs. Closed War: Performance parity is expected in 2026 Q2; Step 3.5 Flash is a landmark event in this race.
  • Chinese AI Going Global: StepFun plans a HK IPO with $700M in Series B+ funding; the international path for Chinese AI unicorns.
  • The "Small Model, Big Wisdom" Route: Is 196B total with only 11B active the new direction for model development?

Hype Data

  • PH: 101 votes.
  • HackerNews: Active discussion threads.
  • Twitter/X: Recommended by OpenRouter, discussed in the Chinese AI community, and featured by Turkish tech bloggers.
  • Sebastian Raschka (renowned ML author) included it in "Top 10 Open-Source LLM Architectures of 2026."
  • NVIDIA NIM and SiliconFlow platforms are already live.

Content Suggestions

  • Angles to write: "Why you won't pay API fees for Agents in 2026" / "The frontier model that runs on a single Mac."
  • Trend-jacking: The coming parity between open and closed source; Step 3.5 Flash is the poster child.

For Early Adopters

Pricing Analysis

| Tier | Price | Features | Is it enough? |
|---|---|---|---|
| Self-hosted | Free | All features | Requires high-end hardware |
| OpenRouter Free | $0 | Limited quota | Enough for testing |
| StepFun API | $0.10/$0.30 per 1M tokens | Full API | Extremely cheap |

Getting Started Guide

  • Fastest Way: OpenRouter free API; register and use in 5 minutes.
  • Local Deployment: Requires Mac Studio M4 Max (~150GB RAM) or NVIDIA DGX Spark.
  • Agent Frameworks: OpenClaw has an official Cookbook, though stability is still being refined.
  • Learning Curve: Low (API call) / Medium (Local deployment) / High (Fine-tuning/Dev).
  • Steps:
    1. Register at OpenRouter and get an API Key.
    2. Configure your Agent framework (Base URL + Model ID: step-3.5-flash).
    3. Set context window to 256K.
    4. For local runs, download the GGUF/MLX version from HuggingFace.
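Step 2 amounts to pointing any OpenAI-compatible client at OpenRouter. A stdlib-only sketch — the endpoint and payload follow OpenRouter's standard chat-completions shape, and the `:free` model ID is taken from the resource links in this report; verify both against OpenRouter's docs before relying on them:

```python
import json
import urllib.request

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"
MODEL_ID = "stepfun/step-3.5-flash:free"  # free-tier ID from this report

def build_request(prompt: str, max_tokens: int = 512) -> dict:
    """Assemble an OpenAI-compatible chat payload for OpenRouter."""
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def call(prompt: str, api_key: str) -> str:
    """POST the payload and return the assistant message text."""
    req = urllib.request.Request(
        OPENROUTER_URL,
        data=json.dumps(build_request(prompt)).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Any framework that accepts a custom Base URL plus model ID (step 2 above) is doing exactly this under the hood.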

Pitfalls and Complaints

  1. Imperfect Tool Calling: Out-of-the-box incompatibility with many Agent frameworks (like Claude Code); requires manual tuning.
  2. Verbose Inference: Requires more tokens than Gemini 3.0 Pro to reach the same quality.
  3. Unstable in Non-Code Scenarios: Performance may fluctuate during distribution shifts.
  4. Poor OpenClaw Experience: Frequent freezes, though this might be an OpenClaw issue rather than the model itself.
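Pitfall 1 usually shows up as the model emitting tool calls in a slightly different shape than the framework expects (raw JSON vs. JSON inside a markdown fence). One common workaround is a defensive parser between model output and framework — a hypothetical sketch, not any framework's official adapter:

```python
import json
import re

def parse_tool_call(text: str):
    """Defensively extract a JSON tool call from model output.
    Accepts raw JSON or JSON wrapped in a markdown fence, returning the
    parsed dict if it looks like a tool call (has a "name"), else None."""
    m = re.search(r"```(?:json)?\s*(\{.*?\})\s*```", text, re.DOTALL)
    candidate = m.group(1) if m else text.strip()
    try:
        obj = json.loads(candidate)
    except json.JSONDecodeError:
        return None
    return obj if isinstance(obj, dict) and "name" in obj else None
```

Normalizing to one shape before handing off to the framework sidesteps the "incompatible out of the box" problem for simple cases; streaming tool calls need more work.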

Security and Privacy

  • Data Storage: Self-hosting is completely local; data never leaves your machine.
  • Open Audit: Code is fully open-source and can be audited by anyone.
  • Apache 2.0: No usage restrictions; commercial-ready.

Alternatives

| Alternative | Advantage | Disadvantage |
|---|---|---|
| DeepSeek V3.2 | MIT license, all-rounder, larger community | 685B parameters are hardware-heavy |
| Qwen 3.5 | Top GPQA scores, strong coding | 397B parameters |
| Qwen3-Coder-Next | 3B active, ultra-efficient for code | Weak general capabilities |
| Gemini 3 Flash | No self-hosting needed, stable API | Closed source, ongoing costs |

For Investors

Market Analysis

  • Sector Size: US open-source AI model market at $5.19B, global CAGR of 15.1%.
  • LLM Market: North America projected to reach $105.5B by 2030.
  • Trends: 63% of enterprises use open-source AI; open source accounts for 62.8% of models. Parity with closed source expected by Q2 2026.
  • Drivers: API cost pressure, data privacy regulations, and efficiency gains (MoE architecture).

Competitive Landscape

| Tier | Players | Positioning |
|---|---|---|
| Top Closed | OpenAI (GPT-5), Google (Gemini 3), Anthropic (Claude) | Strongest performance, high cost |
| Top Open | DeepSeek, Qwen (Alibaba), GLM (Zhipu) | Large-parameter all-rounders |
| Efficiency School | StepFun (Step 3.5 Flash), Mistral, NVIDIA Nemotron | Parameter efficiency first |
| Vertical Open | Qwen3-Coder, Kimi K2 | Focused on code/verticals |

Timing Analysis

  • Why Now?:
    1. MoE architecture has matured, enabling "Small Model, Big Wisdom."
    2. The open-closed gap has narrowed to 7 points (out of 100); parity is imminent.
    3. The Agent wave requires high-performance, low-cost inference engines.
    4. Consumer hardware (M4 Max/DGX Spark) is now capable of running 200B-class models.
  • Tech Maturity: High. MoE + MTP + SWA are all proven technologies.
  • Market Readiness: High. Enterprise anxiety over API costs and privacy is driving local deployment demand.

Team Background

  • Founder: Jiang Daxin, former Microsoft Global VP.
  • Chairman: Qi Yin, Co-founder of Megvii.
  • Chief Scientist: Zhang Xiangyu, co-author of ResNet.
  • Core Team: Senior engineers from Microsoft, ByteDance, and Google.
  • Track Record: Already released 22 self-developed foundation models (including 16 multimodal models).

Funding Status

  • Latest Round: Series B+, over 5 billion RMB (~$700M), Jan 2026.
    • Set a 12-month record for a single funding round in the Chinese AI sector.
  • Investors: Tencent, Qiming Venture Partners, 5Y Capital, Shanghai State-owned Assets, China Life Private Equity, etc.
  • IPO Plan: Planning a Hong Kong listing, expected to raise ~$500M (within 2026).
  • Positioning: One of China's "Six AI Tigers."

Conclusion

In a nutshell: Step 3.5 Flash proves that "models don't need to be huge, just smart." Activating only 11B out of 196B parameters to hit S-Tier results makes it the benchmark for the 2026 open-source AI efficiency race.

| User Type | Recommendation |
|---|---|
| Developers | Worth Watching -- Apache 2.0 + high performance + free API. A great choice for Agent dev, but watch out for tool calling quirks. |
| Product Managers | Worth Studying -- The "intelligence density" positioning and deep open-source strategy are excellent case studies. |
| Bloggers | Great Content -- Chinese AI global expansion + open vs. closed war + founder story; plenty of angles. |
| Early Adopters | Try with Caution -- Start with the OpenRouter free API; local deployment requires high-end gear. |
| Investors | Watch the IPO -- $700M Series B+ and HK IPO plans make StepFun a key target in the Chinese AI race. |

Resource Links

| Resource | Link |
|---|---|
| Official Website | https://www.stepfun.com |
| GitHub | https://github.com/stepfun-ai/Step-3.5-Flash |
| HuggingFace | https://huggingface.co/stepfun-ai/Step-3.5-Flash |
| arXiv Paper | https://arxiv.org/html/2602.10604v1 |
| Official Blog | https://static.stepfun.com/blog/step-3.5-flash/ |
| API Platform | https://platform.stepfun.com |
| OpenRouter (Free) | https://openrouter.ai/stepfun/step-3.5-flash:free |
| NVIDIA NIM | https://build.nvidia.com/stepfun-ai/step-3.5-flash/modelcard |
| ProductHunt | https://www.producthunt.com/products/step-3-5-flash |

2026-03-06 | Trend-Tracker v7.3

One-line Verdict

Step 3.5 Flash is the 2026 benchmark for 'doing more with less' in the open-source community. With its extreme intelligence density and Apache 2.0 license, it has become a top choice for Agent development and local deployment.

FAQ

Frequently Asked Questions about Step 3.5 Flash

What is Step 3.5 Flash?
An ultra-efficient open-source LLM from StepFun, built on a MoE architecture and specifically optimized for Agents.

What are its main features?
LiveCodeBench 86.4 code reasoning, AIME 97.3 mathematical reasoning, a 256K context window, and tool calling optimized specifically for Agents.

How much does it cost?
Ultra-low API pricing ($0.10/M input tokens), free credits via OpenRouter, and completely free for self-hosting.

Who is it for?
AI Agent developers, cost-sensitive independent devs, privacy-conscious enterprises, and AI researchers.

What are the alternatives?
Gemini 3 Flash, DeepSeek V3.2, and Qwen 3.5.

Data source: ProductHunt
Last updated: Mar 6, 2026