Step 3.5 Flash: The Masterpiece of "Doing More with Less" in Open Source
2026-03-06 | ProductHunt | GitHub | Official Blog

StepFun's headline scatter plot is the most convincing evidence for Step 3.5 Flash: total parameters on the horizontal axis, comprehensive performance on the vertical. With 196B parameters (only 11B active), Step 3.5 Flash scores on par with 100B-class closed-source models, leading the pack in "intelligence density."
30-Second Quick Judgment
What is it: An open-source LLM released by StepFun that uses a MoE architecture to compress 196B parameters into 11B active parameters during runtime. It is specifically optimized for Agent scenarios and can run on a Mac Studio M4 Max.
Is it worth watching?: Yes. It is one of the most parameter-efficient players in the open-source field, scoring 97.3 on AIME 2025 math reasoning (tied for first with GLM-4.7) and 86.4 on LiveCodeBench. Add an Apache 2.0 license and a free API tier on OpenRouter, and if you are building Agents or want to escape API fees, this is a serious contender.
Three Questions That Matter
Is it relevant to me?
Target Users:
- AI Agent developers (who need reliable tool calling)
- Independent devs/startups sensitive to API costs
- Companies focused on data privacy wanting local deployment
- Open-source contributors and AI researchers
You are the target user if:
- Your monthly API bill exceeds $100 and you want a free alternative.
- You are building coding agents/automation tools and need fast inference.
- You don't want to send your code to third-party APIs and prefer running locally.
- You are researching MoE architectures or RL training methods.
Who is this NOT for?:
- Casual users just looking for chat/writing (Claude or GPT are more intuitive).
- Users without high-end hardware (M4 Max / DGX Spark); it won't run well.
- Those needing multimodal capabilities (image/video); Step 3.5 Flash is text-only.
Is it useful to me?
| Dimension | Benefit | Cost |
|---|---|---|
| Time | Agent task inference at 100-350 tok/s, several times faster than most local models | Setup and debugging takes half a day to a full day |
| Money | Zero API fees for self-hosting; free trial on OpenRouter | High hardware barrier: M4 Max ~$4000+ or DGX Spark |
| Effort | 256K context window handles long code files at once | Tool calling compatibility has some "gotchas" to navigate |
ROI Judgment: If you already have the right hardware or your team spends >$500/month on APIs, switching is a bargain. If you're just using Claude/GPT for small personal projects, it's not worth the hassle.
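The break-even math behind that judgment is simple. A sketch using this article's rough numbers (~$4000 for the hardware, the $500/month API threshold); both figures are approximations, not quotes:

```python
# Break-even on self-hosting: months until saved API fees cover the hardware.
# Figures are the rough numbers from this article, not measurements.
def breakeven_months(hardware_usd: float, monthly_api_usd: float) -> float:
    """Months of API spend needed to recoup a one-time hardware purchase."""
    return hardware_usd / monthly_api_usd

print(breakeven_months(4000, 500))  # -> 8.0   (heavy team: pays off fast)
print(breakeven_months(4000, 100))  # -> 40.0  (light user: not worth it)
```

At $500/month the hardware pays for itself inside a year; at $100/month it takes over three years, which is why casual users should stay on hosted APIs.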
Is it well-received?
The Highlights:
- Incredible Parameter Efficiency: Activating only 11B out of 196B parameters to match DeepSeek V3.2 (685B) provides that "punching above its weight" satisfaction.
- Truly Local: Tested at ~44 tok/s on Mac Ultra; MLX Q6.5 quantization maintains 96.95% token accuracy.
- Genuine Open Source: They aren't just releasing weights; the Steptron training framework, SFT data, and RLVR code are all planned for open source.
What users are saying:
"first local LLM in the 200B parameter range that is usable with a CLI harness, best experience with a local LLM doing agentic coding" -- HackerNews User
"pretty fast and smart enough to handle most things" -- @hung-truong Blog Review
"StepFun is further expanding the boundaries of open source. Besides the final and base models, they've open-sourced the Steptron training framework and base-midtrain models." -- @dddanielwang
The Complaints:
- When used with OpenClaw, it "seems to freeze up a lot and is generally unreliable" -- @hung-truong
- Tool calling isn't perfect out of the box and is incompatible with frameworks like Claude Code -- NVIDIA Developer Forums
For Independent Developers
Tech Stack
- Architecture: Sparse Mixture of Experts (MoE)
  - 196B total parameters, only 11B activated per token.
  - 288 routed experts per layer + 1 shared expert (always active).
  - Top-8 expert selection per token.
  - 45 layers, hidden size 4096, vocabulary of 128,896.
- Inference Acceleration: 3-way Multi-Token Prediction (MTP-3)
  - Uses MTP in both training and inference (rare).
  - Predicts 4 tokens per forward pass (1 standard + 3 speculative).
  - Real-world 100-300 tok/s, peaking at 350 tok/s on coding tasks.
- Context: 256K tokens, 3:1 Sliding Window Attention.
- Quantized Deployment: Supports GGUF/INT4; MLX Q6.5 runs on Mac Ultra.
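The routing scheme above can be sketched in a few lines. This is a toy illustration of sparse top-8 MoE routing with a shared expert, not StepFun's implementation: weights are random, each expert is a single matrix instead of a full FFN block, and the hidden size is shrunk from 4096 to 64 for readability.

```python
import numpy as np

# Toy sketch of the routing listed above: 288 routed experts plus one
# always-on shared expert, top-8 selection per token. Illustrative only.
HIDDEN = 64      # stand-in for the real hidden size of 4096
N_ROUTED = 288
TOP_K = 8

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

class SparseMoELayer:
    def __init__(self):
        self.router = rng.standard_normal((HIDDEN, N_ROUTED)) * 0.02
        # Real experts are full FFN blocks; one matrix each stands in here.
        self.experts = [rng.standard_normal((HIDDEN, HIDDEN)) * 0.02
                        for _ in range(N_ROUTED)]
        self.shared = rng.standard_normal((HIDDEN, HIDDEN)) * 0.02

    def forward(self, token):
        logits = token @ self.router              # score all 288 experts
        top = np.argsort(logits)[-TOP_K:]         # keep only the top 8
        weights = softmax(logits[top])            # renormalise over winners
        out = sum(w * (token @ self.experts[i])
                  for w, i in zip(weights, top))
        out += token @ self.shared                # shared expert always fires
        return out, top

layer = SparseMoELayer()
out, chosen = layer.forward(rng.standard_normal(HIDDEN))
print(f"{len(chosen)}/{N_ROUTED} experts active")  # 8/288 per token
```

Only 8 of 288 expert matrices touch any given token, which is the whole trick behind the 196B-total / 11B-active split: memory holds the full model, but compute scales with the active slice.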
Core Implementation
Step 3.5 Flash's core innovation is "intelligence density": a scalable RL framework that continuously self-improves the model's Agent capabilities. It integrates Python code execution into chain-of-thought reasoning, reportedly reaching 99.8 on AIME 2025 with execution enabled (versus the 97.3 pure-reasoning score cited above). It also features the DockSmith + Session-Router system, covering Agent scenarios across 50K environments, 15K repos, and 20+ programming languages.
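StepFun's exact execution protocol isn't described in this writeup, but the generic "code execution inside chain-of-thought" loop is simple: scan the model's reasoning for a fenced python block, run it, and feed the output back as an observation. A minimal sketch; `run_embedded_code` is a hypothetical helper, not part of any StepFun release:

```python
import contextlib
import io
import re

# Generic sketch of code execution inside a chain-of-thought trace. The model
# emits a fenced python block mid-reasoning; the harness runs it and returns
# the captured stdout as an observation for the next reasoning step.
TICKS = "`" * 3  # a literal triple-backtick fence, built programmatically
CODE_RE = re.compile(TICKS + r"python\n(.*?)" + TICKS, re.DOTALL)

def run_embedded_code(reasoning: str) -> str:
    """Run the first python block in a reasoning trace, return its stdout."""
    match = CODE_RE.search(reasoning)
    if not match:
        return ""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        # A real harness would sandbox this; bare exec() is illustration only.
        exec(match.group(1), {})
    return buf.getvalue().strip()

trace = f"""To count the primes below 20 I'll just compute it:
{TICKS}python
print(sum(all(n % d for d in range(2, n)) for n in range(2, 20)))
{TICKS}"""

observation = run_embedded_code(trace)
print(observation)  # -> 8 (primes: 2, 3, 5, 7, 11, 13, 17, 19)
```

The payoff is that arithmetic-heavy steps get offloaded to an interpreter instead of being hallucinated token by token, which is where the AIME gains come from.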
Open Source Status
- Is it open?: Yes, Apache 2.0 license (one of the most permissive commercial-friendly licenses).
- Depth of Open Source: Model weights + Steptron training framework + SFT data + RLVR + Eval (rolling out).
- GitHub: https://github.com/stepfun-ai/Step-3.5-Flash
- HuggingFace: https://huggingface.co/stepfun-ai/Step-3.5-Flash
- arXiv Paper: https://arxiv.org/html/2602.10604v1
- Build-it-yourself difficulty: Extremely high. Requires a 10k-GPU cluster + massive training data + MoE engineering expertise; estimated 50+ person-years.
Business Model
- Monetization: Free open-source model → Paid API platform → Enterprise deployment services.
- API Pricing: $0.10/M input tokens, $0.30/M output tokens (5x cheaper than Gemini 3.1 Flash-Lite).
- OpenRouter Free Trial: Currently offering free API quotas.
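At these rates the cost gap is easy to quantify. A worked sketch using the prices quoted in this article (Step's $0.10/$0.30 and the $0.50/$3.00 Gemini 3 Flash figure from the comparison table); the workload volumes are made up for illustration:

```python
# Cost comparison using prices quoted in this article (USD per 1M tokens).
# The workload volumes below are illustrative, not a benchmark.
PRICES = {
    "step-3.5-flash": (0.10, 0.30),   # (input, output)
    "gemini-3-flash": (0.50, 3.00),
}

def monthly_cost(model: str, input_m: float, output_m: float) -> float:
    """USD cost for input_m / output_m million tokens in a month."""
    p_in, p_out = PRICES[model]
    return input_m * p_in + output_m * p_out

# A busy agent workload: 500M input tokens, 100M output tokens per month.
step = monthly_cost("step-3.5-flash", 500, 100)    # 50 + 30   = 80.0
gemini = monthly_cost("gemini-3-flash", 500, 100)  # 250 + 300 = 550.0
print(step, gemini, f"{gemini / step:.1f}x")
```

Note the asymmetry: the output-token gap (10x) dominates for chatty agent loops, so the blended advantage lands between the 5x input and 10x output ratios.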
Big-Tech Risk
There is risk, but it's not fatal. Google Gemini 3 Flash ($0.50/$3.00) and GPT-5.3 Instant are direct competitors, but Step 3.5 Flash differentiates itself by being: 1) Fully open and locally deployable; 2) Extremely parameter-efficient; 3) Apache 2.0 unrestricted commercial use. However, open-source giants like Qwen 3.5 and DeepSeek V3.2 are also in the S-Tier, making for fierce competition.
For Product Managers
Pain Point Analysis
- Problem Solved: High cost of closed APIs + weak logic in open models + the need for fast, smart models in Agent scenarios.
- Severity: High frequency + essential need. Enterprises and devs spend hundreds to thousands monthly on APIs, and sending data to third parties poses privacy risks.
User Persona
- Core Users: AI Agent developers, startup CTOs, active open-source contributors.
- Extended Users: AI researchers (studying MoE and RL), cost-conscious SMEs.
- Use Cases: Code generation/review, automated testing, Agent orchestration, long document analysis.
Feature Breakdown
| Feature | Type | Description |
|---|---|---|
| Code Reasoning (LCB 86.4) | Core | Superior performance in coding tasks |
| Math Reasoning (AIME 97.3) | Core | Frontier logic and mathematical capabilities |
| Agent Tool Calling | Core | Optimized specifically for agentic workflows |
| 256K Long Context | Core | Handles large codebases/documents |
| Local Deployment | Nice-to-have | Requires high-end hardware |
| Deep Research | Nice-to-have | 65.27%, close to OpenAI/Gemini Deep Research |
Competitive Differentiation
| Dimension | Step 3.5 Flash | Gemini 3 Flash | DeepSeek V3.2 | Qwen 3.5 |
|---|---|---|---|---|
| Parameters | 196B/11B active | Closed | 685B/37B active | 397B |
| Open Source | Apache 2.0 | Closed | MIT | Open |
| API Price | $0.10/$0.30 | $0.50/$3.00 | Open/Self-host | Open/Self-host |
| Math (AIME) | 97.3 | - | 89.3 | - |
| Code (LCB) | 86.4 | - | - | 83.6 |
| Local Run | M4 Max capable | No | Needs larger gear | Needs larger gear |
Key Takeaways
- "Intelligence Density" Positioning: Don't compete on parameter count; compete on intelligence per parameter. This narrative is very clever.
- Differentiation through Openness: Open-sourcing the training recipe, not just weights, builds deep community trust.
- Bundling with Agent Platforms (OpenClaw): Joint promotion of model + platform creates a synergistic ecosystem.
For Tech Bloggers
Founder Story
- Founder: Jiang Daxin.
- Background: Former Microsoft Global VP after a long engineering career at the company.
- Motivation: ChatGPT's debut convinced him the AGI window had opened; he left Microsoft to found StepFun.
- All-Star Team:
  - Qi Yin (Co-founder of Megvii) joined as Chairman.
  - Zhang Xiangyu (Co-author of ResNet) as Chief Scientist.
  - Zhu Yibo (ex-Microsoft/ByteDance/Google) as CTO.
  - Jiao Binxing (ex-Microsoft Bing core team lead) as Head of Data.
- Mission: "Step into Intelligence, 10x every person's potential."
Controversies / Discussion Angles
- Benchmark Padding?: Some on HackerNews have questioned benchmark cherry-picking and called for third-party verification.
- Open vs. Closed War: Performance parity is expected in 2026 Q2; Step 3.5 Flash is a landmark event in this race.
- Chinese AI Going Global: StepFun plans a HK IPO with $700M in Series B+ funding; the international path for Chinese AI unicorns.
- The "Small Model, Big Wisdom" Route: Is 196B total with only 11B active the new direction for model development?
Hype Data
- ProductHunt: 101 votes.
- HackerNews: Active discussion threads.
- Twitter/X: Recommended by OpenRouter, discussed in the Chinese AI community, and featured by Turkish tech bloggers.
- Sebastian Raschka (renowned ML author) included it in "Top 10 Open-Source LLM Architectures of 2026."
- NVIDIA NIM and SiliconFlow platforms are already live.
Content Suggestions
- Angles to write: "Why you won't pay API fees for Agents in 2026" / "The frontier model that runs on a single Mac."
- Trend-jacking: The coming parity between open and closed source; Step 3.5 Flash is the poster child.
For Early Adopters
Pricing Analysis
| Tier | Price | Features | Is it enough? |
|---|---|---|---|
| Self-hosted | Free | All features | Requires high-end hardware |
| OpenRouter Free | $0 | Limited quota | Enough for testing |
| StepFun API | $0.10/$0.30 per 1M tokens | Full API | Extremely cheap |
Getting Started Guide
- Fastest Way: OpenRouter free API; register and use in 5 minutes.
- Local Deployment: Requires Mac Studio M4 Max (~150GB RAM) or NVIDIA DGX Spark.
- Agent Frameworks: OpenClaw has an official Cookbook, though stability is still being refined.
- Learning Curve: Low (API call) / Medium (Local deployment) / High (Fine-tuning/Dev).
- Steps:
  - Register at OpenRouter and get an API Key.
  - Configure your Agent framework (Base URL + Model ID: step-3.5-flash).
  - Set the context window to 256K.
  - For local runs, download the GGUF/MLX version from HuggingFace instead.
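The steps above map to a few lines of code. OpenRouter exposes an OpenAI-compatible chat-completions endpoint; the model ID below matches the free route in the resource links at the end of this piece, but verify it against the current listing, since IDs and free quotas change:

```python
import json
import os
import urllib.request

# Minimal OpenRouter call for the setup steps above. Assumes an
# OPENROUTER_API_KEY environment variable; the model ID is the free route
# listed in this article's resource links.
API_URL = "https://openrouter.ai/api/v1/chat/completions"
MODEL_ID = "stepfun/step-3.5-flash:free"

def build_request(prompt: str, model: str = MODEL_ID) -> dict:
    """Assemble an OpenAI-style chat-completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

def send(payload: dict) -> dict:
    """POST the payload to OpenRouter and return the parsed JSON reply."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

payload = build_request("Summarise this repo's README in three bullet points.")
# reply = send(payload)  # needs a valid key and network access, so not run here
print(payload["model"])
```

Any OpenAI-compatible client works the same way: point the base URL at openrouter.ai/api/v1 and pass the model ID.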
Pitfalls and Complaints
- Imperfect Tool Calling: Out-of-the-box incompatibility with many Agent frameworks (like Claude Code); requires manual tuning.
- Verbose Inference: Requires more tokens than Gemini 3.0 Pro to reach the same quality.
- Unstable in Non-Code Scenarios: Performance may fluctuate during distribution shifts.
- Poor OpenClaw Experience: Frequent freezes, though this might be an OpenClaw issue rather than the model itself.
Security and Privacy
- Data Storage: Self-hosting is completely local; data never leaves your machine.
- Open Audit: Code is fully open-source and can be audited by anyone.
- Apache 2.0: No usage restrictions; commercial-ready.
Alternatives
| Alternative | Advantage | Disadvantage |
|---|---|---|
| DeepSeek V3.2 | MIT license, all-rounder, larger community | 685B parameters are hardware-heavy |
| Qwen 3.5 | Top GPQA scores, strong coding | 397B parameters |
| Qwen3-Coder-Next | 3B active, ultra-efficient for code | Weak general capabilities |
| Gemini 3 Flash | No self-hosting needed, stable API | Closed source, ongoing costs |
For Investors
Market Analysis
- Sector Size: US open-source AI model market at $5.19B, global CAGR of 15.1%.
- LLM Market: North America projected to reach $105.5B by 2030.
- Trends: 63% of enterprises use open-source AI; open source accounts for 62.8% of models. Parity with closed source expected by Q2 2026.
- Drivers: API cost pressure, data privacy regulations, and efficiency gains (MoE architecture).
Competitive Landscape
| Tier | Players | Positioning |
|---|---|---|
| Top Closed | OpenAI (GPT-5), Google (Gemini 3), Anthropic (Claude) | Strongest performance, high cost |
| Top Open | DeepSeek, Qwen (Alibaba), GLM (Zhipu) | Large-parameter all-rounders |
| Efficiency School | StepFun (Step 3.5 Flash), Mistral, NVIDIA Nemotron | Parameter efficiency first |
| Vertical Open | Qwen3-Coder, Kimi K2 | Focused on code/verticals |
Timing Analysis
- Why Now?:
- MoE architecture has matured, enabling "Small Model, Big Wisdom."
- The open-closed gap has narrowed to 7 points (out of 100); parity is imminent.
- The Agent wave requires high-performance, low-cost inference engines.
- Consumer hardware (M4 Max/DGX Spark) is now capable of running 200B-class models.
- Tech Maturity: High. MoE + MTP + SWA are all proven technologies.
- Market Readiness: High. Enterprise anxiety over API costs and privacy is driving local deployment demand.
Team Background
- Founder: Jiang Daxin, former Microsoft Global VP.
- Chairman: Qi Yin, Co-founder of Megvii.
- Chief Scientist: Zhang Xiangyu, co-author of ResNet.
- Core Team: Senior engineers from Microsoft, ByteDance, and Google.
- Track Record: Already released 22 self-developed foundation models (including 16 multimodal models).
Funding Status
- Latest Round: Series B+, over 5 billion RMB (~$700M), Jan 2026.
- The largest single funding round in China's AI sector in the past 12 months.
- Investors: Tencent, Qiming Venture Partners, 5Y Capital, Shanghai State-owned Assets, China Life Private Equity, etc.
- IPO Plan: Planning a Hong Kong listing, expected to raise ~$500M (within 2026).
- Positioning: One of China's "Six AI Tigers."
Conclusion
In a nutshell: Step 3.5 Flash proves that "models don't need to be huge, just smart." Activating only 11B out of 196B parameters to hit S-Tier results makes it the benchmark for the 2026 open-source AI efficiency race.
| User Type | Recommendation |
|---|---|
| Developers | Worth Watching -- Apache 2.0 + High Performance + Free API. A great choice for Agent dev, but watch out for tool calling quirks. |
| Product Managers | Worth Studying -- The "intelligence density" positioning and deep open-source strategy are excellent case studies. |
| Bloggers | Great Content -- Chinese AI global expansion + Open vs Closed war + Founder story; plenty of angles. |
| Early Adopters | Try with Caution -- Start with the OpenRouter free API; local deployment requires high-end gear. |
| Investors | Watch the IPO -- $700M Series B+ and HK IPO plans make StepFun a key target in the Chinese AI race. |
Resource Links
| Resource | Link |
|---|---|
| Official Website | https://www.stepfun.com |
| GitHub | https://github.com/stepfun-ai/Step-3.5-Flash |
| HuggingFace | https://huggingface.co/stepfun-ai/Step-3.5-Flash |
| arXiv Paper | https://arxiv.org/html/2602.10604v1 |
| Official Blog | https://static.stepfun.com/blog/step-3.5-flash/ |
| API Platform | https://platform.stepfun.com |
| OpenRouter (Free) | https://openrouter.ai/stepfun/step-3.5-flash:free |
| NVIDIA NIM | https://build.nvidia.com/stepfun-ai/step-3.5-flash/modelcard |
| ProductHunt | https://www.producthunt.com/products/step-3-5-flash |
2026-03-06 | Trend-Tracker v7.3