Step 3.5 Flash: The Masterpiece of "Doing More with Less" in Open Source
2026-03-06 | ProductHunt | GitHub | Official Blog

StepFun's headline scatter plot is the most convincing evidence for Step 3.5 Flash: total parameters on the horizontal axis, comprehensive performance on the vertical. With 196B parameters (only 11B active), Step 3.5 Flash scores on par with 100B-class closed-source models, leading the pack in "intelligence density."
30-Second Quick Judgment
What is it: An open-source LLM released by StepFun that uses a MoE architecture to compress 196B parameters into 11B active parameters during runtime. It is specifically optimized for Agent scenarios and can run on a Mac Studio M4 Max.
Is it worth watching?: Yes. It is one of the most parameter-efficient players in the open-source field, scoring 97.3 on AIME 2025 math reasoning (tied for first with GLM-4.7) and 86.4 on LiveCodeBench. Add an Apache 2.0 license and a free API tier on OpenRouter, and if you are building Agents or want to escape API fees, this is a serious contender.
Three Questions That Matter
Is it relevant to me?
Target Users:
- AI Agent developers (who need reliable tool calling)
- Independent devs/startups sensitive to API costs
- Companies focused on data privacy wanting local deployment
- Open-source contributors and AI researchers
You are the target user if:
- Your monthly API bill exceeds $100 and you want a free alternative.
- You are building coding agents/automation tools and need fast inference.
- You don't want to send your code to third-party APIs and prefer running locally.
- You are researching MoE architectures or RL training methods.
Who is this NOT for?:
- Casual users just looking for chat/writing (Claude or GPT are more intuitive).
- Users without high-end hardware (M4 Max / DGX Spark); it won't run well.
- Those needing multimodal capabilities (image/video); Step 3.5 Flash is text-only.
Is it useful to me?
| Dimension | Benefit | Cost |
|---|---|---|
| Time | Agent task inference at 100-350 tok/s, several times faster than most local models | Setup and debugging takes half a day to a full day |
| Money | Zero API fees for self-hosting; free trial on OpenRouter | High hardware barrier: M4 Max ~$4000+ or DGX Spark |
| Effort | 256K context window handles long code files at once | Tool calling compatibility has some "gotchas" to navigate |
ROI Judgment: If you already have the right hardware or your team spends >$500/month on APIs, switching is a bargain. If you're just using Claude/GPT for small personal projects, it's not worth the hassle.
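The break-even math behind that judgment is simple. A sketch using this article's rough numbers (~$4000 for the hardware, the $500/month API threshold); both figures are approximations, not quotes:

```python
# Break-even on self-hosting: months until saved API fees cover the hardware.
# Figures are the rough numbers from this article, not measurements.
def breakeven_months(hardware_usd: float, monthly_api_usd: float) -> float:
    """Months of API spend needed to recoup a one-time hardware purchase."""
    return hardware_usd / monthly_api_usd

print(breakeven_months(4000, 500))  # -> 8.0   (heavy team: pays off fast)
print(breakeven_months(4000, 100))  # -> 40.0  (light user: not worth it)
```

At $500/month the hardware pays for itself inside a year; at $100/month it takes over three years, which is why casual users should stay on hosted APIs.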
Is it well-received?
The Highlights:
- Incredible Parameter Efficiency: Activating only 11B out of 196B parameters to match DeepSeek V3.2 (685B) provides that "punching above its weight" satisfaction.
- Truly Local: Tested at ~44 tok/s on Mac Ultra; MLX Q6.5 quantization maintains 96.95% token accuracy.
- Genuine Open Source: They aren't just releasing weights; the Steptron training framework, SFT data, and RLVR code are all planned for open source.
What users are saying:
"first local LLM in the 200B parameter range that is usable with a CLI harness, best experience with a local LLM doing agentic coding" -- HackerNews User
"pretty fast and smart enough to handle most things" -- @hung-truong Blog Review
"StepFun is further expanding the boundaries of open source. Besides the final and base models, they've open-sourced the Steptron training framework and base-midtrain models." -- @dddanielwang
The Complaints:
- When used with OpenClaw, it "seems to freeze up a lot and is generally unreliable" -- @hung-truong
- Tool calling isn't perfect out of the box and is incompatible with frameworks like Claude Code -- NVIDIA Developer Forums
For Independent Developers
Tech Stack
- Architecture: Sparse Mixture of Experts (MoE)
  - 196B total parameters, only 11B activated per token.
  - 288 routed experts per layer + 1 shared expert (always active).
  - Top-8 expert selection per token.
  - 45 layers, hidden size 4096, vocabulary of 128,896.
- Inference Acceleration: 3-way Multi-Token Prediction (MTP-3)
  - Uses MTP in both training and inference (rare).
  - Predicts 4 tokens per forward pass (1 standard + 3 speculative).
  - Real-world 100-300 tok/s, peaking at 350 tok/s on coding tasks.
- Context: 256K tokens, 3:1 Sliding Window Attention.
- Quantized Deployment: Supports GGUF/INT4; MLX Q6.5 runs on Mac Ultra.
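The routing scheme above can be sketched in a few lines. This is a toy illustration of sparse top-8 MoE routing with a shared expert, not StepFun's implementation: weights are random, each expert is a single matrix instead of a full FFN block, and the hidden size is shrunk from 4096 to 64 for readability.

```python
import numpy as np

# Toy sketch of the routing listed above: 288 routed experts plus one
# always-on shared expert, top-8 selection per token. Illustrative only.
HIDDEN = 64      # stand-in for the real hidden size of 4096
N_ROUTED = 288
TOP_K = 8

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

class SparseMoELayer:
    def __init__(self):
        self.router = rng.standard_normal((HIDDEN, N_ROUTED)) * 0.02
        # Real experts are full FFN blocks; one matrix each stands in here.
        self.experts = [rng.standard_normal((HIDDEN, HIDDEN)) * 0.02
                        for _ in range(N_ROUTED)]
        self.shared = rng.standard_normal((HIDDEN, HIDDEN)) * 0.02

    def forward(self, token):
        logits = token @ self.router              # score all 288 experts
        top = np.argsort(logits)[-TOP_K:]         # keep only the top 8
        weights = softmax(logits[top])            # renormalise over winners
        out = sum(w * (token @ self.experts[i])
                  for w, i in zip(weights, top))
        out += token @ self.shared                # shared expert always fires
        return out, top

layer = SparseMoELayer()
out, chosen = layer.forward(rng.standard_normal(HIDDEN))
print(f"{len(chosen)}/{N_ROUTED} experts active")  # 8/288 per token
```

Only 8 of 288 expert matrices touch any given token, which is the whole trick behind the 196B-total / 11B-active split: memory holds the full model, but compute scales with the active slice.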
Core Implementation
Step 3.5 Flash's core innovation is "intelligence density": a scalable RL framework that continuously self-improves the model's Agent capabilities. It integrates Python code execution into chain-of-thought reasoning, reportedly reaching 99.8 on AIME 2025 with execution enabled (versus the 97.3 pure-reasoning score cited above). It also features the DockSmith + Session-Router system, covering Agent scenarios across 50K environments, 15K repos, and 20+ programming languages.
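StepFun's exact execution protocol isn't described in this writeup, but the generic "code execution inside chain-of-thought" loop is simple: scan the model's reasoning for a fenced python block, run it, and feed the output back as an observation. A minimal sketch; `run_embedded_code` is a hypothetical helper, not part of any StepFun release:

```python
import contextlib
import io
import re

# Generic sketch of code execution inside a chain-of-thought trace. The model
# emits a fenced python block mid-reasoning; the harness runs it and returns
# the captured stdout as an observation for the next reasoning step.
TICKS = "`" * 3  # a literal triple-backtick fence, built programmatically
CODE_RE = re.compile(TICKS + r"python\n(.*?)" + TICKS, re.DOTALL)

def run_embedded_code(reasoning: str) -> str:
    """Run the first python block in a reasoning trace, return its stdout."""
    match = CODE_RE.search(reasoning)
    if not match:
        return ""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        # A real harness would sandbox this; bare exec() is illustration only.
        exec(match.group(1), {})
    return buf.getvalue().strip()

trace = f"""To count the primes below 20 I'll just compute it:
{TICKS}python
print(sum(all(n % d for d in range(2, n)) for n in range(2, 20)))
{TICKS}"""

observation = run_embedded_code(trace)
print(observation)  # -> 8 (primes: 2, 3, 5, 7, 11, 13, 17, 19)
```

The payoff is that arithmetic-heavy steps get offloaded to an interpreter instead of being hallucinated token by token, which is where the AIME gains come from.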
Open Source Status
- Is it open?: Yes, Apache 2.0 license (one of the most permissive commercial-friendly licenses).
- Depth of Open Source: Model weights + Steptron training framework + SFT data + RLVR + Eval (rolling out).
- GitHub: https://github.com/stepfun-ai/Step-3.5-Flash
- HuggingFace: https://huggingface.co/stepfun-ai/Step-3.5-Flash
- arXiv Paper: https://arxiv.org/html/2602.10604v1
- Build-it-yourself difficulty: Extremely high. Requires a 10k-GPU cluster + massive training data + MoE engineering expertise; estimated 50+ person-years.
Business Model
- Monetization: Free open-source model → Paid API platform → Enterprise deployment services.
- API Pricing: $0.10/M input tokens, $0.30/M output tokens (5x cheaper than Gemini 3.1 Flash-Lite).
- OpenRouter Free Trial: Currently offering free API quotas.
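At these rates the cost gap is easy to quantify. A worked sketch using the prices quoted in this article (Step's $0.10/$0.30 and the $0.50/$3.00 Gemini 3 Flash figure from the comparison table); the workload volumes are made up for illustration:

```python
# Cost comparison using prices quoted in this article (USD per 1M tokens).
# The workload volumes below are illustrative, not a benchmark.
PRICES = {
    "step-3.5-flash": (0.10, 0.30),   # (input, output)
    "gemini-3-flash": (0.50, 3.00),
}

def monthly_cost(model: str, input_m: float, output_m: float) -> float:
    """USD cost for input_m / output_m million tokens in a month."""
    p_in, p_out = PRICES[model]
    return input_m * p_in + output_m * p_out

# A busy agent workload: 500M input tokens, 100M output tokens per month.
step = monthly_cost("step-3.5-flash", 500, 100)    # 50 + 30   = 80.0
gemini = monthly_cost("gemini-3-flash", 500, 100)  # 250 + 300 = 550.0
print(step, gemini, f"{gemini / step:.1f}x")
```

Note the asymmetry: the output-token gap (10x) dominates for chatty agent loops, so the blended advantage lands between the 5x input and 10x output ratios.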
Big-Tech Risk
There is risk, but it's not fatal. Google Gemini 3 Flash ($0.50/$3.00) and GPT-5.3 Instant are direct competitors, but Step 3.5 Flash differentiates itself by being: 1) Fully open and locally deployable; 2) Extremely parameter-efficient; 3) Apache 2.0 unrestricted commercial use. However, open-source giants like Qwen 3.5 and DeepSeek V3.2 are also in the S-Tier, making for fierce competition.
For Product Managers
Pain Point Analysis
- Problem Solved: High cost of closed APIs + weak logic in open models + the need for fast, smart models in Agent scenarios.
- Severity: High frequency + essential need. Enterprises and devs spend hundreds to thousands monthly on APIs, and sending data to third parties poses privacy risks.
User Persona
- Core Users: AI Agent developers, startup CTOs, active open-source contributors.
- Extended Users: AI researchers (studying MoE and RL), cost-conscious SMEs.
- Use Cases: Code generation/review, automated testing, Agent orchestration, long document analysis.
Feature Breakdown
| Feature | Type | Description |
|---|---|---|
| Code Reasoning (LCB 86.4) | Core | Superior performance in coding tasks |
| Math Reasoning (AIME 97.3) | Core | Frontier logic and mathematical capabilities |
| Agent Tool Calling | Core | Optimized specifically for agentic workflows |
| 256K Long Context | Core | Handles large codebases/documents |
| Local Deployment | Nice-to-have | Requires high-end hardware |
| Deep Research | Nice-to-have | 65.27%, close to OpenAI/Gemini Deep Research |
Competitive Differentiation
| Dimension | Step 3.5 Flash | Gemini 3 Flash | DeepSeek V3.2 | Qwen 3.5 |
|---|---|---|---|---|
| Parameters | 196B/11B active | Closed | 685B/37B active | 397B |
| Open Source | Apache 2.0 | Closed | MIT | Open |
| API Price | $0.10/$0.30 | $0.50/$3.00 | Open/Self-host | Open/Self-host |
| Math (AIME) | 97.3 | - | 89.3 | - |
| Code (LCB) | 86.4 | - | - | 83.6 |
| Local Run | M4 Max capable | No | Needs larger gear | Needs larger gear |
Key Takeaways
- "Intelligence Density" Positioning: Don't compete on parameter count; compete on intelligence per parameter. This narrative is very clever.
- Differentiation through Openness: Open-sourcing the training recipe, not just weights, builds deep community trust.
- Bundling with Agent Platforms (OpenClaw): Joint promotion of model + platform creates a synergistic ecosystem.
For Tech Bloggers
Founder Story
- Founder: Jiang Daxin.
- Background: Former Microsoft Global VP after a long engineering career at the company.
- Motivation: ChatGPT's debut convinced him the AGI window had opened; he left Microsoft to found StepFun.
- All-Star Team:
  - Qi Yin (Co-founder of Megvii) joined as Chairman.
  - Zhang Xiangyu (Co-author of ResNet) as Chief Scientist.
  - Zhu Yibo (ex-Microsoft/ByteDance/Google) as CTO.
  - Jiao Binxing (ex-Microsoft Bing core team lead) as Head of Data.
- Mission: "Step into Intelligence, 10x every person's potential."
Controversies / Discussion Angles
- Benchmark Padding?: Some on HackerNews have questioned benchmark cherry-picking and called for third-party verification.
- Open vs. Closed War: Performance parity is expected in 2026 Q2; Step 3.5 Flash is a landmark event in this race.
- Chinese AI Going Global: StepFun plans a HK IPO with $700M in Series B+ funding; the international path for Chinese AI unicorns.
- The "Small Model, Big Wisdom" Route: Is 196B total with only 11B active the new direction for model development?
Hype Data
- ProductHunt: 101 votes.
- HackerNews: Active discussion threads.
- Twitter/X: Recommended by OpenRouter, discussed in the Chinese AI community, and featured by Turkish tech bloggers.
- Sebastian Raschka (renowned ML author) included it in "Top 10 Open-Source LLM Architectures of 2026."
- NVIDIA NIM and SiliconFlow platforms are already live.
Content Suggestions
- Angles to write: "Why you won't pay API fees for Agents in 2026" / "The frontier model that runs on a single Mac."
- Trend-jacking: The coming parity between open and closed source; Step 3.5 Flash is the poster child.
For Early Adopters
Pricing Analysis
| Tier | Price | Features | Is it enough? |
|---|---|---|---|
| Self-hosted | Free | All features | Requires high-end hardware |
| OpenRouter Free | $0 | Limited quota | Enough for testing |
| StepFun API | $0.10/$0.30 per 1M tokens | Full API | Extremely cheap |
Getting Started Guide
- Fastest Way: OpenRouter free API; register and use in 5 minutes.
- Local Deployment: Requires Mac Studio M4 Max (~150GB RAM) or NVIDIA DGX Spark.
- Agent Frameworks: OpenClaw has an official Cookbook, though stability is still being refined.
- Learning Curve: Low (API call) / Medium (Local deployment) / High (Fine-tuning/Dev).
- Steps:
  - Register at OpenRouter and get an API Key.
  - Configure your Agent framework (Base URL + Model ID: step-3.5-flash).
  - Set the context window to 256K.
  - For local runs, download the GGUF/MLX version from HuggingFace instead.
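The steps above map to a few lines of code. OpenRouter exposes an OpenAI-compatible chat-completions endpoint; the model ID below matches the free route in the resource links at the end of this piece, but verify it against the current listing, since IDs and free quotas change:

```python
import json
import os
import urllib.request

# Minimal OpenRouter call for the setup steps above. Assumes an
# OPENROUTER_API_KEY environment variable; the model ID is the free route
# listed in this article's resource links.
API_URL = "https://openrouter.ai/api/v1/chat/completions"
MODEL_ID = "stepfun/step-3.5-flash:free"

def build_request(prompt: str, model: str = MODEL_ID) -> dict:
    """Assemble an OpenAI-style chat-completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

def send(payload: dict) -> dict:
    """POST the payload to OpenRouter and return the parsed JSON reply."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

payload = build_request("Summarise this repo's README in three bullet points.")
# reply = send(payload)  # needs a valid key and network access, so not run here
print(payload["model"])
```

Any OpenAI-compatible client works the same way: point the base URL at openrouter.ai/api/v1 and pass the model ID.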
Pitfalls and Complaints
- Imperfect Tool Calling: Out-of-the-box incompatibility with many Agent frameworks (like Claude Code); requires manual tuning.
- Verbose Inference: Requires more tokens than Gemini 3.0 Pro to reach the same quality.
- Unstable in Non-Code Scenarios: Performance may fluctuate during distribution shifts.
- Poor OpenClaw Experience: Frequent freezes, though this might be an OpenClaw issue rather than the model itself.
Security and Privacy
- Data Storage: Self-hosting is completely local; data never leaves your machine.
- Open Audit: Code is fully open-source and can be audited by anyone.
- Apache 2.0: No usage restrictions; commercial-ready.
Alternatives
| Alternative | Advantage | Disadvantage |
|---|---|---|
| DeepSeek V3.2 | MIT license, all-rounder, larger community | 685B parameters are hardware-heavy |
| Qwen 3.5 | Top GPQA scores, strong coding | 397B parameters |
| Qwen3-Coder-Next | 3B active, ultra-efficient for code | Weak general capabilities |
| Gemini 3 Flash | No self-hosting needed, stable API | Closed source, ongoing costs |
For Investors
Market Analysis
- Sector Size: US open-source AI model market at $5.19B, global CAGR of 15.1%.
- LLM Market: North America projected to reach $105.5B by 2030.
- Trends: 63% of enterprises use open-source AI; open source accounts for 62.8% of models. Parity with closed source expected by Q2 2026.
- Drivers: API cost pressure, data privacy regulations, and efficiency gains (MoE architecture).
Competitive Landscape
| Tier | Players | Positioning |
|---|---|---|
| Top Closed | OpenAI (GPT-5), Google (Gemini 3), Anthropic (Claude) | Strongest performance, high cost |
| Top Open | DeepSeek, Qwen (Alibaba), GLM (Zhipu) | Large-parameter all-rounders |
| Efficiency School | StepFun (Step 3.5 Flash), Mistral, NVIDIA Nemotron | Parameter efficiency first |
| Vertical Open | Qwen3-Coder, Kimi K2 | Focused on code/verticals |
Timing Analysis
- Why Now?:
- MoE architecture has matured, enabling "Small Model, Big Wisdom."
- The open-closed gap has narrowed to 7 points (out of 100); parity is imminent.
- The Agent wave requires high-performance, low-cost inference engines.
- Consumer hardware (M4 Max/DGX Spark) is now capable of running 200B-class models.
- Tech Maturity: High. MoE + MTP + SWA are all proven technologies.
- Market Readiness: High. Enterprise anxiety over API costs and privacy is driving local deployment demand.
Team Background
- Founder: Jiang Daxin, former Microsoft Global VP.
- Chairman: Qi Yin, Co-founder of Megvii.
- Chief Scientist: Zhang Xiangyu, co-author of ResNet.
- Core Team: Senior engineers from Microsoft, ByteDance, and Google.
- Track Record: Already released 22 self-developed foundation models (including 16 multimodal models).
Funding Status
- Latest Round: Series B+, over 5 billion RMB (~$700M), Jan 2026.
- The largest single funding round in China's AI sector in the past 12 months.
- Investors: Tencent, Qiming Venture Partners, 5Y Capital, Shanghai State-owned Assets, China Life Private Equity, etc.
- IPO Plan: Planning a Hong Kong listing, expected to raise ~$500M (within 2026).
- Positioning: One of China's "Six AI Tigers."
Conclusion
In a nutshell: Step 3.5 Flash proves that "models don't need to be huge, just smart." Activating only 11B out of 196B parameters to hit S-Tier results makes it the benchmark for the 2026 open-source AI efficiency race.
| User Type | Recommendation |
|---|---|
| Developers | Worth Watching -- Apache 2.0 + High Performance + Free API. A great choice for Agent dev, but watch out for tool calling quirks. |
| Product Managers | Worth Studying -- The "intelligence density" positioning and deep open-source strategy are excellent case studies. |
| Bloggers | Great Content -- Chinese AI global expansion + Open vs Closed war + Founder story; plenty of angles. |
| Early Adopters | Try with Caution -- Start with the OpenRouter free API; local deployment requires high-end gear. |
| Investors | Watch the IPO -- $700M Series B+ and HK IPO plans make StepFun a key target in the Chinese AI race. |
Resource Links
| Resource | Link |
|---|---|
| Official Website | https://www.stepfun.com |
| GitHub | https://github.com/stepfun-ai/Step-3.5-Flash |
| HuggingFace | https://huggingface.co/stepfun-ai/Step-3.5-Flash |
| arXiv Paper | https://arxiv.org/html/2602.10604v1 |
| Official Blog | https://static.stepfun.com/blog/step-3.5-flash/ |
| API Platform | https://platform.stepfun.com |
| OpenRouter (Free) | https://openrouter.ai/stepfun/step-3.5-flash:free |
| NVIDIA NIM | https://build.nvidia.com/stepfun-ai/step-3.5-flash/modelcard |
| ProductHunt | https://www.producthunt.com/products/step-3-5-flash |
2026-03-06 | Trend-Tracker v7.3