Qwen3.5: The "Price Assassin" of Open-Source LLMs is Here
2026-02-17 | ProductHunt | GitHub | PH 151 Votes
30-Second Quick Judgment
What is it?: An open-source LLM by Alibaba Cloud with 397B parameters, but only 17B are active at any time (like a team of 512 experts where only 11 are assigned to each question). It can see images, watch videos, and even operate a computer desktop. Apache 2.0 license, free for commercial use.
Is it worth your attention?: Absolutely. If you're developing with GPT-4 or Claude, Qwen3.5's API price is 1/5 to 1/37 of theirs. If you have the GPU resources, you can download and run it for free. This isn't just "another Chinese model"; it actually beats GPT-5.2 and Claude Opus 4.5 on several benchmarks.
Three Questions for Me
Is it relevant to me?
Target Users:
- AI App Developers (need cheap, high-quality model APIs)
- Enterprise IT Teams (want to self-host models to keep data private)
- Multi-language Scenarios (supports 201 languages, exceptionally strong in Chinese)
- Agent/Automation Developers (native support for tool calling and desktop control)
Am I the target?: You are if any of the following apply:
- You use OpenAI/Anthropic APIs but find the monthly bill too high.
- You want to build AI Agents that can operate a computer to complete tasks.
- You build multi-language products and need a model strong in both English and Chinese.
- You have GPU servers and want to run an unrestricted open-source model.
When would I use it?:
- Code generation and refactoring -> Use this (LiveCodeBench score of 83.6, competitive-programming level).
- Long document analysis and summarization -> Use this (1 million token context window).
- Operating desktop software for you -> Use this (native Visual Agent capabilities).
- Need extremely stable production environment debugging -> Consider Claude (Qwen's debugging is still slightly less stable).
Is it useful to me?
| Dimension | Benefit | Cost |
|---|---|---|
| Time | Multimodal + Agent in one model; no need to stitch multiple APIs. | Learning a new API format (though it's OpenAI-compatible, so cost is low). |
| Money | API price ~$0.40/1M tokens, 5x cheaper than GPT-4.1; self-hosting is free. | Self-hosting requires 3-4 80GB GPUs (~$15K hardware). |
| Effort | Open source + Apache 2.0; modify it however you want without permission. | Confusing naming (3.5/Plus/Max); need to figure out which one to pick. |
ROI Judgment: If your monthly API spend exceeds $100, switching to Qwen3.5-Plus can save you 60-80% immediately. If you have idle GPU servers, the ROI of self-hosting is nearly infinite. The learning curve is minimal because it's compatible with the OpenAI API format—just change the base_url.
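The savings math above is simple enough to verify directly. This is a hypothetical back-of-envelope comparison using the per-million-token prices quoted in this article; it ignores output-token pricing and tiered discounts.

```python
# Hypothetical monthly-spend comparison at the article's quoted rates:
# $0.40/1M input tokens (Qwen3.5-Plus) vs $2.00/1M (GPT-4.1).
def monthly_cost(tokens_millions: float, price_per_million: float) -> float:
    return tokens_millions * price_per_million

gpt = monthly_cost(100, 2.00)    # 100M tokens/month on GPT-4.1
qwen = monthly_cost(100, 0.40)   # same volume on Qwen3.5-Plus
savings = 1 - qwen / gpt

print(f"${gpt:.0f} -> ${qwen:.0f} ({savings:.0%} saved)")  # $200 -> $40 (80% saved)
```

At identical input volume the saving is a flat 80%; the 60-80% range in the text reflects that real bills mix input and output tokens at different rates.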
Is it exciting?
The "Wow" Factors:
- Price Assassin: $0.40 vs. Claude's $15. Get the same job done for 1/37th of the cost.
- 1 Million Token Context: Throw an entire codebase in and ask everything at once.
- Visual Agent: Give it a desktop screenshot, and it can plan and execute steps—an open-source alternative to Claude's Computer Use.
- Unrestricted Open Source: Apache 2.0. Change it, sell it, do whatever you want.
Real User Feedback:
"A flagship open-weight model. It's particularly strong in search, synthesis, low hallucination, and handling long context." — Latent Space
"If you're building real systems, you care about three things: capability, iteration cost, and how often the model makes you say 'Why did you do that?'" — AnalyticsVidhya Test
"Great at writing new code, but prone to errors when debugging or modifying existing code." — Reddit Developer Community
For Independent Developers
Tech Stack
This is the most hardcore part of the Qwen3.5 architecture:
- Core Architecture: Sparse MoE (Mixture-of-Experts) with 512 experts; only 10 routed experts + 1 shared expert are activated per token.
- Attention Layers: Gated Delta Networks (Linear Attention) replace standard attention in 75% of the layers. The 60-layer stack follows a pattern: 3x(GDN->MoE) -> 1x(GatedAttention->MoE).
- Multimodal: Native early fusion, not a late-stage adapter. Uses DeepStack Vision Transformer + Conv3d for video understanding.
- Inference Acceleration: Built-in Multi-Token Prediction (MTP) for out-of-the-box speculative decoding.
- Vocabulary: 250K vocabulary (up from 152K), making Chinese, math, and code tokens more compact, saving 15-25% on token costs.
In short, its core innovation is using "Linear Attention + a massive expert pool" to trade for inference efficiency. 397B parameters sound intimidating, but since it only runs 17B per token, it's 8-19x faster than dense models of the same size.
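The routing idea is easy to see in miniature. This is an illustrative toy sketch of top-k expert routing, not Qwen's actual implementation: a router scores all 512 experts for each token, keeps the 10 highest, and a shared expert always fires, so only ~11 of 512 expert FFNs run per token.

```python
import math
import random

# Toy sketch of sparse MoE routing (illustrative only, not Qwen's code).
NUM_EXPERTS, TOP_K = 512, 10
random.seed(0)

# Router produces one logit per expert for the current token.
router_logits = [random.gauss(0, 1) for _ in range(NUM_EXPERTS)]

# Keep only the top-10 experts by logit.
top = sorted(range(NUM_EXPERTS), key=lambda i: router_logits[i])[-TOP_K:]

# Softmax over just the selected logits gives the mixing weights.
exp_logits = [math.exp(router_logits[i]) for i in top]
weights = [e / sum(exp_logits) for e in exp_logits]

active = TOP_K + 1  # + the always-on shared expert
print(f"{active}/{NUM_EXPERTS} experts active, weights sum = {sum(weights):.2f}")
```

Only the selected experts' FFN weights are read for that token, which is why a 397B-parameter model can have the per-token compute of a ~17B dense model.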
Core Function Implementation
Visual Agent Workflow:
- Receives a desktop/mobile screenshot.
- Identifies UI elements (buttons, input boxes, menus, etc.).
- Plans a multi-step operation flow.
- Generates executable commands.
- Built-in tool calling: Web search, code execution, external APIs.
This is very similar to Anthropic’s Computer Use, but it's open source. You can build it quickly using the Qwen-Agent framework, which supports Function Calling, MCP, Code Interpreter, and RAG.
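The perceive-plan-act loop above can be sketched in a few lines. Every function name here is a hypothetical stand-in, not a Qwen-Agent API; in the real system the model itself performs the detection and planning from pixels.

```python
# Minimal sketch of the screenshot -> identify -> plan -> execute loop.
# All names are hypothetical stubs, not Qwen-Agent or Qwen3.5 APIs.
from dataclasses import dataclass

@dataclass
class UIElement:
    kind: str            # "button", "input", "menu", ...
    label: str
    xy: tuple            # screen coordinates

def detect_elements(screenshot_desc: str) -> list[UIElement]:
    # The model does this from raw pixels; we stub it with fixed elements.
    return [UIElement("input", "search box", (120, 40)),
            UIElement("button", "Search", (300, 40))]

def plan(goal: str, elements: list[UIElement]) -> list[tuple]:
    # The model would emit a multi-step plan; hard-coded for the demo.
    return [("type", "search box", goal), ("click", "Search", None)]

def execute(step: tuple) -> str:
    action, target, payload = step
    if payload is None:
        return f"{action}({target!r})"
    return f"{action}({target!r}, {payload!r})"

goal = "weather in Hangzhou"
commands = [execute(s) for s in plan(goal, detect_elements("desktop"))]
print(commands)  # ["type('search box', 'weather in Hangzhou')", "click('Search')"]
```

The real loop also feeds the post-action screenshot back in, so the model can verify each step before planning the next.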
Open Source Status
- Is it open?: Yes, Apache 2.0, commercial use allowed.
- GitHub: QwenLM/Qwen3.5
- Hugging Face: Qwen/Qwen3.5-397B-A17B
- Ecosystem Scale: Over 170,000 derivative models and 600M+ downloads.
- Similar Projects: DeepSeek V3.2 (MIT License), Llama 4 Maverick (Llama License).
- Difficulty to DIY: Extremely high. Requires a 10,000-GPU cluster + trillions of tokens + millions of agent environments for RL. This isn't something an individual can build from scratch.
Business Model
- Monetization: Open source drives traffic + Cloud API fees (Alibaba Cloud Model Studio).
- Pricing: Qwen3.5-Plus API is ~$0.40/1M input tokens.
- Comparison: GPT-4.1 is $2.00, Claude Opus is $15.00.
- Enterprise Adoption: Attracted over 90,000 enterprises in one year.
- Core Strategy: Using the model to drive Alibaba Cloud's overall business; the model itself doesn't necessarily need to be the profit center.
Giant Risk
Qwen3.5 is made by a giant (Alibaba). The real question is: Will your app be crushed by Alibaba itself?
It depends on what you build. If you're making a general AI assistant, Alibaba's Tongyi Qianwen will likely dominate. But if you focus on vertical niches (Legal, Medical, Finance), Alibaba is unlikely to go that deep. The open-source license guarantees your freedom—you can fine-tune a proprietary model, something you can't do with closed APIs.
For Product Managers
Pain Point Analysis
- What it solves: Enterprises want to use LLMs for automation but face three hurdles: expensive APIs, data privacy concerns, and fragmented multimodal capabilities.
- How painful is it?: High-frequency demand. By 2026, 80% of enterprises will deploy GenAI, but many are stuck on cost and security.
User Persona
- AI App Teams: Need cheap, reliable APIs for high-frequency calls.
- Enterprise IT: Sensitive data cannot leave the premises; requires private deployment.
- Global Teams: Need support for 201 languages.
- Automation Engineers: Want AI to operate software to complete complex workflows.
Feature Breakdown
| Feature | Type | Description |
|---|---|---|
| Text Reasoning (Code/Math/Logic) | Core | LiveCodeBench 83.6, AIME 91.3 |
| Native Multimodal (Image/Video) | Core | Fused from pre-training, not stitched later |
| Visual Agent (Desktop/Mobile) | Core | Open-source alternative to Computer Use |
| 1M Token Context | Core | Supported by default in the Plus version |
| 201 Languages | Nice-to-have | 69% increase over previous gen; great for global products |
| Thinking/Non-Thinking Modes | Nice-to-have | Deep thought for complex issues, instant replies for simple ones |
| Tool Calling/MCP/RAG | Core | Fully supported by the Qwen-Agent framework |
Competitor Differentiation
| vs | Qwen3.5 | GPT-5.2 | Claude Opus 4.5 | Gemini 3 Flash |
|---|---|---|---|---|
| Core Difference | Open Source + MoE Efficiency | Closed Source All-rounder | Highest Reliability | Price Competitive |
| Price/1M Tokens | $0.40 | Unannounced (High) | $15.00 | $0.40 |
| Open Source | Apache 2.0 | No | No | No |
| Multimodal | Native Fusion | Native | Native | Native |
| Agent Capability | Visual Agent | Strong | Computer Use | Average |
| Context | 256K / 1M | 128K | 200K | 1M |
| Chinese Capability | Extremely Strong | Strong | Strong | Strong |
Key Takeaways
- Dual-Track Strategy: Use Apache 2.0 to attract developers (600M downloads), then monetize via the Plus API. This is smarter than being purely closed or purely open.
- MoE Optimization: 397B params with only 17B active. It unifies "comprehensive" and "efficient" through architecture. Product lesson: More features don't mean they all need to load at once.
- Native Multimodal: Don't stitch things together after the fact. Fusion from day one leads to a better experience. Product lesson: Core capabilities should be designed at the architectural level, not as patches.
For Tech Bloggers
Founder Story
The key figure behind Qwen is Jingren Zhou, CTO of Alibaba Cloud.
His resume is impressive: PhD in CS from Columbia, 11 years at Microsoft (Bing infrastructure architect), joined Alibaba in 2015. In 2021, he led the team that scaled the M6 model to 10 trillion parameters—the world's largest at the time—using only 512 GPUs for 10 days.
This achievement laid the technical foundation for Qwen. By December 2025, Zhou was promoted to Alibaba Group Partner, placing him in the core decision-making circle. Notably, Jack Ma, retired for 6 years, has begun receiving regular briefings from Zhou—indicating Qwen is a group-level strategic priority.
Another person to watch is Junyang Lin, a core Qwen researcher who is very active on X (Twitter), explaining naming logic and technical details as the team's public technical voice.
Controversies / Discussion Angles
- The Naming Mess: From Qwen3 to Qwen3-Next to Qwen3.5, the community is confused. Even Lin admitted "Qwen3.5-Preview" was awkward, making people wonder, "+0.5 then -0.4?"
- Benchmark Skepticism: CNBC noted that Alibaba's claims of surpassing GPT-5.2 "cannot be independently verified." This is a classic AI problem—every model claims to be the best, but real-world performance varies.
- A New Chapter in US-China AI: In the same week Qwen3.5 launched, ByteDance released Doubao 2.0 and DeepSeek teased a new model. Chinese AI is no longer just "catching up"; it's leading in certain open-source directions.
- The Open Source "Gambit": Alibaba open-sourcing a top-tier model under Apache 2.0 seems altruistic, but it's actually a way to lock developers into the Alibaba Cloud ecosystem. Clever, and worth a debate.
Hype Data
- ProductHunt: 151 Votes
- Media Coverage: Major reports from CNBC, VentureBeat, ComputerWorld, eWeek, and Silicon Republic.
- Hardware Ecosystem: Day 0 GPU support from AMD; featured technical blog from NVIDIA.
- Open Source Ecosystem: 600M+ downloads, 170,000+ derivative models.
Content Suggestions
- Angle: "How Chinese Open-Source AI is Redefining the Price War" — Focus on the $0.40 vs. $15 price gap.
- Trend Jacking: Compare it with Anthropic’s latest Computer Use update: "Open Source vs. Closed Source Visual Agents."
- Deep Dive: What is Gated Delta Networks? How Linear Attention makes a 1-million-token context actually usable.
For Early Adopters
Pricing Analysis
| Tier | Price | Features | Is it enough? |
|---|---|---|---|
| Open Source (Self-host) | Free | 256K context, full 397B model | Enough if you have the GPUs. |
| Qwen3.5-Plus API | ~$0.40/1M input tokens | 1M context, tool calling, multimodal | Enough for 95% of use cases. |
| Qwen3-Max-Thinking | $1.20/1M input tokens | Enhanced reasoning, deep thought | For complex logic tasks. |
| Third-party (Groq/OpenRouter) | $0.29-0.50/1M tokens | Smaller models like Qwen3-32B | Great for daily dev work. |
Is the free version enough? If you have the hardware (at least 3x80GB GPUs), the open-source version is fully featured. If not, the Plus API is so cheap it's almost negligible. At $0.40 per million tokens, processing a whole book costs about $0.08.
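The per-book figure checks out with simple arithmetic. The 200k-token book length is an assumption (a full-length book is roughly 100k words, and English runs ~1.3-2 tokens per word); the price is the article's quoted rate.

```python
# Back-of-envelope check of the "$0.08 per book" claim.
PRICE_PER_M = 0.40      # USD per 1M input tokens (Qwen3.5-Plus, per this article)
book_tokens = 200_000   # assumed length of a full book in tokens

cost = book_tokens / 1_000_000 * PRICE_PER_M
print(f"${cost:.2f}")   # $0.08
```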
Quick Start Guide
- Setup Time: 5 mins (API) / 30 mins (Local)
- Learning Curve: Low (OpenAI API compatible)
Fastest Way to Start (3 steps):
- Sign up for Alibaba Cloud Model Studio and get an API Key.
- Change the `base_url` in your code from `api.openai.com` to the Alibaba endpoint.
- Change the `model` parameter to `qwen3.5-plus`. Done.
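The three steps above amount to a two-line diff against existing OpenAI SDK code. This is a sketch: the endpoint URL is the one Alibaba documents for Model Studio's OpenAI-compatible mode, but verify it against the current docs before relying on it.

```python
# Sketch of the switch: same OpenAI SDK, only base_url and model change.
# BASE_URL is Alibaba's documented OpenAI-compatible endpoint (verify in the
# current Model Studio docs); a local vLLM server would use
# "http://localhost:8000/v1" instead.
BASE_URL = "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
MODEL = "qwen3.5-plus"

def ask(prompt: str, api_key: str) -> str:
    from openai import OpenAI  # pip install openai
    client = OpenAI(api_key=api_key, base_url=BASE_URL)
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```

Because the wire format is identical, the rest of your code (streaming, retries, tool definitions) should work unchanged.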
Running Locally (with GPUs):
- Install vLLM: `pip install vllm`
- Start the service: `vllm serve Qwen/Qwen3.5-397B-A17B --tensor-parallel-size 8`
- Call it via the OpenAI-compatible interface.
For Mac Users (256GB M3 Ultra):
- Use the Unsloth 4-bit quantized version (214GB).
- Deploy via `llama-server`.
- Expect 25+ tokens/s, which is plenty for daily use.
Pitfalls and Complaints
- Debugging Fails: "Good at writing new code, but when modifying existing code, it often gets it right then breaks it later and can't fix it." — Developer feedback.
- Naming Confusion: Qwen3.5-Plus isn't an upgrade package for the open-source version; it's Alibaba's managed service. The naming is confusing.
- Local Barriers: Even though it only "activates 17B," you still have to load all 397B into VRAM. Even with 4-bit quantization, you need 200GB+. Don't be fooled into thinking a small machine can run it.
- Not the Best at Everything: In coding agent benchmarks like SWE-bench, it still lags behind specialized coding models from Claude/GPT.
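The VRAM warning is pure arithmetic: all 397B weights must be resident even though only 17B are active per token. A quick estimate (weights only; activations and KV cache come on top, so treat these as lower bounds):

```python
# Rough VRAM floor for loading all 397B weights at different precisions.
PARAMS = 397e9

def weight_gb(bytes_per_param: float) -> float:
    return PARAMS * bytes_per_param / 1e9

fp16 = weight_gb(2.0)   # ~794 GB -> ~10x 80GB GPUs just for weights
int4 = weight_gb(0.5)   # ~199 GB -> still 3x 80GB GPUs at minimum
print(f"fp16 ~{fp16:.0f} GB, 4-bit ~{int4:.0f} GB")
```

The ~199 GB 4-bit floor is consistent with the 214 GB Unsloth quantized files mentioned above once format overhead is included.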
Security and Privacy
- Data Storage: Open-source version is fully local; data never leaves your machine. Plus API goes through Alibaba Cloud and is subject to their privacy policy.
- Auditability: Apache 2.0. Code and weights are public; anyone can audit them.
- Note: If using the Alibaba API, data passes through Chinese servers. For sensitive data, self-hosting is recommended.
Alternatives
| Alternative | Advantage | Disadvantage |
|---|---|---|
| DeepSeek V3.2 | MIT License, elite coding | Company future uncertainty |
| Llama 4 Maverick | Meta backing, huge ecosystem | MoE efficiency lags Qwen |
| Gemini 3 Flash | Similar price, Google ecosystem | Closed source, no self-hosting |
| Claude Opus 4.5 | Most stable and reliable | 37x more expensive |
| Mistral Large | European, GDPR friendly | Slightly lower capability |
For Investors
Market Analysis
- Sector Size: Enterprise LLM market $5.91B in 2026, projected $48.25B by 2034 (30% CAGR).
- AI Agent Market: $7.8B in 2026 -> $52B by 2030.
- Growth Rate: Global LLM market CAGR of 35.57%.
- Drivers: Gartner predicts 80% of enterprises will deploy GenAI by 2026, with 40% of apps embedding AI Agents.
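As a quick sanity check, the enterprise-LLM figures above are internally consistent: $5.91B in 2026 compounding at 30% for the 8 years to 2034 lands almost exactly on the $48.25B projection.

```python
# Verify the quoted market projection against its stated CAGR.
start_b, cagr = 5.91, 0.30          # $5.91B in 2026, 30% CAGR (as quoted)
years = 2034 - 2026

projected = start_b * (1 + cagr) ** years
print(f"${projected:.1f}B")          # ~$48.2B, matching the $48.25B figure
```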
Competitive Landscape
| Tier | Players | Positioning |
|---|---|---|
| Top-tier Closed | OpenAI (GPT-5.2), Anthropic (Claude Opus), Google (Gemini 3) | Best performance, highest price |
| Top-tier Open | Alibaba Qwen3.5, Meta Llama 4 | Open + Commercial dual-track |
| Chinese Rivals | DeepSeek, ByteDance Doubao, Zhipu GLM, Moonshot Kimi | Intense competition, niche strengths |
| Inference Platforms | Groq, Together AI, Fireworks | Profit from inference efficiency |
Timing Analysis
- Why now?: February 2026 is the tipping point for agentic AI. Anthropic, OpenAI, and Qwen are all betting on "AI operating computers" simultaneously.
- Tech Maturity: MoE architecture is now production-ready. Gated Delta Networks (Linear Attention) make 1-million-token contexts actually usable.
- Market Readiness: Enterprises are desperate for automation but blocked by the cost of closed APIs. Qwen3.5 fills this gap perfectly.
Team Background
- Leader: Jingren Zhou, Alibaba Cloud CTO/SVP, Columbia CS PhD, 11 years at Microsoft.
- Scale: Alibaba Cloud's core AI team. While exact numbers aren't public, the release speed of 300+ models suggests a massive operation.
- Track Record: Scaled M6 to 10T params in 2021; Qwen series adopted by 90,000 enterprises in one year.
- Strategic Status: Jack Ma personally reviews progress; Zhou promoted to Group Partner in late 2025.
Funding Status
- Parent Company: Alibaba Group (NYSE: BABA), Market Cap ~$300B.
- Funding: Qwen is a strategic project funded internally by the group.
- Commercial Signals: BABA stock rose on the day of Qwen3.5's launch; 90,000 enterprise users indicate real revenue for Alibaba Cloud AI.
- Investment Angle: You can't invest in Qwen directly, but BABA stock is the indirect vehicle.
Conclusion
The Bottom Line: Qwen3.5 is the new benchmark for open-source LLMs in 2026—offering 80-90% of the capability of closed models at less than 1/5 the price, with the strongest visual agent capabilities in the open-source world.
| User Type | Recommendation |
|---|---|
| Developers | Highly Recommended. Apache 2.0, cheap, OpenAI compatible. Unless you need the absolute best debugging, you should at least try it. |
| Product Managers | Recommended. The MoE efficiency and dual-track strategy are great case studies for product design. |
| Bloggers | Worth writing about. The "$0.40 vs. $15" price war and the US-China AI race offer many angles. |
| Early Adopters | Recommended. API takes 5 mins to set up. But keep Claude as a backup for complex debugging. |
| Investors | Watch the sector. Qwen3.5 proves the commercial viability of open-source LLMs. BABA is a key indirect play. |
Resource Links
| Resource | Link |
|---|---|
| Official Site | Alibaba Cloud Model Studio |
| GitHub | QwenLM/Qwen3.5 |
| Hugging Face | Qwen/Qwen3.5-397B-A17B |
| Documentation | Qwen Docs |
| Agent Framework | Qwen-Agent |
| vLLM Deployment | vLLM Recipes |
| Local (Unsloth) | Unsloth Guide |
| X (Twitter) | @Alibaba_Qwen |
2026-02-17 | Trend-Tracker v7.3