
Qwen3.5

LLMs

The 397B native multimodal agent with 17B active params

💡 Qwen3.5 is the flagship large language model series developed by the Qwen team at Alibaba Cloud, featuring state-of-the-art multimodal and agentic capabilities. - QwenLM/Qwen3.5

"Qwen3.5 is like a massive library with 512 specialized librarians; instead of everyone shouting at once, only the 11 most relevant librarians step forward to answer your specific question instantly."

30-Second Verdict
What is it: An Alibaba-backed 397B parameter (MoE) open-source model supporting native multimodal and desktop agent operations under the Apache 2.0 license.
Worth attention: Extremely high. API pricing is 1/5 to 1/37 of GPT-4, with performance rivaling or exceeding GPT-5.2 and Claude 4.5 on many metrics. Supports free commercial use and private deployment.
Hype: 8/10 | Utility: 9/10 | Votes: 301


Qwen3.5: The "Price Assassin" of Open-Source LLMs is Here

2026-02-17 | ProductHunt | GitHub | PH 151 Votes


30-Second Quick Judgment

What is it?: An open-source LLM by Alibaba Cloud with 397B parameters, but only 17B are active at any time (like a team of 512 experts where only 11 are assigned to each question). It can see images, watch videos, and even operate a computer desktop. Apache 2.0 license, free for commercial use.

Is it worth your attention?: Absolutely. If you're developing with GPT-4 or Claude, Qwen3.5's API price is 1/5 to 1/37 of theirs. If you have the GPU resources, you can download and run it for free. This isn't just "another Chinese model"; it actually beats GPT-5.2 and Claude Opus 4.5 on several benchmarks.


Three Questions for Me

Is it relevant to me?

Target Users:

  • AI App Developers (need cheap, high-quality model APIs)
  • Enterprise IT Teams (want to self-host models to keep data private)
  • Multi-language Scenarios (supports 201 languages, exceptionally strong in Chinese)
  • Agent/Automation Developers (native support for tool calling and desktop control)

Am I the target?: You are if any of the following apply:

  • You use OpenAI/Anthropic APIs but find the monthly bill too high.
  • You want to build AI Agents that can operate a computer to complete tasks.
  • You build multi-language products and need a model strong in both English and Chinese.
  • You have GPU servers and want to run an unrestricted open-source model.

When would I use it?:

  • Code generation and refactoring -> Use this (LiveCodeBench score of 83.6, human competition level).
  • Long document analysis and summarization -> Use this (1 million token context window).
  • Operating desktop software for you -> Use this (native Visual Agent capabilities).
  • Need extremely stable production environment debugging -> Consider Claude (Qwen's debugging is still slightly less stable).

Is it useful to me?

| Dimension | Benefit | Cost |
| --- | --- | --- |
| Time | Multimodal + Agent in one model; no need to stitch multiple APIs. | Learning a new API format (though it's OpenAI-compatible, so the cost is low). |
| Money | API price ~$0.40/1M tokens, 5x cheaper than GPT-4.1; self-hosting is free. | Self-hosting requires 3-4 80GB GPUs (~$15K hardware). |
| Effort | Open source + Apache 2.0; modify it however you want without permission. | Confusing naming (3.5/Plus/Max); you need to figure out which one to pick. |

ROI Judgment: If your monthly API spend exceeds $100, switching to Qwen3.5-Plus can save you 60-80% immediately. If you have idle GPU servers, the ROI of self-hosting is nearly infinite. The learning curve is minimal because it's compatible with the OpenAI API format—just change the base_url.

Is it exciting?

The "Wow" Factors:

  • Price Assassin: $0.40 vs. Claude's $15. Get the same job done for 1/37th of the cost.
  • 1 Million Token Context: Throw an entire codebase in and ask everything at once.
  • Visual Agent: Give it a desktop screenshot, and it can plan and execute steps—an open-source alternative to Claude's Computer Use.
  • Unrestricted Open Source: Apache 2.0. Change it, sell it, do whatever you want.

Real User Feedback:

"A flagship open-weight model. It's particularly strong in search, synthesis, low hallucination, and handling long context." — Latent Space

"If you're building real systems, you care about three things: capability, iteration cost, and how often the model makes you say 'Why did you do that?'" — AnalyticsVidhya Test

"Great at writing new code, but prone to errors when debugging or modifying existing code." — Reddit Developer Community


For Independent Developers

Tech Stack

This is the most hardcore part of the Qwen3.5 architecture:

  • Core Architecture: Sparse MoE (Mixture-of-Experts) with 512 experts; only 10 routed experts + 1 shared expert are activated per token.
  • Attention Layers: Gated Delta Networks (Linear Attention) replace standard attention in 75% of the layers. The 60-layer stack follows a pattern: 3x(GDN->MoE) -> 1x(GatedAttention->MoE).
  • Multimodal: Native early fusion, not a late-stage adapter. Uses DeepStack Vision Transformer + Conv3d for video understanding.
  • Inference Acceleration: Built-in Multi-Token Prediction (MTP) for out-of-the-box speculative decoding.
  • Vocabulary: 250K vocabulary (up from 152K), making Chinese, math, and code tokens more compact, saving 15-25% on token costs.

In short, its core innovation is pairing Linear Attention with a massive expert pool to buy inference efficiency. 397B parameters sound intimidating, but since only 17B run per token, it's 8-19x faster than dense models of the same size.
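The routing scheme described above can be sketched in a few lines of Python. This is a toy illustration of top-k expert selection, not Qwen's actual implementation: the router here is a random stand-in for a learned gating layer, and the experts themselves are stubbed out. The point it demonstrates is that only 11 of 512 expert networks run for any given token.

```python
# Toy sketch of sparse MoE routing: 512 experts, top-10 routed + 1 shared.
# The random scores stand in for a learned linear router; expert networks
# are omitted entirely. Illustrative only, not the Qwen3.5 implementation.
import random

NUM_EXPERTS = 512
TOP_K = 10

def route_token(hidden_state):
    # A real router produces one logit per expert from the token's hidden
    # state; random scores substitute for that here.
    scores = [(random.random(), i) for i in range(NUM_EXPERTS)]
    routed = sorted(scores, reverse=True)[:TOP_K]   # top-10 routed experts
    active = [i for _, i in routed] + ["shared"]    # plus 1 always-on shared expert
    return active

active = route_token(hidden_state=[0.1, 0.2])
print(len(active))  # 11 experts run for this token, out of 513 networks total
```

Whatever the token, the compute cost stays fixed at 11 expert forward passes, which is where the "17B active out of 397B" efficiency comes from.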

Core Function Implementation

Visual Agent Workflow:

  1. Receives a desktop/mobile screenshot.
  2. Identifies UI elements (buttons, input boxes, menus, etc.).
  3. Plans a multi-step operation flow.
  4. Generates executable commands.
  5. Built-in tool calling: Web search, code execution, external APIs.

This is very similar to Anthropic’s Computer Use, but it's open source. You can build it quickly using the Qwen-Agent framework, which supports Function Calling, MCP, Code Interpreter, and RAG.
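The screenshot-to-action loop above can be sketched as follows. Both the model call and the desktop driver are stubbed out, and every name here is an illustrative assumption rather than a confirmed Qwen-Agent API; a real setup would send the screenshot to a multimodal chat endpoint and dispatch the returned action to an input-automation library such as pyautogui.

```python
# Hedged sketch of the visual-agent loop: screenshot in, UI action out.
# call_model and execute are stubs; all names are illustrative, not the
# actual Qwen-Agent interface.
import json

def call_model(screenshot_b64, goal):
    # Stub standing in for the multimodal model. A real call would send the
    # screenshot plus the goal and receive a structured action plan back.
    return json.dumps({"action": "click", "target": "Submit button"})

def execute(action):
    # Stub for the desktop driver that would move the mouse / type keys.
    return f"executed {action['action']} on {action['target']}"

def agent_step(screenshot_b64, goal):
    plan = json.loads(call_model(screenshot_b64, goal))
    return execute(plan)

result = agent_step("<base64 screenshot>", "submit the form")
print(result)  # executed click on Submit button
```

A production loop would repeat `agent_step` with a fresh screenshot after each action until the goal is met, which is exactly the plan-act-observe cycle steps 1-5 describe.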

Open Source Status

  • Is it open?: Yes, Apache 2.0, commercial use allowed.
  • GitHub: QwenLM/Qwen3.5
  • Hugging Face: Qwen/Qwen3.5-397B-A17B
  • Ecosystem Scale: Over 170,000 derivative models and 600M+ downloads.
  • Similar Projects: DeepSeek V3.2 (MIT License), Llama 4 Maverick (Llama License).
  • Difficulty to DIY: Extremely high. Requires a 10,000-GPU cluster + trillions of tokens + millions of agent environments for RL. This isn't something an individual can build from scratch.

Business Model

  • Monetization: Open source drives traffic + Cloud API fees (Alibaba Cloud Model Studio).
  • Pricing: Qwen3.5-Plus API is ~$0.40/1M input tokens.
  • Comparison: GPT-4.1 is $2.00, Claude Opus is $15.00.
  • Enterprise Adoption: Attracted over 90,000 enterprises in one year.
  • Core Strategy: Using the model to drive Alibaba Cloud's overall business; the model itself doesn't necessarily need to be the profit center.

Giant Risk

Qwen3.5 is made by a giant (Alibaba). The real question is: Will your app be crushed by Alibaba itself?

It depends on what you build. If you're making a general AI assistant, Alibaba's Tongyi Qianwen will likely dominate. But if you focus on vertical niches (Legal, Medical, Finance), Alibaba is unlikely to go that deep. The open-source license guarantees your freedom—you can fine-tune a proprietary model, something you can't do with closed APIs.


For Product Managers

Pain Point Analysis

  • What it solves: Enterprises want to use LLMs for automation but face three hurdles: expensive APIs, data privacy concerns, and fragmented multimodal capabilities.
  • How painful is it?: High-frequency demand. By 2026, 80% of enterprises will deploy GenAI, but many are stuck on cost and security.

User Persona

  • AI App Teams: Need cheap, reliable APIs for high-frequency calls.
  • Enterprise IT: Sensitive data cannot leave the premises; requires private deployment.
  • Global Teams: Need support for 201 languages.
  • Automation Engineers: Want AI to operate software to complete complex workflows.

Feature Breakdown

| Feature | Type | Description |
| --- | --- | --- |
| Text Reasoning (Code/Math/Logic) | Core | LiveCodeBench 83.6, AIME 91.3 |
| Native Multimodal (Image/Video) | Core | Fused from pre-training, not stitched later |
| Visual Agent (Desktop/Mobile) | Core | Open-source alternative to Computer Use |
| 1M Token Context | Core | Supported by default in the Plus version |
| 201 Languages | Nice-to-have | 69% increase over previous gen; great for global products |
| Thinking/Non-Thinking Modes | Nice-to-have | Deep thought for complex issues, instant replies for simple ones |
| Tool Calling/MCP/RAG | Core | Fully supported by the Qwen-Agent framework |

Competitor Differentiation

| vs | Qwen3.5 | GPT-5.2 | Claude Opus 4.5 | Gemini 3 Flash |
| --- | --- | --- | --- | --- |
| Core Difference | Open Source + MoE Efficiency | Closed Source All-rounder | Highest Reliability | Price Competitive |
| Price/1M Tokens | $0.40 | Unannounced (High) | $15.00 | $0.40 |
| Open Source | Apache 2.0 | No | No | No |
| Multimodal | Native Fusion | Native | Native | Native |
| Agent Capability | Visual Agent | Strong | Computer Use | Average |
| Context | 256K / 1M | 128K | 200K | 1M |
| Chinese Capability | Extremely Strong | Strong | Strong | Strong |

Key Takeaways

  1. Dual-Track Strategy: Use Apache 2.0 to attract developers (600M downloads), then monetize via the Plus API. This is smarter than being purely closed or purely open.
  2. MoE Optimization: 397B params with only 17B active. It unifies "comprehensive" and "efficient" through architecture. Product lesson: More features don't mean they all need to load at once.
  3. Native Multimodal: Don't stitch things together after the fact. Fusion from day one leads to a better experience. Product lesson: Core capabilities should be designed at the architectural level, not as patches.

For Tech Bloggers

Founder Story

The key figure behind Qwen is Jingren Zhou, CTO of Alibaba Cloud.

His resume is impressive: PhD in CS from Columbia, 11 years at Microsoft (Bing infrastructure architect), joined Alibaba in 2015. In 2021, he led the team that scaled the M6 model to 10 trillion parameters—the world's largest at the time—using only 512 GPUs for 10 days.

This achievement laid the technical foundation for Qwen. By December 2025, Zhou was promoted to Alibaba Group Partner, placing him in the core decision-making circle. Notably, Jack Ma, retired for 6 years, has begun receiving regular briefings from Zhou—indicating Qwen is a group-level strategic priority.

Another person to watch is Junyang Lin, a core Qwen researcher who is very active on X (Twitter), explaining naming logic and technical details as the team's public technical voice.

Controversies / Discussion Angles

  • The Naming Mess: From Qwen3 to Qwen3-Next to Qwen3.5, the community is confused. Even Lin admitted "Qwen3.5-Preview" was awkward, making people wonder, "+0.5 then -0.4?"
  • Benchmark Skepticism: CNBC noted that Alibaba's claims of surpassing GPT-5.2 "cannot be independently verified." This is a classic AI problem—every model claims to be the best, but real-world performance varies.
  • A New Chapter in US-China AI: In the same week Qwen3.5 launched, ByteDance released Doubao 2.0 and DeepSeek teased a new model. Chinese AI is no longer just "catching up"; it's leading in certain open-source directions.
  • The Open Source "Gambit": Alibaba open-sourcing a top-tier model under Apache 2.0 seems altruistic, but it's actually a way to lock developers into the Alibaba Cloud ecosystem. Clever, and worth a debate.

Hype Data

  • ProductHunt: 151 Votes
  • Media Coverage: Major reports from CNBC, VentureBeat, ComputerWorld, eWeek, and Silicon Republic.
  • Hardware Ecosystem: Day 0 GPU support from AMD; featured technical blog from NVIDIA.
  • Open Source Ecosystem: 600M+ downloads, 170,000+ derivative models.

Content Suggestions

  • Angle: "How Chinese Open-Source AI is Redefining the Price War" — Focus on the $0.40 vs. $15 price gap.
  • Trend Jacking: Compare it with Anthropic’s latest Computer Use update: "Open Source vs. Closed Source Visual Agents."
  • Deep Dive: What is Gated Delta Networks? How Linear Attention makes a 1-million-token context actually usable.

For Early Adopters

Pricing Analysis

| Tier | Price | Features | Is it enough? |
| --- | --- | --- | --- |
| Open Source (Self-host) | Free | 256K context, full 397B model | Enough if you have the GPUs. |
| Qwen3.5-Plus API | ~$0.40/1M input tokens | 1M context, tool calling, multimodal | Enough for 95% of use cases. |
| Qwen3-Max-Thinking | $1.20/1M input tokens | Enhanced reasoning, deep thought | For complex logic tasks. |
| Third-party (Groq/OpenRouter) | $0.29-0.50/1M tokens | Smaller models like Qwen3-32B | Great for daily dev work. |

Is the free version enough? If you have the hardware (at least 3x80GB GPUs), the open-source version is fully featured. If not, the Plus API is so cheap it's almost negligible. At $0.40 per million tokens, processing a whole book costs about $0.08.
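The book arithmetic above checks out, assuming a typical book runs on the order of 200K tokens (an assumption for illustration, not a figure from the source):

```python
# Back-of-envelope check of the pricing claims: $0.40 per 1M input tokens,
# with a ~200K-token book (assumed length) costing about $0.08 to process.
PRICE_PER_M = 0.40  # USD per million input tokens (Qwen3.5-Plus tier)

def input_cost(tokens):
    return tokens / 1_000_000 * PRICE_PER_M

print(round(input_cost(200_000), 2))    # 0.08 -> one book
print(round(input_cost(1_000_000), 2))  # 0.4  -> one million tokens
```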

Quick Start Guide

  • Setup Time: 5 mins (API) / 30 mins (Local)
  • Learning Curve: Low (OpenAI API compatible)

Fastest Way to Start (3 steps):

  1. Sign up for Alibaba Cloud Model Studio and get an API Key.
  2. Change the base_url in your code from api.openai.com to the Alibaba endpoint.
  3. Change the model parameter to qwen3.5-plus. Done.
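The three steps above boil down to a two-field diff against a stock OpenAI client setup. The endpoint URL and model name below are illustrative assumptions; check the Alibaba Cloud Model Studio docs for the current values.

```python
# Minimal config diff for switching an OpenAI-compatible client to Qwen3.5.
# The base_url shown is an assumed Model Studio endpoint, not verified here;
# the request/response format itself is unchanged.
openai_config = {
    "base_url": "https://api.openai.com/v1",
    "model": "gpt-4.1",
}

qwen_config = {
    "base_url": "https://dashscope-intl.aliyuncs.com/compatible-mode/v1",  # assumed endpoint
    "model": "qwen3.5-plus",
}

# Only these two fields differ; everything else in your code stays the same.
changed = {k for k in qwen_config if qwen_config[k] != openai_config[k]}
print(sorted(changed))  # ['base_url', 'model']
```

This is why the learning curve is minimal: any OpenAI SDK client accepts a `base_url` override, so the migration is a configuration change, not a rewrite.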

Running Locally (with GPUs):

  1. Install vLLM: pip install vllm
  2. Start the service: vllm serve Qwen/Qwen3.5-397B-A17B --tensor-parallel-size 8
  3. Call it via the OpenAI-compatible interface.

For Mac Users (256GB M3 Ultra):

  1. Use the Unsloth 4-bit quantized version (214GB).
  2. Deploy via llama-server.
  3. Expect 25+ tokens/s, which is plenty for daily use.

Pitfalls and Complaints

  1. Debugging Fails: "Good at writing new code, but when modifying existing code, it often gets it right then breaks it later and can't fix it." — Developer feedback.
  2. Naming Confusion: Qwen3.5-Plus isn't an upgrade package for the open-source version; it's Alibaba's managed service. The naming is confusing.
  3. Local Barriers: Even though it only "activates 17B," you still have to load all 397B into VRAM. Even with 4-bit quantization, you need 200GB+. Don't be fooled into thinking a small machine can run it.
  4. Not the Best at Everything: In coding agent benchmarks like SWE-bench, it still lags behind specialized coding models from Claude/GPT.
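Pitfall 3 is easy to sanity-check: resident weight memory is roughly parameters × bits-per-parameter / 8, regardless of how few parameters are active per token. This ignores KV cache and activation overhead, so real requirements are somewhat higher.

```python
# Rough VRAM estimate for holding all 397B weights, ignoring KV cache and
# activations. Bytes-per-parameter follows the quantization bit width.
PARAMS = 397e9  # total parameters; all must be resident even at 17B active

def weight_gb(bits_per_param):
    return PARAMS * bits_per_param / 8 / 1e9

print(weight_gb(16))  # 794.0 GB at fp16/bf16 -- far beyond a single GPU
print(weight_gb(4))   # 198.5 GB at 4-bit, matching the "200GB+" figure above
```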

Security and Privacy

  • Data Storage: Open-source version is fully local; data never leaves your machine. Plus API goes through Alibaba Cloud and is subject to their privacy policy.
  • Auditability: Apache 2.0. Code and weights are public; anyone can audit them.
  • Note: If using the Alibaba API, data passes through Chinese servers. For sensitive data, self-hosting is recommended.

Alternatives

| Alternative | Advantage | Disadvantage |
| --- | --- | --- |
| DeepSeek V3.2 | MIT License, elite coding | Company future uncertainty |
| Llama 4 Maverick | Meta backing, huge ecosystem | MoE efficiency lags Qwen |
| Gemini 3 Flash | Similar price, Google ecosystem | Closed source, no self-hosting |
| Claude Opus 4.5 | Most stable and reliable | 37x more expensive |
| Mistral Large | European, GDPR friendly | Slightly lower capability |

For Investors

Market Analysis

  • Sector Size: Enterprise LLM market $5.91B in 2026, projected $48.25B by 2034 (30% CAGR).
  • AI Agent Market: $7.8B in 2026 -> $52B by 2030.
  • Growth Rate: Global LLM market CAGR of 35.57%.
  • Drivers: Gartner predicts 80% of enterprises will deploy GenAI by 2026, with 40% of apps embedding AI Agents.

Competitive Landscape

| Tier | Players | Positioning |
| --- | --- | --- |
| Top-tier Closed | OpenAI (GPT-5.2), Anthropic (Claude Opus), Google (Gemini 3) | Best performance, highest price |
| Top-tier Open | Alibaba Qwen3.5, Meta Llama 4 | Open + Commercial dual-track |
| Chinese Rivals | DeepSeek, ByteDance Doubao, Zhipu GLM, Moonshot Kimi | Intense competition, niche strengths |
| Inference Platforms | Groq, Together AI, Fireworks | Profit from inference efficiency |

Timing Analysis

  • Why now?: February 2026 is the tipping point for agentic AI. Anthropic, OpenAI, and Qwen are all betting on "AI operating computers" simultaneously.
  • Tech Maturity: MoE architecture is now production-ready. Gated Delta Networks (Linear Attention) make 1-million-token contexts actually usable.
  • Market Readiness: Enterprises are desperate for automation but blocked by the cost of closed APIs. Qwen3.5 fills this gap perfectly.

Team Background

  • Leader: Jingren Zhou, Alibaba Cloud CTO/SVP, Columbia CS PhD, 11 years at Microsoft.
  • Scale: Alibaba Cloud's core AI team. While exact numbers aren't public, the release speed of 300+ models suggests a massive operation.
  • Track Record: Scaled M6 to 10T params in 2021; Qwen series adopted by 90,000 enterprises in one year.
  • Strategic Status: Jack Ma personally reviews progress; Zhou promoted to Group Partner in late 2025.

Funding Status

  • Parent Company: Alibaba Group (NYSE: BABA), Market Cap ~$300B.
  • Funding: Qwen is a strategic project funded internally by the group.
  • Commercial Signals: BABA stock rose on the day of Qwen3.5's launch; 90,000 enterprise users indicate real revenue for Alibaba Cloud AI.
  • Investment Angle: You can't invest in Qwen directly, but BABA stock is the indirect vehicle.

Conclusion

The Bottom Line: Qwen3.5 is the new benchmark for open-source LLMs in 2026—offering 80-90% of the capability of closed models at less than 1/5 the price, with the strongest visual agent capabilities in the open-source world.

| User Type | Recommendation |
| --- | --- |
| Developers | Highly Recommended. Apache 2.0, cheap, OpenAI compatible. Unless you need the absolute best debugging, you should at least try it. |
| Product Managers | Recommended. The MoE efficiency and dual-track strategy are great case studies for product design. |
| Bloggers | Worth writing about. The "$0.40 vs. $15" price war and the US-China AI race offer many angles. |
| Early Adopters | Recommended. The API takes 5 mins to set up, but keep Claude as a backup for complex debugging. |
| Investors | Watch the sector. Qwen3.5 proves the commercial viability of open-source LLMs. BABA is a key indirect play. |

Resource Links

| Resource | Link |
| --- | --- |
| Official Site | Alibaba Cloud Model Studio |
| GitHub | QwenLM/Qwen3.5 |
| Hugging Face | Qwen/Qwen3.5-397B-A17B |
| Documentation | Qwen Docs |
| Agent Framework | Qwen-Agent |
| vLLM Deployment | vLLM Recipes |
| Local (Unsloth) | Unsloth Guide |
| Twitter | @Alibaba_Qwen |

2026-02-17 | Trend-Tracker v7.3

One-line Verdict

Qwen3.5 is the most cost-effective flagship open-source model of 2026. Excelling in visual agents and long-context processing, it is the premier choice for developers looking to cut costs while increasing efficiency.

FAQ

Frequently Asked Questions about Qwen3.5

Q: What is Qwen3.5?
A: An Alibaba-backed 397B parameter (MoE) open-source model supporting native multimodal and desktop agent operations under the Apache 2.0 license.

Q: What are the main features of Qwen3.5?
A: A 1 million token context window, native Visual Agent capabilities, support for 201 languages, and Thinking/Non-Thinking dual modes.

Q: How much does Qwen3.5 cost?
A: The open-source version is free; the Plus API is ~$0.40/1M tokens; Max-Thinking is ~$1.20/1M tokens.

Q: Who is Qwen3.5 for?
A: AI application developers, enterprise IT teams, global product teams, and automation engineers.

Q: What are the alternatives to Qwen3.5?
A: GPT-5.2, Claude Opus 4.5, Gemini 3 Flash, and DeepSeek V3.2.

Data source: ProductHunt
Last updated: Feb 19, 2026