Qwen3.6 27B A4B - NVIDIA RTX 5090

qwen/qwen3.6-27b

qwen3.6:27bReleased Apr 27, 2026159K context

Best forCodeAgentic Coding

Best fit competitor·Claude 3.5 Sonnet

Why teams pick Qwen3.6 27B A4B over Claude 3.5 Sonnet

Built for agentic coding and repo-level reasoning — the same workloads teams reach for Sonnet on, without sending code to a third-party API.

262K native context (YaRN to ~1M) on hardware you control, with Thinking Preservation across long sessions.

Apache 2.0 and self-hosted: fine-tune, air-gap, and ship features without Anthropic rate limits or data-handling policies.

About

Qwen3.6 27B A4B is a dense 27-billion-parameter language model from the Qwen Team at Alibaba, released in April 2026. It features hybrid multimodal capabilities — accepting text, image, and video inputs — and supports a 262,144-token context window.

The model is designed for agentic coding and reasoning tasks, with particular strength in repository-level code comprehension, front-end development workflows, and multi-step problem solving. It includes a built-in thinking mode for extended reasoning and preserves thinking context across conversation history. Qwen3.6 27B A4B supports 201 languages and dialects and is released under the Apache 2.0 license.

Compare

Model Cost Across Durations

Live pack pricing vs typical API estimates from 11 hours through 1 month.

API estimates for GPT-5.4 and Claude Opus 4.6 vs Qwen 3.6 27B A4B on RTX 5090.

Time pack

Qwen 3.6 27B A4B on RTX 5090

24 hours cost

$31

Lowest

GPT-5.4

24 hours cost

$35.47

Save $4.47 vs our model

Claude Opus 4.6

24 hours cost

$63.07

Save $32.07 vs our model

Models in chart

Qwen 3.6 27B A4B on RTX 5090
GPT-5.4
Claude Opus 4.6

At a glance

Release

Apr 27, 2026

Parameters

27.8 B (reported)

Quantization

Q4_K_M

Size

17GB

Context

159K

Benchmarks

Snapshot of third-party and official benchmark metrics for Qwen3.6 27B A4B.

Performance indexes

46

Artificial Analysis

Intelligence Index

#1 among open-weight small models (4B–40B)

77.2

SWE-bench Verified

Coding Index

Strong real-world GitHub issue resolution

59.3

Terminal-Bench 2.0

Agentic Index

Agentic terminal + tool use

Benchmark scores

GPQA Diamond

i

Graduate-level scientific reasoning

87.8%

HLE

i

Humanity's Last Exam

24.0%

IFBench

i

Instruction-following benchmark

83.9%

What it's good at

Flagship-level agentic coding and terminal use — outperforms the 397B Qwen3.5 MoE on every major coding benchmark.

Repository-level code comprehension and multi-step problem solving across long contexts.

Front-end development: QwenWebBench covers Web Design, Web Apps, Games, SVG, Data Visualization, Animation, and 3D (bilingual EN/CN).

Extended reasoning via built-in Thinking Preservation mode — preserves chain-of-thought across conversation turns.

Strong multilingual support: 201 languages and dialects.

Efficient local deployment: runs on ~18GB VRAM (Q4 quantized: ~16.8GB); dense architecture compresses more predictably than MoE.

Apps & integrations

Choose an app below. Each guide shows how to point the app at your OpenAI-compatible endpoint.

Automate workflows and call your model as a node.

Build AI agents and tools on an OpenAI-compatible endpoint.

Connect agent runners to your chat completions endpoint.

Power developer tools with your OpenAI-compatible model.

Override OpenAI Base URL in Cursor Settings and use your model with BYOK.

Use the Cline extension in VS Code to connect your OpenAI-compatible endpoint.

Run OpenAI Codex CLI against your Chat Completions endpoint via config.toml.

Full Pi OS guide: SSH, API keys, curl, Python venv, systemd, and troubleshooting.

FAQ

Frequently asked questions

Common questions about Qwen3.6 27B A4B, deployment, and using it on OpenLLM Buddy.

6 questions

Ready to try it? Deploy Qwen3.6 27B A4B · Browse models