Gemma 4 26B A4B - NVIDIA RTX 4090

ollama/gemma4:26b

gemma4:26bReleased April 2, 202623K context

Best forReasoningAgentic Coding

Best fit competitor·Claude Sonnet 4.5

Why teams pick Gemma 4 26B A4B over Claude Sonnet 4.5

Run it on your own GPU with predictable flat pricing — no per-token API meter running in the background.

Apache 2.0 weights you can fine-tune, audit, and keep inside your network instead of routing prompts through a hosted API.

MoE architecture activates only 3.8B parameters per token, so you get strong reasoning quality without paying for a full dense 27B+ API bill.

About

Gemma 4 26B A4B IT is an instruction-tuned Mixture-of-Experts (MoE) model from Google DeepMind. Despite 25.2B total parameters, only 3.8B activate per token during inference — delivering near-31B quality at a fraction of the compute cost. Supports multimodal input including text, images, and video (up to 60s at 1fps). Features a 256K token context window, native function calling, configurable thinking/reasoning mode, and structured output support. Released under Apache 2.0.

Compare

Model Cost Across Durations

Live pack pricing vs typical API estimates from 11 hours through 1 month.

API estimates for GPT-4.1 and Claude Sonnet 4.5 vs Gemma 4 26B A4B on RTX 4090.

Time pack

Gemma 4 26B A4B on RTX 4090

24 hours cost

$22

Lowest

GPT-4.1

24 hours cost

$33.87

Save $11.87 vs our model

Claude Sonnet 4.5

24 hours cost

$58.06

Save $36.06 vs our model

Models in chart

Gemma 4 26B A4B on RTX 4090
GPT-4.1
Claude Sonnet 4.5

At a glance

Release

April 2, 2026

Parameters

25.8B (reported)

Quantization

Q4_K_M

Size

18GB

Context

23K

Benchmarks

Performance metrics for Gemma 4 26B A4B (Reasoning). Source: Artificial Analysis.

Performance indexes

31.2

Artificial Analysis

Intelligence Index

Better than 65% of models compared

22.4

Artificial Analysis

Coding Index

Better than 57% of models compared

32.1

Artificial Analysis

Agentic Index

Better than 58% of models compared

Benchmark scores

GPQA Diamond

i

Graduate-level scientific reasoning

79.2%

HLE

i

Humanity's Last Exam

18.3%

IFBench

i

Instruction-following benchmark

72.4%

τ²-Bench Telecom

i

Conversational AI agents in dual-control scenarios

43.6%

AA-LCR

i

Long context reasoning evaluation

55.7%

GDPval-AA

i

Economically valuable tasks

25.7%

CritPt

i

Research-level physics reasoning

0.0%

Apps & integrations

Choose an app below. Each guide shows how to point the app at your OpenAI-compatible endpoint.

Automate workflows and call your model as a node.

Build AI agents and tools on an OpenAI-compatible endpoint.

Connect agent runners to your chat completions endpoint.

Power developer tools with your OpenAI-compatible model.

Override OpenAI Base URL in Cursor Settings and use your model with BYOK.

Use the Cline extension in VS Code to connect your OpenAI-compatible endpoint.

Run OpenAI Codex CLI against your Chat Completions endpoint via config.toml.

Full Pi OS guide: SSH, API keys, curl, Python venv, systemd, and troubleshooting.

FAQ

Frequently asked questions

Common questions about Gemma 4 26B A4B, deployment, and using it on OpenLLM Buddy.

6 questions

Ready to try it? Deploy Gemma 4 26B A4B · Browse models