OpenLLM Buddy

Open-source models on dedicated GPU

Every listing pairs a model with a GPU class (for example NVIDIA RTX 4090 or RTX 5090). Compare Gemma 4 26B A4B and Qwen3.6 27B A4B — benchmarks, specs, flat pack pricing, and OpenAI-compatible API examples on your instance.

Catalog

2 models

Gemma 4 26B A4B
April 2, 2026
Deployed on dedicated GPU
NVIDIA RTX 4090
Model + GPU instance
Gemma 4 26B A4B IT is an instruction-tuned Mixture-of-Experts (MoE) model from Google DeepMind. Despite 25.2B total parameters, only 3.8B activate per token during inference — delivering near-31B q…
Release
April 2, 2026
Parameters
25.8B (reported)
Quantization
Q4_K_M
Size
18GB
Context
23K
View model
Qwen3.6 27B A4B
Apr 27, 2026
Deployed on dedicated GPU
NVIDIA RTX 5090
Model + GPU instance
Qwen3.6 27B A4B is a dense 27-billion-parameter language model from the Qwen Team at Alibaba, released in April 2026. It features hybrid multimodal capabilities — accepting text, image, and video i…
Release
Apr 27, 2026
Parameters
27.8 B (reported)
Quantization
Q4_K_M
Size
17GB
Context
159K
View model