OpenLLM Buddy

Open-source models on dedicated GPU

Every listing pairs a model with a GPU class (for example NVIDIA RTX 4090 or RTX 5090). Compare Gemma 4 26B A4B and Qwen3.6 27B A4B — benchmarks, specs, flat pack pricing, and OpenAI-compatible API examples on your instance.

Catalog

2 models

  • April 2, 2026

    Deployed on dedicated GPU

    NVIDIA RTX 4090

    Model + GPU instance

    Gemma 4 26B A4B IT is an instruction-tuned Mixture-of-Experts (MoE) model from Google DeepMind. Despite 25.2B total parameters, only 3.8B activate per token during inference — delivering near-31B q…

    Release
    April 2, 2026
    Parameters
    25.8B (reported)
    Quantization
    Q4_K_M
    Size
    18GB
    Context
    23K
  • Apr 27, 2026

    Deployed on dedicated GPU

    NVIDIA RTX 5090

    Model + GPU instance

    Qwen3.6 27B A4B is a dense 27-billion-parameter language model from the Qwen Team at Alibaba, released in April 2026. It features hybrid multimodal capabilities — accepting text, image, and video i…

    Release
    Apr 27, 2026
    Parameters
    27.8 B (reported)
    Quantization
    Q4_K_M
    Size
    17GB
    Context
    159K