How It Works

From template to live API in three steps

OpenLLM Buddy is managed deployment for open-source LLMs: you choose the model and GPU, we run the infrastructure, you integrate with tools you already use.

Three Steps. Deploy to API.

Pick a model on dedicated NVIDIA GPU, buy a flat time pack, and call your instance with an OpenAI-compatible API — we handle provisioning, limits, and uptime.

Pick a model & time pack

Browse the OpenLLM catalog or community templates on the self-deploy page. Choose Gemma 4 26B A4B, Qwen3.6 27B A4B, or another mapped model on RTX 4090 / 5090. Select an 11-hour or 24-hour flat pack — one price, no per-token meter.

/console/deploy Gemma 4 26B A4B · 11h pack Qwen3.6 27B A4B · 24h pack

Browse templates

Pay & we spin it up

Checkout with Razorpay. We provision your dedicated GPU and surface progress on My Instances. When status is Live, copy your API key and base URL from the console.

Pay & deploy → /console/pods queued → provisioning → Live API key: ob_sk_••••••••

Open deploy

Call it like OpenAI

POST to /v1/chat/completions with your key and model handle (e.g. gemma4:26b or qwen3.6:27b). Drop the same base URL + Bearer token into n8n, Cursor, OpenClaw, Codex, or any agent stack.

POST /v1/chat/completions model: "gemma4:26b" Authorization: Bearer ob_sk_…

API reference

Ready to deploy

Compare models, see flat pack pricing, or jump straight into deploy.

Deploy now Browse templates API docs