Reference
API documentation
Call your deployed models with an OpenAI-compatible POST /v1/chat/completions endpoint. Use an API key from API Keys in the console — each key routes to one instance.
Quick start
- Deploy a model under Templates and wait until the instance is ready.
- Use the API key created automatically when the instance became ready. If you did not save it, open API Keys, select that instance, and generate a new key (shown once).
- Send a POST to the chat completions URL below with Authorization: Bearer … and a JSON body including model and messages.
API keys & instances
Keys tie your apps to a single running instance. Manage them under API Keys or view live instances on My Instances.
- Auto-generated on deploy — When an instance reaches ready, a default API key is created automatically in the background. The secret is only shown once at creation—if you did not copy it, generate a new key below.
- One key, one instance — Each API key is bound to exactly one deployment. Requests authenticated with that key are routed only to that instance’s model endpoint.
- Instance ended → key revoked — When an instance stops, fails, or its pack ends, all keys for that deployment are revoked. They cannot be used again. A new deployment needs a new API key.
- Extend pack → same key — Extending uptime on the same instance keeps the same deployment id. Your existing API keys continue to work—no rotation required.
- Create another key — Open API Keys, choose the ready instance, name the key, and click Generate Key. You can hold multiple active keys per instance if needed.
Endpoint
Use this URL for all chat requests through the OpenLLM Buddy proxy. Your API key selects which deployment receives the traffic.
POST https://openllmbuddy-proxy.botbuddytech.workers.dev/v1/chat/completions
OpenAI SDK users can set baseURL to the same host with path /v1 (omit /chat/completions).
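As a rough illustration of the base URL rule for SDK users, the sketch below derives the /v1 base from the full chat completions URL. The helper name `sdk_base_url` is hypothetical, the URL constant assumes the host from the curl example served over HTTPS, and the `openai` package usage appears only in a comment.

```python
# Assumed full endpoint: host taken from the curl example, HTTPS assumed.
CHAT_COMPLETIONS_URL = (
    "https://openllmbuddy-proxy.botbuddytech.workers.dev/v1/chat/completions"
)


def sdk_base_url(chat_url: str) -> str:
    """Drop the /chat/completions suffix; an OpenAI-style SDK appends it itself."""
    suffix = "/chat/completions"
    trimmed = chat_url.rstrip("/")
    if not trimmed.endswith(suffix):
        raise ValueError(f"expected a URL ending in {suffix!r}, got {chat_url!r}")
    return trimmed[: -len(suffix)]


BASE_URL = sdk_base_url(CHAT_COMPLETIONS_URL)

# With the `openai` Python package, this value would typically be passed as:
#   client = OpenAI(api_key="ob_sk_...", base_url=BASE_URL)
```

The JavaScript SDK calls the same option `baseURL`; the Python package spells it `base_url`.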
Authentication
Send your secret key in the Authorization header using the Bearer scheme. Keys look like ob_sk_….
Authorization: Bearer ob_sk_000000000000000000000000000000000000000000000001
Headers
| Header | Required | Value | Notes |
|---|---|---|---|
| Authorization | Yes | Bearer <YOUR_API_KEY> | API key from Console → API Keys. Each key is tied to one deployment. |
| Content-Type | Yes | application/json | Request body must be JSON. |
Request body
JSON object. The model field must match your deployment's model id.
| Field | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model id for your deployment (see table below). Must match the model running on that instance. |
| messages | array | Yes | Chat turns: [{ "role": "user" \| "assistant" \| "system", "content": "..." }, ...]. |
| temperature | number | No | Sampling temperature (0–2). Default depends on the upstream runtime. |
| top_p | number | No | Nucleus sampling (0–1). Often used with temperature. |
| max_tokens | integer | No | Cap completion length when supported by the upstream server. |
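The field rules in the table above could be checked client-side before a request is sent. The following is a minimal sketch, not part of the service: `build_body` is a hypothetical helper, and treating an empty messages list as invalid is an assumption of this sketch rather than documented behavior.

```python
import json

ALLOWED_ROLES = {"system", "user", "assistant"}


def build_body(model, messages, temperature=None, top_p=None, max_tokens=None):
    """Return the JSON request body as a string, validating documented ranges."""
    if not model:
        raise ValueError("model is required")
    # Assumption: at least one message is needed for a meaningful request.
    if not isinstance(messages, list) or not messages:
        raise ValueError("messages must be a non-empty list")
    for message in messages:
        if message.get("role") not in ALLOWED_ROLES:
            raise ValueError(f"unsupported role: {message.get('role')!r}")
        if not isinstance(message.get("content"), str):
            raise ValueError("message content must be a string")

    body = {"model": model, "messages": messages}
    # Optional fields are left out entirely so the upstream defaults apply.
    if temperature is not None:
        if not 0 <= temperature <= 2:
            raise ValueError("temperature must be between 0 and 2")
        body["temperature"] = temperature
    if top_p is not None:
        if not 0 <= top_p <= 1:
            raise ValueError("top_p must be between 0 and 1")
        body["top_p"] = top_p
    if max_tokens is not None:
        if not isinstance(max_tokens, int) or max_tokens <= 0:
            raise ValueError("max_tokens must be a positive integer")
        body["max_tokens"] = max_tokens
    return json.dumps(body)
```

Omitting unset optional fields, instead of sending null, keeps the payload close to the example request in this section.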
{
"model": "gemma4:26b",
"messages": [
{
"role": "user",
"content": "Hello!"
}
],
"temperature": 1,
"top_p": 0.95
}
Model ids
Use the model value from your deployment's model page.
| Model | API model id |
|---|---|
| Gemma 4 26B A4B | gemma4:26b |
| Qwen3.6 27B A4B | qwen3.6:27b |
Response
On success you receive a standard OpenAI-style chat completion. Read the assistant text from choices[0].message.content. Token usage is in usage when the upstream provides it.
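Reading those fields might look like the sketch below. `read_reply` is a hypothetical helper; it treats `usage` as optional because the upstream may not provide it.

```python
import json


def read_reply(raw):
    """Return (assistant_text, usage) from a chat completion JSON string."""
    data = json.loads(raw)
    choices = data.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    text = choices[0]["message"]["content"]
    # usage is None when the upstream does not report token counts.
    usage = data.get("usage")
    return text, usage
```

Callers that track token counts should handle the `None` case instead of indexing into `usage` directly.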
{
"id": "chatcmpl-example",
"object": "chat.completion",
"created": 1779715432,
"model": "gemma4:26b",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "The assistant reply is in choices[0].message.content."
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 12,
"completion_tokens": 48,
"total_tokens": 60
}
}
Code examples
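Alongside the curl command in this section, a Python equivalent using only the standard library might look like this. It is a sketch under assumptions: the HTTPS scheme on the proxy host is assumed, and `make_request` and `chat` are hypothetical helper names.

```python
import json
import urllib.request

# Assumed endpoint: host from the curl example, HTTPS assumed.
URL = "https://openllmbuddy-proxy.botbuddytech.workers.dev/v1/chat/completions"


def make_request(api_key, model, prompt):
    """Build (but do not send) a POST request carrying a single user message."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def chat(api_key, model, prompt, timeout=60.0):
    """Send the request and return the assistant text from the first choice."""
    request = make_request(api_key, model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        data = json.load(response)
    return data["choices"][0]["message"]["content"]


# Example call (requires a ready instance and a valid key):
#   print(chat("ob_sk_...", "gemma4:26b", "Hello!"))
```

Splitting request construction from sending makes the headers and body easy to inspect without touching the network.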
curl -s https://openllmbuddy-proxy.botbuddytech.workers.dev/v1/chat/completions \
-H 'Authorization: Bearer ob_sk_000000000000000000000000000000000000000000000001' \
-H 'Content-Type: application/json' \
-d '{"model":"gemma4:26b","messages":[{"role":"user","content":"Hello!"}]}'
Errors
| HTTP | Meaning |
|---|---|
| 400 | Missing API key, invalid JSON body, or malformed request. |
| 401 | Invalid or revoked API key. |
| 404 | Deployment not found for this key. |
| 409 | Deployment not ready, stopped, terminated, or has no endpoint yet. |
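A client could turn the status codes in the table above into readable messages with something like the sketch below. The helper names are hypothetical, and the retry rule is an assumption: a 409 may clear once an instance finishes starting, but it will not clear for a stopped or terminated deployment.

```python
# Meanings copied from the error table above.
ERROR_MEANINGS = {
    400: "Missing API key, invalid JSON body, or malformed request.",
    401: "Invalid or revoked API key.",
    404: "Deployment not found for this key.",
    409: "Deployment not ready, stopped, terminated, or has no endpoint yet.",
}

# Assumption: only 409 is worth retrying, and only while the instance is starting.
RETRYABLE_STATUSES = {409}


def describe_error(status):
    """Map an HTTP status code to the documented meaning."""
    return ERROR_MEANINGS.get(status, f"Unexpected HTTP status {status}.")


def should_retry(status):
    """Return True when waiting and resending might succeed."""
    return status in RETRYABLE_STATUSES


# With urllib, the status is available on the raised exception:
#   except urllib.error.HTTPError as exc:
#       print(describe_error(exc.code))
```

A 401 after an instance ends usually means the key was revoked with its deployment, so a new deployment and a new key are needed; retrying does not help.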