Reference

API documentation

Call your deployed models through an OpenAI-compatible POST /v1/chat/completions endpoint. Authenticate with an API key from API Keys in the console; each key routes to exactly one instance.

Quick start

  1. Deploy a model under Templates and wait until the instance is ready.
  2. Use the API key created automatically when the instance became ready. If you did not save it, open API Keys, select that instance, and generate a new key (shown once).
  3. Send a POST to the chat completions URL below with Authorization: Bearer … and a JSON body including model and messages.

API keys & instances

Keys tie your apps to a single running instance. Manage them under API Keys or view live instances on My Instances.

  • Auto-generated on deploy: When an instance reaches ready, a default API key is created automatically in the background. The secret is shown only once, at creation. If you did not copy it, generate a new key under API Keys.
  • One key, one instance: Each API key is bound to exactly one deployment. Requests authenticated with that key are routed only to that instance's model endpoint.
  • Instance ended → key revoked: When an instance stops, fails, or its pack ends, all keys for that deployment are revoked and cannot be used again. A new deployment needs a new API key.
  • Extend pack → same key: Extending uptime on the same instance keeps the same deployment id. Your existing API keys continue to work; no rotation is required.
  • Create another key: Open API Keys, choose the ready instance, name the key, and click Generate Key. You can hold multiple active keys per instance if needed.

Endpoint

Use this URL for all chat requests through the OpenLLM Buddy proxy. Your API key selects which deployment receives the traffic.

POST https://openllmbuddy-proxy.botbuddytech.workers.dev/v1/chat/completions

OpenAI SDK users can set baseURL to the same host with path /v1 (omit /chat/completions).
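The relationship between the SDK base URL and the full endpoint can be sketched as follows. This is a minimal illustration, not part of the service's tooling: the helper name chat_completions_url is hypothetical, and the https scheme is an assumption, since the endpoint is listed by host and path only.

```python
# Assumption: the proxy is reached over HTTPS; the docs list only host and path.
BASE_URL = "https://openllmbuddy-proxy.botbuddytech.workers.dev/v1"


def chat_completions_url(base_url: str = BASE_URL) -> str:
    """Return the full chat completions URL for an SDK-style base URL.

    Hypothetical helper: OpenAI-style clients append the resource path to
    the base URL themselves, which is why the base URL stops at /v1.
    """
    return base_url.rstrip("/") + "/chat/completions"


print(chat_completions_url())
```

A client configured with the base URL alone should end up calling the same address that a raw HTTP client posts to directly.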

Authentication

Send your secret key in the Authorization header using the Bearer scheme. Keys look like ob_sk_….

Header
http
Authorization: Bearer ob_sk_000000000000000000000000000000000000000000000001

Headers

| Header | Required | Value | Notes |
|---|---|---|---|
| Authorization | Yes | Bearer <YOUR_API_KEY> | API key from Console → API Keys. Each key is tied to one deployment. |
| Content-Type | Yes | application/json | Request body must be JSON. |
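As an illustration of the two required headers, the sketch below builds a request object with Python's standard library without sending it. The function name build_request, the https scheme, and the local prefix check are assumptions made for the example; the service itself decides whether a key is valid.

```python
import json
import urllib.request

# Assumption: HTTPS scheme; the docs list only the host and path.
CHAT_URL = "https://openllmbuddy-proxy.botbuddytech.workers.dev/v1/chat/completions"


def build_request(api_key: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying both required headers."""
    # Local sanity check only: the docs say keys look like "ob_sk_...".
    if not api_key.startswith("ob_sk_"):
        raise ValueError("API key should start with 'ob_sk_'")
    return urllib.request.Request(
        CHAT_URL,
        data=json.dumps(body).encode("utf-8"),  # JSON body as bytes
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would send it; keeping construction separate makes the headers easy to inspect before any traffic leaves the machine.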

Request body

JSON object. The model field must match your deployment's model id.

| Field | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model id for your deployment (see table below). Must match the model running on that instance. |
| messages | array | Yes | Chat turns: [{ "role": "...", "content": "..." }, ...], where role is "user", "assistant", or "system". |
| temperature | number | No | Sampling temperature (0–2). Default depends on the upstream runtime. |
| top_p | number | No | Nucleus sampling (0–1). Often used with temperature. |
| max_tokens | integer | No | Cap completion length when supported by the upstream server. |
Example body
json
{
  "model": "qwen3.6:27b",
  "messages": [
    {
      "role": "user",
      "content": "Hello!"
    }
  ],
  "temperature": 1,
  "top_p": 0.95
}
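A request body like the one above can be assembled and checked on the client before it is sent. The sketch below is one possible approach, not the service's own validation: build_chat_body is a hypothetical helper, and it only enforces the ranges and roles listed in the field table, leaving optional fields out when they are not set.

```python
from typing import Optional

# Roles listed in the request body table.
ALLOWED_ROLES = {"user", "assistant", "system"}


def build_chat_body(
    model: str,
    messages: list,
    temperature: Optional[float] = None,
    top_p: Optional[float] = None,
    max_tokens: Optional[int] = None,
) -> dict:
    """Return a request body dict, raising ValueError on out-of-range input."""
    if not model:
        raise ValueError("model is required")
    for turn in messages:
        if turn.get("role") not in ALLOWED_ROLES:
            raise ValueError(f"unsupported role: {turn.get('role')!r}")
    if temperature is not None and not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    if top_p is not None and not 0 <= top_p <= 1:
        raise ValueError("top_p must be between 0 and 1")

    body = {"model": model, "messages": messages}
    # Omit unset optional fields so the upstream runtime applies its defaults.
    optional = {"temperature": temperature, "top_p": top_p, "max_tokens": max_tokens}
    body.update({name: value for name, value in optional.items() if value is not None})
    return body
```

Leaving unset fields out of the body, rather than sending null, matches the note that defaults depend on the upstream runtime.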

Model ids

Use the model value from your deployment's model page.

| Model | API model id |
|---|---|
| Gemma 4 26B A4B | gemma4:26b |
| Qwen3.6 27B A4B | qwen3.6:27b |

Response

On success you receive a standard OpenAI-style chat completion. Read the assistant text from choices[0].message.content. Token usage is in usage when the upstream provides it.

Example response
json
{
  "id": "chatcmpl-example",
  "object": "chat.completion",
  "created": 1779715432,
  "model": "qwen3.6:27b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The assistant reply is in choices[0].message.content."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 48,
    "total_tokens": 60
  }
}
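Reading the reply out of a parsed response might look like the sketch below. The helper name extract_reply is hypothetical, and usage is treated as optional because it is present only when the upstream provides it.

```python
from typing import Optional, Tuple


def extract_reply(completion: dict) -> Tuple[str, Optional[dict]]:
    """Return (assistant_text, usage) from a parsed chat completion dict."""
    choices = completion.get("choices") or []
    if not choices:
        raise ValueError("completion has no choices")
    text = choices[0]["message"]["content"]
    # usage may be absent when the upstream does not report token counts.
    return text, completion.get("usage")


# Trimmed-down version of the example response above.
example = {
    "choices": [{"index": 0, "message": {"role": "assistant", "content": "Hi!"}}],
    "usage": {"prompt_tokens": 12, "completion_tokens": 48, "total_tokens": 60},
}
text, usage = extract_reply(example)
print(text, usage["total_tokens"] if usage else None)
```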

Code examples

cURL
bash
curl -s https://openllmbuddy-proxy.botbuddytech.workers.dev/v1/chat/completions \
  -H 'Authorization: Bearer ob_sk_000000000000000000000000000000000000000000000001' \
  -H 'Content-Type: application/json' \
  -d '{"model":"qwen3.6:27b","messages":[{"role":"user","content":"Hello!"}]}'

Errors

| HTTP | Meaning |
|---|---|
| 400 | Missing API key, invalid JSON body, or malformed request. |
| 401 | Invalid or revoked API key. |
| 404 | Deployment not found for this key. |
| 409 | Deployment not ready, stopped, terminated, or has no endpoint yet. |
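One way a client could surface these statuses is sketched below with Python's standard library. The names ChatRequestError, describe_status, and send are hypothetical, and the format of the proxy's error response body is not documented here, so the sketch relies only on the status code.

```python
import json
import urllib.error
import urllib.request

# Meanings copied from the error table; anything else is treated as unlisted.
ERROR_MEANINGS = {
    400: "Missing API key, invalid JSON body, or malformed request.",
    401: "Invalid or revoked API key.",
    404: "Deployment not found for this key.",
    409: "Deployment not ready, stopped, terminated, or has no endpoint yet.",
}


class ChatRequestError(Exception):
    """Raised when the proxy answers with an HTTP error status."""

    def __init__(self, status: int, meaning: str):
        super().__init__(f"HTTP {status}: {meaning}")
        self.status = status
        self.meaning = meaning


def describe_status(status: int) -> str:
    """Map a status code to its documented meaning, if it has one."""
    return ERROR_MEANINGS.get(status, "Status not listed in the error table.")


def send(request: urllib.request.Request) -> dict:
    """Send a prepared request and return the parsed JSON response."""
    try:
        with urllib.request.urlopen(request) as response:
            return json.load(response)
    except urllib.error.HTTPError as err:
        raise ChatRequestError(err.code, describe_status(err.code)) from err
```

Callers can branch on the status attribute, for example prompting for a new key on 401 or checking the instance state on 409.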