Chat Completions API

Build agentic applications with open-source and frontier models. Full tool calling, streaming, and multi-turn conversation support.

Base URL: https://api.ainative.studio

Quick Start

curl -s https://api.ainative.studio/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: YOUR_SK_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "qwen-coder-32b",
    "max_tokens": 64,
    "messages": [{"role": "user", "content": "Say hello"}]
  }' | jq .

If you get a JSON response with "role": "assistant", you're good.

Authentication

API Key (server-to-server)

POST /v1/messages
x-api-key: sk_your_key_here
Content-Type: application/json

Accepted key formats:

sk_ prefix — AINative native keys (from the dashboard)
Platform tokens — issued by OAuth flow

NOT accepted: sk-ant- prefix (Anthropic-direct keys return 401)
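A quick client-side check can catch the wrong key type before a request is sent. The sketch below only looks at the prefixes listed above; the function name `classify_key` and its return labels are illustrative and not part of the API. Platform tokens have no documented prefix, so they fall through as "unknown".

```python
def classify_key(key: str) -> str:
    """Classify a credential by prefix, following the accepted formats above."""
    if key.startswith("sk-ant-"):
        return "anthropic-direct"  # the gateway answers these with 401
    if key.startswith("sk_"):
        return "ainative"          # dashboard-issued key
    return "unknown"               # possibly a platform token; the prefix cannot tell


if __name__ == "__main__":
    for candidate in ("sk_your_key_here", "sk-ant-example"):
        print(candidate, "->", classify_key(candidate))
```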

Bearer Token (user-facing apps)

POST /v1/messages
Authorization: Bearer eyJhbGc...
Content-Type: application/json

Bearer tokens expire in 24 hours. Refresh via POST /api/v1/auth/refresh.
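To refresh before a token lapses, a client can read the expiry from the token itself. This is a minimal sketch that assumes the bearer token is a standard three-part JWT carrying an `exp` claim in Unix seconds, which this page does not confirm. The helper names and the 5-minute margin are illustrative. The signature is not verified, so use this only to schedule a refresh, never to trust a token.

```python
import base64
import json
import time
from typing import Optional


def jwt_expiry(token: str) -> Optional[float]:
    """Return the `exp` claim (Unix seconds) from a JWT payload, or None."""
    parts = token.split(".")
    if len(parts) != 3:
        return None
    payload_b64 = parts[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore stripped base64 padding
    try:
        payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    except ValueError:  # covers bad base64, bad UTF-8, and bad JSON
        return None
    exp = payload.get("exp") if isinstance(payload, dict) else None
    return float(exp) if isinstance(exp, (int, float)) else None


def needs_refresh(token: str, now: Optional[float] = None, margin_s: float = 300.0) -> bool:
    """True when the token is unreadable or expires within `margin_s` seconds."""
    exp = jwt_expiry(token)
    if exp is None:
        return True
    current = time.time() if now is None else now
    return current >= exp - margin_s
```

When `needs_refresh` returns True, call POST /api/v1/auth/refresh as described above.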

Endpoints

Two formats, same backend:

Endpoint	Format	Best For
`POST /v1/messages`	Anthropic Messages API	Anthropic SDK, Cody CLI
`POST /api/v1/chat/completions`	OpenAI-compatible	OpenAI SDK, general integrations

Available Models

Code Models

Model ID	Params	Provider	Tool Calling	Speed	Plan
`qwen-coder-32b`	32B	HuggingFace	Yes	Medium	Pro+
`qwen-coder-7b`	7B	HuggingFace	Yes	Fast	Basic+
`nouscoder-14b`	14B	HuggingFace	Yes	Medium	Pro+
`minimax-m2.7`	—	NVIDIA NIM	No	Fast	Pro+

Text Models

Model ID	Params	Provider	Tool Calling	Speed	Plan
`qwen-7b`	7B	HuggingFace	Yes	Fast	Free
`gemma-2b`	4B*	HuggingFace	Yes	Fastest	Free
`gemma-9b`	27B*	HuggingFace	Yes	Medium	Pro+
`llama-3.3-8b-instruct`	8B	Meta	Yes	Fast	Free
`llama-3.3-70b-instruct`	70B	Meta	Yes	Slow	Basic+
`llama-4-maverick-17b-128e`	17B	Meta	Yes	Medium	Pro+

*The gemma IDs are remapped to the latest available versions (gemma-2b to gemma-3n, gemma-9b to gemma-3-27b), so the Params column shows the size of the remapped model.

Reasoning Models

Model ID	Params	Provider	Tool Calling	Speed	Plan
`deepseek-r1-distill-qwen-7b`	7B	HuggingFace	Yes	Medium	Pro+
`deepseek-r1-distill-llama-8b`	8B	HuggingFace	Yes	Medium	Pro+
`deepseek-r1`	671B	HuggingFace	Yes	Slow	Enterprise

Frontier Models (routed through our gateway)

Model ID	Provider	Tool Calling	Plan
`claude-3-5-haiku`	Anthropic	Yes	Basic+
`claude-sonnet-4.5`	Anthropic	Yes	Pro+
`claude-opus-4`	Anthropic	Yes	Enterprise

Your sk_ key works — no separate Anthropic key needed.

Multimodal Models

See Multimodal API and Audio API for image, video, audio, and speech models.

Default model: qwen-coder-32b (Professional/Enterprise) | qwen-7b (Free)

Which model should I use?

Coding assistant? qwen-coder-32b
Tool calling on a budget? qwen-coder-7b or qwen-7b
Best-in-class quality? claude-sonnet-4.5
Reasoning/chain-of-thought? deepseek-r1-distill-qwen-7b
Fastest response? gemma-2b or llama-3.3-8b-instruct

Basic Chat Completion

Anthropic Format

curl -s https://api.ainative.studio/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: $AINATIVE_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "qwen-coder-32b",
    "max_tokens": 256,
    "messages": [
      {"role": "user", "content": "Write a Python function that reverses a string"}
    ]
  }'

Response:

{
  "id": "msg_abc123",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "def reverse_string(s: str) -> str:\n    return s[::-1]"
    }
  ],
  "model": "qwen-coder-32b",
  "stop_reason": "end_turn",
  "usage": { "input_tokens": 18, "output_tokens": 25 }
}

OpenAI Format

curl -s https://api.ainative.studio/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "x-api-key: $AINATIVE_API_KEY" \
  -d '{
    "model": "qwen-coder-32b",
    "max_tokens": 256,
    "messages": [
      {"role": "user", "content": "Write a Python function that reverses a string"}
    ]
  }'

Response:

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "model": "qwen-coder-32b",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "def reverse_string(s: str) -> str:\n    return s[::-1]"
    },
    "finish_reason": "stop"
  }],
  "usage": { "prompt_tokens": 18, "completion_tokens": 25, "total_tokens": 43 }
}
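The two response shapes put the generated text in different places. A small helper, sketched here with the hypothetical name `extract_text`, can read either one. It relies only on the fields shown in the two sample responses above.

```python
def extract_text(response: dict) -> str:
    """Return the assistant text from an Anthropic- or OpenAI-format response."""
    # Anthropic format: "content" is a list of typed blocks.
    content = response.get("content")
    if isinstance(content, list):
        return "".join(
            block.get("text", "") for block in content if block.get("type") == "text"
        )
    # OpenAI format: text lives at choices[0].message.content.
    choices = response.get("choices") or []
    if choices:
        return choices[0].get("message", {}).get("content") or ""
    return ""
```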

Tool Calling (Function Calling)

Tool calling lets the model invoke functions you define. This is essential for building agents.

Supported Models

Only these models reliably produce tool calls:

qwen-coder-32b (best), qwen-coder-7b, nouscoder-14b
All Claude models

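gemma-* and deepseek-r1-* do not produce tool calls reliably, even though the model tables list them as tool-capable. Avoid them for agent workloads.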

Step 1: Define Your Tools

{
  "tools": [
    {
      "name": "read_file",
      "description": "Read the contents of a file at the given path",
      "input_schema": {
        "type": "object",
        "properties": {
          "path": {
            "type": "string",
            "description": "Absolute path to the file"
          }
        },
        "required": ["path"]
      }
    },
    {
      "name": "run_command",
      "description": "Execute a shell command and return stdout",
      "input_schema": {
        "type": "object",
        "properties": {
          "command": {
            "type": "string",
            "description": "The shell command to run"
          }
        },
        "required": ["command"]
      }
    }
  ]
}

Step 2: Send Request with Tools

curl -s https://api.ainative.studio/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: $AINATIVE_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "qwen-coder-32b",
    "max_tokens": 512,
    "tools": [{
      "name": "read_file",
      "description": "Read file contents",
      "input_schema": {
        "type": "object",
        "properties": {
          "path": {"type": "string", "description": "File path"}
        },
        "required": ["path"]
      }
    }],
    "messages": [
      {"role": "user", "content": "What is in /tmp/test.txt?"}
    ]
  }'

Step 3: Model Returns a Tool Use Block

When the model wants to call a tool, stop_reason is "tool_use":

{
  "id": "msg_abc123",
  "role": "assistant",
  "content": [
    { "type": "text", "text": "Let me read that file for you." },
    {
      "type": "tool_use",
      "id": "toolu_01ABC",
      "name": "read_file",
      "input": { "path": "/tmp/test.txt" }
    }
  ],
  "stop_reason": "tool_use"
}

Step 4: Execute the Tool and Return Results

Your code runs the function, then sends the result back:

{
  "model": "qwen-coder-32b",
  "max_tokens": 512,
  "tools": ["...same tools..."],
  "messages": [
    {"role": "user", "content": "What is in /tmp/test.txt?"},
    {
      "role": "assistant",
      "content": [
        {"type": "text", "text": "Let me read that file for you."},
        {"type": "tool_use", "id": "toolu_01ABC", "name": "read_file", "input": {"path": "/tmp/test.txt"}}
      ]
    },
    {
      "role": "user",
      "content": [{
        "type": "tool_result",
        "tool_use_id": "toolu_01ABC",
        "content": "Hello, world! This is a test file."
      }]
    }
  ]
}
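Steps 3 and 4 can be sketched as one helper that takes the model's response, runs each requested tool, and returns the message list for the follow-up request. The names `tool_uses`, `append_tool_turn`, and `run_tool` are illustrative. `run_tool` stands in for your own dispatcher and is assumed to return a string.

```python
from typing import Callable


def tool_uses(response: dict) -> list:
    """Return the tool_use blocks from an Anthropic-format response."""
    return [b for b in response.get("content", []) if b.get("type") == "tool_use"]


def append_tool_turn(
    messages: list, response: dict, run_tool: Callable[[str, dict], str]
) -> list:
    """Return a new message list with the assistant turn and the tool results added."""
    results = [
        {
            "type": "tool_result",
            "tool_use_id": block["id"],  # must match the id of the tool_use block
            "content": run_tool(block["name"], block["input"]),
        }
        for block in tool_uses(response)
    ]
    return messages + [
        {"role": "assistant", "content": response["content"]},
        {"role": "user", "content": results},
    ]
```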

Step 5: Model Produces Final Answer

{
  "content": [{
    "type": "text",
    "text": "The file /tmp/test.txt contains: \"Hello, world! This is a test file.\""
  }],
  "stop_reason": "end_turn"
}

Tool Calling Tips

Keep descriptions short (under 300 chars) — open-source models work best with concise descriptions
Flatten nested schemas — avoid deeply nested objects in input_schema. Prefer flat properties
Limit to ~16 tools max — open-source models degrade with too many tools. Claude handles 100+
Use tool_choice to force tool use: "tool_choice": {"type": "tool", "name": "read_file"}
Watch for XML artifacts — qwen sometimes emits <tool_call> XML in text blocks. Strip client-side:
```
text = text.replace(/<tool_call>[\s\S]*?<\/tool_call>/g, '').trim()
```
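The tips above can be checked mechanically before a request goes out. The following is a rough sketch. The function names are hypothetical, the thresholds are the ones quoted in the tips, and nesting is measured only through `properties` (array `items` are ignored). It also includes a Python port of the XML-stripping one-liner.

```python
import re

_TOOL_CALL_XML = re.compile(r"<tool_call>[\s\S]*?</tool_call>")


def strip_tool_call_xml(text: str) -> str:
    """Remove stray <tool_call>...</tool_call> spans from a text block."""
    return _TOOL_CALL_XML.sub("", text).strip()


def schema_depth(schema: dict) -> int:
    """Levels of nested objects; a flat object schema has depth 1."""
    if schema.get("type") != "object":
        return 0
    children = schema.get("properties", {}).values()
    return 1 + max((schema_depth(child) for child in children), default=0)


def lint_tools(tools: list, max_tools: int = 16, max_desc: int = 300) -> list:
    """Return warnings for violated tips; an empty list means none were found."""
    warnings = []
    if len(tools) > max_tools:
        warnings.append(f"{len(tools)} tools defined; the tip suggests about {max_tools}")
    for tool in tools:
        name = tool.get("name", "<unnamed>")
        if len(tool.get("description", "")) >= max_desc:
            warnings.append(f"{name}: description is {max_desc} chars or longer")
        if schema_depth(tool.get("input_schema", {})) > 1:
            warnings.append(f"{name}: input_schema contains nested objects")
    return warnings
```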

System Prompts

Anthropic Format

{
  "model": "qwen-coder-32b",
  "system": "You are a senior Python engineer. Always include type hints.",
  "messages": [{"role": "user", "content": "Write a CSV parser"}]
}

OpenAI Format

{
  "messages": [
    {"role": "system", "content": "You are a senior Python engineer."},
    {"role": "user", "content": "Write a CSV parser"}
  ]
}
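The only difference between the two formats here is where the system prompt lives, so a request body can be converted mechanically. A minimal sketch follows, assuming every system message has plain string content. The function name `to_anthropic` is illustrative.

```python
def to_anthropic(body: dict) -> dict:
    """Move OpenAI-style system messages into the top-level `system` field."""
    messages = body.get("messages", [])
    system_parts = [m["content"] for m in messages if m.get("role") == "system"]
    converted = {key: value for key, value in body.items() if key != "messages"}
    converted["messages"] = [m for m in messages if m.get("role") != "system"]
    if system_parts:
        # Several system messages are joined; the Anthropic format takes one string.
        converted["system"] = "\n\n".join(system_parts)
    return converted
```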

Multi-Turn Conversations

Send the full history with every request:

{
  "model": "qwen-coder-32b",
  "max_tokens": 512,
  "messages": [
    {"role": "user", "content": "Write a fibonacci function"},
    {"role": "assistant", "content": "def fib(n):\n    if n <= 1: return n\n    return fib(n-1) + fib(n-2)"},
    {"role": "user", "content": "Now add memoization"}
  ]
}
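Because the API is stateless, the client has to keep the history. One way to do that is a small holder object, sketched below. The class name `Conversation` and its methods are illustrative, and sending the request is left to your own HTTP code.

```python
class Conversation:
    """Accumulates turns and builds the request body for each new user message."""

    def __init__(self, model: str, max_tokens: int = 512) -> None:
        self.model = model
        self.max_tokens = max_tokens
        self.messages: list = []

    def ask(self, user_text: str) -> dict:
        """Record a user turn and return the full request body to send."""
        self.messages.append({"role": "user", "content": user_text})
        return {
            "model": self.model,
            "max_tokens": self.max_tokens,
            "messages": list(self.messages),  # copy so later turns do not mutate it
        }

    def record_reply(self, assistant_text: str) -> None:
        """Store the assistant's answer so the next request includes it."""
        self.messages.append({"role": "assistant", "content": assistant_text})
```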

Streaming

curl -s https://api.ainative.studio/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: $AINATIVE_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "qwen-coder-32b",
    "max_tokens": 256,
    "stream": true,
    "messages": [{"role": "user", "content": "Count to 5"}]
  }'

Returns Server-Sent Events: message_start, content_block_delta, message_delta, message_stop.
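If you read the stream without an SDK, the text arrives in `content_block_delta` events. The sketch below assumes the gateway follows the Anthropic Messages streaming shape, where each `data:` line is a JSON object and text deltas look like `{"delta": {"type": "text_delta", "text": "..."}}`. This page does not spell that out, so verify it against real output. The function name is illustrative.

```python
import json
from typing import Iterable


def collect_stream_text(lines: Iterable[str]) -> str:
    """Concatenate text deltas from SSE lines until message_stop."""
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip "event:" lines, comments, and blank separators
        try:
            event = json.loads(line[len("data:"):].strip())
        except json.JSONDecodeError:
            continue  # tolerate malformed or non-JSON payloads
        if not isinstance(event, dict):
            continue
        if event.get("type") == "content_block_delta":
            delta = event.get("delta", {})
            if delta.get("type") == "text_delta":
                chunks.append(delta.get("text", ""))
        elif event.get("type") == "message_stop":
            break
    return "".join(chunks)
```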

Non-streaming is more reliable

For AINative-hosted models, "stream": false is recommended unless you need real-time output. The gateway's SSE simulation can cause parser issues with some SDK clients.

SDK Integration

Anthropic SDK

import Anthropic from '@anthropic-ai/sdk'

const client = new Anthropic({
    apiKey: 'sk_your_ainative_key',
    baseURL: 'https://api.ainative.studio',
})

const message = await client.messages.create({
    model: 'qwen-coder-32b',
    max_tokens: 256,
    messages: [{ role: 'user', content: 'Hello!' }],
})
console.log(message.content[0].text)

The same call with the Python SDK:

import anthropic

client = anthropic.Anthropic(
    api_key="sk_your_ainative_key",
    base_url="https://api.ainative.studio",
)
message = client.messages.create(
    model="qwen-coder-32b",
    max_tokens=256,
    messages=[{"role": "user", "content": "Hello!"}],
)
print(message.content[0].text)

OpenAI SDK

import OpenAI from 'openai'

const client = new OpenAI({
    apiKey: 'sk_your_ainative_key',
    baseURL: 'https://api.ainative.studio/api/v1',
})

const response = await client.chat.completions.create({
    model: 'qwen-coder-32b',
    max_tokens: 256,
    messages: [{ role: 'user', content: 'Hello!' }],
})
console.log(response.choices[0].message.content)

Request Parameters

Parameter	Type	Default	Description
`model`	string	required	Model ID (see Available Models)
`messages`	array	required	Conversation messages
`system`	string	null	System prompt (Anthropic format)
`max_tokens`	number	`4096`	Max response tokens
`temperature`	number	`0.7`	0.0–2.0 randomness
`top_p`	number	`1.0`	Nucleus sampling
`stream`	boolean	`false`	Enable SSE streaming
`tools`	array	null	Tool definitions for function calling
`tool_choice`	object	`auto`	Force specific tool or auto-select
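The parameter table can be turned into a small request builder that applies the documented defaults and rejects out-of-range values before the request leaves your process. This is a sketch for the Anthropic-format body. The function name `build_request` is illustrative, and only the temperature range is validated because it is the only range the table gives.

```python
from typing import Optional


def build_request(
    model: str,
    messages: list,
    *,
    max_tokens: int = 4096,
    temperature: float = 0.7,
    top_p: float = 1.0,
    stream: bool = False,
    system: Optional[str] = None,
    tools: Optional[list] = None,
) -> dict:
    """Build a request body using the defaults from the parameter table."""
    if not model:
        raise ValueError("model is required")
    if not messages:
        raise ValueError("messages is required")
    if not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be between 0.0 and 2.0")
    body = {
        "model": model,
        "messages": messages,
        "max_tokens": max_tokens,
        "temperature": temperature,
        "top_p": top_p,
        "stream": stream,
    }
    # Optional fields are omitted rather than sent as null.
    if system is not None:
        body["system"] = system
    if tools:
        body["tools"] = tools
    return body
```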

Error Handling

Status	Meaning	Action
200	Success	Parse response
400	Invalid request	Check request body
401	Bad API key	Verify `sk_` prefix (not `sk-ant-`)
429	Rate limited	Retry with backoff, check `retry-after` header
500	Server error	Retry with exponential backoff
502/503	Model loading	Retry in 5–10s

Retry Example

async function callWithRetry(body, maxRetries = 3) {
    for (let attempt = 0; attempt < maxRetries; attempt++) {
        const resp = await fetch('https://api.ainative.studio/v1/messages', {
            method: 'POST',
            headers: {
                'Content-Type': 'application/json',
                'x-api-key': process.env.AINATIVE_API_KEY,
                'anthropic-version': '2023-06-01',
            },
            body: JSON.stringify(body),
            signal: AbortSignal.timeout(60_000),
        })
        if (resp.ok) return resp.json()
        if (resp.status === 429) {
            const retryAfter = resp.headers.get('retry-after')
            await new Promise(r => setTimeout(r, (parseInt(retryAfter) || 5) * 1000))
            continue
        }
        if (resp.status >= 500) {
            await new Promise(r => setTimeout(r, Math.pow(2, attempt) * 1000))
            continue
        }
        throw new Error(`API error ${resp.status}: ${await resp.text()}`)
    }
    throw new Error('Max retries exceeded')
}
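The wait-time policy in the retry example can be isolated into a pure function, which makes it easy to reuse from Python and to test without network access. This sketch mirrors the JavaScript above: 429 honors `retry-after` with a 5-second fallback, 5xx backs off exponentially, and anything else is not retried. The function name `retry_delay` is illustrative.

```python
from typing import Optional


def retry_delay(status: int, attempt: int, retry_after: Optional[str] = None) -> Optional[float]:
    """Seconds to wait before retrying, or None if the error is not retryable."""
    if status == 429:
        try:
            seconds = int(retry_after)  # header value in whole seconds
        except (TypeError, ValueError):
            seconds = 0  # header missing or not a plain integer
        return float(seconds) if seconds > 0 else 5.0
    if status >= 500:
        return float(2 ** attempt)  # 1s, 2s, 4s, ... for attempt 0, 1, 2, ...
    return None
```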

Rate Limits & Credits

Plan	Monthly Credits	Requests/hour	Price
Starter	1,000	60	Free
Pro	50,000	300	$49/mo
Business	150,000	1,000	$149/mo
Enterprise	200,000	5,000	$699/mo
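To stay under the hourly request limit, a client can space its calls evenly. The arithmetic below uses the Requests/hour column from the table. It assumes the limit is counted over a simple one-hour window, which this page does not specify, so treat the result as a conservative pacing target rather than a guarantee.

```python
PLAN_LIMITS = {"Starter": 60, "Pro": 300, "Business": 1000, "Enterprise": 5000}


def min_interval_seconds(requests_per_hour: int) -> float:
    """Minimum spacing between requests that keeps a steady client under the limit."""
    if requests_per_hour <= 0:
        raise ValueError("requests_per_hour must be positive")
    return 3600.0 / requests_per_hour


if __name__ == "__main__":
    for plan, limit in PLAN_LIMITS.items():
        print(f"{plan}: one request every {min_interval_seconds(limit):.2f}s")
```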

Check your balance:

curl -s https://api.ainative.studio/api/v1/public/credits/balance \
  -H "x-api-key: $AINATIVE_API_KEY"

Full Example: Agent with Tool Loop

const BASE = 'https://api.ainative.studio'
const API_KEY = process.env.AINATIVE_API_KEY!

const tools = [{
    name: 'read_file',
    description: 'Read a file',
    input_schema: {
        type: 'object',
        properties: { path: { type: 'string' } },
        required: ['path'],
    },
}]

async function runAgent(userPrompt: string) {
    const messages: any[] = [{ role: 'user', content: userPrompt }]

    while (true) {
        const resp = await fetch(`${BASE}/v1/messages`, {
            method: 'POST',
            headers: {
                'Content-Type': 'application/json',
                'x-api-key': API_KEY,
                'anthropic-version': '2023-06-01',
            },
            body: JSON.stringify({
                model: 'qwen-coder-32b',
                max_tokens: 4096,
                tools,
                messages,
            }),
        })
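        // Fail fast on HTTP errors; an error body has no `content` array to iterate.
        if (!resp.ok) {
            throw new Error(`API error ${resp.status}: ${await resp.text()}`)
        }
        const data = await resp.json()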
        messages.push({ role: 'assistant', content: data.content })

        if (data.stop_reason !== 'tool_use') {
            return data.content.filter((b: any) => b.type === 'text')
                .map((b: any) => b.text).join('')
        }

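        // Execute tool calls. executeToolLocally is your own dispatcher (not defined
        // here); it should return the tool output as a string.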
        const toolResults = []
        for (const block of data.content) {
            if (block.type !== 'tool_use') continue
            const result = await executeToolLocally(block.name, block.input)
            toolResults.push({
                type: 'tool_result',
                tool_use_id: block.id,
                content: result,
            })
        }
        messages.push({ role: 'user', content: toolResults })
    }
}
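The same loop in Python, sketched with two changes that are design suggestions rather than API requirements: the HTTP call is passed in as `send`, so the loop can be tested without network access, and `max_steps` caps the number of round trips so a model that keeps requesting tools cannot loop forever. `send`, `run_tool`, and `max_steps` are hypothetical names. `send` takes a request body and returns the parsed JSON response.

```python
from typing import Callable


def run_agent(
    prompt: str,
    send: Callable[[dict], dict],
    run_tool: Callable[[str, dict], str],
    tools: list,
    model: str = "qwen-coder-32b",
    max_steps: int = 8,
) -> str:
    """Run the tool loop until the model stops asking for tools."""
    messages = [{"role": "user", "content": prompt}]
    for _ in range(max_steps):
        data = send(
            {"model": model, "max_tokens": 4096, "tools": tools, "messages": messages}
        )
        messages.append({"role": "assistant", "content": data["content"]})
        if data.get("stop_reason") != "tool_use":
            # Final answer: join every text block.
            return "".join(
                b["text"] for b in data["content"] if b.get("type") == "text"
            )
        # Run each requested tool and send the results back as a user turn.
        results = [
            {
                "type": "tool_result",
                "tool_use_id": b["id"],
                "content": run_tool(b["name"], b["input"]),
            }
            for b in data["content"]
            if b.get("type") == "tool_use"
        ]
        messages.append({"role": "user", "content": results})
    raise RuntimeError("agent did not finish within max_steps")
```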

Known Quirks

API key format: sk-ant- keys (Anthropic direct) return 401. Use sk_ keys only.
Tool schema: Keep schemas flat and descriptions under 300 chars for open-source models.
XML artifacts: Qwen may emit <tool_call> XML alongside proper tool_use blocks. Strip these client-side.
Streaming: Use "stream": false for maximum reliability with open-source models.
max_tokens: Always set explicitly. qwen-coder-32b supports up to 32768, Claude up to 8192.
Non-working endpoints: /api/oauth/roles, /api/oauth/profile don't exist on our gateway. Use /api/v1/auth/me instead.

Next Steps

Authentication — API keys and JWT tokens
SDKs — Client libraries for React, Next.js, Python
Models Reference — Full model catalog with pricing
Developer Program — Build apps and earn revenue
