Chat Completions API

Build agentic applications with open-source and frontier models. Full tool calling, streaming, and multi-turn conversation support.

Base URL: https://api.ainative.studio

Quick Start

curl -s https://api.ainative.studio/v1/messages \
-H "Content-Type: application/json" \
-H "x-api-key: YOUR_SK_KEY" \
-H "anthropic-version: 2023-06-01" \
-d '{
"model": "qwen-coder-32b",
"max_tokens": 64,
"messages": [{"role": "user", "content": "Say hello"}]
}' | jq .

If the response is JSON containing "role": "assistant", your key and endpoint are working.

Authentication

API Key (server-to-server)

POST /v1/messages
x-api-key: sk_your_key_here
Content-Type: application/json

Accepted key formats:

  • sk_ prefix — AINative native keys (from the dashboard)
  • Platform tokens — issued by OAuth flow

NOT accepted: sk-ant- prefix (Anthropic-direct keys return 401)
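
A client can catch the most common key mistake before any request is sent. The sketch below is a minimal illustration based only on the prefixes listed above; the function name `classify_api_key` and its return labels are hypothetical, and the format of platform tokens is not specified here, so anything without a known prefix is simply reported as "other".

```python
def classify_api_key(key: str) -> str:
    """Classify a key by prefix: 'anthropic-direct', 'ainative', or 'other'."""
    # Check the longer, rejected prefix first so it is never mistaken for sk_.
    if key.startswith("sk-ant-"):
        return "anthropic-direct"  # the gateway answers these with 401
    if key.startswith("sk_"):
        return "ainative"          # native key from the dashboard
    return "other"                 # possibly a platform token from the OAuth flow


if __name__ == "__main__":
    for sample in ("sk_your_key_here", "sk-ant-example", "eyJhbGc"):
        print(sample, "->", classify_api_key(sample))
```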

Bearer Token (user-facing apps)

POST /v1/messages
Authorization: Bearer eyJhbGc...
Content-Type: application/json

Bearer tokens expire in 24 hours. Refresh via POST /api/v1/auth/refresh.
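
Because tokens last 24 hours, a client can decide locally when to call the refresh endpoint instead of waiting for a 401. The following is a sketch under assumptions: the client records when the token was issued, and the five-minute safety margin and the name `needs_refresh` are illustrative choices, not part of the API.

```python
import time
from typing import Optional

TOKEN_LIFETIME_S = 24 * 60 * 60  # bearer tokens expire in 24 hours
REFRESH_MARGIN_S = 5 * 60        # hypothetical safety margin before expiry


def needs_refresh(issued_at: float, now: Optional[float] = None) -> bool:
    """Return True once the token is within REFRESH_MARGIN_S of expiring.

    issued_at and now are Unix timestamps in seconds.
    """
    current = time.time() if now is None else now
    return current - issued_at >= TOKEN_LIFETIME_S - REFRESH_MARGIN_S
```

When `needs_refresh` returns True, the client would call POST /api/v1/auth/refresh and store the new issue time.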

Endpoints

Two formats, same backend:

| Endpoint | Format | Best For |
| --- | --- | --- |
| POST /v1/messages | Anthropic Messages API | Anthropic SDK, Cody CLI |
| POST /api/v1/chat/completions | OpenAI-compatible | OpenAI SDK, general integrations |

Available Models

Code Models

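| Model ID | Params | Provider | Tool Calling | Speed | Plan |
| --- | --- | --- | --- | --- | --- |
| qwen-coder-32b | 32B | HuggingFace | Yes | Medium | Pro+ |
| qwen-coder-7b | 7B | HuggingFace | Yes | Fast | Basic+ |
| nouscoder-14b | 14B | HuggingFace | Yes | Medium | Pro+ |
| minimax-m2.7 | | NVIDIA NIM | No | Fast | Pro+ |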

Text Models

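| Model ID | Params | Provider | Tool Calling | Speed | Plan |
| --- | --- | --- | --- | --- | --- |
| qwen-7b | 7B | HuggingFace | Yes | Fast | Free |
| gemma-2b | 4B* | HuggingFace | Yes | Fastest | Free |
| gemma-9b | 27B* | HuggingFace | Yes | Medium | Pro+ |
| llama-3.3-8b-instruct | 8B | Meta | Yes | Fast | Free |
| llama-3.3-70b-instruct | 70B | Meta | Yes | Slow | Basic+ |
| llama-4-maverick-17b-128e | 17B | Meta | Yes | Medium | Pro+ |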

*The gemma model IDs are remapped to the latest available versions (gemma-3n, gemma-3-27b), which is why the parameter counts differ from the IDs.

Reasoning Models

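| Model ID | Params | Provider | Tool Calling | Speed | Plan |
| --- | --- | --- | --- | --- | --- |
| deepseek-r1-distill-qwen-7b | 7B | HuggingFace | Yes | Medium | Pro+ |
| deepseek-r1-distill-llama-8b | 8B | HuggingFace | Yes | Medium | Pro+ |
| deepseek-r1 | 671B | HuggingFace | Yes | Slow | Enterprise |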

Frontier Models (routed through our gateway)

| Model ID | Provider | Tool Calling | Plan |
| --- | --- | --- | --- |
| claude-3-5-haiku | Anthropic | Yes | Basic+ |
| claude-sonnet-4.5 | Anthropic | Yes | Pro+ |
| claude-opus-4 | Anthropic | Yes | Enterprise |

Your sk_ key works — no separate Anthropic key needed.

Multimodal Models

See Multimodal API and Audio API for image, video, audio, and speech models.

Default model: qwen-coder-32b (Professional/Enterprise) | qwen-7b (Free)

Which model should I use?

  • Coding assistant? qwen-coder-32b
  • Tool calling on a budget? qwen-coder-7b or qwen-7b
  • Best-in-class quality? claude-sonnet-4.5
  • Reasoning/chain-of-thought? deepseek-r1-distill-qwen-7b
  • Fastest response? gemma-2b or llama-3.3-8b-instruct

Basic Chat Completion

Anthropic Format

curl -s https://api.ainative.studio/v1/messages \
-H "Content-Type: application/json" \
-H "x-api-key: $AINATIVE_API_KEY" \
-H "anthropic-version: 2023-06-01" \
-d '{
"model": "qwen-coder-32b",
"max_tokens": 256,
"messages": [
{"role": "user", "content": "Write a Python function that reverses a string"}
]
}'

Response:

{
"id": "msg_abc123",
"type": "message",
"role": "assistant",
"content": [
{
"type": "text",
"text": "def reverse_string(s: str) -> str:\n return s[::-1]"
}
],
"model": "qwen-coder-32b",
"stop_reason": "end_turn",
"usage": { "input_tokens": 18, "output_tokens": 25 }
}

OpenAI Format

curl -s https://api.ainative.studio/api/v1/chat/completions \
-H "Content-Type: application/json" \
-H "x-api-key: $AINATIVE_API_KEY" \
-d '{
"model": "qwen-coder-32b",
"max_tokens": 256,
"messages": [
{"role": "user", "content": "Write a Python function that reverses a string"}
]
}'

Response:

{
"id": "chatcmpl-abc123",
"object": "chat.completion",
"model": "qwen-coder-32b",
"choices": [{
"index": 0,
"message": {
"role": "assistant",
"content": "def reverse_string(s: str) -> str:\n return s[::-1]"
},
"finish_reason": "stop"
}],
"usage": { "prompt_tokens": 18, "completion_tokens": 25, "total_tokens": 43 }
}
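
The two response shapes above carry the same text in different places. A small helper can hide that difference from the rest of an application. This is a sketch based only on the two sample responses shown; the name `extract_text` is illustrative.

```python
def extract_text(response: dict) -> str:
    """Return the assistant text from either response format."""
    if "choices" in response:
        # OpenAI-compatible: text is a plain string on the first choice.
        return response["choices"][0]["message"].get("content") or ""
    # Anthropic format: text is spread over content blocks of type "text".
    return "".join(
        block.get("text", "")
        for block in response.get("content", [])
        if block.get("type") == "text"
    )


if __name__ == "__main__":
    anthropic_style = {"content": [{"type": "text", "text": "hello"}]}
    openai_style = {"choices": [{"message": {"role": "assistant", "content": "hello"}}]}
    print(extract_text(anthropic_style) == extract_text(openai_style))
```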

Tool Calling (Function Calling)

Tool calling lets the model invoke functions you define. This is essential for building agents.

Supported Models

Only these models reliably produce tool calls:

  • qwen-coder-32b (best), qwen-coder-7b, nouscoder-14b
  • All Claude models

gemma-* and deepseek-r1-* do not support tools.

Step 1: Define Your Tools

{
"tools": [
{
"name": "read_file",
"description": "Read the contents of a file at the given path",
"input_schema": {
"type": "object",
"properties": {
"path": {
"type": "string",
"description": "Absolute path to the file"
}
},
"required": ["path"]
}
},
{
"name": "run_command",
"description": "Execute a shell command and return stdout",
"input_schema": {
"type": "object",
"properties": {
"command": {
"type": "string",
"description": "The shell command to run"
}
},
"required": ["command"]
}
}
]
}

Step 2: Send Request with Tools

curl -s https://api.ainative.studio/v1/messages \
-H "Content-Type: application/json" \
-H "x-api-key: $AINATIVE_API_KEY" \
-H "anthropic-version: 2023-06-01" \
-d '{
"model": "qwen-coder-32b",
"max_tokens": 512,
"tools": [{
"name": "read_file",
"description": "Read file contents",
"input_schema": {
"type": "object",
"properties": {
"path": {"type": "string", "description": "File path"}
},
"required": ["path"]
}
}],
"messages": [
{"role": "user", "content": "What is in /tmp/test.txt?"}
]
}'

Step 3: Model Returns a Tool Use Block

When the model wants to call a tool, stop_reason is "tool_use":

{
"id": "msg_abc123",
"role": "assistant",
"content": [
{ "type": "text", "text": "Let me read that file for you." },
{
"type": "tool_use",
"id": "toolu_01ABC",
"name": "read_file",
"input": { "path": "/tmp/test.txt" }
}
],
"stop_reason": "tool_use"
}
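
On the client side, detecting this case amounts to checking stop_reason and collecting the tool_use blocks. A minimal sketch, assuming the response has already been parsed into a dict shaped like the one above (the helper name `get_tool_calls` is illustrative):

```python
def get_tool_calls(response: dict) -> list:
    """Return the tool_use blocks of a response, or [] if none were requested."""
    if response.get("stop_reason") != "tool_use":
        return []
    return [
        block
        for block in response.get("content", [])
        if block.get("type") == "tool_use"
    ]


if __name__ == "__main__":
    sample = {
        "content": [
            {"type": "text", "text": "Let me read that file for you."},
            {"type": "tool_use", "id": "toolu_01ABC", "name": "read_file",
             "input": {"path": "/tmp/test.txt"}},
        ],
        "stop_reason": "tool_use",
    }
    for call in get_tool_calls(sample):
        print(call["id"], call["name"], call["input"])
```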

Step 4: Execute the Tool and Return Results

Your code runs the function, then sends the result back:

{
"model": "qwen-coder-32b",
"max_tokens": 512,
"tools": ["...same tools..."],
"messages": [
{"role": "user", "content": "What is in /tmp/test.txt?"},
{
"role": "assistant",
"content": [
{"type": "text", "text": "Let me read that file for you."},
{"type": "tool_use", "id": "toolu_01ABC", "name": "read_file", "input": {"path": "/tmp/test.txt"}}
]
},
{
"role": "user",
"content": [{
"type": "tool_result",
"tool_use_id": "toolu_01ABC",
"content": "Hello, world! This is a test file."
}]
}
]
}
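
The follow-up request always has the same shape: the previous history, the assistant turn echoed back unchanged, and a user turn holding one tool_result per tool_use id. The sketch below builds that message list; `append_tool_round` is a hypothetical helper name, and it assumes each tool output is already a string.

```python
def append_tool_round(messages: list, assistant_content: list, results: dict) -> list:
    """Return a new history extended with the assistant turn and its tool results.

    results maps each tool_use id to the string output of that tool.
    """
    extended = list(messages)  # copy so the caller's list is not mutated
    extended.append({"role": "assistant", "content": assistant_content})
    extended.append({
        "role": "user",
        "content": [
            {"type": "tool_result", "tool_use_id": tool_use_id, "content": output}
            for tool_use_id, output in results.items()
        ],
    })
    return extended
```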

Step 5: Model Produces Final Answer

{
"content": [{
"type": "text",
"text": "The file /tmp/test.txt contains: \"Hello, world! This is a test file.\""
}],
"stop_reason": "end_turn"
}

Tool Calling Tips

  1. Keep descriptions short (under 300 chars) — open-source models work best with concise descriptions
  2. Flatten nested schemas — avoid deeply nested objects in input_schema. Prefer flat properties
  3. Limit to ~16 tools max — open-source models degrade with too many tools. Claude handles 100+
  4. Use tool_choice to force tool use: "tool_choice": {"type": "tool", "name": "read_file"}
  5. Watch for XML artifacts — qwen sometimes emits <tool_call> XML in text blocks. Strip client-side:
    text = text.replace(/<tool_call>[\s\S]*?<\/tool_call>/g, '').trim()
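
Tips 1, 2, 3, and 5 can be checked mechanically. The sketch below is one possible pre-flight check plus a Python equivalent of the stripping regex; the function names are illustrative, and treating any property of type "object" as "nested" is a simplifying assumption about what counts as a non-flat schema.

```python
import re

MAX_DESCRIPTION_CHARS = 300  # tip 1
MAX_TOOLS = 16               # tip 3
TOOL_CALL_XML = re.compile(r"<tool_call>.*?</tool_call>", re.DOTALL)


def lint_tools(tools: list) -> list:
    """Return human-readable warnings for tool definitions that ignore the tips."""
    warnings = []
    if len(tools) > MAX_TOOLS:
        warnings.append(f"{len(tools)} tools defined; aim for about {MAX_TOOLS} or fewer")
    for tool in tools:
        name = tool.get("name", "<unnamed>")
        description = tool.get("description", "")
        if len(description) >= MAX_DESCRIPTION_CHARS:
            warnings.append(f"{name}: description has {len(description)} chars; keep it under 300")
        properties = tool.get("input_schema", {}).get("properties", {})
        for prop_name, prop in properties.items():
            if prop.get("type") == "object":  # tip 2: prefer flat properties
                warnings.append(f"{name}.{prop_name}: nested object; prefer flat properties")
    return warnings


def strip_tool_call_xml(text: str) -> str:
    """Remove <tool_call>...</tool_call> spans from a text block (tip 5)."""
    return TOOL_CALL_XML.sub("", text).strip()
```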

System Prompts

Anthropic Format

{
"model": "qwen-coder-32b",
"system": "You are a senior Python engineer. Always include type hints.",
"messages": [{"role": "user", "content": "Write a CSV parser"}]
}

OpenAI Format

{
"messages": [
{"role": "system", "content": "You are a senior Python engineer."},
{"role": "user", "content": "Write a CSV parser"}
]
}
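
The only difference between the two formats here is where the system prompt lives: a top-level "system" field in the Anthropic format, a leading "system" message in the OpenAI format. A minimal conversion sketch follows; `to_openai_messages` is a hypothetical name, and the sketch assumes message content is plain strings rather than content-block arrays.

```python
def to_openai_messages(anthropic_request: dict) -> list:
    """Build an OpenAI-style messages list from an Anthropic-style request body."""
    messages = list(anthropic_request.get("messages", []))  # shallow copy
    system = anthropic_request.get("system")
    if system:
        # OpenAI format carries the system prompt as the first message.
        messages.insert(0, {"role": "system", "content": system})
    return messages


if __name__ == "__main__":
    request = {
        "model": "qwen-coder-32b",
        "system": "You are a senior Python engineer. Always include type hints.",
        "messages": [{"role": "user", "content": "Write a CSV parser"}],
    }
    for message in to_openai_messages(request):
        print(message["role"], "->", message["content"])
```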

Multi-Turn Conversations

Send the full history with every request:

{
"model": "qwen-coder-32b",
"max_tokens": 512,
"messages": [
{"role": "user", "content": "Write a fibonacci function"},
{"role": "assistant", "content": "def fib(n):\n if n <= 1: return n\n return fib(n-1) + fib(n-2)"},
{"role": "user", "content": "Now add memoization"}
]
}
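
Since the API is stateless, the client owns the history. One way to keep that tidy is a small wrapper that accumulates turns and emits the request body on demand. This is an illustrative sketch, not an SDK feature; the `Conversation` class and its method names are hypothetical.

```python
class Conversation:
    """Accumulates user and assistant turns and builds the next request body."""

    def __init__(self, model: str, max_tokens: int = 512):
        self.model = model
        self.max_tokens = max_tokens
        self.messages = []

    def add_user(self, text: str) -> None:
        self.messages.append({"role": "user", "content": text})

    def add_assistant(self, text: str) -> None:
        self.messages.append({"role": "assistant", "content": text})

    def to_request(self) -> dict:
        # Every request carries the full history so far.
        return {
            "model": self.model,
            "max_tokens": self.max_tokens,
            "messages": list(self.messages),
        }


if __name__ == "__main__":
    chat = Conversation("qwen-coder-32b")
    chat.add_user("Write a fibonacci function")
    chat.add_assistant("def fib(n):\n    if n <= 1: return n\n    return fib(n-1) + fib(n-2)")
    chat.add_user("Now add memoization")
    print(len(chat.to_request()["messages"]), "messages in the next request")
```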

Streaming

curl -s https://api.ainative.studio/v1/messages \
-H "Content-Type: application/json" \
-H "x-api-key: $AINATIVE_API_KEY" \
-H "anthropic-version: 2023-06-01" \
-d '{
"model": "qwen-coder-32b",
"max_tokens": 256,
"stream": true,
"messages": [{"role": "user", "content": "Count to 5"}]
}'

Returns Server-Sent Events: message_start, content_block_delta, message_delta, message_stop.
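
To turn the event stream back into text, a client concatenates the text carried by content_block_delta events until message_stop arrives. The sketch below works on an iterable of already-decoded SSE lines. It assumes each data payload is JSON with a "type" field and that text deltas look like {"delta": {"type": "text_delta", "text": "..."}}, as in Anthropic's published streaming format; only the event names are confirmed above, so verify the payload shape against real gateway output.

```python
import json


def collect_stream_text(lines) -> str:
    """Concatenate text deltas from SSE lines, stopping at message_stop."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip "event:" lines, comments, and blank separators
        try:
            event = json.loads(line[len("data:"):].strip())
        except json.JSONDecodeError:
            continue  # tolerate non-JSON payloads
        if not isinstance(event, dict):
            continue
        if event.get("type") == "content_block_delta":
            delta = event.get("delta", {})
            if delta.get("type") == "text_delta":
                parts.append(delta.get("text", ""))
        elif event.get("type") == "message_stop":
            break
    return "".join(parts)
```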

Non-streaming is more reliable

For AINative-hosted models, "stream": false is recommended unless you need real-time output. The gateway's SSE simulation can cause parser issues with some SDK clients.

SDK Integration

Anthropic SDK

import Anthropic from '@anthropic-ai/sdk'

const client = new Anthropic({
apiKey: 'sk_your_ainative_key',
baseURL: 'https://api.ainative.studio',
})

const message = await client.messages.create({
model: 'qwen-coder-32b',
max_tokens: 256,
messages: [{ role: 'user', content: 'Hello!' }],
})
console.log(message.content[0].text)

The same call with the Python SDK:

import anthropic

client = anthropic.Anthropic(
api_key="sk_your_ainative_key",
base_url="https://api.ainative.studio",
)
message = client.messages.create(
model="qwen-coder-32b",
max_tokens=256,
messages=[{"role": "user", "content": "Hello!"}],
)
print(message.content[0].text)

OpenAI SDK

import OpenAI from 'openai'

const client = new OpenAI({
apiKey: 'sk_your_ainative_key',
baseURL: 'https://api.ainative.studio/api/v1',
})

const response = await client.chat.completions.create({
model: 'qwen-coder-32b',
max_tokens: 256,
messages: [{ role: 'user', content: 'Hello!' }],
})
console.log(response.choices[0].message.content)

Request Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| model | string | required | Model ID (see Available Models) |
| messages | array | required | Conversation messages |
| system | string | null | System prompt (Anthropic format) |
| max_tokens | number | 4096 | Max response tokens |
| temperature | number | 0.7 | Randomness, 0.0–2.0 |
| top_p | number | 1.0 | Nucleus sampling |
| stream | boolean | false | Enable SSE streaming |
| tools | array | null | Tool definitions for function calling |
| tool_choice | object | auto | Force specific tool or auto-select |
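
The defaults and ranges above can be enforced in one place before a request leaves the client. A sketch of such a builder for the Anthropic-format body follows; `build_request` is a hypothetical helper, and the choice to omit null-valued optional fields rather than send them is an assumption about what the gateway prefers.

```python
from typing import Optional


def build_request(
    model: str,
    messages: list,
    *,
    system: Optional[str] = None,
    max_tokens: int = 4096,
    temperature: float = 0.7,
    top_p: float = 1.0,
    stream: bool = False,
    tools: Optional[list] = None,
    tool_choice: Optional[dict] = None,
) -> dict:
    """Validate parameters and return an Anthropic-format request body."""
    if not model:
        raise ValueError("model is required")
    if not messages:
        raise ValueError("messages is required and must not be empty")
    if not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be between 0.0 and 2.0")
    body = {
        "model": model,
        "messages": messages,
        "max_tokens": max_tokens,
        "temperature": temperature,
        "top_p": top_p,
        "stream": stream,
    }
    # Optional fields are only included when set.
    if system is not None:
        body["system"] = system
    if tools:
        body["tools"] = tools
        if tool_choice is not None:
            body["tool_choice"] = tool_choice
    return body
```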

Error Handling

| Status | Meaning | Action |
| --- | --- | --- |
| 200 | Success | Parse response |
| 400 | Invalid request | Check request body |
| 401 | Bad API key | Verify sk_ prefix (not sk-ant-) |
| 429 | Rate limited | Retry with backoff, check retry-after header |
| 500 | Server error | Retry with exponential backoff |
| 502/503 | Model loading | Retry in 5–10s |
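
The table reduces to one decision: given a status code, either wait some number of seconds and retry, or give up. The sketch below encodes that as a pure function, which keeps the policy testable without network access. The name `retry_delay`, the 5-second fallback for a missing retry-after header, and the way the 5–10s window is stepped are illustrative assumptions.

```python
from typing import Optional


def retry_delay(status: int, attempt: int, retry_after: Optional[str] = None) -> Optional[float]:
    """Return seconds to wait before retrying, or None if the error is not retryable.

    attempt is zero-based; retry_after is the raw retry-after header value, if any.
    """
    if status == 429:
        try:
            return float(retry_after)
        except (TypeError, ValueError):
            return 5.0  # header missing or not a number of seconds
    if status in (502, 503):
        # Model loading: start at 5 s and cap at 10 s.
        return min(10.0, 5.0 + 2.5 * attempt)
    if status >= 500:
        return float(2 ** attempt)  # exponential backoff: 1 s, 2 s, 4 s, ...
    return None  # 400, 401 and other client errors need a fix, not a retry
```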

Retry Example

// Retries on 429 (honoring retry-after) and on 5xx (exponential backoff).
// Other errors, including a request timeout, are thrown to the caller.
async function callWithRetry(body, maxRetries = 3) {
  const sleep = (ms) => new Promise(r => setTimeout(r, ms))
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const resp = await fetch('https://api.ainative.studio/v1/messages', {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'x-api-key': process.env.AINATIVE_API_KEY,
        'anthropic-version': '2023-06-01',
      },
      body: JSON.stringify(body),
      signal: AbortSignal.timeout(60_000), // abort after 60 s
    })
    if (resp.ok) return resp.json()
    if (resp.status === 429) {
      // retry-after is in seconds; fall back to 5 s if missing or unparseable
      const retryAfter = resp.headers.get('retry-after')
      await sleep((parseInt(retryAfter, 10) || 5) * 1000)
      continue
    }
    if (resp.status >= 500) {
      await sleep(Math.pow(2, attempt) * 1000) // 1 s, 2 s, 4 s
      continue
    }
    // Remaining 4xx errors are not retryable
    throw new Error(`API error ${resp.status}: ${await resp.text()}`)
  }
  throw new Error('Max retries exceeded')
}

Rate Limits & Credits

| Plan | Monthly Credits | Requests/hour | Price |
| --- | --- | --- | --- |
| Starter | 1,000 | 60 | Free |
| Pro | 50,000 | 300 | $49/mo |
| Business | 150,000 | 1,000 | $149/mo |
| Enterprise | 200,000 | 5,000 | $699/mo |
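
A client that spaces its requests evenly will stay under the hourly limit without ever hitting a 429. The sketch below derives that spacing from the Requests/hour column. It assumes evenly spread traffic; how the gateway actually windows its limit (fixed or sliding) is not stated here, and the name `min_interval_seconds` is illustrative.

```python
REQUESTS_PER_HOUR = {
    "Starter": 60,
    "Pro": 300,
    "Business": 1000,
    "Enterprise": 5000,
}


def min_interval_seconds(plan: str) -> float:
    """Return the minimum spacing between requests that respects the hourly limit."""
    return 3600 / REQUESTS_PER_HOUR[plan]


if __name__ == "__main__":
    for plan in REQUESTS_PER_HOUR:
        print(f"{plan}: one request every {min_interval_seconds(plan):.2f} s")
```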

Check your balance:

curl -s https://api.ainative.studio/api/v1/public/credits/balance \
-H "x-api-key: $AINATIVE_API_KEY"

Full Example: Agent with Tool Loop

const BASE = 'https://api.ainative.studio'
const API_KEY = process.env.AINATIVE_API_KEY!

const tools = [{
  name: 'read_file',
  description: 'Read a file',
  input_schema: {
    type: 'object',
    properties: { path: { type: 'string' } },
    required: ['path'],
  },
}]

// executeToolLocally(name, input) is your own dispatcher (not shown here).
// It runs the named tool and resolves to a string result.

async function runAgent(userPrompt: string) {
  const messages: any[] = [{ role: 'user', content: userPrompt }]

  // Loop until the model stops asking for tools
  while (true) {
    const resp = await fetch(`${BASE}/v1/messages`, {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'x-api-key': API_KEY,
        'anthropic-version': '2023-06-01',
      },
      body: JSON.stringify({
        model: 'qwen-coder-32b',
        max_tokens: 4096,
        tools,
        messages,
      }),
    })
    // Fail fast instead of pushing an error body into the history
    if (!resp.ok) {
      throw new Error(`API error ${resp.status}: ${await resp.text()}`)
    }
    const data = await resp.json()
    messages.push({ role: 'assistant', content: data.content })

    // No tool requested: return the concatenated text blocks
    if (data.stop_reason !== 'tool_use') {
      return data.content
        .filter((b: any) => b.type === 'text')
        .map((b: any) => b.text)
        .join('')
    }

    // Execute tool calls
    const toolResults: any[] = []
    for (const block of data.content) {
      if (block.type !== 'tool_use') continue
      const result = await executeToolLocally(block.name, block.input)
      toolResults.push({
        type: 'tool_result',
        tool_use_id: block.id,
        content: result,
      })
    }
    messages.push({ role: 'user', content: toolResults })
  }
}
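
The example above leaves the tool dispatcher undefined. What that piece might look like is sketched here in Python: a registry mapping tool names to handler functions, with failures returned as strings so the model can see and react to them. Everything in this block (the registry, the handler, the error message format) is an illustrative assumption, not part of the API.

```python
from pathlib import Path


def read_file(path: str) -> str:
    """Handler for the read_file tool: return the file's text content."""
    return Path(path).read_text(encoding="utf-8")


# Registry of tool name -> handler; add one entry per tool you define.
TOOL_HANDLERS = {"read_file": read_file}


def execute_tool_locally(name: str, tool_input: dict) -> str:
    """Run the named tool with the model-supplied input and return a string result."""
    handler = TOOL_HANDLERS.get(name)
    if handler is None:
        return f"Error: unknown tool '{name}'"
    try:
        return str(handler(**tool_input))
    except Exception as exc:
        # Report the failure as tool output rather than crashing the agent loop.
        return f"Error: {type(exc).__name__}: {exc}"
```

Returning errors as tool output keeps the loop running and gives the model a chance to correct a bad path or argument on its next turn.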

Known Quirks

  1. API key format: sk-ant- keys (Anthropic direct) return 401. Use sk_ keys only.
  2. Tool schema: Keep schemas flat and descriptions under 300 chars for open-source models.
  3. XML artifacts: Qwen may emit <tool_call> XML alongside proper tool_use blocks. Strip these client-side.
  4. Streaming: Use "stream": false for maximum reliability with open-source models.
  5. max_tokens: Always set explicitly. qwen-coder-32b supports up to 32768, Claude up to 8192.
  6. Non-working endpoints: /api/oauth/roles, /api/oauth/profile don't exist on our gateway. Use /api/v1/auth/me instead.
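
Quirk 5 can be guarded against with a small clamp applied before each request. The sketch below uses only the two limits stated above; treating every model ID that starts with "claude-" as a Claude model, and passing other models through unchanged, are assumptions, since limits for the remaining models are not listed.

```python
def clamp_max_tokens(model: str, requested: int) -> int:
    """Cap max_tokens at the documented limit for the model, if one is known."""
    if model.startswith("claude-"):
        limit = 8192
    elif model == "qwen-coder-32b":
        limit = 32768
    else:
        return requested  # no documented limit for this model
    return min(requested, limit)


if __name__ == "__main__":
    print(clamp_max_tokens("claude-sonnet-4.5", 16000))
    print(clamp_max_tokens("qwen-coder-32b", 4096))
```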

Next Steps