Chat Completions API
Build agentic applications with open-source and frontier models. Full tool calling, streaming, and multi-turn conversation support.
Base URL: https://api.ainative.studio
Quick Start
curl -s https://api.ainative.studio/v1/messages \
-H "Content-Type: application/json" \
-H "x-api-key: YOUR_SK_KEY" \
-H "anthropic-version: 2023-06-01" \
-d '{
"model": "qwen-coder-32b",
"max_tokens": 64,
"messages": [{"role": "user", "content": "Say hello"}]
}' | jq .
If you get a JSON response with "role": "assistant", you're good.
Authentication
API Key (server-to-server)
POST /v1/messages
x-api-key: sk_your_key_here
Content-Type: application/json
Accepted key formats:
- `sk_` prefix: AINative native keys (from the dashboard)
- Platform tokens: issued by the OAuth flow
NOT accepted: sk-ant- prefix (Anthropic-direct keys return 401)
Bearer Token (user-facing apps)
POST /v1/messages
Authorization: Bearer eyJhbGc...
Content-Type: application/json
Bearer tokens expire in 24 hours. Refresh via POST /api/v1/auth/refresh.
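To keep the two schemes straight in client code, a small helper can pick the right header for each credential type. This is a sketch, not part of any AINative SDK: `Credential` and `authHeaders` are hypothetical names, and the only validation it performs is the `sk-ant-` rejection described above.

```typescript
// Hypothetical credential type covering the two schemes described above.
type Credential =
  | { kind: 'apiKey'; key: string }   // sk_ key or platform token, sent as x-api-key
  | { kind: 'bearer'; token: string } // user-facing JWT, sent as Authorization: Bearer

// Builds request headers for either scheme.
function authHeaders(cred: Credential): Record<string, string> {
  const headers: Record<string, string> = { 'Content-Type': 'application/json' }
  if (cred.kind === 'apiKey') {
    // Anthropic-direct keys are rejected with 401, so fail fast on the client
    if (cred.key.startsWith('sk-ant-')) {
      throw new Error('sk-ant- keys are not accepted; use an AINative sk_ key')
    }
    headers['x-api-key'] = cred.key
  } else {
    headers['Authorization'] = `Bearer ${cred.token}`
  }
  return headers
}
```

For `POST /v1/messages`, also send `anthropic-version: 2023-06-01`, as the examples on this page do.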
Endpoints
Two formats, same backend:
| Endpoint | Format | Best For |
|---|---|---|
| POST /v1/messages | Anthropic Messages API | Anthropic SDK, Cody CLI |
| POST /api/v1/chat/completions | OpenAI-compatible | OpenAI SDK, general integrations |
Available Models
Code Models
| Model ID | Params | Provider | Tool Calling | Speed | Plan |
|---|---|---|---|---|---|
| qwen-coder-32b | 32B | HuggingFace | Yes | Medium | Pro+ |
| qwen-coder-7b | 7B | HuggingFace | Yes | Fast | Basic+ |
| nouscoder-14b | 14B | HuggingFace | Yes | Medium | Pro+ |
| minimax-m2.7 | — | NVIDIA NIM | No | Fast | Pro+ |
Text Models
| Model ID | Params | Provider | Tool Calling | Speed | Plan |
|---|---|---|---|---|---|
| qwen-7b | 7B | HuggingFace | Yes | Fast | Free |
| gemma-2b | 4B* | HuggingFace | Yes | Fastest | Free |
| gemma-9b | 27B* | HuggingFace | Yes | Medium | Pro+ |
| llama-3.3-8b-instruct | 8B | Meta | Yes | Fast | Free |
| llama-3.3-70b-instruct | 70B | Meta | Yes | Slow | Basic+ |
| llama-4-maverick-17b-128e | 17B | Meta | Yes | Medium | Pro+ |
*The gemma model IDs are remapped to the latest available versions: gemma-2b is served by gemma-3n and gemma-9b by gemma-3-27b.
Reasoning Models
| Model ID | Params | Provider | Tool Calling | Speed | Plan |
|---|---|---|---|---|---|
| deepseek-r1-distill-qwen-7b | 7B | HuggingFace | Yes | Medium | Pro+ |
| deepseek-r1-distill-llama-8b | 8B | HuggingFace | Yes | Medium | Pro+ |
| deepseek-r1 | 671B | HuggingFace | Yes | Slow | Enterprise |
Frontier Models (routed through our gateway)
| Model ID | Provider | Tool Calling | Plan |
|---|---|---|---|
| claude-3-5-haiku | Anthropic | Yes | Basic+ |
| claude-sonnet-4.5 | Anthropic | Yes | Pro+ |
| claude-opus-4 | Anthropic | Yes | Enterprise |
Your sk_ key works — no separate Anthropic key needed.
Multimodal Models
See Multimodal API and Audio API for image, video, audio, and speech models.
Default model: qwen-coder-32b (Pro/Enterprise) | qwen-7b (Free)
Which model should I use?
- Coding assistant? `qwen-coder-32b`
- Tool calling on a budget? `qwen-coder-7b` or `qwen-7b`
- Best-in-class quality? `claude-sonnet-4.5`
- Reasoning/chain-of-thought? `deepseek-r1-distill-qwen-7b`
- Fastest response? `gemma-2b` or `llama-3.3-8b-instruct`
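If your application lets users pick a model, the plan requirements in the tables above can be encoded as data. A minimal sketch, assuming the plans rank Free < Basic < Pro < Enterprise and that a "+" suffix means "this plan or higher"; `PLAN_ORDER`, `MIN_PLAN`, and `canUseModel` are hypothetical names, and the server remains the authority on access.

```typescript
// Assumed plan ladder, lowest to highest. "Pro+" is read as "Pro or higher".
const PLAN_ORDER = ['Free', 'Basic', 'Pro', 'Enterprise'] as const
type Plan = (typeof PLAN_ORDER)[number]

// Minimum plan per model ID, copied from the model tables.
const MIN_PLAN: Record<string, Plan> = {
  'qwen-coder-32b': 'Pro',
  'qwen-coder-7b': 'Basic',
  'nouscoder-14b': 'Pro',
  'minimax-m2.7': 'Pro',
  'qwen-7b': 'Free',
  'gemma-2b': 'Free',
  'gemma-9b': 'Pro',
  'llama-3.3-8b-instruct': 'Free',
  'llama-3.3-70b-instruct': 'Basic',
  'llama-4-maverick-17b-128e': 'Pro',
  'deepseek-r1-distill-qwen-7b': 'Pro',
  'deepseek-r1-distill-llama-8b': 'Pro',
  'deepseek-r1': 'Enterprise',
  'claude-3-5-haiku': 'Basic',
  'claude-sonnet-4.5': 'Pro',
  'claude-opus-4': 'Enterprise',
}

// True if `plan` meets the model's minimum plan; unknown model IDs return false.
function canUseModel(plan: Plan, modelId: string): boolean {
  const required: Plan | undefined = MIN_PLAN[modelId]
  if (required === undefined) return false
  return PLAN_ORDER.indexOf(plan) >= PLAN_ORDER.indexOf(required)
}
```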
Basic Chat Completion
Anthropic Format
curl -s https://api.ainative.studio/v1/messages \
-H "Content-Type: application/json" \
-H "x-api-key: $AINATIVE_API_KEY" \
-H "anthropic-version: 2023-06-01" \
-d '{
"model": "qwen-coder-32b",
"max_tokens": 256,
"messages": [
{"role": "user", "content": "Write a Python function that reverses a string"}
]
}'
Response:
{
"id": "msg_abc123",
"type": "message",
"role": "assistant",
"content": [
{
"type": "text",
"text": "def reverse_string(s: str) -> str:\n return s[::-1]"
}
],
"model": "qwen-coder-32b",
"stop_reason": "end_turn",
"usage": { "input_tokens": 18, "output_tokens": 25 }
}
OpenAI Format
curl -s https://api.ainative.studio/api/v1/chat/completions \
-H "Content-Type: application/json" \
-H "x-api-key: $AINATIVE_API_KEY" \
-d '{
"model": "qwen-coder-32b",
"max_tokens": 256,
"messages": [
{"role": "user", "content": "Write a Python function that reverses a string"}
]
}'
Response:
{
"id": "chatcmpl-abc123",
"object": "chat.completion",
"model": "qwen-coder-32b",
"choices": [{
"index": 0,
"message": {
"role": "assistant",
"content": "def reverse_string(s: str) -> str:\n return s[::-1]"
},
"finish_reason": "stop"
}],
"usage": { "prompt_tokens": 18, "completion_tokens": 25, "total_tokens": 43 }
}
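Because the two endpoints return the assistant text in different places, a small accessor keeps the rest of your code format-agnostic. This sketch models only the fields shown in the responses above; `extractText` is a hypothetical helper, not an SDK function.

```typescript
// Minimal response shapes for the two formats (only the fields used here).
interface AnthropicResponse {
  content: Array<{ type: string; text?: string }>
}
interface OpenAIResponse {
  choices: Array<{ message: { content: string | null } }>
}

// Returns the assistant's text from either response format.
function extractText(resp: AnthropicResponse | OpenAIResponse): string {
  if ('content' in resp) {
    // Anthropic format: join the text blocks, skipping tool_use and other block types
    return resp.content
      .filter((b) => b.type === 'text')
      .map((b) => b.text ?? '')
      .join('')
  }
  // OpenAI format: the first choice carries the assistant message
  return resp.choices[0]?.message.content ?? ''
}
```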
Tool Calling (Function Calling)
Tool calling lets the model request calls to functions you define; your code executes them and sends back the results. This is essential for building agents.
Supported Models
Only these models reliably produce tool calls:
- `qwen-coder-32b` (best)
- `qwen-coder-7b`
- `nouscoder-14b`
- All Claude models
gemma-* and deepseek-r1-* do not support tools.
Step 1: Define Your Tools
{
"tools": [
{
"name": "read_file",
"description": "Read the contents of a file at the given path",
"input_schema": {
"type": "object",
"properties": {
"path": {
"type": "string",
"description": "Absolute path to the file"
}
},
"required": ["path"]
}
},
{
"name": "run_command",
"description": "Execute a shell command and return stdout",
"input_schema": {
"type": "object",
"properties": {
"command": {
"type": "string",
"description": "The shell command to run"
}
},
"required": ["command"]
}
}
]
}
Step 2: Send Request with Tools
curl -s https://api.ainative.studio/v1/messages \
-H "Content-Type: application/json" \
-H "x-api-key: $AINATIVE_API_KEY" \
-H "anthropic-version: 2023-06-01" \
-d '{
"model": "qwen-coder-32b",
"max_tokens": 512,
"tools": [{
"name": "read_file",
"description": "Read file contents",
"input_schema": {
"type": "object",
"properties": {
"path": {"type": "string", "description": "File path"}
},
"required": ["path"]
}
}],
"messages": [
{"role": "user", "content": "What is in /tmp/test.txt?"}
]
}'
Step 3: Model Returns a Tool Use Block
When the model wants to call a tool, stop_reason is "tool_use":
{
"id": "msg_abc123",
"role": "assistant",
"content": [
{ "type": "text", "text": "Let me read that file for you." },
{
"type": "tool_use",
"id": "toolu_01ABC",
"name": "read_file",
"input": { "path": "/tmp/test.txt" }
}
],
"stop_reason": "tool_use"
}
Step 4: Execute the Tool and Return Results
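Your code runs the function, then sends the result back. Keep the assistant turn unchanged and follow it with a user turn containing a tool_result block whose tool_use_id matches the tool_use block's id: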
{
"model": "qwen-coder-32b",
"max_tokens": 512,
"tools": ["...same tools..."],
"messages": [
{"role": "user", "content": "What is in /tmp/test.txt?"},
{
"role": "assistant",
"content": [
{"type": "text", "text": "Let me read that file for you."},
{"type": "tool_use", "id": "toolu_01ABC", "name": "read_file", "input": {"path": "/tmp/test.txt"}}
]
},
{
"role": "user",
"content": [{
"type": "tool_result",
"tool_use_id": "toolu_01ABC",
"content": "Hello, world! This is a test file."
}]
}
]
}
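Steps 3 and 4 amount to a mechanical transformation of the conversation history. The sketch below assumes types that cover only the fields used on this page; `appendToolResults` is a hypothetical helper, and the tool runner is a synchronous callback purely to keep the example short (real tools are usually async).

```typescript
// Content blocks used in the tool-calling exchange (only the fields used here).
type TextBlock = { type: 'text'; text: string }
type ToolUseBlock = { type: 'tool_use'; id: string; name: string; input: Record<string, unknown> }
type ContentBlock = TextBlock | ToolUseBlock
type ToolResultBlock = { type: 'tool_result'; tool_use_id: string; content: string }
type Message = {
  role: 'user' | 'assistant'
  content: string | Array<ContentBlock | ToolResultBlock>
}

// Returns a new history: the assistant turn, then one user turn carrying a
// tool_result for every tool_use block. The input array is not mutated.
function appendToolResults(
  messages: Message[],
  assistantContent: ContentBlock[],
  runTool: (name: string, input: Record<string, unknown>) => string,
): Message[] {
  const results: ToolResultBlock[] = []
  for (const block of assistantContent) {
    if (block.type !== 'tool_use') continue
    // tool_use_id must echo the id the model assigned to this call
    results.push({ type: 'tool_result', tool_use_id: block.id, content: runTool(block.name, block.input) })
  }
  return [
    ...messages,
    { role: 'assistant', content: assistantContent }, // send the assistant turn back unchanged
    { role: 'user', content: results },
  ]
}
```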
Step 5: Model Produces Final Answer
{
"content": [{
"type": "text",
"text": "The file /tmp/test.txt contains: \"Hello, world! This is a test file.\""
}],
"stop_reason": "end_turn"
}
Tool Calling Tips
- Keep descriptions short (under 300 characters). Open-source models work best with concise descriptions.
- Flatten nested schemas. Avoid deeply nested objects in `input_schema`; prefer flat properties.
- Limit to about 16 tools. Open-source models degrade with too many tools; Claude handles 100+.
- Use `tool_choice` to force tool use: `"tool_choice": {"type": "tool", "name": "read_file"}`
- Watch for XML artifacts. Qwen sometimes emits `<tool_call>` XML in text blocks. Strip it client-side: `text = text.replace(/<tool_call>[\s\S]*?<\/tool_call>/g, '').trim()`
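The first three tips and the XML cleanup can be automated with a pre-flight check. A sketch under stated assumptions: the thresholds come from the tips above and are guidance, not limits the gateway is known to enforce; "nested" is approximated as any property whose `type` is `object`; `ToolDef`, `lintTools`, and `stripToolCallXml` are hypothetical names.

```typescript
// Tool definition shape used in the examples on this page.
interface ToolDef {
  name: string
  description: string
  input_schema: {
    type: 'object'
    properties: Record<string, { type: string; description?: string }>
    required?: string[]
  }
}

// Returns human-readable warnings for tool definitions that break the tips.
function lintTools(tools: ToolDef[]): string[] {
  const warnings: string[] = []
  if (tools.length > 16) {
    warnings.push(`${tools.length} tools defined; open-source models degrade past about 16`)
  }
  for (const tool of tools) {
    if (tool.description.length >= 300) {
      warnings.push(`${tool.name}: description is ${tool.description.length} characters`)
    }
    for (const [prop, schema] of Object.entries(tool.input_schema.properties)) {
      if (schema.type === 'object') {
        warnings.push(`${tool.name}.${prop}: nested object; consider flattening`)
      }
    }
  }
  return warnings
}

// Removes stray <tool_call>...</tool_call> XML that Qwen sometimes leaves in text blocks.
function stripToolCallXml(text: string): string {
  return text.replace(/<tool_call>[\s\S]*?<\/tool_call>/g, '').trim()
}
```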
System Prompts
Anthropic Format
{
"model": "qwen-coder-32b",
"max_tokens": 512,
"system": "You are a senior Python engineer. Always include type hints.",
"messages": [{"role": "user", "content": "Write a CSV parser"}]
}
OpenAI Format
{
"model": "qwen-coder-32b",
"max_tokens": 512,
"messages": [
{"role": "system", "content": "You are a senior Python engineer."},
{"role": "user", "content": "Write a CSV parser"}
]
}
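Moving between the two formats is mostly a matter of relocating the system prompt. A minimal sketch for text-only requests; `toOpenAIBody` and the body interfaces are hypothetical, and tool definitions and content-block messages are deliberately not converted.

```typescript
type Role = 'system' | 'user' | 'assistant'
interface ChatMessage { role: Role; content: string }

// Anthropic-style body: the system prompt is a top-level field.
interface AnthropicBody { model: string; max_tokens?: number; system?: string; messages: ChatMessage[] }
// OpenAI-style body: the system prompt is the first message.
interface OpenAIBody { model: string; max_tokens?: number; messages: ChatMessage[] }

// Moves `system` into a leading system message; all other fields are copied as-is.
function toOpenAIBody(body: AnthropicBody): OpenAIBody {
  const { system, ...rest } = body
  const messages: ChatMessage[] = system
    ? [{ role: 'system', content: system }, ...body.messages]
    : [...body.messages]
  return { ...rest, messages }
}
```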
Multi-Turn Conversations
Send the full history with every request:
{
"model": "qwen-coder-32b",
"max_tokens": 512,
"messages": [
{"role": "user", "content": "Write a fibonacci function"},
{"role": "assistant", "content": "def fib(n):\n if n <= 1: return n\n return fib(n-1) + fib(n-2)"},
{"role": "user", "content": "Now add memoization"}
]
}
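Since the API is stateless, the client owns the transcript. A minimal sketch of a history holder; the `Conversation` class and its methods are hypothetical, and the 512-token default simply mirrors the example above.

```typescript
type Turn = { role: 'user' | 'assistant'; content: string }

// Accumulates the transcript; every request carries the whole history.
class Conversation {
  private readonly turns: Turn[] = []
  private readonly model: string
  private readonly maxTokens: number

  constructor(model: string, maxTokens = 512) {
    this.model = model
    this.maxTokens = maxTokens
  }

  user(text: string): this {
    this.turns.push({ role: 'user', content: text })
    return this
  }

  // Record the model's reply before sending the next user turn.
  assistant(text: string): this {
    this.turns.push({ role: 'assistant', content: text })
    return this
  }

  // Builds a request body with a copy of the history, so callers cannot mutate it.
  toRequest() {
    return {
      model: this.model,
      max_tokens: this.maxTokens,
      messages: this.turns.map((t) => ({ ...t })),
    }
  }
}
```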
Streaming
curl -s https://api.ainative.studio/v1/messages \
-H "Content-Type: application/json" \
-H "x-api-key: $AINATIVE_API_KEY" \
-H "anthropic-version: 2023-06-01" \
-d '{
"model": "qwen-coder-32b",
"max_tokens": 256,
"stream": true,
"messages": [{"role": "user", "content": "Count to 5"}]
}'
Returns Server-Sent Events: message_start, content_block_delta, message_delta, message_stop.
For AINative-hosted models, "stream": false is recommended unless you need real-time output. The gateway's SSE simulation can cause parser issues with some SDK clients.
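If you do enable streaming, the assistant text arrives in `content_block_delta` events. This sketch assumes the gateway uses Anthropic's payload shape for these events (a `delta` object with `type: "text_delta"` and a `text` field), which this page does not spell out. It also parses a complete SSE body rather than an incremental stream, which keeps it short; `collectStreamText` is a hypothetical name.

```typescript
// Collects the streamed text from a complete SSE body.
function collectStreamText(sse: string): string {
  let text = ''
  // Events are separated by a blank line; each has "event:" and "data:" lines.
  for (const rawEvent of sse.split(/\r?\n\r?\n/)) {
    const dataLines = rawEvent
      .split(/\r?\n/)
      .filter((line) => line.startsWith('data:'))
      .map((line) => line.slice(5).replace(/^ /, '')) // drop "data:" and one optional space
    if (dataLines.length === 0) continue
    const payload = JSON.parse(dataLines.join('\n'))
    // Only text deltas contribute to the visible answer
    if (payload.type === 'content_block_delta' && payload.delta?.type === 'text_delta') {
      text += payload.delta.text
    }
  }
  return text
}
```

For example, pass it the result of `await resp.text()` from a streaming request.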
SDK Integration
Anthropic SDK
import Anthropic from '@anthropic-ai/sdk'
const client = new Anthropic({
apiKey: 'sk_your_ainative_key',
baseURL: 'https://api.ainative.studio',
})
const message = await client.messages.create({
model: 'qwen-coder-32b',
max_tokens: 256,
messages: [{ role: 'user', content: 'Hello!' }],
})
console.log(message.content[0].text)
Python
import anthropic
client = anthropic.Anthropic(
api_key="sk_your_ainative_key",
base_url="https://api.ainative.studio",
)
message = client.messages.create(
model="qwen-coder-32b",
max_tokens=256,
messages=[{"role": "user", "content": "Hello!"}],
)
print(message.content[0].text)
OpenAI SDK
import OpenAI from 'openai'
const client = new OpenAI({
apiKey: 'sk_your_ainative_key',
baseURL: 'https://api.ainative.studio/api/v1',
})
const response = await client.chat.completions.create({
model: 'qwen-coder-32b',
max_tokens: 256,
messages: [{ role: 'user', content: 'Hello!' }],
})
console.log(response.choices[0].message.content)
Request Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | string | required | Model ID (see Available Models) |
| messages | array | required | Conversation messages |
| system | string | null | System prompt (Anthropic format) |
| max_tokens | number | 4096 | Maximum response tokens |
| temperature | number | 0.7 | Sampling randomness, 0.0–2.0 |
| top_p | number | 1.0 | Nucleus sampling threshold |
| stream | boolean | false | Enable SSE streaming |
| tools | array | null | Tool definitions for function calling |
| tool_choice | object | auto | Force a specific tool, or let the model choose (auto) |
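The defaults in the table can be applied client-side so that every request is explicit. A sketch with hypothetical names (`MessagesRequest`, `withDefaults`); it checks only the constraints stated in the table (required fields, the 0.0–2.0 temperature range) plus a positive-integer `max_tokens`, and leaves the `tools` and `tool_choice` shapes unchecked.

```typescript
// Request fields from the parameter table (tool shapes left loose).
interface MessagesRequest {
  model: string
  messages: Array<{ role: string; content: unknown }>
  system?: string
  max_tokens?: number
  temperature?: number
  top_p?: number
  stream?: boolean
  tools?: unknown[]
  tool_choice?: Record<string, unknown>
}

// Fills in the documented defaults and rejects values outside the documented ranges.
function withDefaults(req: MessagesRequest) {
  if (!req.model) throw new Error('model is required')
  if (req.messages.length === 0) throw new Error('messages must contain at least one message')
  const max_tokens = req.max_tokens ?? 4096
  const temperature = req.temperature ?? 0.7
  if (!Number.isInteger(max_tokens) || max_tokens <= 0) {
    throw new Error('max_tokens must be a positive integer')
  }
  if (temperature < 0 || temperature > 2) {
    throw new Error('temperature must be between 0.0 and 2.0')
  }
  return { ...req, max_tokens, temperature, top_p: req.top_p ?? 1.0, stream: req.stream ?? false }
}
```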
Error Handling
| Status | Meaning | Action |
|---|---|---|
| 200 | Success | Parse response |
| 400 | Invalid request | Check request body |
| 401 | Bad API key | Verify sk_ prefix (not sk-ant-) |
| 429 | Rate limited | Retry with backoff, check retry-after header |
| 500 | Server error | Retry with exponential backoff |
| 502/503 | Model loading | Retry in 5–10s |
Retry Example
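// Calls /v1/messages, retrying 429 and 5xx responses up to maxRetries times
// after the first attempt. Other errors (e.g. 400, 401) are thrown immediately.
async function callWithRetry(body, maxRetries = 3) {
  const sleep = (ms) => new Promise(r => setTimeout(r, ms))
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const resp = await fetch('https://api.ainative.studio/v1/messages', {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'x-api-key': process.env.AINATIVE_API_KEY,
        'anthropic-version': '2023-06-01',
      },
      body: JSON.stringify(body),
      signal: AbortSignal.timeout(60_000), // give up on a single request after 60s
    })
    if (resp.ok) return resp.json()
    const retryable = resp.status === 429 || resp.status >= 500
    if (!retryable || attempt === maxRetries) {
      // Surface the real error instead of sleeping after the final attempt
      throw new Error(`API error ${resp.status}: ${await resp.text()}`)
    }
    if (resp.status === 429) {
      // Honor retry-after (in seconds) when present; otherwise wait 5s
      const retryAfter = parseInt(resp.headers.get('retry-after') ?? '', 10)
      await sleep((Number.isNaN(retryAfter) ? 5 : retryAfter) * 1000)
    } else if (resp.status === 502 || resp.status === 503) {
      // Model is loading: wait 5s, then 10s on later attempts
      await sleep(attempt === 0 ? 5000 : 10_000)
    } else {
      // Other 5xx: exponential backoff (1s, 2s, 4s, ...)
      await sleep(2 ** attempt * 1000)
    }
  }
}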
Rate Limits & Credits
| Plan | Monthly Credits | Requests/hour | Price |
|---|---|---|---|
| Starter | 1,000 | 60 | Free |
| Pro | 50,000 | 300 | $49/mo |
| Business | 150,000 | 1,000 | $149/mo |
| Enterprise | 200,000 | 5,000 | $699/mo |
Check your balance:
curl -s https://api.ainative.studio/api/v1/public/credits/balance \
-H "x-api-key: $AINATIVE_API_KEY"
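To avoid spending requests on 429 responses, a client can throttle itself to its plan's Requests/hour figure. A minimal sketch, assuming the limit behaves like a trailing one-hour window; the server's actual windowing is not documented here, so treat this as a conservative client-side guard rather than a guarantee. `HourlyLimiter` is a hypothetical class.

```typescript
// Allows at most `limit` requests in any trailing one-hour window (client-side only).
class HourlyLimiter {
  private readonly limit: number
  private readonly windowMs = 60 * 60 * 1000
  private timestamps: number[] = []

  constructor(limit: number) {
    this.limit = limit
  }

  // Records a request at `now` (ms) and returns true if under the limit; otherwise returns false.
  tryAcquire(now: number = Date.now()): boolean {
    // Forget requests that have left the window
    this.timestamps = this.timestamps.filter((t) => now - t < this.windowMs)
    if (this.timestamps.length >= this.limit) return false
    this.timestamps.push(now)
    return true
  }

  // Milliseconds until the next request would be permitted (0 if one is available now).
  msUntilNext(now: number = Date.now()): number {
    const live = this.timestamps.filter((t) => now - t < this.windowMs)
    if (live.length < this.limit) return 0
    // The oldest request in the window expires first
    return live[0] + this.windowMs - now
  }
}
```

For the Starter plan this would be `new HourlyLimiter(60)`; when `tryAcquire()` returns false, wait `msUntilNext()` milliseconds before trying again.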
Full Example: Agent with Tool Loop
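import { readFile } from 'node:fs/promises'

const BASE = 'https://api.ainative.studio'
const API_KEY = process.env.AINATIVE_API_KEY!
const MAX_TURNS = 10 // guard against endless tool loops

const tools = [{
  name: 'read_file',
  description: 'Read a file',
  input_schema: {
    type: 'object',
    properties: { path: { type: 'string' } },
    required: ['path'],
  },
}]

// Runs a tool the model asked for and returns its output as a string.
// Only read_file is implemented; add a branch for each tool you define.
async function executeToolLocally(name: string, input: any): Promise<string> {
  if (name !== 'read_file') return `Unknown tool: ${name}`
  try {
    return await readFile(input.path, 'utf8')
  } catch (err) {
    // Report failures to the model instead of crashing the loop
    return `Error reading ${input.path}: ${(err as Error).message}`
  }
}

async function runAgent(userPrompt: string): Promise<string> {
  const messages: any[] = [{ role: 'user', content: userPrompt }]
  for (let turn = 0; turn < MAX_TURNS; turn++) {
    const resp = await fetch(`${BASE}/v1/messages`, {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'x-api-key': API_KEY,
        'anthropic-version': '2023-06-01',
      },
      body: JSON.stringify({
        model: 'qwen-coder-32b',
        max_tokens: 4096,
        tools,
        messages,
      }),
    })
    if (!resp.ok) throw new Error(`API error ${resp.status}: ${await resp.text()}`)
    const data: any = await resp.json()
    // Keep the assistant turn (including any tool_use blocks) in the history
    messages.push({ role: 'assistant', content: data.content })
    if (data.stop_reason !== 'tool_use') {
      // Final answer: join the text blocks
      return data.content.filter((b: any) => b.type === 'text')
        .map((b: any) => b.text).join('')
    }
    // Execute every tool call and send all results back in one user turn
    const toolResults: any[] = []
    for (const block of data.content) {
      if (block.type !== 'tool_use') continue
      const result = await executeToolLocally(block.name, block.input)
      toolResults.push({
        type: 'tool_result',
        tool_use_id: block.id,
        content: result,
      })
    }
    messages.push({ role: 'user', content: toolResults })
  }
  throw new Error(`Agent did not finish within ${MAX_TURNS} turns`)
}

// Example usage
runAgent('What is in /tmp/test.txt?').then(console.log)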
Known Quirks
- API key format: `sk-ant-` keys (Anthropic direct) return `401`. Use `sk_` keys only.
- Tool schema: Keep schemas flat and descriptions under 300 characters for open-source models.
- XML artifacts: Qwen may emit `<tool_call>` XML alongside proper `tool_use` blocks. Strip these client-side.
- Streaming: Use `"stream": false` for maximum reliability with open-source models.
- max_tokens: Always set it explicitly. `qwen-coder-32b` supports up to 32768, Claude up to 8192.
- Non-working endpoints: `/api/oauth/roles` and `/api/oauth/profile` don't exist on our gateway. Use `/api/v1/auth/me` instead.
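The `max_tokens` quirk can be handled by always sending an explicit, capped value. This sketch encodes only the two ceilings quoted above (`qwen-coder-32b` at 32768, Claude models at 8192) and passes other models through unchanged; `maxTokensCap` and `clampMaxTokens` are hypothetical names.

```typescript
// Output-token ceilings quoted in Known Quirks; undefined means no known cap.
function maxTokensCap(model: string): number | undefined {
  if (model === 'qwen-coder-32b') return 32768
  if (model.startsWith('claude-')) return 8192
  return undefined
}

// Returns an explicit max_tokens value that stays within the known cap for `model`.
function clampMaxTokens(model: string, requested: number): number {
  const cap = maxTokensCap(model)
  return cap === undefined ? requested : Math.min(requested, cap)
}
```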
Next Steps
- Authentication — API keys and JWT tokens
- SDKs — Client libraries for React, Next.js, Python
- Models Reference — Full model catalog with pricing
- Developer Program — Build apps and earn revenue