# Multimodal API

Generate speech, images, and videos using GPU-powered models. All endpoints consume credits; TTS and image endpoints return base64-encoded results, while video endpoints return a download URL.

Base URL: `https://api.ainative.studio/api/v1/multimodal`
## Authentication

All endpoints require authentication via API key or Bearer token:

```http
POST /api/v1/multimodal/tts
Authorization: Bearer YOUR_TOKEN
Content-Type: application/json
```
## Text-to-Speech (TTS)

Generate high-quality speech from text using MiniMax TTS.

`POST /api/v1/multimodal/tts` 🔒 | Cost: 14 credits per generation | Rate limit: 10/min
```bash
curl -X POST https://api.ainative.studio/api/v1/multimodal/tts \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Welcome to AINative Studio.",
    "voice": "Wise_Woman"
  }'
```
```python
import requests

response = requests.post(
    "https://api.ainative.studio/api/v1/multimodal/tts",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "text": "Welcome to AINative Studio.",
        "voice": "Wise_Woman"
    }
)
data = response.json()
# data["audio_base64"]: base64-encoded MP3
# data["credits_used"]: credits consumed
```
Parameters:

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| text | string | Yes | — | Text to synthesize (1-5000 chars) |
| voice | string | No | Wise_Woman | Voice profile |
Response:

| Field | Type | Description |
|---|---|---|
| audio_base64 | string | Base64-encoded MP3 audio |
| format | string | Audio format (mp3) |
| duration_ms | int | Audio duration in milliseconds |
| usage_id | string | Usage record ID for tracking |
| credits_used | int | Credits consumed |
| credits_remaining | int | Remaining credit balance |
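The base64 payload can be written straight to disk. A minimal sketch; the `save_tts_audio` helper name is ours, not part of the API:

```python
import base64

def save_tts_audio(data: dict, path: str = "speech.mp3") -> int:
    """Decode the `audio_base64` field from a TTS response and
    write the MP3 bytes to disk. Returns bytes written."""
    audio = base64.b64decode(data["audio_base64"])
    with open(path, "wb") as f:
        f.write(audio)
    return len(audio)
```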
## Image Generation

Generate images from text prompts using Qwen Image Edit with optional LoRA styles.

`POST /api/v1/multimodal/image` 🔒 | Cost: 50 credits per image | Rate limit: 5/min
```bash
curl -X POST https://api.ainative.studio/api/v1/multimodal/image \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "A futuristic cityscape at sunset with neon lights",
    "width": 1024,
    "height": 1024
  }'
```
```python
import requests, base64

response = requests.post(
    "https://api.ainative.studio/api/v1/multimodal/image",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "prompt": "A futuristic cityscape at sunset with neon lights",
        "width": 1024,
        "height": 1024
    }
)
data = response.json()

# Save the decoded image
with open("output.png", "wb") as f:
    f.write(base64.b64decode(data["image_base64"]))
```
Parameters:

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| prompt | string | Yes | — | Image description (1-2000 chars) |
| width | int | No | 1024 | Image width (512-2048) |
| height | int | No | 1024 | Image height (512-2048) |
| style | string | No | null | Optional LoRA style preset |
Response:

| Field | Type | Description |
|---|---|---|
| image_base64 | string | Base64-encoded image |
| format | string | Image format (jpg or png) |
| width / height | int | Actual dimensions |
| credits_used | int | Credits consumed |
## Video: Image-to-Video (I2V)

Animate a static image into a video with motion prompts.

`POST /api/v1/multimodal/video/i2v` 🔒 | Rate limit: 2/min

| Provider | Cost | Duration | Resolution | Quality |
|---|---|---|---|---|
| wan22 (default) | 400 credits | 5s | 1280x720 | Best value |
| seedance | 520 credits | 5s | 1280x720 | High quality |
| sora2 (premium) | 800 credits | 4s | 1280x720 | Cinematic |
```bash
curl -X POST https://api.ainative.studio/api/v1/multimodal/video/i2v \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "image_url": "https://example.com/photo.jpg",
    "motion_prompt": "Camera slowly zooms in while clouds drift across the sky",
    "provider": "wan22"
  }'
```
Parameters:

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| image_url | string | Yes | — | Public URL of source image |
| motion_prompt | string | Yes | — | Motion description (1-1000 chars) |
| provider | string | No | wan22 | wan22, seedance, or sora2 |
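Only a curl example is shown above; an equivalent Python call might look like the sketch below. The client-side checks mirror the parameter table (provider names, 1-1000 char motion prompt); the `build_i2v_payload` helper is ours, not part of the API.

```python
import requests

VALID_PROVIDERS = {"wan22", "seedance", "sora2"}

def build_i2v_payload(image_url: str, motion_prompt: str,
                      provider: str = "wan22") -> dict:
    # Validate locally before spending a rate-limited request on a 400.
    if provider not in VALID_PROVIDERS:
        raise ValueError(f"provider must be one of {sorted(VALID_PROVIDERS)}")
    if not 1 <= len(motion_prompt) <= 1000:
        raise ValueError("motion_prompt must be 1-1000 chars")
    return {"image_url": image_url,
            "motion_prompt": motion_prompt,
            "provider": provider}

# response = requests.post(
#     "https://api.ainative.studio/api/v1/multimodal/video/i2v",
#     headers={"Authorization": f"Bearer {token}"},
#     json=build_i2v_payload("https://example.com/photo.jpg",
#                            "Camera slowly zooms in"),
# )
```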
## Video: Text-to-Video (T2V)

Generate a full video from a text description.

`POST /api/v1/multimodal/video/t2v` 🔒 | Cost: 1000 credits | Rate limit: 1/min | Tier: Premium only
```bash
curl -X POST https://api.ainative.studio/api/v1/multimodal/video/t2v \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "A serene ocean wave crashing on a beach at golden hour",
    "duration": 5
  }'
```
Parameters:

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| prompt | string | Yes | — | Video description (1-1000 chars) |
| duration | int | No | 5 | Duration in seconds (1-10) |
## Video: CogVideoX

Generate video using the dedicated CogVideoX-2B model with fine-grained control.

`POST /api/v1/multimodal/video/cogvideox` 🔒 | Cost: 800 credits | Rate limit: 1/min
```bash
curl -X POST https://api.ainative.studio/api/v1/multimodal/video/cogvideox \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "A robot walking through a garden in the rain",
    "num_frames": 49,
    "guidance_scale": 6.0
  }'
```
Parameters:

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| prompt | string | Yes | — | Video description (1-1000 chars) |
| num_frames | int | No | 49 | Frame count: 17, 33, or 49 |
| guidance_scale | float | No | 6.0 | CFG scale (1.0-20.0) |
| num_inference_steps | int | No | 50 | Denoising steps (20-100) |
Video Response (all video endpoints):

| Field | Type | Description |
|---|---|---|
| video_url | string | URL to download generated video |
| format | string | Video format (mp4) |
| duration_seconds | float | Video duration |
| usage_id | string | Usage tracking ID |
| credits_used | int | Credits consumed |
| credits_remaining | int | Remaining balance |
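Unlike TTS and image results, video endpoints return a `video_url` to fetch. A streaming-download sketch (the helper names are ours):

```python
import requests

def save_chunks(chunks, path: str) -> int:
    """Write an iterable of byte chunks to disk; returns total bytes written."""
    total = 0
    with open(path, "wb") as f:
        for chunk in chunks:
            f.write(chunk)
            total += len(chunk)
    return total

def download_video(video_url: str, path: str = "output.mp4") -> int:
    # Stream rather than buffer: a 5s 720p MP4 can be several megabytes.
    with requests.get(video_url, stream=True, timeout=120) as r:
        r.raise_for_status()
        return save_chunks(r.iter_content(chunk_size=1 << 16), path)
```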
## Usage History

Track your multimodal API usage across all endpoints.

`GET /api/v1/multimodal/usage` 🔒

```bash
curl "https://api.ainative.studio/api/v1/multimodal/usage?limit=10" \
  -H "Authorization: Bearer $TOKEN"
```
Query Parameters:

| Parameter | Type | Default | Description |
|---|---|---|---|
| skip | int | 0 | Pagination offset |
| limit | int | 50 | Max records (1-100) |
| endpoint_type | string | — | Filter: tts, image, video_i2v, video_t2v, video_cogvideox |
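The `skip`/`limit` pair supports simple offset pagination. A sketch that pages until a short page comes back; we assume each page deserializes to a plain list of records, so adjust the unwrapping if the endpoint returns an envelope object:

```python
import requests

def iter_usage(fetch_page, page_size: int = 50):
    """Yield usage records page by page. `fetch_page(skip, limit)` must
    return a list; iteration stops at the first short page."""
    skip = 0
    while True:
        page = fetch_page(skip, page_size)
        yield from page
        if len(page) < page_size:
            return
        skip += page_size

# def fetch_page(skip, limit):
#     r = requests.get(
#         "https://api.ainative.studio/api/v1/multimodal/usage",
#         headers={"Authorization": f"Bearer {token}"},
#         params={"skip": skip, "limit": limit},
#     )
#     r.raise_for_status()
#     return r.json()
```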
## Credit Costs Summary
| Endpoint | Credits | Approx. Cost |
|---|---|---|
| TTS | 14 | $0.007 |
| Image | 50 | $0.025 |
| I2V (wan22) | 400 | $0.20 |
| I2V (seedance) | 520 | $0.26 |
| I2V (sora2) | 800 | $0.40 |
| T2V | 1000 | $0.50 |
| CogVideoX | 800 | $0.40 |
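The table implies a flat rate of $0.0005 per credit. A small budgeting helper for planning batches; the endpoint keys here are our own labels, echoing the `endpoint_type` filter values where they exist:

```python
CREDIT_COSTS = {
    "tts": 14,
    "image": 50,
    "video_i2v_wan22": 400,
    "video_i2v_seedance": 520,
    "video_i2v_sora2": 800,
    "video_t2v": 1000,
    "video_cogvideox": 800,
}
USD_PER_CREDIT = 0.0005  # implied by the table: 14 credits ~ $0.007

def estimate_cost(endpoint: str, count: int = 1) -> tuple[int, float]:
    """Return (credits, approximate USD) for `count` generations."""
    credits = CREDIT_COSTS[endpoint] * count
    return credits, round(credits * USD_PER_CREDIT, 4)
```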
## Error Codes

| Code | Meaning |
|---|---|
| 402 | Insufficient credits |
| 400 | Invalid request (bad params, unsupported format) |
| 429 | Rate limit exceeded |
| 500 | GPU provider error — retry after 30s |
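The 429 and 500 rows call for a retry loop with backoff (the table recommends 30s after a 500). A sketch with injectable `post` and `sleep` callables so the logic can be exercised without the network:

```python
import time
import requests

RETRYABLE = {429, 500}  # rate limit exceeded, GPU provider error

def post_with_retry(url, *, max_retries=3, base_delay=30.0,
                    post=requests.post, sleep=time.sleep, **kwargs):
    """POST with exponential backoff on retryable statuses:
    waits 30s, 60s, 120s, ... between attempts, then gives up
    and returns the last response."""
    for attempt in range(max_retries + 1):
        resp = post(url, **kwargs)
        if resp.status_code not in RETRYABLE or attempt == max_retries:
            return resp
        sleep(base_delay * (2 ** attempt))
```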
## For AI Agents

Agents can use these endpoints to generate media autonomously. Best practices:

- Check credits first via `GET /api/v1/managed/usage` before starting expensive operations
- Use `wan22` for I2V; it has the best quality-to-cost ratio
- Poll usage records via `GET /api/v1/multimodal/usage/{usage_id}` to check generation status
- Base64 responses can be piped directly to file storage via `POST /api/v1/public/zerodb/files/upload`