# Multimodal API

Generate speech, images, and videos using GPU-powered models. All endpoints consume credits; TTS and image endpoints return base64-encoded results, while video endpoints return a download URL.

Base URL: `https://api.ainative.studio/api/v1/multimodal`
## Authentication

All endpoints require authentication via API key or Bearer token:

```http
POST /api/v1/multimodal/tts
Authorization: Bearer YOUR_TOKEN
Content-Type: application/json
```
## Text-to-Speech (TTS)

Generate high-quality speech from text using MiniMax TTS.

`POST /api/v1/multimodal/tts` 🔒 | Cost: 14 credits per generation | Rate limit: 10/min
```bash
curl -X POST https://api.ainative.studio/api/v1/multimodal/tts \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Welcome to AINative Studio.",
    "voice": "Wise_Woman"
  }'
```
```python
import requests

response = requests.post(
    "https://api.ainative.studio/api/v1/multimodal/tts",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "text": "Welcome to AINative Studio.",
        "voice": "Wise_Woman"
    }
)
data = response.json()
# data["audio_base64"]: base64-encoded MP3
# data["credits_used"]: credits consumed
```
Parameters:

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| text | string | Yes | — | Text to synthesize (1-5000 chars) |
| voice | string | No | Wise_Woman | Voice profile |
Response:

| Field | Type | Description |
|---|---|---|
| audio_base64 | string | Base64-encoded MP3 audio |
| format | string | Audio format (mp3) |
| duration_ms | int | Audio duration in milliseconds |
| usage_id | string | Usage record ID for tracking |
| credits_used | int | Credits consumed |
| credits_remaining | int | Remaining credit balance |
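The base64 payload can be written straight to disk. A minimal sketch; the `save_tts_audio` helper name is ours, not part of the API:

```python
import base64

def save_tts_audio(data: dict, path: str = "speech.mp3") -> int:
    """Decode the `audio_base64` field from a TTS response and
    write the MP3 bytes to disk. Returns bytes written."""
    audio = base64.b64decode(data["audio_base64"])
    with open(path, "wb") as f:
        f.write(audio)
    return len(audio)
```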
## Image Generation

Generate images from text prompts using Qwen Image Edit with optional LoRA styles.

`POST /api/v1/multimodal/image` 🔒 | Cost: 50 credits per image | Rate limit: 5/min
```bash
curl -X POST https://api.ainative.studio/api/v1/multimodal/image \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "A futuristic cityscape at sunset with neon lights",
    "width": 1024,
    "height": 1024
  }'
```
```python
import requests, base64

response = requests.post(
    "https://api.ainative.studio/api/v1/multimodal/image",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "prompt": "A futuristic cityscape at sunset with neon lights",
        "width": 1024,
        "height": 1024
    }
)
data = response.json()

# Save the decoded image
with open("output.png", "wb") as f:
    f.write(base64.b64decode(data["image_base64"]))
```
Parameters:

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| prompt | string | Yes | — | Image description (1-2000 chars) |
| width | int | No | 1024 | Image width (512-2048) |
| height | int | No | 1024 | Image height (512-2048) |
| style | string | No | null | Optional LoRA style preset |
Response:

| Field | Type | Description |
|---|---|---|
| image_base64 | string | Base64-encoded image |
| format | string | Image format (jpg or png) |
| width / height | int | Actual dimensions |
| credits_used | int | Credits consumed |
## Video: Image-to-Video (I2V)

Animate a static image into a video with motion prompts.

`POST /api/v1/multimodal/video/i2v` 🔒 | Rate limit: 2/min

| Provider | Cost | Duration | Resolution | Quality |
|---|---|---|---|---|
| wan22 (default) | 400 credits | 5s | 1280x720 | Best value |
| seedance | 520 credits | 5s | 1280x720 | High quality |
| sora2 (premium) | 800 credits | 4s | 1280x720 | Cinematic |
```bash
curl -X POST https://api.ainative.studio/api/v1/multimodal/video/i2v \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "image_url": "https://example.com/photo.jpg",
    "motion_prompt": "Camera slowly zooms in while clouds drift across the sky",
    "provider": "wan22"
  }'
```
Parameters:

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| image_url | string | Yes | — | Public URL of source image |
| motion_prompt | string | Yes | — | Motion description (1-1000 chars) |
| provider | string | No | wan22 | wan22, seedance, or sora2 |
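Only a curl example is shown above; an equivalent Python call might look like the sketch below. The client-side checks mirror the parameter table (provider names, 1-1000 char motion prompt); the `build_i2v_payload` helper is ours, not part of the API.

```python
import requests

VALID_PROVIDERS = {"wan22", "seedance", "sora2"}

def build_i2v_payload(image_url: str, motion_prompt: str,
                      provider: str = "wan22") -> dict:
    # Validate locally before spending a rate-limited request on a 400.
    if provider not in VALID_PROVIDERS:
        raise ValueError(f"provider must be one of {sorted(VALID_PROVIDERS)}")
    if not 1 <= len(motion_prompt) <= 1000:
        raise ValueError("motion_prompt must be 1-1000 chars")
    return {"image_url": image_url,
            "motion_prompt": motion_prompt,
            "provider": provider}

# response = requests.post(
#     "https://api.ainative.studio/api/v1/multimodal/video/i2v",
#     headers={"Authorization": f"Bearer {token}"},
#     json=build_i2v_payload("https://example.com/photo.jpg",
#                            "Camera slowly zooms in"),
# )
```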
## Video: Text-to-Video (T2V)

Generate a full video from a text description.

`POST /api/v1/multimodal/video/t2v` 🔒 | Cost: 1000 credits | Rate limit: 1/min | Tier: Premium only
```bash
curl -X POST https://api.ainative.studio/api/v1/multimodal/video/t2v \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "A serene ocean wave crashing on a beach at golden hour",
    "duration": 5
  }'
```
Parameters:

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| prompt | string | Yes | — | Video description (1-1000 chars) |
| duration | int | No | 5 | Duration in seconds (1-10) |
## Video: CogVideoX

Generate video using the dedicated CogVideoX-2B model with fine-grained control.

`POST /api/v1/multimodal/video/cogvideox` 🔒 | Cost: 800 credits | Rate limit: 1/min
```bash
curl -X POST https://api.ainative.studio/api/v1/multimodal/video/cogvideox \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "A robot walking through a garden in the rain",
    "num_frames": 49,
    "guidance_scale": 6.0
  }'
```
Parameters:

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| prompt | string | Yes | — | Video description (1-1000 chars) |
| num_frames | int | No | 49 | Frame count: 17, 33, or 49 |
| guidance_scale | float | No | 6.0 | CFG scale (1.0-20.0) |
| num_inference_steps | int | No | 50 | Denoising steps (20-100) |
Video Response (all video endpoints):

| Field | Type | Description |
|---|---|---|
| video_url | string | URL to download generated video |
| format | string | Video format (mp4) |
| duration_seconds | float | Video duration |
| usage_id | string | Usage tracking ID |
| credits_used | int | Credits consumed |
| credits_remaining | int | Remaining balance |
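Unlike TTS and image results, video endpoints return a `video_url` to fetch. A streaming-download sketch (the helper names are ours):

```python
import requests

def save_chunks(chunks, path: str) -> int:
    """Write an iterable of byte chunks to disk; returns total bytes written."""
    total = 0
    with open(path, "wb") as f:
        for chunk in chunks:
            f.write(chunk)
            total += len(chunk)
    return total

def download_video(video_url: str, path: str = "output.mp4") -> int:
    # Stream rather than buffer: a 5s 720p MP4 can be several megabytes.
    with requests.get(video_url, stream=True, timeout=120) as r:
        r.raise_for_status()
        return save_chunks(r.iter_content(chunk_size=1 << 16), path)
```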
## Usage History

Track your multimodal API usage across all endpoints.

`GET /api/v1/multimodal/usage` 🔒

```bash
curl "https://api.ainative.studio/api/v1/multimodal/usage?limit=10" \
  -H "Authorization: Bearer $TOKEN"
```
Query Parameters:

| Parameter | Type | Default | Description |
|---|---|---|---|
| skip | int | 0 | Pagination offset |
| limit | int | 50 | Max records (1-100) |
| endpoint_type | string | — | Filter: tts, image, video_i2v, video_t2v, video_cogvideox |
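The `skip`/`limit` pair supports simple offset pagination. A sketch that pages until a short page comes back; we assume each page deserializes to a plain list of records, so adjust the unwrapping if the endpoint returns an envelope object:

```python
import requests

def iter_usage(fetch_page, page_size: int = 50):
    """Yield usage records page by page. `fetch_page(skip, limit)` must
    return a list; iteration stops at the first short page."""
    skip = 0
    while True:
        page = fetch_page(skip, page_size)
        yield from page
        if len(page) < page_size:
            return
        skip += page_size

# def fetch_page(skip, limit):
#     r = requests.get(
#         "https://api.ainative.studio/api/v1/multimodal/usage",
#         headers={"Authorization": f"Bearer {token}"},
#         params={"skip": skip, "limit": limit},
#     )
#     r.raise_for_status()
#     return r.json()
```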
## Credit Costs Summary
| Endpoint | Credits | Approx. Cost |
|---|---|---|
| TTS | 14 | $0.007 |
| Image | 50 | $0.025 |
| I2V (wan22) | 400 | $0.20 |
| I2V (seedance) | 520 | $0.26 |
| I2V (sora2) | 800 | $0.40 |
| T2V | 1000 | $0.50 |
| CogVideoX | 800 | $0.40 |
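The table implies a flat rate of $0.0005 per credit. A small budgeting helper for planning batches; the endpoint keys here are our own labels, echoing the `endpoint_type` filter values where they exist:

```python
CREDIT_COSTS = {
    "tts": 14,
    "image": 50,
    "video_i2v_wan22": 400,
    "video_i2v_seedance": 520,
    "video_i2v_sora2": 800,
    "video_t2v": 1000,
    "video_cogvideox": 800,
}
USD_PER_CREDIT = 0.0005  # implied by the table: 14 credits ~ $0.007

def estimate_cost(endpoint: str, count: int = 1) -> tuple[int, float]:
    """Return (credits, approximate USD) for `count` generations."""
    credits = CREDIT_COSTS[endpoint] * count
    return credits, round(credits * USD_PER_CREDIT, 4)
```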
## Error Codes

| Code | Meaning |
|---|---|
| 402 | Insufficient credits |
| 400 | Invalid request (bad params, unsupported format) |
| 429 | Rate limit exceeded |
| 500 | GPU provider error — retry after 30s |
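The 429 and 500 rows call for a retry loop with backoff (the table recommends 30s after a 500). A sketch with injectable `post` and `sleep` callables so the logic can be exercised without the network:

```python
import time
import requests

RETRYABLE = {429, 500}  # rate limit exceeded, GPU provider error

def post_with_retry(url, *, max_retries=3, base_delay=30.0,
                    post=requests.post, sleep=time.sleep, **kwargs):
    """POST with exponential backoff on retryable statuses:
    waits 30s, 60s, 120s, ... between attempts, then gives up
    and returns the last response."""
    for attempt in range(max_retries + 1):
        resp = post(url, **kwargs)
        if resp.status_code not in RETRYABLE or attempt == max_retries:
            return resp
        sleep(base_delay * (2 ** attempt))
```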
## For AI Agents

Agents can use these endpoints to generate media autonomously. Best practices:

- Check credits first via `GET /api/v1/managed/usage` before starting expensive operations
- Use `wan22` for I2V; it has the best quality-to-cost ratio
- Poll usage records via `GET /api/v1/multimodal/usage/{usage_id}` to check generation status
- Base64 responses can be piped directly to file storage via `POST /api/v1/public/zerodb/files/upload`