
Multimodal API

Generate speech, images, and videos using GPU-powered models. All endpoints consume credits; speech and image endpoints return base64-encoded results, while video endpoints return a download URL.

Base URL: https://api.ainative.studio/api/v1/multimodal

Authentication

All endpoints require authentication via API key or Bearer token.

```http
POST /api/v1/multimodal/tts
Authorization: Bearer YOUR_TOKEN
Content-Type: application/json
```

Text-to-Speech (TTS)

Generate high-quality speech from text using MiniMax TTS.

POST /api/v1/multimodal/tts 🔒

Cost: 14 credits per generation | Rate limit: 10/min

```bash
curl -X POST https://api.ainative.studio/api/v1/multimodal/tts \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Welcome to AINative Studio.",
    "voice": "Wise_Woman"
  }'
```

```python
import requests

response = requests.post(
    "https://api.ainative.studio/api/v1/multimodal/tts",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "text": "Welcome to AINative Studio.",
        "voice": "Wise_Woman"
    }
)

data = response.json()
# data["audio_base64"]: base64-encoded MP3
# data["credits_used"]: credits consumed
```

Parameters:

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| text | string | Yes | | Text to synthesize (1-5000 chars) |
| voice | string | No | Wise_Woman | Voice profile |

Response:

| Field | Type | Description |
|-------|------|-------------|
| audio_base64 | string | Base64-encoded MP3 audio |
| format | string | Audio format (mp3) |
| duration_ms | int | Audio duration in milliseconds |
| usage_id | string | Usage record ID for tracking |
| credits_used | int | Credits consumed |
| credits_remaining | int | Remaining credit balance |
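The `audio_base64` field must be decoded before the MP3 is playable. A minimal helper for this step (the function name `save_tts_audio` is illustrative, not part of any SDK):

```python
import base64

def save_tts_audio(data: dict, path: str) -> int:
    """Decode the `audio_base64` field from a TTS response and write
    the MP3 bytes to disk. Returns the number of bytes written."""
    audio_bytes = base64.b64decode(data["audio_base64"])
    with open(path, "wb") as f:
        f.write(audio_bytes)
    return len(audio_bytes)
```

The returned byte count can be compared against expectations (an empty or tiny file usually signals a failed generation).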

Image Generation

Generate images from text prompts using Qwen Image Edit with optional LoRA styles.

POST /api/v1/multimodal/image 🔒

Cost: 50 credits per image | Rate limit: 5/min

```bash
curl -X POST https://api.ainative.studio/api/v1/multimodal/image \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "A futuristic cityscape at sunset with neon lights",
    "width": 1024,
    "height": 1024
  }'
```

```python
import requests, base64

response = requests.post(
    "https://api.ainative.studio/api/v1/multimodal/image",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "prompt": "A futuristic cityscape at sunset with neon lights",
        "width": 1024,
        "height": 1024
    }
)

data = response.json()
# Save image
with open("output.png", "wb") as f:
    f.write(base64.b64decode(data["image_base64"]))
```

Parameters:

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| prompt | string | Yes | | Image description (1-2000 chars) |
| width | int | No | 1024 | Image width (512-2048) |
| height | int | No | 1024 | Image height (512-2048) |
| style | string | No | null | Optional LoRA style preset |

Response:

| Field | Type | Description |
|-------|------|-------------|
| image_base64 | string | Base64-encoded image |
| format | string | Image format (jpg or png) |
| width / height | int | Actual dimensions |
| credits_used | int | Credits consumed |

Video: Image-to-Video (I2V)

Animate a static image into a video with motion prompts.

POST /api/v1/multimodal/video/i2v 🔒

Rate limit: 2/min

| Provider | Cost | Duration | Resolution | Quality |
|----------|------|----------|------------|---------|
| wan22 (default) | 400 credits | 5s | 1280x720 | Best value |
| seedance | 520 credits | 5s | 1280x720 | High quality |
| sora2 (premium) | 800 credits | 4s | 1280x720 | Cinematic |
```bash
curl -X POST https://api.ainative.studio/api/v1/multimodal/video/i2v \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "image_url": "https://example.com/photo.jpg",
    "motion_prompt": "Camera slowly zooms in while clouds drift across the sky",
    "provider": "wan22"
  }'
```

Parameters:

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| image_url | string | Yes | | Public URL of source image |
| motion_prompt | string | Yes | | Motion description (1-1000 chars) |
| provider | string | No | wan22 | wan22, seedance, or sora2 |
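Since I2V calls cost 400 or more credits, it is worth validating the payload client-side before sending. A sketch of such a pre-flight check, mirroring the documented constraints (the helper name `build_i2v_payload` is illustrative):

```python
VALID_PROVIDERS = {"wan22", "seedance", "sora2"}

def build_i2v_payload(image_url: str, motion_prompt: str,
                      provider: str = "wan22") -> dict:
    """Validate I2V inputs against the documented limits so a bad
    request fails locally instead of spending credits."""
    if provider not in VALID_PROVIDERS:
        raise ValueError(f"unknown provider: {provider!r}")
    if not 1 <= len(motion_prompt) <= 1000:
        raise ValueError("motion_prompt must be 1-1000 characters")
    return {
        "image_url": image_url,
        "motion_prompt": motion_prompt,
        "provider": provider,
    }
```

The resulting dict can be passed directly as the `json=` argument of a `requests.post` call.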

Video: Text-to-Video (T2V)

Generate a full video from a text description.

POST /api/v1/multimodal/video/t2v 🔒

Cost: 1000 credits | Rate limit: 1/min | Tier: Premium only

```bash
curl -X POST https://api.ainative.studio/api/v1/multimodal/video/t2v \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "A serene ocean wave crashing on a beach at golden hour",
    "duration": 5
  }'
```

Parameters:

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| prompt | string | Yes | | Video description (1-1000 chars) |
| duration | int | No | 5 | Duration in seconds (1-10) |

Video: CogVideoX

Generate video using the dedicated CogVideoX-2B model with fine-grained control.

POST /api/v1/multimodal/video/cogvideox 🔒

Cost: 800 credits | Rate limit: 1/min

```bash
curl -X POST https://api.ainative.studio/api/v1/multimodal/video/cogvideox \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "A robot walking through a garden in the rain",
    "num_frames": 49,
    "guidance_scale": 6.0
  }'
```

Parameters:

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| prompt | string | Yes | | Video description (1-1000 chars) |
| num_frames | int | No | 49 | Frame count: 17, 33, or 49 |
| guidance_scale | float | No | 6.0 | CFG scale (1.0-20.0) |
| num_inference_steps | int | No | 50 | Denoising steps (20-100) |
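CogVideoX accepts only three frame counts and bounded tuning ranges, so a client-side check before an 800-credit call is cheap insurance. A sketch under the documented constraints (the helper name `build_cogvideox_payload` is illustrative):

```python
ALLOWED_FRAMES = (17, 33, 49)

def build_cogvideox_payload(prompt: str,
                            num_frames: int = 49,
                            guidance_scale: float = 6.0,
                            num_inference_steps: int = 50) -> dict:
    """Mirror the documented parameter ranges so an invalid
    combination fails locally, before any credits are spent."""
    if not 1 <= len(prompt) <= 1000:
        raise ValueError("prompt must be 1-1000 characters")
    if num_frames not in ALLOWED_FRAMES:
        raise ValueError("num_frames must be 17, 33, or 49")
    if not 1.0 <= guidance_scale <= 20.0:
        raise ValueError("guidance_scale must be in 1.0-20.0")
    if not 20 <= num_inference_steps <= 100:
        raise ValueError("num_inference_steps must be in 20-100")
    return {
        "prompt": prompt,
        "num_frames": num_frames,
        "guidance_scale": guidance_scale,
        "num_inference_steps": num_inference_steps,
    }
```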

Video Response (all video endpoints):

| Field | Type | Description |
|-------|------|-------------|
| video_url | string | URL to download generated video |
| format | string | Video format (mp4) |
| duration_seconds | float | Video duration |
| usage_id | string | Usage tracking ID |
| credits_used | int | Credits consumed |
| credits_remaining | int | Remaining balance |
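Unlike TTS and image responses, video endpoints return a `video_url` rather than inline base64, so the file must be fetched in a second step. A minimal sketch (the `fetch` parameter is an assumption added here so the download step can be exercised offline):

```python
import urllib.request

def download_video(data: dict, path: str, fetch=None) -> str:
    """Download the `video_url` from a video response to a local
    mp4 file. `fetch` maps a URL to bytes and defaults to urllib."""
    fetch = fetch or (lambda url: urllib.request.urlopen(url).read())
    with open(path, "wb") as f:
        f.write(fetch(data["video_url"]))
    return path
```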

Usage History

Track your multimodal API usage across all endpoints.

GET /api/v1/multimodal/usage 🔒

```bash
curl "https://api.ainative.studio/api/v1/multimodal/usage?limit=10" \
  -H "Authorization: Bearer $TOKEN"
```

Query Parameters:

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| skip | int | 0 | Pagination offset |
| limit | int | 50 | Max records (1-100) |
| endpoint_type | string | | Filter: tts, image, video_i2v, video_t2v, video_cogvideox |
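Because `limit` caps out at 100, reading a full history means paging with `skip`. One way to sketch that loop, with the page fetch injected as a callable (a hypothetical wrapper around `GET /api/v1/multimodal/usage`) so the paging logic itself stays testable:

```python
def iter_usage(fetch_page, page_size=50):
    """Yield usage records across all pages. `fetch_page(skip, limit)`
    returns one page of records as a list; iteration stops when a
    page comes back short or empty."""
    skip = 0
    while True:
        page = fetch_page(skip, page_size)
        if not page:
            return
        yield from page
        if len(page) < page_size:
            return
        skip += page_size
```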

Credit Costs Summary

| Endpoint | Credits | Approx. Cost |
|----------|---------|--------------|
| TTS | 14 | $0.007 |
| Image | 50 | $0.025 |
| I2V (wan22) | 400 | $0.20 |
| I2V (seedance) | 520 | $0.26 |
| I2V (sora2) | 800 | $0.40 |
| T2V | 1000 | $0.50 |
| CogVideoX | 800 | $0.40 |
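Every row in the table above works out to the same rate, $0.0005 per credit (e.g. 14 credits at $0.007). A small estimator built on that observation; the dictionary keys here are this sketch's own naming, not official identifiers:

```python
# Credit costs copied from the summary table; keys are illustrative.
CREDIT_COSTS = {
    "tts": 14,
    "image": 50,
    "video_i2v_wan22": 400,
    "video_i2v_seedance": 520,
    "video_i2v_sora2": 800,
    "video_t2v": 1000,
    "video_cogvideox": 800,
}
USD_PER_CREDIT = 0.0005  # implied by the table (14 credits ~= $0.007)

def estimate_usd(endpoint: str, count: int = 1) -> float:
    """Approximate dollar cost of `count` generations on an endpoint."""
    return CREDIT_COSTS[endpoint] * count * USD_PER_CREDIT
```

Useful for budgeting a batch job before committing credits to it.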

Error Codes

| Code | Meaning |
|------|---------|
| 402 | Insufficient credits |
| 400 | Invalid request (bad params, unsupported format) |
| 429 | Rate limit exceeded |
| 500 | GPU provider error; retry after 30s |
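The 429 and 500 rows are transient and worth retrying, while 402 and 400 are not. A retry wrapper sketched on that split (the shape of `request_fn`, returning a status/body pair, is an assumption of this example; `sleep` is injectable for testing):

```python
import time

def call_with_retry(request_fn, max_attempts=3, base_delay=30.0,
                    sleep=time.sleep):
    """Retry a multimodal call on 429/500 with a growing delay,
    starting from the 30s the error table suggests. Non-retryable
    statuses (402, 400, etc.) are returned immediately."""
    for attempt in range(1, max_attempts + 1):
        status, body = request_fn()
        if status not in (429, 500) or attempt == max_attempts:
            return status, body
        sleep(base_delay * attempt)
```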

For AI Agents

Agents can use these endpoints to generate media autonomously. Best practices:

  • Check credits first via GET /api/v1/managed/usage before starting expensive operations
  • Use wan22 for I2V — best quality-to-cost ratio
  • Poll usage records via GET /api/v1/multimodal/usage/{usage_id} to check generation status
  • Base64 responses can be piped directly to file storage via POST /api/v1/public/zerodb/files/upload