Embeddings API
Generate vector embeddings for semantic search, RAG pipelines, clustering, and similarity matching. All models run on DigitalOcean GPU inference.
Base URL: https://api.ainative.studio/api/v1/public/{project_id}/embeddings
Quick Start
curl -X POST https://api.ainative.studio/api/v1/public/YOUR_PROJECT_ID/embeddings/generate \
  -H "Content-Type: application/json" \
  -H "x-api-key: $AINATIVE_API_KEY" \
  -d '{
    "text": "The quick brown fox jumps over the lazy dog",
    "model": "bge-m3"
  }'
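import requests

project_id = "YOUR_PROJECT_ID"  # placeholder: your project ID
api_key = "sk_your_key_here"    # placeholder: your API key

response = requests.post(
    f"https://api.ainative.studio/api/v1/public/{project_id}/embeddings/generate",
    headers={"x-api-key": api_key},
    json={"text": "The quick brown fox jumps over the lazy dog", "model": "bge-m3"},
)
response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body
embedding = response.json()["embedding"]  # 1024-dim vector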
Authentication
All endpoints require an API key or Bearer token plus a valid project ID in the URL path.
POST /api/v1/public/{project_id}/embeddings/generate
x-api-key: sk_your_key_here
Content-Type: application/json
Generate Embedding
POST /api/v1/public/{project_id}/embeddings/generate (authentication required)
Generate a vector embedding for a text input.
Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| text | string | Yes | - | Text to embed (max 8192 tokens) |
| model | string | No | bge-m3 | Embedding model ID |
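Because text is capped at 8192 tokens, longer documents need to be split before embedding. The sketch below is one possible client-side approach, not part of the API: chunk_text is a hypothetical helper, and it counts whitespace-separated words as a rough stand-in for tokens. Real tokenizers usually produce more tokens than words, so the default window is kept well below the limit.

```python
def chunk_text(text: str, max_words: int = 512, overlap: int = 32) -> list[str]:
    """Split text into overlapping word windows (a rough proxy for token limits).

    Assumption: word count only approximates token count, so max_words
    should stay comfortably under the documented 8192-token cap.
    """
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("need max_words > 0 and 0 <= overlap < max_words")
    words = text.split()
    chunks = []
    step = max_words - overlap  # how far each window advances
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this window already reached the end of the text
    return chunks
```

Each chunk can then be sent as the text field of its own request. The overlap keeps a little shared context between neighbouring chunks, which tends to help retrieval when a relevant passage straddles a boundary.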
Response:
| Field | Type | Description |
|---|---|---|
| embedding | float[] | Vector embedding |
| model | string | Model used |
| dimensions | int | Vector dimensions |
| usage.tokens | int | Tokens processed |
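The response fields above can be checked on the client before a vector is stored. The following is a minimal sketch, assuming the JSON body has exactly the fields in the table; parse_embedding_response is a hypothetical helper, and the sample payload in the usage note is made up for illustration.

```python
from typing import Any


def parse_embedding_response(payload: dict[str, Any]) -> list[float]:
    """Extract the embedding and check it against the reported dimensions."""
    embedding = payload.get("embedding")
    if not isinstance(embedding, list) or not embedding:
        raise ValueError("response has no 'embedding' list")
    dimensions = payload.get("dimensions")
    if dimensions is not None and dimensions != len(embedding):
        raise ValueError(
            f"expected {dimensions} dimensions, got {len(embedding)}"
        )
    return [float(x) for x in embedding]


# Illustrative payload only; real vectors have 384, 768, or 1024 entries.
sample = {
    "embedding": [0.1, 0.2, 0.3],
    "model": "bge-m3",
    "dimensions": 3,
    "usage": {"tokens": 9},
}
vector = parse_embedding_response(sample)
```

In practice the payload would come from response.json() after a successful call.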
Available Models
| Model ID | Alias | Dimensions | Best For | Plan |
|---|---|---|---|---|
| all-mini-lm-l6-v2 | all-minilm | 384 | Fast similarity search, prototyping | Free |
| bge-m3 | bge-m3 | 1024 | Multilingual retrieval, RAG (recommended) | Pro+ |
| bge-reranker-v2-m3 | bge-reranker | - | Reranking search results (not for storage) | Pro+ |
| e5-large-v2 | e5-large | 1024 | English document retrieval | Pro+ |
| gte-large-en-v1.5 | gte-large | 1024 | English general-purpose embeddings | Pro+ |
| qwen3-embedding-0.6b | qwen-embed | 1024 | Multilingual, code-aware embeddings | Pro+ |
| multi-qa-mpnet-base-dot-v1 | mpnet | 768 | Question-answering, semantic search | Pro+ |
Which model should I use?
- Starting out? all-minilm: fast, free, 384 dimensions
- Production RAG? bge-m3: best multilingual performance at 1024 dimensions
- Reranking results? bge-reranker: use as a second pass to reorder search results
- Code + text? qwen-embed: understands both natural language and code
Model Aliases
You can use short aliases instead of full model IDs:
{"model": "bge-m3"} // full ID
{"model": "all-minilm"} // alias for all-mini-lm-l6-v2
{"model": "e5-large"} // alias for e5-large-v2
{"model": "mpnet"} // alias for multi-qa-mpnet-base-dot-v1
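The API accepts either form, so nothing has to be resolved on the client. A local copy of the model table can still be handy, for example to check that a vector index was created with the right dimension. The sketch below mirrors the Available Models table; resolve_model and expected_dimensions are hypothetical helper names, not part of the API.

```python
from typing import Optional

# Full model ID -> vector dimensions (None for the reranker, which
# does not produce vectors for storage).
MODELS: dict[str, Optional[int]] = {
    "all-mini-lm-l6-v2": 384,
    "bge-m3": 1024,
    "bge-reranker-v2-m3": None,
    "e5-large-v2": 1024,
    "gte-large-en-v1.5": 1024,
    "qwen3-embedding-0.6b": 1024,
    "multi-qa-mpnet-base-dot-v1": 768,
}

# Short alias -> full model ID.
ALIASES: dict[str, str] = {
    "all-minilm": "all-mini-lm-l6-v2",
    "bge-reranker": "bge-reranker-v2-m3",
    "e5-large": "e5-large-v2",
    "gte-large": "gte-large-en-v1.5",
    "qwen-embed": "qwen3-embedding-0.6b",
    "mpnet": "multi-qa-mpnet-base-dot-v1",
}


def resolve_model(name: str) -> str:
    """Return the full model ID for an alias or a full ID."""
    model_id = ALIASES.get(name, name)
    if model_id not in MODELS:
        raise ValueError(f"unknown model: {name}")
    return model_id


def expected_dimensions(name: str) -> Optional[int]:
    """Return the vector size for a model, or None for the reranker."""
    return MODELS[resolve_model(name)]
```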
Reranking
The bge-reranker-v2-m3 model is a cross-encoder reranker, not a standard embedding model. Use it to reorder search results by relevance.
import requests

# First, get candidate results from vector search
candidates = ["Result A", "Result B", "Result C"]

# Then rerank against the query
response = requests.post(
    f"https://api.ainative.studio/api/v1/public/{project_id}/embeddings/generate",
    headers={"x-api-key": api_key},
    json={
        "text": "What is machine learning?",
        "model": "bge-reranker",
        "candidates": candidates,
    },
)
Storing Embeddings in ZeroDB
Combine the Embeddings API with ZeroDB vector storage for a complete RAG pipeline:
import requests

API = "https://api.ainative.studio/api/v1/public"
PROJECT_ID = "your_project_id"
HEADERS = {"x-api-key": "sk_your_key", "Content-Type": "application/json"}

# 1. Generate embedding
embed_resp = requests.post(
    f"{API}/{PROJECT_ID}/embeddings/generate",
    headers=HEADERS,
    json={"text": "Machine learning is a subset of AI", "model": "bge-m3"},
)
vector = embed_resp.json()["embedding"]

# 2. Store in ZeroDB
requests.post(
    f"{API}/{PROJECT_ID}/vectors",
    headers=HEADERS,
    json={
        "vector": vector,
        "metadata": {"source": "docs", "text": "Machine learning is a subset of AI"},
    },
)

# 3. Search by similarity
search_embed = requests.post(
    f"{API}/{PROJECT_ID}/embeddings/generate",
    headers=HEADERS,
    json={"text": "What is ML?", "model": "bge-m3"},
).json()["embedding"]

results = requests.post(
    f"{API}/{PROJECT_ID}/vectors/search",
    headers=HEADERS,
    json={"vector": search_embed, "top_k": 5},
).json()
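To make "search by similarity" concrete, here is a small local sketch of ranking stored vectors against a query vector. It assumes cosine similarity as the metric; this page does not state which metric ZeroDB uses, so treat that as an assumption. cosine_similarity and top_k are hypothetical helpers that run entirely in memory.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k(
    query: list[float], vectors: list[list[float]], k: int = 5
) -> list[tuple[int, float]]:
    """Return (index, score) pairs for the k most similar vectors."""
    scored = [(cosine_similarity(query, v), i) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(i, score) for score, i in scored[:k]]
```

The dimension check is why the query and the stored vectors should come from the same model: a 384-dimension all-minilm vector cannot be compared with a 1024-dimension bge-m3 vector.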
Error Handling
| Status | Meaning | Action |
|---|---|---|
| 200 | Success | Parse embedding from response |
| 400 | Invalid model or empty text | Check model ID and text content |
| 401 | Unauthorized | Verify API key |
| 402 | Insufficient credits | Top up credits |
| 404 | Project not found | Check project ID in URL |
| 429 | Rate limited | Retry with backoff |
| 500 | Model inference error | Retry; check model availability |
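The table suggests retrying on 429 and 500 and fixing the request for the other error codes. The sketch below shows one way to do that with exponential backoff. call_with_backoff is a hypothetical helper, and the attempt count and delays are illustrative choices, not values documented by the API. The request itself is passed in as a callable that returns a (status, body) pair, so the retry logic can be exercised without a network.

```python
import time
from typing import Any, Callable

RETRYABLE = {429, 500}  # statuses the table above says to retry


def call_with_backoff(
    send: Callable[[], tuple[int, Any]],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call send() until it returns 200, backing off on retryable errors."""
    for attempt in range(max_attempts):
        status, body = send()
        if status == 200:
            return body
        if status not in RETRYABLE or attempt == max_attempts - 1:
            raise RuntimeError(f"request failed with status {status}")
        sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...
    raise RuntimeError("max_attempts must be at least 1")
```

With requests, send could be a small function that posts to the generate endpoint and returns (response.status_code, response.json()). Errors such as 400, 401, 402, and 404 are raised immediately because retrying will not change the outcome.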
For AI Agents
- Use bge-m3 for storing and retrieving agent memory (pairs with ZeroMemory)
- Use all-minilm for fast, low-cost similarity checks during tool selection
- Use bge-reranker to improve retrieval precision before feeding context to LLMs
- Embeddings are deterministic: the same input always produces the same vector
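Since the same input produces the same vector, an agent can cache embeddings locally and skip repeat calls. The following is a minimal in-memory sketch, assuming the embedding call is wrapped in a function that takes (text, model); EmbeddingCache is a hypothetical class, not part of the API. The key includes the model name because different models return different vectors for the same text.

```python
import hashlib
from typing import Callable


class EmbeddingCache:
    """In-memory cache keyed by a hash of (model, text)."""

    def __init__(self, embed: Callable[[str, str], list[float]]):
        self._embed = embed  # function that actually calls the API
        self._store: dict[str, list[float]] = {}

    @staticmethod
    def _key(text: str, model: str) -> str:
        # A NUL separator keeps ("ab", "c") and ("a", "bc") distinct.
        raw = f"{model}\x00{text}".encode("utf-8")
        return hashlib.sha256(raw).hexdigest()

    def get(self, text: str, model: str = "bge-m3") -> list[float]:
        key = self._key(text, model)
        if key not in self._store:
            self._store[key] = self._embed(text, model)
        return self._store[key]
```

This is most useful for short, frequently repeated inputs such as tool descriptions. A long-running agent would likely want a size limit or a persistent store instead of an unbounded dictionary.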
Next Steps
- ZeroDB Vectors — Store and search embeddings
- ZeroMemory — Cognitive memory for agents
- GraphRAG Guide — Combine embeddings with knowledge graphs
- Chat Completions — Use retrieved context in prompts