Embeddings API

Generate vector embeddings for semantic search, RAG pipelines, clustering, and similarity matching. All models run on DigitalOcean GPU inference.

Base URL: https://api.ainative.studio/api/v1/public/{project_id}/embeddings

Quick Start

curl -X POST https://api.ainative.studio/api/v1/public/YOUR_PROJECT_ID/embeddings/generate \
  -H "Content-Type: application/json" \
  -H "x-api-key: $AINATIVE_API_KEY" \
  -d '{
    "text": "The quick brown fox jumps over the lazy dog",
    "model": "bge-m3"
  }'

# Same request in Python
import os
import requests

project_id = "YOUR_PROJECT_ID"
api_key = os.environ["AINATIVE_API_KEY"]

response = requests.post(
    f"https://api.ainative.studio/api/v1/public/{project_id}/embeddings/generate",
    headers={"x-api-key": api_key},
    json={"text": "The quick brown fox jumps over the lazy dog", "model": "bge-m3"},
)
response.raise_for_status()  # fail loudly on 4xx/5xx instead of a KeyError below

embedding = response.json()["embedding"]  # 1024-dim vector for bge-m3

Authentication

All endpoints require an API key or Bearer token plus a valid project ID in the URL path.

POST /api/v1/public/{project_id}/embeddings/generate
x-api-key: sk_your_key_here
Content-Type: application/json
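
The two credential types can be handled by one small helper. This is a minimal sketch: the `x-api-key` header comes from the request shown above, while sending a Bearer token as `Authorization: Bearer <token>` is the conventional HTTP form and is an assumption here, as is the helper name `build_headers`.

```python
from typing import Optional


def build_headers(api_key: Optional[str] = None,
                  bearer_token: Optional[str] = None) -> dict:
    """Return request headers for exactly one credential type (hypothetical helper)."""
    if bool(api_key) == bool(bearer_token):
        raise ValueError("Provide exactly one of api_key or bearer_token")
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["x-api-key"] = api_key
    else:
        # Assumption: Bearer tokens use the standard Authorization header.
        headers["Authorization"] = f"Bearer {bearer_token}"
    return headers
```

Pass the returned dictionary as the `headers` argument of `requests.post`.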

Generate Embedding

POST /api/v1/public/{project_id}/embeddings/generate (authentication required)

Generate a vector embedding for a text input.

Parameters:

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| text | string | Yes | | Text to embed (max 8192 tokens) |
| model | string | No | bge-m3 | Embedding model ID |

Response:

| Field | Type | Description |
|---|---|---|
| embedding | float[] | Vector embedding |
| model | string | Model used |
| dimensions | int | Vector dimensions |
| usage.tokens | int | Tokens processed |
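
A client may want to check a response body before using it. The sketch below works on an already-decoded JSON dictionary and only relies on the `embedding` and `dimensions` fields listed above; the function name `parse_embedding_response` is hypothetical, and treating a length mismatch as an error is a design choice, not documented API behavior.

```python
from typing import Any, Dict, List


def parse_embedding_response(payload: Dict[str, Any]) -> List[float]:
    """Extract the vector and verify it matches the reported dimensions."""
    embedding = payload["embedding"]
    dimensions = payload.get("dimensions")
    if dimensions is not None and dimensions != len(embedding):
        raise ValueError(
            f"Expected {dimensions} dimensions, got {len(embedding)}"
        )
    return [float(x) for x in embedding]
```

Call it with `response.json()` after a successful request.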

Available Models

| Model ID | Alias | Dimensions | Best For | Plan |
|---|---|---|---|---|
| all-mini-lm-l6-v2 | all-minilm | 384 | Fast similarity search, prototyping | Free |
| bge-m3 | bge-m3 | 1024 | Multilingual retrieval, RAG (recommended) | Pro+ |
| bge-reranker-v2-m3 | bge-reranker | | Reranking search results (not for storage) | Pro+ |
| e5-large-v2 | e5-large | 1024 | English document retrieval | Pro+ |
| gte-large-en-v1.5 | gte-large | 1024 | English general-purpose embeddings | Pro+ |
| qwen3-embedding-0.6b | qwen-embed | 1024 | Multilingual, code-aware embeddings | Pro+ |
| multi-qa-mpnet-base-dot-v1 | mpnet | 768 | Question-answering, semantic search | Pro+ |

Which model should I use?
  • Starting out? all-minilm — fast, free, 384 dimensions
  • Production RAG? bge-m3 — best multilingual performance at 1024 dimensions
  • Reranking results? bge-reranker — use as a second pass to reorder search results
  • Code + text? qwen-embed — understands both natural language and code

Model Aliases

You can use short aliases instead of full model IDs:

{"model": "bge-m3"}      // full ID
{"model": "all-minilm"}  // alias for all-mini-lm-l6-v2
{"model": "e5-large"}    // alias for e5-large-v2
{"model": "mpnet"}       // alias for multi-qa-mpnet-base-dot-v1
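
The API accepts aliases directly, but a client sometimes needs the full ID or the vector size locally, for example to size a vector index before the first request. The lookup below is a sketch built only from the Available Models table; the names `MODEL_ALIASES`, `MODEL_DIMENSIONS`, and `resolve_model` are hypothetical.

```python
# Alias -> full model ID, copied from the Available Models table.
MODEL_ALIASES = {
    "all-minilm": "all-mini-lm-l6-v2",
    "bge-m3": "bge-m3",
    "bge-reranker": "bge-reranker-v2-m3",
    "e5-large": "e5-large-v2",
    "gte-large": "gte-large-en-v1.5",
    "qwen-embed": "qwen3-embedding-0.6b",
    "mpnet": "multi-qa-mpnet-base-dot-v1",
}

# Full model ID -> vector dimensions (the reranker lists none).
MODEL_DIMENSIONS = {
    "all-mini-lm-l6-v2": 384,
    "bge-m3": 1024,
    "e5-large-v2": 1024,
    "gte-large-en-v1.5": 1024,
    "qwen3-embedding-0.6b": 1024,
    "multi-qa-mpnet-base-dot-v1": 768,
}


def resolve_model(name: str) -> str:
    """Map an alias or a full ID to the full model ID."""
    if name in MODEL_ALIASES:
        return MODEL_ALIASES[name]
    if name in MODEL_ALIASES.values():
        return name
    raise ValueError(f"Unknown model: {name!r}")
```

For instance, `MODEL_DIMENSIONS[resolve_model("mpnet")]` gives the vector size to expect from the `mpnet` alias.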

Reranking

The bge-reranker-v2-m3 model is a cross-encoder reranker, not a standard embedding model. Use it to reorder search results by relevance.

import requests

project_id = "YOUR_PROJECT_ID"
api_key = "sk_your_key_here"

# First, get candidate results from vector search
candidates = ["Result A", "Result B", "Result C"]

# Then rerank against the query
response = requests.post(
    f"https://api.ainative.studio/api/v1/public/{project_id}/embeddings/generate",
    headers={"x-api-key": api_key},
    json={
        "text": "What is machine learning?",  # the query
        "model": "bge-reranker",
        "candidates": candidates,  # texts to reorder
    },
)
response.raise_for_status()
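
This page does not document the shape of the reranker's response, so the sketch below deliberately does not parse one. It only shows the reordering step, assuming you have obtained one relevance score per candidate in the same order as `candidates`; the function name `reorder_by_score` and the idea of a per-candidate score list are assumptions.

```python
from typing import List, Sequence, Tuple


def reorder_by_score(candidates: Sequence[str],
                     scores: Sequence[float]) -> List[Tuple[str, float]]:
    """Pair each candidate with its score and sort from most to least relevant."""
    if len(candidates) != len(scores):
        raise ValueError("candidates and scores must have the same length")
    # sorted() is stable, so ties keep their original search order.
    return sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
```

Keeping the scores alongside the text makes it easy to apply a cutoff before passing context to an LLM.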

Storing Embeddings in ZeroDB

Combine the Embeddings API with ZeroDB vector storage for a complete RAG pipeline:

import requests

API = "https://api.ainative.studio/api/v1/public"
PROJECT_ID = "your_project_id"
HEADERS = {"x-api-key": "sk_your_key", "Content-Type": "application/json"}

# 1. Generate embedding
embed_resp = requests.post(
    f"{API}/{PROJECT_ID}/embeddings/generate",
    headers=HEADERS,
    json={"text": "Machine learning is a subset of AI", "model": "bge-m3"},
)
vector = embed_resp.json()["embedding"]

# 2. Store in ZeroDB
requests.post(
    f"{API}/{PROJECT_ID}/vectors",
    headers=HEADERS,
    json={
        "vector": vector,
        "metadata": {"source": "docs", "text": "Machine learning is a subset of AI"},
    },
)

# 3. Search by similarity (embed the query with the same model used for storage)
search_embed = requests.post(
    f"{API}/{PROJECT_ID}/embeddings/generate",
    headers=HEADERS,
    json={"text": "What is ML?", "model": "bge-m3"},
).json()["embedding"]

results = requests.post(
    f"{API}/{PROJECT_ID}/vectors/search",
    headers=HEADERS,
    json={"vector": search_embed, "top_k": 5},
).json()
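
To make "search by similarity" concrete, the sketch below compares two vectors locally with cosine similarity. The metric ZeroDB uses is not stated on this page, so cosine is an assumption chosen because it is a common default for text embeddings; the function name is illustrative.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("Vectors must have the same dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("Cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

The length check also catches a frequent mistake: comparing vectors produced by models with different dimensions, such as a 384-dim query against 1024-dim stored vectors.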

Error Handling

| Status | Meaning | Action |
|---|---|---|
| 200 | Success | Parse embedding from response |
| 400 | Invalid model or empty text | Check model ID and text content |
| 401 | Unauthorized | Verify API key |
| 402 | Insufficient credits | Top up credits |
| 404 | Project not found | Check project ID in URL |
| 429 | Rate limited | Retry with backoff |
| 500 | Model inference error | Retry; check model availability |
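
The table above can be turned into a small retry wrapper. This is a sketch under stated assumptions: `send` is a hypothetical callable that performs one request and returns `(status_code, decoded_body)`, and the attempt count and exponential delays are illustrative values, not limits published by the API.

```python
import time
from typing import Callable, Tuple

RETRYABLE = {429, 500}  # statuses the table says to retry
ACTIONS = {
    400: "Check model ID and text content",
    401: "Verify API key",
    402: "Top up credits",
    404: "Check project ID in URL",
    429: "Rate limited; retries exhausted",
    500: "Model inference error; retries exhausted",
}


class EmbeddingsAPIError(Exception):
    def __init__(self, status: int, message: str):
        super().__init__(f"HTTP {status}: {message}")
        self.status = status


def call_with_retry(send: Callable[[], Tuple[int, dict]],
                    max_attempts: int = 4,
                    base_delay: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call send() until it succeeds, backing off on 429 and 500."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status == 200:
            return body
        if status in RETRYABLE and attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
            continue
        # Non-retryable status, or retries used up.
        raise EmbeddingsAPIError(status, ACTIONS.get(status, "Unexpected status"))
```

With `requests`, `send` can be a function that posts the payload and returns `(response.status_code, response.json())`. Injecting `sleep` keeps the wrapper testable without real delays.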

For AI Agents

  • Use bge-m3 for storing and retrieving agent memory (pairs with ZeroMemory)
  • Use all-minilm for fast, low-cost similarity checks during tool selection
  • Use bge-reranker to improve retrieval precision before feeding context to LLMs
  • Embeddings are deterministic: the same input and model always produce the same vector
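
Because embeddings are deterministic, an agent can cache them and skip repeat requests for text it has already embedded. The class below is a minimal in-memory sketch; `EmbeddingCache` and the injected `embed` callable (which would wrap the `/embeddings/generate` request) are hypothetical names, and a real agent would likely want a persistent store and an eviction policy.

```python
import hashlib
from typing import Callable, Dict, List


class EmbeddingCache:
    """Memoize embeddings by (model, text) so identical inputs cost one request."""

    def __init__(self, embed: Callable[[str, str], List[float]]):
        self._embed = embed  # embed(text, model) -> vector
        self._store: Dict[str, List[float]] = {}

    @staticmethod
    def _key(text: str, model: str) -> str:
        # A NUL separator keeps (model, text) pairs from colliding.
        return hashlib.sha256(f"{model}\x00{text}".encode("utf-8")).hexdigest()

    def get(self, text: str, model: str = "bge-m3") -> List[float]:
        key = self._key(text, model)
        if key not in self._store:
            self._store[key] = self._embed(text, model)
        return self._store[key]
```

The model name is part of the key because vectors from different models are not interchangeable.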

Next Steps