OpenAI-compatible chat completions API with intelligent routing, session memory, and automatic failover.
Base URL
https://api.prism.ssimplifi.com/v1

Get an API key from the signup page, then make your first call:
curl https://api.prism.ssimplifi.com/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "X-Prism-Mode: eco" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "any",
    "messages": [{"role": "user", "content": "What is 2+2?"}]
  }'

import requests
response = requests.post(
    "https://api.prism.ssimplifi.com/v1/chat/completions",
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "X-Prism-Mode": "balanced",
    },
    json={
        "model": "any",
        "messages": [{"role": "user", "content": "Explain quantum computing"}],
    },
)
data = response.json()
print(data["choices"][0]["message"]["content"])

All API requests require a Bearer token in the Authorization header. API keys start with prism_sk_.
Authorization: Bearer prism_sk_a1b2c3d4e5f6g7h8i9j0k1l2m3n4o5p6
Every request requires the X-Prism-Mode header. This controls the quality/cost tradeoff:
eco
Optimizes aggressively for cost. Simple tasks are routed to cheaper models, while complex tasks still get capable models. 15% markup.
balanced
Best balance of quality and cost. Smart routing for every query type. 20% markup.
sport
Best model for every task. Quality first, cost second. 30% markup.
Prism classifies your query as simple, code, reasoning, or complex, then picks the optimal model for your mode. All modes maintain a quality floor — Prism never returns a bad answer to save money.
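As a rough illustration of the markup figures above, the sketch below assumes the markup is a percentage added on top of the underlying provider cost. The source does not state the exact billing formula, and `billed_cost` is a hypothetical helper, not part of the Prism API.

```python
from decimal import Decimal

# Markup per mode, taken from the mode descriptions above.
MARKUP = {
    "eco": Decimal("0.15"),
    "balanced": Decimal("0.20"),
    "sport": Decimal("0.30"),
}


def billed_cost(provider_cost: str, mode: str) -> Decimal:
    """Assumed formula: provider cost plus the mode's percentage markup."""
    if mode not in MARKUP:
        raise ValueError(f"unknown mode: {mode!r}")
    return Decimal(provider_cost) * (1 + MARKUP[mode])
```

Under this assumption, a request whose provider cost is 0.001 USD would be billed 0.00115 USD in eco mode and 0.0013 USD in sport mode.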
POST /v1/chat/completions
OpenAI-compatible chat completion endpoint. Send the same request body you would send to OpenAI.

| Header | Description |
|---|---|
| Authorization | Bearer token with your API key. |
| X-Prism-Mode | eco, balanced, or sport. |
| X-Prism-Model-Prefer | Pin a specific model. See Model Pinning. |
| X-Prism-Session | Session ID for conversation memory. See Session Memory. |

| Parameter | Description |
|---|---|
| model | Any value accepted. Prism selects the model based on mode and classification. |
| messages | Array of message objects with role (system/user/assistant) and content. |
| stream | Set to true for SSE streaming. Default false. |
| max_tokens | Maximum tokens to generate. Default 4096. |
| temperature | Sampling temperature, 0 to 2. Default 1. |
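The headers and body parameters described here can be assembled and checked before a request is sent. The following is a minimal sketch; `build_request` is a hypothetical helper, and the validation it performs is a client-side convenience based on the documented values, not a statement of how the server validates input.

```python
VALID_MODES = {"eco", "balanced", "sport"}


def build_request(api_key, mode, messages, *, max_tokens=4096,
                  temperature=1.0, stream=False):
    """Return (headers, body) for POST /v1/chat/completions."""
    if mode not in VALID_MODES:
        raise ValueError(f"X-Prism-Mode must be one of {sorted(VALID_MODES)}")
    if not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    headers = {
        "Authorization": f"Bearer {api_key}",
        "X-Prism-Mode": mode,
        "Content-Type": "application/json",
    }
    body = {
        "model": "any",  # any value is accepted; Prism picks the model
        "messages": messages,
        "stream": stream,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return headers, body
```

The returned pair can be passed to an HTTP client, for example `requests.post(url, headers=headers, json=body)`.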
Responses follow the OpenAI chat completion format:
{
  "id": "prism-a1b2c3d4",
  "object": "chat.completion",
  "created": 1712150400,
  "model": "claude-haiku-4.5",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The answer is 4."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 8,
    "total_tokens": 20
  }
}

Every response includes Prism-specific headers:
| Header | Description |
|---|---|
| X-Prism-Model | The actual model used (e.g. claude-haiku, gpt-4o-mini, gemini-flash). |
| X-Prism-Cost | Total cost in USD (e.g. 0.000234). |
| X-Prism-Tokens-In | Input token count. |
| X-Prism-Tokens-Out | Output token count. |
| X-Prism-Task-Type | Classification result: simple, code, reasoning, or complex. |
| X-Prism-Failover | Present and set to "true" only if the request was rerouted to a different provider. |
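One way to consume these headers is to collect them into a small typed record. This is a sketch only: `PrismMeta` and `parse_prism_headers` are hypothetical names, and the code assumes the numeric headers arrive as plain decimal strings, as in the examples above.

```python
from dataclasses import dataclass


@dataclass
class PrismMeta:
    model: str
    cost_usd: float
    tokens_in: int
    tokens_out: int
    task_type: str
    failover: bool


def parse_prism_headers(headers) -> PrismMeta:
    """Build a PrismMeta from a mapping of HTTP response headers."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    h = {k.lower(): v for k, v in headers.items()}
    return PrismMeta(
        model=h["x-prism-model"],
        cost_usd=float(h["x-prism-cost"]),
        tokens_in=int(h["x-prism-tokens-in"]),
        tokens_out=int(h["x-prism-tokens-out"]),
        task_type=h["x-prism-task-type"],
        # The failover header is only present when a reroute happened.
        failover=h.get("x-prism-failover") == "true",
    )
```

With the `requests` library, `parse_prism_headers(response.headers)` would be the typical call.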
Set stream: true in the request body to receive Server-Sent Events. Chunks follow the OpenAI delta format:
data: {"id":"prism-xxx","object":"chat.completion.chunk","choices":[{"delta":{"content":"Hello"},"index":0}]}
data: {"id":"prism-xxx","object":"chat.completion.chunk","choices":[{"delta":{"content":" world"},"index":0}]}
data: {"id":"prism-xxx","object":"chat.completion.chunk","choices":[{"delta":{},"finish_reason":"stop","index":0}],"usage":{"prompt_tokens":12,"completion_tokens":5,"total_tokens":17}}
data: [DONE]

The final chunk includes usage data. Prism headers (X-Prism-Model, X-Prism-Task-Type) are in the HTTP response headers.
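The chunk format above can be reassembled into the full reply on the client side. The sketch below takes an iterable of already-decoded text lines and returns the concatenated content plus the usage object from the final chunk; `collect_stream` is a hypothetical helper and handles only the `data:` lines shown in the example.

```python
import json


def collect_stream(lines):
    """Join delta contents from SSE lines; return (text, usage)."""
    parts, usage = [], None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        usage = chunk.get("usage", usage)
        for choice in chunk.get("choices", []):
            # The final chunk carries an empty delta, so default to "".
            parts.append(choice.get("delta", {}).get("content") or "")
    return "".join(parts), usage
```

With `requests`, the lines could come from `response.iter_lines(decode_unicode=True)` on a call made with `stream=True`.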
Add the X-Prism-Session header with any unique string to enable conversation memory. Prism stores message history in Redis and automatically includes prior context in subsequent calls.
# First message — creates session
curl https://api.prism.ssimplifi.com/v1/chat/completions \
-H "Authorization: Bearer YOUR_KEY" \
-H "X-Prism-Mode: balanced" \
-H "X-Prism-Session: my-session-123" \
-H "Content-Type: application/json" \
-d '{"model":"any","messages":[{"role":"user","content":"My name is Ravi"}]}'
# Second message — Prism includes history automatically
curl https://api.prism.ssimplifi.com/v1/chat/completions \
-H "Authorization: Bearer YOUR_KEY" \
-H "X-Prism-Mode: balanced" \
-H "X-Prism-Session: my-session-123" \
-H "Content-Type: application/json" \
-d '{"model":"any","messages":[{"role":"user","content":"What is my name?"}]}'
# Response: "Your name is Ravi."

Sessions expire after 30 minutes of inactivity. Each access refreshes the TTL.
Maximum 100 messages per session. At 90 messages, older messages are summarized to free space.
System messages persist automatically across the session without re-sending.
Use GET /v1/sessions/{id} to check session info and DELETE /v1/sessions/{id} to clear it.
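The two session endpoints can be called from Python with the standard library alone. The sketch below only builds the request objects; `session_request` is a hypothetical helper, and passing an API key is left optional because the text does not make clear whether these endpoints need one.

```python
from typing import Optional
from urllib.parse import quote
from urllib.request import Request

BASE_URL = "https://api.prism.ssimplifi.com/v1"


def session_request(session_id: str, method: str = "GET",
                    api_key: Optional[str] = None) -> Request:
    """Build a GET (inspect) or DELETE (clear) request for one session."""
    if method not in ("GET", "DELETE"):
        raise ValueError("method must be GET or DELETE")
    # Percent-encode the ID so characters such as '/' stay in one path segment.
    url = f"{BASE_URL}/sessions/{quote(session_id, safe='')}"
    headers = {"Authorization": f"Bearer {api_key}"} if api_key else {}
    return Request(url, method=method, headers=headers)
```

Passing the result to `urllib.request.urlopen` sends the request.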
Override Prism's routing by setting the X-Prism-Model-Prefer header to a specific model:
| Model name | Provider | Actual model |
|---|---|---|
| claude-opus | Anthropic | claude-opus-4-6 |
| claude-sonnet | Anthropic | claude-sonnet-4-6 |
| claude-haiku | Anthropic | claude-haiku-4-5-20251001 |
| gpt-4o | OpenAI | gpt-4o |
| gpt-4o-mini | OpenAI | gpt-4o-mini |
| gemini-pro | Google | gemini-2.5-pro |
| gemini-flash | Google | gemini-2.5-flash |
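A pinned request differs from a normal one only by the extra header. The sketch below assumes the header value is the short name from the "Model name" column; `with_pinned_model` is a hypothetical helper that rejects names not listed in the table.

```python
# Short model names from the table above, mapped to the actual model.
PINNABLE_MODELS = {
    "claude-opus": "claude-opus-4-6",
    "claude-sonnet": "claude-sonnet-4-6",
    "claude-haiku": "claude-haiku-4-5-20251001",
    "gpt-4o": "gpt-4o",
    "gpt-4o-mini": "gpt-4o-mini",
    "gemini-pro": "gemini-2.5-pro",
    "gemini-flash": "gemini-2.5-flash",
}


def with_pinned_model(headers: dict, model: str) -> dict:
    """Return a copy of headers with X-Prism-Model-Prefer set."""
    if model not in PINNABLE_MODELS:
        raise ValueError(f"unknown model name: {model!r}")
    # Assumption: the header takes the short name, not the actual model ID.
    return {**headers, "X-Prism-Model-Prefer": model}
```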
Free accounts get daily access with these limits:
| Limit | Details |
|---|---|
| Daily input tokens | Resets at midnight UTC. |
| Daily output tokens | Resets at midnight UTC. |
| Max tokens per request | max_tokens cannot exceed 4,000. |
| Modes | Balanced and sport modes require a paid account. |
| Streaming | Free tier does not support streaming. |
| Session messages | Each session limited to 5 messages on the free tier. |
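Several of these limits can be checked on the client before a request is sent. The sketch below is a hypothetical pre-flight check, not Prism's own enforcement logic. It covers only the mode, streaming, and max_tokens rules, and it checks max_tokens only when the caller sets it, since the text does not say how the default of 4096 interacts with the 4,000 cap.

```python
FREE_TIER_MAX_TOKENS = 4000


def free_tier_violations(mode: str, body: dict) -> list:
    """Return human-readable reasons a request would exceed free-tier limits."""
    problems = []
    if mode != "eco":
        problems.append(f"mode {mode!r} requires a paid account")
    if body.get("stream"):
        problems.append("streaming is not available on the free tier")
    max_tokens = body.get("max_tokens")
    if max_tokens is not None and max_tokens > FREE_TIER_MAX_TOKENS:
        problems.append(f"max_tokens {max_tokens} exceeds {FREE_TIER_MAX_TOKENS}")
    return problems
```

An empty list means none of the checked rules is violated; the server may still reject the request with 403 free_tier_limit for the daily token or session limits.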
| Endpoint | Description |
|---|---|
| /v1/chat/completions | Send a chat completion request |
| /v1/sessions/{id} | Get session info (message count, tokens, created_at) |
| /v1/sessions/{id} | Delete a session |
| /v1/usage | List usage logs with filters (from, to, mode, limit, offset) |
| /v1/usage/summary | Aggregated usage summary with cost comparison |
| /v1/usage/export | Export usage logs as CSV (from, to params) |
| /v1/balance | Get current balance, tier, and auto-topup status |
| /v1/keys | Create a new API key |
| /v1/keys | List active API keys |
| /v1/keys/{id} | Revoke an API key |

All endpoints except sessions require API key authentication.
Errors return a consistent JSON format:
{
  "error": {
    "type": "insufficient_balance",
    "message": "Insufficient balance. Please top up your account."
  }
}

| Status | Type | Description |
|---|---|---|
| 400 | invalid_request | Malformed request body or invalid parameters |
| 400 | missing_mode_header | X-Prism-Mode header not provided |
| 401 | invalid_api_key | Missing, invalid, or revoked API key |
| 402 | insufficient_balance | Account balance too low for estimated cost |
| 403 | free_tier_limit | Request exceeds free tier limits |
| 429 | rate_limited | Too many requests. Check Retry-After header |
| 502 | provider_error | All AI providers failed after failover attempts |
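A client can use the status code and error type to decide whether a retry is worthwhile. The sketch below shows one possible policy, which is an assumption and not a recommendation from Prism: retry only 429 and 502, honour Retry-After when it is given in seconds, and otherwise back off exponentially. `error_type` and `retry_delay` are hypothetical helpers.

```python
RETRYABLE_STATUSES = {429, 502}


def error_type(body: dict):
    """Extract error.type from a parsed error response, or None."""
    return body.get("error", {}).get("type")


def retry_delay(status: int, headers: dict, attempt: int, base: float = 1.0):
    """Return seconds to wait before retrying, or None if not retryable."""
    if status not in RETRYABLE_STATUSES:
        return None
    if status == 429:
        h = {k.lower(): v for k, v in headers.items()}
        value = h.get("retry-after", "")
        # Retry-After may also be an HTTP date; only the seconds form is handled.
        if value.isdigit():
            return float(value)
    # Exponential backoff: base, 2*base, 4*base, ... for attempt 0, 1, 2, ...
    return base * 2 ** attempt
```

Errors such as 401 invalid_api_key or 402 insufficient_balance return `None` here because repeating the same request would not change the outcome.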