Prism is an OpenAI-compatible HTTP API for chat completions. It classifies each request as simple, code, reasoning, or complex, then routes it to the cheapest model capable of handling it across Anthropic, OpenAI, and Google. Session memory and provider failover are built in.
Base URL: https://api.prism.ssimplifi.com/v1
Auth: prism_sk_ bearer token
Streaming: stream: true
Get an API key from the signup page, then make your first call:
curl https://api.prism.ssimplifi.com/v1/chat/completions \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "X-Prism-Mode: eco" \
-H "Content-Type: application/json" \
-d '{
"model": "any",
"messages": [{"role": "user", "content": "What is 2+2?"}]
}'

import requests
response = requests.post(
"https://api.prism.ssimplifi.com/v1/chat/completions",
headers={
"Authorization": "Bearer YOUR_API_KEY",
"X-Prism-Mode": "balanced",
},
json={
"model": "any",
"messages": [{"role": "user", "content": "Explain quantum computing"}],
},
)
data = response.json()
print(data["choices"][0]["message"]["content"])

All API requests require a Bearer token in the Authorization header. API keys start with prism_sk_.
Authorization: Bearer prism_sk_a1b2c3d4e5f6g7h8i9j0k1l2m3n4o5p6
Every request requires the X-Prism-Mode header. This controls the quality/cost tradeoff:
eco
Optimizes aggressively for cost. Simple tasks go cheap. Complex tasks still get capable models. 15% markup.
balanced
Best balance of quality and cost. Smart routing for every query type. 20% markup.
sport
Best model for every task. Quality first, cost second. 30% markup.
Prism classifies your query as simple, code, reasoning, or complex, then picks the optimal model for your mode. All modes maintain a quality floor — Prism never returns a bad answer to save money.
POST /v1/chat/completions
OpenAI-compatible chat completion endpoint. Send the same request body you would send to OpenAI.
| Header | Description |
|---|---|
| Authorization | Bearer token with your API key. |
| X-Prism-Mode | eco, balanced, or sport. |
| X-Prism-Model-Prefer | Pin a specific model. See Model Pinning. |
| X-Prism-Session | Session ID for conversation memory. See Session Memory. |

| Body parameter | Description |
|---|---|
| model | Any value accepted. Prism selects the model based on mode and classification. |
| messages | Array of message objects with role (system/user/assistant) and content. |
| stream | Set to true for SSE streaming. Default false. |
| max_tokens | Maximum tokens to generate. Default 4096. |
| temperature | Sampling temperature, 0 to 2. Default 1. |
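As a rough illustration of how the headers and body fields above fit together, the sketch below assembles a request without sending it. The helper name build_chat_request and its range checks are assumptions made for this example and are not part of Prism. The returned dictionary uses the same keys as the keyword arguments of requests.post.

```python
BASE_URL = "https://api.prism.ssimplifi.com/v1"
VALID_MODES = ("eco", "balanced", "sport")


def build_chat_request(api_key, mode, messages, *, stream=False,
                       max_tokens=4096, temperature=1.0,
                       session=None, prefer_model=None):
    """Assemble URL, headers, and JSON body (hypothetical helper)."""
    if mode not in VALID_MODES:
        raise ValueError(f"X-Prism-Mode must be one of {VALID_MODES}, got {mode!r}")
    if not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")

    headers = {
        "Authorization": f"Bearer {api_key}",
        "X-Prism-Mode": mode,
        "Content-Type": "application/json",
    }
    # Optional headers are only sent when the caller asks for them.
    if session is not None:
        headers["X-Prism-Session"] = session
    if prefer_model is not None:
        headers["X-Prism-Model-Prefer"] = prefer_model

    body = {
        "model": "any",  # any value is accepted; Prism picks the real model
        "messages": messages,
        "stream": stream,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return {"url": f"{BASE_URL}/chat/completions", "headers": headers, "json": body}
```

With the requests library installed, the result can be sent as requests.post(**build_chat_request("YOUR_API_KEY", "eco", messages)).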
Responses follow the OpenAI chat completion format:
{
"id": "prism-a1b2c3d4",
"object": "chat.completion",
"created": 1712150400,
"model": "claude-haiku-4.5",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "The answer is 4."
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 12,
"completion_tokens": 8,
"total_tokens": 20
}
}

Every response includes Prism-specific headers:

| Header | Description |
|---|---|
| X-Prism-Model | The actual model used (e.g. claude-haiku, gpt-4o-mini, gemini-flash). |
| X-Prism-Cost | Total cost in USD (e.g. 0.000234). |
| X-Prism-Tokens-In | Input token count. |
| X-Prism-Tokens-Out | Output token count. |
| X-Prism-Task-Type | Classification result: simple, code, reasoning, or complex. |
| X-Prism-Failover | Present and set to "true" only if the request was rerouted to a different provider. |
| X-Prism-Cache-Status | One of hit-exact, hit-semantic, miss, bypass, error, or disabled. |
| X-Prism-Cache-Saved-Cents | USD cents saved on this hit (0 on miss). |
| X-Prism-Cache-Age-Seconds | Age of the cached entry in seconds (only on hits). |
| X-Prism-Cache-Similarity | Cosine similarity for hit-semantic only (e.g. 0.9831). |
| X-Prism-Feedback-Id | UUID for this request. POST it back to /v1/feedback with thumbs / rating / comment to attach feedback. See Feedback. |
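One way a client might read these headers is sketched below. The PrismMeta dataclass and parse_prism_headers function are hypothetical names for this example; the sketch assumes numeric headers arrive as plain decimal strings, as in the examples above, and covers only a subset of the headers.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class PrismMeta:
    model: Optional[str]
    cost_usd: Optional[float]
    tokens_in: Optional[int]
    tokens_out: Optional[int]
    task_type: Optional[str]
    failover: bool
    cache_status: Optional[str]
    feedback_id: Optional[str]


def parse_prism_headers(headers: Mapping[str, str]) -> PrismMeta:
    # HTTP header names are case-insensitive, so normalise before lookup.
    h = {name.lower(): value for name, value in headers.items()}

    def number(name, cast):
        raw = h.get(name)
        return cast(raw) if raw is not None else None

    return PrismMeta(
        model=h.get("x-prism-model"),
        cost_usd=number("x-prism-cost", float),
        tokens_in=number("x-prism-tokens-in", int),
        tokens_out=number("x-prism-tokens-out", int),
        task_type=h.get("x-prism-task-type"),
        # The header is documented as present only after a failover.
        failover=h.get("x-prism-failover") == "true",
        cache_status=h.get("x-prism-cache-status"),
        feedback_id=h.get("x-prism-feedback-id"),
    )
```

Missing headers come back as None (or False for failover), so the same function works for responses that skip the optional ones.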
Set stream: true in the request body to receive Server-Sent Events. Chunks follow the OpenAI delta format:
data: {"id":"prism-xxx","object":"chat.completion.chunk","choices":[{"delta":{"content":"Hello"},"index":0}]}
data: {"id":"prism-xxx","object":"chat.completion.chunk","choices":[{"delta":{"content":" world"},"index":0}]}
data: {"id":"prism-xxx","object":"chat.completion.chunk","choices":[{"delta":{},"finish_reason":"stop","index":0}],"usage":{"prompt_tokens":12,"completion_tokens":5,"total_tokens":17}}
data: [DONE]

The final chunk includes usage data. Prism headers (X-Prism-Model, X-Prism-Task-Type) are in the HTTP response headers.
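A minimal sketch of consuming this stream on the client side follows. It takes an iterable of already-decoded text lines, so it is independent of any particular HTTP library; the function name collect_stream is an assumption for this example.

```python
import json
from typing import Iterable, Optional, Tuple


def collect_stream(lines: Iterable[str]) -> Tuple[str, Optional[dict]]:
    """Join the delta contents and return them with the usage object, if any."""
    parts = []
    usage = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and anything that is not data
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                parts.append(text)
        # Usage is documented as arriving on the final chunk.
        if "usage" in chunk:
            usage = chunk["usage"]
    return "".join(parts), usage
```

With the requests library, the lines could come from response.iter_lines(decode_unicode=True) on a request made with stream=True.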
Prism caches responses automatically. No opt-in, no header to enable it — caching is free for every tier and engages the moment you send traffic. Two layers run in series: an exact-match layer (Redis SHA-256 fingerprint of the request) and a semantic-match layer (cosine similarity over an embedding of the user message).
Inspect cache behavior on any response via the X-Prism-Cache-* headers. Disable caching for a single request with X-Prism-Cache: off.
Free / Paid: 1-hour TTL, 0.95 similarity threshold, key-scoped (per-API-key namespace). Pro tunes TTL (60s–30d), threshold (0.70–0.99), and scope (project-level coming v1.3), plus the cache inspector for browsing and manual eviction.
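To make the two layers concrete, here is an illustrative sketch of an exact-match fingerprint and a semantic threshold check. It is not Prism's implementation: which request fields go into the fingerprint and which embedding model is used are not documented here, so the canonical-JSON hashing and the function names below are assumptions.

```python
import hashlib
import json
import math
from typing import Sequence


def request_fingerprint(body: dict) -> str:
    # Canonical JSON (sorted keys, no extra spaces) so that key order
    # does not change the hash. Assumed scheme, for illustration only.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cache_status(exact_hit: bool, best_similarity: float, threshold: float = 0.95) -> str:
    # The exact layer is consulted first; the semantic layer only
    # matters when there is no exact match.
    if exact_hit:
        return "hit-exact"
    if best_similarity >= threshold:
        return "hit-semantic"
    return "miss"
```

The default threshold of 0.95 mirrors the Free / Paid setting; a Pro project that lowers it to 0.70 would turn more near-duplicates into hit-semantic responses.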
Every /v1/chat/completions response includes an X-Prism-Feedback-Id response header (a UUID). POST it to /v1/feedback to attach thumbs / rating / comment / tag.
curl https://api.prism.ssimplifi.com/v1/feedback \
-H "Authorization: Bearer YOUR_KEY" \
-H "Content-Type: application/json" \
-d '{
"feedback_id": "<uuid from X-Prism-Feedback-Id>",
"thumbs": 1,
"rating": 5,
"comment": "Helpful response",
"tag": "factually-correct"
}'

Body fields (all optional except feedback_id): thumbs ∈ {-1, 0, 1}, rating ∈ 1–5, comment (≤4000 chars), tag (≤200 chars, free-form).
UPSERT semantics: latest non-null field wins per feedback_id. Send a thumb first, a comment later — both stick.
Unknown feedback_id still returns 200 (no link to a request log, but the feedback is captured).
Aggregates view (thumbs split, rating histogram, recent comments) is on /dashboard/usage → Feedback. Available on all tiers — you can only see your own data anyway.
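The field limits and the UPSERT behaviour described above can be sketched as follows. Both function names are hypothetical; merge_feedback only models the documented "latest non-null field wins" rule locally and is not the server's code.

```python
def build_feedback(feedback_id, thumbs=None, rating=None, comment=None, tag=None):
    """Validate the documented limits and drop fields that were not set."""
    if not feedback_id:
        raise ValueError("feedback_id is required")
    if thumbs is not None and thumbs not in (-1, 0, 1):
        raise ValueError("thumbs must be -1, 0, or 1")
    if rating is not None and not 1 <= rating <= 5:
        raise ValueError("rating must be between 1 and 5")
    if comment is not None and len(comment) > 4000:
        raise ValueError("comment is limited to 4000 characters")
    if tag is not None and len(tag) > 200:
        raise ValueError("tag is limited to 200 characters")
    fields = {"thumbs": thumbs, "rating": rating, "comment": comment, "tag": tag}
    body = {"feedback_id": feedback_id}
    body.update({k: v for k, v in fields.items() if v is not None})
    return body


def merge_feedback(stored: dict, update: dict) -> dict:
    # Latest non-null value wins per field; earlier fields are kept.
    merged = dict(stored)
    merged.update({k: v for k, v in update.items() if v is not None})
    return merged
```

Sending a thumb first and a comment later therefore ends up with both fields attached to the same feedback_id.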
Per-project routing rules, monthly budget caps, and an append-only audit log. Configured at /dashboard/policy (Pro/Team only). The hot path enforces in <5 ms; every firing is captured for compliance review.
When a rule blocks a request, Prism returns HTTP 403 with a structured envelope. Branch on error.rule to handle each kind gracefully.
{
"error": {
"type": "policy_rule",
"rule": "denied_mode",
"message": "Mode 'eco' is denied by project policy",
"denied_value": "eco",
"policy_url": "/dashboard/policy"
}
}

Possible error.rule values:
- denied_mode: the request specified a mode in the deny list. Body includes denied_value.
- denied_model: the resolved model is in the deny list. Body includes denied_value (the resolved Prism model name).
- max_input_tokens: estimated input tokens (4 chars ≈ 1 token) exceeded the cap. Body includes max_input_tokens and estimated_tokens.

Force-model-by-task is the one rule that does not 403: it silently overrides the model your request would have used for a given task type (simple / code / reasoning / complex). The override surfaces in the response as the actual X-Prism-Model header and lands on the usage log row.
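A client could branch on error.rule along these lines. The function names and the returned messages are made up for this sketch. The local token estimate rounds up and counts only message content; how Prism rounds and which text it counts are not specified here, so treat the estimate as approximate.

```python
import math


def estimate_input_tokens(messages) -> int:
    # Documented heuristic: roughly 4 characters per token.
    chars = sum(len(m.get("content", "")) for m in messages)
    return math.ceil(chars / 4)


def explain_policy_block(status: int, body: dict):
    """Return a human-readable hint for a policy 403, or None for anything else."""
    error = body.get("error", {}) if isinstance(body, dict) else {}
    # Other errors (e.g. free_tier_limit) also use 403, so check the type too.
    if status != 403 or error.get("type") != "policy_rule":
        return None
    rule = error.get("rule")
    if rule == "denied_mode":
        return f"Mode {error.get('denied_value')!r} is denied; retry with another mode."
    if rule == "denied_model":
        return f"Model {error.get('denied_value')!r} is denied; remove the pin or pick another model."
    if rule == "max_input_tokens":
        return (f"Input too long: about {error.get('estimated_tokens')} tokens, "
                f"cap is {error.get('max_input_tokens')}.")
    return f"Blocked by unrecognised rule {rule!r}."
```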
When a project would exceed its monthly USD cap, Prism returns HTTP 402 Payment Required. Soft warns (default 80%) email the project owner once per calendar month but never block.
{
"error": {
"type": "budget_exceeded",
"message": "Project would exceed monthly cap of $50.00 (current $49.87, this request est. $0.18)",
"monthly_cap_usd": 50.00,
"current_spend_usd": 49.87,
"policy_url": "/dashboard/policy"
}
}

Request estimate: max_tokens × output price + tokens_in × input price with a 10% safety margin. Actual usage is usually lower.
Current spend: usage_logs SUM.

Every rule change and every enforcement firing is recorded in policy_audit_log with actor, before/after, and details. View on /dashboard/usage → Audit. Retention: Pro 30 days, Team 365 days.
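The budget check described above can be worked through with a small sketch. The per-token prices are placeholders supplied by the caller, and two details are assumptions: the 10% margin is applied by multiplying the whole estimate by 1.10, and a request is blocked only when spend plus estimate is strictly greater than the cap.

```python
def estimate_request_cost(max_tokens: int, tokens_in: int,
                          output_price: float, input_price: float,
                          margin: float = 0.10) -> float:
    """Worst-case cost in USD: assumes the full max_tokens is generated."""
    base = max_tokens * output_price + tokens_in * input_price
    return base * (1 + margin)


def would_exceed_cap(current_spend_usd: float, estimate_usd: float,
                     monthly_cap_usd: float) -> bool:
    return current_spend_usd + estimate_usd > monthly_cap_usd
```

With the figures from the example envelope (current spend $49.87, estimate $0.18, cap $50.00), would_exceed_cap returns True. Lowering max_tokens shrinks the estimate and is the simplest way for a client to stay under a nearly exhausted cap.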
The contract: your request is retried, your stream closes cleanly, you get a response or a structured error. No silent hangs, no SSE that never terminates, no "the provider went down and your app froze for 30 seconds." Live health visible at the top of /dashboard.
When the primary provider for your routed model 5xxs or times out (30s non-streaming, 10s to first-token for streaming), Prism retries once on the same provider, then walks the fallback chain in order. The response includes X-Prism-Failover: true when this happened. Health observations from the failed attempts feed a rolling 5-minute window — the next request from any caller routes around the unhealthy provider automatically.
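The retry-then-failover order can be sketched in a few lines. This is a simplified model, not Prism's code: providers are plain callables, ProviderError stands in for a 5xx or timeout, and each fallback is assumed to get a single attempt. Health tracking is left out.

```python
from typing import Callable, Sequence, Tuple


class ProviderError(Exception):
    """Stand-in for a provider 5xx or timeout."""


def call_with_failover(
    providers: Sequence[Tuple[str, Callable[[], str]]],
) -> Tuple[str, str, bool]:
    """Try the primary twice, then each fallback once, in order.

    Returns (result, provider_name, failover_flag).
    """
    errors = []
    for index, (name, call) in enumerate(providers):
        attempts = 2 if index == 0 else 1  # one retry on the primary only
        for _ in range(attempts):
            try:
                return call(), name, index > 0
            except ProviderError as exc:
                errors.append(f"{name}: {exc}")
    raise ProviderError("All providers failed: " + "; ".join(errors))
```

The third element of the result plays the role of X-Prism-Failover: it is True only when a provider other than the primary produced the answer.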
If the upstream provider drops the connection AFTER first-token has flowed (network glitch, provider OOM, etc.), Prism does NOT retry mid-stream — that would corrupt the SSE you're already consuming. Instead the stream emits a final data: {"error": {"type": "stream_error", ...}} followed by data: [DONE], your client sees a clean close, and the failure is recorded in the provider's health window so the NEXT request routes around.
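On the client side, a mid-stream failure then looks like an ordinary data event that carries an error object. A sketch of surfacing it while keeping the text received so far is shown below; StreamError and iter_stream_text are hypothetical names, and only the documented type field of the error is read.

```python
import json
from typing import Iterable, Iterator


class StreamError(Exception):
    """Raised when the stream carries an error event instead of finishing normally."""


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        event = json.loads(payload)
        if "error" in event:
            # Text yielded before this point has already reached the caller.
            raise StreamError(event["error"].get("type", "unknown"))
        for choice in event.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text
```

Because Prism does not retry mid-stream, deciding whether to resend the whole request after a StreamError is left to the caller.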
On X-Prism-Mode: sport, Pro and Team accounts fire the primary AND first healthy fallback in parallel and return whichever responds first. The loser is cancelled. The response includes X-Prism-Speculative: true when this happens. Trade: ~1.3x average token cost on the provider side (loser keeps generating until cancel propagates) for hedged p99 latency + immunity to single-provider degradation. Streaming requests stay serial in v1.5 — hedging two SSE streams is messy. We absorb the loser's token cost; you're only billed for the winner.
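The hedging pattern itself (fire two calls, keep the first success, cancel the other) can be illustrated with asyncio. This is a generic sketch of the technique, not Prism's implementation; hedged_call is a hypothetical name and the two arguments are any zero-argument coroutine functions.

```python
import asyncio


async def hedged_call(primary, fallback):
    """Run both coroutines concurrently and return the first successful result."""
    tasks = {asyncio.create_task(primary()), asyncio.create_task(fallback())}
    last_error = None
    try:
        while tasks:
            done, tasks = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
            for task in done:
                if task.exception() is None:
                    return task.result()
                last_error = task.exception()  # keep waiting for the other call
        raise last_error
    finally:
        # Cancel whatever is still running (the "loser") and let it unwind.
        for task in tasks:
            task.cancel()
        await asyncio.gather(*tasks, return_exceptions=True)
```

The loser may have done some work before the cancellation lands, which is the extra provider-side cost the paragraph above describes.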
If every provider in the chain fails, you get a clean HTTP 502 with {"error": {"type": "provider_error", "message": "All providers failed: ..."}}. This is the only failure mode that escapes the retry/failover loop, and it requires Anthropic, OpenAI, AND Google to all be down for your routed model class at the same moment. Hasn't happened to us since v1.0 launch.
api.prism.ssimplifi.com is fronted by a Cloudflare Worker. Requests hit the customer's nearest Cloudflare PoP first; the worker handles three things there before deciding whether to forward to Mumbai:
X-Prism-Edge-Cache: hit.

- X-Prism-Edge-Cache: hit when served from edge cache, passthrough when forwarded to Mumbai, auth-reject when the edge 401'd the request.
- X-Prism-Edge-Region: IATA airport code of the PoP that handled the request (e.g. SIN for Singapore, SFO for San Francisco, FRA for Frankfurt). Useful for debugging routing in multi-region deployments.
- X-Prism-Cache-Status: hit-exact-edge on cache hits served from the edge, which distinguishes them from Mumbai-side hit-exact.

International customers benefit most. Without the edge layer, every request from San Francisco, London, or Sydney pays a ~600ms round-trip to Mumbai before the response even starts coming back. With the edge:
The edge reads the exact same Redis cache Mumbai writes to, so deleting a cached entry from /dashboard/cache propagates to the edge on the very next read. API key revocations have a 60-second lag at the edge — the worker caches the auth lookup for 60s to avoid pummeling Supabase. During those 60s a revoked key may auth at the edge, but Mumbai re-validates and rejects, so security is unaffected; the request just gets a slightly slower 401.
Add the X-Prism-Session header with any unique string to enable conversation memory. Prism stores message history in Redis and automatically includes prior context in subsequent calls.
# First message — creates session
curl https://api.prism.ssimplifi.com/v1/chat/completions \
-H "Authorization: Bearer YOUR_KEY" \
-H "X-Prism-Mode: balanced" \
-H "X-Prism-Session: my-session-123" \
-H "Content-Type: application/json" \
-d '{"model":"any","messages":[{"role":"user","content":"My name is Ravi"}]}'
# Second message — Prism includes history automatically
curl https://api.prism.ssimplifi.com/v1/chat/completions \
-H "Authorization: Bearer YOUR_KEY" \
-H "X-Prism-Mode: balanced" \
-H "X-Prism-Session: my-session-123" \
-H "Content-Type: application/json" \
-d '{"model":"any","messages":[{"role":"user","content":"What is my name?"}]}'
# Response: "Your name is Ravi."

Sessions expire after 30 minutes of inactivity. Each access refreshes the TTL.
Maximum 100 messages per session. At 90 messages, older messages are summarized to free space.
System messages persist automatically across the session without re-sending.
Use GET /v1/sessions/{id} to check session info and DELETE /v1/sessions/{id} to clear it.
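The sliding expiry described above ("each access refreshes the TTL") can be modelled with a small in-memory store. This is only an illustration of the behaviour: Prism keeps sessions in Redis, and the class below, its method names, and the injectable clock are assumptions for the example. The 100-message cap and summarization are not modelled.

```python
import time


class SessionStore:
    """In-memory model of a session with a 30-minute sliding TTL."""

    def __init__(self, ttl_seconds: float = 30 * 60, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._sessions = {}  # session id -> [expires_at, messages]

    def _touch(self, session_id: str) -> list:
        now = self._clock()
        entry = self._sessions.get(session_id)
        if entry is None or entry[0] <= now:
            # Unknown or expired: start over with an empty history.
            entry = [now, []]
            self._sessions[session_id] = entry
        entry[0] = now + self._ttl  # every access pushes the expiry forward
        return entry[1]

    def append(self, session_id: str, message: dict) -> None:
        self._touch(session_id).append(message)

    def history(self, session_id: str) -> list:
        return list(self._touch(session_id))
```

A session that is read or written at least once every 30 minutes never expires; one left idle for longer comes back empty.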
Override Prism's routing by setting the X-Prism-Model-Prefer header to a specific model:
| Model name | Provider | Actual model |
|---|---|---|
| claude-opus | Anthropic | claude-opus-4-6 |
| claude-sonnet | Anthropic | claude-sonnet-4-6 |
| claude-haiku | Anthropic | claude-haiku-4-5-20251001 |
| gpt-4o | OpenAI | gpt-4o |
| gpt-4o-mini | OpenAI | gpt-4o-mini |
| gemini-pro | Google | gemini-2.5-pro |
| gemini-flash | Google | gemini-2.5-flash |
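A small sketch of adding the pin to a request's headers, with a local check against the table above. It assumes the header value is the short name from the "Model name" column; the constant and function names are hypothetical.

```python
# Short names from the table above, mapped to (provider, actual model).
PINNABLE_MODELS = {
    "claude-opus": ("Anthropic", "claude-opus-4-6"),
    "claude-sonnet": ("Anthropic", "claude-sonnet-4-6"),
    "claude-haiku": ("Anthropic", "claude-haiku-4-5-20251001"),
    "gpt-4o": ("OpenAI", "gpt-4o"),
    "gpt-4o-mini": ("OpenAI", "gpt-4o-mini"),
    "gemini-pro": ("Google", "gemini-2.5-pro"),
    "gemini-flash": ("Google", "gemini-2.5-flash"),
}


def with_model_pin(headers: dict, model_name: str) -> dict:
    """Return a copy of headers with X-Prism-Model-Prefer set."""
    if model_name not in PINNABLE_MODELS:
        raise ValueError(f"unknown model name: {model_name!r}")
    # X-Prism-Mode stays in place; it is required on every request.
    return {**headers, "X-Prism-Model-Prefer": model_name}
```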
Free accounts get daily access with these limits:
| Limit | Details |
|---|---|
| Daily input tokens | Resets at midnight UTC. |
| Daily output tokens | Resets at midnight UTC. |
| Max tokens per request | max_tokens cannot exceed 4,000. |
| Modes | Balanced and sport modes require a paid account. |
| Streaming | Free tier does not support streaming. |
| Session messages | Each session limited to 5 messages on the free tier. |
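A client on the free tier could check a request locally before sending it, as in the sketch below. The function name is hypothetical, the daily token limits are left out because their values are not given here, and the session check assumes the limit means "at most 5 messages including the new one".

```python
def free_tier_violations(mode: str, max_tokens: int, stream: bool = False,
                         session_messages: int = 0) -> list:
    """Return a list of reasons the request would not fit the free tier."""
    problems = []
    if mode != "eco":
        problems.append(f"mode {mode!r} requires a paid account")
    if max_tokens > 4000:
        problems.append("max_tokens cannot exceed 4,000")
    if stream:
        problems.append("streaming is not available on the free tier")
    if session_messages > 5:
        problems.append("sessions are limited to 5 messages")
    return problems
```

An empty list means none of the listed limits is hit; the server can still answer 403 free_tier_limit if a daily token quota has run out.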
| Endpoint | Description |
|---|---|
| /v1/chat/completions | Send a chat completion request |
| /v1/sessions/{id} | Get session info (message count, tokens, created_at) |
| /v1/sessions/{id} | Delete a session |
| /v1/usage | List usage logs with filters (from, to, mode, limit, offset) |
| /v1/usage/summary | Aggregated usage summary with cost comparison |
| /v1/usage/export | Export usage logs as CSV (from, to params) |
| /v1/balance | Get current balance, tier, and auto-topup status |
| /v1/keys | Create a new API key |
| /v1/keys | List active API keys |
| /v1/keys/{id} | Revoke an API key |

All endpoints except sessions require API key authentication.
Errors return a consistent JSON format:
{
"error": {
"type": "insufficient_balance",
"message": "Insufficient balance. Please top up your account."
}
}

| Status | Type | Description |
|---|---|---|
| 400 | invalid_request | Malformed request body or invalid parameters |
| 400 | missing_mode_header | X-Prism-Mode header not provided |
| 401 | invalid_api_key | Missing, invalid, or revoked API key |
| 402 | insufficient_balance | Account balance too low for estimated cost |
| 403 | free_tier_limit | Request exceeds free tier limits |
| 429 | rate_limited | Too many requests. Check Retry-After header |
| 502 | provider_error | All AI providers failed after failover attempts |
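One possible client-side retry policy built on this table is sketched below. The policy itself is an assumption, not a Prism recommendation: it retries only 429 and 502, honours a numeric Retry-After when present, and otherwise backs off exponentially up to 30 seconds. The function name is hypothetical, and the headers mapping is expected to use the exact key "Retry-After".

```python
from typing import Mapping, Optional


def retry_delay(status: int, headers: Mapping[str, str], attempt: int) -> Optional[float]:
    """Seconds to wait before retrying, or None if the error is not retryable."""
    backoff = float(min(2 ** attempt, 30))
    if status == 429:
        try:
            return float(headers.get("Retry-After"))
        except (TypeError, ValueError):
            # Header missing or not a plain number of seconds.
            return backoff
    if status == 502:
        return backoff
    # 400, 401, 402, and 403 need a changed request, key, or balance.
    return None
```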
See the Prism FAQ for answers to common developer questions, or email hello@ssimplifi.com.