Prism API Documentation

Prism API

OpenAI-compatible chat completions API with intelligent routing, session memory, and automatic failover.

Base URL

https://api.prism.ssimplifi.com/v1

Quickstart

Get an API key from the signup page, then make your first call:

curl
curl https://api.prism.ssimplifi.com/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "X-Prism-Mode: eco" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "any",
    "messages": [{"role": "user", "content": "What is 2+2?"}]
  }'
Python
import requests

response = requests.post(
    "https://api.prism.ssimplifi.com/v1/chat/completions",
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "X-Prism-Mode": "balanced",
    },
    json={
        "model": "any",
        "messages": [{"role": "user", "content": "Explain quantum computing"}],
    },
)

data = response.json()
print(data["choices"][0]["message"]["content"])

Authentication

All API requests require a Bearer token in the Authorization header. API keys start with prism_sk_.

Authorization: Bearer prism_sk_a1b2c3d4e5f6g7h8i9j0k1l2m3n4o5p6
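In Python, the required headers can be assembled once and reused. A minimal sketch — the `auth_headers` helper is our own convenience function, not part of any SDK; only the header names and key prefix come from the docs:

```python
def auth_headers(api_key: str, mode: str = "balanced") -> dict:
    """Build the headers every Prism request needs.

    Sketch only: the header names and the prism_sk_ prefix are from the
    Prism docs; the helper itself is illustrative.
    """
    if not api_key.startswith("prism_sk_"):
        raise ValueError("Prism API keys start with 'prism_sk_'")
    return {
        "Authorization": f"Bearer {api_key}",
        "X-Prism-Mode": mode,
        "Content-Type": "application/json",
    }
```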

Modes

Every request requires the X-Prism-Mode header. This controls the quality/cost tradeoff:

eco

Optimizes aggressively for cost. Simple tasks go cheap. Complex tasks still get capable models. 15% markup.

balanced

Best balance of quality and cost. Smart routing for every query type. 20% markup.

sport

Best model for every task. Quality first, cost second. 30% markup.

Prism classifies your query as simple, code, reasoning, or complex, then picks the optimal model for your mode. All modes maintain a quality floor — Prism never returns a bad answer to save money.
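The markup arithmetic is straightforward. As an illustration only — this is not how Prism bills internally, and the authoritative figure is always the X-Prism-Cost response header:

```python
# Per-mode markups as documented: eco 15%, balanced 20%, sport 30%.
MARKUPS = {"eco": 0.15, "balanced": 0.20, "sport": 0.30}

def billed_cost(provider_cost_usd: float, mode: str) -> float:
    """Provider cost plus the documented per-mode markup (illustrative)."""
    return round(provider_cost_usd * (1 + MARKUPS[mode]), 6)
```

For example, a request whose underlying provider cost is $0.001 bills at $0.00115 in eco mode and $0.0013 in sport mode.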

Chat Completions

POST /v1/chat/completions

OpenAI-compatible chat completion endpoint. Send the same request body you would send to OpenAI.

Headers

Authorization
string, required

Bearer token with your API key.

X-Prism-Mode
string, required

eco, balanced, or sport.

X-Prism-Model-Prefer
string

Pin a specific model. See Model Pinning.

X-Prism-Session
string

Session ID for conversation memory. See Session Memory.

Request body

model
string, required

Any value accepted. Prism selects the model based on mode and classification.

messages
array, required

Array of message objects with role (system/user/assistant) and content.

stream
boolean

Set to true for SSE streaming. Default false.

max_tokens
integer

Maximum tokens to generate. Default 4096.

temperature
number

Sampling temperature, 0 to 2. Default 1.
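Putting the body parameters together with their documented defaults, a request payload might be assembled like this (the `build_request` helper is ours, for illustration):

```python
def build_request(messages, stream=False, max_tokens=4096, temperature=1.0):
    """Assemble a chat-completions body using the documented defaults.

    Illustrative helper: the field names and defaults are from the docs;
    "model" can be any value since Prism routes by mode and classification.
    """
    return {
        "model": "any",
        "messages": messages,
        "stream": stream,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
```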

Response Format

Responses follow the OpenAI chat completion format:

Response
{
  "id": "prism-a1b2c3d4",
  "object": "chat.completion",
  "created": 1712150400,
  "model": "claude-haiku-4.5",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The answer is 4."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 8,
    "total_tokens": 20
  }
}

Response Headers

Every response includes Prism-specific headers:

X-Prism-Model
string

The actual model used (e.g. claude-haiku, gpt-4o-mini, gemini-flash).

X-Prism-Cost
string

Total cost in USD (e.g. 0.000234).

X-Prism-Tokens-In
string

Input token count.

X-Prism-Tokens-Out
string

Output token count.

X-Prism-Task-Type
string

Classification result: simple, code, reasoning, or complex.

X-Prism-Failover
string

Present and set to "true" only if the request was rerouted to a different provider.
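These headers can be pulled into a typed dict for logging or cost tracking. A sketch — the output field names are ours; only the X-Prism-* header names come from the docs:

```python
def parse_prism_headers(headers: dict) -> dict:
    """Extract Prism metadata from an HTTP response's headers.

    Sketch: header names are from the docs; the returned field names
    are our own choice.
    """
    return {
        "model": headers.get("X-Prism-Model"),
        "cost_usd": float(headers.get("X-Prism-Cost", "0")),
        "tokens_in": int(headers.get("X-Prism-Tokens-In", "0")),
        "tokens_out": int(headers.get("X-Prism-Tokens-Out", "0")),
        "task_type": headers.get("X-Prism-Task-Type"),
        # X-Prism-Failover is present only when the request was rerouted.
        "failover": headers.get("X-Prism-Failover") == "true",
    }
```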

Streaming

Set stream: true in the request body to receive Server-Sent Events. Chunks follow the OpenAI delta format:

SSE stream
data: {"id":"prism-xxx","object":"chat.completion.chunk","choices":[{"delta":{"content":"Hello"},"index":0}]}

data: {"id":"prism-xxx","object":"chat.completion.chunk","choices":[{"delta":{"content":" world"},"index":0}]}

data: {"id":"prism-xxx","object":"chat.completion.chunk","choices":[{"delta":{},"finish_reason":"stop","index":0}],"usage":{"prompt_tokens":12,"completion_tokens":5,"total_tokens":17}}

data: [DONE]

The final chunk includes usage data. Prism headers (X-Prism-Model, X-Prism-Task-Type) are in the HTTP response headers.
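In Python, the delta-parsing loop can be written over any iterator of SSE lines — for example `response.iter_lines(decode_unicode=True)` from a requests call with stream: true. A sketch of the parsing logic:

```python
import json

def iter_stream_content(lines):
    """Yield content deltas from an iterator of SSE lines.

    Sketch: handles the chunk format shown above — skips keep-alive
    blank lines, stops at the [DONE] sentinel, and ignores chunks
    (like the final usage chunk) whose delta carries no content.
    """
    for line in lines:
        if not line or not line.startswith("data: "):
            continue
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            yield delta["content"]
```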

Session Memory

Add the X-Prism-Session header with any unique string to enable conversation memory. Prism stores message history in Redis and automatically includes prior context in subsequent calls.

Using sessions
# First message — creates session
curl https://api.prism.ssimplifi.com/v1/chat/completions \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "X-Prism-Mode: balanced" \
  -H "X-Prism-Session: my-session-123" \
  -d '{"model":"any","messages":[{"role":"user","content":"My name is Ravi"}]}'

# Second message — Prism includes history automatically
curl https://api.prism.ssimplifi.com/v1/chat/completions \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "X-Prism-Mode: balanced" \
  -H "X-Prism-Session: my-session-123" \
  -d '{"model":"any","messages":[{"role":"user","content":"What is my name?"}]}'
# Response: "Your name is Ravi."

Sessions expire after 30 minutes of inactivity. Each access refreshes the TTL.

Maximum 100 messages per session. At 90 messages, older messages are summarized to free space.

System messages persist automatically across the session without re-sending.

Use GET /v1/sessions/{id} to check session info and DELETE /v1/sessions/{id} to clear it.
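Because Prism stores history server-side, a client only ever needs to send the newest user message. A stdlib-only sketch of a session-scoped wrapper — `PrismSession` is our own name, not an official SDK:

```python
import json
import uuid
import urllib.request

BASE_URL = "https://api.prism.ssimplifi.com/v1"

class PrismSession:
    """Illustrative session-scoped client (not an official SDK).

    Prism keeps conversation history in Redis, so each call sends only
    the newest user message under a stable X-Prism-Session ID.
    """

    def __init__(self, api_key, mode="balanced"):
        self.session_id = f"session-{uuid.uuid4()}"
        self.headers = {
            "Authorization": f"Bearer {api_key}",
            "X-Prism-Mode": mode,
            "X-Prism-Session": self.session_id,
            "Content-Type": "application/json",
        }

    def ask(self, text):
        body = json.dumps({
            "model": "any",
            "messages": [{"role": "user", "content": text}],
        }).encode()
        req = urllib.request.Request(
            f"{BASE_URL}/chat/completions", data=body, headers=self.headers)
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)["choices"][0]["message"]["content"]
```

Each PrismSession instance is its own conversation: after `s.ask("My name is Ravi")`, a later `s.ask("What is my name?")` can draw on the earlier message's context.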

Model Pinning

Override Prism's routing by setting the X-Prism-Model-Prefer header to a specific model:

Model name     Provider    Actual model
claude-opus    Anthropic   claude-opus-4-6
claude-sonnet  Anthropic   claude-sonnet-4-6
claude-haiku   Anthropic   claude-haiku-4-5-20251001
gpt-4o         OpenAI      gpt-4o
gpt-4o-mini    OpenAI      gpt-4o-mini
gemini-pro     Google      gemini-2.5-pro
gemini-flash   Google      gemini-2.5-flash
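Pinning just means adding one header. A sketch of a header builder that validates the pin against the short names in the table above (`pinned_headers` is our own helper):

```python
# Short model names from the pinning table; pass one of these in the
# X-Prism-Model-Prefer header to bypass Prism's routing.
PINNABLE = {"claude-opus", "claude-sonnet", "claude-haiku",
            "gpt-4o", "gpt-4o-mini", "gemini-pro", "gemini-flash"}

def pinned_headers(api_key: str, model: str, mode: str = "balanced") -> dict:
    """Illustrative helper: request headers with a model pin."""
    if model not in PINNABLE:
        raise ValueError(f"unknown model pin: {model}")
    return {
        "Authorization": f"Bearer {api_key}",
        "X-Prism-Mode": mode,
        "X-Prism-Model-Prefer": model,
        "Content-Type": "application/json",
    }
```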

Free Tier

Free accounts get daily access with these limits:

Daily input tokens
50,000

Resets at midnight UTC.

Daily output tokens
10,000

Resets at midnight UTC.

Max tokens per request
4,000

max_tokens cannot exceed 4,000.

Modes
eco only

Balanced and sport modes require a paid account.

Streaming
disabled

Free tier does not support streaming.

Session messages
5 max

Each session limited to 5 messages on the free tier.

All Endpoints

POST   /v1/chat/completions - Send a chat completion request
GET    /v1/sessions/{id} - Get session info (message count, tokens, created_at)
DELETE /v1/sessions/{id} - Delete a session
GET    /v1/usage - List usage logs with filters (from, to, mode, limit, offset)
GET    /v1/usage/summary - Aggregated usage summary with cost comparison
GET    /v1/usage/export - Export usage logs as CSV (from, to params)
GET    /v1/balance - Get current balance, tier, and auto-topup status
POST   /v1/keys - Create a new API key
GET    /v1/keys - List active API keys
DELETE /v1/keys/{id} - Revoke an API key

All endpoints except sessions require API key authentication.
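The usage endpoints take their filters as query parameters. A sketch of a URL builder for /v1/usage — the helper name and parameter handling are ours; the parameter names (from, to, mode, limit, offset) come from the endpoint list above:

```python
from urllib.parse import urlencode

BASE_URL = "https://api.prism.ssimplifi.com/v1"

def usage_url(from_date, to_date, mode=None, limit=100, offset=0):
    """Build a /v1/usage query URL with the documented filter params.

    Illustrative: date format and the limit default are assumptions.
    """
    params = {"from": from_date, "to": to_date,
              "limit": limit, "offset": offset}
    if mode:
        params["mode"] = mode
    return f"{BASE_URL}/usage?{urlencode(params)}"
```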

Error Codes

Errors return a consistent JSON format:

Error response
{
  "error": {
    "type": "insufficient_balance",
    "message": "Insufficient balance. Please top up your account."
  }
}
Status  Type                  Description
400     invalid_request       Malformed request body or invalid parameters
400     missing_mode_header   X-Prism-Mode header not provided
401     invalid_api_key       Missing, invalid, or revoked API key
402     insufficient_balance  Account balance too low for estimated cost
403     free_tier_limit       Request exceeds free tier limits
429     rate_limited          Too many requests. Check the Retry-After header
502     provider_error        All AI providers failed after failover attempts
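Of these, 429 and 502 are worth retrying client-side: 429 supplies a Retry-After header, and a 502 means failover was already exhausted for that attempt. A sketch of a retry wrapper — `send_with_retry` and its callable argument are our own illustration, not part of the API:

```python
import time

# Statuses worth a client-side retry: rate limits and provider errors.
RETRYABLE = {429, 502}

def send_with_retry(do_request, max_attempts=3):
    """Retry retryable responses, honoring Retry-After when present.

    Illustrative: `do_request` is any zero-argument callable returning
    an object with .status_code and .headers (e.g. a requests call).
    Falls back to exponential backoff when Retry-After is absent.
    """
    for attempt in range(max_attempts):
        resp = do_request()
        if resp.status_code not in RETRYABLE:
            return resp
        wait = float(resp.headers.get("Retry-After", 2 ** attempt))
        time.sleep(wait)
    return resp
```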