Frequently asked questions

Everything developers ask before they commit to building on Prism — answered directly, without marketing fluff.

FAQ list

01. What is Prism and how is it different from OpenRouter?

Prism is an OpenAI-compatible AI API proxy. You point your existing OpenAI SDK at Prism, send an X-Prism-Mode header (eco, balanced, or sport), and Prism classifies each query, picks the optimal model across Anthropic, OpenAI, and Google, calls that provider, and returns the response. Compared to OpenRouter, Prism is opinionated about routing by default: instead of forcing you to pick a model per request, Prism classifies your query and chooses the cheapest model that can handle it well based on the mode you set. You can still pin a specific model via X-Prism-Model-Prefer if you want — but the whole point of Prism is that you usually don't have to. Prism also bundles server-side session memory (Redis-backed) and automatic cross-provider failover as first-class features, not paid add-ons. It's designed for developers who want to stop manually juggling models and just get the right answer at the right cost.

02. How does intelligent routing work?

Every incoming request goes through a rule-based classifier that inspects the prompt — looking for code blocks, reasoning keywords, token length, and other signals — and labels it as simple, code, reasoning, or complex. The classifier is fast and deterministic; it doesn't call an AI model, so it adds almost no latency. That classification, combined with the mode header you set, maps to a specific model via a routing table. For example, in eco mode a simple query goes to Gemini Flash (the cheapest option), a code query goes to Claude Haiku, and a complex query still goes to Claude Sonnet — Prism maintains a quality floor so eco never returns garbage just to save money. In sport mode, reasoning and complex queries go to Claude Opus for maximum quality. Balanced sits in between. You can see which model was actually used on every response via the X-Prism-Model header, so there's no black box.
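The mode-plus-classification lookup can be pictured as a simple table. This is an illustrative sketch only: it includes just the routing cells named in this answer, and the keyword rules in the toy classifier are assumptions, not Prism's actual signals.

```python
# Routing cells taken from the FAQ text; all other (mode, type) pairs
# are omitted rather than guessed.
ROUTING_TABLE = {
    ("eco", "simple"): "gemini-flash",
    ("eco", "code"): "claude-haiku",
    ("eco", "complex"): "claude-sonnet",   # quality floor: eco still uses Sonnet here
    ("sport", "reasoning"): "claude-opus",
    ("sport", "complex"): "claude-opus",
}

def classify(prompt: str) -> str:
    """Toy rule-based classifier: deterministic, no model call.
    The specific rules here are hypothetical."""
    if "```" in prompt or "def " in prompt:
        return "code"
    if any(k in prompt.lower() for k in ("prove", "step by step", "derive")):
        return "reasoning"
    if len(prompt.split()) > 300:
        return "complex"
    return "simple"

def route(mode: str, prompt: str) -> str:
    return ROUTING_TABLE.get((mode, classify(prompt)), "default-model")
```

The actual model chosen is always reported back in the X-Prism-Model response header, so you can verify routing decisions against your expectations.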

03. What does session memory do and when should I use it?

Session memory lets you have multi-turn conversations without resending the entire message history on every call. Pass an X-Prism-Session header with any unique string (like a UUID or user ID), and Prism stores the message history server-side in Redis with a 30-minute TTL that refreshes on every access. On subsequent calls with the same session ID, Prism automatically prepends the stored history to your new message before calling the provider, so you just send the latest user turn and get a contextually aware response. Use it when you're building chat interfaces, AI agents with memory between steps, or anything where the conversation flow matters. Don't use it for one-shot classification or single-turn completions — it just adds overhead. Sessions are capped at 100 messages (5 on free tier), with automatic summarization at 90 to prevent context bloat. System messages persist across the session without re-sending.
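The store behavior described above (TTL refresh on access, message cap, history returned for prepending) can be sketched as a toy in-memory version. This is not Prism's implementation — the real store is Redis-backed and the summarization-at-90 step is stubbed out here entirely.

```python
import time

TTL_SECONDS = 30 * 60   # 30-minute TTL, per the FAQ
MAX_MESSAGES = 100      # paid-tier cap (5 on free tier)

class SessionStore:
    """Toy stand-in for the Redis-backed session store."""
    def __init__(self):
        self._data = {}  # session_id -> (expires_at, messages)

    def append(self, session_id, message, now=None):
        now = now if now is not None else time.time()
        expires_at, messages = self._data.get(session_id, (0, []))
        if now > expires_at:                      # TTL lapsed: start fresh
            messages = []
        messages = (messages + [message])[-MAX_MESSAGES:]
        # Every access refreshes the TTL, as described above.
        self._data[session_id] = (now + TTL_SECONDS, messages)
        return messages  # this history is prepended to the provider call
```

Client-side, all of this reduces to sending the same X-Prism-Session header value on every call and only the newest user turn in the body.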

04. Is Prism really OpenAI-compatible? What about edge cases?

Yes — Prism implements the OpenAI /v1/chat/completions schema, including streaming via SSE, the messages array (system, user, assistant, tool roles), max_tokens, temperature, and standard response fields like choices, usage, and finish_reason. You can drop it into any existing OpenAI SDK by changing the base_url to https://api.prism.ssimplifi.com/v1 and using a Prism API key. There are two edge cases to know about. First, the `model` field in your request body is ignored by default — Prism picks the model based on mode and classification. If you want to force a specific model, use the X-Prism-Model-Prefer header instead. Second, provider-specific features that don't have a cross-provider equivalent (like OpenAI's reproducible outputs via seed, or Anthropic's cache_control) aren't exposed through the proxy yet. If you need those, pin the provider with X-Prism-Model-Prefer and Prism will pass through most parameters unchanged.
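The two edge cases look like this in practice: send an OpenAI-shaped body (whose model field is ignored) and pin via header when needed. A minimal sketch using only the standard library — the model string passed to X-Prism-Model-Prefer is an assumed example, not a documented identifier.

```python
import json
import urllib.request

# Build (but don't send) a pinned request against the Prism endpoint.
req = urllib.request.Request(
    "https://api.prism.ssimplifi.com/v1/chat/completions",
    data=json.dumps({
        "model": "any",  # ignored by default -- mode + classification decide
        "messages": [{"role": "user", "content": "Refactor this function"}],
    }).encode(),
    headers={
        "Authorization": "Bearer prism_sk_your_key",
        "X-Prism-Mode": "balanced",
        "X-Prism-Model-Prefer": "claude-sonnet",  # pin; assumed model id
        "Content-Type": "application/json",
    },
)
# urllib.request.urlopen(req) would actually send it; omitted here.
```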

05. How does pricing actually work?

Prism is pay-as-you-go with no subscription. Every paid request is billed at the underlying provider's list price plus a mode-based markup: 15% for eco, 20% for balanced, 30% for sport. That markup covers classification, routing, session memory, failover, the dashboard, and all infra. You prepay by adding balance (minimum $5 top-up via Razorpay) and each call deducts from your balance. The dashboard shows the exact cost, token counts, and model used for every call, plus aggregate stats and CSV export. There's a free tier with 50K input / 10K output tokens per day on eco mode — no credit card required. Most developers save 30-50% on their total AI spend even after Prism's markup, because Prism routes cheap-but-capable models for queries that don't need top-tier reasoning. See the /pricing page for the full breakdown.
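The billing arithmetic is just list price times one plus the mode markup; a quick sketch of the numbers from this answer:

```python
# Mode markups as stated in the FAQ.
MARKUP = {"eco": 0.15, "balanced": 0.20, "sport": 0.30}

def billed_cost(provider_list_price_usd: float, mode: str) -> float:
    """Provider list price plus Prism's mode-based markup, in USD."""
    return provider_list_price_usd * (1 + MARKUP[mode])
```

So a call that would cost $1.00 at the provider's list price bills at $1.15 in eco mode and $1.30 in sport; the per-call X-Prism-Cost header and the dashboard show the exact figure.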

06. What happens when one provider goes down?

Prism has automatic cross-provider failover built in. If your request goes to, say, Claude Sonnet and Anthropic returns a 5xx error or times out, Prism immediately retries on an equivalent-or-higher-capability model from a different provider — for example, GPT-4o. You get a successful response with no code changes on your side, and the response includes X-Prism-Failover: true so you can see that a failover happened. The failover logic respects your mode's quality floor: it never downgrades to a weaker model class just to succeed. If every provider in the fallback chain fails (which is rare), you get a 502 provider_error response. Failover kicks in on network errors, rate limits from upstream, and 5xx responses — not on 4xx errors from the provider, since those usually indicate a problem with your request that won't fix itself by trying another provider.
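The retry-vs-surface logic above can be sketched as a small loop. This is a toy model: providers are stand-in callables, network errors and timeouts are collapsed into status codes, and the fallback chain itself is hypothetical.

```python
# Upstream conditions that trigger failover, per the FAQ: rate limits
# and 5xx responses (network errors are folded into 5xx here).
RETRYABLE = {429, 500, 502, 503, 504}

def call_with_failover(chain, request):
    """Try each provider in an equal-or-higher-capability chain."""
    for provider in chain:
        status, body = provider(request)
        if status == 200:
            return {"status": 200, "body": body,
                    "failover": provider is not chain[0]}  # -> X-Prism-Failover
        if status in RETRYABLE:
            continue                      # try the next provider
        return {"status": status, "body": body}  # 4xx: won't fix itself, surface it
    return {"status": 502, "body": {"error": "provider_error"}}  # whole chain failed
```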

07. Can I use Prism with my existing code?

Yes — that's the whole point. If you're already using the OpenAI Python SDK, JS SDK, or any HTTP client that speaks the OpenAI chat completions format, switching to Prism takes about 30 seconds. Change base_url from https://api.openai.com/v1 to https://api.prism.ssimplifi.com/v1, swap your OpenAI API key for a Prism API key (starts with prism_sk_), and add an X-Prism-Mode header (eco, balanced, or sport) to each request. That's it. Your request bodies, response parsing, streaming code, error handling — everything else stays exactly the same. Prism also works with LangChain, LlamaIndex, Vercel AI SDK, and any other framework that lets you configure a custom OpenAI-compatible endpoint. Check /docs for copy-paste snippets in curl, Python, and JavaScript.
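The full switch, sketched with the OpenAI Python SDK — the key and header values below are placeholders:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.prism.ssimplifi.com/v1",  # was https://api.openai.com/v1
    api_key="prism_sk_your_key",                    # Prism key, not an OpenAI key
    default_headers={"X-Prism-Mode": "balanced"},   # eco | balanced | sport
)

# Everything else stays exactly the same, e.g.:
# response = client.chat.completions.create(
#     model="any",  # ignored -- Prism routes by mode + classification
#     messages=[{"role": "user", "content": "Hello"}],
# )
```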

08. How is my data handled? Do you store my prompts?

No. Prism never logs, stores, or inspects the content of your prompts or the AI responses — not for training, not for analytics, not for debugging. Your messages pass through Prism to the AI provider and back to you. What we do store is metadata: timestamps, token counts (input and output), the model that handled the request, which provider answered, the cost, and the latency. That metadata powers the usage dashboard and billing. If you use session memory, the conversation history is stored in Redis with a 30-minute TTL and is automatically deleted after that. You can delete a session at any time via the API. API keys are stored only as SHA-256 hashes — we never keep the raw key after creation. Payment processing goes through Razorpay; we never see or store card numbers. See /privacy for the full policy.
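The store-only-a-hash pattern for API keys works like this sketch — raw key in, digest stored, digests compared at auth time. (A production system would use a constant-time comparison; this is the shape of the idea, not Prism's code.)

```python
import hashlib

def hash_key(raw_key: str) -> str:
    """SHA-256 digest of the raw key; only this is ever persisted."""
    return hashlib.sha256(raw_key.encode()).hexdigest()

def verify(presented_key: str, stored_hash: str) -> bool:
    """Auth check: hash the presented key and compare digests."""
    return hash_key(presented_key) == stored_hash
```

This is why the raw key is shown exactly once at creation: after that, only the hash exists, and a lost key can't be recovered, only rotated.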

09. What's the free tier limit and what happens when I exceed it?

Free tier gives you 50,000 input tokens and 10,000 output tokens per day on eco mode, resetting at midnight UTC. It also includes one API key, sessions capped at 5 messages, max_tokens capped at 4,000 per request, and no streaming. When you exceed any of these limits, Prism returns a 403 with a free_tier_limit error type and a message explaining which limit you hit. To upgrade, add $5 or more to your account balance in the dashboard — the first payment automatically upgrades your account to paid. Paid accounts unlock all three modes (eco, balanced, sport), streaming, unlimited API keys, sessions up to 100 messages, no daily token limits, and the full dashboard features. Your unused free-tier balance doesn't carry over (there's nothing to carry), but you keep your API key and all your settings.
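The limit check amounts to a gate like the sketch below. The caps are the ones stated above; the usage-dict field names and the exact error message wording are assumptions, though the 403 status and free_tier_limit type come from this answer.

```python
FREE_LIMITS = {
    "input_tokens_per_day": 50_000,
    "output_tokens_per_day": 10_000,
    "max_tokens_per_request": 4_000,
    "session_messages": 5,
}

def check_free_tier(usage):
    """Return a 403-shaped error for the first exceeded limit, else None."""
    for limit, cap in FREE_LIMITS.items():
        if usage.get(limit, 0) > cap:
            return {
                "status": 403,
                "error": {"type": "free_tier_limit",
                          "message": f"{limit} limit of {cap} exceeded"},
            }
    return None  # request allowed
```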

10. How do I get started? (with curl example)

Sign up at prism.ssimplifi.com/signup, copy the API key that's shown to you once (it starts with prism_sk_ and is only displayed during creation), and make your first call. Here's a minimal curl example:

```shell
curl https://api.prism.ssimplifi.com/v1/chat/completions \
  -H "Authorization: Bearer prism_sk_your_key" \
  -H "X-Prism-Mode: eco" \
  -H "Content-Type: application/json" \
  -d '{"model":"any","messages":[{"role":"user","content":"What is 2+2?"}]}'
```

The response comes back in standard OpenAI format, and the HTTP headers include X-Prism-Model (the actual model used), X-Prism-Cost (USD cost of this call), X-Prism-Tokens-In and X-Prism-Tokens-Out (token counts), and X-Prism-Task-Type (classification result). From there, swap curl for your OpenAI SDK of choice, pointing base_url at https://api.prism.ssimplifi.com/v1. See /docs for full reference.

Didn't find your answer?

The full API reference covers every endpoint and header. Or email hello@ssimplifi.com and you'll hear back directly from the founder.