Question 1

What is an AI API proxy?

Accepted Answer

An AI API proxy sits between your app and AI providers like Anthropic, OpenAI, and Google. Instead of calling each provider directly, you call the proxy, which classifies the query, picks the optimal model, manages session memory, and falls back to another provider if one is down. Prism is OpenAI-compatible, so it works as a drop-in replacement.

Question 2

How much does Prism cost?

Accepted Answer

Prism is free to start — 50,000 input tokens per day with no credit card. Paid usage is pay-as-you-go: provider list price plus a small markup of 15% (Eco), 20% (Balanced), or 30% (Sport). No monthly fee, no per-seat charge. Minimum top-up is $5.

Question 3

How do I integrate Prism into my app?

Accepted Answer

Point your existing OpenAI SDK at https://api.prism.ssimplifi.com/v1 and use a Prism API key (prism_sk_...). Add an X-Prism-Mode header (eco, balanced, or sport). That's the entire integration — no other code changes.

Question 4

Which AI models does Prism support?

Accepted Answer

Prism routes across Anthropic Claude (Opus, Sonnet, Haiku), OpenAI (GPT-4o, GPT-4o-mini), and Google Gemini (2.5 Pro, 2.5 Flash). The proxy picks the best model per query based on the quality mode you choose.

Question 5

Does Prism handle conversation memory?

Accepted Answer

Yes. Pass an X-Prism-Session header with any session ID and Prism stores the conversation history server-side in Redis. You don't need to resend the entire message history on each call — just send the new user message and the same session ID.

Question 6

What happens when an AI provider is down?

Accepted Answer

Prism automatically retries on a fallback provider of equivalent or higher capability and sets X-Prism-Failover: true on the response. Your request still succeeds without any code changes on your side.

Question 7

Does Prism cache AI responses?

Accepted Answer

Yes. Prism stacks three caching layers: exact match (byte-identical requests served from Redis in sub-10ms), semantic match (cosine-similarity search over embeddings, default threshold 0.95), and provider-native cache passthrough (Anthropic prompt caching, OpenAI cached input tokens). Stacked together they typically cut total AI spend in half on top of routing savings. Caching is on by default; per-key TTL, scope, and similarity threshold are configurable from the dashboard.

Stop overpaying for AI.
Start routing smarter.

What is Prism?

Every query, the optimal model

Pay for the AI you've already paid for. Once.

Exact

Semantic

Provider-native

Your AI calls now have memory.

See exactly what you save.

Pay for what you use. Nothing else.

Eco

Balanced

Sport

Calculate your savings

Your workload

Estimated savings

Stop overpaying for AI.Start routing smarter.