Stop overpaying for AI.
Start routing smarter.

Prism routes simple queries to Gemini Flash or Claude Haiku, and complex queries to Claude Sonnet or Opus. Classification is rule-based and runs per request. Your code doesn't change — just swap the base URL.

client.py
-base_url = "https://api.openai.com/v1"
+base_url = "https://api.prism.ssimplifi.com/v1"

What is Prism?

Prism is an OpenAI-compatible HTTP API proxy at api.prism.ssimplifi.com/v1. It classifies each request as simple, code, reasoning, or complex, then routes it to the cheapest model capable of handling it — across Anthropic Claude (Opus, Sonnet, Haiku), OpenAI (GPT-4o, GPT-4o-mini), and Google Gemini (2.5 Pro, 2.5 Flash). Integration is a one-line URL change; the request and response formats are identical to the OpenAI Chat Completions API. Three-layer response caching, session memory, and automatic provider failover are built in.

Providers
3
Models
7
Free tier
50K tokens/day
Markup
15–30%

Every query, the optimal model

Prism classifies your query and routes it to the cheapest model that can handle it well.

Summarize this paragraph

simple

Debug this Python function

code

Analyse quarterly revenue trends

reasoning

Translate to Hindi

simple

Explain how TCP handshake works

complex

Fast

Gemini Flash, Haiku

$0.05-0.12

Mid

Sonnet, GPT-4o

$0.70-0.80

Premium

Opus

$2.50

Quality floor: complex tasks always get capable models, even in Eco mode.

Direct (single model)

$0.00

Through Prism

$0.00

Pay for the AI you've already paid for. Once.

Most production AI traffic is repeat traffic. Prism stacks three caching layers and skips the model when the answer already exists.

Exact

Byte-identical request → previous response. Sub-10ms, zero model cost. Catches the 5–15% of traffic that repeats verbatim.

Semantic

Same meaning, different words. Cosine-similarity match against your prior responses. Catches the 30–60% of near-duplicates that exact misses.

Provider-native

Anthropic prompt caching + OpenAI cached input. 60–90% off the input tokens of stable system prompts, even on cache misses.

Stacked, these layers typically cut total AI spend in half on top of routing savings. Read the math →

Your AI calls now have memory.

Add one header. Prism remembers the conversation. No database. No history management.

What you send

API call 1
userMy name is Ravi

What Prism sends to provider

1 message
userMy name is Ravi

You sent 1 message. Prism sent 1 message.

3 API calls. You sent 3 messages. Prism handled 9 messages of history behind the scenes. No conversation database. No history management. One header.

See exactly what you save.

prism.ssimplifi.com/dashboard

Balance

$47.60

Saved this month

41%

Calls today

847

Active sessions

23

Mode distribution

Eco
72%
Balanced
24%
Sport
4%

Every call logged. Every model choice transparent. Export to CSV anytime.

Pay for what you use. Nothing else.

15%markup

Eco

Maximum savings. Routes aggressively to fast models while keeping a quality floor.

20%markup

Balanced

Best of both worlds. Smart routing optimizes cost without compromising quality.

30%markup

Sport

Best model for every task. Quality first, cost second.

Free tier: 50K tokens/day, eco mode, no credit card required.

Most developers save 30–50% on their AI spend after Prism markup. Three-layer response caching typically cuts the rest in half on top.

Calculate your savings

Enter your real workload. Defaults reflect a typical customer-support bot.

Your workload

System prompt + retrieved context + user message.

Length of the model's reply.

List price: $3.00 input / $15.00 output per 1M tokens

Quality mode

Estimated savings

$204.58/month saved

45% off your direct Claude Sonnet 4 cost.

Direct Claude Sonnet 4 cost$450.00
Saved by exact + semantic cache− $135.36
Saved by provider-native cache− $110.12
Prism markup (balanced, +20%)+ $40.90
Net Prism cost$245.42
Get API key — free

Estimate based on a 30% combined cache hit rate (8% exact + 22% semantic). Real numbers depend on your traffic mix.