Is this the real Claude / Gemini?

Yes — genuine first-party models, not template proxies dressed up to look like them. You get the full context window and the native features: tools, vision and prompt caching. The model that answers your request is the same model the provider ships.

Why not go direct to each provider?

One key, one bill and one OpenAI-compatible endpoint instead of three contracts, three SDKs and three dashboards — plus a small discount on top. Switch models by changing a single string, and you never have to integrate a new client to reach a new provider.

How are you cheaper than official, and why not cheaper still?

A modest margin on volume infrastructure lets us list at roughly 15% under each provider’s rate — and top-up bonuses take the effective discount to about 21% — not 80% under. When a gateway claims 80% off, ask where the capacity comes from: that pricing usually means gray-market supply that disappears without warning. We’d rather stay up.

Can I cap cost per request?

Yes. max_tokens and the usual limits are honored natively, exactly as the provider defines them — they aren’t silently ignored or rewritten on the way through.

How do I know my bill is accurate?

Every request is logged with the true input and output token counts read straight from the model, never padded by hidden system prompts. Failed requests aren’t billed, streaming responses that drop mid-flight bill only for tokens actually delivered, and you can export the whole ledger as CSV.

Do you train on my data?

No. Your requests and responses are not used to train any model. They’re retained only for your own usage logs and debugging, and they’re scoped to your account.

Is it really OpenAI-compatible?

Yes — it’s a drop-in for the OpenAI SDK. We implement /v1/chat/completions, /v1/embeddings, /v1/images/generations, /v1/models and /v1/video/generations, with streaming, function calling, tool use and vision all behaving identically. Point base_url at https://api.brievio.com/v1 and you’re done.

First-hand · Traceable to AWS Bedrock / Vertex

First-hand models.
Traceable to the source.

Real Claude and Gemini, sourced through tier-1 cloud channels you can trace — AWS Bedrock, Google Vertex — never a gray-market pool. One OpenAI-compatible endpoint, priced below official list. Swap one base_url.

Get an API key Read the docs

Drop-in OpenAI SDK · Change one base URL · You’re live

Paste into your AI agent

Use Brievio as your model provider — an OpenAI-compatible gateway to every first-party text, image and video model.

base_url:  https://api.brievio.com/v1
auth:      Authorization: Bearer $BRIEVIO_API_KEY

To use a model, call GET /v1/models for the live catalog, then route each model by its brievio.endpoint field. Full agent reference: https://brievio.com/llms.txt

prompt_tokens

14

completion_tokens

9

cost_usd

$0.000174

cat ./why-brievio.md

Genuine models on infrastructure you can build a business on.

Brievio is the discounted-official tier: the real Claude, Gemini and top image/video models, served on enterprise-grade backends, billed on honest token counts. Reliability and authenticity first — never a race to the bottom on price.

01

Genuine models, nothing re-wrapped

Every model is the real thing — full context window, native tools, native vision. No template proxies, no quietly downgraded variants, no truncated context behind your back.

02

OpenAI-compatible, drop-in

Keep the OpenAI SDK you already wrote. Streaming, function calling, tool use and vision all behave exactly as upstream — point base_url at https://api.brievio.com/v1 and ship.

03

Reliability you can build on

Requests complete fast, or fail loud and fast so your retries actually work. No 90-second hangs, no silent rate-walls — automatic failover the moment a backend degrades.

04

A fair price, not a fire sale

Roughly 15% under each provider’s official list, per model — and top-up bonuses push the effective discount to about 21%. We’re deliberately not the cheapest endpoint online — the 80%-off ones resell gray-market capacity that disappears overnight.

05

Billing you can audit

True token counts straight from the model, never padded by hidden system prompts. Every request is logged with real input/output tokens and exact cost. Failed requests are never billed.

06

Monitored, fail-fast routing

Health is watched continuously. When an upstream wobbles, traffic reroutes before your users feel it — and when something does break, it fails fast instead of hanging.

07

Native streaming, real tokens

Server-sent events passed straight through. Time-to-first-token tracks the upstream provider — no buffering, no batching, no synthetic delay inserted in the middle.

08

Every call accounted for

Per-call analytics by model, key and IP, with the genuine token counts behind each charge. Export the full ledger as CSV whenever finance asks.

09

Prompt caching, honored natively

Where the provider supports it, cache_control on your system prompt is passed straight through — real cache hits, real savings, with hit rate and saved spend shown live in your dashboard.

−90%

ls ./use-cases

What to build with Brievio.

Browse all use cases

ls ./models --provider

The genuine models — full context, native features.

Browse all models

Anthropic

Claude Opus 4.7

new

Anthropic's newest Opus — flagship reasoning, vision, 200K context.

visionfunctionstreamingthinking

$4.25/$21.25

per 1M tokens

Anthropic

Claude Opus 4.6

Anthropic Opus 4.6 — deep reasoning, exceptional agentic ability.

visionfunctionstreamingthinking

$4.25/$21.25

per 1M tokens

Anthropic

Claude Sonnet 4.6

hot

Balanced speed/quality — the everyday production workhorse, elite coding.

visionfunctionstreamingthinking

$2.55/$12.75

per 1M tokens

Anthropic

Claude Sonnet 4.5

Anthropic Sonnet 4.5 — production workhorse.

visionfunctionstreamingthinking

$2.55/$12.75

per 1M tokens

Anthropic

Claude Haiku 4.5

Anthropic Haiku 4.5 — fast and cost-efficient.

visionfunctionstreaming

$0.85/$4.25

per 1M tokens

Google

Gemini 2.5 Pro

Previous-gen Gemini Pro — strong reasoning and vision.

visionfunctionstreamingthinking

$1.0625/$8.50

per 1M tokens

Google

Gemini 2.5 Flash

Previous-gen Gemini Flash — extreme value.

visionfunctionstreaming

$0.255/$2.125

per 1M tokens

man brievio

Point your agent at `llms.txt`
It drives every model on its own.

Give Claude Code, Cursor, Cline — or any OpenAI-compatible agent — a single instruction. It pulls the live catalog from Brievio and calls the genuine text, image and video models directly. No SDK to wire up, no glue code to maintain.

[OK]OpenAI-wire compatible — no custom integration for your agent
[OK]GET /v1/models returns the live catalog — never hardcode a model name
[OK]One key, every modality: text, image, video, audio

Paste into your AI agent

Use Brievio as your model provider — an OpenAI-compatible gateway to every first-party text, image and video model.

base_url:  https://api.brievio.com/v1
auth:      Authorization: Bearer $BRIEVIO_API_KEY

To use a model, call GET /v1/models for the live catalog, then route each model by its brievio.endpoint field. Full agent reference: https://brievio.com/llms.txt

cat ./pricing.txt

Pay only for what you actually use.

Pre-paid wallet, no subscription and no minimum. Add $10 to get going; your balance never expires, and every charge maps to real, audited usage.

Starter

Trying it out

$10

Access to every genuine model
True per-call usage logs
Community & email support
No minimum, no credit card

Get an API key

Builder

Shipping a product

$100

Honest token billing on every call
10 isolated API keys
Auto-recharge · IP allowlist
Priority email support

Top up $100

Scale

Running production traffic

$1000

Monitored, fail-fast routing
Unlimited API keys
Webhooks · monthly invoices
Dedicated Slack/Discord support

Top up $1000

Enterprise

High-volume scale

$5000

Everything in Scale
Dedicated routing capacity
Custom rate limits & SLA
Dedicated account manager

Top up $5000

See the full pricing table

ls ./blog

Recent deep dives.

All posts

brievio --help

The questions
worth asking.

Didn’t find your answer? Email us at contact@brievio.com — we reply within 24 hours.

Yes — genuine first-party models, not template proxies dressed up to look like them. You get the full context window and the native features: tools, vision and prompt caching. The model that answers your request is the same model the provider ships.
One key, one bill and one OpenAI-compatible endpoint instead of three contracts, three SDKs and three dashboards — plus a small discount on top. Switch models by changing a single string, and you never have to integrate a new client to reach a new provider.
A modest margin on volume infrastructure lets us list at roughly 15% under each provider’s rate — and top-up bonuses take the effective discount to about 21% — not 80% under. When a gateway claims 80% off, ask where the capacity comes from: that pricing usually means gray-market supply that disappears without warning. We’d rather stay up.
Yes. max_tokens and the usual limits are honored natively, exactly as the provider defines them — they aren’t silently ignored or rewritten on the way through.
Every request is logged with the true input and output token counts read straight from the model, never padded by hidden system prompts. Failed requests aren’t billed, streaming responses that drop mid-flight bill only for tokens actually delivered, and you can export the whole ledger as CSV.
No. Your requests and responses are not used to train any model. They’re retained only for your own usage logs and debugging, and they’re scoped to your account.
Yes — it’s a drop-in for the OpenAI SDK. We implement /v1/chat/completions, /v1/embeddings, /v1/images/generations, /v1/models and /v1/video/generations, with streaming, function calling, tool use and vision all behaving identically. Point base_url at https://api.brievio.com/v1 and you’re done.

$ brievio init --production

Build on models that stay up.

Create a key and point the OpenAI SDK at Brievio — the genuine Claude, Gemini and top image/video models, on infrastructure that holds, billed on real tokens. One base URL change and you’re live.

Get your API key Read the docs

First-hand models.
Traceable to the source.

Genuine models on infrastructure you can build a business on.

Genuine models, nothing re-wrapped

OpenAI-compatible, drop-in

Reliability you can build on

A fair price, not a fire sale

Billing you can audit

Monitored, fail-fast routing

Native streaming, real tokens

Every call accounted for

Prompt caching, honored natively

What to build with Brievio.

AI customer support

RAG chatbot API

AI content moderation

AI code assistant

AI data extraction

The genuine models — full context, native features.

Claude Opus 4.7

Claude Opus 4.6

Claude Sonnet 4.6

Claude Sonnet 4.5

Claude Haiku 4.5

Gemini 2.5 Pro

Gemini 2.5 Flash

Point your agent at `llms.txt`
It drives every model on its own.

Pay only for what you actually use.

Starter

Builder

Scale

Enterprise

Recent deep dives.

Token inflation — how some AI gateways bill you 5–25×, and a 20-line test to catch it

在中国稳定调用 Claude / GPT / Gemini — Brievio 中国友好路由实测

用 Veo 3 與 Sora 為台灣品牌做短影片廣告 — 5 分鐘完整教學

The questions
worth asking.

Build on models that stay up.

First-hand models.Traceable to the source.

Genuine models on infrastructure you can build a business on.

Genuine models, nothing re-wrapped

OpenAI-compatible, drop-in

Reliability you can build on

A fair price, not a fire sale

Billing you can audit

Monitored, fail-fast routing

Native streaming, real tokens

Every call accounted for

Prompt caching, honored natively

What to build with Brievio.

AI customer support

RAG chatbot API

AI content moderation

AI code assistant

AI data extraction

The genuine models — full context, native features.

Claude Opus 4.7

Claude Opus 4.6

Claude Sonnet 4.6

Claude Sonnet 4.5

Claude Haiku 4.5

Gemini 2.5 Pro

Gemini 2.5 Flash

Point your agent at llms.txtIt drives every model on its own.

Pay only for what you actually use.

Starter

Builder

Scale

Enterprise

Recent deep dives.

Token inflation — how some AI gateways bill you 5–25×, and a 20-line test to catch it

在中国稳定调用 Claude / GPT / Gemini — Brievio 中国友好路由实测

用 Veo 3 與 Sora 為台灣品牌做短影片廣告 — 5 分鐘完整教學

The questionsworth asking.

Build on models that stay up.

First-hand models.
Traceable to the source.

Point your agent at `llms.txt`
It drives every model on its own.

The questions
worth asking.