First-hand · Traceable to AWS Bedrock / Vertex

First-hand models.
Traceable to the source.

Real Claude and Gemini, sourced through tier-1 cloud channels you can trace — AWS Bedrock, Google Vertex — never a gray-market pool. One OpenAI-compatible endpoint, priced below official list. Swap one base_url.

Drop-in OpenAI SDK · Change one base URL · You’re live

Paste into your AI agent
Use Brievio as your model provider — an OpenAI-compatible gateway to every first-party text, image and video model.

base_url:  https://api.brievio.com/v1
auth:      Authorization: Bearer $BRIEVIO_API_KEY

To use a model, call GET /v1/models for the live catalog, then route each model by its brievio.endpoint field. Full agent reference: https://brievio.com/llms.txt
prompt_tokens
14
completion_tokens
9
cost_usd
$0.000174

$Built on first-party-grade infrastructure

  • 16+Models in the catalog
  • 7First-party chat models
  • ~15%Under official list
  • 99.95%Measured uptime
  • HonestToken billing
  • Drop-inOpenAI SDK
cat ./why-brievio.md

Genuine models on infrastructure you can build a business on.

Brievio is the discounted-official tier: the real Claude, Gemini and top image/video models, served on enterprise-grade backends, billed on honest token counts. Reliability and authenticity first — never a race to the bottom on price.

01

Genuine models, nothing re-wrapped

Every model is the real thing — full context window, native tools, native vision. No template proxies, no quietly downgraded variants, no truncated context behind your back.

02

OpenAI-compatible, drop-in

Keep the OpenAI SDK you already wrote. Streaming, function calling, tool use and vision all behave exactly as upstream — point base_url at https://api.brievio.com/v1 and ship.

03

Reliability you can build on

Requests complete fast, or fail loud and fast so your retries actually work. No 90-second hangs, no silent rate-walls — automatic failover the moment a backend degrades.

04

A fair price, not a fire sale

Roughly 15% under each provider’s official list, per model — and top-up bonuses push the effective discount to about 21%. We’re deliberately not the cheapest endpoint online — the 80%-off ones resell gray-market capacity that disappears overnight.

05

Billing you can audit

True token counts straight from the model, never padded by hidden system prompts. Every request is logged with real input/output tokens and exact cost. Failed requests are never billed.

06

Monitored, fail-fast routing

Health is watched continuously. When an upstream wobbles, traffic reroutes before your users feel it — and when something does break, it fails fast instead of hanging.

07

Native streaming, real tokens

Server-sent events passed straight through. Time-to-first-token tracks the upstream provider — no buffering, no batching, no synthetic delay inserted in the middle.

08

Every call accounted for

Per-call analytics by model, key and IP, with the genuine token counts behind each charge. Export the full ledger as CSV whenever finance asks.

09

Prompt caching, honored natively

Where the provider supports it, cache_control on your system prompt is passed straight through — real cache hits, real savings, with hit rate and saved spend shown live in your dashboard.

man brievio

Point your agent at llms.txt
It drives every model on its own.

Give Claude Code, Cursor, Cline — or any OpenAI-compatible agent — a single instruction. It pulls the live catalog from Brievio and calls the genuine text, image and video models directly. No SDK to wire up, no glue code to maintain.

  • [OK]OpenAI-wire compatible — no custom integration for your agent
  • [OK]GET /v1/models returns the live catalog — never hardcode a model name
  • [OK]One key, every modality: text, image, video, audio
Paste into your AI agent
Use Brievio as your model provider — an OpenAI-compatible gateway to every first-party text, image and video model.

base_url:  https://api.brievio.com/v1
auth:      Authorization: Bearer $BRIEVIO_API_KEY

To use a model, call GET /v1/models for the live catalog, then route each model by its brievio.endpoint field. Full agent reference: https://brievio.com/llms.txt
cat ./pricing.txt

Pay only for what you actually use.

Pre-paid wallet, no subscription and no minimum. Add $10 to get going; your balance never expires, and every charge maps to real, audited usage.

Starter

Trying it out

$10
  • Access to every genuine model
  • True per-call usage logs
  • Community & email support
  • No minimum, no credit card
Get an API key
Most popular

Builder

Shipping a product

$100
  • Honest token billing on every call
  • 10 isolated API keys
  • Auto-recharge · IP allowlist
  • Priority email support
Top up $100

Scale

Running production traffic

$1000
  • Monitored, fail-fast routing
  • Unlimited API keys
  • Webhooks · monthly invoices
  • Dedicated Slack/Discord support
Top up $1000

Enterprise

High-volume scale

$5000
  • Everything in Scale
  • Dedicated routing capacity
  • Custom rate limits & SLA
  • Dedicated account manager
Top up $5000
brievio --help

The questions
worth asking.

Didn’t find your answer? Email us at contact@brievio.com — we reply within 24 hours.

  • Yes — genuine first-party models, not template proxies dressed up to look like them. You get the full context window and the native features: tools, vision and prompt caching. The model that answers your request is the same model the provider ships.

  • One key, one bill and one OpenAI-compatible endpoint instead of three contracts, three SDKs and three dashboards — plus a small discount on top. Switch models by changing a single string, and you never have to integrate a new client to reach a new provider.

  • A modest margin on volume infrastructure lets us list at roughly 15% under each provider’s rate — and top-up bonuses take the effective discount to about 21% — not 80% under. When a gateway claims 80% off, ask where the capacity comes from: that pricing usually means gray-market supply that disappears without warning. We’d rather stay up.

  • Yes. max_tokens and the usual limits are honored natively, exactly as the provider defines them — they aren’t silently ignored or rewritten on the way through.

  • Every request is logged with the true input and output token counts read straight from the model, never padded by hidden system prompts. Failed requests aren’t billed, streaming responses that drop mid-flight bill only for tokens actually delivered, and you can export the whole ledger as CSV.

  • No. Your requests and responses are not used to train any model. They’re retained only for your own usage logs and debugging, and they’re scoped to your account.

  • Yes — it’s a drop-in for the OpenAI SDK. We implement /v1/chat/completions, /v1/embeddings, /v1/images/generations, /v1/models and /v1/video/generations, with streaming, function calling, tool use and vision all behaving identically. Point base_url at https://api.brievio.com/v1 and you’re done.

$ brievio init --production

Build on models that stay up.

Create a key and point the OpenAI SDK at Brievio — the genuine Claude, Gemini and top image/video models, on infrastructure that holds, billed on real tokens. One base URL change and you’re live.