qzira AI API Gateway
Technical Reference
Manage and unify multiple AI models with a single API key
qzira is an AI API gateway that lets you manage multiple AI providers — OpenAI, Anthropic, Google AI, and DeepSeek — through a single unified endpoint.
BYOK (Bring Your Own Key) — use your existing API keys as-is. qzira handles request proxying, usage visibility, failover, and rate limiting automatically.
Why use qzira?
- Cost visibility: Monitor request counts and token consumption in real time via the dashboard
- Failover: Automatically switch to another provider when one goes down
- Auto retry: qzira handles 429 retries on your behalf
- Unified key management: Store all provider keys in qzira; your app only needs one gw_ key
- Budget alerts & auto-stop: Prevent runaway AI agent costs
- Usage export: Download logs as CSV for reporting or expense tracking
Architecture overview
qzira sits as a proxy between your application and AI providers. One gw_ key grants access to all providers.
This documentation is provided on an as-is basis. AI services evolve rapidly, and each provider's API specifications, pricing, and limits may change without notice. The content reflects information at the time of writing and does not permanently guarantee future behavior. Always check each provider's official documentation for the latest information.
1. Account Setup
1. Access the dashboard
Go to https://app.qzira.com.
2. Sign in with Google
Click "Sign in with Google" and authenticate with your Google account.
2. Create API Key
After logging in, click "API Keys" in the sidebar.
1. Create a new API key
Click "Create new API key" and give it a name (e.g., my-app-key, cursor-dev).
2. Copy your API key
The gw_xxxxxxxx key shown immediately after creation must be copied and stored securely. It will only be displayed once.
3. Register Provider API Keys (BYOK)
Click "Providers" in the sidebar.
Supported Providers
| Provider | Where to get your key | Key format |
|---|---|---|
| OpenAI | platform.openai.com/api-keys | sk-... |
| Anthropic | console.anthropic.com/settings/keys | sk-ant-... |
| Google AI | aistudio.google.com/apikey | AIza... |
| DeepSeek | platform.deepseek.com/api_keys | sk-... |
Registration steps
- Click "Register API key" for the provider you want to use
- Enter your API key
- Click "Register" — qzira will automatically validate the key
4. Enable Providers
Registering a provider key alone does not enable routing through that provider. You also need to enable it.
How to enable
In the Providers screen, click the "Enable" button for the provider you want to use.
Simultaneous active providers by plan
| Plan | Max active providers |
|---|---|
| Free | 1 |
| Starter | 3 |
| Pro and above | Unlimited |
5. Code Migration
qzira provides an OpenAI-compatible API endpoint. Migration requires only two changes: Base URL and API key.
Endpoint info
| Item | Value |
|---|---|
| Base URL | https://api.qzira.com/v1 |
| Endpoint (OpenAI-compatible) | /chat/completions |
| Endpoint (Anthropic-compatible) | /v1/messages |
| Authentication | Authorization: Bearer gw_xxxxxxxx or x-api-key: gw_xxxxxxxx |
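Both authentication headers are accepted interchangeably. A minimal stdlib sketch showing the two forms (the build_request helper is illustrative, not part of any SDK; gw_xxxxxxxx is a placeholder):

```python
import json
import urllib.request

QZIRA_URL = "https://api.qzira.com/v1/chat/completions"

def build_request(gw_key: str, use_x_api_key: bool = False) -> urllib.request.Request:
    """Build a chat-completion request using either accepted auth header."""
    headers = {"Content-Type": "application/json"}
    if use_x_api_key:
        headers["x-api-key"] = gw_key                  # alternative header form
    else:
        headers["Authorization"] = f"Bearer {gw_key}"  # standard Bearer form
    body = json.dumps({
        "model": "gpt-4o-mini",
        "messages": [{"role": "user", "content": "Hello"}],
    }).encode()
    return urllib.request.Request(QZIRA_URL, data=body, headers=headers, method="POST")

if __name__ == "__main__":
    req = build_request("gw_xxxxxxxx")  # replace with your real key
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp)["choices"][0]["message"]["content"])
```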
Migration example: Python (OpenAI SDK)
Before (direct call):
from openai import OpenAI
client = OpenAI(
api_key="sk-xxxxxxxx" # OpenAI API key
)
response = client.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": "Hello"}]
)
After (via qzira):
from openai import OpenAI
client = OpenAI(
api_key="gw_xxxxxxxx", # qzira API key
base_url="https://api.qzira.com/v1" # qzira endpoint
)
response = client.chat.completions.create(
model="gpt-4o", # model name unchanged
messages=[{"role": "user", "content": "Hello"}]
)
Migration example: Python (Anthropic SDK → OpenAI-compatible)
Before (direct Anthropic SDK call):
import anthropic
client = anthropic.Anthropic(api_key="sk-ant-xxxxxxxx")
response = client.messages.create(
model="claude-sonnet-4-20250514",
max_tokens=1024,
system="You are a helpful assistant.",
messages=[{"role": "user", "content": "Hello"}]
)
After (via qzira — OpenAI-compatible format):
from openai import OpenAI
client = OpenAI(
api_key="gw_xxxxxxxx",
base_url="https://api.qzira.com/v1"
)
response = client.chat.completions.create(
model="claude-sonnet-4-20250514", # specify Claude model directly
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hello"}
]
)
system role messages are handled automatically; when routing to Claude models, qzira converts them to Anthropic's system parameter for you.
Migration example: Python (Google Gemini → OpenAI-compatible)
from openai import OpenAI
client = OpenAI(
api_key="gw_xxxxxxxx",
base_url="https://api.qzira.com/v1"
)
response = client.chat.completions.create(
model="gemini-2.0-flash", # Gemini model name unchanged
messages=[{"role": "user", "content": "Hello"}]
)
Migration example: JavaScript / TypeScript
import OpenAI from "openai";
const client = new OpenAI({
apiKey: "gw_xxxxxxxx",
baseURL: "https://api.qzira.com/v1",
});
const response = await client.chat.completions.create({
model: "claude-sonnet-4-20250514",
messages: [{ role: "user", content: "Hello" }],
});
Managing keys with environment variables (recommended)
.env file:
QZIRA_API_KEY=gw_xxxxxxxx
QZIRA_BASE_URL=https://api.qzira.com/v1
Python:
import os
from openai import OpenAI
client = OpenAI(
api_key=os.getenv("QZIRA_API_KEY"),
base_url=os.getenv("QZIRA_BASE_URL")
)
Streaming support
qzira supports SSE (Server-Sent Events) streaming for all providers.
stream = client.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": "Tell me a story"}],
stream=True
)
for chunk in stream:
if chunk.choices[0].delta.content:
print(chunk.choices[0].delta.content, end="")
Migration example: Anthropic SDK (native format)
If you use the Anthropic SDK directly, you can use qzira's /v1/messages endpoint in native format.
Set base_url to https://api.qzira.com (without /v1); the SDK automatically appends /v1/messages.
Before (direct call):
import anthropic
client = anthropic.Anthropic(
api_key="sk-ant-xxxxxxxx" # Anthropic API key
)
message = client.messages.create(
model="claude-sonnet-4-20250514",
max_tokens=1024,
messages=[{"role": "user", "content": "Hello"}]
)
After (via qzira — only 2 lines change):
import anthropic
client = anthropic.Anthropic(
api_key="gw_xxxxxxxx", # qzira API key
base_url="https://api.qzira.com" # ⚠️ no /v1
)
message = client.messages.create(
model="claude-sonnet-4-20250514",
max_tokens=1024,
messages=[{"role": "user", "content": "Hello"}]
)
Migration example: TypeScript / JavaScript (Anthropic SDK)
import Anthropic from "@anthropic-ai/sdk";
const client = new Anthropic({
apiKey: "gw_xxxxxxxx", // qzira API key
baseURL: "https://api.qzira.com" // ⚠️ no /v1
});
const message = await client.messages.create({
model: "claude-sonnet-4-20250514",
max_tokens: 1024,
messages: [{ role: "user", content: "Hello" }]
});
Supported models (major examples)
| Provider | Example models |
|---|---|
| OpenAI | gpt-4o, gpt-4o-mini, gpt-4-turbo, o1, o3-mini |
| Anthropic | claude-sonnet-4-20250514, claude-3-5-haiku-20241022, claude-3-opus-20240229 |
| Google AI | gemini-2.0-flash, gemini-2.5-flash, gemini-2.5-pro |
| DeepSeek | deepseek-chat, deepseek-reasoner |
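Routing is driven purely by the model name in the request, so the same client and key reach any enabled provider. As a rough illustration of the naming convention (qzira's real routing runs server-side; this prefix map is an assumption based on the example models listed above):

```python
# Hypothetical local sketch of the model-name → provider convention.
PREFIX_TO_PROVIDER = {
    "gpt-": "openai",
    "o1": "openai",
    "o3": "openai",
    "claude-": "anthropic",
    "gemini-": "google",
    "deepseek-": "deepseek",
}

def guess_provider(model: str) -> str:
    """Guess which provider a model name routes to."""
    for prefix, provider in PREFIX_TO_PROVIDER.items():
        if model.startswith(prefix):
            return provider
    raise ValueError(f"unknown model family: {model}")
```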
6. Testing with curl
You can test qzira instantly using curl commands.
OpenAI model
curl -X POST https://api.qzira.com/v1/chat/completions \
-H "Authorization: Bearer gw_xxxxxxxx" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o-mini",
"messages": [{"role": "user", "content": "Hello"}]
}'
Anthropic model
curl -X POST https://api.qzira.com/v1/chat/completions \
-H "Authorization: Bearer gw_xxxxxxxx" \
-H "Content-Type: application/json" \
-d '{
"model": "claude-3-5-haiku-20241022",
"messages": [{"role": "user", "content": "Hello"}]
}'
Google AI model
curl -X POST https://api.qzira.com/v1/chat/completions \
-H "Authorization: Bearer gw_xxxxxxxx" \
-H "Content-Type: application/json" \
-d '{
"model": "gemini-2.0-flash",
"messages": [{"role": "user", "content": "Hello"}]
}'
Anthropic native format (/v1/messages)
curl -X POST https://api.qzira.com/v1/messages \
-H "x-api-key: gw_xxxxxxxx" \
-H "Content-Type: application/json" \
-H "anthropic-version: 2023-06-01" \
-d '{
"model": "claude-sonnet-4-20250514",
"max_tokens": 1024,
"messages": [{"role": "user", "content": "Hello"}]
}'
/v1/messages streaming
curl -N -X POST https://api.qzira.com/v1/messages \
-H "x-api-key: gw_xxxxxxxx" \
-H "Content-Type: application/json" \
-H "anthropic-version: 2023-06-01" \
-d '{
"model": "claude-sonnet-4-20250514",
"max_tokens": 1024,
"messages": [{"role": "user", "content": "Count from 1 to 5"}],
"stream": true
}'
Streaming test
curl -N -X POST https://api.qzira.com/v1/chat/completions \
-H "Authorization: Bearer gw_xxxxxxxx" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o-mini",
"messages": [{"role": "user", "content": "Count from 1 to 5"}],
"stream": true
}'
PowerShell (Windows)
$headers = @{
"Authorization" = "Bearer gw_xxxxxxxx"
"Content-Type" = "application/json"
}
$body = '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"Hello"}]}'
Invoke-RestMethod -Uri "https://api.qzira.com/v1/chat/completions" `
-Method POST -Headers $headers -Body $body
Successful response example
{
"id": "chatcmpl-xxxxx",
"object": "chat.completion",
"model": "gpt-4o-mini",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! How can I help you today?"
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 8,
"completion_tokens": 9,
"total_tokens": 17
},
"provider": "openai"
}
Every response includes a provider field so you can see which provider handled the request.
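The field is plain JSON, so it can be read straight off the parsed body (with the openai Python SDK, fields outside the OpenAI schema, like provider, are also reachable via the response object's model_extra dict):

```python
import json

def handling_provider(body: str) -> str:
    """Return which upstream provider served a qzira response body."""
    return json.loads(body).get("provider", "unknown")

# Trimmed version of the response example shown above.
raw = '{"id": "chatcmpl-xxxxx", "model": "gpt-4o-mini", "provider": "openai"}'
```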
Tool Calling (Function Calling)
qzira supports Tool Calling for all supported providers. Simply send the OpenAI-compatible tools parameter and qzira relays it in the appropriate format for each provider.
Tool Calling support by provider
| Provider | Method | Status |
|---|---|---|
| OpenAI | Pass-through (native OpenAI format) | ✅ Supported |
| Anthropic | Pass-through (SDK-handled) | ✅ Supported |
| DeepSeek | Pass-through (OpenAI-compatible) | ✅ Supported |
| Google (Gemini) | Auto-conversion (OpenAI tools → Gemini functionDeclarations) | ✅ Supported |
For Gemini, qzira auto-converts OpenAI-format tools to Gemini-format functionDeclarations, and converts the response's functionCall back to tool_calls. No changes are needed on your end.
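Conceptually, the conversion looks like this sketch (illustration only; qzira's actual server-side implementation is internal):

```python
def tools_to_function_declarations(tools: list) -> list:
    """Sketch of the OpenAI tools → Gemini functionDeclarations mapping."""
    declarations = []
    for tool in tools:
        fn = tool["function"]
        declarations.append({
            "name": fn["name"],
            "description": fn.get("description", ""),
            # JSON Schema parameters carry over largely as-is.
            "parameters": fn.get("parameters", {}),
        })
    return declarations
```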
Tool Calling test (curl)
OpenAI model
curl -X POST https://api.qzira.com/v1/chat/completions \
-H "Authorization: Bearer gw_xxxxxxxx" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o-mini",
"messages": [
{"role": "user", "content": "What is the weather in Tokyo?"}
],
"tools": [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get current weather for a location",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "City and country, e.g. Tokyo, Japan"
}
},
"required": ["location"]
}
}
}
]
}'
Gemini model (auto-converted)
# Same tools format as OpenAI — qzira auto-converts
curl -X POST https://api.qzira.com/v1/chat/completions \
-H "Authorization: Bearer gw_xxxxxxxx" \
-H "Content-Type: application/json" \
-d '{
"model": "gemini-2.0-flash",
"messages": [
{"role": "user", "content": "What is the weather in Osaka?"}
],
"tools": [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get current weather for a location",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "City and country, e.g. Osaka, Japan"
}
},
"required": ["location"]
}
}
}
]
}'
Tool Calling response example
{
"id": "chatcmpl-xxxxx",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": null,
"tool_calls": [
{
"id": "call_xxxxx",
"type": "function",
"function": {
"name": "get_weather",
"arguments": "{\"location\": \"Tokyo, Japan\"}"
}
}
]
},
"finish_reason": "tool_calls"
}
],
"usage": {
"prompt_tokens": 53,
"completion_tokens": 6,
"total_tokens": 59
},
"provider": "openai"
}
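When finish_reason is tool_calls, your application runs the named function itself and sends the result back as a role:"tool" message on the follow-up request. A minimal dispatch sketch (get_weather here is a hypothetical local stub, not a real weather API):

```python
import json

def get_weather(location: str) -> str:
    """Local stub standing in for a real weather lookup."""
    return f"Sunny in {location}"

TOOL_REGISTRY = {"get_weather": get_weather}

def dispatch_tool_call(tool_call: dict) -> dict:
    """Run one tool_calls entry locally and build the role:"tool" message
    to append to the messages list on the next request."""
    fn = tool_call["function"]
    result = TOOL_REGISTRY[fn["name"]](**json.loads(fn["arguments"]))
    return {"role": "tool", "tool_call_id": tool_call["id"], "content": result}
```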
tool_choice options
| Value | Behavior | Gemini conversion |
|---|---|---|
"auto" (default) | Model decides whether to call a tool | AUTO |
"required" | Model must call at least one tool | ANY |
"none" | No tool calls | NONE |
{"type":"function","function":{"name":"xxx"}} | Call a specific tool | ANY + allowedFunctionNames |
Streaming with Tool Calling
Tool Calling works correctly with streaming ("stream": true). Chunk structure when a tool is called:
# Chunk 1: role
data: {"choices":[{"delta":{"role":"assistant"},...}]}
# Chunk 2: tool_calls (function name + arguments)
data: {"choices":[{"delta":{"tool_calls":[{"index":0,"id":"call_xxx","type":"function","function":{"name":"get_weather","arguments":"{...}"}}]},...}]}
# Chunk 3: done
data: {"choices":[{"delta":{},"finish_reason":"tool_calls",...}]}
# Chunk 4: usage
data: {"usage":{"prompt_tokens":24,"completion_tokens":6}}
data: [DONE]
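Clients must reassemble these deltas: tool_calls entries are keyed by index, and the arguments string arrives in fragments across chunks. A sketch of the accumulation logic:

```python
def accumulate_tool_calls(chunks: list) -> list:
    """Reassemble streamed tool_calls deltas into complete calls.
    'index' identifies each call; 'arguments' fragments are concatenated."""
    calls = {}
    for chunk in chunks:
        delta = chunk.get("choices", [{}])[0].get("delta", {})
        for part in delta.get("tool_calls", []):
            slot = calls.setdefault(part["index"], {"id": "", "name": "", "arguments": ""})
            if part.get("id"):
                slot["id"] = part["id"]
            fn = part.get("function", {})
            if fn.get("name"):
                slot["name"] = fn["name"]
            slot["arguments"] += fn.get("arguments", "")
    return [calls[i] for i in sorted(calls)]
```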
If response caching is enabled, cached responses generated before Tool Calling was added (i.e., without tools) may be returned. If results are unexpected, slightly modify the message content to bypass the cache.
7. Cursor / AI Agent Integration
Cursor (verified)
You can add qzira as an OpenAI-compatible API provider in Cursor's settings and manage OpenAI, Anthropic, Google, and DeepSeek models with a single API key.
Prerequisites
- Cursor Pro or higher subscription (Cursor pricing)
- qzira account (Free plan available)
- The provider you want to use must be enabled in the qzira dashboard (see Section 4)
Step 1: Set up qzira
- Log in to app.qzira.com
- Under "Providers" in the sidebar, enable the providers you want:
- OpenAI: for GPT-4o, GPT-4o-mini, etc.
- Anthropic: for Claude Sonnet, Claude Haiku, etc.
- Google AI: for Gemini 2.0 Flash, Gemini 2.5 Pro, etc.
- DeepSeek: for DeepSeek V3 (deepseek-chat), DeepSeek R1 (deepseek-reasoner)
- Under "API Keys," create a key for Cursor (e.g.,
cursor-dev) - Copy the displayed
gw_xxxxxxxxkey (⚠️ shown only once)
Step 2: Configure Cursor
- Open Cursor and go to Settings → Models
- Enter your qzira API key in OpenAI API Key: gw_xxxxxxxx
- Enter the following in Override OpenAI Base URL: https://api.qzira.com/v1
- Manually add the models you want (click "+ Add model"):
  - gpt-4o-mini (OpenAI)
  - gpt-4o (OpenAI)
  - claude-sonnet-4-20250514 (Anthropic)
  - gemini-2.0-flash (Google)
  - gemini-2.5-pro (Google)
- Toggle the added models ON
Models selected from Cursor's built-in list (e.g., claude-3.5-sonnet) use Cursor's built-in API. To use them via qzira, add the exact model name manually as described above.
Verification checklist
- Select model: In Cursor chat or editor, select a manually added model (e.g., gemini-2.0-flash)
- Test send: Send a simple prompt ("Hello") and confirm a response is returned
- Check dashboard: Confirm the request is logged at app.qzira.com
- Check provider: Confirm the correct provider name appears in the log's provider column
Troubleshooting
| Symptom | Cause | Solution |
|---|---|---|
| 401 Unauthorized | Invalid or mistyped API key | Check key status in qzira dashboard; create a new key if needed |
| 403 Provider not enabled | Provider not enabled | Dashboard → Providers → enable the provider and set its API key |
| 403 Plan upgrade required | Feature not available on Free plan | Upgrade qzira plan to Pro or higher |
| Model not visible in selector | Model not added or toggle is OFF | Settings → Models → "+ Add model" then toggle ON |
| No response / timeout | Base URL misconfigured | Confirm https://api.qzira.com/v1 is entered correctly (no trailing slash) |
Claude Code
Claude Code switches to qzira by setting environment variables. All requests are logged in the dashboard, and budget controls and auto-stop apply.
Set ANTHROPIC_BASE_URL to https://api.qzira.com (without /v1). Claude Code automatically appends /v1/messages.
Linux / macOS
# Set environment variables
export ANTHROPIC_BASE_URL="https://api.qzira.com"
export ANTHROPIC_API_KEY="gw_xxxxxxxx"
# Start Claude Code
claude
PowerShell (Windows)
# Set environment variables
$env:ANTHROPIC_BASE_URL = "https://api.qzira.com"
$env:ANTHROPIC_API_KEY = "gw_xxxxxxxx"
# Start Claude Code
claude
Persist via ~/.bashrc or ~/.zshrc
# Add to ~/.bashrc or ~/.zshrc
export ANTHROPIC_BASE_URL="https://api.qzira.com"
export ANTHROPIC_API_KEY="gw_xxxxxxxx"
Verification
- First launch: The Anthropic OAuth screen may appear (select your organization). This is Claude Code's own auth and is separate from qzira.
- Auth conflict warning: "Using ANTHROPIC_API_KEY instead of Anthropic Console key" means the qzira key is being used ✅
- Automatic model switching: Claude Code automatically uses Haiku (lightweight tasks) and Sonnet (responses). All requests are logged in the dashboard.
- Dashboard check: Confirm requests with provider anthropic appear in the usage log at app.qzira.com.
Cline (VS Code extension)
In Cline's settings, change "API Provider" to "OpenAI Compatible" and enter:
- Base URL: https://api.qzira.com/v1
- API Key: gw_xxxxxxxx
- Model: the model name you want (e.g., claude-sonnet-4-20250514)
Windsurf
⚠️ Windsurf does not currently support qzira integration.
Windsurf's BYOK feature only supports Claude 4 Sonnet / Opus, and API keys are set in the Windsurf management panel (windsurf.com/subscription/provider-api-keys). There is no custom Base URL or OpenAI Compatible provider setting, so routing through an API gateway like qzira is not possible.
Source: Windsurf Docs — AI Models (verified February 2026)
Roo Code (VS Code extension)
Roo Code has been verified to work in both OpenAI Compatible mode and Anthropic mode (v3.47.3).
Method A: OpenAI Compatible mode (recommended)
- API Provider: OpenAI Compatible
- Base URL: https://api.qzira.com/v1
- API Key: gw_xxxxxxxx
- Model: gpt-4o-mini (or gpt-4o)
Method B: Anthropic mode
- API Provider: Anthropic
- ☑ Check "Use custom base URL"
- Base URL: https://api.qzira.com (⚠️ no /v1)
- API Key: gw_xxxxxxxx
- Model: claude-sonnet-4-20250514
Other AI agents
Any tool that supports configuring an OpenAI-compatible Base URL and API key can be integrated the same way.
| Tool | Where to configure |
|---|---|
| Cursor | Settings → Models → OpenAI API Key |
| Claude Code | Env vars: ANTHROPIC_BASE_URL=https://api.qzira.com (no /v1) + ANTHROPIC_API_KEY=gw_xxx |
| Cline | API Provider → OpenAI Compatible |
| Windsurf | ❌ No custom Base URL support (BYOK limited to Claude, URL not configurable) |
| Roo Code | API Provider → OpenAI Compatible or Anthropic (custom base URL) |
| Continue.dev | apiBase in config.json |
| Aider | --openai-api-base option |
| LangChain | base_url parameter |
Tool × Provider Compatibility Matrix
Verified compatibility of AI coding tools via qzira (as of February 2026).
📎 Sources & References
- Claude Code: Official — Third-party integrations
- Windsurf: Official — Models
- Roo Code: Official — OpenAI Compatible / Anthropic
- Cursor: Official Documentation
- Cline: Official Documentation
8. Monitoring Usage in the Dashboard
Log in at https://app.qzira.com/dashboard to see the following in real time.
Dashboard overview
- This month's requests: Total monthly request count (vs. plan limit)
- Usage rate: Percentage of plan limit used
- Input / output tokens: Total token consumption
- Daily request graph: Visualizes historical request trends
Recent requests
Details for each request:
| Field | Description |
|---|---|
| API key name | Name of the API key used for the request |
| Model | Model name used (e.g., claude-sonnet-4-20250514) |
| Provider | Responding provider (e.g., Anthropic) |
| Tokens | Input / output token count |
| Latency | Response time (milliseconds) |
| Status | Success / Error |
| Cost | Estimated cost (USD / JPY switchable). Failed requests show "—" |
| Tool Calls 🔧 | Number of tool calls. Click to view function names and arguments |
| Timestamp | Request date and time |
Usage export (CSV)
Request logs can be downloaded as CSV. Useful for expense reporting and internal analytics.
Click the "Export CSV" button in the "Request Logs" section of the dashboard to download the displayed log data as a CSV file.
CSV columns
| Column | Description |
|---|---|
| id | Log ID |
| api_key_name | Name of the API key used |
| model | Model name |
| provider | Provider name |
| input_tokens | Input token count |
| output_tokens | Output token count |
| latency_ms | Response time (ms) |
| status | Status code |
| created_at | Request timestamp |
| estimated_cost_usd | Estimated cost (USD) |
| estimated_cost_jpy | Estimated cost (JPY) |
| tool_calls | Tool call details (JSON format) |
Log retention period
Request logs are automatically retained for a period based on your plan. Logs older than the retention period are automatically deleted daily.
| Plan | Log retention |
|---|---|
| Free | 3 days |
| Starter | 30 days |
| Pro | 30 days |
| Business | 90 days |
| Scale | 365 days |
9. Budget Alerts & Auto-Stop
Click "Budget" in the sidebar to configure cost controls (Starter and above).
Usage aggregation is not real-time — it updates at intervals. As a result, spending may exceed the configured limit depending on aggregation timing.
This is common behavior across API gateways. qzira keeps aggregation intervals to a few minutes to minimize overage, but instantaneous hard stops are not guaranteed.
Realtime Budget Stop (Scale plan)
In addition to the standard KV-based budget check (up to ~5 min delay), Scale plan users can enable instant enforcement via a direct D1 query.
| Item | Detail |
|---|---|
| Plan | Scale only (visible when Auto-Stop is enabled) |
| Effect | Blocks the very next request after the limit is reached — no ~5 min delay |
| Latency overhead | +15–40ms per request (D1 query) |
| How to enable | Dashboard → Budget Settings → Realtime Budget Stop toggle |
Budget management modes
qzira supports two budget modes: request-count-based and cost-based (USD). Switch between them in the Budget settings page of the dashboard.
| Mode | Unit | Characteristics |
|---|---|---|
| Request count | API request count | Simple. Each request = 1 count, regardless of model or token usage |
| Cost (USD) | Estimated API cost (USD) | Manages based on estimated cost from token usage. Prevents overuse of expensive models |
Configuration options
| Setting | Description | Available plan |
|---|---|---|
| Monthly limit | Max monthly request count or cost (USD) | All plans |
| Daily limit | Max daily request count or cost (USD) | Starter and above |
| Budget alert notifications | Email notification at 50% / 80% / 100% | Starter and above |
| Auto-stop | Automatically block requests when limit is reached | Pro and above |
Exchange rate display (USD / JPY)
In cost mode, the USD budget setting also supports JPY display.
| Item | Details |
|---|---|
| Rate source | ExchangeRate-API (open.er-api.com) — daily updates |
| Update frequency | Once daily (UTC 15:00) |
| Cache | KV store, 48-hour cache |
| Fallback | Fixed rate ¥150/USD if API fetch fails |
| Currency switch | Toggle USD ↔ JPY in the input form |
About cost estimation
Estimated costs in cost mode are calculated from each request's token usage and each model's published pricing. Please note:
- Estimated costs are approximations and may not match actual provider billing
- If a provider changes pricing, there may be a lag before it's reflected
- Cost estimation accuracy may be lower for some models or special requests (e.g., image input)
10. Plan Upgrade
Click "Plan & Billing" in the sidebar to change your plan.
Plan comparison
| Plan | Monthly | Requests/mo | Active providers | API keys | Log retention |
|---|---|---|---|---|---|
| Free | $0 | 1,000 | 1 | 1 | 3 days |
| Starter | $5 | 10,000 | 3 | 2 | 30 days |
| Pro | $10 | 100,000 | Unlimited | 5 | 30 days |
| Business | $29 | 500,000 | Unlimited | 50 | 90 days |
| Scale | $69 | 3,000,000 | Unlimited | 100 | 365 days |
Key features by plan
| Feature | Free | Starter | Pro | Business | Scale |
|---|---|---|---|---|---|
| Streaming | ✅ | ✅ | ✅ | ✅ | ✅ |
| API key rotation | ✅ | ✅ | ✅ | ✅ | ✅ |
| Usage export (CSV) | ✅ | ✅ | ✅ | ✅ | ✅ |
| Auto retry | — | ✅ | ✅ | ✅ | ✅ |
| Failover | — | ✅ | ✅ | ✅ | ✅ |
| Budget alerts | — | ✅ | ✅ | ✅ | ✅ |
| Budget limit & auto-stop | — | — | ✅ | ✅ | ✅ |
| Response cache | — | — | ✅ | ✅ | ✅ |
| Semantic cache | — | — | — | ✅ | ✅ |
| Per-key budget | — | — | — | ✅ | ✅ |
| Access control (Per-Key) | — | — | ✅ | ✅ | ✅ |
| Smart Routing | — | — | — | — | ✅ |
| Secret Shield | — | — | — | — | ✅ |
| Priority support | — | — | — | ✅ | ✅ |
11. API Key Rotation
As a security best practice, we recommend rotating (regenerating) API keys regularly. qzira supports one-click rotation from the dashboard.
What is rotation?
When you rotate a key, the existing API key is immediately invalidated and a new gw_ key is issued. The key's ID (internal identifier) and name are preserved, so dashboard usage history and settings carry over.
Rotation steps
- Click "API Keys" in the sidebar
- Click the "Rotate" button (🔄 icon) for the key you want to rotate
- Select "Execute rotation" in the confirmation dialog
- The new key is displayed — copy and store it securely immediately
Requests using the old key immediately return 401 Unauthorized after rotation, so follow these steps:
1. Have a way to receive the new key ready (open your .env file for editing)
2. Execute rotation → immediately copy the new key
3. Update the new key in all apps, agents, and CI/CD pipelines
4. Confirm requests from the new key appear in the dashboard
※ A grace period feature is being considered for future implementation.
When to rotate
- When a key may have been accidentally exposed (log output, git history, etc.)
- When a team member leaves or changes role
- Periodically as a security measure (every 30–90 days recommended)
- When suspicious requests are detected in the dashboard
If you manage your key via environment variables (a .env file), just update the value after rotation.
Post-rotation update example
# Update .env file
QZIRA_API_KEY=gw_yyyyyyyy # ← Replace with new key
QZIRA_BASE_URL=https://api.qzira.com/v1
Restart (or redeploy) your application and the new key will be used automatically.
12. Response Cache
Cache AI provider responses for identical requests to speed up subsequent responses and save tokens (Pro and above).
Benefits
- Cost reduction: Skip provider API calls for identical requests — zero token consumption
- Faster responses: Returned instantly from KV cache (hundreds of ms → tens of ms)
- Provider outage mitigation: During cache TTL, you're unaffected by provider outages
- Zero config: Flip the toggle to enable immediately — no code changes required
Downsides & caveats
- Streaming requests are not cached
- Exact match only (for fuzzy matching, use semantic cache)
- Stale responses may be returned during the TTL period
- Requests containing PII may also be cached
Good fit vs. poor fit
| Good fit | Poor fit |
|---|---|
| Repeated test / debug runs | Generating unique creative content each time |
| Batch processing (same prompt, many runs) | Fetching real-time information |
| FAQ bots / templated responses | Streaming agents |
| Cost minimization use cases | Requests with heavy PII |
Supported plans and TTL
| Plan | Available | Default TTL | Custom TTL |
|---|---|---|---|
| Free | — | — | — |
| Starter | — | — | — |
| Pro | ✅ | 1 hour | Up to 1 hour |
| Business | ✅ | 24 hours | Up to 24 hours |
| Scale | ✅ | 7 days | Up to 7 days |
How to enable
- Click "Cache" in the sidebar
- Toggle "Enable cache" to ON
- Optionally configure custom TTL or temperature limit
Configuration options
| Setting | Description | Default |
|---|---|---|
| Cache on/off | Enable or disable response caching | OFF |
| Custom TTL | Cache retention duration (seconds). Can be set to at most the plan's default TTL. | Plan default |
| Temperature limit | Exclude requests above a specified temperature from caching | No limit (all requests cached) |
How caching works
The following request fields are SHA-256 hashed to detect identical requests:
- User ID
- Model name
- Message content
- temperature / top_p / max_tokens
On a cache hit, the provider request is skipped and the stored response is returned immediately, saving token consumption.
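Conceptually, the key derivation looks like the following sketch (qzira's real serialization is internal and may differ; this only illustrates "same fields in, same key out"):

```python
import hashlib
import json

def cache_key(user_id, model, messages, temperature=None, top_p=None,
              max_tokens=None) -> str:
    """Sketch of an exact-match cache key over the fields listed above."""
    # Canonical serialization so logically identical requests hash identically.
    payload = json.dumps(
        [user_id, model, messages, temperature, top_p, max_tokens],
        sort_keys=True, separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode()).hexdigest()
```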
Response headers
When caching is active, the following headers are added to responses:
| Header | Value | Description |
|---|---|---|
| X-Cache | EXACT_HIT / SEMANTIC_HIT / MISS | Exact hit / Semantic hit / Miss |
| X-Cache-TTL | Seconds | Applied TTL |
| X-Semantic-Score | 0.00–1.00 | Semantic cache similarity score (on hit only) |
| X-Cache-Skip-Reason | String | Reason caching was skipped (on skip only) |
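A small helper for reading these headers in client code (with the openai Python SDK, raw response headers are reachable via client.chat.completions.with_raw_response.create(...).headers):

```python
def describe_cache(headers: dict) -> str:
    """Summarize qzira cache headers; lookup is case-insensitive."""
    h = {k.lower(): v for k, v in headers.items()}
    status = h.get("x-cache", "MISS")
    if status == "SEMANTIC_HIT":
        return f"semantic hit (score {h.get('x-semantic-score', '?')})"
    if status == "EXACT_HIT":
        return f"exact hit (ttl {h.get('x-cache-ttl', '?')}s)"
    reason = h.get("x-cache-skip-reason")
    return f"miss ({reason})" if reason else "miss"
```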
Requests excluded from caching
- Streaming requests ("stream": true): SSE format is not suited for caching
- Temperature limit exceeded: Requests above the configured temperature limit
- Error responses: Provider error responses are not cached
Reading cache statistics
Cache utilization statistics are shown at the bottom of the cache settings page.
All-time statistics
| Item | Description |
|---|---|
| Total hits | Total number of cache hits |
| Input tokens saved | Total input tokens saved by cache hits |
| Output tokens saved | Total output tokens saved by cache hits |
| Unique prompt count | Number of unique prompts stored in cache |
This month's statistics
Hit count and token savings for the current month. Resets at the start of each month.
Recent cache hits
A list of recent cache hits. Each entry includes:
- Provider name / model name
- Input / output tokens saved
- Hit count
- Last hit timestamp
13. Per-Key Budget Settings
Set individual request limits per API key to control specific applications or agents (Business and above).
Supported plans
| Plan | Per-key budget |
|---|---|
| Free | — |
| Starter | — |
| Pro | — |
| Business | ✅ |
| Scale | ✅ |
Setup steps
- Click "API Keys" in the sidebar
- Click the wallet icon for the key you want to configure
- Set the following in the budget modal:
- Monthly request limit: Max monthly requests
- Daily request limit: Max daily requests
- Auto-stop: Automatically block requests when the limit is reached
- Click "Save"
Configuration options
| Setting | Description | Required |
|---|---|---|
| Monthly request limit | Max monthly requests for this key. Leave blank for unlimited. | Optional |
| Daily request limit | Max daily requests for this key. Leave blank for unlimited. | Optional |
| Auto-stop | When ON, requests from this key are automatically blocked when the limit is reached. | Optional |
How it works
Checks are performed in the following order on each request:
- User-level budget check (Section 9 settings)
- Per-key budget check (this section's settings)
- Only if both pass, the request is forwarded to the provider
Response when limit is reached:
{
"error": {
"message": "API key budget exceeded",
"type": "budget_exceeded",
"code": 403
}
}
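Retrying a budget-blocked request will keep failing until the daily or monthly window resets, so detect this error explicitly and back off instead. A sketch (with the openai Python SDK, a 403 typically surfaces as openai.PermissionDeniedError; the helper below works on the raw body):

```python
import json

def is_budget_exceeded(status: int, body: str) -> bool:
    """Detect qzira's budget_exceeded error so callers can stop retrying."""
    if status != 403:
        return False
    try:
        return json.loads(body).get("error", {}).get("type") == "budget_exceeded"
    except json.JSONDecodeError:
        return False
```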
Notifications
- Threshold alerts: Email at 50% / 80% (determined by 5-minute cron)
- Limit reached: Immediate email at 100%
Notification emails include the API key name so you can immediately identify which key hit its limit.
Checking current usage
The budget modal shows daily and monthly progress bars so you can check current usage in real time.
Use case examples
- Per-agent limits: 100/day for Cursor key, 200/day for Claude Code key
- Per-project management: Separate budget management for Project A and Project B keys
- Overnight safety: Set daily limit + auto-stop on autonomous agent keys to prevent runaway
14. Access Control (Per-Key)
Whitelist-based controls on which providers and models each API key can access. Control "what can be used" per agent within a team, preventing unintended provider usage or expensive model misuse.
Supported plans
| Plan | Access control |
|---|---|
| Free | — |
| Starter | — |
| Pro | ✅ |
| Business | ✅ |
| Scale | ✅ |
How to configure
- Open Dashboard → API Keys
- Click the 🔐 Access Restrictions button for the target key (visible on Pro and above)
- Provider restriction: Select "Allow all providers" or specific providers
  - Options: openai / anthropic / google
- Model restriction: Select "Allow all models" or enter specific models
  - Example: gpt-4o, gpt-4o-mini, claude-sonnet-4-20250514
- Click Save to apply settings
Requests that violate a restriction fail with a 403 error. Verify the models your AI agent uses before configuring.
Check priority order
Access control checks are performed in this order:
Request received
│
├─ 1. Provider restriction check (evaluated first)
│ If allowed_providers is set:
│ → Target provider not in list → 403 provider_not_allowed
│
├─ 2. Model restriction check (after provider passes)
│ If allowed_models is set:
│ → Target model not in list → 403 model_not_allowed
│
└─ 3. Both pass → Forward to normal proxy processing
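The priority order above amounts to a simple two-stage whitelist check. A minimal sketch follows; the function and parameter names are illustrative, not qzira's API.

```python
def check_access(provider, model, allowed_providers=None, allowed_models=None):
    """Return None if the request is allowed, else the 403 error code string."""
    # 1. Provider restriction check (evaluated first)
    if allowed_providers is not None and provider not in allowed_providers:
        return "provider_not_allowed"
    # 2. Model restriction check (after provider passes)
    if allowed_models is not None and model not in allowed_models:
        return "model_not_allowed"
    # 3. Both pass -> forward to normal proxy processing
    return None
```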
Error responses
Requests that violate access controls return errors in the following format:
| Error code | HTTP status | Trigger |
|---|---|---|
| provider_not_allowed | 403 | Request to a disallowed provider |
| model_not_allowed | 403 | Request with a disallowed model |
Response examples:
// Provider restriction error
{
"error": {
"message": "Provider 'google' is not allowed for this API key",
"code": "provider_not_allowed"
}
}
// Model restriction error
{
"error": {
"message": "Model 'o1-mini' is not allowed for this API key",
"code": "model_not_allowed"
}
}
Use cases
| Scenario | Provider setting | Model setting | Effect |
|---|---|---|---|
| Cursor-only key | openai only | All models | Block all non-OpenAI providers |
| Claude Code-only key | anthropic only | All models | Block all non-Anthropic providers |
| Cost-restricted key | All providers | gpt-4o-mini, gemini-2.0-flash | Allow low-cost models only |
| Production-validated key | openai only | gpt-4o, gpt-4o-mini | Limit to validated models only |
Default behavior
- No restrictions set (default): All providers and models are accessible (same as before)
- Provider restriction only: All models under allowed providers are accessible
- Model restriction only: Specified models accessible from any provider
- Both restricted: Only requests satisfying both provider and model conditions pass
15. Semantic Cache
Semantic cache uses AI vector search to detect semantically similar requests and reuse cached responses. Unlike response cache (Section 12) which requires exact matches, semantic cache can hit on requests with different wording but the same intent.
Benefits
- Handles phrasing variations: Recognizes "What are the benefits of TypeScript?" and "Why should I use TypeScript?" as the same intent
- Significant cost reduction: Catches requests that exact match misses, reducing token consumption
- Two-layer optimization: on a semantic hit, the response is also saved to the exact cache (response cache) automatically
- Safety-first design: PII detection, short text, and multi-turn requests are automatically excluded
Downsides & caveats
- False positives possible (mitigated by setting threshold ≥ 0.95)
- Embedding costs apply (approx. $0.30 for 200K queries/month — very low)
- Non-streaming, single-turn requests only
- Vector storage has limits (Business: 10,000 / Scale: 50,000)
- Eventual consistency (newly cached items may take minutes to become searchable)
Good fit vs. poor fit
| Good fit | Poor fit |
|---|---|
| FAQ / helpdesk (same question, different phrasing) | Unique creative requests each time |
| Education / learning apps (similar questions recur) | Multi-turn conversations (chatbots) |
| Repetitive code generation (similar patterns) | Streaming-required agents |
| Best results when combined with response cache | Requests with heavy PII |
Response cache vs. semantic cache
| Item | Response Cache (Section 12) | Semantic Cache (this section) |
|---|---|---|
| Match method | SHA-256 exact match | AI vector search (cosine similarity) |
| Supported plans | Pro and above | Business and above |
| Streaming | Not supported | Not supported |
| Multi-turn | Supported | Not supported (single-turn only) |
| Latency | Very low (KV get only) | Slightly higher (embedding generation + vector search) |
| Header | X-Cache: EXACT_HIT | X-Cache: SEMANTIC_HIT |
| Storage | KV only | Vectorize + KV |
Processing flow
Semantic cache is evaluated after response cache (exact match). If an exact match hits, semantic cache processing is skipped.
Request received
↓
Auth + plan check (Business+?) + Feature Toggle ON?
│ NO → Normal flow (skip semantic cache)
↓ YES
Streaming? → YES → Skip (X-Cache-Skip-Reason: streaming)
↓ NO
① Exact Cache Lookup (SHA-256, KV get ×1)
├─ EXACT_HIT → Return immediately (no embedding needed, fastest)
└─ MISS ↓
② Exclusion check (PII / short text / multi-turn)
├─ Excluded → Normal flow (X-Cache-Skip-Reason shows reason)
└─ OK ↓
③ Generate embedding (@cf/baai/bge-small-en-v1.5, 384 dims)
↓
④ Vectorize search (user_scope + meta_group filter)
├─ Score ≥ threshold → Fetch response from KV
│ ├─ KV success → SEMANTIC_HIT (also save to exact cache)
│ └─ KV failure → Treat as MISS (X-Cache-Skip-Reason: kv_miss)
└─ Score < threshold → MISS
↓
⑤ Send request to provider
↓
⑥ Save response (Vectorize + KV + Exact Cache)
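The core of step ④ is a cosine-similarity lookup against stored vectors. Below is a pure-Python sketch of threshold-based matching; qzira actually runs this on Cloudflare Vectorize, and the function names here are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def semantic_lookup(query_vec, cache, threshold=0.92):
    """Return (cached_response, score) for the best match at or above
    the threshold, or (None, score) on a MISS."""
    best_score, best_response = 0.0, None
    for vec, response in cache:
        score = cosine(query_vec, vec)
        if score >= threshold and score > best_score:
            best_score, best_response = score, response
    return best_response, best_score
```

With the recommended 0.92 threshold, only vectors that are very close in embedding space produce a hit; everything else falls through to the provider (step ⑤).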
Supported plans and storage limits
| Item | Business ($29/mo) | Scale ($69/mo) |
|---|---|---|
| Max vectors | 10,000 | 50,000 |
| Threshold range | 0.90 – 0.99 | 0.85 – 0.99 |
| TTL | 24 hours | 7 days |
| Embedding model | @cf/baai/bge-small-en-v1.5 (384 dims) | |
| Metric | Cosine similarity | |
Automatic cleanup (CRON)
Old vectors past their TTL are deleted by a daily automatic cleanup.
- Schedule: Daily UTC 15:00
- Action: Detect and delete expired vectors per user
- Deletion limit: Max 500 vectors/user/day (5 iterations × 100)
- Count sync: Deleted vector count is automatically decremented from D1 counters
To manually delete all vectors, use the "Delete all" button on the "Semantic" page in the dashboard.
Dashboard settings
Click "Semantic" in the sidebar to open the semantic cache settings page.
Vector storage
A progress bar at the top shows vector storage usage — current count, limit, and usage percentage. TTL and auto-cleanup info is also shown.
Cache settings
- Semantic cache ON/OFF: Toggle to enable/disable
- Similarity threshold: Adjust with the slider (recommended: 0.92)
- Higher → More precise (lower hit rate but fewer false positives)
- Lower → Higher hit rate (more matches but higher false positive risk)
- Click "Save" to apply
Cache statistics
When enabled, cache utilization statistics appear at the bottom of the page: total hits, tokens saved, this month's hits, unique prompt count, and recent semantic hit details.
Threshold guide
| Threshold | Characteristics | Recommended for |
|---|---|---|
| 0.95–0.99 | High precision. Only near-identical sentences hit | Mission-critical workloads, finance/medical |
| 0.92 (recommended) | Balanced. Hits on sufficiently similar requests | General development, code generation, QA |
| 0.85–0.91 | Wide catch. Maximizes cost savings | Batch processing with many similar requests (Scale only) |
Auto-exclusion conditions
| Condition | Reason | X-Cache-Skip-Reason |
|---|---|---|
| Streaming requests | Response is sent incrementally; full response cannot be cached | streaming |
| Multi-turn conversations | Conversations with assistant/tool roles are highly context-dependent | multi_turn |
| Short text | Prompts under 30 characters are insufficient for semantic comparison | too_short |
| PII detected | Requests containing email addresses, phone numbers, API keys, etc. are not cached | pii_detected |
| Feature Toggle OFF | Semantic cache disabled in features settings | feature_off |
| Plan not eligible | Free / Starter / Pro plans do not support semantic cache | plan_not_allowed |
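The request-level exclusions above can be sketched as a single pre-cache filter. The 30-character cutoff follows the table; `skip_reason` and the exact PII regexes are assumptions for illustration, not qzira's actual detector.

```python
import re

# Illustrative PII patterns only (email address, API-key-like string)
PII_PATTERNS = [r"[\w.+-]+@[\w-]+\.[\w.]+", r"sk-[A-Za-z0-9-]{10,}"]

def skip_reason(messages, stream=False, min_len=30):
    """Return an X-Cache-Skip-Reason value, or None if the request is cacheable."""
    if stream:
        return "streaming"          # full response cannot be cached
    if any(m["role"] in ("assistant", "tool") for m in messages):
        return "multi_turn"         # context-dependent conversation
    user_text = " ".join(m["content"] for m in messages if m["role"] == "user")
    if len(user_text) < min_len:
        return "too_short"          # insufficient for semantic comparison
    if any(re.search(p, user_text) for p in PII_PATTERNS):
        return "pii_detected"       # sensitive content is never cached
    return None
```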
Response headers
| Header | Value | Description |
|---|---|---|
| X-Cache | SEMANTIC_HIT | Response returned from semantic cache |
| X-Semantic-Score | 0.00 – 1.00 | Matched similarity score (on SEMANTIC_HIT only) |
| X-Cache-Skip-Reason | See exclusion conditions above | Reason caching was skipped |
| X-Cache-Meta | 8-char hash | meta_group identifier (model/system/temperature combo) |
Matching conditions
For a semantic cache hit, all of the following must be true:
- Same user
- Same meta_group (model name + system prompt + temperature combination)
- User message cosine similarity ≥ configured threshold
- Vector TTL still valid
In other words, a hit only occurs for semantically similar questions using the same model, system prompt, and temperature. Cached responses from a different model or system prompt will never be incorrectly returned.
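The meta_group scoping can be illustrated with a short hash over the three fields. The exact derivation qzira uses is not documented; truncated SHA-256 is an assumption chosen to match the 8-character X-Cache-Meta header format.

```python
import hashlib

def meta_group(model, system_prompt, temperature):
    """Illustrative 8-char identifier for a model/system/temperature combination.
    Requests only share a cache scope when all three fields are identical."""
    raw = f"{model}|{system_prompt}|{temperature}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()[:8]
```

Because a different model or system prompt yields a different identifier, a cached response from one combination can never be returned for another.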
Similarity matching compares only the user message content in the messages array. The full conversation context is not considered. The same question in different conversation flows may have different intents, so the cache may incorrectly hit. For precision, set the threshold to 0.95 or higher, or consider disabling semantic cache for multi-turn-heavy workloads.
Responses that are not saved
- HTTP 4xx / 5xx error responses
- Responses where finish_reason is "length" (truncated output)
- Responses shorter than 80 characters
Note: cache matching uses the post-routing model name. If gpt-4o is routed to claude-sonnet-4-20250514, cache matching is done on claude-sonnet-4-20250514.
16. Smart Routing NEW
Smart Routing automatically routes requested models to the optimal provider model (Scale plan only). Efficiently leverage multiple providers for cost optimization or latency improvement.
Three routing strategies
| Strategy | Description | Recommended for |
|---|---|---|
| Cost optimization (default) | Auto-selects the cheapest provider among equivalent models | Batch processing, high-volume requests, cost-sensitive workloads |
| Latency optimization | Prioritizes the fastest-responding provider | Real-time responses, chatbots, user-facing apps |
| Round-robin | Distributes requests evenly across all active providers | Rate limit distribution, provider load balancing |
Benefits
- Cost optimization: Auto-switch to cheaper equivalent providers (up to 30-50% savings possible)
- Rate limit avoidance: Distributing across providers reduces throttling
- Zero code changes: Just toggle ON in the dashboard
Downsides & caveats
- Agents dependent on model-specific behavior (output format, tool calling specs) may behave unexpectedly
- Response quality may vary between providers
- When used with semantic cache, matching uses the post-routing model name
How to configure
- Open the "Feature Settings" page in the dashboard
- Toggle "Smart Routing" to ON
- Select routing strategy (Cost optimization / Latency optimization / Round-robin)
- Confirm at least 2 providers have registered and enabled API keys
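Cost optimization, the default strategy, reduces to picking the cheapest active provider offering an equivalent model. A minimal sketch follows; the price table values are illustrative placeholders, not qzira's actual pricing data.

```python
# Hypothetical prices ($ per 1M input tokens) -- illustrative values only
PRICES = {"openai": 2.50, "anthropic": 3.00, "deepseek": 0.27}

def route_cost_optimized(active_providers, prices=PRICES):
    """Pick the cheapest provider among those with registered, enabled keys."""
    candidates = [p for p in active_providers if p in prices]
    if len(candidates) < 2:
        # Smart Routing requires at least 2 active providers
        raise ValueError("Smart Routing needs at least 2 active providers")
    return min(candidates, key=lambda p: prices[p])
```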
17. Failover Details NEW
Failover automatically switches to an equivalent model on another provider when a provider returns an error, then resends the request (Starter and above).
Retry count by plan
| Plan | Retries | 429 handling | Notes |
|---|---|---|---|
| Free | 0 (disabled) | — | No failover |
| Starter | 2 | — | 5xx errors only |
| Pro | 3 | ✅ | 5xx + 429 |
| Business | 3 | ✅ | 5xx + 429 |
| Scale | 5 | ✅ | 5xx + 429 |
How it works
Request to Provider A fails (5xx / 429)
→ Retry 1: Auto-switch to equivalent model on Provider B
→ Retry 2: Auto-switch to equivalent model on Provider C
→ All providers fail: Return last error response
Check the switch status in response headers:
- X-Gateway-Failover-From: Original provider
- X-Gateway-Failover-To: Switched-to provider
- X-Gateway-Retries: Retry count
Failover exclusions
- 4xx errors (Bad Request, auth errors, etc.): Client-side issues produce the same result on another provider
- 404 model not found: Specified model does not exist
- Only one active provider: No fallback target available
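The retry behavior, including the 4xx exclusion above, can be sketched as a loop over equivalent providers. This is illustrative only: `send` stands in for the actual proxy call, and all names are hypothetical.

```python
def failover(providers, send, max_retries=3):
    """Try providers in order; retry only on 5xx/429, never on other 4xx errors."""
    last = None
    for attempt, provider in enumerate(providers):
        if attempt > max_retries:              # plan's retry budget exhausted
            break
        status, body = send(provider)          # stand-in for the proxied request
        if status < 400:
            return status, body                # success
        if 400 <= status < 500 and status != 429:
            return status, body                # client-side issue: same result elsewhere
        last = (status, body)                  # 5xx or 429: try the next provider
    return last                                # all providers failed: last error response
```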
How to configure
- Register and enable API keys for at least 2 providers
- Open the "Feature Settings" page in the dashboard
- Toggle "Failover" to ON
Failover switches only between models of the same tier. If gpt-4o (flagship) fails, it switches to claude-sonnet-4-20250514 (flagship). It does not switch to gpt-4o-mini (economy).
18. Agent Trace NEW
Agent Trace visualizes every processing step of an API request in a Chrome DevTools-style waterfall timeline. You can see exactly how long each stage took — from auth, PII Shield, cache lookup, provider selection, to the final response — making debugging and performance optimization of AI agents straightforward.
Available Plans
Agent Trace is available on the Scale plan only.
Processing Steps
| Step | Status Examples | Description |
|---|---|---|
| auth | success / error | API key authentication |
| pii_shield | success / skip | Secret Shield PII masking |
| cache_check | exact_hit / semantic_hit / miss | Response cache lookup |
| provider_select | routed / default | Smart Routing decision |
| provider_request | success / error / streaming | Request to AI provider |
| failover | success | Automatic provider failover |
| response | success / streaming | Final response delivered |
Custom Trace ID
Add the x-qzira-trace-id header to your request to correlate traces with your own application logs:
curl https://api.qzira.com/v1/chat/completions \
-H "Authorization: Bearer YOUR_QZIRA_KEY" \
-H "x-qzira-trace-id: my-agent-session-001" \
-H "Content-Type: application/json" \
-d '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"Hello"}]}'
The assigned trace ID is returned in the X-Trace-Id response header.
Data Retention
Trace logs are automatically deleted after 24 hours by a daily CRON job. Traces are not available for export.
Dashboard
- Open Dashboard → Agent Trace
- The waterfall timeline for each request is displayed in the list
- Click a row to expand the step-by-step breakdown with timing
- Rows marked EXT indicate a custom trace ID was supplied
- Rows marked CACHE indicate a cache hit occurred
Notes
- Trace recording uses
waitUntil()— zero impact on response latency - Streaming requests are recorded with
provider_request: streamingstatus - If Secret Shield is disabled, the
pii_shieldstep showsskip
19. Secret Shield NEW
Secret Shield automatically detects and masks sensitive information (PII) in requests before they are sent to AI providers. Define custom regex rules to redact API keys, credit card numbers, personal IDs, and more — the provider never sees the original values.
Available Plans
Secret Shield is available on the Scale plan only.
How It Works
User registers regex rules in the Dashboard
  ↓
API request received (proxy.routes.ts)
  ↓
If Secret Shield is ON → scan & mask request body before sending
  ↓
Masked request sent to AI provider
  ↓
Usage log saved with masked content (original never stored)
Masking Modes
| Mode | Example Input | Example Output |
|---|---|---|
| Full mask | alice@example.com | [REDACTED] |
| Prefix (keep first N chars) | sk-ant-api03-xxxx | sk-ant-*** |
| Suffix (keep last N chars) | 4111-1111-1111-1111 | ****-1111 |
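The three masking modes can be sketched with a single regex-substitution helper. This is illustrative only; qzira's actual rule engine and the `keep` parameter shown here are assumptions.

```python
import re

def mask(text, pattern, mode="full", keep=4):
    """Apply one Secret Shield-style rule to text.
    mode: 'full' -> [REDACTED], 'prefix' -> keep first N chars,
    'suffix' -> keep last N chars."""
    def repl(match):
        s = match.group(0)
        if mode == "prefix":
            return s[:keep] + "***"
        if mode == "suffix":
            return "****" + s[-keep:]
        return "[REDACTED]"
    return re.sub(pattern, repl, text)
```

Usage mirrors the table above: a full-mask email rule, a prefix rule for API keys, and a suffix rule for card numbers.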
Built-in Presets (11 rules)
| Preset | Masking Mode |
|---|---|
| AWS Access Key | Prefix (AKIA***) |
| AWS Secret Key | Prefix |
| OpenAI API Key | Prefix (sk-***) |
| Anthropic API Key | Prefix (sk-ant-***) |
| GCP API Key | Prefix (AIza***) |
| Credit Card Number | Suffix (****-1234) |
| Phone Number (JP) | Suffix |
| Email Address | Full ([REDACTED]) |
| JWT Token | Full ([REDACTED]) |
| My Number (JP) | Full ([REDACTED]) |
| Postal Code (JP) | Full ([REDACTED]) |
AI-Assisted Regex Generation
Describe what you want to mask in plain English (or Japanese), and qzira will generate the regex automatically using Cloudflare Workers AI (Llama 3.1 8B). You can review and edit the generated pattern before saving.
Rule Limits
Up to 20 rules per user. Rules are evaluated in order, and every matching rule is applied to each request.
Dashboard Setup
- Open Dashboard → Secret Shield
- Click Add Rule and describe what to mask (e.g. "employee ID starting with EMP")
- Click ✨ Generate Regex or enter the pattern manually
- Choose masking mode (Full / Prefix / Suffix) and save
- Toggle Secret Shield ON at the top of the page
- Use the Test field to verify masking behavior before going live
Security Notes
- Regex patterns are stored server-side and are not displayed after saving — delete and recreate to update a pattern
- Secret Shield operates as fail-open: if an error occurs during masking, the original request continues unmodified (masking failure does not block the API call)
- Masked content is logged in usage_logs — the original value is never stored
FAQ
- Can I use Secret Shield without Smart Routing?
- Yes. Secret Shield and Smart Routing are independent features that can be toggled separately.
- Does Secret Shield affect latency?
- Masking adds minimal overhead (typically under 5ms) and does not block the request on error.
- Can I view a saved rule's regex pattern?
- No. For security reasons, patterns are not displayed after saving. To update a pattern, delete the rule and create a new one.
Responsibility & Disclaimer
qzira is a BYOK (Bring Your Own Key) API gateway. We clearly define the responsibility boundaries for each area.
Responsibility matrix
| Area | qzira's responsibility | User's responsibility |
|---|---|---|
| API key management | Encrypted storage, safe handling during proxy | Obtaining, managing, and renewing each provider's API keys |
| Request relay | Accurate proxy processing, format conversion | Appropriateness of request content, compliance with provider terms |
| Cost management | Usage visibility, budget alerts, auto-stop feature provision | Budget configuration, proper operation of cost management features |
| AI-generated content | — (not involved) | Reviewing generated content, usage decisions, third-party impact |
| Provider outages | Impact mitigation via failover (supported plans only) | Provider SLA and outages are the provider's responsibility |
| Data protection | Encrypted communications, proper log management | Managing transmitted data content (PII, confidential info handling) |
| AI agents | — (not involved) | Configuring, monitoring, and controlling AI agents |
| Provider terms | — (not involved) | Compliance with each provider's Terms of Service and policies |
Limitations of cost management features
qzira's budget alerts and auto-stop features are designed to assist cost management. Please note:
- Budget alerts are request-count-based and may differ from actual provider billing amounts
- Auto-stop is determined on the next incoming request — requests already in progress are not stopped
- Alert/stop triggering may be delayed during network delays or system failures
- Cost management features are provided on a "best-effort" basis
- Usage aggregation updates at intervals — spending may exceed the configured limit depending on timing
Protection level by plan
| Protection feature | Free | Starter | Pro | Business | Scale |
|---|---|---|---|---|---|
| Monthly request limit | ✅ | ✅ | ✅ | ✅ | ✅ |
| Auto retry | — | ✅ | ✅ | ✅ | ✅ |
| Failover | — | ✅ | ✅ | ✅ | ✅ |
| Budget alert notifications | — | ✅ | ✅ | ✅ | ✅ |
| Daily request limit | — | ✅ | ✅ | ✅ | ✅ |
| Budget limit & auto-stop | — | — | ✅ | ✅ | ✅ |
| Response cache | — | — | ✅ | ✅ | ✅ |
| Semantic cache | — | — | — | ✅ | ✅ |
| Per-key budget | — | — | — | ✅ | ✅ |
| Access control (Per-Key) NEW | — | — | ✅ | ✅ | ✅ |
| Smart Routing NEW | — | — | — | — | ✅ |
| Priority support | — | — | — | ✅ | ✅ |
Notes for AI agent usage
AI agents (Cursor, Claude Code, Devin, etc.) autonomously send API requests. Agent behavior is the user's responsibility. Recommended safeguards:
- Set a daily request limit (Starter and above)
- Enable auto-stop (Pro and above)
- Check usage in the dashboard regularly
- Be especially careful with overnight automated tasks
SLA (Service Level)
qzira strives to maintain availability but does not currently offer SLA guarantees. The service may be temporarily suspended without prior notice to ensure infrastructure safety. See the Terms of Service for details.
FAQ
Pricing
Q. Is there a time limit on the Free plan?
No, the Free plan has no time limit. Up to 1,000 requests/month with 1 active provider.
Q. Are there costs beyond qzira's fees?
Yes. qzira is BYOK, so usage fees for each AI provider (OpenAI, Anthropic, Google AI, DeepSeek) are separate. qzira's monthly fee covers the gateway service itself.
Q. Can I change plans at any time?
Yes, you can upgrade or downgrade at any time from the dashboard. Upgrades take effect immediately; downgrades take effect at the end of the current billing period. Note that downgrades are limited to once per month.
Security
Q. How are my registered API keys protected?
API keys are stored encrypted. All communication is encrypted over HTTPS (TLS 1.3), and qzira staff cannot view API keys in plaintext.
Q. Does qzira use request content for model training?
No. qzira does not use request content for AI model training or service improvement. Data is used solely for request relay and usage measurement.
Q. Where is my data stored?
qzira's infrastructure runs on Cloudflare's global network. User data is stored in Cloudflare data centers.
Q. What should I do if my API key is leaked?
Immediately rotate (regenerate) the key from the "API Keys" page in the dashboard. The old key is invalidated instantly. See Section 11 for details.
Service
Q. If qzira goes down, will I lose API access?
Requests via qzira will be affected. You can fall back immediately by reverting the Base URL and API key to their original values, switching to direct provider access.
Q. What happens if I specify an unsupported model?
You will receive a 400 Invalid model error. Check the supported model list in the Code Migration section.
Q. Is streaming available for all providers?
Yes. SSE-format streaming is supported for OpenAI, Anthropic, Google AI, and DeepSeek. Available on all plans.
Q. What should I do if I hit a rate limit?
If a qzira rate limit (429 error) occurs, wait a moment and retry. Starter plans and above have automatic retry handled by qzira.
Q. Are blocked requests from access control billed?
No. Access control checks happen before sending to the provider API, so blocked requests do not incur provider charges and are not counted toward qzira usage.
Q. Does access control apply to both Chat Completions API and Responses API?
Yes. Access control is checked before request routing, so it applies to both Chat Completions API (/v1/chat/completions) and Responses API (/v1/responses).
Q. What's the difference between semantic cache and response cache?
Response cache (Section 12) only hits when requests are completely identical. Semantic cache (Section 15) uses AI vector search to detect and reuse cached responses for semantically similar requests even if phrasing differs. Using both maximizes token savings.
Q. Can semantic cache return incorrect responses?
Setting the similarity threshold appropriately minimizes false hits. The recommended value is 0.92. For precision-critical use cases, set 0.95 or higher. Additionally, streaming requests and multi-turn conversations are automatically excluded, so it can be used safely in agentic interactive scenarios.
Q. How long are request logs retained?
Logs are automatically retained based on your plan: Free (3 days), Starter (30 days), Pro (30 days), Business (90 days), Scale (365 days). Logs past the retention period are automatically deleted and cannot be recovered. Use CSV export for long-term storage.
Q. Can I export usage data?
Yes. You can export as CSV from the "Request Logs" section of the dashboard. Includes model, provider, token counts, latency, and more. Available on all plans.
Cancellation
Q. What happens to my data when I cancel?
Deleting your account removes all registered API keys, usage history, and settings. This action cannot be undone.
Q. Will I be charged after cancellation?
For paid plans, cancellation downgrades you to the Free plan at the end of the current billing period. Pro-rated refunds are not available.
Smart Routing & Failover
Q. What's the difference between Smart Routing and failover?
Smart Routing is optimal distribution during normal operation — it routes requests to the best provider based on cost, latency, or load balancing (Scale only). Failover is an emergency switch during outages — it automatically switches to an equivalent model on another provider when 5xx or 429 errors occur (Starter and above). Enabling both means failover kicks in if Smart Routing's chosen provider fails.
Q. Will failover affect my AI agent's behavior?
It might. Because JSON output formats and tool calling specs differ between providers, agents dependent on model-specific behavior may behave unexpectedly. This is why failover is disabled by default. We recommend verifying behavior across multiple providers before enabling.
Tool Calling
Q. Does qzira automatically convert Tool Calling formats between providers?
Yes. qzira accepts OpenAI-format tools / tool_choice parameters and auto-converts to each provider's native format. Tool Calling requests from AI coding tools like Cursor, Cline, and Roo Code work with OpenAI, Anthropic, and Google AI without additional configuration.
Q. Are there unsupported formats for Tool Calling auto-conversion?
Current conversion is based on OpenAI-compatible format (tools array + function definition). If you send Anthropic-native tool_use or Google-native function_declarations format directly, it passes through to that provider, but when switching providers via failover or Smart Routing, conversion may not work correctly. We recommend standardizing on OpenAI-compatible format.
Images & multimodal
Q. Can I send requests with images through qzira?
Yes. qzira passes through request bodies, so image input (Base64 encoded / URL) supported by each model can be sent as-is.
Q. Does qzira auto-convert image formats between providers?
No. There is no auto-conversion of image formats between providers. Sending an OpenAI-format (image_url) request with images to an Anthropic model will result in a format mismatch error. When sending image-containing requests, use the format specified by the target provider. Note that images are also not converted when providers are switched by failover.
Emergency fallback (rollback)
Q. What's the fastest way to recover if qzira has an outage?
Change just these two things to switch to direct provider communication in 10 seconds:
- Base URL: Delete (revert to default) or change to the provider's official URL
- API Key: Replace gw_... with your original provider key (sk-..., etc.)
qzira is designed so that just changing the Base URL enables introduction or rollback — no major SDK or code rewrites required.
Troubleshooting
Common errors
| Error | Cause | Solution |
|---|---|---|
| 401 Unauthorized | Invalid or missing API key | Verify the gw_ prefixed qzira API key. Update to the new key after rotation. |
| 400 Provider not configured | Provider API key not registered | Register the provider API key in the dashboard |
| 400 Invalid model | Specified model name not recognized | Check the supported models list and specify a valid model name |
| 403 Provider not enabled | Provider not enabled | Enable the provider in the dashboard |
| 403 provider_not_allowed | Request violates the API key's provider restriction | Check access restriction settings in the dashboard |
| 403 model_not_allowed | Request violates the API key's model restriction | Confirm the requested model is in the allowed models list |
| 403 Budget Exceeded | Budget limit reached | Raise the limit in the dashboard, or wait for reset |
| 403 Plan limit exceeded | Monthly request limit for the plan reached | Upgrade to a higher plan, or wait for next month's reset |
| 429 Rate Limited | Rate limit reached | Wait a moment and retry (Starter+ plans auto-retry) |
| 502 Bad Gateway | Invalid response from provider | Wait and retry. Failover auto-triggers (Starter and above). |
| 503 Service Unavailable | Provider temporarily unavailable | Failover auto-triggers (Starter+). If it continues, check the provider's status page. |
Immediate rollback
If you experience issues via qzira, just revert Base URL and API key to switch back to direct calls instantly.
# Rollback: qzira → direct call
from openai import OpenAI

client = OpenAI(
    api_key="sk-xxxxxxxx",  # Revert to original key
    # Remove base_url (reverts to default)
)
If managing via environment variables, just switch the values in your .env file:
# QZIRA_API_KEY=gw_xxxxxxxx ← Comment out to roll back
# QZIRA_BASE_URL=https://api.qzira.com/v1
OPENAI_API_KEY=sk-xxxxxxxx # ← Original key works as-is
Legal Documents
Legal documents related to qzira use are available at the links below.
| Document | Summary | Link |
|---|---|---|
| Terms of Service | Service conditions, scope of responsibility, disclaimer (22 articles) | Read Terms of Service |
| Privacy Policy | Personal information handling, data protection policy (13 articles) | Read Privacy Policy |
| Specified Commercial Transactions Act | Business operator information, sales terms (Japanese law requirement) | Read Tokushoho |
Key Terms of Service provisions
Support
For questions and bug reports, please contact us through the following:
- Contact form: https://138io.com/contact/
- Email: [email protected]
- Operator: 138data
Legal Disclaimer
Important
Please read the following disclaimer carefully before using this service and documentation.
Legal Disclaimer (English)
1. Service Provided "As-Is"
To the maximum extent permitted by applicable law, this documentation and the qzira service are provided on an "as-is" and "as-available" basis without warranties of any kind, whether express, implied, or statutory, including but not limited to implied warranties of merchantability, fitness for a particular purpose, and non-infringement. The information contained herein reflects the state of the service as of February 17, 2026 and may not reflect subsequent changes.
2. Third-Party Dependency and Provider Risks
qzira operates as an API gateway that routes requests to third-party AI providers including OpenAI, Anthropic, and Google. Core features — including but not limited to chat completions, streaming, Tool Calling (Function Calling), failover, smart routing, response caching, and semantic caching — depend on the continued availability and compatibility of these providers' APIs. Provider-side changes such as API specification updates, model deprecation, pricing modifications, rate limit adjustments, or service discontinuation may affect qzira's functionality without prior notice. Users are solely responsible for compliance with each provider's Terms of Service, Acceptable Use Policy, and data handling policies.
3. Limitation of Liability
To the maximum extent permitted by applicable law, qzira and its operator, 138data, shall not be held liable for any direct, indirect, incidental, consequential, or special damages arising from the use of or inability to use the service, including but not limited to unexpected API billing, loss of data, business interruption, or loss of profits, regardless of the theory of liability.
4. Budget Controls and Auto-Stop Disclaimer
qzira provides budget management features including budget alerts, automatic stop (Auto-Stop), and per-API-key budget limits as tools to assist users in managing their API expenditure. These features operate on a best-effort basis and do not guarantee the prevention of all overspending. Factors such as processing delays, concurrent requests, provider-side latency, and exchange rate fluctuations may result in actual charges exceeding configured budgets. Users remain solely responsible for monitoring their own API usage and costs, and the availability of these features does not relieve users of this responsibility.
5. Budget Management Disclaimer
Budget limits are enforced on a best-effort basis. Due to the timing of usage aggregation, actual spending may exceed the configured limit. API charges are incurred under the agreement between the user and each AI provider, and any overage is subject to those terms. Please refer to the usage logs in your dashboard for accurate usage details.
qzira acts solely as an API gateway and control layer and is not the billing entity for API usage charges.
6. Data Handling
qzira does not use API request/response relay data for training purposes. However, data usage policies and opt-out settings for destination providers (OpenAI, Anthropic, Google, etc.) must be confirmed and managed by users themselves. Each provider's data handling policies are outside qzira's control.
7. Recommendation for Verification
Before deploying to production, always check the latest official documentation and pricing pages for each provider. We strongly recommend combining qzira's budget alerts and auto-stop features to prevent unintended billing.