
qzira AI API Gateway
Technical Reference

Manage and unify multiple AI models with a single API key

Version 2.4 | Last updated: 2026-02-17

qzira is an AI API gateway that lets you manage multiple AI providers — OpenAI, Anthropic, Google AI, and DeepSeek — through a single unified endpoint.

BYOK (Bring Your Own Key) — use your existing API keys as-is. qzira handles request proxying, usage visibility, failover, and rate limiting automatically.

Why use qzira?

  • Cost visibility: Monitor request counts and token consumption in real time via the dashboard
  • Failover: Automatically switch to another provider when one goes down
  • Auto retry: qzira handles 429 retries on your behalf
  • Unified key management: Store all provider keys in qzira; your app only needs one gw_ key
  • Budget alerts & auto-stop: Prevent runaway AI agent costs
  • Usage export: Download logs as CSV for reporting or expense tracking NEW

Architecture overview

qzira sits as a proxy between your application and AI providers. One gw_ key grants access to all providers.

[Architecture diagram] Your app (Cursor / Cline / SDK) sends requests over HTTPS using a single gw_xxxxxxxx key to the qzira Gateway (🔑 API key auth & management, 💰 budget control & auto-stop, 📊 cache & logs, 🔄 failover, 🛡️ rate limit & retry), which forwards them to OpenAI (your key: sk-proj-...), Anthropic (your key: sk-ant-...), and Google AI (your key: AIza...). BYOK: each provider key belongs to you.
ℹ️ qzira is a proxy. Requests pass through qzira's servers and are forwarded to the provider. qzira does not issue its own AI keys — it securely manages and uses the keys you bring from each provider (BYOK).

⚠️ About This Documentation (As-Is)
This documentation is provided on an as-is basis. AI services evolve rapidly, and each provider's API specifications, pricing, and limits may change without notice. The content reflects information at the time of writing and does not permanently guarantee future behavior. Always check each provider's official documentation for the latest information.

1. Account Setup

1. Access the dashboard

Go to https://app.qzira.com.

2. Sign in with Google

Click "Sign in with Google" and authenticate with your Google account.

ℹ️ You will be asked to agree to the Terms of Service when signing in for the first time.

2. Create API Key

After logging in, click "API Keys" in the sidebar.

1. Create a new API key

Click "Create new API key" and give it a name (e.g., my-app-key, cursor-dev).

2. Copy your API key

Copy the gw_xxxxxxxx key shown immediately after creation and store it securely; it is displayed only once.

⚠️ Important: The API key is shown only once at creation. If lost, issue a new key or use the rotation feature to regenerate it.
💡 This key becomes your unified gateway key for all providers. Regular key rotation is recommended (see Section 11).

3. Register Provider API Keys (BYOK)

Click "Providers" in the sidebar.

Supported Providers

Provider  | Where to get your key               | Key format
OpenAI    | platform.openai.com/api-keys        | sk-...
Anthropic | console.anthropic.com/settings/keys | sk-ant-...
Google AI | aistudio.google.com/apikey          | AIza...
DeepSeek  | platform.deepseek.com/api_keys      | sk-...

Registration steps

  1. Click "Register API key" for the provider you want to use
  2. Enter your API key
  3. Click "Register" — qzira will automatically validate the key
🔒 Registered API keys are stored encrypted and used only for proxying requests. All plans support registering keys for all four providers.

4. Enable Providers

Registering a provider key alone does not enable routing through that provider. You also need to enable it.

How to enable

In the Providers screen, click the "Enable" button for the provider you want to use.

Simultaneous active providers by plan

Plan          | Max active providers
Free          | 1
Starter       | 3
Pro and above | Unlimited
💡 On the Free plan, only one provider can be active at a time. To switch providers, enable a new one and it will automatically replace the current one (no need to re-enter your key).

5. Code Migration

qzira provides an OpenAI-compatible API endpoint. Migration requires only two changes: Base URL and API key.

Endpoint info

Item                             | Value
Base URL                         | https://api.qzira.com/v1
Endpoint (OpenAI-compatible)     | /chat/completions
Endpoint (Anthropic-compatible)  | /v1/messages NEW
Authentication                   | Authorization: Bearer gw_xxxxxxxx or x-api-key: gw_xxxxxxxx

Migration example: Python (OpenAI SDK)

Before (direct call):

from openai import OpenAI

client = OpenAI(
    api_key="sk-xxxxxxxx"  # OpenAI API key
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}]
)

After (via qzira):

from openai import OpenAI

client = OpenAI(
    api_key="gw_xxxxxxxx",                  # qzira API key
    base_url="https://api.qzira.com/v1"     # qzira endpoint
)

response = client.chat.completions.create(
    model="gpt-4o",  # model name unchanged
    messages=[{"role": "user", "content": "Hello"}]
)

Migration example: Python (Anthropic SDK → OpenAI-compatible)

Before (direct Anthropic SDK call):

import anthropic

client = anthropic.Anthropic(api_key="sk-ant-xxxxxxxx")

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system="You are a helpful assistant.",
    messages=[{"role": "user", "content": "Hello"}]
)

After (via qzira — OpenAI-compatible format):

from openai import OpenAI

client = OpenAI(
    api_key="gw_xxxxxxxx",
    base_url="https://api.qzira.com/v1"
)

response = client.chat.completions.create(
    model="claude-sonnet-4-20250514",  # specify Claude model directly
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello"}
    ]
)
ℹ️ qzira automatically detects the provider from the model name and converts the request to the appropriate format. system messages are handled automatically.
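The model-prefix routing described in this note can be pictured with a small sketch. The prefix table below is inferred from the model names in this document; it is an illustration, not qzira's actual routing table.

```python
def detect_provider(model: str) -> str:
    """Guess the upstream provider from a model name prefix (illustrative)."""
    prefixes = {
        ("gpt-", "o1", "o3"): "openai",
        ("claude-",): "anthropic",
        ("gemini-",): "google",
        ("deepseek-",): "deepseek",
    }
    for keys, provider in prefixes.items():
        if any(model.startswith(p) for p in keys):
            return provider
    raise ValueError(f"Unknown model: {model}")

print(detect_provider("claude-sonnet-4-20250514"))  # anthropic
print(detect_provider("gemini-2.0-flash"))          # google
```

Because the provider is inferred per request, a single client can mix models from different providers without reconfiguration.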

Migration example: Python (Google Gemini → OpenAI-compatible)

from openai import OpenAI

client = OpenAI(
    api_key="gw_xxxxxxxx",
    base_url="https://api.qzira.com/v1"
)

response = client.chat.completions.create(
    model="gemini-2.0-flash",  # Gemini model name unchanged
    messages=[{"role": "user", "content": "Hello"}]
)

Migration example: JavaScript / TypeScript

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "gw_xxxxxxxx",
  baseURL: "https://api.qzira.com/v1",
});

const response = await client.chat.completions.create({
  model: "claude-sonnet-4-20250514",
  messages: [{ role: "user", content: "Hello" }],
});

Managing keys with environment variables (recommended)

.env file:

QZIRA_API_KEY=gw_xxxxxxxx
QZIRA_BASE_URL=https://api.qzira.com/v1

Python:

import os
from openai import OpenAI

# Load variables from the .env file (requires the python-dotenv package);
# skip this if your shell or platform already exports them.
from dotenv import load_dotenv
load_dotenv()

client = OpenAI(
    api_key=os.getenv("QZIRA_API_KEY"),
    base_url=os.getenv("QZIRA_BASE_URL")
)

Streaming support

qzira supports SSE (Server-Sent Events) streaming for all providers.

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True
)

for chunk in stream:
    # Some chunks (e.g., a final usage chunk) may carry no choices
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

Migration example: Anthropic SDK (native format) NEW

If you use the Anthropic SDK directly, you can use qzira's /v1/messages endpoint in native format.

⚠️ Set base_url to https://api.qzira.com (without /v1). The SDK automatically appends /v1/messages.

Before (direct call):

import anthropic

client = anthropic.Anthropic(
    api_key="sk-ant-xxxxxxxx"  # Anthropic API key
)

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}]
)

After (via qzira — only 2 lines change):

import anthropic

client = anthropic.Anthropic(
    api_key="gw_xxxxxxxx",       # qzira API key
    base_url="https://api.qzira.com"  # ⚠️ no /v1
)

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}]
)

Migration example: TypeScript / JavaScript (Anthropic SDK)

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  apiKey: "gw_xxxxxxxx",        // qzira API key
  baseURL: "https://api.qzira.com"  // ⚠️ no /v1
});

const message = await client.messages.create({
  model: "claude-sonnet-4-20250514",
  max_tokens: 1024,
  messages: [{ role: "user", content: "Hello" }]
});

Supported models (major examples)

Provider  | Example models
OpenAI    | gpt-4o, gpt-4o-mini, gpt-4-turbo, o1, o3-mini
Anthropic | claude-sonnet-4-20250514, claude-3-5-haiku-20241022, claude-3-opus-20240229
Google AI | gemini-2.0-flash, gemini-2.5-flash, gemini-2.5-pro
DeepSeek  | deepseek-chat, deepseek-reasoner
💡 OpenAI's o1/o3 reasoning models are also supported; specify the model name directly. Tool Calling (Function Calling) is supported for all four providers.

6. Testing with curl

You can test qzira instantly using curl commands.

OpenAI model

curl -X POST https://api.qzira.com/v1/chat/completions \
  -H "Authorization: Bearer gw_xxxxxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello"}]
  }'

Anthropic model

curl -X POST https://api.qzira.com/v1/chat/completions \
  -H "Authorization: Bearer gw_xxxxxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-3-5-haiku-20241022",
    "messages": [{"role": "user", "content": "Hello"}]
  }'

Google AI model

curl -X POST https://api.qzira.com/v1/chat/completions \
  -H "Authorization: Bearer gw_xxxxxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-2.0-flash",
    "messages": [{"role": "user", "content": "Hello"}]
  }'

Anthropic native format (/v1/messages) NEW

curl -X POST https://api.qzira.com/v1/messages \
  -H "x-api-key: gw_xxxxxxxx" \
  -H "Content-Type: application/json" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-sonnet-4-20250514",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Hello"}]
  }'

/v1/messages streaming

curl -N -X POST https://api.qzira.com/v1/messages \
  -H "x-api-key: gw_xxxxxxxx" \
  -H "Content-Type: application/json" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-sonnet-4-20250514",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Count from 1 to 5"}],
    "stream": true
  }'

Streaming test

curl -N -X POST https://api.qzira.com/v1/chat/completions \
  -H "Authorization: Bearer gw_xxxxxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Count from 1 to 5"}],
    "stream": true
  }'

PowerShell (Windows)

$headers = @{
  "Authorization" = "Bearer gw_xxxxxxxx"
  "Content-Type"  = "application/json"
}
$body = '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"Hello"}]}'

Invoke-RestMethod -Uri "https://api.qzira.com/v1/chat/completions" `
  -Method POST -Headers $headers -Body $body

Successful response example

{
  "id": "chatcmpl-xxxxx",
  "object": "chat.completion",
  "model": "gpt-4o-mini",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 8,
    "completion_tokens": 9,
    "total_tokens": 17
  },
  "provider": "openai"
}
💡 The response includes a provider field so you can see which provider handled the request.
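When calling the endpoint directly (e.g., with requests) rather than through the typed SDK, the extra provider field can be read straight from the JSON body. A minimal sketch using the sample response above; only the parsing is shown, the request layer is assumed:

```python
import json

# A response like the example above, parsed as plain JSON
# (e.g., from `requests.post(...).json()`).
raw = json.loads("""
{
  "id": "chatcmpl-xxxxx",
  "object": "chat.completion",
  "model": "gpt-4o-mini",
  "choices": [{"index": 0,
               "message": {"role": "assistant", "content": "Hello!"},
               "finish_reason": "stop"}],
  "usage": {"prompt_tokens": 8, "completion_tokens": 9, "total_tokens": 17},
  "provider": "openai"
}
""")

# The extra "provider" field rides alongside the standard OpenAI fields
print(raw["provider"], raw["usage"]["total_tokens"])  # openai 17
```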

Tool Calling (Function Calling) NEW

qzira supports Tool Calling for all four providers. Simply send the OpenAI-compatible tools parameter and qzira relays it in the appropriate format for each provider.

Tool Calling support by provider

Provider        | Method                                                        | Status
OpenAI          | Pass-through (native OpenAI format)                           | ✅ Supported
Anthropic       | Pass-through (SDK-handled)                                    | ✅ Supported
DeepSeek        | Pass-through (OpenAI-compatible)                              | ✅ Supported
Google (Gemini) | Auto-conversion (OpenAI tools → Gemini functionDeclarations)  | ✅ Supported
💡 Gemini auto-conversion: When a Gemini model is specified, qzira automatically converts OpenAI-format tools to Gemini-format functionDeclarations, and converts the response's functionCall back to tool_calls. No changes needed on your end.

Tool Calling test (curl)

OpenAI model

curl -X POST https://api.qzira.com/v1/chat/completions \
  -H "Authorization: Bearer gw_xxxxxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      {"role": "user", "content": "What is the weather in Tokyo?"}
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get current weather for a location",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {
                "type": "string",
                "description": "City and country, e.g. Tokyo, Japan"
              }
            },
            "required": ["location"]
          }
        }
      }
    ]
  }'

Gemini model (auto-converted)

# Same tools format as OpenAI — qzira auto-converts
curl -X POST https://api.qzira.com/v1/chat/completions \
  -H "Authorization: Bearer gw_xxxxxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-2.0-flash",
    "messages": [
      {"role": "user", "content": "What is the weather in Osaka?"}
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get current weather for a location",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {
                "type": "string",
                "description": "City and country, e.g. Osaka, Japan"
              }
            },
            "required": ["location"]
          }
        }
      }
    ]
  }'

Tool Calling response example

{
  "id": "chatcmpl-xxxxx",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": [
          {
            "id": "call_xxxxx",
            "type": "function",
            "function": {
              "name": "get_weather",
              "arguments": "{\"location\": \"Tokyo, Japan\"}"
            }
          }
        ]
      },
      "finish_reason": "tool_calls"
    }
  ],
  "usage": {
    "prompt_tokens": 53,
    "completion_tokens": 6,
    "total_tokens": 59
  },
  "provider": "openai"
}

tool_choice options

Value                                          | Behavior                             | Gemini conversion
"auto" (default)                               | Model decides whether to call a tool | AUTO
"required"                                     | Model must call at least one tool    | ANY
"none"                                         | No tool calls                        | NONE
{"type":"function","function":{"name":"xxx"}}  | Call a specific tool                 | ANY + allowedFunctionNames
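The mapping in the table above can be sketched as a small conversion function. The functionCallingConfig field names follow Gemini's API, but the function itself is illustrative, not qzira's actual code.

```python
def to_gemini_tool_config(tool_choice):
    """Convert an OpenAI-style tool_choice to a Gemini toolConfig (sketch)."""
    if tool_choice in (None, "auto"):
        return {"functionCallingConfig": {"mode": "AUTO"}}
    if tool_choice == "required":
        return {"functionCallingConfig": {"mode": "ANY"}}
    if tool_choice == "none":
        return {"functionCallingConfig": {"mode": "NONE"}}
    if isinstance(tool_choice, dict):
        # {"type": "function", "function": {"name": "xxx"}}
        name = tool_choice["function"]["name"]
        return {"functionCallingConfig": {"mode": "ANY",
                                          "allowedFunctionNames": [name]}}
    raise ValueError(f"Unsupported tool_choice: {tool_choice!r}")

print(to_gemini_tool_config("required"))
```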

Streaming with Tool Calling

Tool Calling works correctly with streaming ("stream": true). Chunk structure when a tool is called:

# Chunk 1: role
data: {"choices":[{"delta":{"role":"assistant"},...}]}

# Chunk 2: tool_calls (function name + arguments)
data: {"choices":[{"delta":{"tool_calls":[{"index":0,"id":"call_xxx","type":"function","function":{"name":"get_weather","arguments":"{...}"}}]},...}]}

# Chunk 3: done
data: {"choices":[{"delta":{},"finish_reason":"tool_calls",...}]}

# Chunk 4: usage
data: {"usage":{"prompt_tokens":24,"completion_tokens":6}}

data: [DONE]
⚠️ Note: If response caching is enabled, an old cached response (without tools) from before Tool Calling was added may be returned. If results are unexpected, slightly modify the message content to bypass the cache.
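In real streams the arguments string often arrives split across several chunks, so clients typically accumulate tool-call deltas by index. A minimal sketch over simplified sample chunks (not live output):

```python
import json

# Simplified sample chunks shaped like the deltas shown above
chunks = [
    {"choices": [{"delta": {"role": "assistant"}}]},
    {"choices": [{"delta": {"tool_calls": [
        {"index": 0, "id": "call_xxx", "type": "function",
         "function": {"name": "get_weather", "arguments": "{\"loca"}}]}}]},
    {"choices": [{"delta": {"tool_calls": [
        {"index": 0, "function": {"arguments": "tion\": \"Tokyo\"}"}}]}}]},
    {"choices": [{"delta": {}, "finish_reason": "tool_calls"}]},
]

calls = {}  # tool-call index -> accumulated call
for chunk in chunks:
    for delta_call in chunk["choices"][0]["delta"].get("tool_calls", []):
        call = calls.setdefault(delta_call["index"],
                                {"id": None, "name": None, "arguments": ""})
        if "id" in delta_call:
            call["id"] = delta_call["id"]
        fn = delta_call.get("function", {})
        if "name" in fn:
            call["name"] = fn["name"]
        call["arguments"] += fn.get("arguments", "")  # concatenate fragments

args = json.loads(calls[0]["arguments"])
print(calls[0]["name"], args)  # get_weather {'location': 'Tokyo'}
```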

7. Cursor / AI Agent Integration

Cursor (Verified)

You can add qzira as an OpenAI-compatible API provider in Cursor's settings. Manage all OpenAI, Anthropic, and Google models with a single API key.

⚠️ Cursor Pro or higher is required. The Free tier does not support BYOK (Bring Your Own Key), so custom API keys and Base URLs cannot be configured.

Prerequisites

Step 1: Set up qzira

  1. Log in to app.qzira.com
  2. Under "Providers" in the sidebar, enable the providers you want:
    • OpenAI: for GPT-4o, GPT-4o-mini, etc.
    • Anthropic: for Claude Sonnet, Claude Haiku, etc.
    • Google AI: for Gemini 2.0 Flash, Gemini 2.5 Pro, etc.
    • DeepSeek: for DeepSeek V3 (deepseek-chat), DeepSeek R1 (deepseek-reasoner)
  3. Under "API Keys," create a key for Cursor (e.g., cursor-dev)
  4. Copy the displayed gw_xxxxxxxx key (⚠️ shown only once)

Step 2: Configure Cursor

  1. Open Cursor and go to Settings → Models
  2. Enter your qzira API key in OpenAI API Key:
    gw_xxxxxxxx
  3. Enter the following in Override OpenAI Base URL:
    https://api.qzira.com/v1
  4. Manually add the models you want (click "+ Add model"):
    • gpt-4o-mini (OpenAI)
    • gpt-4o (OpenAI)
    • claude-sonnet-4-20250514 (Anthropic)
    • gemini-2.0-flash (Google)
    • gemini-2.5-pro (Google)
  5. Toggle the added models ON
ℹ️ Models preset in Cursor like claude-3.5-sonnet use Cursor's built-in API. To use them via qzira, add the exact model name manually as described above.

Verification checklist

  • Select model: In Cursor chat or editor, select a manually added model (e.g., gemini-2.0-flash)
  • Test send: Send a simple prompt ("Hello") and confirm a response is returned
  • Check dashboard: Confirm the request is logged at app.qzira.com
  • Check provider: Confirm the correct provider name appears in the log's provider column

Troubleshooting

Symptom                       | Cause                              | Solution
401 Unauthorized              | Invalid or mistyped API key        | Check key status in qzira dashboard; create a new key if needed
403 Provider not enabled      | Provider not enabled               | Dashboard → Providers → enable the provider and set its API key
403 Plan upgrade required     | Feature not available on Free plan | Upgrade qzira plan to Pro or higher
Model not visible in selector | Model not added or toggle is OFF   | Settings → Models → "+ Add model" then toggle ON
No response / timeout         | Base URL misconfigured             | Confirm https://api.qzira.com/v1 is entered correctly (no trailing slash)
💡 Using separate API keys per agent isolates Cursor's usage from other tools. Combined with per-key budgets (Section 13), you can prevent runaway Cursor costs.

Claude Code NEW

Claude Code switches to qzira by setting environment variables. All requests are logged in the dashboard, and budget controls and auto-stop apply.

⚠️ Set ANTHROPIC_BASE_URL to https://api.qzira.com (without /v1). Claude Code automatically appends /v1/messages.

Linux / macOS

# Set environment variables
export ANTHROPIC_BASE_URL="https://api.qzira.com"
export ANTHROPIC_API_KEY="gw_xxxxxxxx"

# Start Claude Code
claude

PowerShell (Windows)

# Set environment variables
$env:ANTHROPIC_BASE_URL = "https://api.qzira.com"
$env:ANTHROPIC_API_KEY = "gw_xxxxxxxx"

# Start Claude Code
claude

Persist via ~/.bashrc or ~/.zshrc

# Add to ~/.bashrc or ~/.zshrc
export ANTHROPIC_BASE_URL="https://api.qzira.com"
export ANTHROPIC_API_KEY="gw_xxxxxxxx"

Verification

  • First launch: The Anthropic OAuth screen may appear (select your organization). This is Claude Code's own auth and is separate from qzira.
  • Auth conflict warning: "Using ANTHROPIC_API_KEY instead of Anthropic Console key" means the qzira key is being used ✅
  • Automatic model switching: Claude Code automatically uses Haiku (lightweight tasks) and Sonnet (responses). All requests are logged in the dashboard.
  • Dashboard check: Confirm requests with provider: anthropic appear in the usage log at app.qzira.com.

Cline (VS Code extension)

In Cline's settings, change "API Provider" to "OpenAI Compatible" and enter:

  • Base URL: https://api.qzira.com/v1
  • API Key: gw_xxxxxxxx
  • Model: the model name you want (e.g., claude-sonnet-4-20250514)

Windsurf

⚠️ Windsurf does not currently support qzira integration.

Windsurf's BYOK feature only supports Claude 4 Sonnet / Opus, and API keys are set in the Windsurf management panel (windsurf.com/subscription/provider-api-keys). There is no custom Base URL or OpenAI Compatible provider setting, so routing through an API gateway like qzira is not possible.

Source: Windsurf Docs — AI Models (verified February 2026)

Roo Code (VS Code extension)

Roo Code has been verified to work in both OpenAI Compatible mode and Anthropic mode (v3.47.3).

Method A: OpenAI Compatible mode (recommended)

  • API Provider: OpenAI Compatible
  • Base URL: https://api.qzira.com/v1
  • API Key: gw_xxxxxxxx
  • Model: gpt-4o-mini (or gpt-4o)

Method B: Anthropic mode

  • API Provider: Anthropic
  • ☑ Check "Use custom base URL"
  • Base URL: https://api.qzira.com (⚠️ no /v1)
  • API Key: gw_xxxxxxxx
  • Model: claude-sonnet-4-20250514

Other AI agents

Any tool that supports configuring an OpenAI-compatible Base URL and API key can be integrated the same way.

Tool         | Where to configure
Cursor       | Settings → Models → OpenAI API Key
Claude Code  | Env vars: ANTHROPIC_BASE_URL=https://api.qzira.com (no /v1) + ANTHROPIC_API_KEY=gw_xxx
Cline        | API Provider → OpenAI Compatible
Windsurf     | ❌ No custom Base URL support (BYOK limited to Claude, URL not configurable)
Roo Code     | API Provider → OpenAI Compatible or Anthropic (custom base URL)
Continue.dev | apiBase in config.json
Aider        | --openai-api-base option
LangChain    | base_url parameter

Tool × Provider Compatibility Matrix

Verified compatibility of AI coding tools via qzira (as of February 2026).

Tool        | OpenAI        | Anthropic | Google AI
curl        | Supported     | Supported | Supported
Claude Code | N/A           | Supported | N/A (Anthropic-only tool by design)
Cline       | Supported     | Supported | ⚠️ Text generation only
Roo Code    | Supported     | Supported | ⚠️ Text generation only
Cursor ※    | Supported     | Supported | ⚠️ Text generation only
Windsurf    | Not supported | Not supported | Not supported (no custom Base URL support)

※ Cursor requires Cursor Pro or higher (the Free plan ignores custom settings).
Legend: Supported / ⚠️ Partial / Not supported / N/A (tool limitation)
⚠️ Google AI "Text generation only": text generation via qzira → Google AI works, but Tool Calling (function calling) is not yet supported. AI coding tools rely heavily on Tool Calling, so practical use is limited.
ℹ️ The above reflects verification results for specific versions as of February 2026. Tool or provider updates may change compatibility. Check each tool's official documentation for the latest status.
💡 Try it free on the Free plan. Send feedback and tool reports to @qzira_dev.

8. Monitoring Usage in the Dashboard

Log in at https://app.qzira.com/dashboard to see the following in real time.

Dashboard overview

  • This month's requests: Total monthly request count (vs. plan limit)
  • Usage rate: Percentage of plan limit used
  • Input / output tokens: Total token consumption
  • Daily request graph: Visualizes historical request trends

Recent requests

Details for each request:

Field              | Description
API key name       | Name of the API key used for the request NEW
Model              | Model name used (e.g., claude-sonnet-4-20250514)
Provider           | Responding provider (e.g., Anthropic)
Tokens             | Input / output token count
Latency            | Response time (milliseconds)
Status             | Success / Error
Cost NEW           | Estimated cost (USD / JPY switchable). Failed requests show "—"
Tool Calls 🔧 NEW  | Number of tool calls. Click to view function names and arguments
Timestamp          | Request date and time

Usage export (CSV) NEW

Request logs can be downloaded as CSV. Useful for expense reporting and internal analytics.

Click the "Export CSV" button in the "Request Logs" section of the dashboard to download the displayed log data as a CSV file.

CSV columns

Column             | Description
id                 | Log ID
api_key_name       | Name of the API key used
model              | Model name
provider           | Provider name
input_tokens       | Input token count
output_tokens      | Output token count
latency_ms         | Response time (ms)
status             | Status code
created_at         | Request timestamp
estimated_cost_usd | Estimated cost (USD) NEW
estimated_cost_jpy | Estimated cost (JPY) NEW
tool_calls         | Tool call details (JSON format) NEW
ℹ️ CSV export covers data within the log retention period. We recommend exporting regularly.
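As a sketch of working with the export, the columns above can be summarized with the standard csv module. The two data rows below are made-up sample values.

```python
import csv
import io

# Sample export with the documented columns; values are invented for illustration
sample = """\
id,api_key_name,model,provider,input_tokens,output_tokens,latency_ms,status,created_at,estimated_cost_usd,estimated_cost_jpy,tool_calls
1,cursor-dev,gpt-4o-mini,openai,120,45,850,200,2026-02-01T09:00:00Z,0.0002,0.03,
2,my-app-key,claude-3-5-haiku-20241022,anthropic,300,150,1200,200,2026-02-01T09:05:00Z,0.0011,0.17,
"""

rows = list(csv.DictReader(io.StringIO(sample)))
total_tokens = sum(int(r["input_tokens"]) + int(r["output_tokens"]) for r in rows)
total_usd = sum(float(r["estimated_cost_usd"]) for r in rows)
print(total_tokens, round(total_usd, 4))  # 615 0.0013
```

For a real export, replace the inline string with `open("export.csv")`.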

Log retention period NEW

Request logs are automatically retained for a period based on your plan. Logs older than the retention period are automatically deleted daily.

Plan     | Log retention
Free     | 3 days
Starter  | 30 days
Pro      | 30 days
Business | 90 days
Scale    | 365 days
⚠️ Important: Logs past the retention period cannot be recovered. If you need long-term storage, export to CSV within the retention period.

9. Budget Alerts & Auto-Stop

Click "Budget" in the sidebar to configure cost controls (Starter and above).

⚠️ Budget limit enforcement timing
Usage aggregation is not real-time — it updates at intervals. As a result, spending may exceed the configured limit depending on aggregation timing.

This is common behavior in many API environments, and qzira behaves similarly.

qzira keeps aggregation intervals to a few minutes to minimize overage, but instantaneous hard stops are not guaranteed.
⚡ Realtime Budget Stop (Scale plan only)

In addition to the standard KV-based budget check (up to ~5 min delay), Scale plan users can enable instant enforcement via direct D1 query.

Item             | Detail
Plan             | Scale only (visible when Auto-Stop is enabled)
Effect           | Blocks the very next request after the limit is reached; no ~5 min delay
Latency overhead | +15–40ms per request (D1 query)
How to enable    | Dashboard → Budget Settings → Realtime Budget Stop toggle

Budget management modes

qzira supports two budget modes: request-count-based and cost-based (USD). Switch between them in the Budget settings page of the dashboard.

Mode          | Unit                     | Characteristics
Request count | API request count        | Simple: each request counts as 1, regardless of model or token usage
Cost (USD)    | Estimated API cost (USD) | Based on estimated cost from token usage; prevents overuse of expensive models

Configuration options

Setting                    | Description                                       | Available plan
Monthly limit              | Max monthly request count or cost (USD)           | All plans
Daily limit                | Max daily request count or cost (USD)             | Starter and above
Budget alert notifications | Email notification at 50% / 80% / 100%            | Starter and above
Auto-stop                  | Automatically block requests when limit is reached | Pro and above
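The 50% / 80% / 100% alert thresholds can be illustrated with a tiny sketch. The threshold values come from the table above; the function itself is illustrative, not a qzira API.

```python
def fired_alerts(used: float, limit: float) -> list[int]:
    """Return which alert levels (percent of limit) have been reached."""
    return [pct for pct in (50, 80, 100) if used >= limit * pct / 100]

print(fired_alerts(8_500, 10_000))   # [50, 80]
print(fired_alerts(10_000, 10_000))  # [50, 80, 100]
```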

Exchange rate display (USD / JPY) NEW

In cost mode, the USD budget setting also supports JPY display.

Item             | Details
Rate source      | ExchangeRate-API (open.er-api.com), daily updates
Update frequency | Once daily (15:00 UTC)
Cache            | KV store, 48-hour cache
Fallback         | Fixed rate of ¥150/USD if the API fetch fails
Currency switch  | Toggle USD ↔ JPY in the input form
ℹ️ JPY display is for reference only. Exchange rate fluctuations may cause discrepancies between the dashboard's JPY estimate and actual provider billing. The fallback rate (¥150/USD) is a last resort during API outages and may differ significantly from the market rate. Check each provider's dashboard for accurate billing amounts.

About cost estimation

Estimated costs in cost mode are calculated from each request's token usage and each model's published pricing. Please note:

  • Estimated costs are approximations and may not match actual provider billing
  • If a provider changes pricing, there may be a lag before it's reflected
  • Cost estimation accuracy may be lower for some models or special requests (e.g., image input)
⚠️ To guard against AI agents sending large numbers of overnight requests, we recommend setting a daily limit + auto-stop.

10. Plan Upgrade

Click "Plan & Billing" in the sidebar to change your plan.

Plan comparison

Plan     | Monthly | Requests/mo | Active providers | API keys | Log retention
Free     | $0      | 1,000       | 1                | 1        | 3 days
Starter  | $5      | 10,000      | 3                | 2        | 30 days
Pro      | $10     | 100,000     | Unlimited        | 5        | 30 days
Business | $29     | 500,000     | Unlimited        | 50       | 90 days
Scale    | $69     | 3,000,000   | Unlimited        | 100      | 365 days
ℹ️ Prices are shown in USD for reference; billing is in JPY via Stripe, so USD amounts are approximate.

Key features by plan

Feature | Free | Starter | Pro | Business | Scale
Streaming
API key rotation
Usage export (CSV)
Auto retry
Failover
Budget alerts
Budget limit & auto-stop
Response cache
Semantic cache
Per-key budget
Access control (Per-Key) NEW
Smart Routing NEW
Secret Shield NEW
Priority support

11. API Key Rotation NEW

As a security best practice, we recommend rotating (regenerating) API keys regularly. qzira supports one-click rotation from the dashboard.

What is rotation?

When you rotate a key, the existing API key is immediately invalidated and a new gw_ key is issued. The key's ID (internal identifier) and name are preserved, so dashboard usage history and settings carry over.

Rotation steps

  1. Click "API Keys" in the sidebar
  2. Click the "Rotate" button (🔄 icon) for the key you want to rotate
  3. Select "Execute rotation" in the confirmation dialog
  4. The new key is displayed — copy and store it securely immediately
🚨 Important: Rotation immediately invalidates the old key. There is no grace period. Any in-flight requests using the old key will immediately receive 401 Unauthorized.
⚠️ Recommended process: Before rotating, prepare the following:
1. Have a way to receive the new key ready (open your .env file for editing)
2. Execute rotation → immediately copy the new key
3. Update the new key in all apps, agents, and CI/CD pipelines
4. Confirm requests from the new key appear in the dashboard
※ A grace period feature is being considered for future implementation.

When to rotate

  • When a key may have been accidentally exposed (log output, git history, etc.)
  • When a team member leaves or changes role
  • Periodically as a security measure (every 30–90 days recommended)
  • When suspicious requests are detected in the dashboard
💡 If managing API keys via environment variables (.env file), just update the value after rotation — that's it.

Post-rotation update example

# Update .env file
QZIRA_API_KEY=gw_yyyyyyyy    # ← Replace with new key
QZIRA_BASE_URL=https://api.qzira.com/v1

Restart (or redeploy) your application and the new key will be used automatically.

12. Response Cache

Cache AI provider responses for identical requests to speed up subsequent responses and save tokens (Pro and above).

ℹ️ Response caching is disabled by default. Enable it explicitly from the dashboard. LLM response caching can cause unexpected behavior — understand the use case before enabling.

Benefits

  • Cost reduction: Skip provider API calls for identical requests — zero token consumption
  • Faster responses: Returned instantly from KV cache (hundreds of ms → tens of ms)
  • Provider outage mitigation: During cache TTL, you're unaffected by provider outages
  • Zero config: Flip the toggle to enable immediately — no code changes required

Downsides & caveats

  • Streaming requests are not cached
  • Exact match only (for fuzzy matching, use semantic cache)
  • Stale responses may be returned during the TTL period
  • Requests containing PII may also be cached

Good fit vs. poor fit

Good fit                                  | Poor fit
Repeated test / debug runs                | Generating unique creative content each time
Batch processing (same prompt, many runs) | Fetching real-time information
FAQ bots / templated responses            | Streaming agents
Cost minimization use cases               | Requests with heavy PII

Supported plans and TTL

Plan     | Available | Default TTL | Custom TTL
Free     | No        | N/A         | N/A
Starter  | No        | N/A         | N/A
Pro      | Yes       | 1 hour      | Up to 1 hour
Business | Yes       | 24 hours    | Up to 24 hours
Scale    | Yes       | 7 days      | Up to 7 days

How to enable

  1. Click "Cache" in the sidebar
  2. Toggle "Enable cache" to ON
  3. Optionally configure custom TTL or temperature limit

Configuration options

Setting           | Description                                                              | Default
Cache on/off      | Enable or disable response caching                                       | OFF
Custom TTL        | Cache retention duration (seconds); at most the plan's default TTL       | Plan default
Temperature limit | Exclude requests above a specified temperature from caching              | No limit (all requests cached)

How caching works

The following request fields are SHA-256 hashed to detect identical requests:

  • User ID
  • Model name
  • Message content
  • temperature / top_p / max_tokens

On a cache hit, the provider request is skipped and the stored response is returned immediately, saving token consumption.
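As an illustration, an exact-match cache key over these fields might be computed like this. The field selection follows this section, but the exact serialization qzira uses is not documented, so that part is an assumption.

```python
import hashlib
import json

def cache_key(user_id: str, request: dict) -> str:
    """SHA-256 fingerprint over the documented request fields (sketch)."""
    material = {
        "user_id": user_id,
        "model": request.get("model"),
        "messages": request.get("messages"),
        "temperature": request.get("temperature"),
        "top_p": request.get("top_p"),
        "max_tokens": request.get("max_tokens"),
    }
    # Canonical serialization so identical requests hash identically
    blob = json.dumps(material, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()

a = cache_key("user-1", {"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello"}]})
b = cache_key("user-1", {"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello!"}]})
print(a != b)  # True: any difference in content produces a new key
```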

Response headers

When caching is active, the following headers are added to responses:

Header              | Value                            | Description
X-Cache             | EXACT_HIT / SEMANTIC_HIT / MISS  | Exact hit / semantic hit / miss
X-Cache-TTL         | Seconds                          | Applied TTL
X-Semantic-Score    | 0.00–1.00                        | Semantic cache similarity score (on hit only)
X-Cache-Skip-Reason | String                           | Reason caching was skipped (on skip only)

Requests excluded from caching

  • Streaming requests ("stream": true): SSE format is not suited for caching
  • Temperature limit exceeded: Requests above the configured temperature limit
  • Error responses: Provider error responses are not cached
⚠️ Note: Response cache (exact match) only returns a cached result if all parameters are identical. Even a slight difference in message content results in a new request. Use semantic cache (Section 15) for semantically similar requests.
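A client can branch on these headers, for example when logging cache effectiveness. The helper below is a hypothetical sketch; only the header names come from the table above.

```python
def cache_status(headers: dict) -> str:
    """Classify a qzira response from its cache-related headers."""
    status = headers.get("X-Cache", "MISS")
    if status in ("EXACT_HIT", "SEMANTIC_HIT"):
        # On a hit, X-Cache-TTL reports the TTL applied to the entry
        return f"cache hit (TTL {headers.get('X-Cache-TTL', '?')}s)"
    reason = headers.get("X-Cache-Skip-Reason")
    return f"miss ({reason})" if reason else "miss"

print(cache_status({"X-Cache": "EXACT_HIT", "X-Cache-TTL": "3600"}))  # → cache hit (TTL 3600s)
```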

Reading cache statistics

Cache utilization statistics are shown at the bottom of the cache settings page.

All-time statistics

Item | Description
Total hits | Total number of cache hits
Input tokens saved | Total input tokens saved by cache hits
Output tokens saved | Total output tokens saved by cache hits
Unique prompt count | Number of unique prompts stored in cache

This month's statistics

Hit count and token savings for the current month. Resets at the start of each month.

Recent cache hits

A list of recent cache hits. Each entry includes:

  • Provider name / model name
  • Input / output tokens saved
  • Hit count
  • Last hit timestamp
💡 Cache statistics can be reset with the "Clear stats" button. Cached responses are automatically deleted when their TTL expires.

13. Per-Key Budget Settings

Set individual request limits per API key to control specific applications or agents (Business and above).

ℹ️ Per-key budgets are independent from the user-level budget alerts in Section 9. Combining both gives you a double layer of protection: user-wide + per-key.

Supported plans

Plan | Per-key budget
Free | —
Starter | —
Pro | —
Business | ✓
Scale | ✓

Setup steps

  1. Click "API Keys" in the sidebar
  2. Click the wallet icon for the key you want to configure
  3. Set the following in the budget modal:
    • Monthly request limit: Max monthly requests
    • Daily request limit: Max daily requests
    • Auto-stop: Automatically block requests when the limit is reached
  4. Click "Save"

Configuration options

Setting | Description | Required
Monthly request limit | Max monthly requests for this key. Leave blank for unlimited. | Optional
Daily request limit | Max daily requests for this key. Leave blank for unlimited. | Optional
Auto-stop | When ON, requests from this key are automatically blocked when the limit is reached. | Optional

How it works

Checks are performed in the following order on each request:

  1. User-level budget check (Section 9 settings)
  2. Per-key budget check (this section's settings)
  3. Only if both pass, the request is forwarded to the provider

Response when limit is reached:

{
  "error": {
    "message": "API key budget exceeded",
    "type": "budget_exceeded",
    "code": 403
  }
}
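A client that drives autonomous agents may want to pause rather than retry when it sees this error. A minimal detector for the response shape shown above (the helper name is ours):

```python
def is_budget_exceeded(status_code: int, body: dict) -> bool:
    """Match the per-key budget-exceeded error shape: 403 + type budget_exceeded."""
    err = body.get("error") or {}
    return status_code == 403 and err.get("type") == "budget_exceeded"

example = {"error": {"message": "API key budget exceeded",
                     "type": "budget_exceeded", "code": 403}}
print(is_budget_exceeded(403, example))  # → True
```

Unlike a 429, this error will not clear on its own until the daily or monthly window resets, so backing off and alerting a human is usually the right response.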

Notifications

  • Threshold alerts: Email at 50% / 80% (determined by 5-minute cron)
  • Limit reached: Immediate email at 100%

Notification emails include the API key name so you can immediately identify which key hit its limit.

Checking current usage

The budget modal shows daily and monthly progress bars so you can check current usage in real time.

Use case examples

  • Per-agent limits: 100/day for Cursor key, 200/day for Claude Code key
  • Per-project management: Separate budget management for Project A and Project B keys
  • Overnight safety: Set daily limit + auto-stop on autonomous agent keys to prevent runaway
💡 Combining per-key budgets with user-level budgets enables fine-grained controls like "max 10,000 total/month," "Agent A: max 500/day," "Agent B: max 200/day."

14. Access Control (Per-Key)

Whitelist-based controls on which providers and models each API key can access. Control "what can be used" per agent within a team, preventing unintended provider usage or expensive model misuse.

ℹ️ Access control is available on Pro plans and above. Combined with per-key budgets (Section 13), you can implement a double guard of "what can be used" + "how much can be spent."

Supported plans

Plan | Access control
Free | —
Starter | —
Pro | ✓
Business | ✓
Scale | ✓

How to configure

  1. Open Dashboard → API Keys
  2. Click the 🔐 Access Restrictions button for the target key (visible on Pro and above)
  3. Provider restriction: Select "Allow all providers" or specific providers
    • Options: openai / anthropic / google
  4. Model restriction: Select "Allow all models" or enter specific models
    • Example: gpt-4o, gpt-4o-mini, claude-sonnet-4-20250514
  5. Click Save to apply settings
⚠️ Once restrictions are set, requests to disallowed providers or models are immediately rejected with a 403 error. Verify the models your AI agent uses before configuring.

Check priority order

Access control checks are performed in this order:

Request received
  │
  ├─ 1. Provider restriction check (evaluated first)
  │     If allowed_providers is set:
  │     → Target provider not in list → 403 provider_not_allowed
  │
  ├─ 2. Model restriction check (after provider passes)
  │     If allowed_models is set:
  │     → Target model not in list → 403 model_not_allowed
  │
  └─ 3. Both pass → Forward to normal proxy processing

Error responses

Requests that violate access controls return errors in the following format:

Error code | HTTP status | Trigger
provider_not_allowed | 403 | Request to a disallowed provider
model_not_allowed | 403 | Request with a disallowed model

Response examples:

// Provider restriction error
{
  "error": {
    "message": "Provider 'google' is not allowed for this API key",
    "code": "provider_not_allowed"
  }
}

// Model restriction error
{
  "error": {
    "message": "Model 'o1-mini' is not allowed for this API key",
    "code": "model_not_allowed"
  }
}

Use cases

Scenario | Provider setting | Model setting | Effect
Cursor-only key | openai only | All models | Block all non-OpenAI providers
Claude Code-only key | anthropic only | All models | Block all non-Anthropic providers
Cost-restricted key | All providers | gpt-4o-mini, gemini-2.0-flash | Allow low-cost models only
Production-validated key | openai only | gpt-4o, gpt-4o-mini | Limit to validated models only

Default behavior

  • No restrictions set (default): All providers and models are accessible (same as before)
  • Provider restriction only: All models under allowed providers are accessible
  • Model restriction only: Specified models accessible from any provider
  • Both restricted: Only requests satisfying both provider and model conditions pass
💡 Removing restrictions: Click "Remove All Restrictions" to clear all provider and model restrictions, returning to the default (allow all) state.
💡 Combined with budgets: Using access control together with per-key budgets (Section 13) enables precise controls like "OpenAI gpt-4o-mini only, max 500 requests/month" per agent.

15. Semantic Cache

Semantic cache uses AI vector search to detect semantically similar requests and reuse cached responses. Unlike response cache (Section 12) which requires exact matches, semantic cache can hit on requests with different wording but the same intent.

ℹ️ Semantic cache is available on Business plans and above. Using it together with response cache (Section 12) maximizes token savings.

Benefits

  • Handles phrasing variations: Recognizes "What are the benefits of TypeScript?" and "Why should I use TypeScript?" as the same intent
  • Significant cost reduction: Catches requests that exact match misses, reducing token consumption
  • Two-layer optimization: On semantic hits, also saved to exact cache (response cache) automatically
  • Safety-first design: PII detection, short text, and multi-turn requests are automatically excluded

Downsides & caveats

  • False positives possible (mitigated by setting threshold ≥ 0.95)
  • Embedding costs apply (approx. $0.30 for 200K queries/month — very low)
  • Non-streaming, single-turn requests only
  • Vector storage has limits (Business: 10,000 / Scale: 50,000)
  • Eventual consistency (newly cached items may take minutes to become searchable)

Good fit vs. poor fit

Good fit | Poor fit
FAQ / helpdesk (same question, different phrasing) | Unique creative requests each time
Education / learning apps (similar questions recur) | Multi-turn conversations (chatbots)
Repetitive code generation (similar patterns) | Streaming-required agents
Best results when combined with response cache | Requests with heavy PII

Response cache vs. semantic cache

Item | Response Cache (Section 12) | Semantic Cache (this section)
Match method | SHA-256 exact match | AI vector search (cosine similarity)
Supported plans | Pro and above | Business and above
Streaming | Not supported | Not supported
Multi-turn | Supported | Not supported (single-turn only)
Latency | Very low (KV get only) | Slightly higher (embedding generation + vector search)
Header | X-Cache: EXACT_HIT | X-Cache: SEMANTIC_HIT
Storage | KV only | Vectorize + KV

Processing flow

Semantic cache is evaluated after response cache (exact match). If an exact match hits, semantic cache processing is skipped.

Request received
  ↓
Auth + plan check (Business+?) + Feature Toggle ON?
  │ NO → Normal flow (skip semantic cache)
  ↓ YES
Streaming? → YES → Skip (X-Cache-Skip-Reason: streaming)
  ↓ NO
① Exact Cache Lookup (SHA-256, KV get ×1)
  ├─ EXACT_HIT → Return immediately (no embedding needed, fastest)
  └─ MISS ↓
② Exclusion check (PII / short text / multi-turn)
  ├─ Excluded → Normal flow (X-Cache-Skip-Reason shows reason)
  └─ OK ↓
③ Generate embedding (@cf/baai/bge-small-en-v1.5, 384 dims)
  ↓
④ Vectorize search (user_scope + meta_group filter)
  ├─ Score ≥ threshold → Fetch response from KV
  │   ├─ KV success → SEMANTIC_HIT (also save to exact cache)
  │   └─ KV failure → Treat as MISS (X-Cache-Skip-Reason: kv_miss)
  └─ Score < threshold → MISS
       ↓
⑤ Send request to provider
  ↓
⑥ Save response (Vectorize + KV + Exact Cache)
💡 When a semantic cache hit occurs, the response is automatically saved to exact cache as well. If the same request comes again, it will be served from exact cache (KV get only) immediately — even faster.
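The threshold comparison in step ④ is a cosine-similarity check over embedding vectors. The sketch below illustrates only that final decision in pure Python; the real gateway embeds with bge-small-en-v1.5 (384 dimensions) and searches Vectorize.

```python
import math

def cosine_similarity(a: list, b: list) -> float:
    # Cosine similarity: dot product divided by the product of vector norms
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def is_semantic_hit(query_vec, cached_vec, threshold: float = 0.92) -> bool:
    # Hit only when similarity reaches the configured threshold (0.92 is the recommended default)
    return cosine_similarity(query_vec, cached_vec) >= threshold

print(is_semantic_hit([1.0, 0.0], [1.0, 0.0]))  # identical vectors → True
print(is_semantic_hit([1.0, 0.0], [0.0, 1.0]))  # orthogonal vectors → False
```

Raising the threshold toward 0.99 shrinks the set of vectors that pass this check, which is why higher thresholds trade hit rate for precision.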

Supported plans and storage limits

Item | Business ($29/mo) | Scale ($69/mo)
Max vectors | 10,000 | 50,000
Threshold range | 0.90 – 0.99 | 0.85 – 0.99
TTL | 24 hours | 7 days
Embedding model | @cf/baai/bge-small-en-v1.5 (384 dims) | (same)
Metric | Cosine similarity | (same)
ℹ️ When the vector count reaches the limit, new request caching is skipped (existing cache can still be searched). Check current usage on the "Semantic" page in the dashboard.

Automatic cleanup (CRON)

Old vectors past their TTL are deleted by a daily automatic cleanup.

  • Schedule: Daily UTC 15:00
  • Action: Detect and delete expired vectors per user
  • Deletion limit: Max 500 vectors/user/day (5 iterations × 100)
  • Count sync: Deleted vector count is automatically decremented from D1 counters

To manually delete all vectors, use the "Delete all" button on the "Semantic" page in the dashboard.

Dashboard settings

Click "Semantic" in the sidebar to open the semantic cache settings page.

Vector storage

A progress bar at the top shows vector storage usage — current count, limit, and usage percentage. TTL and auto-cleanup info is also shown.

Cache settings

  1. Semantic cache ON/OFF: Toggle to enable/disable
  2. Similarity threshold: Adjust with the slider (recommended: 0.92)
    • Higher → More precise (lower hit rate but fewer false positives)
    • Lower → Higher hit rate (more matches but higher false positive risk)
  3. Click "Save" to apply

Cache statistics

When enabled, cache utilization statistics appear at the bottom of the page: total hits, tokens saved, this month's hits, unique prompt count, and recent semantic hit details.

Threshold guide

Threshold | Characteristics | Recommended for
0.95–0.99 | High precision; only near-identical sentences hit | Mission-critical workloads, finance/medical
0.92 (recommended) | Balanced; hits on sufficiently similar requests | General development, code generation, QA
0.85–0.91 | Wide catch; maximizes cost savings | Batch processing with many similar requests (Scale only)

Auto-exclusion conditions

Condition | Reason | X-Cache-Skip-Reason
Streaming requests | Response is sent incrementally; full response cannot be cached | streaming
Multi-turn conversations | Conversations with assistant/tool roles are highly context-dependent | multi_turn
Short text | Prompts under 30 characters are insufficient for semantic comparison | too_short
PII detected | Requests containing email addresses, phone numbers, API keys, etc. are not cached | pii_detected
Feature Toggle OFF | Semantic cache disabled in features settings | feature_off
Plan not eligible | Free / Starter / Pro plans do not support semantic cache | plan_not_allowed

Response headers

Header | Value | Description
X-Cache | SEMANTIC_HIT | Response returned from semantic cache
X-Semantic-Score | 0.00 – 1.00 | Matched similarity score (on SEMANTIC_HIT only)
X-Cache-Skip-Reason | See exclusion conditions above | Reason caching was skipped
X-Cache-Meta | 8-char hash | meta_group identifier (model/system/temperature combo)

Matching conditions

For a semantic cache hit, all of the following must be true:

  • Same user
  • Same meta_group (model name + system prompt + temperature combination)
  • User message cosine similarity ≥ configured threshold
  • Vector TTL still valid

In other words, a hit only occurs for semantically similar questions using the same model, system prompt, and temperature. Cached responses from a different model or system prompt will never be incorrectly returned.

⚠️ Multi-turn conversation limitation: Semantic cache uses only the last user message in the messages array for similarity matching. The full conversation context is not considered. The same question in different conversation flows may have different intents, so the cache may incorrectly hit. For precision, set the threshold to 0.95 or higher, or consider disabling semantic cache for multi-turn-heavy workloads.

Responses that are not saved

  • HTTP 4xx / 5xx error responses
  • Responses where finish_reason is "length" (truncated output)
  • Responses shorter than 80 characters
💡 Using with Smart Routing: When Smart Routing is active, semantic cache matching uses the post-routing model name. For example, if gpt-4o is routed to claude-sonnet-4-20250514, cache matching is done on claude-sonnet-4-20250514.
ℹ️ Vectorize behavior: It may take up to a few minutes for newly cached vectors to become searchable (eventual consistency). This is a Cloudflare Vectorize specification.

16. Smart Routing NEW

Smart Routing automatically routes requested models to the optimal provider model (Scale plan only). Efficiently leverage multiple providers for cost optimization or latency improvement.

⚠️ Smart Routing is disabled by default. If your AI agent depends on model-specific behavior (JSON output format, tool calling specs, etc.), switching models may cause unexpected results. Understand the use case thoroughly before enabling.
ℹ️ How routing works: Smart Routing selects an alternative provider based on a predefined model mapping table. It does not automatically assess model quality (tier classification). Failover (Section 17) also follows the same mapping table order. If a model not in the mapping is specified, routing does not trigger and the request goes directly to the original provider.

Three routing strategies

Strategy | Description | Recommended for
Cost optimization (default) | Auto-selects the cheapest provider among equivalent models | Batch processing, high-volume requests, cost-sensitive workloads
Latency optimization | Prioritizes the fastest-responding provider | Real-time responses, chatbots, user-facing apps
Round-robin | Distributes requests evenly across all active providers | Rate limit distribution, provider load balancing

Benefits

  • Cost optimization: Auto-switch to cheaper equivalent providers (up to 30-50% savings possible)
  • Rate limit avoidance: Distributing across providers reduces throttling
  • Zero code changes: Just toggle ON in the dashboard

Downsides & caveats

  • Agents dependent on model-specific behavior (output format, tool calling specs) may behave unexpectedly
  • Response quality may vary between providers
  • When used with semantic cache, matching uses the post-routing model name

How to configure

  1. Open the "Feature Settings" page in the dashboard
  2. Toggle "Smart Routing" to ON
  3. Select routing strategy (Cost optimization / Latency optimization / Round-robin)
  4. Confirm at least 2 providers have registered and enabled API keys
💡 Difference from failover: Failover is an emergency switch when a provider goes down. Smart Routing is optimal distribution during normal operation. Enabling both means failover kicks in if Smart Routing's chosen provider fails.

17. Failover Details NEW

Failover automatically switches to an equivalent model on another provider when a provider returns an error, then resends the request (Starter and above).

⚠️ Failover is disabled by default. Model switching may affect AI agent behavior. Register and enable multiple provider API keys, then explicitly enable in the dashboard.

Retry count by plan

Plan | Retries | 429 handling | Notes
Free | 0 (disabled) | — | No failover
Starter | 2 | — | 5xx errors only
Pro | 3 | ✓ | 5xx + 429
Business | 3 | ✓ | 5xx + 429
Scale | 5 | ✓ | 5xx + 429

How it works

Request → Provider A (5xx/429 error)
→ Retry 1: Auto-switch to equivalent model on Provider B
→ Retry 2: Auto-switch to equivalent model on Provider C
→ All providers fail: Return last error response

Check the switch status in response headers:

  • X-Gateway-Failover-From: Original provider
  • X-Gateway-Failover-To: Switched-to provider
  • X-Gateway-Retries: Retry count
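These headers can be surfaced in your own monitoring, for example to log when a failover occurred. The helper name and output format below are ours; the header names come from this section.

```python
def failover_info(headers: dict):
    """Return a human-readable note if the response was served via failover, else None."""
    src = headers.get("X-Gateway-Failover-From")
    dst = headers.get("X-Gateway-Failover-To")
    if not src or not dst:
        return None  # headers absent: no failover happened
    retries = headers.get("X-Gateway-Retries", "0")
    return f"failover {src} -> {dst} (retries={retries})"

print(failover_info({"X-Gateway-Failover-From": "openai",
                     "X-Gateway-Failover-To": "anthropic",
                     "X-Gateway-Retries": "1"}))
# → failover openai -> anthropic (retries=1)
```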

Failover exclusions

  • 4xx errors (Bad Request, auth errors, etc.): Client-side issues produce the same result on another provider
  • 404 model not found: Specified model does not exist
  • Only one active provider: No fallback target available

How to configure

  1. Register and enable API keys for at least 2 providers
  2. Open the "Feature Settings" page in the dashboard
  3. Toggle "Failover" to ON
💡 Model tiers: Failover switches to the same class of model. For example, if gpt-4o (flagship) fails, it switches to claude-sonnet-4-20250514 (flagship). It does not switch to gpt-4o-mini (economy).

18. Agent Trace NEW

Agent Trace visualizes every processing step of an API request in a Chrome DevTools-style waterfall timeline. You can see exactly how long each stage took — from auth, PII Shield, cache lookup, provider selection, to the final response — making debugging and performance optimization of AI agents straightforward.

Available Plans

Agent Trace is available on the Scale plan only.

Processing Steps

Step | Status examples | Description
auth | success / error | API key authentication
pii_shield | success / skip | Secret Shield PII masking
cache_check | exact_hit / semantic_hit / miss | Response cache lookup
provider_select | routed / default | Smart Routing decision
provider_request | success / error / streaming | Request to AI provider
failover | success | Automatic provider failover
response | success / streaming | Final response delivered

Custom Trace ID

Add the x-qzira-trace-id header to your request to correlate traces with your own application logs:

curl https://api.qzira.com/v1/chat/completions \
  -H "Authorization: Bearer YOUR_QZIRA_KEY" \
  -H "x-qzira-trace-id: my-agent-session-001" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"Hello"}]}'

The assigned trace ID is returned in the X-Trace-Id response header.

Data Retention

Trace logs are automatically deleted after 24 hours by a daily CRON job. Traces are not available for export.

Dashboard

  1. Open Dashboard → Agent Trace
  2. The waterfall timeline for each request is displayed in the list
  3. Click a row to expand the step-by-step breakdown with timing
  4. Rows marked EXT indicate a custom trace ID was supplied
  5. Rows marked CACHE indicate a cache hit occurred

Notes

  • Trace recording uses waitUntil() — zero impact on response latency
  • Streaming requests are recorded with provider_request: streaming status
  • If Secret Shield is disabled, the pii_shield step shows skip

19. Secret Shield NEW

Secret Shield automatically detects and masks sensitive information (PII) in requests before they are sent to AI providers. Define custom regex rules to redact API keys, credit card numbers, personal IDs, and more — the provider never sees the original values.

Available Plans

Secret Shield is available on the Scale plan only.

How It Works

User registers regex rules in the Dashboard
  ↓
API request received (proxy.routes.ts)
  ↓
If Secret Shield is ON → scan & mask request body before sending
  ↓
Masked request sent to AI provider
  ↓
Usage log saved with masked content (original never stored)

Masking Modes

Mode | Example input | Example output
Full mask | [email protected] | [REDACTED]
Prefix (keep first N chars) | sk-ant-api03-xxxx | sk-ant-***
Suffix (keep last N chars) | 4111-1111-1111-1111 | ****-1111
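The three modes amount to regex substitutions. The sketch below is illustrative only; the patterns and helper names are ours, and qzira's actual server-side rules are not visible after saving.

```python
import re

def mask_full(text: str, pattern: str) -> str:
    # Full mask: replace every match entirely
    return re.sub(pattern, "[REDACTED]", text)

def mask_prefix(text: str, pattern: str, keep: int) -> str:
    # Prefix mode: keep the first `keep` characters of each match
    return re.sub(pattern, lambda m: m.group(0)[:keep] + "***", text)

def mask_suffix(text: str, pattern: str, keep: int) -> str:
    # Suffix mode: keep the last `keep` characters (card-style output shown here)
    return re.sub(pattern, lambda m: "****-" + m.group(0)[-keep:], text)

print(mask_prefix("sk-ant-api03-xxxx", r"sk-ant-[A-Za-z0-9-]+", 7))   # → sk-ant-***
print(mask_suffix("4111-1111-1111-1111", r"\d{4}-\d{4}-\d{4}-\d{4}", 4))  # → ****-1111
```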

Built-in Presets (11 rules)

Preset | Masking mode
AWS Access Key | Prefix (AKIA***)
AWS Secret Key | Prefix
OpenAI API Key | Prefix (sk-***)
Anthropic API Key | Prefix (sk-ant-***)
GCP API Key | Prefix (AIza***)
Credit Card Number | Suffix (****-1234)
Phone Number (JP) | Suffix
Email Address | Full ([REDACTED])
JWT Token | Full ([REDACTED])
My Number (JP) | Full ([REDACTED])
Postal Code (JP) | Full ([REDACTED])

AI-Assisted Regex Generation

Describe what you want to mask in plain English (or Japanese), and qzira will generate the regex automatically using Cloudflare Workers AI (Llama 3.1 8B). You can review and edit the generated pattern before saving.

Rule Limits

Up to 20 rules per user. Rules are evaluated in order, and every matching rule is applied to each request.

Dashboard Setup

  1. Open Dashboard → Secret Shield
  2. Click Add Rule and describe what to mask (e.g. "employee ID starting with EMP")
  3. Click ✨ Generate Regex or enter the pattern manually
  4. Choose masking mode (Full / Prefix / Suffix) and save
  5. Toggle Secret Shield ON at the top of the page
  6. Use the Test field to verify masking behavior before going live

Security Notes

  • Regex patterns are stored server-side and are not displayed after saving — delete and recreate to update a pattern
  • Secret Shield operates as fail-open: if an error occurs during masking, the original request continues unmodified (masking failure does not block the API call)
  • Masked content is logged in usage_logs — the original value is never stored

FAQ

Can I use Secret Shield without Smart Routing?
Yes. Secret Shield and Smart Routing are independent features that can be toggled separately.
Does Secret Shield affect latency?
Masking adds minimal overhead (typically under 5ms) and does not block the request on error.
Can I view a saved rule's regex pattern?
No. For security reasons, patterns are not displayed after saving. To update a pattern, delete the rule and create a new one.

Responsibility & Disclaimer

qzira is a BYOK (Bring Your Own Key) API gateway. We clearly define the responsibility boundaries for each area.

Responsibility matrix

Area | qzira's responsibility | User's responsibility
API key management | Encrypted storage, safe handling during proxy | Obtaining, managing, and renewing each provider's API keys
Request relay | Accurate proxy processing, format conversion | Appropriateness of request content, compliance with provider terms
Cost management | Usage visibility, budget alerts, auto-stop feature provision | Budget configuration, proper operation of cost management features
AI-generated content | — (not involved) | Reviewing generated content, usage decisions, third-party impact
Provider outages | Impact mitigation via failover (supported plans only) | Provider SLA and outages are the provider's responsibility
Data protection | Encrypted communications, proper log management | Managing transmitted data content (PII, confidential info handling)
AI agents | — (not involved) | Configuring, monitoring, and controlling AI agents
Provider terms | — (not involved) | Compliance with each provider's Terms of Service and policies
🚨 Important: qzira is a request relay service and assumes no responsibility for AI-generated content. Users are responsible for verifying the accuracy and appropriateness of generated content.

Limitations of cost management features

qzira's budget alerts and auto-stop features are designed to assist cost management. Please note:

  • Budget alerts are request-count-based and may differ from actual provider billing amounts
  • Auto-stop is determined on the next incoming request — requests already in progress are not stopped
  • Alert/stop triggering may be delayed during network delays or system failures
  • Cost management features are provided on a "best-effort" basis
  • Usage aggregation updates at intervals — spending may exceed the configured limit depending on timing

Protection level by plan

Protection feature | Free | Starter | Pro | Business | Scale
Monthly request limit
Auto retry
Failover
Budget alert notifications
Daily request limit
Budget limit & auto-stop
Response cache
Semantic cache
Per-key budget
Access control (Per-Key) NEW
Smart Routing NEW
Priority support

Notes for AI agent usage

AI agents (Cursor, Claude Code, Devin, etc.) autonomously send API requests. Agent behavior is the user's responsibility. Recommended safeguards:

  • Set a daily request limit (Starter and above)
  • Enable auto-stop (Pro and above)
  • Check usage in the dashboard regularly
  • Be especially careful with overnight automated tasks

SLA (Service Level)

qzira strives to maintain availability but does not currently offer SLA guarantees. The service may be temporarily suspended without prior notice to ensure infrastructure safety. See the Terms of Service for details.


FAQ

Pricing

Q. Is there a time limit on the Free plan?

No, the Free plan has no time limit. Up to 1,000 requests/month with 1 active provider.

Q. Are there costs beyond qzira's fees?

Yes. qzira is BYOK, so usage fees for each AI provider (OpenAI, Anthropic, Google AI, DeepSeek) are separate. qzira's monthly fee covers the gateway service itself.

Q. Can I change plans at any time?

Yes, you can upgrade or downgrade at any time from the dashboard. Upgrades take effect immediately; downgrades take effect at the end of the current billing period. Note that downgrades are limited to once per month.

Security

Q. How are my registered API keys protected?

API keys are stored encrypted. All communication is encrypted over HTTPS (TLS 1.3), and qzira staff cannot view API keys in plaintext.

Q. Does qzira use request content for model training?

No. qzira does not use request content for AI model training or service improvement. Data is used solely for request relay and usage measurement.

Q. Where is my data stored?

qzira's infrastructure runs on Cloudflare's global network. User data is stored in Cloudflare data centers.

Q. What should I do if my API key is leaked?

Immediately rotate (regenerate) the key from the "API Keys" page in the dashboard. The old key is invalidated instantly. See Section 11 for details.

Service

Q. If qzira goes down, will I lose API access?

Requests via qzira will be affected. You can fall back immediately by reverting the Base URL and API key to their original values, switching to direct provider access.

Q. What happens if I specify an unsupported model?

You will receive a 400 Invalid model error. Check the supported model list in the Code Migration section.

Q. Is streaming available for all providers?

Yes. SSE-format streaming is supported for OpenAI, Anthropic, Google AI, and DeepSeek. Available on all plans.

Q. What should I do if I hit a rate limit?

If a qzira rate limit (429 error) occurs, wait a moment and retry. Starter plans and above have automatic retry handled by qzira.
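If you want client-side protection as well (for example on the Free plan, which has no auto retry), a simple exponential-backoff wrapper works. This is a generic sketch, not a qzira API; `send` stands for any function that performs the HTTP request and returns a response with a `status_code`.

```python
import random
import time

def with_backoff(send, max_attempts: int = 4, base: float = 0.5):
    """Call send() repeatedly, backing off exponentially while it returns HTTP 429."""
    resp = None
    for attempt in range(max_attempts):
        resp = send()
        if resp.status_code != 429:
            break
        if attempt + 1 < max_attempts:
            # Sleep base, 2*base, 4*base, ... plus a little jitter before retrying
            time.sleep(base * (2 ** attempt) + random.uniform(0, 0.1))
    return resp
```

Pair this with a check for 403 budget errors, which will not clear on retry.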

Q. Are blocked requests from access control billed?

No. Access control checks happen before sending to the provider API, so blocked requests do not incur provider charges and are not counted toward qzira usage.

Q. Does access control apply to both Chat Completions API and Responses API?

Yes. Access control is checked before request routing, so it applies to both Chat Completions API (/v1/chat/completions) and Responses API (/v1/responses).

Q. What's the difference between semantic cache and response cache?

Response cache (Section 12) only hits when requests are completely identical. Semantic cache (Section 15) uses AI vector search to detect and reuse cached responses for semantically similar requests even if phrasing differs. Using both maximizes token savings.

Q. Can semantic cache return incorrect responses?

Setting the similarity threshold appropriately minimizes false hits. The recommended value is 0.92. For precision-critical use cases, set 0.95 or higher. Additionally, streaming requests and multi-turn conversations are automatically excluded, so it can be used safely in agentic interactive scenarios.

Q. How long are request logs retained?

Logs are automatically retained based on your plan: Free (3 days), Starter (30 days), Pro (30 days), Business (90 days), Scale (365 days). Logs past the retention period are automatically deleted and cannot be recovered. Use CSV export for long-term storage.

Q. Can I export usage data?

Yes. You can export as CSV from the "Request Logs" section of the dashboard. Includes model, provider, token counts, latency, and more. Available on all plans.

Cancellation

Q. What happens to my data when I cancel?

Deleting your account removes all registered API keys, usage history, and settings. This action cannot be undone.

Q. Will I be charged after cancellation?

For paid plans, cancellation downgrades you to the Free plan at the end of the current billing period. Pro-rated refunds are not available.

Smart Routing & Failover

Q. What's the difference between Smart Routing and failover?

Smart Routing is optimal distribution during normal operation — it routes requests to the best provider based on cost, latency, or load balancing (Scale only). Failover is an emergency switch during outages — it automatically switches to an equivalent model on another provider when 5xx or 429 errors occur (Starter and above). Enabling both means failover kicks in if Smart Routing's chosen provider fails.

Q. Will failover affect my AI agent's behavior?

It might. Because JSON output formats and tool calling specs differ between providers, agents dependent on model-specific behavior may behave unexpectedly. This is why failover is disabled by default. We recommend verifying behavior across multiple providers before enabling.

Tool Calling

Q. Does qzira automatically convert Tool Calling formats between providers?

Yes. qzira accepts OpenAI-format tools / tool_choice parameters and auto-converts to each provider's native format. Tool Calling requests from AI coding tools like Cursor, Cline, and Roo Code work with OpenAI, Anthropic, and Google AI without additional configuration.

Q. Are there unsupported formats for Tool Calling auto-conversion?

Current conversion is based on OpenAI-compatible format (tools array + function definition). If you send Anthropic-native tool_use or Google-native function_declarations format directly, it passes through to that provider, but when switching providers via failover or Smart Routing, conversion may not work correctly. We recommend standardizing on OpenAI-compatible format.

Images & multimodal

Q. Can I send requests with images through qzira?

Yes. qzira passes through request bodies, so image input (Base64 encoded / URL) supported by each model can be sent as-is.

Q. Does qzira auto-convert image formats between providers?

No. There is no auto-conversion of image formats between providers. Sending an OpenAI-format (image_url) request with images to an Anthropic model will result in a format mismatch error. When sending image-containing requests, use the format specified by the target provider. Note that images are also not converted when providers are switched by failover.

Emergency fallback (rollback)

Q. What's the fastest way to recover if qzira has an outage?

Change just these two things to switch to direct provider communication in 10 seconds:

  1. Base URL: Delete (revert to default) or change to the provider's official URL
  2. API Key: Replace gw_... with your original provider key (sk-..., etc.)
⚠️ Because of BYOK, always keep your original provider API keys accessible. Even after registering keys in qzira, you can always check or regenerate them from the provider's dashboard.

qzira is designed so that just changing the Base URL enables introduction or rollback — no major SDK or code rewrites required.


Troubleshooting

Common errors

| Error | Cause | Solution |
|---|---|---|
| 401 Unauthorized | Invalid or missing API key | Verify the gw_-prefixed qzira API key; after a rotation, update to the new key |
| 400 Provider not configured | Provider API key not registered | Register the provider API key in the dashboard |
| 400 Invalid model | Specified model name not recognized | Check the supported models list and specify a valid model name |
| 403 Provider not enabled | Provider not enabled | Enable the provider in the dashboard |
| 403 provider_not_allowed | Request violates the API key's provider restriction | Check the access restriction settings in the dashboard |
| 403 model_not_allowed | Request violates the API key's model restriction | Confirm the requested model is in the allowed models list |
| 403 Budget Exceeded | Budget limit reached | Raise the limit in the dashboard, or wait for the reset |
| 403 Plan limit exceeded | Monthly request limit for the plan reached | Upgrade to a higher plan, or wait for next month's reset |
| 429 Rate Limited | Rate limit reached | Wait a moment and retry (Starter+ plans auto-retry) |
| 502 Bad Gateway | Invalid response from provider | Wait and retry; failover auto-triggers (Starter and above) |
| 503 Service Unavailable | Provider temporarily unavailable | Failover auto-triggers (Starter+); if it persists, check the provider's status page |
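On plans without automatic retry, your app can retry the retryable statuses (429, 502, 503) itself. A minimal sketch with exponential backoff; the helper name and the `send()` callable are illustrative, not part of qzira:

```python
import time

RETRYABLE = {429, 502, 503}  # statuses worth retrying, per the table above

def call_with_retry(send, max_attempts: int = 4, base_delay: float = 1.0):
    """Call send() -> (status, body), retrying retryable statuses with
    exponential backoff. Illustrative helper, not part of qzira."""
    for attempt in range(max_attempts):
        status, body = send()
        if status not in RETRYABLE:
            return status, body
        if attempt < max_attempts - 1:
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
    return status, body
```

Non-retryable statuses (401, 400, 403) indicate configuration problems and should surface immediately rather than be retried.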

Immediate rollback

If you run into problems going through qzira, reverting the Base URL and API key switches you back to direct provider calls instantly.

# Rollback: qzira → direct call
from openai import OpenAI

client = OpenAI(
    api_key="sk-xxxxxxxx",  # Revert to your original provider key
    # base_url omitted: the SDK falls back to the official OpenAI endpoint
)

If managing via environment variables, just switch the values in your .env file:

# QZIRA_API_KEY=gw_xxxxxxxx          ← Comment out to roll back
# QZIRA_BASE_URL=https://api.qzira.com/v1
OPENAI_API_KEY=sk-xxxxxxxx            # ← Original key works as-is
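With keys managed this way, the selection logic in your app can stay small. A sketch assuming the variable names above (the function name is hypothetical):

```python
import os

def resolve_client_config() -> dict:
    """Prefer qzira when its variables are set; otherwise fall back to a
    direct OpenAI call. Sketch assuming the QZIRA_* / OPENAI_API_KEY names."""
    gw_key = os.environ.get("QZIRA_API_KEY")
    if gw_key:
        return {
            "api_key": gw_key,
            "base_url": os.environ.get("QZIRA_BASE_URL", "https://api.qzira.com/v1"),
        }
    # Rolled back: direct call with the original provider key
    return {"api_key": os.environ["OPENAI_API_KEY"]}
```

Commenting out the two QZIRA_* lines in .env is then the only change needed to roll back; no code edit or redeploy of the selection logic itself.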


Support

For questions and bug reports, please contact us through the following: