
Operations and Security

LLM Gateway centralizes model access, so operational discipline matters. Treat gateway configuration as production infrastructure: credential changes, fallback routes, metrics tokens, and budgets should be owned and reviewed.

Tenant isolation

LLM Gateway data is scoped by organization and user context. Credentials, analytics, fallback configuration, metrics queries, and audit logs are isolated so one tenant cannot inspect another tenant's configuration or usage.

Operational expectations:

  • Use organization-specific credentials.
  • Avoid shared provider keys across unrelated tenants.
  • Review access to LLM Gateway administration roles.
  • Keep session tokens and provider keys out of logs and documentation.
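To make the isolation model concrete, here is a minimal Python sketch that assumes a hypothetical in-memory store in which every record is keyed by organization first. `TenantScopedStore`, `put`, and `get` are illustrative names, not the LLM Gateway API. The point is that every lookup consults only the caller's own organization, so another tenant's configuration is never reachable.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TenantScopedStore:
    # Hypothetical store: records are partitioned by org_id before anything else.
    _records: dict[str, dict[str, str]] = field(default_factory=dict)

    def put(self, org_id: str, key: str, value: str) -> None:
        self._records.setdefault(org_id, {})[key] = value

    def get(self, caller_org_id: str, key: str) -> str | None:
        # Only the caller's own partition is consulted, so keys stored by
        # other organizations are invisible rather than merely denied.
        return self._records.get(caller_org_id, {}).get(key)
```

For example, after `store.put("org-a", "fallback_route", "provider-b")`, a caller from `org-a` gets `"provider-b"` back, while a caller from `org-b` asking for the same key gets `None`.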

Credential hygiene

Recommended practices:

  • Store provider keys only in Axiom Cloud and your approved secret management process.
  • Use descriptive credential names that identify provider, environment, and purpose.
  • During incidents, disable credentials first and delete them only after the incident is resolved.
  • Rotate keys on a defined schedule.
  • Revoke old provider keys after migration.
  • Keep fallback credentials current; stale backup keys fail during incidents.
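The naming and rotation practices above can be checked automatically. The Python sketch below assumes a hypothetical naming convention (`<provider>-<environment>-<purpose>`) and an assumed 90-day rotation schedule; `Credential` and `hygiene_findings` are illustrative names, and the interval should be replaced with your own policy. Run it against fallback credentials too, because those are the keys most likely to go stale unnoticed.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical convention: <provider>-<environment>-<purpose>, e.g. "exampleai-prod-support".
NAME_PATTERN = re.compile(r"^[a-z0-9]+-(dev|staging|prod)-[a-z0-9-]+$")
# Assumed rotation schedule; set this to your organization's policy.
ROTATION_INTERVAL = timedelta(days=90)


@dataclass
class Credential:
    name: str
    created_at: datetime
    enabled: bool = True


def hygiene_findings(creds: list[Credential], now: datetime) -> list[str]:
    """Return one human-readable finding per naming or rotation problem."""
    findings = []
    for cred in creds:
        if not NAME_PATTERN.match(cred.name):
            findings.append(f"{cred.name}: name does not identify provider, environment, and purpose")
        # Only enabled credentials count; disabled ones are awaiting deletion.
        if cred.enabled and now - cred.created_at > ROTATION_INTERVAL:
            findings.append(f"{cred.name}: older than the rotation interval")
    return findings
```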

Never place real provider keys, session tokens, metrics tokens, or customer data in examples, tickets, logs, screenshots, or commits.
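Placeholders are easy to enforce in examples; logs and tickets need a safety net. As an illustration, assuming secrets appear either as `Bearer` values or as `key=value` / `key: value` pairs whose key mentions a token, secret, or API key, a small Python filter could mask them before text is written out. The patterns are assumptions and will miss secrets in other formats, so treat this as defense in depth, not a guarantee.

```python
import re

# Illustrative patterns only; tune them to the credential formats you actually handle.
_BEARER = re.compile(r"(?i)(bearer\s+)\S+")
_KEY_VALUE = re.compile(r"(?i)\b([\w-]*(?:api[_-]?key|token|secret)[\w-]*\s*[:=]\s*)\S+")


def redact(text: str) -> str:
    """Mask bearer tokens and key=value secrets before text is logged or shared."""
    text = _BEARER.sub(r"\1[REDACTED]", text)
    return _KEY_VALUE.sub(r"\1[REDACTED]", text)
```

For example, `redact("Authorization: Bearer abc.def-123")` returns `"Authorization: Bearer [REDACTED]"`, and `redact("metrics_token=xyz user=alice")` returns `"metrics_token=[REDACTED] user=alice"`.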

Metrics token security

The metrics endpoint uses an operator-generated bearer token. Protect it like an infrastructure secret.

Recommended practices:

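  • Store it as a secret, for example in a Kubernetes Secret or in the secret store your Prometheus or VictoriaMetrics deployment already uses.
  • Load it from a file with bearer_token_file or credentials_file instead of writing it inline in the scrape configuration, where your scraper supports this.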
  • Rotate it when access changes.
  • Limit access to systems that need to scrape metrics.
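Prometheus and VictoriaMetrics handle the token through their own scrape configuration, but custom scripts that call /metrics/ai-gateway (a smoke test, for example) should follow the same rules. The Python sketch below assumes the token is mounted as a file from a Kubernetes Secret; the helper names are hypothetical. It re-reads the file on every call so a rotated token is picked up without a restart, and it shows the constant-time comparison that any component checking such a token should use.

```python
from __future__ import annotations

import hmac
from pathlib import Path


def load_metrics_token(path: Path) -> str:
    """Read the bearer token from a file, e.g. one mounted from a Kubernetes Secret."""
    # Re-reading on each call means a rotated token takes effect without a restart.
    return path.read_text(encoding="utf-8").strip()


def auth_header(path: Path) -> dict[str, str]:
    # Build the header at request time so the token never lives in source or config.
    return {"Authorization": f"Bearer {load_metrics_token(path)}"}


def token_matches(presented: str, expected: str) -> bool:
    # Constant-time comparison avoids leaking how many leading characters matched.
    return hmac.compare_digest(presented.encode(), expected.encode())
```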

Scalable Kubernetes architecture

LLM Gateway is designed to run as part of a Kubernetes-based Axiom deployment. The gateway process includes the inference API, management API, embedded UI, and metrics endpoint, so scaling is primarily a matter of running multiple gateway pods behind a stable Kubernetes Service.

For production scale, plan the deployment around:

  • Horizontal gateway pods: Run multiple replicas behind a Kubernetes Service or ingress so application traffic can continue during pod restarts, upgrades, and node maintenance.
  • PostgreSQL persistence: Use PostgreSQL for durable production data, including credential metadata, analytics records, audit logs, configuration, and billing-related state.
  • Redis coordination: Use Redis where multi-pod deployments need real-time cache invalidation and cross-pod configuration propagation.
  • Prometheus-compatible metrics: Scrape /metrics/ai-gateway from the deployment and retain metrics in Prometheus, VictoriaMetrics, or your managed observability platform.
  • Progressive rollout controls: Use Kubernetes rolling updates, canary releases, and gateway credential weights to move traffic gradually.
  • Operational runbooks: Document how to disable credentials, change fallback routes, rotate metrics tokens, and restore primary provider routing after an incident.
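The Redis bullet matters because each gateway pod may cache configuration locally; without a broadcast, a change written through one pod leaves the others serving stale routes. The Python sketch below illustrates the general write-then-invalidate pattern, with an in-memory bus standing in for Redis pub/sub and a shared dict standing in for PostgreSQL so it runs without either service. All class and function names are hypothetical, and this is not a description of LLM Gateway's internal implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


class InMemoryBus:
    """Stand-in for a Redis pub/sub channel so the sketch runs without a Redis server."""

    def __init__(self) -> None:
        self._subscribers: list[Callable[[str], None]] = []

    def subscribe(self, handler: Callable[[str], None]) -> None:
        self._subscribers.append(handler)

    def publish(self, message: str) -> None:
        for handler in self._subscribers:
            handler(message)


@dataclass
class GatewayPod:
    name: str
    database: dict[str, str]  # stands in for PostgreSQL, the source of truth
    cache: dict[str, str] = field(default_factory=dict)

    def get_config(self, key: str) -> str:
        # Serve from the local cache; load from the shared database on a miss.
        if key not in self.cache:
            self.cache[key] = self.database[key]
        return self.cache[key]

    def on_invalidate(self, key: str) -> None:
        self.cache.pop(key, None)


def update_config(database: dict[str, str], bus: InMemoryBus, key: str, value: str) -> None:
    database[key] = value  # durable write first
    bus.publish(key)       # then tell every pod to drop its cached copy
```

Writing to durable storage before publishing means a pod that reloads immediately after the invalidation message always sees the new value.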

Hosting options

Choose the hosting model that matches your control, compliance, and operations requirements:

| Option | Best fit | Operational notes |
| --- | --- | --- |
| Axiom-managed cloud | Teams that want Axiom to operate the gateway control plane and platform dependencies. | Axiom manages the hosted environment while customer teams manage provider credentials, routing policy, usage review, and application integration. |
| Customer-managed Kubernetes | Teams that need the gateway inside their own cloud account, VPC, cluster, or compliance boundary. | Run Axiom in your Kubernetes environment with your own ingress, PostgreSQL, Redis, metrics stack, backup policy, and secret-management process. |
| Hybrid fleet | Teams with multiple regions, isolated environments, or both hosted and self-hosted requirements. | Use a consistent configuration model across environments, then apply environment-specific provider credentials, fallback routes, and monitoring targets. |

For self-hosted deployments, validate cluster capacity, ingress/TLS, PostgreSQL backups, Redis availability, metrics retention, and credential rotation before sending production traffic. For Axiom-managed cloud, confirm application connectivity, provider account ownership, budget expectations, and operational contacts before rollout.
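One way to keep these pre-production checks from being skipped is to encode them as a go/no-go gate in your release tooling. The Python sketch below is a minimal illustration; the hosting keys and check names are hypothetical labels for the items in the paragraph above, not identifiers used by Axiom.

```python
from __future__ import annotations

# Hypothetical check names, one per pre-production item for each hosting model.
REQUIRED_CHECKS: dict[str, tuple[str, ...]] = {
    "customer-managed": (
        "cluster_capacity",
        "ingress_tls",
        "postgresql_backups",
        "redis_availability",
        "metrics_retention",
        "credential_rotation",
    ),
    "axiom-managed": (
        "application_connectivity",
        "provider_account_ownership",
        "budget_expectations",
        "operational_contacts",
    ),
}


def missing_checks(hosting: str, confirmed: set[str]) -> list[str]:
    """List the readiness checks still outstanding before production traffic."""
    if hosting not in REQUIRED_CHECKS:
        raise ValueError(f"unknown hosting model: {hosting}")
    return [check for check in REQUIRED_CHECKS[hosting] if check not in confirmed]
```

For a hybrid fleet, run the gate once per environment with that environment's hosting model; an empty result means every listed check has been confirmed.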

Change management

Use a controlled workflow for high-impact changes:

  1. Add or update credentials outside peak traffic when possible.
  2. Test with a known prompt and low-risk workload.
  3. Shift traffic gradually with credential weights, or roll out to non-critical models first.
  4. Watch analytics, metrics, and audit logs.
  5. Document fallback changes for on-call operators.
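Step 3 relies on weights. Assuming weights act as relative proportions (an assumption; confirm how your gateway version interprets them), the Python sketch below shows a staged schedule that moves traffic from an old credential to a new one, and a weighted pick that sends each request accordingly. The credential names and both functions are hypothetical; between stages you would pause and watch analytics, metrics, and audit logs as in step 4.

```python
from __future__ import annotations

import random


def rollout_steps(old: str, new: str, percentages: list[int]) -> list[dict[str, int]]:
    """Weight schedule that moves traffic from `old` to `new` in stages."""
    return [{old: 100 - p, new: p} for p in percentages]


def pick_credential(weights: dict[str, int], rng: random.Random) -> str:
    """Choose a credential name with probability proportional to its weight."""
    # Zero-weight credentials are excluded entirely, so they receive no traffic.
    names = [name for name, w in weights.items() if w > 0]
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]
```

For example, `rollout_steps("cred-old", "cred-new", [5, 25, 50, 100])` starts at `{"cred-old": 95, "cred-new": 5}` and ends with all traffic on `cred-new`.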

Changes that affect routing, fallback, weights, or provider credentials should be visible to teams that own production workloads.

Incident runbook

When requests fail or latency increases:

  1. Open the Overview page and identify the affected provider, model, and credential.
  2. Compare gateway request duration with provider duration.
  3. Check llm_gateway_errors_total and error labels.
  4. Review audit logs for recent credential or fallback changes.
  5. Disable the failing credential if needed.
  6. Confirm fallback traffic succeeds.
  7. Monitor token usage and cost while fallback routes are active.
  8. Restore normal routing after provider health is confirmed.
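Steps 5, 6, and 8 depend on how a disabled credential changes routing. The Python sketch below models a priority-ordered fallback chain, assuming the gateway skips routes whose credential is disabled; `Route` and `resolve_route` are hypothetical names, not the gateway's routing code. Disabling the primary credential moves traffic to the next route, and removing it from the disabled set restores primary routing.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    provider: str
    credential: str


def resolve_route(chain: list[Route], disabled: set[str]) -> Route | None:
    """Return the first route in priority order whose credential is still enabled."""
    for route in chain:
        if route.credential not in disabled:
            return route
    return None  # no enabled route remains; requests cannot be served
```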

Data handling

Applications send prompts and responses through the gateway. Customer teams should align gateway usage with their data handling policies:

  • Do not send regulated or sensitive data to providers that are not approved for that data class.
  • Use provider and region choices that match compliance requirements.
  • Keep audit logs and metrics access restricted.
  • Before sending production data, review each provider's data retention and model-training policies; these are set by the provider, not by Axiom.
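These rules apply to fallback routes as well as primary ones, because a fallback that is not approved for a data class would leak that data during an incident. The Python sketch below assumes a hypothetical approval matrix maintained by your compliance team and filters a fallback chain down to providers approved for a given data class, preserving order. Providers missing from the matrix are treated as unapproved, so the check fails closed.

```python
from __future__ import annotations

# Hypothetical approval matrix: provider -> data classes it is approved for.
APPROVED: dict[str, set[str]] = {
    "provider-a": {"public", "internal", "confidential"},
    "provider-b": {"public", "internal"},
}


def allowed_routes(chain: list[str], data_class: str) -> list[str]:
    """Keep only providers approved for this data class, preserving fallback order."""
    # Unknown providers get an empty approval set, so they are always excluded.
    return [p for p in chain if data_class in APPROVED.get(p, set())]
```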

Security checklist

  • Only approved users can manage credentials and fallback routes.
  • Provider keys are rotated and old keys are revoked.
  • Metrics bearer token is stored securely.
  • Examples use placeholders only.
  • Fallback providers meet the same data handling requirements as primary providers.
  • Audit logs are reviewed after incidents and major configuration changes.
  • Monitoring covers authentication failures, provider failures, rate limits, and cost spikes.