Axiom LLM Gateway
Axiom LLM Gateway gives teams one controlled entry point for AI inference across managed and self-hosted model providers. Applications use an OpenAI-compatible API while Axiom handles provider credentials, model routing, fallback behavior, load balancing, usage analytics, audit logs, and Prometheus-compatible metrics.
Use these docs when you are setting up the gateway for the first time, adding providers, connecting an application, or operating the gateway in production.
What the gateway provides
- One API for multiple providers: Use OpenAI-compatible chat and embeddings endpoints for supported providers such as OpenAI, Anthropic, Google Gemini, Azure OpenAI, AWS Bedrock, Google Vertex AI, Mistral, Cohere, Groq, Ollama, OpenRouter, Perplexity, Cerebras, Hugging Face, ElevenLabs, Nebius, xAI, and Parasail.
- Centralized credentials: Store provider credentials once, scope them to your organization, disable or rotate them without changing application code, and keep provider-specific settings with the credential.
- Model-aware routing: Route requests by requested model and credential configuration so the right provider key is used for the workload.
- Weighted load balancing: Split traffic across credentials within the same provider and model group.
- Fallback routing: Define ordered fallback paths so traffic can continue when a primary provider is disabled, unavailable, rate limited, or returning errors.
- Streaming support: Use Server-Sent Events for streaming chat completions.
- Observability: Track request rate, latency, provider duration, token usage, costs, errors, and active requests through the embedded UI and metrics endpoint.
- Auditability: Review credential changes, fallback events, and provider errors with tenant-scoped audit logs.
- Kubernetes-native operation: Run as part of Axiom with embedded management UI, REST API, inference gateway, and metrics endpoints.
Documentation map
| Document | Use it for |
|---|---|
| Getting started | First provider, first request, and application connection steps. |
| Configuration guide | Credentials, provider settings, session/API access, and environment planning. |
| Routing and reliability | Model routing, load balancing, fallback chains, and operational patterns. |
| Observability and FinOps | Metrics, dashboards, Prometheus/VictoriaMetrics, token usage, cost tracking, and budgets. |
| Operations and security | Tenant isolation, credential hygiene, production deployment, and runbook practices. |
Core endpoints
The LLM Gateway is exposed under your Axiom Cloud base URL:
https://cloud.axiomstudio.ai/rest/v1/llm-gateway
OpenAI-compatible inference endpoints are available below:
/rest/v1/llm-gateway/v1/chat/completions
/rest/v1/llm-gateway/v1/embeddings
Management APIs, such as credentials, providers, analytics, fallback configuration, metrics queries, overview, and audit logs, are exposed under:
/rest/v1/llm-gateway/*
Metrics for Prometheus-compatible scrapers are served through the shared AI gateway metrics endpoint:
/metrics/ai-gateway
Authentication model
Interactive users authenticate with their Axiom Cloud session. Application examples in this documentation show session-token placeholders because the gateway accepts OpenAI-compatible clients with the session token configured as the API key.
Metrics scraping uses a separate bearer token generated by an operator. Store that token in your monitoring system's secret store and reference it from Prometheus or VictoriaMetrics configuration.