OpenAI made a quiet but useful point about Codex: the Codex app, CLI, and SDK can run any model, not only OpenAI's. The harness is the product; the model behind it is a choice. So you can keep Codex and point it at GPT-5.5, Claude, an open-weights model like GLM-5.2 or Kimi K2.7, or whatever fits the task.
There is one catch that trips most people up. Since February 2026, Codex standardized on OpenAI's Responses API. Its provider integration expects wire_api = "responses", and the old Chat Completions path is no longer the way in. That means the model platform you point Codex at has to speak the Responses API natively, not just Chat Completions. Many gateways implement only Chat Completions, and those break here.
Token Station exposes every model it hosts through the OpenAI Responses API at /v1/responses, so Codex connects directly with no shim. This guide covers the exact setup, the verification command, how to swap models with one line, and how smart routing fits in.
Why you need a custom provider (not just env vars)
With Claude Code you can redirect to a different endpoint with environment variables alone. Codex is different. Its built-in OpenAI provider ignores OPENAI_BASE_URL and always dials api.openai.com. Setting that variable does nothing for the default provider.
The supported path, per OpenAI's advanced configuration docs, is to define your own entry under [model_providers.<id>] in ~/.codex/config.toml and select it with model_provider. (To move the built-in provider you would use openai_base_url, and you cannot reuse the reserved openai id, so a named custom provider is the clean route.) Your API key stays in an environment variable, referenced from the config by env_key so the secret never lands in the file.
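Once ~/.codex/config.toml exists, you can check that the variable named by env_key is actually visible to Codex without printing the secret. This is a minimal shell sketch. codex_env_key_status is a hypothetical helper, not part of Codex, and it assumes the simple one-key-per-line layout used in this guide. It relies on printenv, which only sees exported variables, and those are what a codex process launched from the shell inherits.

```shell
# Hypothetical helper (not part of Codex): report whether the variable named
# by env_key is exported, without ever printing the secret itself.
# Assumes env_key sits on its own line as: env_key = "NAME"
codex_env_key_status() {
  local cfg="${1:-$HOME/.codex/config.toml}"
  local name
  # Extract NAME from the first env_key line.
  name=$(sed -n 's/^[[:space:]]*env_key[[:space:]]*=[[:space:]]*"\([^"]*\)".*/\1/p' "$cfg" | head -n 1)
  if [ -z "$name" ]; then
    echo "no env_key found in $cfg"
    return 2
  fi
  # printenv only sees exported variables, i.e. what a child codex process inherits.
  if [ -n "$(printenv "$name")" ]; then
    echo "$name is exported"
  else
    echo "$name is not exported in this shell"
    return 1
  fi
}
```

Run codex_env_key_status with no arguments to check the default config. A variable that is set but not exported (plain NAME=value without export) is reported as missing, because Codex would not see it either.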
The one-time config
Create the config file. This defines a token_station provider on the Responses API and makes it the default:
```shell
mkdir -p ~/.codex
# Note: cat > overwrites any existing config.toml; merge by hand if you already have one.
cat > ~/.codex/config.toml <<'EOF'
model = "openai/gpt-5.5"
model_provider = "token_station"

[model_providers.token_station]
name = "Token Station"
base_url = "https://models.bytefuture.ai/v1"
env_key = "TOKEN_STATION_API_KEY"
wire_api = "responses"
EOF
```
Then export your Token Station key (the variable name matches env_key above) and run a one-line check:
```shell
export TOKEN_STATION_API_KEY="YOUR_TOKEN_STATION_KEY"
codex exec "Respond with exactly the word: pong"
```
If it prints pong, Codex is talking to Token Station over the Responses API. From here, running codex with no arguments opens an interactive session against the same provider.
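If the pong check fails, it helps to take Codex out of the loop and call the endpoint directly, so you can tell a gateway problem from a config problem. This is a minimal sketch under stated assumptions. codex_responses_url and probe_responses are hypothetical helpers. The URL follows the rule that Codex appends /responses to base_url. The request body assumes Token Station accepts the standard OpenAI Responses shape (model plus input) with a Bearer token.

```shell
# Hypothetical helpers (not part of Codex) for a direct check of the endpoint.
# Assumes base_url sits on its own line as: base_url = "..."
codex_responses_url() {
  local cfg="${1:-$HOME/.codex/config.toml}"
  local base
  base=$(sed -n 's/^[[:space:]]*base_url[[:space:]]*=[[:space:]]*"\([^"]*\)".*/\1/p' "$cfg" | head -n 1)
  # Codex appends /responses to base_url; drop a trailing slash so the join is clean.
  printf '%s/responses\n' "${base%/}"
}

# Assumes the standard OpenAI Responses request body (model + input) and a
# Bearer token; prints the raw JSON reply followed by the HTTP status code.
probe_responses() {
  local model="${1:-openai/gpt-5.5}"
  curl -sS "$(codex_responses_url)" \
    -H "Authorization: Bearer ${TOKEN_STATION_API_KEY:?export TOKEN_STATION_API_KEY first}" \
    -H "Content-Type: application/json" \
    -d "{\"model\": \"$model\", \"input\": \"Respond with exactly the word: pong\"}" \
    -w '\nHTTP %{http_code}\n'
}
```

For example, probe_responses anthropic/claude-opus-4-8. A 200 with a JSON body suggests the gateway side is fine and the issue is in config.toml. A 401 or 404 here points at the key or the endpoint, independent of Codex.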
What each field does
| Key | Meaning |
|---|---|
| model | The default model ID Codex requests, in provider/model form (here openai/gpt-5.5). |
| model_provider | Which provider block to use. Must match the id in [model_providers.<id>]. |
| name | A human-readable label. Free text; not an id. |
| base_url | Token Station's OpenAI-compatible base, https://models.bytefuture.ai/v1. Codex appends /responses. |
| env_key | The environment variable Codex reads the key from. The secret stays out of the file. |
| wire_api | "responses". This is the part that matters: it selects the Responses API, which Codex requires and Token Station supports natively. |
It matches the OpenAI docs
Every key above is straight from OpenAI's documented schema for custom providers: model and model_provider at the top level, then a [model_providers.<id>] table with name, base_url, env_key, and wire_api. The id token_station is allowed because it is not one of the reserved ids (openai, ollama, lmstudio). The only value that has to be exactly right for Codex today is wire_api = "responses". Nothing in the block is Token Station-specific syntax; it is the same shape you would use for any provider.
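These rules are simple enough to check mechanically before launching Codex. The sketch below is hypothetical: lint_codex_config is not a Codex command, it parses line by line rather than as real TOML, and it assumes the one-key-per-line, double-quoted layout used in this guide. It checks that model_provider is set, that the id is not reserved, that the matching table exists, and that wire_api inside that table is "responses".

```shell
# Hypothetical linter for the rules above. Line-based, not a real TOML
# parser: assumes one key per line and double-quoted string values.
lint_codex_config() {
  local cfg="${1:-$HOME/.codex/config.toml}"
  local id table
  # Top-level model_provider = "<id>"
  id=$(sed -n 's/^[[:space:]]*model_provider[[:space:]]*=[[:space:]]*"\([^"]*\)".*/\1/p' "$cfg" | head -n 1)
  if [ -z "$id" ]; then
    echo "model_provider is not set"; return 1
  fi
  # Reserved ids cannot be reused for a custom provider.
  case "$id" in
    openai|ollama|lmstudio) echo "'$id' is a reserved provider id"; return 1 ;;
  esac
  # The matching [model_providers.<id>] header must exist as a whole line.
  if ! grep -qxF "[model_providers.$id]" "$cfg"; then
    echo "no [model_providers.$id] table"; return 1
  fi
  # Keep only the lines inside that table (up to the next [header]).
  table=$(awk -v hdr="[model_providers.$id]" '
    $0 == hdr { inside = 1; next }
    /^\[/     { inside = 0 }
    inside    { print }
  ' "$cfg")
  # Codex requires the Responses API on this provider.
  if ! printf '%s\n' "$table" | grep -Eq '^[[:space:]]*wire_api[[:space:]]*=[[:space:]]*"responses"'; then
    echo 'wire_api is not "responses"'; return 1
  fi
  echo "ok"
}
```

Running lint_codex_config against the config from this guide prints ok. Any other output names the first rule that failed.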
Swap the model with one line
Because every model on Token Station sits behind the same key and the same Responses endpoint, switching models is a single edit to model in the config, or a flag at launch:
```shell
codex exec --model anthropic/claude-opus-4-8 "Summarize the git diff and suggest a commit message"
```
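The --model flag applies to a single run. To change the default that every session uses, edit the model line in the config. Here is a small sketch. set_codex_model is a hypothetical helper, and it assumes the config has exactly one top-level model = line, like the one created above.

```shell
# Hypothetical helper: rewrite the top-level model line in place, keeping a
# .bak copy. The pattern "^model = " does not match model_provider = ...
set_codex_model() {
  local new_model="$1"
  local cfg="${2:-$HOME/.codex/config.toml}"
  # | as the sed delimiter because model IDs contain "/".
  # -i.bak works with both GNU and BSD sed.
  sed -i.bak "s|^model = .*|model = \"$new_model\"|" "$cfg"
}
```

For example, set_codex_model glm/glm-5.2 switches the default. The previous file remains at ~/.codex/config.toml.bak.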
Some model IDs you can drop into model right now, all on the same config:
| Model ID | Good for |
|---|---|
| openai/gpt-5.5 | OpenAI's flagship; the native Codex default. |
| anthropic/claude-opus-4-8 | Long-horizon agentic coding and refactors. |
| glm/glm-5.2 | Open-weights, 1M context, strong on code at a low price. |
| kimi/kimi-k2.7-code | Cheap open-weights coding model for routine work. |
| xai/grok-build-0.1 | Fast and inexpensive, a fraction of flagship output cost. |
The point OpenAI was making lands here: Codex is model agnostic. Run the expensive model on the hard task and a cheap open-weights model on the boilerplate, without leaving the harness or touching anything but one line.
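One way to act on this from the command line is a small mapping from task type to a model ID taken from the table above. pick_model and its task labels (hard, routine, quick) are hypothetical and serve only as an illustration. Neither Codex nor Token Station defines them.

```shell
# Hypothetical task-to-model mapping built from the model table above.
# The task labels are illustrative, not a Codex or Token Station feature.
pick_model() {
  case "$1" in
    hard)    echo "anthropic/claude-opus-4-8" ;;  # long-horizon refactors
    routine) echo "kimi/kimi-k2.7-code" ;;        # cheap boilerplate
    quick)   echo "xai/grok-build-0.1" ;;         # fast and inexpensive
    *)       echo "openai/gpt-5.5" ;;             # fall back to the config default
  esac
}
```

Usage: codex exec --model "$(pick_model routine)" "Add docstrings to the helpers in this directory". The config stays untouched, and only the per-run model changes.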
Smart routing: let one ID pick the model
Hardcoding a model per task is fine, but Token Station also lets you route by rule instead of by name. You define a policy on your workload (cheapest model that clears a quality floor, latency capped under a threshold with a provider allowlist, or a strict fallback chain like a primary model with a backup behind it) and Token Station picks the model per request.
For Codex this is handy because Codex itself only sends one model ID. Point model at your routed workload and the decision moves server side: if the primary model is slow or unavailable, the fallback answers, and your Codex session never has to know. You change the routing in Token Station, not in config.toml, so the same Codex setup follows your policy as it evolves.
Codex sends one model ID. Smart routing decides what actually answers, so cost and fallback logic live in Token Station instead of being hardcoded in your config.
If something does not connect
- It still hits api.openai.com. You set OPENAI_BASE_URL and expected the built-in provider to follow. It will not. Use the custom provider above and set model_provider = "token_station".
- 401 / auth errors. The exported variable name must match env_key exactly (TOKEN_STATION_API_KEY), and the key must be exported in the same shell that runs codex.
- Protocol or 404 errors on the model. Confirm wire_api = "responses". Codex requires the Responses API; a Chat Completions-only gateway cannot satisfy it.
- Wrong model id. Use the provider/model form (for example anthropic/claude-opus-4-8), not a bare model name.
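The model-id check in the list above is easy to script before launching a session. Here is a minimal sketch. is_provider_model_id is a hypothetical helper, and its allowed character set is an assumption, not Token Station's published ID format.

```shell
# Hypothetical check: accept only provider/model IDs such as
# anthropic/claude-opus-4-8. The character set is an assumption.
is_provider_model_id() {
  printf '%s\n' "$1" | grep -Eq '^[A-Za-z0-9._-]+/[A-Za-z0-9._-]+$'
}
```

For example, is_provider_model_id claude-opus-4-8 fails because the provider prefix is missing. is_provider_model_id anthropic/claude-opus-4-8 succeeds.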
Get started
Codex running any model comes down to seven lines of TOML and one environment variable, and the only requirement that bites is the Responses API. Token Station serves every model it hosts over that API, so the config above works unchanged whether you run GPT-5.5, Claude, GLM-5.2, or a routed workload.
Sign up at models.bytefuture.ai ($10 in free credit, no card), drop your key into TOKEN_STATION_API_KEY, and run the pong check. One key, one endpoint, every model your Codex sessions need.