Tutorial

Kimi K2.7 Code: Cheap Enough to Try, Maybe Good Enough to Share the Work

Yesterday Moonshot AI dropped Kimi K2.7 Code on Hugging Face: a 1-trillion-parameter Mixture-of-Experts coding model (32B active) with a 256K context window, open weights under a Modified MIT license.

If you read our Claude Fable 5 piece, this is the same logic running in reverse. With Fable 5, the capability was proven and the price was the risk. With K2.7 Code, the price is trivial and the capability is the open question. The model is one day old, no third party has benchmarked it, and Moonshot's own numbers put it behind the frontier. Both situations end in the same posture: run a cheap, reversible experiment inside the coding harness you already use.

There is one twist that makes this experiment more interesting than a simple swap. At $0.95 per million input tokens and $4.00 per million output, K2.7 Code costs about a tenth of Claude Fable 5 on input and a twelfth on output. That is cheap enough to give it a different job: working alongside your SOTA model, taking the routine fan-out work while the expensive model keeps the hard parts.

K2.7 Code is on Token Station as kimi/kimi-k2.7-code, at Moonshot's list price with zero markup, and your $10 signup credit covers a lot of it.

What we know (and what we don't)

From the model card:

And the honest part. Moonshot published its own comparison against the frontier, and K2.7 Code loses:

Grouped bar chart of Moonshot's self-reported benchmarks: Kimi K2.7 Code scores 62.0 on Kimi Code Bench v2 versus 69.0 for GPT-5.5 and 67.4 for Claude Opus 4.8; 53.6 on ProgramBench versus 69.1 and 63.8; and 76.0 on MCP Atlas versus 79.4 and 81.3
Moonshot's own published numbers. K2.7 Code trails the frontier on all three. Data: Kimi K2.7 Code model card, June 2026.

A vendor publishing benchmarks where its model loses is a good sign for the numbers' honesty, and the gaps are not embarrassing: roughly 7 points behind GPT-5.5 on Moonshot's coding bench, closer on tool use. The previous Kimi release (K2.6) is currently the best open-weights model on the Artificial Analysis Intelligence Index. What nobody knows yet is how K2.7 Code behaves on your codebase, in your harness, over a long agentic session. That is the unknown this experiment resolves.

One disambiguation, in the same spirit as our Grok Build article: K2.7 Code the model is optimized for Moonshot's own Kimi Code CLI, their coding harness. You do not need that CLI. The model speaks OpenAI- and Anthropic-compatible APIs, and Token Station translates whatever your existing harness sends.

The price: a rounding error next to the frontier

All of these are live on Token Station at the providers' list prices:

ModelInput / 1MOutput / 1MContext
kimi/kimi-k2.7-code$0.95$4.00256K
xai/grok-build-0.1$1.00$2.00256K
anthropic/claude-sonnet-4-6$3.00$15.001M
anthropic/claude-opus-4-8$5.00$25.001M
openai/gpt-5.5$5.00$30.001M
anthropic/claude-fable-5$10.00$50.001M

The $10 signup credit buys roughly 10 million input tokens or 2.5 million output tokens at K2.7 Code prices. Where the same credit bought an afternoon of Fable 5, here it buys weeks of evaluation. The downside risk of this experiment rounds to zero.

The real experiment: share the work

Coding agents already split their work into tiers. There is the main loop, where planning and hard reasoning happen, and there is the fan-out: subagents reading files, running searches, executing tests, summarizing results. The fan-out burns most of the tokens and needs the least brilliance.

That split is exactly where a $4-per-million model earns a place next to a $50-per-million one. Keep Fable 5 or Opus 4.8 in the driver's seat and hand the routine work to K2.7 Code. If Moonshot's numbers hold up in practice, the quality drop on delegated tasks is small and the cost drop is more than 10x on every delegated token.

What you need

Claude Code setup: the two-tier split

Claude Code exposes its model tiers as environment variables, which makes it the cleanest place to run the share-the-work experiment. Reserve the Opus slot for Claude Fable 5 and give everything else to the workhorse:

# Token Station endpoint + auth
export ANTHROPIC_BASE_URL="https://models.bytefuture.ai"
export ANTHROPIC_AUTH_TOKEN="gw-YOUR_TOKEN_STATION_KEY"

# Top tier: Fable 5 takes the genuinely hard problems
export ANTHROPIC_DEFAULT_OPUS_MODEL="anthropic/claude-fable-5"

# Everything else runs on the workhorse
export ANTHROPIC_DEFAULT_SONNET_MODEL="kimi/kimi-k2.7-code"
export ANTHROPIC_DEFAULT_HAIKU_MODEL="kimi/kimi-k2.7-code"
export CLAUDE_CODE_SUBAGENT_MODEL="kimi/kimi-k2.7-code"

claude

An ordinary session now runs K2.7 Code end to end: the main loop, every subagent, every background search, all billing at $4 per million output instead of $50. When a problem actually needs frontier judgment, escalate with /model opus and Fable 5 takes over; drop back down when the hard part is done. The expensive model becomes what it should be at that price, a specialist you call in.

Swap anthropic/claude-fable-5 for anthropic/claude-opus-4-8 in the Opus slot if Fable 5's price makes you wince; the escalation pattern works at any tier.

Codex setup

Codex runs one model per session, but its profiles give you the same split at the invocation level: make the workhorse the default and keep a named escalation profile for Fable 5.

mkdir -p ~/.codex
cat > ~/.codex/config.toml <<'EOF'
# Default: the workhorse
model = "kimi/kimi-k2.7-code"
model_provider = "token_station"

[model_providers.token_station]
name = "token_station"
base_url = "https://models.bytefuture.ai/v1"
env_key = "TOKEN_STATION_API_KEY"
wire_api = "responses"

# Escalation: Fable 5 on demand
[profiles.deep]
model = "anthropic/claude-fable-5"
EOF

export TOKEN_STATION_API_KEY="gw-YOUR_TOKEN_STATION_KEY"

codex                  # routine work on K2.7 Code
codex --profile deep   # hard problems on Fable 5

Day to day you launch plain codex and pay workhorse rates. When a task deserves the frontier model, codex --profile deep brings in Fable 5 for that invocation only. Nothing else in the config moves.

OpenClaw setup

OpenClaw makes the split a first-class setting. Sub-agents inherit the caller's model unless agents.defaults.subagents.model says otherwise (docs), so Fable 5 can drive while every spawned sub-agent runs on K2.7 Code:

{
  "models": {
    "mode": "merge",
    "providers": {
      "token-station": {
        "baseUrl": "https://models.bytefuture.ai/v1",
        "apiKey": "${TOKEN_STATION_API_KEY}",
        "api": "anthropic-messages",
        "models": [
          {
            "id": "anthropic/claude-fable-5",
            "name": "Claude Fable 5 (Token Station)",
            "contextWindow": 1000000,
            "maxTokens": 128000
          },
          {
            "id": "kimi/kimi-k2.7-code",
            "name": "Kimi K2.7 Code (Token Station)",
            "contextWindow": 256000,
            "maxTokens": 32768
          }
        ]
      }
    }
  },
  "agents": {
    "defaults": {
      "model": { "primary": "token-station/anthropic/claude-fable-5" },
      "subagents": { "model": "token-station/kimi/kimi-k2.7-code" }
    }
  }
}

The main agent keeps frontier judgment; the parallel fan-out (the part that burns tokens) bills at workhorse rates. To run the whole thing on K2.7 Code instead, point agents.defaults.model.primary at it; both models sit behind the same key either way.

Quirks worth knowing

Run the experiment

Give K2.7 Code the work your expensive model is overqualified for: subagent searches, test runs, boilerplate, summaries. Watch for a week where it holds up and where it drops the ball, then settle the split accordingly. The same Token Station key runs anthropic/claude-fable-5, anthropic/claude-opus-4-8, and kimi/kimi-k2.7-code side by side, so the comparison is built in.

Sign up at models.bytefuture.ai ($10 in free credit, no card) and find out whether a one-day-old open-weights model can carry half your agent's workload at a tenth of the price.


Share this post Post LinkedIn Facebook Hacker News Reddit