
Claude Code 2.1.131: the silent 1-hour prompt cache bug, plus 64 changes since 2.1.126

Alex Kim

If you set cache_control: { type: "ephemeral", ttl: "1h" } anywhere in your Claude Code prompts and have been quietly paying for the 1-hour cache while wondering why your hit rate looked off, Claude Code 2.1.129 just fixed it. The TTL was silently downgrading to 5 minutes. Nobody could see it. The dashboards looked fine. The bills didn't.

That's the buried headline of today's release pair. 2.1.131 on May 6 ships two surgical fixes (a VS Code-on-Windows activation bug and a Mantle endpoint auth header). 2.1.129 the same day shipped 27 entries. 2.1.128 on May 4 shipped 35. 64 individual changes since 2.1.126 on May 1, and the most expensive one was invisible.

TL;DR

  • The money bug: 2.1.129 fixed the 1-hour prompt cache TTL silently downgrading to 5 minutes. If you cache, audit your traces.
  • The honest correction: 2.1.129 flipped the gateway /v1/models discovery from automatic (added in 2.1.126) to opt-in via CLAUDE_CODE_ENABLE_GATEWAY_MODEL_DISCOVERY=1. The auto behavior was problematic enough to revert in five days.
  • The regression fix: 2.1.129 restored Ctrl+R to searching all prompts across all projects, matching pre-2.1.124 behavior. Ctrl+S now narrows.

The big one: 1-hour prompt cache silently downgrading to 5 minutes

The 1-hour prompt cache TTL was being downgraded to 5 minutes in flight, with no error and no log line. From 2.1.129's changelog: "Fixed 1-hour prompt cache TTL being silently downgraded to 5 minutes." That's the entire entry.

Here's what was happening: when you set cache_control: { type: "ephemeral", ttl: "1h" } on a prompt block, Claude Code was supposed to ask the API for the 1-hour cache tier (which costs more per cache write but pays back massively on cache reads over the hour). Instead, the request was being silently rewritten to the 5-minute tier. You paid for the write either way. The reads either landed inside the 5-minute window (so you saw cache hits) or outside it (so you got cache misses you weren't expecting). Either way, the 1-hour tier you configured was never the tier you got.
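To make the cost gap concrete, here is a rough back-of-the-envelope model in Python. The multipliers (2.0× base input price for a 1-hour write, 1.25× for a 5-minute write, 0.1× for a read) are my assumptions and not figures from the changelog, so check current pricing before relying on them. The changelog also doesn't say which write price applied while the bug was live, so treat the second number as one plausible scenario.

```python
def cache_cost_units(calls, gap_minutes, ttl_minutes, write_mult, read_mult):
    """Cost of the cached prefix, in multiples of one uncached pass over it.

    Simplified model: the first call writes the cache; each later call is a
    hit only if it arrives before the TTL has elapsed since the previous call
    (a hit is assumed to refresh the TTL), otherwise it is a fresh write.
    """
    if calls <= 0:
        return 0.0
    writes, reads = 1, 0
    for _ in range(calls - 1):
        if gap_minutes < ttl_minutes:
            reads += 1
        else:
            writes += 1
    return writes * write_mult + reads * read_mult


# Assumed multipliers (not from the changelog).
ONE_HOUR = dict(ttl_minutes=60, write_mult=2.0, read_mult=0.1)
FIVE_MIN = dict(ttl_minutes=5, write_mult=1.25, read_mult=0.1)

# Six agent calls, ten minutes apart, reusing the same long prefix.
configured = cache_cost_units(calls=6, gap_minutes=10, **ONE_HOUR)
actual = cache_cost_units(calls=6, gap_minutes=10, **FIVE_MIN)
print(f"configured 1h tier: {configured:.2f}x, downgraded 5m tier: {actual:.2f}x")
```

Under those assumptions the configured tier costs one write plus five reads, while the downgraded tier misses on every call and pays for six writes.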

If you've been using extended caching to amortize a long system prompt or a heavy tool definition block across hours of agent runs, this directly affected your cost-per-run math. Update, then audit your past traces if you can – the gap between configured-tier and actual-tier is the size of the bug.
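If you log API usage per request, the audit can be as simple as the sketch below. It assumes each trace record keeps the API's usage object and that cache writes are broken down into ephemeral_5m_input_tokens and ephemeral_1h_input_tokens under cache_creation, which is my understanding of the usage format and worth verifying against your own data. The record shape itself (request_id, configured_ttl) is hypothetical.

```python
def find_downgraded_writes(records, expected_ttl="1h"):
    """Return ids of requests that asked for the 1-hour tier but only show
    5-minute cache writes in their usage breakdown."""
    flagged = []
    for rec in records:
        if rec.get("configured_ttl") != expected_ttl:
            continue
        creation = rec.get("usage", {}).get("cache_creation") or {}
        wrote_1h = creation.get("ephemeral_1h_input_tokens", 0)
        wrote_5m = creation.get("ephemeral_5m_input_tokens", 0)
        if wrote_5m > 0 and wrote_1h == 0:
            flagged.append(rec["request_id"])
    return flagged


# Hypothetical trace records for illustration only.
sample = [
    {"request_id": "req_a", "configured_ttl": "1h",
     "usage": {"cache_creation": {"ephemeral_5m_input_tokens": 9000,
                                  "ephemeral_1h_input_tokens": 0}}},
    {"request_id": "req_b", "configured_ttl": "1h",
     "usage": {"cache_creation": {"ephemeral_5m_input_tokens": 0,
                                  "ephemeral_1h_input_tokens": 9000}}},
    {"request_id": "req_c", "configured_ttl": "5m",
     "usage": {"cache_creation": {"ephemeral_5m_input_tokens": 500,
                                  "ephemeral_1h_input_tokens": 0}}},
]
print(find_downgraded_writes(sample))
```

Only the first sample record is flagged: it asked for 1h and shows nothing but 5-minute writes.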

A second cost-side fix landed in the same release: "Fixed cache-miss warning appearing spuriously after /clear or compaction when changing /effort or /model." The warning is meant to flag real cache invalidation when you switch effort or model. The false positives after /clear or compaction made the real ones harder to spot.

The honest correction: gateway model discovery flipped to opt-in

In the 2.1.126 post I called the new gateway-aware /model picker "the kind of fix that doesn't make headlines but stops a hundred GitHub issues from being filed." Five days later, Anthropic walked it back.

From 2.1.129:

Gateway /v1/models discovery for the /model picker is now opt-in via CLAUDE_CODE_ENABLE_GATEWAY_MODEL_DISCOVERY=1 (was automatic in 2.1.126–2.1.128).

If your ANTHROPIC_BASE_URL points at an Anthropic-compatible gateway (LiteLLM, an internal proxy, a corporate firewall pass-through), the /model picker no longer queries the gateway's /v1/models endpoint by default. You explicitly opt in by setting CLAUDE_CODE_ENABLE_GATEWAY_MODEL_DISCOVERY=1 in your environment.

The changelog doesn't say why the reversal landed, but the shape of the fix tells you something: making it opt-in protects users whose gateways either don't implement /v1/models, return malformed responses, or expose models the user shouldn't see in the picker. Auto-discovery was good when it worked and bad when it didn't, with no easy way to tell which side you were on. Opt-in puts the choice back in your hands.

If your team relied on the auto behavior since 2.1.126, set the env var in your shell rc file and move on. If you didn't know it was happening, it's now off and your /model picker reverted to the hardcoded list.
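A small preflight check can tell you which side of the flip a given environment is on. This is a sketch of the behavior as the changelog describes it, not Claude Code's own logic. The function name is mine, and I'm assuming only the literal value 1 opts in, since that is the only value the changelog mentions.

```python
import os

ENV_VAR = "CLAUDE_CODE_ENABLE_GATEWAY_MODEL_DISCOVERY"


def describe_model_picker(env=None):
    """Summarize what the /model picker is expected to do on 2.1.129+."""
    env = os.environ if env is None else env
    if not env.get("ANTHROPIC_BASE_URL"):
        return "no gateway configured: picker uses the built-in list"
    if env.get(ENV_VAR) == "1":
        return "discovery opted in: picker queries the gateway's /v1/models"
    return "gateway configured, discovery off: picker uses the built-in list"


if __name__ == "__main__":
    print(describe_model_picker())
```

Running it in the same shell you launch claude from is the point: it reads the environment Claude Code will actually inherit.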

Ctrl+R back to all-prompts (regression finally fixed)

Ctrl+R history picker now defaults to searching all prompts across all projects (matching pre-2.1.124 behavior); press Ctrl+S to narrow to current project/session.

If you've been muttering about Ctrl+R since 2.1.124, this is the fix. Some time around 2.1.124, the history picker silently changed scope from "all prompts everywhere" to "current project only," which broke the muscle memory of every long-time user who relies on Ctrl+R to find a prompt they used three projects ago.

2.1.129 reverts the default. Ctrl+R now searches across all your projects and sessions like it always did. Ctrl+S narrows to the current scope, on demand. Two keystrokes, both useful, neither surprising you.

This is the kind of fix that's invisible to anyone who didn't know what changed but immediately better for everyone who did.

EnterWorktree no longer drops your unpushed commits

A quietly nasty bug got fixed in 2.1.128:

EnterWorktree now creates the new branch from local HEAD as documented, instead of origin/<default-branch> – unpushed commits are no longer dropped.

If you use EnterWorktree to create a sibling worktree for a side branch, the old behavior was: take the branch off origin/main regardless of where your current HEAD was. Which means if you had unpushed commits on your local main (or any branch), the new worktree silently started from a state that didn't include them. Switch into the worktree, start working, and your unpushed commits weren't there.

The docs always said "from local HEAD." The code didn't match the docs. Now it does. If you've ever been confused why a worktree branched from "main" was missing changes you knew you'd made, this is why.

Audit any worktrees created before 2.1.128 if you're not sure what state they're in. After 2.1.128, they branch from where you actually are.
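One way to run that audit is to ask git which commits are reachable from your local branch but not from the worktree's branch. The sketch below wraps git rev-list. The branch names are placeholders, and the helper names are mine.

```python
import subprocess


def missing_commits_argv(worktree_branch, local_branch="main"):
    """Build the git command that lists commits reachable from local_branch
    but not from worktree_branch."""
    return ["git", "rev-list", f"{worktree_branch}..{local_branch}"]


def missing_commits(worktree_branch, local_branch="main", repo="."):
    """Run the command in repo and return the commit hashes it prints."""
    argv = missing_commits_argv(worktree_branch, local_branch)
    result = subprocess.run(argv, cwd=repo, check=True,
                            capture_output=True, text=True)
    return result.stdout.split()


# Example (placeholder branch name): missing_commits("side-branch")
```

An empty list means the worktree branch already contains everything on your local branch. A non-empty list isn't proof of the bug, because commits you made on the local branch after creating the worktree show up too, but it narrows down where to look.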

--plugin-url for try-before-install

New in 2.1.129:

Added --plugin-url <url> flag to fetch a plugin .zip archive from a URL for the current session.

Combined with 2.1.128's --plugin-dir accepting .zip archives in addition to directories, the plugin loading story now covers three scopes:

  • Permanent install: claude plugin install <name>
  • Local-dir test: --plugin-dir ./my-plugin/ or --plugin-dir ./my-plugin.zip
  • One-shot URL test: --plugin-url https://example.com/my-plugin.zip

The URL form is the one you've been waiting for if you're a plugin author. Drop a .zip somewhere accessible (S3, GitHub release, a CDN), share the URL, and any user can run claude --plugin-url <url> to try it for one session without committing to an install. Beta testing for plugins just got dramatically easier.

The headless --output-format stream-json improvement from 2.1.128 also matters here: init.plugin_errors now includes --plugin-dir load failures, not just dependency demotions. If you're scripting plugin testing in CI, the failure mode is now visible.
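For CI, a sketch like the following could pull those errors out of the stream. It assumes the init event is a JSON line with type "system" and subtype "init", and that plugin_errors is a list on that event. The changelog only names init.plugin_errors, so the exact shape is my guess and the sample payload below is invented.

```python
import json


def plugin_errors_from_stream(lines):
    """Return plugin_errors from the first init event in a stream-json
    transcript, or an empty list if there is none."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON line
        if (isinstance(event, dict)
                and event.get("type") == "system"
                and event.get("subtype") == "init"):
            return event.get("plugin_errors") or []
    return []


# Invented sample lines, for illustration only.
sample_stream = [
    '{"type": "system", "subtype": "init", "plugin_errors": ["my-plugin.zip: failed to load"]}',
    '{"type": "assistant", "message": {}}',
]
errors = plugin_errors_from_stream(sample_stream)
print(errors)
```

In a pipeline you would feed it sys.stdin and exit non-zero when the list is non-empty.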

skillOverrides for shaping skill visibility

Also new in 2.1.129:

skillOverrides setting: off hides from model and /, user-invocable-only hides from model only, name-only collapses description.

Three values, three different problems solved:

  • off – the skill is hidden from both Claude (it can't reach for it) and you (it doesn't show in /). Useful for skills you've installed but don't want active right now.
  • user-invocable-only – the skill is hidden from Claude's proactive use but still appears in /. You can invoke it explicitly when you want it; Claude won't pull it on its own.
  • name-only – the skill name is visible but its description is collapsed. Reduces context bloat when you have many skills installed but only a few that need full descriptions in the system prompt at any moment.

For anyone running with 20+ skills installed, name-only alone is a meaningful context-window saver. For teams that install skills per-developer but want consistent default behavior across the org, user-invocable-only lets you ship the skill without it activating proactively.
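The three values are easier to compare side by side. The table-as-code below only restates the semantics described above. It is not the settings file format, which the changelog entry doesn't spell out, and the field names are mine. Whether a name-only skill still appears in / is my assumption.

```python
from typing import NamedTuple


class Visibility(NamedTuple):
    model_can_use: bool          # Claude may reach for the skill on its own
    shown_in_slash_menu: bool    # you can invoke it via /
    description_in_prompt: bool  # full description occupies context


DEFAULT = Visibility(True, True, True)

OVERRIDES = {
    "off": Visibility(False, False, False),
    "user-invocable-only": Visibility(False, True, False),
    "name-only": Visibility(True, True, False),
}


def effective_visibility(override=None):
    """Look up the visibility for an override value; None means no override."""
    return DEFAULT if override is None else OVERRIDES[override]


for name in OVERRIDES:
    print(name, effective_visibility(name))
```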

Sessions on 1M-context models stop falsely blocking

A 2.1.128 fix that bit anyone running on the 1M-context Claude models:

Fixed sessions on 1M-context models with a smaller autocompact window being falsely blocked with "Prompt is too long" before reaching the actual API limit.

The autocompact window (the threshold at which Claude Code automatically compacts the conversation to free up tokens) was being checked as if it were the API hard limit. So if your autocompact was set to, say, 200k and you'd loaded a 250k-token context on a 1M-window model, you'd see "Prompt is too long" and the session would refuse to continue – even though the actual API limit was 1M and you had ~750k headroom.

After 2.1.128, the autocompact window correctly triggers compaction without blocking the session. If you've been bouncing off "Prompt is too long" on Opus 4.7 with the 1M window enabled, that's why.
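The difference between the two behaviors fits in a few lines. This is an illustration of the logic as the changelog describes it, with made-up names, not Claude Code's actual implementation.

```python
def next_action(prompt_tokens, autocompact_window, api_limit, fixed=True):
    """Decide what happens to a session at a given prompt size.

    fixed=False models the pre-2.1.128 behavior, where crossing the
    autocompact window was treated like hitting the API hard limit.
    """
    if prompt_tokens > api_limit:
        return "block"
    if prompt_tokens > autocompact_window:
        return "compact" if fixed else "block"
    return "continue"


# The example from above: 250k tokens, 200k autocompact, 1M API limit.
before = next_action(250_000, 200_000, 1_000_000, fixed=False)
after = next_action(250_000, 200_000, 1_000_000, fixed=True)
headroom = 1_000_000 - 250_000
print(before, after, headroom)
```

With the fix, the same 250k-token session compacts instead of refusing to continue, and the 750k of headroom is actually usable.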

Mac sleep, take two: the OAuth refresh race

The 2.1.126 release fixed three Mac-sleep bugs around stream-idle timeouts. 2.1.129 finishes the job:

Fixed OAuth refresh race after wake-from-sleep that could log out all running sessions.

When your Mac woke up, multiple Claude Code sessions running at the same time (terminal tabs, IDE integrations, background agents) could all try to refresh their OAuth tokens at the same instant. The race meant one refresh could invalidate the token mid-flight for the others – and the others would then all silently log out together.

If you've been opening your laptop after a meeting and finding all your Claude Code sessions logged out simultaneously, that's the bug. Single-session users mostly didn't see it. Anyone running 3+ tabs did, and probably blamed the wifi.
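The general cure for this class of bug is to make the refresh single-flight: whoever gets there first refreshes, everyone else waits and reuses the result. The changelog doesn't say how Claude Code fixed it, and the real race spans separate processes, so the thread-based sketch below only shows the pattern. All names are hypothetical.

```python
import threading
import time


class SingleFlightToken:
    """Hand out a token, refreshing at most once per expiry even when many
    callers ask at the same moment."""

    def __init__(self, refresh_fn, ttl_seconds, clock=time.monotonic):
        self._refresh_fn = refresh_fn
        self._ttl = ttl_seconds
        self._clock = clock
        self._lock = threading.Lock()
        self._token = None
        self._expires_at = 0.0

    def get(self):
        with self._lock:
            # Check expiry only while holding the lock, so a caller that
            # waited behind a refresh sees the fresh token and skips its own.
            if self._token is None or self._clock() >= self._expires_at:
                self._token = self._refresh_fn()
                self._expires_at = self._clock() + self._ttl
            return self._token
```

Eight threads calling get() on a cold store trigger exactly one call to refresh_fn. Across processes the same idea needs a shared lock, such as a lock file, instead of threading.Lock.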

A related fix in 2.1.129: "Fixed server-managed settings policy not applying for enterprise/team users whose stored OAuth credentials lacked the user:inference scope." Enterprise admins should re-test policy enforcement after the upgrade, because policies that were silently skipped before will now take effect.

VS Code on Windows works again (2.1.131)

The marquee fix in 2.1.131:

Fixed VS Code extension failing to activate on Windows due to a hardcoded build path in the bundled SDK (createRequire polyfill bug).

If you're on Windows and the Claude Code VS Code extension stopped activating in some recent build, this is the unblock. The bundled SDK had a hardcoded path that worked on macOS and Linux but tripped a createRequire polyfill bug on Windows during extension activation. Two-line release, but if it was hitting you, it was hitting you hard.

The other 2.1.131 entry: "Fixed Mantle endpoint authentication failing with missing x-api-key header." Internal/enterprise Mantle deployments only – if you don't know what that is, you're not affected.

Other money/cost bugs that bit silently

Three more 2.1.128 fixes worth flagging because they all hit cost or context budgets:

Fix | Impact
Sub-agent progress summaries missing the prompt cache | ~3× reduction in cache_creation for sub-agent runs. Anyone running long sub-agent loops was burning ~3× more on cache writes than necessary.
Sub-agent summaries firing repeatedly while the sub-agent transcript is static | Idle sub-agents were generating repeated summary calls, adding token cost with no information. The fix caps the worst case.
/context dumping its rendered ASCII visualization grid into the conversation | Wasted ~1.6k tokens per /context call. Whatever your usage patterns, that adds up.

If you run agent workflows with sub-agents, especially long-running ones, the upgrade is mandatory for cost reasons alone. Pre-2.1.128 cost estimates aren't reliable.

Smaller fixes worth knowing

A grab-bag from across 2.1.128 and 2.1.129 worth noting if you've hit any of these:

  • Parallel shell tool calls no longer cancel siblings on a failed read-only command (2.1.128) – grep, git diff, ls failing in parallel was killing the whole batch
  • Crash loop when piping >10 MB to claude -p via stdin (2.1.128) – fixed
  • MCP tool results dropping images when the server returns both structured content and content blocks (2.1.128) – fixed
  • /plugin update never detecting new versions of npm-sourced plugins (2.1.128) – the headline bug for anyone managing plugins at scale
  • Bedrock default model resolving to global.* instead of the region-appropriate prefix (2.1.128) – enterprise Bedrock deployments were getting the wrong default
  • Vim NORMAL mode Space now moves the cursor right (2.1.128) – matches standard vim. If you're a vim user this has been bugging you for weeks.
  • Stale installed_plugins.json entries pointing at deleted cache directories polluting PATH (2.1.128) – the kind of leak that gets weirder over time
  • MCP stdio servers receiving corrupted arguments when CLAUDE_CODE_SHELL_PREFIX is set and an argument contains spaces (2.1.128) – niche but extremely confusing if it hit you
  • Bash(mkdir *), Bash(touch *) and similar allow rules not honored for in-project paths (2.1.129) – fixed
  • deniedMcpServers patterns with a *:// scheme wildcard not matching mixed-case hostnames (2.1.129) – fixed
  • External-editor handoff (Ctrl+G) blanking the conversation history above the prompt (2.1.129) – fixed
  • /branch success message not including the new branch's session id for /resume (2.1.129) – fixed
  • API errors with unrecognized 400 status codes showing raw JSON instead of the underlying error message (2.1.129) – fixed
  • Policy refusal error messages now include the API Request ID (2.1.129) – makes support debugging actually possible
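The CLAUDE_CODE_SHELL_PREFIX entry is a good reminder of how arguments with spaces get mangled in general. I don't know the actual cause inside Claude Code. The sketch below just shows the classic failure mode, joining arguments with plain spaces behind a prefix, next to the quoted version. The prefix path and arguments are made up.

```python
import shlex


def build_command(prefix, args, quote=True):
    """Join a wrapper prefix and an argument list into one shell string."""
    joined = shlex.join(args) if quote else " ".join(args)
    return f"{prefix} {joined}"


prefix = "/usr/local/bin/wrapper"  # hypothetical shell prefix
args = ["my-server", "--root", "/Users/me/My Projects"]

naive = build_command(prefix, args, quote=False)
safe = build_command(prefix, args, quote=True)

# Re-splitting the way a shell would shows what the server receives.
print(shlex.split(naive)[1:])
print(shlex.split(safe)[1:])
```

The unquoted string splits the path into two arguments, while the quoted one round-trips to the original list.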

Three releases at a glance

Version | Date | Theme | Standouts
2.1.131 | May 6 | Hotfix (2 changes) | VS Code on Windows activation, Mantle endpoint auth header
2.1.129 | May 6 | Hardening + reversal (27 changes) | 1-hour cache TTL silent downgrade fix, gateway discovery flipped to opt-in, Ctrl+R back to all-prompts, --plugin-url, skillOverrides, OAuth wake-from-sleep race
2.1.128 | May 4 | UX + cost (35 changes) | EnterWorktree dropped-commits fix, sessions on 1M-context unblocked, sub-agent cache fixes, --plugin-dir accepts .zip, --channels with console auth

2.1.127 and 2.1.130 don't appear in the public changelog. Same pattern as 2.1.124 and 2.1.125 before them – likely internal builds that didn't ship.

Should you update?

Yes. Three caveats:

  1. If you've been running on auto-detected gateway model discovery since 2.1.126, set CLAUDE_CODE_ENABLE_GATEWAY_MODEL_DISCOVERY=1 in your shell environment before upgrading. Otherwise your /model picker silently reverts to the hardcoded list.
  2. If you cache aggressively with the 1-hour TTL, audit your past traces. Anything cached on 2.1.122–2.1.128 was on the 5-minute tier regardless of what you configured. Cost models built off that data are wrong.
  3. If you have unpushed commits and use EnterWorktree, audit any worktrees created before 2.1.128. They may have branched from origin/<default> instead of where you actually were.

For everyone else: claude --version, confirm you're on 2.1.131, and move on.
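If you're checking a fleet of machines rather than one laptop, compare versions numerically instead of as strings, since "2.1.9" sorts after "2.1.131" alphabetically. A minimal sketch, assuming the version output contains a dotted three-part number somewhere in the text (the sample string is my guess at the format):

```python
import re


def parse_version(text):
    """Extract the first X.Y.Z version in text as a tuple of ints."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version found in {text!r}")
    return tuple(int(part) for part in match.groups())


def is_at_least(version_output, minimum="2.1.131"):
    """True if the reported version is the minimum or newer."""
    return parse_version(version_output) >= parse_version(minimum)


print(is_at_least("2.1.131 (Claude Code)"))
print(is_at_least("2.1.128"))
```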

The arc from 2.1.121 through 2.1.131 is consistent. Anthropic shipped Opus 4.7 on April 17, and the 19 days since have been almost entirely about hardening the runtime around it – memory leaks, OAuth resilience, sleep recovery, managed settings, gateway integration, plugin ergonomics, prompt caching. None of these are flashy. All of them are exactly what a model migration needs.

The 1-hour cache TTL fix in particular is a reminder that the most expensive bugs in production are the ones you can't see. Audit your traces.

What's next

If you're getting Claude Code working in a real environment – SSH boxes, devcontainers, behind a corporate proxy, on Bedrock or Vertex – the Production Claude Code series is the systematic version of what these releases are quietly enabling. Episode 1 covered the 1M context window. The cache-TTL story in this release lands directly in episode 3 (caching for cost), which is up next.

Until then: update, set CLAUDE_CODE_ENABLE_GATEWAY_MODEL_DISCOVERY=1 if you need it, audit your cache traces, and confirm your worktrees branched from where you thought they did.
