What was the 1-hour prompt cache TTL bug in Claude Code 2.1.129?

Claude Code was silently downgrading the 1-hour prompt cache TTL to 5 minutes. When you set cache_control with ttl: "1h", the request was rewritten to the 5-minute tier with no error and no log line. Reads outside the 5-minute window missed the cache. 2.1.129 fixed it. Audit traces from 2.1.122 to 2.1.128.

How does Claude Code 2.1.129's gateway model discovery setting work?

Gateway /v1/models discovery is now opt-in. Set CLAUDE_CODE_ENABLE_GATEWAY_MODEL_DISCOVERY=1 in your shell environment to enable it. Without that env var, the /model picker uses the hardcoded model list, even when ANTHROPIC_BASE_URL points at a gateway. The auto behavior added in 2.1.126 was reverted in 2.1.129. The changelog doesn't say why, but a likely reason is that gateways with missing or malformed /v1/models responses could break the picker.

What does the --plugin-url flag in Claude Code do?

The --plugin-url flag fetches a plugin .zip from a URL and loads it for the current session. It's a one-shot test mode, separate from --plugin-dir (local) and claude plugin install (permanent). Plugin authors can share a URL and any user can run claude --plugin-url to try it without installing. Added in 2.1.129.

What is the skillOverrides setting in Claude Code?

skillOverrides controls how installed skills appear to Claude and to you. Three values: off hides the skill from both Claude and the / picker, user-invocable-only hides it from Claude's proactive use but keeps it in /, and name-only collapses the description while keeping the name visible. Useful for trimming context bloat when many skills are installed. Added in 2.1.129.

Why does Ctrl+R search all prompts again in Claude Code?

Ctrl+R now defaults to searching all prompts across all projects, restoring pre-2.1.124 behavior. The history picker had silently changed scope to current-project-only sometime around 2.1.124, breaking muscle memory for long-time users. 2.1.129 reverts the default. Ctrl+S now narrows to the current project or session when you want that.

What was the EnterWorktree dropped-commits bug in Claude Code 2.1.128?

EnterWorktree was creating new branches from origin/ instead of local HEAD, silently dropping unpushed commits. If you had unpushed commits on your local main, the new worktree was missing them. The docs always said "from local HEAD" – 2.1.128 finally makes the code match. Audit worktrees from 2.1.121 to 2.1.127.

Were Claude Code 2.1.127 and 2.1.130 ever released?

No. Anthropic's official changelog at code.claude.com jumps from 2.1.126 (May 1) to 2.1.128 (May 4) to 2.1.129 (May 6) to 2.1.131 (May 6). 2.1.127 and 2.1.130 don't appear in the public changelog. Same pattern as 2.1.124 and 2.1.125 before them – likely internal builds that never shipped to users.

Should I update to Claude Code 2.1.131?

Yes. The 1-hour cache TTL fix alone justifies it if you cache aggressively. Three caveats: set CLAUDE_CODE_ENABLE_GATEWAY_MODEL_DISCOVERY=1 if you relied on auto gateway discovery from 2.1.126; audit cache traces from 2.1.122 to 2.1.128; audit any worktrees created with EnterWorktree before 2.1.128.

Claude Code 2.1.131: the silent 1-hour prompt cache bug, plus 64 changes since 2.1.126

Alex Kim

May 7, 2026

If you set cache_control: { ttl: "1h" } anywhere in your Claude Code prompts and have been quietly paying for the 1-hour cache while wondering why your hit rate looked off – Claude Code 2.1.129 just fixed it. The TTL was silently downgrading to 5 minutes. Nobody could see it. The dashboards looked fine. The bills weren't.

That's the buried headline of today's release pair. 2.1.131 on May 6 ships two surgical fixes (a VS Code-on-Windows activation bug and a Mantle endpoint auth header). 2.1.129 the same day shipped 27 entries. 2.1.128 on May 4 shipped 35. 64 individual changes since 2.1.126 on May 1, and the most expensive one was invisible.

TL;DR

The money bug: 2.1.129 fixed the 1-hour prompt cache TTL silently downgrading to 5 minutes. If you cache, audit your traces.
The honest correction: 2.1.129 flipped the gateway /v1/models discovery from automatic (added in 2.1.126) to opt-in via CLAUDE_CODE_ENABLE_GATEWAY_MODEL_DISCOVERY=1. The auto behavior was problematic enough to revert in five days.
The regression fix: 2.1.129 restored Ctrl+R to searching all prompts across all projects, matching pre-2.1.124 behavior. Ctrl+S now narrows.

The big one: 1-hour prompt cache silently downgrading to 5 minutes

The 1-hour prompt cache TTL was being downgraded to 5 minutes in flight, with no error and no log line. From 2.1.129's changelog: "Fixed 1-hour prompt cache TTL being silently downgraded to 5 minutes." That's the entire entry.

Here's what was happening: when you set cache_control: { type: "ephemeral", ttl: "1h" } on a prompt block, Claude Code was supposed to ask the API for the 1-hour cache tier (which costs more per cache write but pays back massively on cache reads over the hour). Instead, the request was being silently rewritten to the 5-minute tier. You paid for the write either way. The reads either landed inside the 5-minute window (so you saw cache hits) or outside it (so you got cache misses you weren't expecting). Either way, the 1-hour tier you configured was never the tier you got.
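To make the "pays back over the hour" argument concrete, here is a rough cost sketch. The price multipliers (1.25× base input for a 5-minute write, 2× for a 1-hour write, 0.1× for a read) are assumptions to check against current pricing, `prefix_cost` is a hypothetical helper, and the model assumes every hit or write refreshes the TTL. The changelog doesn't say which write rate was billed while the bug was active, so both cases are shown.

```python
# Assumed multipliers relative to the base input-token price (verify before relying on them).
WRITE_5M = 1.25   # 5-minute cache write
WRITE_1H = 2.0    # 1-hour cache write
READ = 0.1        # cache read (hit)


def prefix_cost(gaps_min, ttl_min, write_mult, prefix_tokens=1.0):
    """Cost, in base-input-token units, of one cached prefix across a run.

    gaps_min: minutes between consecutive requests.
    A request hits the cache if the gap since the previous request is within
    the TTL; otherwise it misses and the prefix is written again.
    """
    cost = write_mult * prefix_tokens  # the first request always writes
    for gap in gaps_min:
        if gap <= ttl_min:
            cost += READ * prefix_tokens
        else:
            cost += write_mult * prefix_tokens
    return cost


gaps = [20] * 6  # six follow-up requests, one every 20 minutes

configured = prefix_cost(gaps, ttl_min=60, write_mult=WRITE_1H)       # what you asked for
downgraded = prefix_cost(gaps, ttl_min=5, write_mult=WRITE_5M)        # 5-min TTL, 5-min write rate
downgraded_worst = prefix_cost(gaps, ttl_min=5, write_mult=WRITE_1H)  # 5-min TTL, 1-hour write rate
```

Under these assumptions the configured tier costs about 2.6 prefix-units for the run, while the downgraded tier costs 8.75 (or 14.0 if writes were billed at the 1-hour rate), because every 20-minute gap falls outside the 5-minute window and forces a fresh write.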

If you've been using extended caching to amortize a long system prompt or a heavy tool definition block across hours of agent runs, this directly affected your cost-per-run math. Update, then audit your past traces if you can – the gap between configured-tier and actual-tier is the size of the bug.
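One way to approach that audit, as a minimal sketch: flag requests that missed the cache even though they arrived more than 5 minutes but no more than an hour after the previous request. The record layout here is hypothetical (a list of dicts with a `timestamp` plus the `cache_read_input_tokens` and `cache_creation_input_tokens` counters from the API usage object), so adapt it to whatever your tracing actually stores. A flagged record is only a suspect, since a changed prompt prefix also causes a miss.

```python
from datetime import datetime, timedelta

FIVE_MIN = timedelta(minutes=5)
ONE_HOUR = timedelta(hours=1)


def suspect_misses(records):
    """Return records that missed the cache inside the window a 1-hour TTL should cover."""
    ordered = sorted(records, key=lambda r: r["timestamp"])
    suspects = []
    for prev, cur in zip(ordered, ordered[1:]):
        gap = cur["timestamp"] - prev["timestamp"]
        missed = (
            cur["cache_read_input_tokens"] == 0
            and cur["cache_creation_input_tokens"] > 0
        )
        # Misses within 5 minutes or after an hour are not explained by the downgrade.
        if FIVE_MIN < gap <= ONE_HOUR and missed:
            suspects.append(cur)
    return suspects


# Made-up sample trace for illustration only.
t0 = datetime(2026, 5, 1, 9, 0)
trace = [
    {"timestamp": t0, "cache_read_input_tokens": 0, "cache_creation_input_tokens": 5000},
    {"timestamp": t0 + timedelta(minutes=3), "cache_read_input_tokens": 5000, "cache_creation_input_tokens": 0},
    {"timestamp": t0 + timedelta(minutes=23), "cache_read_input_tokens": 0, "cache_creation_input_tokens": 5000},
]
flagged = suspect_misses(trace)  # only the 09:23 request is flagged
```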

A second cost-side fix landed in the same release: "Fixed cache-miss warning appearing spuriously after /clear or compaction when changing /effort or /model." The warning is meant to flag real cache invalidation when you switch effort or model, and the false positives made it harder to spot the real ones.

The honest correction: gateway model discovery flipped to opt-in

In the 2.1.126 post I called the new gateway-aware /model picker "the kind of fix that doesn't make headlines but stops a hundred GitHub issues from being filed." Five days later, Anthropic walked it back.

From 2.1.129:

Gateway /v1/models discovery for the /model picker is now opt-in via CLAUDE_CODE_ENABLE_GATEWAY_MODEL_DISCOVERY=1 (was automatic in 2.1.126–2.1.128).

If your ANTHROPIC_BASE_URL points at an Anthropic-compatible gateway (LiteLLM, an internal proxy, a corporate firewall pass-through), the /model picker no longer queries the gateway's /v1/models endpoint by default. You explicitly opt in by setting CLAUDE_CODE_ENABLE_GATEWAY_MODEL_DISCOVERY=1 in your environment.

The changelog doesn't say why the reversal landed, but the shape of the fix tells you something: making it opt-in protects users whose gateways don't implement /v1/models, return malformed responses, or expose models the user shouldn't see in the picker. Auto-discovery was good when it worked and bad when it didn't, with no easy way to tell which side you were on. Opt-in puts the choice back in your hands.

If your team relied on the auto behavior since 2.1.126, set the env var in your shell rc file and move on. If you didn't know it was happening, it's now off and your /model picker reverted to the hardcoded list.
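Before opting in, it may be worth checking that your gateway's /v1/models response is well-formed. The sketch below assumes the response follows the common `{"data": [{"id": "..."}]}` shape. That shape and the `usable_model_ids` helper are assumptions for illustration, not a description of how Claude Code parses the response.

```python
import json


def usable_model_ids(body):
    """Return the model ids from a /v1/models-style body, or None if the shape is off."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return None  # not JSON at all (an HTML error page, for example)
    data = payload.get("data") if isinstance(payload, dict) else None
    if not isinstance(data, list) or not data:
        return None  # missing, mistyped, or empty model list
    ids = [m["id"] for m in data if isinstance(m, dict) and isinstance(m.get("id"), str)]
    # Reject the whole response if any entry lacks a string id.
    return ids if len(ids) == len(data) else None


good = usable_model_ids('{"data": [{"id": "model-a"}, {"id": "model-b"}]}')
bad = usable_model_ids("<html>502 Bad Gateway</html>")
```

Fetch the body with whatever HTTP client you already use against your gateway. If this returns None, leaving discovery off is probably the safer default.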

Ctrl+R back to all-prompts (regression finally fixed)

Ctrl+R history picker now defaults to searching all prompts across all projects (matching pre-2.1.124 behavior); press Ctrl+S to narrow to current project/session.

If you've been muttering about Ctrl+R since 2.1.124, this is the fix. Some time around 2.1.124, the history picker silently changed scope from "all prompts everywhere" to "current project only," which broke the muscle memory of every long-time user who relies on Ctrl+R to find a prompt they used three projects ago.

2.1.129 reverts the default. Ctrl+R now searches across all your projects and sessions like it always did. Ctrl+S narrows to the current scope, on demand. Two keystrokes, both useful, neither surprising you.

This is the kind of fix that's invisible to anyone who didn't know what changed but immediately better for everyone who did.

EnterWorktree no longer drops your unpushed commits

A quietly nasty bug got fixed in 2.1.128:

EnterWorktree now creates the new branch from local HEAD as documented, instead of origin/<default-branch> – unpushed commits are no longer dropped.

If you use EnterWorktree to create a sibling worktree for a side branch, the old behavior was: take the branch off origin/main regardless of where your current HEAD was. Which means if you had unpushed commits on your local main (or any branch), the new worktree silently started from a state that didn't include them. Switch into the worktree, start working, and your unpushed commits weren't there.

The docs always said "from local HEAD." The code didn't match the docs. Now they do. If you've ever been confused why a worktree branched from "main" was missing changes you knew you'd made, this is why.

Audit any worktrees created on 2.1.121–2.1.127 if you're not sure what state they're in. After 2.1.128, they branch from where you actually are.
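A minimal sketch of that audit, assuming your default branch is `main` with remote `origin` and that the worktree's branch is called `side-branch` (all three names are placeholders). It lists commits that are on local `main` but not on `origin/main`, then reports which of those the worktree branch doesn't contain.

```python
import subprocess


def rev_list(repo, spec):
    """Commit hashes for a revision or range, e.g. 'origin/main..main', newest first."""
    result = subprocess.run(
        ["git", "-C", repo, "rev-list", spec],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.split()


def dropped_commits(unpushed, worktree_commits):
    """Unpushed commits that the worktree branch does not contain."""
    present = set(worktree_commits)
    return [commit for commit in unpushed if commit not in present]


def audit(repo=".", local="main", remote="origin/main", branch="side-branch"):
    # Placeholder branch names: substitute your own before running.
    unpushed = rev_list(repo, f"{remote}..{local}")
    return dropped_commits(unpushed, rev_list(repo, branch))
```

A non-empty result isn't proof of the bug on its own: commits made on `main` after the worktree was created show up too, so compare against when the worktree was made.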

`--plugin-url` for try-before-install

New in 2.1.129:

Added --plugin-url <url> flag to fetch a plugin .zip archive from a URL for the current session.

Combined with 2.1.128's --plugin-dir accepting .zip archives in addition to directories, the plugin loading story now covers three scopes:

Permanent install: claude plugin install <name>
Local-dir test: --plugin-dir ./my-plugin/ or --plugin-dir ./my-plugin.zip
One-shot URL test: --plugin-url https://example.com/my-plugin.zip

The URL form is the one you've been waiting for if you're a plugin author. Drop a .zip somewhere accessible (S3, GitHub release, a CDN), share the URL, and any user can run claude --plugin-url <url> to try it for one session without committing to an install. Beta testing for plugins just got dramatically easier.

The headless --output-format stream-json improvement from 2.1.128 also matters here: init.plugin_errors now includes --plugin-dir load failures, not just dependency demotions. If you're scripting plugin testing in CI, the failure mode is now visible.
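For CI scripting, something like the following could pull those errors out of captured output. It assumes stream-json emits one JSON object per line and that the init message is the one with `"type": "system"` and `"subtype": "init"`. The shape of each `plugin_errors` entry isn't documented in the changelog, so entries are passed through untouched.

```python
import json


def plugin_errors(lines):
    """Return plugin_errors from the first init event in stream-json output, or []."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON line
        if (
            isinstance(event, dict)
            and event.get("type") == "system"      # assumed init envelope
            and event.get("subtype") == "init"
        ):
            return event.get("plugin_errors") or []
    return []


# Made-up sample output; the error entry shown is purely illustrative.
sample = [
    '{"type": "system", "subtype": "init", "plugin_errors": ["example failure"]}',
    '{"type": "assistant"}',
]
errors = plugin_errors(sample)
```

In a CI job you would fail the build when the returned list is non-empty.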

`skillOverrides` for shaping skill visibility

Also new in 2.1.129:

skillOverrides setting: off hides from model and /, user-invocable-only hides from model only, name-only collapses description.

Three values, three different problems solved:

off – the skill is hidden from both Claude (it can't reach for it) and you (it doesn't show in /). Useful for skills you've installed but don't want active right now.
user-invocable-only – the skill is hidden from Claude's proactive use but still appears in /. You can invoke it explicitly when you want it; Claude won't pull it on its own.
name-only – the skill name is visible but its description is collapsed. Reduces context bloat when you have many skills installed but only a few that need full descriptions in the system prompt at any moment.

For anyone running with 20+ skills installed, name-only alone is a meaningful context-window saver. For teams that install skills per-developer but want consistent default behavior across the org, user-invocable-only lets you ship the skill without it activating proactively.

Sessions on 1M-context models stop falsely blocking

A 2.1.128 fix that bit anyone running on the 1M-context Claude models:

Fixed sessions on 1M-context models with a smaller autocompact window being falsely blocked with "Prompt is too long" before reaching the actual API limit.

The autocompact window (the threshold at which Claude Code automatically compacts the conversation to free up tokens) was being checked as if it were the API hard limit. So if your autocompact was set to, say, 200k and you'd loaded a 250k-token context on a 1M-window model, you'd see "Prompt is too long" and the session would refuse to continue – even though the actual API limit was 1M and you had ~750k headroom.

After 2.1.128, the autocompact window correctly triggers compaction without blocking the session. If you've been bouncing off "Prompt is too long" on Opus 4.7 with the 1M window enabled, that's why.
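The difference between the two behaviors can be sketched in a few lines. These functions are illustrative stand-ins based on the changelog's description, not Claude Code's actual implementation.

```python
def check_before_fix(tokens, autocompact_window, api_limit):
    """Described buggy behavior: the autocompact window was treated as the hard limit."""
    return "blocked" if tokens > autocompact_window else "ok"


def check_after_fix(tokens, autocompact_window, api_limit):
    """Described fixed behavior: compact past the window, block only past the API limit."""
    if tokens > api_limit:
        return "blocked"
    if tokens > autocompact_window:
        return "compact"
    return "ok"


# The example from the text: 250k tokens loaded, 200k autocompact window, 1M API limit.
before = check_before_fix(250_000, 200_000, 1_000_000)
after = check_after_fix(250_000, 200_000, 1_000_000)
headroom = 1_000_000 - 250_000
```

With those numbers the old check blocks the session, the new one triggers compaction, and 750,000 tokens of headroom remain against the API limit.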

Mac sleep, take two: the OAuth refresh race

The 2.1.126 release fixed three Mac-sleep bugs around stream-idle timeouts. 2.1.129 finishes the job:

Fixed OAuth refresh race after wake-from-sleep that could log out all running sessions.

When your Mac woke up, multiple Claude Code sessions running at the same time (terminal tabs, IDE integrations, background agents) could all try to refresh their OAuth tokens at the same instant. The race meant one refresh could invalidate the token mid-flight for the others – and the others would then all silently log out together.

If you've been opening your laptop after a meeting and finding all your Claude Code sessions logged out simultaneously, that's the bug. Single-session users mostly didn't see it. Anyone running 3+ tabs did, and probably blamed the wifi.
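The usual remedy for this class of race is a "single-flight" refresh: take a lock, then re-check whether someone else already refreshed before doing it yourself. The sketch below shows the pattern with threads in one process. `TokenStore` is a hypothetical stand-in, real Claude Code sessions are separate processes and would need a cross-process lock, and the changelog doesn't say how Anthropic actually fixed it.

```python
import threading


class TokenStore:
    def __init__(self):
        self._lock = threading.Lock()
        self.generation = 0      # bumps every time the token is refreshed
        self.refresh_calls = 0   # how many upstream refreshes actually happened

    def _refresh_upstream(self):
        # Stand-in for the real OAuth refresh request (hypothetical).
        self.refresh_calls += 1
        self.generation += 1

    def ensure_fresh(self, seen_generation):
        """Refresh only if nobody has refreshed since the caller last looked."""
        with self._lock:
            if self.generation == seen_generation:
                self._refresh_upstream()
            return self.generation


store = TokenStore()
stale = store.generation  # every session saw the same expired token before sleep

threads = [
    threading.Thread(target=store.ensure_fresh, args=(stale,))
    for _ in range(5)
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

Five sessions wake up together, but only the first one through the lock refreshes. The other four see the new generation and reuse it, so no one invalidates a token another session is still using.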

A related fix in 2.1.129: "Fixed server-managed settings policy not applying for enterprise/team users whose stored OAuth credentials lacked the user:inference scope." Enterprise admins should re-test policy enforcement after the upgrade: policies that previously failed to apply without any warning will now take effect.

VS Code on Windows works again (2.1.131)

The marquee fix in 2.1.131:

Fixed VS Code extension failing to activate on Windows due to a hardcoded build path in the bundled SDK (createRequire polyfill bug).

If you're on Windows and the Claude Code VS Code extension stopped activating in some recent build, this is the unblock. The bundled SDK had a hardcoded path that worked on macOS and Linux but tripped a createRequire polyfill bug on Windows during extension activation. Two-line release, but if it was hitting you, it was hitting you hard.

The other 2.1.131 entry: "Fixed Mantle endpoint authentication failing with missing x-api-key header." Internal/enterprise Mantle deployments only – if you don't know what that is, you're not affected.

Other money/cost bugs that bit silently

Three more 2.1.128 fixes worth flagging because they all hit cost or context budgets:

| Fix | Impact |
| --- | --- |
| Sub-agent progress summaries missing the prompt cache | ~3× reduction in `cache_creation` for sub-agent runs. Anyone running long sub-agent loops was burning ~3× more on cache writes than necessary. |
| Sub-agent summaries firing repeatedly while the sub-agent transcript is static | Idle sub-agents were generating repeated summary calls, adding token cost with no information. The fix caps the worst case. |
| `/context` dumping its rendered ASCII visualization grid into the conversation | Wasted ~1.6k tokens per `/context` call. Whatever your usage patterns, that adds up. |

If you run agent workflows with sub-agents, especially long-running ones, the upgrade is mandatory for cost reasons alone. Pre-2.1.128 cost estimates aren't reliable.

Smaller fixes worth knowing

A grab bag from 2.1.128 and 2.1.129, in case you've hit any of these:

Parallel shell tool calls no longer cancel siblings on a failed read-only command (2.1.128) – grep, git diff, ls failing in parallel was killing the whole batch
Crash loop when piping >10 MB to claude -p via stdin (2.1.128) – fixed
MCP tool results dropping images when the server returns both structured content and content blocks (2.1.128) – fixed
/plugin update never detecting new versions of npm-sourced plugins (2.1.128) – the headline bug for anyone managing plugins at scale
Bedrock default model resolving to global.* instead of the region-appropriate prefix (2.1.128) – enterprise Bedrock deployments were getting the wrong default
Vim NORMAL mode Space now moves the cursor right (2.1.128) – matches standard vim. If you're a vim user this has been bugging you for weeks.
Stale installed_plugins.json entries pointing at deleted cache directories polluting PATH (2.1.128) – the kind of leak that gets weirder over time
MCP stdio servers receiving corrupted arguments when CLAUDE_CODE_SHELL_PREFIX is set and an argument contains spaces (2.1.128) – niche but extremely confusing if it hit you
Bash(mkdir *), Bash(touch *) and similar allow rules not honored for in-project paths (2.1.129) – fixed
deniedMcpServers patterns with a *:// scheme wildcard not matching mixed-case hostnames (2.1.129) – fixed
External-editor handoff (Ctrl+G) blanking the conversation history above the prompt (2.1.129) – fixed
/branch success message not including the new branch's session id for /resume (2.1.129) – fixed
API errors with unrecognized 400 status codes showing raw JSON instead of the underlying error message (2.1.129) – fixed
Policy refusal error messages now include the API Request ID (2.1.129) – makes support debugging actually possible

Three releases at a glance

| Version | Date | Theme | Standouts |
| --- | --- | --- | --- |
| 2.1.131 | May 6 | Hotfix (2 changes) | VS Code on Windows activation, Mantle endpoint auth header |
| 2.1.129 | May 6 | Hardening + reversal (27 changes) | 1-hour cache TTL silent downgrade fix, gateway discovery flipped to opt-in, Ctrl+R back to all-prompts, `--plugin-url`, `skillOverrides`, OAuth wake-from-sleep race |
| 2.1.128 | May 4 | UX + cost (35 changes) | EnterWorktree dropped-commits fix, sessions on 1M-context unblocked, sub-agent cache fixes, `--plugin-dir` accepts `.zip`, `--channels` with console auth |

2.1.127 and 2.1.130 don't appear in the public changelog. Same pattern as 2.1.124 and 2.1.125 before them – likely internal builds that didn't ship.

Should you update?

Yes. Three caveats:

If you've been running on auto-detected gateway model discovery since 2.1.126, set CLAUDE_CODE_ENABLE_GATEWAY_MODEL_DISCOVERY=1 in your shell environment before upgrading. Otherwise your /model picker silently reverts to the hardcoded list.
If you cache aggressively with the 1-hour TTL, audit your past traces. Anything cached on 2.1.122–2.1.128 was on the 5-minute tier regardless of what you configured. Cost models built off that data are wrong.
If you have unpushed commits and use EnterWorktree, audit any worktrees created before 2.1.128. They may have branched from origin/<default> instead of where you actually were.

For everyone else: run claude --version, confirm you're on 2.1.131, and move on.

The arc from 2.1.121 through 2.1.131 is consistent. Anthropic shipped Opus 4.7 on April 17, and the 19 days since have been almost entirely about hardening the runtime around it – memory leaks, OAuth resilience, sleep recovery, managed settings, gateway integration, plugin ergonomics, prompt caching. None of these are flashy. All of them are exactly what a model migration needs.

The 1-hour cache TTL fix in particular is a reminder that the most expensive bugs in production are the ones you can't see. Audit your traces.

What's next

If you're getting Claude Code working in a real environment – SSH boxes, devcontainers, behind a corporate proxy, on Bedrock or Vertex – the Production Claude Code series is the systematic version of what these releases are quietly enabling. Episode 1 covered the 1M context window. The cache-TTL story in this release lands directly in episode 3 (caching for cost), which is up next.

Until then: update, set CLAUDE_CODE_ENABLE_GATEWAY_MODEL_DISCOVERY=1 if you need it, audit your cache traces, and confirm your worktrees branched from where you thought they did.
