Claude Code's /code-review vs /ce-code-review: when each one wins

Alex Kim
10 min read
Last updated: May 28, 2026

TL;DR

I run the compound-engineering plugin on everything. Last week I noticed Claude Code ships a native /code-review and /simplify that look like duplicates of CE's /ce-code-review and /ce-simplify-code, so I read both source files to figure out when each one actually wins.

They're not duplicates. Native is one engine with an effort knob. CE is a multi-agent persona pipeline with explicit modes for skill-to-skill orchestration. And buried in line 36 of the CE skill is the thing nobody talks about: if you ask CE for a "quick" review, it calls native /review and stops. They're built to layer.

Here's the breakdown.

The architecture difference

Native /code-review is one engine with an effort dial. CE /ce-code-review fans out to 6 always-on reviewer personas, plus conditional ones (security, performance, API contract, data migration, reliability, adversarial, Swift, frontend races, deployment verification) that only spawn when your diff actually touches that surface.

That's the whole game. Everything else falls out of that choice.

/code-review: what each one actually does

Native is part of the CLI binary, so you can't read the source. What you can know from the description:

  • One pass, scaled by an effort flag. low and medium produce fewer findings, all high-confidence. high and max go broader and may surface uncertain ones. ultra is the deep multi-agent review that runs in Anthropic's cloud (this is the billed feature, formerly /ultrareview).
  • --comment posts findings as inline PR comments. --fix applies them to your working tree.
  • Covers correctness plus reuse, simplification, and efficiency cleanups in a single pass.

CE is a 900-line markdown file. You can read it, fork it, and customize it. The shape:

There are 6 always-on reviewer personas (correctness, testing, maintainability, project-standards, agent-native, learnings-researcher). Whatever your diff touches, those 6 always run. On top of that, CE selects from a catalog of conditional reviewers based on the actual file changes. A small config change might trigger zero conditionals and run 6 reviewers total. A Rails auth feature might trigger security plus reliability plus adversarial and run 9.
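To make the selection logic concrete, here's a minimal sketch of "six always-on, plus conditionals that match the diff". The six persona names come from the skill. The trigger patterns and the select_reviewers function are my own illustration, not CE's actual rules, which are prose instructions an agent interprets rather than glob matching.

```python
from fnmatch import fnmatch

# The six always-on personas named in the CE skill.
ALWAYS_ON = [
    "correctness", "testing", "maintainability",
    "project-standards", "agent-native", "learnings-researcher",
]

# Hypothetical path triggers, for illustration only.
CONDITIONAL_TRIGGERS = {
    "security": ["*auth*", "*session*", "*token*"],
    "data-migration": ["db/migrate/*"],
    "swift": ["*.swift"],
}


def select_reviewers(changed_files):
    """Return the always-on personas plus any conditional whose trigger matches."""
    selected = list(ALWAYS_ON)
    for persona, patterns in CONDITIONAL_TRIGGERS.items():
        if any(fnmatch(path, pat) for path in changed_files for pat in patterns):
            selected.append(persona)
    return selected
```

A config-only diff gets the six baseline reviewers. A diff touching an auth controller picks up security on top.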

Four modes ship in the box. interactive is the default. autofix applies only deterministic safe fixes silently and returns a structured residual list. report-only is strictly read-only, safe to parallelize. headless is the skill-to-skill mode that returns structured findings as text so other skills can route them. That last one is the load-bearing one – it's why /ce-work can run review automatically at the end of every implementation cycle.

Severity (P0 to P3) is separate from routing. CE has a second taxonomy called autofix_class: safe_auto, gated_auto, manual, advisory. safe_auto means the in-skill fixer can apply it. gated_auto means a concrete fix exists but it changes behavior or contracts and shouldn't be applied automatically. manual means hand-off to a downstream resolver. advisory is report-only. So a finding can be P1 severity with gated_auto routing: the problem is real, but the fix shouldn't be auto-applied. Severity answers urgency. Routing answers who acts next.
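One way to picture the two axes is a record that carries both, with routing decided by autofix_class alone. This is a sketch under my own assumptions: the Finding class, its field names, and the next_actor labels are hypothetical, and only the P0 to P3 scale and the four autofix_class values come from the skill.

```python
from dataclasses import dataclass

SEVERITIES = ("P0", "P1", "P2", "P3")
AUTOFIX_CLASSES = ("safe_auto", "gated_auto", "manual", "advisory")


@dataclass(frozen=True)
class Finding:
    """Hypothetical record shape; CE's real persona schema may differ."""
    title: str
    severity: str
    autofix_class: str

    def __post_init__(self):
        if self.severity not in SEVERITIES:
            raise ValueError(f"unknown severity: {self.severity}")
        if self.autofix_class not in AUTOFIX_CLASSES:
            raise ValueError(f"unknown autofix_class: {self.autofix_class}")


def next_actor(finding):
    """Routing depends only on autofix_class, never on severity."""
    return {
        "safe_auto": "fixer",
        "gated_auto": "human approval",
        "manual": "downstream resolver",
        "advisory": "report only",
    }[finding.autofix_class]
```

A P1 finding tagged gated_auto routes to human approval. Changing its severity to P3 would not change where it goes.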

Confidence anchors are the next layer. Findings get rated at 0, 25, 50, 75, or 100. Anything below 75 gets suppressed unless it's P0. When two independent personas flag the same fingerprint, the anchor bumps up. Cross-reviewer agreement is the strongest signal in the pipeline.
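Here's a rough sketch of how that gate could work. The anchor scale, the 75 threshold, and the P0 exemption come from the skill. The dict keys, the one-step bump size, and taking the highest anchor and most severe rating when merging are my assumptions.

```python
from collections import defaultdict

ANCHORS = (0, 25, 50, 75, 100)
SUPPRESS_BELOW = 75


def bump(anchor):
    """Move one step up the anchor scale, capped at 100 (step size is assumed)."""
    return ANCHORS[min(ANCHORS.index(anchor) + 1, len(ANCHORS) - 1)]


def merge_and_filter(findings):
    """Merge findings that share a fingerprint, then apply the confidence gate."""
    groups = defaultdict(list)
    for f in findings:
        groups[f["fingerprint"]].append(f)

    kept = []
    for fingerprint, group in groups.items():
        anchor = max(f["anchor"] for f in group)
        personas = {f["persona"] for f in group}
        if len(personas) >= 2:  # independent agreement raises confidence
            anchor = bump(anchor)
        severity = min(f["severity"] for f in group)  # "P0" sorts first
        if anchor >= SUPPRESS_BELOW or severity == "P0":
            kept.append({
                "fingerprint": fingerprint,
                "severity": severity,
                "anchor": anchor,
                "personas": sorted(personas),
            })
    return kept
```

In this sketch, two personas each reporting the same fingerprint at 50 produce a merged finding at 75 that survives. One persona at 50 is dropped, unless the finding is P0.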

In headless, autofix, and the file-tickets path, every surviving finding gets re-checked by an independent validator agent. Findings the validator rejects are dropped.
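As a sketch, the validator pass is a filter that only runs on certain paths. The function name, the "file-tickets" label, and the validator callable are hypothetical. In CE the validator is a separate agent, not a Python function, and I'm reading the skill as leaving other paths unvalidated.

```python
# Paths where the skill re-checks surviving findings (labels are my shorthand).
VALIDATED_PATHS = {"headless", "autofix", "file-tickets"}


def validate_findings(findings, path, validator):
    """Drop findings the validator rejects, but only on validated paths."""
    if path not in VALIDATED_PATHS:
        return list(findings)
    return [f for f in findings if validator(f)]
```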

There's also a model-tiering rule that matters for cost. Correctness, security, and adversarial inherit the session model, so on an Opus session they get Opus. Everything else is forced to Sonnet. Skip that override and an Opus CE run silently costs 3 to 4x as much.
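The tiering rule is small enough to state as a function. The three inheriting personas come from the skill. The model_for name and the lowercase "opus" and "sonnet" labels are placeholders of mine, not real model identifiers.

```python
# Personas that inherit the session model, per the CE skill.
INHERIT_SESSION_MODEL = {"correctness", "security", "adversarial"}


def model_for(persona, session_model):
    """Return the model tier a persona runs on for a given session."""
    if persona in INHERIT_SESSION_MODEL:
        return session_model
    return "sonnet"  # every other persona is forced down a tier
```

On an Opus session, correctness gets "opus" and testing gets "sonnet". On a Sonnet session, every persona lands on "sonnet" and the rule is a no-op.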

The thing nobody talks about

CE's first stage is a quick-review short-circuit. If your argument string says "quick", "fast", or "light", CE explicitly calls the harness's native /review and stops. It does not spawn the multi-agent pipeline. They're designed to layer, with CE deferring to native for the cheap pass.

What that means in practice: you don't have to choose. Install CE and you don't lose native – you gain a wrapper that knows when to call it. Programmatic callers (mode:autofix, mode:report-only, mode:headless) skip the short-circuit and always get the full pipeline, because the short-circuit is a human-intent feature.
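The decision order can be sketched like this. The trigger words and the three programmatic modes come from the skill. The plan_review function, its return labels, and plain word matching are my own simplification, since the real check is an instruction the agent interprets, not a parser.

```python
import re

QUICK_WORDS = {"quick", "fast", "light"}
PROGRAMMATIC_MODES = {"autofix", "report-only", "headless"}


def plan_review(argument_string):
    """Decide between the native short-circuit and the full persona pipeline."""
    text = argument_string.lower()
    modes = {tok.split(":", 1)[1] for tok in text.split() if tok.startswith("mode:")}
    if modes & PROGRAMMATIC_MODES:
        return "full-pipeline"  # programmatic callers never short-circuit
    words = set(re.findall(r"[a-z]+", text))
    if words & QUICK_WORDS:
        return "native-review"  # human asked for the cheap pass
    return "full-pipeline"
```

Checking the mode first is what keeps a skill-to-skill caller from being downgraded by a stray "quick" somewhere in its arguments.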

The cost question

Outside the quick short-circuit, CE spawns at least 6 subagents per review. Native at low runs as one pass. On a default Sonnet session this is a small difference. On Opus, the gap widens because correctness, security, and adversarial inherit the session model.

The model-tiering rule is the reason CE stays affordable. It forces Sonnet on every persona that isn't one of those three. Skip it and the bill jumps. If you're running CE on Opus and your numbers look high, that's the first knob.

/simplify vs /ce-simplify-code

Native /simplify is literally described in the docs as "equivalent to /code-review --fix". Same engine, different default. One pass, applies fixes.

CE splits simplification into three parallel reviewer agents with non-overlapping mandates. A reuse reviewer searches the codebase for existing utilities that newly-written code is reinventing – duplicated helpers, hand-rolled string manipulation, ad-hoc type guards that ignore an existing branded type. A quality reviewer looks for redundant state, parameter sprawl, copy-paste with slight variation, leaky abstractions, stringly-typed code, unnecessary wrapper components, nested conditionals three-plus levels deep, narration comments, and dead code. An efficiency reviewer looks for unnecessary work, missed concurrency, hot-path bloat, recurring no-op state updates, time-of-check-to-time-of-use anti-patterns, unbounded data structures, and overly broad operations.

After the agents return, CE runs typecheck plus lint plus scoped tests and surfaces any failure with the failing check name. Behavior preservation is the explicit promise of the skill, and the verification step is how it earns it. Native --fix has no equivalent test-verification step.
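A verification gate like that can be sketched in a few lines. Everything named here is hypothetical: run_checks is my own helper, and DEMO_CHECKS uses stand-in commands so the sketch runs anywhere. A real project would substitute its own typecheck, lint, and scoped-test commands.

```python
import subprocess
import sys


def run_checks(checks):
    """Run each (name, command) pair in order; return the names that failed."""
    failed = []
    for name, command in checks:
        result = subprocess.run(command, capture_output=True, text=True)
        if result.returncode != 0:
            failed.append(name)
    return failed


# Stand-in commands: "lint" is rigged to exit non-zero for demonstration.
DEMO_CHECKS = [
    ("typecheck", [sys.executable, "-c", "pass"]),
    ("lint", [sys.executable, "-c", "raise SystemExit(1)"]),
    ("scoped-tests", [sys.executable, "-c", "pass"]),
]
```

Returning the failing check names, rather than a bare pass or fail, mirrors the skill's habit of surfacing which check broke.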

CE also bakes in an explicit anti-simplification guard. The skill literally says "fewer lines is not the goal, faster comprehension is." Don't inline a helper that gives a concept a name. Don't merge unrelated logic. Check git blame before removing an abstraction. The guard exists because over-simplification is the most common failure mode of an automated simplifier.

When to use which

A practical decision matrix based on actual usage:

| Situation | Best choice |
| --- | --- |
| Routine diff, fast sanity pass | Native /code-review (or /ce-code-review with "quick" – it short-circuits to native) |
| Standard PR, want categorized findings plus routing | /ce-code-review |
| Safety-critical surface (anything that hits live users or external systems) | /ce-code-review – the autofix_class routing and validator pass earn their keep here |
| Inside another workflow (e.g., a post-implementation gate inside ce-work) | /ce-code-review mode:headless – native has no programmatic mode |
| Just want fixes applied, no thinking | Native /code-review --fix or /simplify |
| Clean up a feature branch before opening a PR | /ce-simplify-code – the test-verification step is the differentiator |
| Deep multi-agent cloud review you're willing to pay for | /code-review ultra – the only thing in either ecosystem that hits the cloud reviewer fleet |

The deeper cleavage

The deepest difference isn't capability. It's where they sit in a workflow.

Native /code-review is end-user-facing. You run it. You read the output. You ship.

CE /ce-code-review is end-user-facing and composable. Other skills call it. /ce-work runs review in headless mode after every implementation cycle. /ce-resolve-pr-feedback reads its run artifacts at /tmp/compound-engineering/ce-code-review/<run-id>/ and decides what to fix. The persona JSON schema, the autofix_class taxonomy, the confidence anchors – none of that exists for you to read. It exists for the next skill to consume.

If you're running a pipeline, you want CE. If you're running ad-hoc reviews, native is faster and takes less thought. Running both is the realistic answer for most people who ship every day.

Frequently asked questions

Does /ce-code-review replace Claude Code's native /code-review?

No. CE explicitly defers to native for quick reviews via its short-circuit logic. They're designed to layer. Install CE if you want the persona pipeline and orchestration hooks. Native is still there for the cheap pass.

What does /code-review ultra do that /ce-code-review doesn't?

/code-review ultra runs a deep multi-agent review in Anthropic's cloud (this is the billed feature formerly known as /ultrareview). It's a different fleet of agents, not a different prompt. CE runs entirely on your session. If you want the cloud-side deep review, use ultra. If you want a structured local pipeline, use CE.

How is /ce-simplify-code different from /simplify?

/simplify is /code-review --fix under the hood. CE splits simplification into three specialized agents (reuse, quality, efficiency) and runs typecheck plus lint plus scoped tests after applying fixes. The test-verification step is the key difference – it's how CE delivers the "preserves behavior" promise that simplification needs.

Will CE blow up my Anthropic bill on Opus?

It can, if you skip the model-override rule. CE forces Sonnet on every persona except correctness, security, and adversarial (which inherit the session model). On an Opus session, a run without that override costs 3 to 4 times as much. If you're running CE on Opus and bills look high, verify the override is firing.

What are CE's four modes for?

interactive is the human-facing default. autofix applies only deterministic safe fixes and returns a structured residual list. report-only is strictly read-only and safe to parallelize with other operations on the same checkout. headless is the skill-to-skill mode – it returns structured findings as text so other skills can route them. /ce-work calls review in headless mode after every implementation cycle.

What does autofix_class mean?

It's CE's routing taxonomy, separate from severity. safe_auto means the in-skill fixer can apply this. gated_auto means a concrete fix exists but it changes behavior or contracts and shouldn't be applied automatically. manual means hand-off to a downstream resolver. advisory is report-only. Severity tells you urgency. Routing tells you who acts next.

Where do CE's findings live after a review?

Each persona writes full JSON to /tmp/compound-engineering/ce-code-review/<run-id>/<reviewer-name>.json. Compact returns flow back to the orchestrator for merging. Other CE skills like /ce-resolve-pr-feedback read these artifacts to do their work without re-running the review.
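If you want to poke at those artifacts yourself, a small loader is enough. The directory layout is the one described above. The load_run helper is hypothetical, and it makes no assumption about what's inside each file beyond it being valid JSON.

```python
import json
from pathlib import Path

ARTIFACT_ROOT = Path("/tmp/compound-engineering/ce-code-review")


def load_run(run_id, root=ARTIFACT_ROOT):
    """Map reviewer name to parsed JSON for every <reviewer-name>.json in a run."""
    run_dir = Path(root) / run_id
    return {
        path.stem: json.loads(path.read_text(encoding="utf-8"))
        for path in sorted(run_dir.glob("*.json"))
    }
```

The root parameter is there so you can point the loader at a copy of a run directory instead of /tmp.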

Does CE work without a GitHub PR?

Yes. CE supports three diff sources: a PR number or URL, a branch name, or the current branch with no argument. It also accepts base:<ref> to bypass detection entirely – useful for skill-to-skill calls where the orchestrator already knows the diff base.
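Those argument shapes can be sketched as a small classifier. The three sources and the base:<ref> form come from the skill. The parse_diff_source function, its return labels, the single-token handling, and the GitHub-style /pull/ URL pattern are my assumptions.

```python
import re


def parse_diff_source(argument):
    """Classify a review argument into (kind, value)."""
    arg = argument.strip()
    if arg.startswith("base:"):
        return ("base", arg[len("base:"):])  # explicit base, detection bypassed
    if not arg:
        return ("current-branch", None)
    if arg.isdigit() or re.match(r"https?://\S+/pull/\d+", arg):
        return ("pr", arg)
    return ("branch", arg)
```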

When should I run /ce-code-review on a draft PR?

Draft PRs are reviewed normally. Draft status is not a skip condition. Early feedback on in-progress work is the highest-value time to catch architectural issues, before the diff grows. CE will only skip if the PR is closed, merged, or judged trivial (lockfile-only bumps, automated chore commits).

What's the right default for a standard feature PR?

/ce-code-review on the branch before pushing. Read the synthesized table. Apply the safe_auto fixes interactively. Defer the gated_auto ones to a follow-up conversation. Open the PR. If the PR picks up review comments, run /ce-resolve-pr-feedback to handle them in one pass instead of round-tripping individually.


If you want to see this layered into a full pipeline, join 760+ builders on Skool – we hold 3 live calls a week for all members, free and paid, and the compound-engineering pipeline against real code comes up often. Or grab the newsletter for the weekly digest.
