Claude Mastery
2026-04-11
⌨️ CLI POWER MOVE
🔥 New
v2.1.101 — The Biggest Bug-Fix Release Yet

Twenty-four hours after v2.1.100 (which was just a CHANGELOG update), Anthropic dropped v2.1.101 — the densest bug-fix release in Claude Code's history. One security patch, two performance fixes, and 35+ bug fixes across sessions, permissions, subagents, plugins, and UI. If you haven't updated: npm i -g @anthropic-ai/claude-code@latest.

The Security Fix

v2.1.101 patches a command injection vulnerability in the POSIX which fallback used for LSP binary detection. If your system lacked a native which, an attacker could craft a binary name that executed arbitrary commands during LSP startup. The fix sanitizes input before shell execution.

Two Performance Fixes That Matter

1. Virtual scroller memory leak. Long sessions were retaining dozens of historical copies of the message list. If you've noticed Claude Code getting sluggish after 30+ minutes of active use — this is why. The fix drops old copies during scroll updates.

2. Hardcoded 5-minute request timeout. Every API call had a 5-minute ceiling regardless of your backend. If you're running local LLMs (LM Studio, Ollama), using extended thinking, or going through slow corporate gateways, you've hit this wall. The fix respects API_TIMEOUT_MS — set it to whatever your backend needs.

The Resume Overhaul

Eight separate --resume fixes in one release:

  • Dead-end branch anchoring — large sessions could resume on a dead branch instead of the live conversation
  • Subagent chain bridging — resume could accidentally jump into a subagent's conversation instead of the main chain
  • Missing file_path crash — persisted Edit/Write results without file_path crashed the loader
  • Stale worktree — claude -w failed with "already exists" after unclean worktree cleanup
  • Narrow /resume picker — sessions from other projects were hidden by default
  • /btw transcript bloat — every /btw was writing the entire conversation to disk
  • --continue -p mismatch — headless sessions created with -p couldn't be continued with --continue -p
  • Windows Terminal preview — preview pane was unreachable

Other Highlights

  • OS CA certificate trust: Enterprise TLS proxies now work by default. Previously required manual cert configuration. Set CLAUDE_CODE_CERT_STORE=bundled to opt out.
  • /team-onboarding: Generates a teammate ramp-up guide from your local usage patterns. One command to onboard someone into your Claude Code workflow.
  • permissions.deny override fix: Previously, a PreToolUse hook returning permissionDecision: "ask" could downgrade a permissions.deny rule into a prompt instead of blocking. Fixed — deny now always wins.
  • Subagent MCP inheritance: Subagents now inherit MCP tools from dynamically-injected servers. If you've been wondering why your subagents couldn't call MCP tools that the lead session could — this is the fix.
  • Resilient settings: An unrecognized hook event name in settings.json no longer nukes your entire config file.

Why this matters: v2.1.101 fixes the three biggest agent-killer categories simultaneously: session corruption (8 resume fixes), resource exhaustion (memory leak + timeout), and security escalation (command injection + permission bypass). Update now.

⌨️ CLI POWER MOVE
🔥 New · 🔧 Try It
The Monitor Tool — From Polling Loops to Event-Driven Agents

Issue #8 covered v2.1.98's security patches. But the same release shipped something more transformative: the Monitor tool — a primitive that turns Claude Code from a poll-and-check agent into an event-driven one.

The Problem with Polling

Every agent builder has written this loop:

Run tests → check output → fix → run tests → check output → fix → ...

Each iteration is a full API call. Five cycles checking a 3-minute build = five context windows loaded, four of them producing nothing. You're paying for idle thinking.

How Monitor Works

Monitor runs a shell command and streams each stdout line as a notification into the conversation. Claude reacts only when something happens.

Monitor("Watch for test failures", "npm test 2>&1 | grep --line-buffered FAIL")

Four parameters:

  • description — label that appears with each notification
  • command — shell command whose stdout becomes the event stream
  • timeout_ms — default 300,000 (5 min), max 3,600,000 (1 hour)
  • persistent — boolean; stays alive until killed with TaskStop
graph LR
    A[Background Process] -->|stdout lines| B[Monitor Tool]
    B -->|batched notifications, 200ms window| C[Claude Session]
    C -->|reacts only on events| D[Fix / Investigate]
    A -->|stderr| E[Log File]

Critical Gotchas

1. Always use grep --line-buffered. Without it, pipe buffering can delay notifications by minutes. The monitor reacts to stdout *lines* — if grep buffers its output (default behavior when stdout isn't a terminal), your events arrive in batches of 4KB instead of per-line.

2. Handle transient failures. A monitor that exits kills the notification stream. Use || true in poll loops or wrap with retry logic:

bash
while true; do
  kubectl logs -f deploy/api --since=10s 2>/dev/null | \
    grep --line-buffered -E "ERROR|WARN" || sleep 2
done

3. Every stdout line is a conversation message. Verbose output triggers automatic shutdown. Filter aggressively — Monitor is for signals, not logs.

Token Economics: Monitor vs. Polling

Approach       | 10-min build, checking every 2 min | API calls  | Wasted context
Polling loop   | 5 full calls, 4 idle               | 5          | ~80%
Monitor + grep | 1 setup call + N event reactions   | 1 + events | ~0%

The savings compound with build time. A 30-minute CI pipeline checked every 3 minutes = 10 idle API calls. Monitor: zero.
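The idle-call arithmetic above can be made concrete. These helper names are ours, for illustration only, not part of any Claude Code API:

```python
def polling_calls(build_minutes: int, check_every_minutes: int) -> int:
    """Each poll is a full API call that reloads the whole context."""
    return build_minutes // check_every_minutes

def monitor_calls(events: int) -> int:
    """One setup call, then reactions only to actual events."""
    return 1 + events
```

A 30-minute build checked every 3 minutes costs 10 polling calls; with Monitor and a quiet build, just the setup call.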

Three Production Patterns

Build watcher:

Monitor("CI status", "gh run watch --exit-status 2>&1 | grep --line-buffered -E 'completed|failed'")

Log sentinel:

Monitor("API errors", "journalctl -fu myservice | grep --line-buffered -i error")

File change reactor:

Monitor("Config changes", "inotifywait -m -e modify /etc/myapp/ 2>&1 | grep --line-buffered MODIFY")

Monitor vs. Related Tools

Tool              | Purpose                       | Notification
run_in_background | One-shot tasks                | Single notification at completion
Hooks             | Validate Claude's own actions | Lifecycle events
/loop             | Scheduled polling             | Timer-based, full context reload
Monitor           | Observe external systems      | Per-line stdout streaming

Why this matters: Monitor is the first Claude Code primitive that breaks the request-response cycle. Your agent doesn't need to ask "has anything changed?" — it gets told. This is the foundation for reactive, event-driven agent architectures.

🏗️ AGENT ARCHITECTURE
🔥 New · 🔬 Deep Dive
The Advisor Strategy — Opus as On-Demand Strategist

On April 9, Anthropic shipped the advisor tool — a new beta API primitive that pairs a fast executor model with Opus as an on-demand strategic advisor. All within a single API call. No orchestration code, no extra round trips. This changes agent economics.

The Core Idea

Most agentic work is mechanical: reading files, running commands, writing code. You don't need Opus for that. But every 5-10 turns, the agent hits a decision point: which approach to take, how to decompose a problem, whether the current path is working. That's where Opus earns its cost.

The advisor tool makes this pattern a first-class API primitive:

sequenceDiagram
    participant E as Executor (Sonnet 4.6)
    participant A as Advisor (Opus 4.6)
    participant T as Tools

    E->>T: Read files, explore codebase
    E->>A: "I've seen the code. What's the plan?"
    A-->>E: "1. Refactor X first 2. Then add Y 3. Watch for Z"
    E->>T: Implement step 1
    E->>T: Implement step 2
    E->>T: Run tests
    E->>A: "Tests pass. Am I done?"
    A-->>E: "Check edge case Z — step 3 isn't handled"
    E->>T: Fix edge case
    E->>T: Tests pass
    E->>E: Done

The API

Beta header: anthropic-beta: advisor-tool-2026-03-01

python
import anthropic

client = anthropic.Anthropic()

response = client.beta.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=4096,
    betas=["advisor-tool-2026-03-01"],
    tools=[
        {
            "type": "advisor_20260301",
            "name": "advisor",
            "model": "claude-opus-4-6",
        }
    ],
    messages=[
        {
            "role": "user",
            "content": "Refactor the auth module to use JWT tokens.",
        }
    ],
)

The executor decides when to call the advisor — it's just another tool. When invoked:

  1. Executor emits server_tool_use with name: "advisor" and empty input
  2. Server runs a separate Opus inference on the full transcript
  3. Advisor response returns as advisor_tool_result
  4. Executor continues, informed by the advice

All within one /v1/messages request. The advisor sees everything: system prompt, tool definitions, all prior turns, all tool results.

Verified Benchmarks

Configuration         | SWE-bench ML   | BrowseComp  | Cost vs. Sonnet solo
Sonnet solo           | 72.1%          | —           | baseline
Sonnet + Opus advisor | 74.8% (+2.7pp) | —           | −11.9%
Haiku solo            | —              | 19.7%       | −85%
Haiku + Opus advisor  | —              | 41.2% (>2×) | −85% vs Sonnet

The counterintuitive result: Sonnet + Opus advisor is both *better* and *cheaper* than Sonnet alone. The advisor's plan reduces total tool calls and conversation length, saving more executor tokens than the advisor consumes.

Caching: The clear_thinking Gotcha

Enable advisor-side caching for conversations with 3+ advisor calls:

python
{
    "type": "advisor_20260301",
    "name": "advisor",
    "model": "claude-opus-4-6",
    "caching": {"type": "ephemeral", "ttl": "5m"},
}

Critical: If you use extended thinking with clear_thinking set to anything other than "all", the advisor's transcript shifts each turn — causing cache misses on every call. The default keep: {type: "thinking_turns", value: 1} triggers this. Set keep: "all" to preserve advisor cache stability.
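A minimal sketch of the safe configuration. The surrounding field names are extrapolated from the clear_thinking description above — verify them against the extended-thinking docs before relying on this:

```python
# Assumed shape, built from the clear_thinking fields quoted above.
thinking = {
    "type": "enabled",
    "budget_tokens": 10_000,
    # The default keep of {"type": "thinking_turns", "value": 1} shifts the
    # advisor's transcript every turn and defeats caching; "all" keeps the
    # cached prefix stable across advisor calls.
    "clear_thinking": {"keep": "all"},
}
```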

Cost Control

max_uses caps advisor calls per request. For conversation-level budgets, count client-side. When you hit your cap, remove the advisor tool from tools and strip all advisor_tool_result blocks from your message history — leaving them without the tool definition causes a 400 invalid_request_error.
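A sketch of that cleanup step, operating on plain dict messages; the helper name is ours, and the block/tool type strings follow the advisor description above:

```python
def strip_advisor(tools: list[dict], messages: list[dict]):
    """Drop the advisor tool AND every advisor_tool_result block together.

    Removing only one of the two leaves advisor_tool_result blocks in
    history without a matching tool definition — a 400 invalid_request_error.
    """
    tools = [t for t in tools if t.get("type") != "advisor_20260301"]
    cleaned = []
    for msg in messages:
        content = msg["content"]
        if isinstance(content, list):
            content = [b for b in content
                       if b.get("type") != "advisor_tool_result"]
        cleaned.append({**msg, "content": content})
    return tools, cleaned
```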

The Conciseness Trick

One system prompt line cuts advisor output tokens 35-45%:

The advisor should respond in under 100 words and use enumerated steps, not explanations.

Advisor output (400-700 text tokens) is the biggest cost driver. This constraint forces the advisor to plan, not explain.

Recommended System Prompt Pattern

For coding tasks, prepend this to your executor's system prompt:

Call advisor BEFORE substantive work — before writing, before committing
to an interpretation, before building on an assumption. If the task
requires orientation first (finding files, fetching a source), do that,
then call advisor. Orientation is not substantive work.

Also call advisor:
- When you believe the task is complete (AFTER making deliverables durable)
- When stuck — errors recurring, approach not converging
- When considering a change of approach

This pattern produced the highest intelligence at near-Sonnet cost in Anthropic's internal evaluations.

Valid Model Pairs

Executor   | Advisor
Haiku 4.5  | Opus 4.6
Sonnet 4.6 | Opus 4.6
Opus 4.6   | Opus 4.6

The advisor must be ≥ the executor's capability. Opus advising itself is valid — useful when you want a fresh perspective mid-generation.

Streaming Behavior

The advisor does not stream. When the executor calls the advisor, the output stream pauses. SSE ping keepalives fire every ~30 seconds during the pause. The full advisor_tool_result arrives in one content_block_start event, then executor streaming resumes.

Plan for this in UIs — show a "thinking..." indicator when you see server_tool_use with name: "advisor".
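A sketch of that indicator logic over parsed SSE events. The events are assumed to arrive as dicts shaped like the stream description above; exact field names are assumptions:

```python
def indicator_state(event: dict, current: str) -> str:
    """Advance a UI indicator on one parsed stream event."""
    if event.get("type") != "content_block_start":
        return current
    block = event.get("content_block", {})
    if block.get("type") == "server_tool_use" and block.get("name") == "advisor":
        return "thinking"        # advisor pause begins: show a spinner
    if block.get("type") == "advisor_tool_result":
        return "streaming"       # advice landed, executor resumes
    return current
```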

Why this matters: The advisor tool is the first official Anthropic primitive for model routing inside a single conversation. It replaces ad-hoc orchestration (routing between models in your own code) with a server-side mechanism where the executor decides when it needs help. For anyone building agents with claude -p or the SDK, this is the new default pattern for cost-quality optimization.

🧭 OPERATOR THINKING
🔬 Deep Dive · 🌿 Evergreen
Four Runtime Patterns from Claude Code's Source

A recent analysis of Claude Code's source code identified four implementation patterns that every agent builder should understand. These aren't features you configure — they're architectural decisions baked into the runtime that explain *why* Claude Code behaves the way it does. And they're patterns worth stealing for your own agents.

Pattern 1: Deferred Tool Loading

Claude Code ships with 50+ tools. Loading all their Zod schemas into the context window at startup would consume thousands of tokens on tools the model might never call.

The solution: Two-phase loading.

graph TD
    A[Startup] -->|tool names only| B[Context Window]
    B -->|model needs a tool| C[ToolSearch meta-tool]
    C -->|name parts: 10pts, search hints: 4pts, descriptions: 2pts| D[Relevance Scoring]
    D -->|top matches| E[Full Schemas via tool_reference blocks]
    E -->|server-side expansion| F[No Extra Round Trip]

lazySchema() wraps each tool's Zod schema in a factory, deferring construction until first access. The ToolSearch meta-tool accepts queries and returns full schemas on demand, scored by name match (10 points), search hints (4 points), and description relevance (2 points). Direct lookup (select:ToolName) skips scoring entirely.

Results return as tool_reference blocks — an Anthropic API extension where the server expands the schema inline. No extra client round-trip required.

Steal this pattern when: Your agent has more than ~15 tools. Below that threshold, the context savings don't justify the extra inference hop.
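A minimal Python sketch of the deferral half of this pattern (Claude Code's lazySchema wraps Zod schemas in JavaScript; the names here are illustrative):

```python
class LazySchema:
    """Defer schema construction until a tool is actually needed."""
    def __init__(self, factory):
        self._factory = factory
        self._schema = None

    def get(self):
        if self._schema is None:     # build once, on first access
            self._schema = self._factory()
        return self._schema

# The registry holds cheap names at startup; full schemas stay unbuilt.
CONSTRUCTIONS = []

def read_file_schema():
    CONSTRUCTIONS.append("ReadFile")   # visible side effect for the demo
    return {"type": "object", "properties": {"path": {"type": "string"}}}

TOOLS = {"ReadFile": LazySchema(read_file_schema)}
```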

Pattern 2: Diminishing Returns Detection

Claude Code doesn't just stop at token limits. It monitors output *quality* across continuations:

javascript
const isDiminishing =
  tracker.continuationCount >= 3 &&
  deltaSinceLastCheck < DIMINISHING_THRESHOLD &&
  tracker.lastDeltaTokens < DIMINISHING_THRESHOLD;

The threshold: 3+ continuations where the last two each produced fewer than 500 new tokens. When triggered, the system flags diminishingReturns on the StopDecision object.

This distinguishes genuine budget exhaustion (the model has more to say but ran out of tokens) from spinning (the model is repeating itself, generating filler, or stuck in a loop). Without this check, a max_tokens continuation blindly extends — burning tokens on content that adds nothing.

Steal this pattern when: Building any agent with multi-turn continuations. Check if your last N rounds actually produced meaningful new content before continuing.
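The same check, ported to Python for reuse in your own continuation loops; the 500-token threshold matches the figure quoted above:

```python
DIMINISHING_THRESHOLD = 500  # tokens, per the figure quoted above

def is_diminishing(continuation_count: int,
                   delta_since_last_check: int,
                   last_delta_tokens: int) -> bool:
    """True when continuations are spinning rather than adding content."""
    return (
        continuation_count >= 3
        and delta_since_last_check < DIMINISHING_THRESHOLD
        and last_delta_tokens < DIMINISHING_THRESHOLD
    )
```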

Pattern 3: Cache-Aware Context Compaction

Most agents compact context when it gets too large. Claude Code compacts *strategically*, based on whether the cache is likely still warm:

javascript
const gapMinutes =
  (Date.now() - new Date(lastAssistant.timestamp).getTime()) / 60_000;
if (gapMinutes < config.gapThresholdMinutes) return null; // cache warm

If the gap between your last message and the current one is shorter than the cache TTL (5 minutes for Anthropic's prompt cache), the old tool results are still cached server-side. Compacting them saves token count but *forces a cache miss* — you'd pay more to re-read the compacted version than to let the cache serve the original.

When the gap exceeds the threshold, the cache is presumed cold. Now compaction pays off: stale tool results are stripped before sending, and the smaller payload avoids re-caching content that's about to become stale again.

For surgical removal, the system can use the cache_edits API to remove specific content blocks without invalidating the remaining cache prefix.

Steal this pattern when: Your agent has idle periods between interactions. Don't compact immediately — check if the cache is still warm first.

Pattern 4: Coalesced Background Memory Extraction

When you chat quickly, Claude Code extracts memories from your conversation in the background. But what happens when 5 messages arrive in rapid succession?

A naive implementation queues 5 extraction jobs. Claude Code uses a single-slot overwrite instead:

javascript
if (inProgress) {
  pendingContext = { context, appendSystemMessage };
  return; // overwrite, don't queue
}

Five rapid messages = exactly two extraction runs: one in-flight (started before the burst) and one final (with the latest context). The pendingContext slot is overwritten, not appended — each new message replaces the pending request rather than queuing behind it.

Additional safeguards:

  • UUID-based cursor tracking that survives message compaction
  • Mutual exclusion: skips extraction if the main agent just wrote to memory (prevents clobbering)
  • 60-second soft timeout during shutdown to let in-flight extraction finish

Steal this pattern when: Building any background processing pipeline that processes conversation events. A single-slot overwrite is almost always better than a full job queue for "process the latest state" workloads.
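A synchronous sketch of the single-slot overwrite (Claude Code's version is async JavaScript; here submit and finish model job start and completion):

```python
class Coalescer:
    """At most one job in flight, at most one pending — never a queue."""
    def __init__(self, extract):
        self.extract = extract
        self.in_progress = False
        self.pending = None

    def submit(self, context):
        if self.in_progress:
            self.pending = context       # overwrite, don't queue
            return
        self.in_progress = True
        self.extract(context)            # the "in-flight" run

    def finish(self):
        """Call when the in-flight run completes."""
        if self.pending is not None:
            context, self.pending = self.pending, None
            self.extract(context)        # one final run with latest state
        self.in_progress = False
```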

The Meta-Lesson

All four patterns share a theme: optimize for real cost, not abstract metrics. Don't count tokens — check if the cache is warm. Don't limit continuations by count — check if they're producing value. Don't process every event — process the latest one. The system optimizes for what actually costs money and time, not for what's easiest to measure.

🌐 ECOSYSTEM INTEL
🔥 New · 🔧 Try It
AgentLint — 68 Hook Rules, Zero LLM Calls

AgentLint hooks into Claude Code's lifecycle to evaluate 68 rules across 8 rule packs — all without making a single LLM call. Pure heuristic evaluation, millisecond latency, deterministic results.

Architecture

graph TD
    A[Claude Code Session] -->|lifecycle events| B[AgentLint Hooks]
    B --> C{Which events?}
    C -->|PreToolUse: Bash| D[Security + Quality Rules]
    C -->|PreToolUse: Edit/Write| E[Code Quality Rules]
    C -->|PostToolUse| F[Result Validation]
    C -->|UserPromptSubmit| G[Input Guardrails]
    C -->|SubagentStart/Stop| H[Agent Monitoring]
    C -->|Notification| I[Alert Rules]
    C -->|Stop| J[Session Cleanup]
    D & E & F & G & H & I & J --> K[Pass / Warn / Block]

AgentLint hooks into 7 Claude Code lifecycle events: PreToolUse (Bash, Edit, Write), PostToolUse, UserPromptSubmit, SubagentStart/Stop, Notification, and Stop. Every event triggers the relevant rule packs, evaluated locally in milliseconds.

The 8 Rule Packs

Pack      | Rules      | Activation                | Examples
Universal | 19         | Always on                 | Secrets detection, force-push prevention, destructive command warnings, test weakening detection
Quality   | 7          | Auto-active               | Commit message format, error handling removal, large diffs, dead imports
Python    | 6          | pyproject.toml detected   | SQL injection blocking, unsafe shell execution, bare except clauses
Frontend  | 8          | package.json detected     | Accessibility (alt text, form labels, touch targets), responsive patterns
React     | 3          | React dependency detected | Query loading states, empty state handling
SEO       | 4          | Web project detected      | Page metadata, OG tags, structured data
Security  | Opt-in     | Manual enable             | Bash file write blocking, network exfiltration, credential leakage
Autopilot | 18 (alpha) | Manual enable             | Production guards, dry-run requirements, cloud resource protection

Auto-detection reads your pyproject.toml and package.json to activate matching language packs. A Python/React project automatically gets Universal + Quality + Python + Frontend + React — no configuration needed.

Custom Rules

Drop a Python file in .agentlint/rules/:

python
import re

from agentlint import Rule, Severity

class NoHardcodedSecrets(Rule):
    name = "no-hardcoded-secrets"
    severity = Severity.ERROR
    events = ["PreToolUse:Write", "PreToolUse:Edit"]

    def evaluate(self, event):
        content = event.get("content", "")
        patterns = [r"sk-[a-zA-Z0-9]{48}", r"ghp_[a-zA-Z0-9]{36}"]
        for p in patterns:
            if re.search(p, content):
                return self.fail(f"Hardcoded secret detected: {p}")
        return self.pass_rule()

Installation

bash
pip install agentlint && agentlint setup

setup writes hooks into .claude/settings.json automatically. Configuration lives in agentlint.yml with severity levels (strict, standard, relaxed) and allowlist patterns.

The Autopilot Angle

The alpha Autopilot pack (18 rules) is the most interesting for agent operators. It enforces production safety guardrails: require --dry-run before destructive operations, block cloud resource creation without confirmation, flag production database connections, and require explicit approval for any operation that could affect billing.

Why this matters: The "Walls Beat Signs" principle from issue #2 — mechanical enforcement beats instructional rules. AgentLint is walls: deterministic, millisecond, zero-LLM-cost guardrails that work even when Claude is under pressure and ignoring your CLAUDE.md. 68 rules, 16 stars, actively maintained with 125+ commits.

🌐 ECOSYSTEM INTEL
🔥 New
Twill.ai — Cloud Agents That Deliver Pull Requests

Twill.ai (YC S25, 67 HN points) runs autonomous coding agents in isolated cloud sandboxes and delivers the output as GitHub PRs. The interesting part isn't "cloud agents" — it's the architecture.

The 6-Step Pipeline

graph LR
    A[Task] --> B[Research]
    B --> C[Plan]
    C --> D[🚦 Human Approval Gate]
    D --> E[Implement]
    E --> F[AI Code Review]
    F --> G[Merge/PR]

The enforced approval gate between Plan and Implement is the design choice worth studying. Agents can research and plan autonomously, but they cannot write code until a human approves the plan. This inverts the common pattern where agents code first and humans review after — here, humans review the *approach* before any implementation happens.

Sandbox Lifecycle

Each task spins up an isolated sandbox: only the relevant code is cloned, builds and tests run inside it, then the sandbox is fully deleted after merge. No persistent agent environment means no accumulated state, no credential leakage across tasks, no "works on the agent's machine" drift.

You can SSH into running sandboxes with VS Code or Cursor while the agent works — collaborative debugging without sharing your local environment.

Parallel Agent Races

Run multiple agents on the same task for comparison. Two Sonnet instances working independently, or Sonnet vs. Codex on the same bug. The best PR wins. This is expensive but eliminates single-point-of-failure reasoning — if one agent takes a wrong turn, the other might not.

The Decision Framework

Use Twill when...                               | Use local agents when...
Task is well-scoped (bug fix, dep update, test) | Task requires deep codebase intuition
You want sandbox isolation by default           | You need MCP tools and custom hooks
Team coordination (tasks from Slack, Linear)    | You're iterating interactively
Compliance requires ephemeral environments      | You're on a Max plan and cost isn't a factor

Why this matters: Twill's approval-gate-before-code pattern is worth adopting even if you never use the product. The principle: let agents research and plan freely, but gate code generation on human approval of the approach. This catches wrong-direction work before it becomes a wasted PR.
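The gate can be sketched as a pipeline function; every step callable here is a hypothetical stand-in, not Twill's API:

```python
def run_task(task, research, plan, implement, review, approve):
    """Research and plan freely; write no code until a human approves."""
    findings = research(task)
    proposal = plan(task, findings)
    if not approve(proposal):        # 🚦 the human gate
        return None                  # wrong-direction work stops here
    change = implement(proposal)
    return review(change)
```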

🔬 PRACTICE LAB
🔧 Try It · 🔬 Deep Dive
Build an Advisor-Powered Code Review Pipeline

You'll build a code review agent that uses Sonnet 4.6 as the executor and Opus 4.6 as the advisor. The executor reads diffs and generates review comments. The advisor provides strategic guidance: which issues matter, what patterns to flag, when to approve.

Prerequisites

  • Anthropic API key with advisor beta access
  • Python 3.10+ with anthropic SDK installed
  • A Git repo with a recent diff to review

Step 1: Set Up the Advisor Tool

bash
pip install --upgrade anthropic
python
# review_agent.py
import anthropic
import subprocess
import sys

client = anthropic.Anthropic()

REVIEW_SYSTEM = """You are a senior code reviewer. You have access to an advisor
tool backed by a stronger model.

Call advisor BEFORE writing any review comments — after reading the diff,
get strategic guidance on what matters most.

Call advisor AGAIN before finalizing — to catch anything you missed.

The advisor should respond in under 100 words and use enumerated steps,
not explanations.

Review criteria: correctness, security, performance, maintainability.
Flag issues by severity: 🔴 critical, 🟡 warning, 🟢 suggestion."""

def get_diff(base="main"):
    """Get the diff between current branch and base."""
    result = subprocess.run(
        ["git", "diff", f"{base}...HEAD"],
        capture_output=True, text=True
    )
    if result.returncode != 0:
        # Fallback to unstaged changes
        result = subprocess.run(
            ["git", "diff"], capture_output=True, text=True
        )
    return result.stdout or "No changes found."

def review_with_advisor(diff: str) -> dict:
    """Run a review with Sonnet executor + Opus advisor."""
    response = client.beta.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=8192,
        betas=["advisor-tool-2026-03-01"],
        system=REVIEW_SYSTEM,
        tools=[
            {
                "type": "advisor_20260301",
                "name": "advisor",
                "model": "claude-opus-4-6",
                "caching": {"type": "ephemeral", "ttl": "5m"},
            }
        ],
        messages=[
            {
                "role": "user",
                "content": f"Review this diff:\n\n```diff\n{diff}\n```",
            }
        ],
    )
    return response

def review_without_advisor(diff: str) -> dict:
    """Run a review with Sonnet only (for comparison)."""
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=8192,
        system=REVIEW_SYSTEM,
        messages=[
            {
                "role": "user",
                "content": f"Review this diff:\n\n```diff\n{diff}\n```",
            }
        ],
    )
    return response

Step 2: Add Cost Tracking

python
def extract_costs(response, has_advisor=False):
    """Extract token costs from response usage."""
    usage = response.usage
    costs = {
        "executor_input": usage.input_tokens,
        "executor_output": usage.output_tokens,
        "executor_cache_read": getattr(usage, "cache_read_input_tokens", 0),
    }

    if has_advisor and hasattr(usage, "iterations"):
        for iteration in usage.iterations:
            if iteration.get("type") == "advisor_message":
                costs["advisor_input"] = iteration["input_tokens"]
                costs["advisor_output"] = iteration["output_tokens"]
                costs["advisor_cache_read"] = iteration.get(
                    "cache_read_input_tokens", 0
                )

    return costs

def print_cost_comparison(with_advisor, without_advisor):
    """Compare costs between advisor and non-advisor runs."""
    # Pricing per 1M tokens (as of April 2026)
    SONNET_INPUT = 3.0
    SONNET_OUTPUT = 15.0
    OPUS_INPUT = 15.0
    OPUS_OUTPUT = 75.0

    def calc_cost(costs, has_advisor=False):
        total = (
            costs["executor_input"] * SONNET_INPUT / 1_000_000
            + costs["executor_output"] * SONNET_OUTPUT / 1_000_000
        )
        if has_advisor and "advisor_input" in costs:
            total += (
                costs["advisor_input"] * OPUS_INPUT / 1_000_000
                + costs["advisor_output"] * OPUS_OUTPUT / 1_000_000
            )
        return total

    cost_with = calc_cost(with_advisor, has_advisor=True)
    cost_without = calc_cost(without_advisor)

    print(f"\n{'='*50}")
    print("WITH ADVISOR:")
    print(f"  Executor: {with_advisor['executor_input']}in / "
          f"{with_advisor['executor_output']}out")
    if "advisor_input" in with_advisor:
        print(f"  Advisor: {with_advisor['advisor_input']}in / "
              f"{with_advisor['advisor_output']}out")
    print(f"  Total cost: ${cost_with:.4f}")
    print("\nWITHOUT ADVISOR:")
    print(f"  Executor: {without_advisor['executor_input']}in / "
          f"{without_advisor['executor_output']}out")
    print(f"  Total cost: ${cost_without:.4f}")
    print(f"\nDifference: ${cost_with - cost_without:+.4f} "
          f"({(cost_with/cost_without - 1)*100:+.1f}%)")

Step 3: Run the Comparison

python
if __name__ == "__main__":
    base = sys.argv[1] if len(sys.argv) > 1 else "main"
    diff = get_diff(base)

    if diff == "No changes found.":
        print("No diff found. Provide a base branch: "
              "python review_agent.py <base-branch>")
        sys.exit(1)

    print(f"Reviewing {len(diff)} chars of diff...")
    print("\n--- WITH ADVISOR ---")
    resp_with = review_with_advisor(diff)
    costs_with = extract_costs(resp_with, has_advisor=True)

    # Print the review
    for block in resp_with.content:
        if hasattr(block, "text"):
            print(block.text)

    print("\n--- WITHOUT ADVISOR ---")
    resp_without = review_without_advisor(diff)
    costs_without = extract_costs(resp_without)

    for block in resp_without.content:
        if hasattr(block, "text"):
            print(block.text)

    print_cost_comparison(costs_with, costs_without)

Step 4: Run It

bash
# Review current branch against main
python review_agent.py main

# Review against a specific branch
python review_agent.py feature/auth-refactor

What to Look For

  1. Advisor timing: Does the executor call the advisor before writing comments? Check the response content array for server_tool_use blocks — they should appear early.
  2. Review quality: Compare the advisor-enhanced review against the solo review. The advisor version should flag more structural issues and fewer nitpicks.
  3. Cost delta: On a medium-sized diff (~500 lines), expect the advisor version to cost 10-20% more but catch 2-3 issues the solo version misses.
  4. Cache behavior: Run the same diff twice. On the second run, advisor_cache_read should be nonzero if caching is working.

Extension: Multi-Turn Review

Turn this into a multi-turn conversation where the author responds to comments:

python
def multi_turn_review(diff: str, author_responses: list[str]):
    messages = [
        {"role": "user", "content": f"Review this diff:\n\n```diff\n{diff}\n```"}
    ]

    tools = [{
        "type": "advisor_20260301",
        "name": "advisor",
        "model": "claude-opus-4-6",
        "caching": {"type": "ephemeral", "ttl": "5m"},
    }]

    for response_text in author_responses:
        # Get review
        response = client.beta.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=8192,
            betas=["advisor-tool-2026-03-01"],
            system=REVIEW_SYSTEM,
            tools=tools,
            messages=messages,
        )

        # Add assistant response to history
        messages.append({"role": "assistant", "content": response.content})

        # Add author's response
        messages.append({"role": "user", "content": response_text})

    # Final review pass
    return client.beta.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=8192,
        betas=["advisor-tool-2026-03-01"],
        system=REVIEW_SYSTEM,
        tools=tools,
        messages=messages,
    )

Verification

  • [ ] Advisor calls appear in response content as server_tool_use blocks
  • [ ] Review identifies at least one issue the solo version missed
  • [ ] usage.iterations array contains advisor_message entries
  • [ ] Cost tracking produces nonzero values for both runs
  • [ ] Second run shows cache_read_input_tokens > 0 for advisor iterations

Why this lab matters: The advisor pattern isn't theoretical — it's a shipping API primitive. Building a real pipeline with it teaches you the cost model, caching behavior, and streaming quirks that documentation alone can't convey. Once you've built one advisor-powered agent, the pattern transfers to any agentic workload: coding, research, data analysis, deployment automation.