Claude Mastery
2026-04-11
⌨️ CLI POWER MOVE
🔥 New
v2.1.101 — The Biggest Bug-Fix Release Yet

Twenty-four hours after v2.1.100 (which was just a CHANGELOG update), Anthropic dropped v2.1.101 — the densest bug-fix release in Claude Code's history. One security patch, two performance fixes, and 35+ bug fixes across sessions, permissions, subagents, plugins, and UI. If you haven't updated: npm i -g @anthropic-ai/claude-code@latest.

The Security Fix

v2.1.101 patches a command injection vulnerability in the POSIX which fallback used for LSP binary detection. If your system lacked a native which, an attacker could craft a binary name that executed arbitrary commands during LSP startup. The fix sanitizes input before shell execution.

Two Performance Fixes That Matter

1. Virtual scroller memory leak. Long sessions were retaining dozens of historical copies of the message list. If you've noticed Claude Code getting sluggish after 30+ minutes of active use — this is why. The fix drops old copies during scroll updates.

2. Hardcoded 5-minute request timeout. Every API call had a 5-minute ceiling regardless of your backend. If you're running local LLMs (LM Studio, Ollama), using extended thinking, or going through slow corporate gateways, you've hit this wall. The fix respects API_TIMEOUT_MS — set it to whatever your backend needs.

The Resume Overhaul

Eight separate --resume fixes in one release:

  • Dead-end branch anchoring — large sessions could resume on a dead branch instead of the live conversation
  • Subagent chain bridging — resume could accidentally jump into a subagent's conversation instead of the main chain
  • Missing file_path crash — persisted Edit/Write results without file_path crashed the loader
  • Stale worktree — claude -w failed with "already exists" after unclean worktree cleanup
  • Narrow /resume picker — sessions from other projects were hidden by default
  • /btw transcript bloat — every /btw was writing the entire conversation to disk
  • --continue -p mismatch — headless sessions created with -p couldn't be continued with --continue -p
  • Windows Terminal preview — preview pane was unreachable

Other Highlights

  • OS CA certificate trust: Enterprise TLS proxies now work by default. Previously required manual cert configuration. Set CLAUDE_CODE_CERT_STORE=bundled to opt out.
  • /team-onboarding: Generates a teammate ramp-up guide from your local usage patterns. One command to onboard someone into your Claude Code workflow.
  • permissions.deny override fix: Previously, a PreToolUse hook returning permissionDecision: "ask" could downgrade a permissions.deny rule into a prompt instead of blocking. Fixed — deny now always wins.
  • Subagent MCP inheritance: Subagents now inherit MCP tools from dynamically-injected servers. If you've been wondering why your subagents couldn't call MCP tools that the lead session could — this is the fix.
  • Resilient settings: An unrecognized hook event name in settings.json no longer nukes your entire config file.

Why this matters: v2.1.101 fixes the three biggest agent-killer categories simultaneously: session corruption (8 resume fixes), resource exhaustion (memory leak + timeout), and security escalation (command injection + permission bypass). Update now.

⌨️ CLI POWER MOVE
🔥 New · 🔧 Try It
The Monitor Tool — From Polling Loops to Event-Driven Agents

Issue #8 covered v2.1.98's security patches. But the same release shipped something more transformative: the Monitor tool — a primitive that turns Claude Code from a poll-and-check agent into an event-driven one.

The Problem with Polling

Every agent builder has written this loop:

Run tests → check output → fix → run tests → check output → fix → ...

Each iteration is a full API call. Five cycles checking a 3-minute build = five context windows loaded, four of them producing nothing. You're paying for idle thinking.

How Monitor Works

Monitor runs a shell command and streams each stdout line as a notification into the conversation. Claude reacts only when something happens.

Monitor("Watch for test failures", "npm test 2>&1 | grep --line-buffered FAIL")

Four parameters:

  • description — label that appears with each notification
  • command — shell command whose stdout becomes the event stream
  • timeout_ms — default 300,000 (5 min), max 3,600,000 (1 hour)
  • persistent — boolean; stays alive until killed with TaskStop
graph LR
    A[Background Process] -->|stdout lines| B[Monitor Tool]
    B -->|batched notifications, 200ms window| C[Claude Session]
    C -->|reacts only on events| D[Fix / Investigate]
    A -->|stderr| E[Log File]

Critical Gotchas

1. Always use grep --line-buffered. Without it, pipe buffering can delay notifications by minutes. The monitor reacts to stdout *lines* — if grep buffers its output (default behavior when stdout isn't a terminal), your events arrive in batches of 4KB instead of per-line.

2. Handle transient failures. A monitor that exits kills the notification stream. Use || true in poll loops or wrap with retry logic:

bash
while true; do
  kubectl logs -f deploy/api --since=10s 2>/dev/null | \
    grep --line-buffered -E "ERROR|WARN" || sleep 2
done

3. Every stdout line is a conversation message. Verbose output triggers automatic shutdown. Filter aggressively — Monitor is for signals, not logs.

Token Economics: Monitor vs. Polling

Approach       | 10-min build, checking every 2 min | API calls  | Wasted context
Polling loop   | 5 full calls, 4 idle               | 5          | ~80%
Monitor + grep | 1 setup call + N event reactions   | 1 + events | ~0%

The savings compound with build time. A 30-minute CI pipeline checked every 3 minutes = 10 idle API calls. Monitor: zero.
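The idle-call arithmetic above can be made concrete. These helper names are ours, for illustration only, not part of any Claude Code API:

```python
def polling_calls(build_minutes: int, check_every_minutes: int) -> int:
    """Each poll is a full API call that reloads the whole context."""
    return build_minutes // check_every_minutes

def monitor_calls(events: int) -> int:
    """One setup call, then reactions only to actual events."""
    return 1 + events
```

A 30-minute build checked every 3 minutes costs 10 polling calls; with Monitor and a quiet build, just the setup call.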

Three Production Patterns

Build watcher:

Monitor("CI status", "gh run watch --exit-status 2>&1 | grep --line-buffered -E 'completed|failed'")

Log sentinel:

Monitor("API errors", "journalctl -fu myservice | grep --line-buffered -i error")

File change reactor:

Monitor("Config changes", "inotifywait -m -e modify /etc/myapp/ 2>&1 | grep --line-buffered MODIFY")

Monitor vs. Related Tools

Tool              | Purpose                       | Notification
run_in_background | One-shot tasks                | Single notification at completion
Hooks             | Validate Claude's own actions | Lifecycle events
/loop             | Scheduled polling             | Timer-based, full context reload
Monitor           | Observe external systems      | Per-line stdout streaming

Why this matters: Monitor is the first Claude Code primitive that breaks the request-response cycle. Your agent doesn't need to ask "has anything changed?" — it gets told. This is the foundation for reactive, event-driven agent architectures.

🏗️ AGENT ARCHITECTURE
🔥 New · 🔬 Deep Dive
The Advisor Strategy — Opus as On-Demand Strategist

On April 9, Anthropic shipped the advisor tool — a new beta API primitive that pairs a fast executor model with Opus as an on-demand strategic advisor. All within a single API call. No orchestration code, no extra round trips. This changes agent economics.

The Core Idea

Most agentic work is mechanical: reading files, running commands, writing code. You don't need Opus for that. But every 5-10 turns, the agent hits a decision point: which approach to take, how to decompose a problem, whether the current path is working. That's where Opus earns its cost.

The advisor tool makes this pattern a first-class API primitive:

sequenceDiagram
    participant E as Executor (Sonnet 4.6)
    participant A as Advisor (Opus 4.6)
    participant T as Tools

    E->>T: Read files, explore codebase
    E->>A: "I've seen the code. What's the plan?"
    A-->>E: "1. Refactor X first 2. Then add Y 3. Watch for Z"
    E->>T: Implement step 1
    E->>T: Implement step 2
    E->>T: Run tests
    E->>A: "Tests pass. Am I done?"
    A-->>E: "Check edge case Z — step 3 isn't handled"
    E->>T: Fix edge case
    E->>T: Tests pass
    E->>E: Done

The API

Beta header: anthropic-beta: advisor-tool-2026-03-01

python
import anthropic

client = anthropic.Anthropic()

response = client.beta.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=4096,
    betas=["advisor-tool-2026-03-01"],
    tools=[
        {
            "type": "advisor_20260301",
            "name": "advisor",
            "model": "claude-opus-4-6",
        }
    ],
    messages=[
        {
            "role": "user",
            "content": "Refactor the auth module to use JWT tokens.",
        }
    ],
)

The executor decides when to call the advisor — it's just another tool. When invoked:

  1. Executor emits server_tool_use with name: "advisor" and empty input
  2. Server runs a separate Opus inference on the full transcript
  3. Advisor response returns as advisor_tool_result
  4. Executor continues, informed by the advice

All within one /v1/messages request. The advisor sees everything: system prompt, tool definitions, all prior turns, all tool results.

Verified Benchmarks

Configuration         | SWE-bench ML   | BrowseComp  | Cost vs. Sonnet solo
Sonnet solo           | 72.1%          | —           | baseline
Sonnet + Opus advisor | 74.8% (+2.7pp) | —           | −11.9%
Haiku solo            | —              | 19.7%       | −85%
Haiku + Opus advisor  | —              | 41.2% (>2×) | −85% vs Sonnet

The counterintuitive result: Sonnet + Opus advisor is both *better* and *cheaper* than Sonnet alone. The advisor's plan reduces total tool calls and conversation length, saving more executor tokens than the advisor consumes.

Caching: The clear_thinking Gotcha

Enable advisor-side caching for conversations with 3+ advisor calls:

python
{
    "type": "advisor_20260301",
    "name": "advisor",
    "model": "claude-opus-4-6",
    "caching": {"type": "ephemeral", "ttl": "5m"},
}

Critical: If you use extended thinking with clear_thinking set to anything other than "all", the advisor's transcript shifts each turn — causing cache misses on every call. The default keep: {type: "thinking_turns", value: 1} triggers this. Set keep: "all" to preserve advisor cache stability.
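A minimal sketch of the safe configuration. The surrounding field names are extrapolated from the clear_thinking description above — verify them against the extended-thinking docs before relying on this:

```python
# Assumed shape, built from the clear_thinking fields quoted above.
thinking = {
    "type": "enabled",
    "budget_tokens": 10_000,
    # The default keep of {"type": "thinking_turns", "value": 1} shifts the
    # advisor's transcript every turn and defeats caching; "all" keeps the
    # cached prefix stable across advisor calls.
    "clear_thinking": {"keep": "all"},
}
```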

Cost Control

max_uses caps advisor calls per request. For conversation-level budgets, count client-side. When you hit your cap, remove the advisor tool from tools and strip all advisor_tool_result blocks from your message history — leaving them without the tool definition causes a 400 invalid_request_error.
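A sketch of that cleanup step, operating on plain dict messages; the helper name is ours, and the block/tool type strings follow the advisor description above:

```python
def strip_advisor(tools: list[dict], messages: list[dict]):
    """Drop the advisor tool AND every advisor_tool_result block together.

    Removing only one of the two leaves advisor_tool_result blocks in
    history without a matching tool definition — a 400 invalid_request_error.
    """
    tools = [t for t in tools if t.get("type") != "advisor_20260301"]
    cleaned = []
    for msg in messages:
        content = msg["content"]
        if isinstance(content, list):
            content = [b for b in content
                       if b.get("type") != "advisor_tool_result"]
        cleaned.append({**msg, "content": content})
    return tools, cleaned
```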

The Conciseness Trick

One system prompt line cuts advisor output tokens 35-45%:

The advisor should respond in under 100 words and use enumerated steps, not explanations.

Advisor output (400-700 text tokens) is the biggest cost driver. This constraint forces the advisor to plan, not explain.

Recommended System Prompt Pattern

For coding tasks, prepend this to your executor's system prompt:

Call advisor BEFORE substantive work — before writing, before committing
to an interpretation, before building on an assumption. If the task
requires orientation first (finding files, fetching a source), do that,
then call advisor. Orientation is not substantive work.

Also call advisor:
- When you believe the task is complete (AFTER making deliverables durable)
- When stuck — errors recurring, approach not converging
- When considering a change of approach

This pattern produced the highest intelligence at near-Sonnet cost in Anthropic's internal evaluations.

Valid Model Pairs

Executor   | Advisor
Haiku 4.5  | Opus 4.6
Sonnet 4.6 | Opus 4.6
Opus 4.6   | Opus 4.6

The advisor must be ≥ the executor's capability. Opus advising itself is valid — useful when you want a fresh perspective mid-generation.

Streaming Behavior

The advisor does not stream. When the executor calls the advisor, the output stream pauses. SSE ping keepalives fire every ~30 seconds during the pause. The full advisor_tool_result arrives in one content_block_start event, then executor streaming resumes.

Plan for this in UIs — show a "thinking..." indicator when you see server_tool_use with name: "advisor".
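A sketch of that indicator logic over parsed SSE events. The events are assumed to arrive as dicts shaped like the stream description above; exact field names are assumptions:

```python
def indicator_state(event: dict, current: str) -> str:
    """Advance a UI indicator on one parsed stream event."""
    if event.get("type") != "content_block_start":
        return current
    block = event.get("content_block", {})
    if block.get("type") == "server_tool_use" and block.get("name") == "advisor":
        return "thinking"        # advisor pause begins: show a spinner
    if block.get("type") == "advisor_tool_result":
        return "streaming"       # advice landed, executor resumes
    return current
```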

Why this matters: The advisor tool is the first official Anthropic primitive for model routing inside a single conversation. It replaces ad-hoc orchestration (routing between models in your own code) with a server-side mechanism where the executor decides when it needs help. For anyone building agents with claude -p or the SDK, this is the new default pattern for cost-quality optimization.

🧭 OPERATOR THINKING
🔬 Deep Dive · 🌿 Evergreen
Four Runtime Patterns from Claude Code's Source

A recent analysis of Claude Code's source code identified four implementation patterns that every agent builder should understand. These aren't features you configure — they're architectural decisions baked into the runtime that explain *why* Claude Code behaves the way it does. And they're patterns worth stealing for your own agents.

Pattern 1: Deferred Tool Loading

Claude Code ships with 50+ tools. Loading all their Zod schemas into the context window at startup would consume thousands of tokens on tools the model might never call.

The solution: Two-phase loading.

graph TD
    A[Startup] -->|tool names only| B[Context Window]
    B -->|model needs a tool| C[ToolSearch meta-tool]
    C -->|name parts: 10pts, search hints: 4pts, descriptions: 2pts| D[Relevance Scoring]
    D -->|top matches| E[Full Schemas via tool_reference blocks]
    E -->|server-side expansion| F[No Extra Round Trip]

lazySchema() wraps each tool's Zod schema in a factory, deferring construction until first access. The ToolSearch meta-tool accepts queries and returns full schemas on demand, scored by name match (10 points), search hints (4 points), and description relevance (2 points). Direct lookup (select:ToolName) skips scoring entirely.

Results return as tool_reference blocks — an Anthropic API extension where the server expands the schema inline. No extra client round-trip required.

Steal this pattern when: Your agent has more than ~15 tools. Below that threshold, the context savings don't justify the extra inference hop.
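A minimal Python sketch of the deferral half of this pattern (Claude Code's lazySchema wraps Zod schemas in JavaScript; the names here are illustrative):

```python
class LazySchema:
    """Defer schema construction until a tool is actually needed."""
    def __init__(self, factory):
        self._factory = factory
        self._schema = None

    def get(self):
        if self._schema is None:     # build once, on first access
            self._schema = self._factory()
        return self._schema

# The registry holds cheap names at startup; full schemas stay unbuilt.
CONSTRUCTIONS = []

def read_file_schema():
    CONSTRUCTIONS.append("ReadFile")   # visible side effect for the demo
    return {"type": "object", "properties": {"path": {"type": "string"}}}

TOOLS = {"ReadFile": LazySchema(read_file_schema)}
```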

Pattern 2: Diminishing Returns Detection

Claude Code doesn't just stop at token limits. It monitors output *quality* across continuations:

javascript
const isDiminishing =
  tracker.continuationCount >= 3 &&
  deltaSinceLastCheck < DIMINISHING_THRESHOLD &&
  tracker.lastDeltaTokens < DIMINISHING_THRESHOLD;

The threshold: 3+ continuations where the last two each produced fewer than 500 new tokens. When triggered, the system flags diminishingReturns on the StopDecision object.

This distinguishes genuine budget exhaustion (the model has more to say but ran out of tokens) from spinning (the model is repeating itself, generating filler, or stuck in a loop). Without this check, a max_tokens continuation blindly extends — burning tokens on content that adds nothing.

Steal this pattern when: Building any agent with multi-turn continuations. Check if your last N rounds actually produced meaningful new content before continuing.
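The same check, ported to Python for reuse in your own continuation loops; the 500-token threshold matches the figure quoted above:

```python
DIMINISHING_THRESHOLD = 500  # tokens, per the figure quoted above

def is_diminishing(continuation_count: int,
                   delta_since_last_check: int,
                   last_delta_tokens: int) -> bool:
    """True when continuations are spinning rather than adding content."""
    return (
        continuation_count >= 3
        and delta_since_last_check < DIMINISHING_THRESHOLD
        and last_delta_tokens < DIMINISHING_THRESHOLD
    )
```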

Pattern 3: Cache-Aware Context Compaction

Most agents compact context when it gets too large. Claude Code compacts *strategically*, based on whether the cache is likely still warm:

javascript
const gapMinutes =
  (Date.now() - new Date(lastAssistant.timestamp).getTime()) / 60_000;
if (gapMinutes < config.gapThresholdMinutes) return null; // cache warm

If the gap between your last message and the current one is shorter than the cache TTL (5 minutes for Anthropic's prompt cache), the old tool results are still cached server-side. Compacting them saves token count but *forces a cache miss* — you'd pay more to re-read the compacted version than to let the cache serve the original.

When the gap exceeds the threshold, the cache is presumed cold. Now compaction pays off: stale tool results are stripped before sending, and the smaller payload avoids re-caching content that's about to become stale again.

For surgical removal, the system can use the cache_edits API to remove specific content blocks without invalidating the remaining cache prefix.

Steal this pattern when: Your agent has idle periods between interactions. Don't compact immediately — check if the cache is still warm first.

Pattern 4: Coalesced Background Memory Extraction

When you chat quickly, Claude Code extracts memories from your conversation in the background. But what happens when 5 messages arrive in rapid succession?

A naive implementation queues 5 extraction jobs. Claude Code uses a single-slot overwrite instead:

javascript
if (inProgress) {
  pendingContext = { context, appendSystemMessage };
  return; // overwrite, don't queue
}

Five rapid messages = exactly two extraction runs: one in-flight (started before the burst) and one final (with the latest context). The pendingContext slot is overwritten, not appended — each new message replaces the pending request rather than queuing behind it.

Additional safeguards:

  • UUID-based cursor tracking that survives message compaction
  • Mutual exclusion: skips extraction if the main agent just wrote to memory (prevents clobbering)
  • 60-second soft timeout during shutdown to let in-flight extraction finish

Steal this pattern when: Building any background processing pipeline that processes conversation events. A single-slot overwrite is almost always better than a full job queue for "process the latest state" workloads.
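A synchronous sketch of the single-slot overwrite (Claude Code's version is async JavaScript; here submit and finish model job start and completion):

```python
class Coalescer:
    """At most one job in flight, at most one pending — never a queue."""
    def __init__(self, extract):
        self.extract = extract
        self.in_progress = False
        self.pending = None

    def submit(self, context):
        if self.in_progress:
            self.pending = context       # overwrite, don't queue
            return
        self.in_progress = True
        self.extract(context)            # the "in-flight" run

    def finish(self):
        """Call when the in-flight run completes."""
        if self.pending is not None:
            context, self.pending = self.pending, None
            self.extract(context)        # one final run with latest state
        self.in_progress = False
```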

The Meta-Lesson

All four patterns share a theme: optimize for real cost, not abstract metrics. Don't count tokens — check if the cache is warm. Don't limit continuations by count — check if they're producing value. Don't process every event — process the latest one. The system optimizes for what actually costs money and time, not for what's easiest to measure.

🌐 ECOSYSTEM INTEL
🔥 New · 🔧 Try It
AgentLint — 68 Hook Rules, Zero LLM Calls

AgentLint hooks into Claude Code's lifecycle to evaluate 68 rules across 8 rule packs — all without making a single LLM call. Pure heuristic evaluation, millisecond latency, deterministic results.

Architecture

graph TD
    A[Claude Code Session] -->|lifecycle events| B[AgentLint Hooks]
    B --> C{Which events?}
    C -->|PreToolUse: Bash| D[Security + Quality Rules]
    C -->|PreToolUse: Edit/Write| E[Code Quality Rules]
    C -->|PostToolUse| F[Result Validation]
    C -->|UserPromptSubmit| G[Input Guardrails]
    C -->|SubagentStart/Stop| H[Agent Monitoring]
    C -->|Notification| I[Alert Rules]
    C -->|Stop| J[Session Cleanup]
    D & E & F & G & H & I & J --> K[Pass / Warn / Block]

AgentLint hooks into 7 Claude Code lifecycle events: PreToolUse (Bash, Edit, Write), PostToolUse, UserPromptSubmit, SubagentStart/Stop, Notification, and Stop. Every event triggers the relevant rule packs, evaluated locally in milliseconds.

The 8 Rule Packs

Pack      | Rules      | Activation                | Examples
Universal | 19         | Always on                 | Secrets detection, force-push prevention, destructive command warnings, test weakening detection
Quality   | 7          | Auto-active               | Commit message format, error handling removal, large diffs, dead imports
Python    | 6          | pyproject.toml detected   | SQL injection blocking, unsafe shell execution, bare except clauses
Frontend  | 8          | package.json detected     | Accessibility (alt text, form labels, touch targets), responsive patterns
React     | 3          | React dependency detected | Query loading states, empty state handling
SEO       | 4          | Web project detected      | Page metadata, OG tags, structured data
Security  | Opt-in     | Manual enable             | Bash file write blocking, network exfiltration, credential leakage
Autopilot | 18 (alpha) | Manual enable             | Production guards, dry-run requirements, cloud resource protection

Auto-detection reads your pyproject.toml and package.json to activate matching language packs. A Python/React project automatically gets Universal + Quality + Python + Frontend + React — no configuration needed.

Custom Rules

Drop a Python file in .agentlint/rules/:

python
import re

from agentlint import Rule, Severity

class NoHardcodedSecrets(Rule):
    name = "no-hardcoded-secrets"
    severity = Severity.ERROR
    events = ["PreToolUse:Write", "PreToolUse:Edit"]

    def evaluate(self, event):
        content = event.get("content", "")
        patterns = [r"sk-[a-zA-Z0-9]{48}", r"ghp_[a-zA-Z0-9]{36}"]
        for p in patterns:
            if re.search(p, content):
                return self.fail(f"Hardcoded secret detected: {p}")
        return self.pass_rule()

Installation

bash
pip install agentlint && agentlint setup

setup writes hooks into .claude/settings.json automatically. Configuration lives in agentlint.yml with severity levels (strict, standard, relaxed) and allowlist patterns.

The Autopilot Angle

The alpha Autopilot pack (18 rules) is the most interesting for agent operators. It enforces production safety guardrails: require --dry-run before destructive operations, block cloud resource creation without confirmation, flag production database connections, and require explicit approval for any operation that could affect billing.

Why this matters: The "Walls Beat Signs" principle from issue #2 — mechanical enforcement beats instructional rules. AgentLint is walls: deterministic, millisecond, zero-LLM-cost guardrails that work even when Claude is under pressure and ignoring your CLAUDE.md. 68 rules, 16 stars, actively maintained with 125+ commits.

🌐 ECOSYSTEM INTEL
🔥 New
Twill.ai — Cloud Agents That Deliver Pull Requests

Twill.ai (YC S25, 67 HN points) runs autonomous coding agents in isolated cloud sandboxes and delivers the output as GitHub PRs. The interesting part isn't "cloud agents" — it's the architecture.

The 6-Step Pipeline

graph LR
    A[Task] --> B[Research]
    B --> C[Plan]
    C --> D[🚦 Human Approval Gate]
    D --> E[Implement]
    E --> F[AI Code Review]
    F --> G[Merge/PR]

The enforced approval gate between Plan and Implement is the design choice worth studying. Agents can research and plan autonomously, but they cannot write code until a human approves the plan. This inverts the common pattern where agents code first and humans review after — here, humans review the *approach* before any implementation happens.

Sandbox Lifecycle

Each task spins up an isolated sandbox: only the relevant code is cloned, builds and tests run inside it, then the sandbox is fully deleted after merge. No persistent agent environment means no accumulated state, no credential leakage across tasks, no "works on the agent's machine" drift.

You can SSH into running sandboxes with VS Code or Cursor while the agent works — collaborative debugging without sharing your local environment.

Parallel Agent Races

Run multiple agents on the same task for comparison. Two Sonnet instances working independently, or Sonnet vs. Codex on the same bug. The best PR wins. This is expensive but eliminates single-point-of-failure reasoning — if one agent takes a wrong turn, the other might not.

The Decision Framework

Use Twill when...                               | Use local agents when...
Task is well-scoped (bug fix, dep update, test) | Task requires deep codebase intuition
You want sandbox isolation by default           | You need MCP tools and custom hooks
Team coordination (tasks from Slack, Linear)    | You're iterating interactively
Compliance requires ephemeral environments      | You're on a Max plan and cost isn't a factor

Why this matters: Twill's approval-gate-before-code pattern is worth adopting even if you never use the product. The principle: let agents research and plan freely, but gate code generation on human approval of the approach. This catches wrong-direction work before it becomes a wasted PR.
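The gate can be sketched as a pipeline function; every step callable here is a hypothetical stand-in, not Twill's API:

```python
def run_task(task, research, plan, implement, review, approve):
    """Research and plan freely; write no code until a human approves."""
    findings = research(task)
    proposal = plan(task, findings)
    if not approve(proposal):        # 🚦 the human gate
        return None                  # wrong-direction work stops here
    change = implement(proposal)
    return review(change)
```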

🔬 PRACTICE LAB
🔧 Try It · 🔬 Deep Dive
Build an Advisor-Powered Code Review Pipeline

You'll build a code review agent that uses Sonnet 4.6 as the executor and Opus 4.6 as the advisor. The executor reads diffs and generates review comments. The advisor provides strategic guidance: which issues matter, what patterns to flag, when to approve.

Prerequisites

  • Anthropic API key with advisor beta access
  • Python 3.10+ with anthropic SDK installed
  • A Git repo with a recent diff to review

Step 1: Set Up the Advisor Tool

bash
pip install --upgrade anthropic
python
# review_agent.py
import anthropic
import subprocess
import sys

client = anthropic.Anthropic()

REVIEW_SYSTEM = """You are a senior code reviewer. You have access to an advisor
tool backed by a stronger model.

Call advisor BEFORE writing any review comments — after reading the diff,
get strategic guidance on what matters most.

Call advisor AGAIN before finalizing — to catch anything you missed.

The advisor should respond in under 100 words and use enumerated steps,
not explanations.

Review criteria: correctness, security, performance, maintainability.
Flag issues by severity: 🔴 critical, 🟡 warning, 🟢 suggestion."""

def get_diff(base="main"):
    """Get the diff between current branch and base."""
    result = subprocess.run(
        ["git", "diff", f"{base}...HEAD"],
        capture_output=True, text=True
    )
    if result.returncode != 0:
        # Fallback to unstaged changes
        result = subprocess.run(
            ["git", "diff"], capture_output=True, text=True
        )
    return result.stdout or "No changes found."

def review_with_advisor(diff: str) -> dict:
    """Run a review with Sonnet executor + Opus advisor."""
    response = client.beta.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=8192,
        betas=["advisor-tool-2026-03-01"],
        system=REVIEW_SYSTEM,
        tools=[
            {
                "type": "advisor_20260301",
                "name": "advisor",
                "model": "claude-opus-4-6",
                "caching": {"type": "ephemeral", "ttl": "5m"},
            }
        ],
        messages=[
            {
                "role": "user",
                "content": f"Review this diff:\n\n```diff\n{diff}\n```",
            }
        ],
    )
    return response

def review_without_advisor(diff: str) -> dict:
    """Run a review with Sonnet only (for comparison)."""
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=8192,
        system=REVIEW_SYSTEM,
        messages=[
            {
                "role": "user",
                "content": f"Review this diff:\n\n```diff\n{diff}\n```",
            }
        ],
    )
    return response

Step 2: Add Cost Tracking

python
def extract_costs(response, has_advisor=False):
    """Extract token costs from response usage."""
    usage = response.usage
    costs = {
        "executor_input": usage.input_tokens,
        "executor_output": usage.output_tokens,
        "executor_cache_read": getattr(usage, "cache_read_input_tokens", 0),
    }

    if has_advisor and hasattr(usage, "iterations"):
        for iteration in usage.iterations:
            if iteration.get("type") == "advisor_message":
                costs["advisor_input"] = iteration["input_tokens"]
                costs["advisor_output"] = iteration["output_tokens"]
                costs["advisor_cache_read"] = iteration.get(
                    "cache_read_input_tokens", 0
                )

    return costs

def print_cost_comparison(with_advisor, without_advisor):
    """Compare costs between advisor and non-advisor runs."""
    # Pricing per 1M tokens (as of April 2026)
    SONNET_INPUT = 3.0
    SONNET_OUTPUT = 15.0
    OPUS_INPUT = 15.0
    OPUS_OUTPUT = 75.0

    def calc_cost(costs, has_advisor=False):
        total = (
            costs["executor_input"] * SONNET_INPUT / 1_000_000
            + costs["executor_output"] * SONNET_OUTPUT / 1_000_000
        )
        if has_advisor and "advisor_input" in costs:
            total += (
                costs["advisor_input"] * OPUS_INPUT / 1_000_000
                + costs["advisor_output"] * OPUS_OUTPUT / 1_000_000
            )
        return total

    cost_with = calc_cost(with_advisor, has_advisor=True)
    cost_without = calc_cost(without_advisor)

    print(f"\n{'='*50}")
    print("WITH ADVISOR:")
    print(f"  Executor: {with_advisor['executor_input']}in / "
          f"{with_advisor['executor_output']}out")
    if "advisor_input" in with_advisor:
        print(f"  Advisor: {with_advisor['advisor_input']}in / "
              f"{with_advisor['advisor_output']}out")
    print(f"  Total cost: ${cost_with:.4f}")
    print("\nWITHOUT ADVISOR:")
    print(f"  Executor: {without_advisor['executor_input']}in / "
          f"{without_advisor['executor_output']}out")
    print(f"  Total cost: ${cost_without:.4f}")
    print(f"\nDifference: ${cost_with - cost_without:+.4f} "
          f"({(cost_with/cost_without - 1)*100:+.1f}%)")

Step 3: Run the Comparison

python
if __name__ == "__main__":
    base = sys.argv[1] if len(sys.argv) > 1 else "main"
    diff = get_diff(base)

    if diff == "No changes found.":
        print("No diff found. Provide a base branch: "
              "python review_agent.py <base-branch>")
        sys.exit(1)

    print(f"Reviewing {len(diff)} chars of diff...")
    print("\n--- WITH ADVISOR ---")
    resp_with = review_with_advisor(diff)
    costs_with = extract_costs(resp_with, has_advisor=True)

    # Print the review
    for block in resp_with.content:
        if hasattr(block, "text"):
            print(block.text)

    print("\n--- WITHOUT ADVISOR ---")
    resp_without = review_without_advisor(diff)
    costs_without = extract_costs(resp_without)

    for block in resp_without.content:
        if hasattr(block, "text"):
            print(block.text)

    print_cost_comparison(costs_with, costs_without)

Step 4: Run It

bash
# Review current branch against main
python review_agent.py main

# Review against a specific branch
python review_agent.py feature/auth-refactor

What to Look For

  1. Advisor timing: Does the executor call the advisor before writing comments? Check the response content array for server_tool_use blocks — they should appear early.
  2. Review quality: Compare the advisor-enhanced review against the solo review. The advisor version should flag more structural issues and fewer nitpicks.
  3. Cost delta: On a medium-sized diff (~500 lines), expect the advisor version to cost 10-20% more but catch 2-3 issues the solo version misses.
  4. Cache behavior: Run the same diff twice. On the second run, advisor_cache_read should be nonzero if caching is working.

Extension: Multi-Turn Review

Turn this into a multi-turn conversation where the author responds to comments:

python
def multi_turn_review(diff: str, author_responses: list[str]):
    messages = [
        {"role": "user", "content": f"Review this diff:\n\n```diff\n{diff}\n```"}
    ]

    tools = [{
        "type": "advisor_20260301",
        "name": "advisor",
        "model": "claude-opus-4-6",
        "caching": {"type": "ephemeral", "ttl": "5m"},
    }]

    for response_text in author_responses:
        # Get review
        response = client.beta.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=8192,
            betas=["advisor-tool-2026-03-01"],
            system=REVIEW_SYSTEM,
            tools=tools,
            messages=messages,
        )

        # Add assistant response to history
        messages.append({"role": "assistant", "content": response.content})

        # Add author's response
        messages.append({"role": "user", "content": response_text})

    # Final review pass
    return client.beta.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=8192,
        betas=["advisor-tool-2026-03-01"],
        system=REVIEW_SYSTEM,
        tools=tools,
        messages=messages,
    )

Verification

  • [ ] Advisor calls appear in response content as server_tool_use blocks
  • [ ] Review identifies at least one issue the solo version missed
  • [ ] usage.iterations array contains advisor_message entries
  • [ ] Cost tracking produces nonzero values for both runs
  • [ ] Second run shows cache_read_input_tokens > 0 for advisor iterations

Why this lab matters: The advisor pattern isn't theoretical — it's a shipping API primitive. Building a real pipeline with it teaches you the cost model, caching behavior, and streaming quirks that documentation alone can't convey. Once you've built one advisor-powered agent, the pattern transfers to any agentic workload: coding, research, data analysis, deployment automation.