Twenty-four hours after v2.1.100 (which was just a CHANGELOG update), Anthropic dropped v2.1.101 — the densest bug-fix release in Claude Code's history. One security patch, two performance fixes, and 35+ bug fixes across sessions, permissions, subagents, plugins, and UI. If you haven't updated: npm i -g @anthropic-ai/claude-code@latest.
The Security Fix
A command injection vulnerability was found in the POSIX `which` fallback used for LSP binary detection. If your system lacked a native `which`, an attacker could craft a binary name that executed arbitrary commands during LSP startup. Fixed by sanitizing input before shell execution.
Two Performance Fixes That Matter
1. Virtual scroller memory leak. Long sessions were retaining dozens of historical copies of the message list. If you've noticed Claude Code getting sluggish after 30+ minutes of active use — this is why. The fix drops old copies during scroll updates.
2. Hardcoded 5-minute request timeout. Every API call had a 5-minute ceiling regardless of your backend. If you're running local LLMs (LM Studio, Ollama), using extended thinking, or going through slow corporate gateways, you've hit this wall. The fix respects API_TIMEOUT_MS — set it to whatever your backend needs.
The Resume Overhaul
Eight separate --resume fixes in one release:
- Dead-end branch anchoring — large sessions could resume on a dead branch instead of the live conversation
- Subagent chain bridging — resume could accidentally jump into a subagent's conversation instead of the main chain
- Missing `file_path` crash — persisted Edit/Write results without `file_path` crashed the loader
- Stale worktree — `claude -w` failed with "already exists" after unclean worktree cleanup
- Narrow `/resume` picker — sessions from other projects were hidden by default
- `/btw` transcript bloat — every `/btw` was writing the entire conversation to disk
- `--continue -p` mismatch — headless sessions created with `-p` couldn't continue with `-p`
- Windows Terminal preview — preview pane was unreachable
Other Highlights
- OS CA certificate trust: Enterprise TLS proxies now work by default. Previously required manual cert configuration. Set `CLAUDE_CODE_CERT_STORE=bundled` to opt out.
- `/team-onboarding`: Generates a teammate ramp-up guide from your local usage patterns. One command to onboard someone into your Claude Code workflow.
- `permissions.deny` override fix: Previously, a PreToolUse hook returning `permissionDecision: "ask"` could downgrade a `permissions.deny` rule into a prompt instead of blocking. Fixed — deny now always wins.
- Subagent MCP inheritance: Subagents now inherit MCP tools from dynamically-injected servers. If you've been wondering why your subagents couldn't call MCP tools that the lead session could — this is the fix.
- Resilient settings: An unrecognized hook event name in `settings.json` no longer nukes your entire config file.
Why this matters: v2.1.101 fixes the three biggest agent-killer categories simultaneously: session corruption (8 resume fixes), resource exhaustion (memory leak + timeout), and security escalation (command injection + permission bypass). Update now.
Issue #8 covered v2.1.98's security patches. But the same release shipped something more transformative: the Monitor tool — a primitive that turns Claude Code from a poll-and-check agent into an event-driven one.
The Problem with Polling
Every agent builder has written this loop:
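The loop itself isn't reproduced here, but it usually takes roughly this shape — a minimal sketch using the Python SDK, where the prompt, model name, and two-minute interval are illustrative:

```python
import time

import anthropic

client = anthropic.Anthropic()

while True:
    # every iteration reloads the full context: system prompt, tools, history
    reply = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=256,
        messages=[{
            "role": "user",
            "content": "Check build.log. Is the build finished? Answer DONE or WAIT.",
        }],
    )
    if "DONE" in reply.content[0].text:
        break
    time.sleep(120)  # wait two minutes, then burn another full call
```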
Each iteration is a full API call. Five cycles checking a 3-minute build = five context windows loaded, four of them producing nothing. You're paying for idle thinking.
How Monitor Works
Monitor runs a shell command and streams each stdout line as a notification into the conversation. Claude reacts only when something happens.
Four parameters:
- `description` — label that appears with each notification
- `command` — shell command whose stdout becomes the event stream
- `timeout_ms` — default 300,000 (5 min), max 3,600,000 (1 hour)
- `persistent` — boolean; stays alive until killed with `TaskStop`
graph LR
A[Background Process] -->|stdout lines| B[Monitor Tool]
B -->|batched notifications
200ms window| C[Claude Session]
C -->|reacts only on events| D[Fix / Investigate]
A -->|stderr| E[Log File]
Critical Gotchas
1. Always use grep --line-buffered. Without it, pipe buffering can delay notifications by minutes. The monitor reacts to stdout *lines* — if grep buffers its output (default behavior when stdout isn't a terminal), your events arrive in batches of 4KB instead of per-line.
2. Handle transient failures. A monitor that exits kills the notification stream. Use || true in poll loops or wrap with retry logic:
# restart the pipeline if the producer exits; <your command> is whatever emits the lines you care about
while true; do <your command> | grep --line-buffered -E "ERROR|WARN" || sleep 2; done
3. Every stdout line is a conversation message. Verbose output triggers automatic shutdown. Filter aggressively — Monitor is for signals, not logs.
Token Economics: Monitor vs. Polling
| Approach | 10-min build, checking every 2 min | API calls | Wasted context |
|---|---|---|---|
| Polling loop | 5 full calls, 4 idle | 5 | ~80% |
| Monitor + grep | 1 setup call + N event reactions | 1 + events | ~0% |
The savings compound with build time. A 30-minute CI pipeline checked every 3 minutes = 10 idle API calls. Monitor: zero.
Three Production Patterns
Build watcher:
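The original example isn't shown here; a plausible Monitor input for this pattern, sketched with the four parameters above (command and timeout values are illustrative):

```python
build_watcher = {
    "description": "Watch the production build for failures",
    # --line-buffered keeps grep from batching output (Gotcha 1 above)
    "command": "npm run build 2>&1 | grep --line-buffered -E 'ERROR|FAIL'",
    "timeout_ms": 1_800_000,  # 30 minutes
    "persistent": False,      # exits when the build finishes
}
```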
Log sentinel:
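Again a hedged sketch — a persistent watcher over an application log (path and patterns are illustrative):

```python
log_sentinel = {
    "description": "Alert on errors in the app log",
    "command": "tail -f /var/log/app.log | grep --line-buffered -E 'ERROR|FATAL'",
    "timeout_ms": 3_600_000,  # the 1-hour maximum
    "persistent": True,       # keep watching until killed with TaskStop
}
```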
File change reactor:
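One way this could look, assuming a Linux box with inotify-tools installed (the watched path is illustrative):

```python
file_change_reactor = {
    "description": "React when files under src/fixtures change",
    "command": "inotifywait -m -e modify,create --format '%w%f' src/fixtures/",
    "timeout_ms": 3_600_000,
    "persistent": True,
}
```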
Monitor vs. Related Tools
| Tool | Purpose | Notification |
|---|---|---|
| `run_in_background` | One-shot tasks | Single notification at completion |
| Hooks | Validate Claude's own actions | Lifecycle events |
| `/loop` | Scheduled polling | Timer-based, full context reload |
| Monitor | Observe external systems | Per-line stdout streaming |
Why this matters: Monitor is the first Claude Code primitive that breaks the request-response cycle. Your agent doesn't need to ask "has anything changed?" — it gets told. This is the foundation for reactive, event-driven agent architectures.
On April 9, Anthropic shipped the advisor tool — a new beta API primitive that pairs a fast executor model with Opus as an on-demand strategic advisor. All within a single API call. No orchestration code, no extra round trips. This changes agent economics.
The Core Idea
Most agentic work is mechanical: reading files, running commands, writing code. You don't need Opus for that. But every 5-10 turns, the agent hits a decision point: which approach to take, how to decompose a problem, whether the current path is working. That's where Opus earns its cost.
The advisor tool makes this pattern a first-class API primitive:
sequenceDiagram
participant E as Executor (Sonnet 4.6)
participant A as Advisor (Opus 4.6)
participant T as Tools
E->>T: Read files, explore codebase
E->>A: "I've seen the code. What's the plan?"
A-->>E: "1. Refactor X first 2. Then add Y 3. Watch for Z"
E->>T: Implement step 1
E->>T: Implement step 2
E->>T: Run tests
E->>A: "Tests pass. Am I done?"
A-->>E: "Check edge case Z — step 3 isn't handled"
E->>T: Fix edge case
E->>T: Tests pass
E->>E: Done
The API
Beta header: `anthropic-beta: advisor-tool-2026-03-01`
import anthropic

client = anthropic.Anthropic()
response = client.beta.messages.create(
model="claude-sonnet-4-6",
max_tokens=4096,
betas=["advisor-tool-2026-03-01"],
tools=[
{
"type": "advisor_20260301",
"name": "advisor",
"model": "claude-opus-4-6",
}
],
messages=[
{
"role": "user",
"content": "Refactor the auth module to use JWT tokens.",
}
],
)
The executor decides when to call the advisor — it's just another tool. When invoked:
- Executor emits `server_tool_use` with `name: "advisor"` and empty `input`
- Server runs a separate Opus inference on the full transcript
- Advisor response returns as `advisor_tool_result`
- Executor continues, informed by the advice
All within one /v1/messages request. The advisor sees everything: system prompt, tool definitions, all prior turns, all tool results.
Verified Benchmarks
| Configuration | SWE-bench ML | BrowseComp | Cost vs. Sonnet Solo |
|---|---|---|---|
| Sonnet solo | 72.1% | — | baseline |
| Sonnet + Opus advisor | 74.8% (+2.7pp) | — | −11.9% |
| Haiku solo | — | 19.7% | −85% |
| Haiku + Opus advisor | — | 41.2% (>2×) | −85% vs Sonnet |
The counterintuitive result: Sonnet + Opus advisor is both *better* and *cheaper* than Sonnet alone. The advisor's plan reduces total tool calls and conversation length, saving more executor tokens than the advisor consumes.
Caching: The clear_thinking Gotcha
Enable advisor-side caching for conversations with 3+ advisor calls:
"type": "advisor_20260301",
"name": "advisor",
"model": "claude-opus-4-6",
"caching": {"type": "ephemeral", "ttl": "5m"},
}
Critical: If you use extended thinking with clear_thinking set to anything other than "all", the advisor's transcript shifts each turn — causing cache misses on every call. The default keep: {type: "thinking_turns", value: 1} triggers this. Set keep: "all" to preserve advisor cache stability.
Cost Control
max_uses caps advisor calls per request. For conversation-level budgets, count client-side. When you hit your cap, remove the advisor tool from tools and strip all advisor_tool_result blocks from your message history — leaving them without the tool definition causes a 400 invalid_request_error.
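A hedged sketch of both halves of that advice — `max_uses` on the tool definition, plus a helper that strips `advisor_tool_result` blocks once your client-side budget is spent (assumes message history is stored as plain dicts; the helper name is illustrative):

```python
advisor_tool = {
    "type": "advisor_20260301",
    "name": "advisor",
    "model": "claude-opus-4-6",
    "max_uses": 2,  # server-side cap per request
}

def strip_advisor_results(messages: list[dict]) -> list[dict]:
    """Remove advisor_tool_result blocks before sending a request that no longer
    defines the advisor tool — leaving them in causes a 400 invalid_request_error."""
    cleaned = []
    for msg in messages:
        content = msg["content"]
        if isinstance(content, list):
            content = [b for b in content if b.get("type") != "advisor_tool_result"]
        cleaned.append({**msg, "content": content})
    return cleaned
```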
The Conciseness Trick
One system prompt line cuts advisor output tokens 35-45%:
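The line isn't reproduced here; the lab later in this issue bakes the same constraint into its system prompt, which gives a sense of the wording:

"The advisor should respond in under 100 words and use enumerated steps, not explanations."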
Advisor output (400-700 text tokens) is the biggest cost driver. This constraint forces the advisor to plan, not explain.
Recommended System Prompt Pattern
For coding tasks, prepend this to your executor's system prompt:
Call advisor early — before committing to an interpretation, before building on an assumption. If the task
requires orientation first (finding files, fetching a source), do that,
then call advisor. Orientation is not substantive work.
Also call advisor:
- When you believe the task is complete (AFTER making deliverables durable)
- When stuck — errors recurring, approach not converging
- When considering a change of approach
This pattern produced the highest intelligence at near-Sonnet cost in Anthropic's internal evaluations.
Valid Model Pairs
| Executor | Advisor |
|---|---|
| Haiku 4.5 | Opus 4.6 |
| Sonnet 4.6 | Opus 4.6 |
| Opus 4.6 | Opus 4.6 |
The advisor must be ≥ the executor's capability. Opus advising itself is valid — useful when you want a fresh perspective mid-generation.
Streaming Behavior
The advisor does not stream. When the executor calls the advisor, the output stream pauses. SSE ping keepalives fire every ~30 seconds during the pause. The full advisor_tool_result arrives in one content_block_start event, then executor streaming resumes.
Plan for this in UIs — show a "thinking..." indicator when you see server_tool_use with name: "advisor".
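A hedged sketch of that with the Python SDK's streaming interface — the event types follow the description above, but treat the exact attribute paths and the `beta.messages.stream` call as assumptions to verify against the SDK version you're on:

```python
import anthropic

client = anthropic.Anthropic()

with client.beta.messages.stream(
    model="claude-sonnet-4-6",
    max_tokens=4096,
    betas=["advisor-tool-2026-03-01"],
    tools=[{"type": "advisor_20260301", "name": "advisor", "model": "claude-opus-4-6"}],
    messages=[{"role": "user", "content": "Refactor the auth module to use JWT tokens."}],
) as stream:
    for event in stream:
        if event.type == "content_block_start":
            block = event.content_block
            if block.type == "server_tool_use" and block.name == "advisor":
                # stream pauses here; SSE pings keep the connection alive
                print("\n[advisor thinking...]", flush=True)
            elif block.type == "advisor_tool_result":
                print("\n[advisor replied — resuming executor]", flush=True)
        elif event.type == "content_block_delta" and getattr(event.delta, "text", None):
            print(event.delta.text, end="", flush=True)
```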
Why this matters: The advisor tool is the first official Anthropic primitive for model routing inside a single conversation. It replaces ad-hoc orchestration (routing between models in your own code) with a server-side mechanism where the executor decides when it needs help. For anyone building agents with claude -p or the SDK, this is the new default pattern for cost-quality optimization.
A recent analysis of Claude Code's source code identified four implementation patterns that every agent builder should understand. These aren't features you configure — they're architectural decisions baked into the runtime that explain *why* Claude Code behaves the way it does. And they're patterns worth stealing for your own agents.
Pattern 1: Deferred Tool Loading
Claude Code ships with 50+ tools. Loading all their Zod schemas into the context window at startup would consume thousands of tokens on tools the model might never call.
The solution: Two-phase loading.
graph TD
A[Startup] -->|tool names only| B[Context Window]
B -->|model needs a tool| C[ToolSearch meta-tool]
C -->|name parts: 10pts
search hints: 4pts
descriptions: 2pts| D[Relevance Scoring]
D -->|top matches| E[Full Schemas via
tool_reference blocks]
E -->|server-side expansion| F[No Extra Round Trip]
lazySchema() wraps each tool's Zod schema in a factory, deferring construction until first access. The ToolSearch meta-tool accepts queries and returns full schemas on demand, scored by name match (10 points), search hints (4 points), and description relevance (2 points). Direct lookup (select:ToolName) skips scoring entirely.
Results return as tool_reference blocks — an Anthropic API extension where the server expands the schema inline. No extra client round-trip required.
Steal this pattern when: Your agent has more than ~15 tools. Below that threshold, the context savings don't justify the extra inference hop.
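A minimal Python sketch of the same two-phase idea — register names cheaply, build schemas on first access, and score lookups with the weights quoted above (the class and function names are illustrative):

```python
from typing import Callable

class LazyTool:
    def __init__(self, name: str, description: str, schema_factory: Callable[[], dict], hints=()):
        self.name, self.description, self.hints = name, description, list(hints)
        self._schema_factory = schema_factory
        self._schema = None  # not built at startup — only name and description cost context

    @property
    def schema(self) -> dict:
        if self._schema is None:  # first access builds (and caches) the full schema
            self._schema = self._schema_factory()
        return self._schema

def search_tools(registry: list[LazyTool], query: str, top_k: int = 3) -> list[LazyTool]:
    """Relevance scoring with the weights quoted above: name 10, hints 4, description 2."""
    q = query.lower()
    def score(t: LazyTool) -> int:
        return (10 * (q in t.name.lower())
                + 4 * any(q in h.lower() for h in t.hints)
                + 2 * (q in t.description.lower()))
    return sorted((t for t in registry if score(t) > 0), key=score, reverse=True)[:top_k]
```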
Pattern 2: Diminishing Returns Detection
Claude Code doesn't just stop at token limits. It monitors output *quality* across continuations:
const diminishingReturns =
  tracker.continuationCount >= 3 &&
  deltaSinceLastCheck < DIMINISHING_THRESHOLD &&
  tracker.lastDeltaTokens < DIMINISHING_THRESHOLD;
The threshold: 3+ continuations where the last two each produced fewer than 500 new tokens. When triggered, the system flags diminishingReturns on the StopDecision object.
This distinguishes genuine budget exhaustion (the model has more to say but ran out of tokens) from spinning (the model is repeating itself, generating filler, or stuck in a loop). Without this check, a max_tokens continuation blindly extends — burning tokens on content that adds nothing.
Steal this pattern when: Building any agent with multi-turn continuations. Check if your last N rounds actually produced meaningful new content before continuing.
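The same check, sketched in Python for a generic continuation loop (field names are illustrative; the threshold is the value the section quotes):

```python
from dataclasses import dataclass, field

DIMINISHING_THRESHOLD = 500  # new tokens per continuation

@dataclass
class ContinuationTracker:
    continuation_count: int = 0
    deltas: list[int] = field(default_factory=list)  # new tokens produced per continuation

    def record(self, new_tokens: int) -> None:
        self.continuation_count += 1
        self.deltas.append(new_tokens)

    def diminishing_returns(self) -> bool:
        # 3+ continuations where the last two each added fewer than 500 new tokens
        return (
            self.continuation_count >= 3
            and len(self.deltas) >= 2
            and all(d < DIMINISHING_THRESHOLD for d in self.deltas[-2:])
        )
```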
Pattern 3: Cache-Aware Context Compaction
Most agents compact context when it gets too large. Claude Code compacts *strategically*, based on whether the cache is likely still warm:
const gapMinutes =
  (Date.now() - new Date(lastAssistant.timestamp).getTime()) / 60_000;
if (gapMinutes < config.gapThresholdMinutes) return null; // cache warm
If the gap between your last message and the current one is shorter than the cache TTL (5 minutes for Anthropic's prompt cache), the old tool results are still cached server-side. Compacting them saves token count but *forces a cache miss* — you'd pay more to re-read the compacted version than to let the cache serve the original.
When the gap exceeds the threshold, the cache is presumed cold. Now compaction pays off: stale tool results are stripped before sending, and the smaller payload avoids re-caching content that's about to become stale again.
For surgical removal, the system can use the cache_edits API to remove specific content blocks without invalidating the remaining cache prefix.
Steal this pattern when: Your agent has idle periods between interactions. Don't compact immediately — check if the cache is still warm first.
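A hedged Python sketch of the same decision — the TTL comes from the section above, and the compaction routine itself is left abstract:

```python
import time

CACHE_TTL_MINUTES = 5  # Anthropic prompt-cache TTL cited above

def maybe_compact(messages: list[dict], last_assistant_ts: float, compact) -> list[dict]:
    """Only strip stale tool results when the prompt cache is presumed cold.

    last_assistant_ts is a Unix timestamp; `compact` is whatever compaction
    routine your agent already has (illustrative parameter)."""
    gap_minutes = (time.time() - last_assistant_ts) / 60
    if gap_minutes < CACHE_TTL_MINUTES:
        return messages       # cache warm: compacting would force a cache miss
    return compact(messages)  # cache cold: the smaller payload wins
```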
Pattern 4: Coalesced Background Memory Extraction
When you chat quickly, Claude Code extracts memories from your conversation in the background. But what happens when 5 messages arrive in rapid succession?
A naive implementation queues 5 extraction jobs. Claude Code uses a single-slot overwrite instead:
if (extractionInFlight) { // a run is already active — guard name paraphrased
  pendingContext = { context, appendSystemMessage };
  return; // overwrite, don't queue
}
Five rapid messages = exactly two extraction runs: one in-flight (started before the burst) and one final (with the latest context). The pendingContext slot is overwritten, not appended — each new message replaces the pending request rather than queuing behind it.
Additional safeguards:
- UUID-based cursor tracking that survives message compaction
- Mutual exclusion: skips extraction if the main agent just wrote to memory (prevents clobbering)
- 60-second soft timeout during shutdown to let in-flight extraction finish
Steal this pattern when: Building any background processing pipeline that processes conversation events. A single-slot overwrite is almost always better than a full job queue for "process the latest state" workloads.
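A minimal asyncio sketch of that single-slot overwrite (class and function names are illustrative):

```python
import asyncio

async def extract_memories(context: dict) -> None:
    await asyncio.sleep(1)  # stand-in for the real extraction call

class MemoryExtractor:
    """Single-slot overwrite: a burst of N events yields at most two runs —
    the one already in flight, plus one final run with the latest context."""

    def __init__(self) -> None:
        self._pending: dict | None = None
        self._task: asyncio.Task | None = None

    def submit(self, context: dict) -> None:
        if self._task and not self._task.done():
            self._pending = context      # overwrite, don't queue
            return
        self._task = asyncio.create_task(self._run(context))

    async def _run(self, context: dict) -> None:
        await extract_memories(context)
        if self._pending is not None:    # exactly one follow-up with the newest state
            nxt, self._pending = self._pending, None
            self._task = asyncio.create_task(self._run(nxt))
```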
The Meta-Lesson
All four patterns share a theme: optimize for real cost, not abstract metrics. Don't count tokens — check if the cache is warm. Don't limit continuations by count — check if they're producing value. Don't process every event — process the latest one. The system optimizes for what actually costs money and time, not for what's easiest to measure.
AgentLint hooks into Claude Code's lifecycle to evaluate 68 rules across 8 rule packs — all without making a single LLM call. Pure heuristic evaluation, millisecond latency, deterministic results.
Architecture
graph TD
A[Claude Code Session] -->|lifecycle events| B[AgentLint Hooks]
B --> C{Which events?}
C -->|PreToolUse: Bash| D[Security + Quality Rules]
C -->|PreToolUse: Edit/Write| E[Code Quality Rules]
C -->|PostToolUse| F[Result Validation]
C -->|UserPromptSubmit| G[Input Guardrails]
C -->|SubagentStart/Stop| H[Agent Monitoring]
C -->|Notification| I[Alert Rules]
C -->|Stop| J[Session Cleanup]
D & E & F & G & H & I & J --> K[Pass / Warn / Block]
AgentLint hooks into 7 Claude Code lifecycle events: PreToolUse (Bash, Edit, Write), PostToolUse, UserPromptSubmit, SubagentStart/Stop, Notification, and Stop. Every event triggers the relevant rule packs, evaluated locally in milliseconds.
The 8 Rule Packs
| Pack | Rules | Activation | Examples |
|---|---|---|---|
| Universal | 19 | Always on | Secrets detection, force-push prevention, destructive command warnings, test weakening detection |
| Quality | 7 | Auto-active | Commit message format, error handling removal, large diffs, dead imports |
| Python | 6 | pyproject.toml detected | SQL injection blocking, unsafe shell execution, bare except clauses |
| Frontend | 8 | package.json detected | Accessibility (alt text, form labels, touch targets), responsive patterns |
| React | 3 | React dependency detected | Query loading states, empty state handling |
| SEO | 4 | Web project detected | Page metadata, OG tags, structured data |
| Security | Opt-in | Manual enable | Bash file write blocking, network exfiltration, credential leakage |
| Autopilot | 18 (alpha) | Manual enable | Production guards, dry-run requirements, cloud resource protection |
Auto-detection reads your pyproject.toml and package.json to activate matching language packs. A Python/React project automatically gets Universal + Quality + Python + Frontend + React — no configuration needed.
Custom Rules
Drop a Python file in .agentlint/rules/:
import re  # Rule and Severity come from AgentLint's rule API

class NoHardcodedSecrets(Rule):
name = "no-hardcoded-secrets"
severity = Severity.ERROR
events = ["PreToolUse:Write", "PreToolUse:Edit"]
def evaluate(self, event):
content = event.get("content", "")
patterns = [r"sk-[a-zA-Z0-9]{48}", r"ghp_[a-zA-Z0-9]{36}"]
for p in patterns:
if re.search(p, content):
return self.fail(f"Hardcoded secret detected: {p}")
return self.pass_rule()
Installation
AgentLint's setup step writes hooks into `.claude/settings.json` automatically. Configuration lives in `agentlint.yml` with severity levels (strict, standard, relaxed) and allowlist patterns.
The Autopilot Angle
The alpha Autopilot pack (18 rules) is the most interesting for agent operators. It enforces production safety guardrails: require --dry-run before destructive operations, block cloud resource creation without confirmation, flag production database connections, and require explicit approval for any operation that could affect billing.
Why this matters: The "Walls Beat Signs" principle from issue #2 — mechanical enforcement beats instructional rules. AgentLint is walls: deterministic, millisecond, zero-LLM-cost guardrails that work even when Claude is under pressure and ignoring your CLAUDE.md. 68 rules, 16 stars, and active maintenance (125+ commits).
Twill.ai (YC S25, 67 HN points) runs autonomous coding agents in isolated cloud sandboxes and delivers the output as GitHub PRs. The interesting part isn't "cloud agents" — it's the architecture.
The 6-Step Pipeline
graph LR
A[Task] --> B[Research]
B --> C[Plan]
C --> D[🚦 Human Approval Gate]
D --> E[Implement]
E --> F[AI Code Review]
F --> G[Merge/PR]
The enforced approval gate between Plan and Implement is the design choice worth studying. Agents can research and plan autonomously, but they cannot write code until a human approves the plan. This inverts the common pattern where agents code first and humans review after — here, humans review the *approach* before any implementation happens.
Sandbox Lifecycle
Each task spins up an isolated sandbox: only the relevant code is cloned, builds and tests run inside it, then the sandbox is fully deleted after merge. No persistent agent environment means no accumulated state, no credential leakage across tasks, no "works on the agent's machine" drift.
You can SSH into running sandboxes with VS Code or Cursor while the agent works — collaborative debugging without sharing your local environment.
Parallel Agent Races
Run multiple agents on the same task for comparison. Two Sonnet instances working independently, or Sonnet vs. Codex on the same bug. The best PR wins. This is expensive but eliminates single-point-of-failure reasoning — if one agent takes a wrong turn, the other might not.
The Decision Framework
| Use Twill when... | Use local agents when... |
|---|---|
| Task is well-scoped (bug fix, dep update, test) | Task requires deep codebase intuition |
| You want sandbox isolation by default | You need MCP tools and custom hooks |
| Team coordination (tasks from Slack, Linear) | You're iterating interactively |
| Compliance requires ephemeral environments | You're on a Max plan and cost isn't a factor |
Why this matters: Twill's approval-gate-before-code pattern is worth adopting even if you never use the product. The principle: let agents research and plan freely, but gate code generation on human approval of the approach. This catches wrong-direction work before it becomes a wasted PR.
You'll build a code review agent that uses Sonnet 4.6 as the executor and Opus 4.6 as the advisor. The executor reads diffs and generates review comments. The advisor provides strategic guidance: which issues matter, what patterns to flag, when to approve.
Prerequisites
- Anthropic API key with advisor beta access
- Python 3.10+ with the `anthropic` SDK installed
- A Git repo with a recent diff to review
Step 1: Set Up the Advisor Tool
import anthropic
import subprocess
import sys
client = anthropic.Anthropic()
REVIEW_SYSTEM = """You are a senior code reviewer. You have access to an advisor
tool backed by a stronger model.
Call advisor BEFORE writing any review comments — after reading the diff,
get strategic guidance on what matters most.
Call advisor AGAIN before finalizing — to catch anything you missed.
The advisor should respond in under 100 words and use enumerated steps,
not explanations.
Review criteria: correctness, security, performance, maintainability.
Flag issues by severity: 🔴 critical, 🟡 warning, 🟢 suggestion."""
def get_diff(base="main"):
"""Get the diff between current branch and base."""
result = subprocess.run(
["git", "diff", f"{base}...HEAD"],
capture_output=True, text=True
)
if result.returncode != 0:
# Fallback to unstaged changes
result = subprocess.run(
["git", "diff"], capture_output=True, text=True
)
return result.stdout or "No changes found."
def review_with_advisor(diff: str) -> dict:
"""Run a review with Sonnet executor + Opus advisor."""
response = client.beta.messages.create(
model="claude-sonnet-4-6",
max_tokens=8192,
betas=["advisor-tool-2026-03-01"],
system=REVIEW_SYSTEM,
tools=[
{
"type": "advisor_20260301",
"name": "advisor",
"model": "claude-opus-4-6",
"caching": {"type": "ephemeral", "ttl": "5m"},
}
],
messages=[
{
"role": "user",
"content": f"Review this diff:\n\n```diff\n{diff}\n```",
}
],
)
return response
def review_without_advisor(diff: str) -> dict:
"""Run a review with Sonnet only (for comparison)."""
response = client.messages.create(
model="claude-sonnet-4-6",
max_tokens=8192,
system=REVIEW_SYSTEM,
messages=[
{
"role": "user",
"content": f"Review this diff:\n\n```diff\n{diff}\n```",
}
],
)
return response
Step 2: Add Cost Tracking
"""Extract token costs from response usage."""
usage = response.usage
costs = {
"executor_input": usage.input_tokens,
"executor_output": usage.output_tokens,
"executor_cache_read": getattr(usage, "cache_read_input_tokens", 0),
}
if has_advisor and hasattr(usage, "iterations"):
for iteration in usage.iterations:
if iteration.get("type") == "advisor_message":
costs["advisor_input"] = iteration["input_tokens"]
costs["advisor_output"] = iteration["output_tokens"]
costs["advisor_cache_read"] = iteration.get(
"cache_read_input_tokens", 0
)
return costs
def print_cost_comparison(with_advisor, without_advisor):
"""Compare costs between advisor and non-advisor runs."""
# Pricing per 1M tokens (as of April 2026)
SONNET_INPUT = 3.0
SONNET_OUTPUT = 15.0
OPUS_INPUT = 15.0
OPUS_OUTPUT = 75.0
def calc_cost(costs, has_advisor=False):
total = (
costs["executor_input"] * SONNET_INPUT / 1_000_000
+ costs["executor_output"] * SONNET_OUTPUT / 1_000_000
)
if has_advisor and "advisor_input" in costs:
total += (
costs["advisor_input"] * OPUS_INPUT / 1_000_000
+ costs["advisor_output"] * OPUS_OUTPUT / 1_000_000
)
return total
cost_with = calc_cost(with_advisor, has_advisor=True)
cost_without = calc_cost(without_advisor)
print(f"\n{'='*50}")
print(f"WITH ADVISOR:")
print(f" Executor: {with_advisor['executor_input']}in / "
f"{with_advisor['executor_output']}out")
if "advisor_input" in with_advisor:
print(f" Advisor: {with_advisor['advisor_input']}in / "
f"{with_advisor['advisor_output']}out")
print(f" Total cost: ${cost_with:.4f}")
print(f"\nWITHOUT ADVISOR:")
print(f" Executor: {without_advisor['executor_input']}in / "
f"{without_advisor['executor_output']}out")
print(f" Total cost: ${cost_without:.4f}")
print(f"\nDifference: ${cost_with - cost_without:+.4f} "
f"({(cost_with/cost_without - 1)*100:+.1f}%)")
Step 3: Run the Comparison
base = sys.argv[1] if len(sys.argv) > 1 else "main"
diff = get_diff(base)
if diff == "No changes found.":
print("No diff found. Provide a base branch: "
"python review_agent.py <base-branch>")
sys.exit(1)
print(f"Reviewing {len(diff)} chars of diff...")
print(f"\n--- WITH ADVISOR ---")
resp_with = review_with_advisor(diff)
costs_with = extract_costs(resp_with, has_advisor=True)
# Print the review
for block in resp_with.content:
if hasattr(block, "text"):
print(block.text)
print(f"\n--- WITHOUT ADVISOR ---")
resp_without = review_without_advisor(diff)
costs_without = extract_costs(resp_without)
for block in resp_without.content:
if hasattr(block, "text"):
print(block.text)
print_cost_comparison(costs_with, costs_without)
Step 4: Run It
python review_agent.py main
# Review against a specific branch
python review_agent.py feature/auth-refactor
What to Look For
- Advisor timing: Does the executor call the advisor before writing comments? Check the response `content` array for `server_tool_use` blocks — they should appear early.
- Review quality: Compare the advisor-enhanced review against the solo review. The advisor version should flag more structural issues and fewer nitpicks.
- Cost delta: On a medium-sized diff (~500 lines), expect the advisor version to cost 10-20% more but catch 2-3 issues the solo version misses.
- Cache behavior: Run the same diff twice. On the second run, `advisor_cache_read` should be nonzero if caching is working.
Extension: Multi-Turn Review
Turn this into a multi-turn conversation where the author responds to comments:
messages = [
{"role": "user", "content": f"Review this diff:\n\n```diff\n{diff}\n```"}
]
tools = [{
"type": "advisor_20260301",
"name": "advisor",
"model": "claude-opus-4-6",
"caching": {"type": "ephemeral", "ttl": "5m"},
}]
for response_text in author_responses:
# Get review
response = client.beta.messages.create(
model="claude-sonnet-4-6",
max_tokens=8192,
betas=["advisor-tool-2026-03-01"],
system=REVIEW_SYSTEM,
tools=tools,
messages=messages,
)
# Add assistant response to history
messages.append({"role": "assistant", "content": response.content})
# Add author's response
messages.append({"role": "user", "content": response_text})
# Final review pass (return this if you wrap the sketch above in a function)
final_review = client.beta.messages.create(
model="claude-sonnet-4-6",
max_tokens=8192,
betas=["advisor-tool-2026-03-01"],
system=REVIEW_SYSTEM,
tools=tools,
messages=messages,
)
Verification
- [ ] Advisor calls appear in response content as `server_tool_use` blocks
- [ ] Review identifies at least one issue the solo version missed
- [ ] `usage.iterations` array contains `advisor_message` entries
- [ ] Second run shows
cache_read_input_tokens > 0for advisor iterations
Why this lab matters: The advisor pattern isn't theoretical — it's a shipping API primitive. Building a real pipeline with it teaches you the cost model, caching behavior, and streaming quirks that documentation alone can't convey. Once you've built one advisor-powered agent, the pattern transfers to any agentic workload: coding, research, data analysis, deployment automation.