According to monitoring by 1M AI News, Ant Group engineer and Umi.js front-end framework author Chen Cheng reverse-engineered the source code of Claude Code 2.1.81, fully restoring what happens after pressing auto mode. The key discovery: each tool invocation passes through a four-layer decision pipeline. Only when the first three layers cannot determine safety does it call an independent AI classifier for security review.
The four-layer pipeline is as follows: the first layer checks existing permission rules; if matched, it allows directly; the second layer simulates acceptEdits mode (i.e., permission to edit files). If it passes in this mode, the risk is considered low, and it skips the classifier; the third layer checks a whitelist of read-only tools (Read, Grep, Glob, LSP, WebSearch, etc.), which do not modify any state and are unconditionally allowed; only if all these conditions are unmet does it proceed to the fourth layer, sending an API request to Claude Sonnet for security classification.
Key design details of the classifier include: always using Sonnet instead of Opus to balance cost and latency; setting temperature to 0 to ensure deterministic output; the classifier is defined as a “security monitor for autonomous AI programming agents,” protecting against three types of risks (prompt injection, scope creep, unintended harm); the user’s CLAUDE.md configuration file is injected into the classifier context as a basis for judging user intent.
The interception rules cover over 22 categories, including force push, direct push to main branch, downloading and executing external code, production deployment, data leaks, self-modification permissions, creation of remote code execution surfaces, credential leaks, etc. Exceptions for allowing include 7 types: hardcoded test keys, local file operations within the working directory, read-only GET requests, installing declared dependencies, official toolchain installation, reading configuration credentials sent to target providers, and pushing to the current working branch.
There is also a circuit breaker mechanism: after 3 consecutive rejections or a total of 20 rejections, the system downgrades to manual confirmation; in headless mode, it directly aborts the agent. When the classifier is unavailable, a feature flag controls whether it “fail-closed” (immediately reject) or “fail-open” (downgrade to manual confirmation).
In auto mode, prompt injection during behavior is also finely controlled: injected once every 5 dialogue rounds; within each 5-injection cycle, the first injection is a full version (about 800 words, including six instructions like “execute immediately,” “reduce interruptions,” “action over plan”), and the remaining four are concise versions, balancing context window usage and behavioral stability.