Autonomy is not a single switch. It starts with removing friction and ends with the system knowing whether the last iteration actually improved the work.
The first bottleneck is obvious. If Claude has to ask for approval every time it edits a file, runs a test, or touches the shell, it stops being something you can leave alone for more than a few minutes.
Claude Code exposes several permission modes: default, acceptEdits, dontAsk, plan, and bypassPermissions. The practical move is full bypass, but only when the surrounding environment is disposable enough that mistakes are containable.
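Where that decision lives is configuration. A sketch of a project `settings.json`, assuming the `permissions.defaultMode` key and `Bash(...)` rule syntax from Anthropic's docs (the deny pattern is illustrative; verify against your version):

```json
{
  "permissions": {
    "defaultMode": "acceptEdits",
    "deny": ["Bash(rm:*)"]
  }
}
```

Full bypass is the `--dangerously-skip-permissions` flag, and the name is honest: reserve it for a container or VM you can throw away.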
"The shortcut is the flag. The actual upgrade is the sandbox."
Claude Code can use 1M-token variants of Opus 4.6 and Sonnet 4.6. That does not remove the need for context management. Once a session carries old logs, abandoned plans, and stale assumptions, autonomy degrades quietly.
People who get more out of Claude Code treat context like a budget: /clear between tasks, /compact before the conversation drifts, and durable state stored in files rather than in the transcript.
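"Durable state in files" can be as simple as a handoff note the agent rewrites before each /clear. The filename and headings here are an illustrative convention, not a Claude Code feature:

```markdown
<!-- PROGRESS.md: rewritten by the agent before each /clear -->
## Done
- Migrated auth middleware to the new session API
## In progress
- Flaky test in test_sessions.py (suspect timezone handling)
## Next
- Remove the legacy cookie path once tests are green
```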
"A larger context window is still a working set, not durable memory."
Each subagent runs in its own context window with its own prompt, tool restrictions, model choice, and permission mode. This keeps high-volume work local; only a summary returns to the main thread.
Instead of one overstuffed conversation, you break the work into roles. One agent explores. Another runs tests. Another reviews. Another implements. The main thread becomes an orchestrator.
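A sketch of one such role as a project subagent, assuming the `.claude/agents/` markdown-with-frontmatter format from Anthropic's docs (name, tools, and model values here are illustrative choices, not requirements):

```markdown
---
name: test-runner
description: Runs the test suite and reports failures. Use after any code change.
tools: Bash, Read, Grep
model: haiku
---
You run tests and diagnose failures. Run the suite, summarize pass/fail
counts and the first failing traceback, and suggest a likely cause.
Do not edit files; report back to the main thread.
```

Restricting tools and picking a cheaper model per role is the point: the test-runner never needs edit access, and its log spam dies with its context window.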
"They stop local noise from contaminating the main thread."
Anthropic's hooks system intercepts exit attempts and feeds the task back in. The Ralph Wiggum plugin builds on this: it keeps Claude working until a completion phrase appears or an iteration cap is hit. Files, git history, and logs serve as persistent memory between passes.
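The loop itself is simple persistence machinery. A minimal Python sketch of the shape, with the headless `claude -p` invocation stubbed out as a placeholder function so it runs without the CLI; the completion phrase and iteration cap are assumptions you would choose per task:

```python
MAX_ITERATIONS = 50
COMPLETION_PHRASE = "RALPH_DONE"  # assumption: the prompt instructs the agent to emit this

def run_agent(prompt: str) -> str:
    """Stub for one headless pass (real use: `claude -p <prompt>` via subprocess)."""
    return f"All tasks finished. {COMPLETION_PHRASE}"

def ralph_loop(prompt: str) -> tuple[bool, int]:
    """Re-feed the task until the completion phrase appears or the cap is hit.

    Memory between passes lives outside this loop: in the repo's files,
    git history, and logs, not in the conversation.
    """
    for i in range(1, MAX_ITERATIONS + 1):
        output = run_agent(prompt)
        if COMPLETION_PHRASE in output:
            return True, i        # agent declared completion
    return False, MAX_ITERATIONS  # cap hit; task unfinished

done, iterations = ralph_loop("Fix every failing test, then print RALPH_DONE.")
```

Note what the loop checks: the presence of a phrase, not the quality of the work. That is exactly the gap the next section addresses.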
The RepoMirror team ran Claude Code headlessly in loops overnight and came back to 1,000+ commits across several projects. The technique works — when the success criteria are clear.
"Ralph-style loops are persistence machinery. They are not verification machinery."
Karpathy's autoresearch is a structured evaluation loop. The agent edits train.py (630 lines), runs a fixed 5-minute experiment, checks val_bpb (validation bits per byte; lower is better), and decides: keep or revert. ~12 experiments/hour, ~100 overnight.
Replace val_bpb with whatever objective matters in your domain: test pass rate, benchmark latency, bundle size, accessibility score. The important thing is that the metric exists before the loop starts.
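The keep-or-revert decision fits on one screen. A domain-agnostic Python sketch, with a toy objective standing in for val_bpb; `propose` and `evaluate` are illustrative names, and in the real loop `evaluate` would run the fixed experiment:

```python
import random

def keep_or_revert_loop(state, evaluate, propose, n_iters=100):
    """Greedy hill-climb: keep a change only if the metric improves.

    `evaluate` is the objective chosen before the loop starts (val_bpb,
    bundle size, latency...). Lower is better here, matching val_bpb.
    """
    best_score = evaluate(state)
    for _ in range(n_iters):
        candidate = propose(state)   # the agent's edit
        score = evaluate(candidate)  # the fixed experiment
        if score < best_score:
            state, best_score = candidate, score  # keep
        # else: revert, by discarding the candidate entirely
    return state, best_score

# Toy demo: minimize x^2 by random nudges.
random.seed(0)
final, score = keep_or_revert_loop(
    state=10.0,
    evaluate=lambda x: x * x,
    propose=lambda x: x + random.uniform(-1, 1),
)
```

Because a candidate is kept only when the metric drops, the score is monotonically non-increasing: the loop can stall, but it cannot quietly make the work worse.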
"Did that last iteration make things better?"