Interactive Experience

Five Levels of Running Claude Code More Autonomously

Autonomy is not a single switch. It starts with removing friction and ends with the system knowing whether the last iteration actually improved the work.

Level 01

Remove Permission Friction

The first bottleneck is obvious. If Claude has to ask for approval every time it edits a file, runs a test, or touches the shell, it stops being something you can leave alone for more than a few minutes.

Anthropic exposes several permission modes: default, acceptEdits, dontAsk, plan, and bypassPermissions. The practical move is full bypass — but only when the surrounding environment is disposable enough that mistakes are containable.

"The shortcut is the flag. The actual upgrade is the sandbox."

claude-code — permission demo
--dangerously-skip-permissions
--permission-mode bypassPermissions
--allow-dangerously-skip-permissions
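A minimal sketch of that tradeoff in Python: a hypothetical wrapper that only reaches for the skip flag when the environment is disposable. The flag names come from the Claude Code CLI itself; `build_cmd` and the `sandboxed` check are illustrative, not part of the tool.

```python
# Hypothetical helper: choose permission flags for a headless `claude -p` run.
# Flag names are real Claude Code CLI flags; the sandbox check is illustrative.

def build_cmd(prompt: str, sandboxed: bool) -> list[str]:
    """Build a claude invocation, escalating permissions only in a sandbox."""
    cmd = ["claude", "-p", prompt]
    if sandboxed:
        # Full bypass: acceptable only when mistakes are containable.
        cmd.append("--dangerously-skip-permissions")
    else:
        # Outside a disposable environment, stop short of full bypass.
        cmd += ["--permission-mode", "acceptEdits"]
    return cmd

print(build_cmd("run the test suite", sandboxed=True))
print(build_cmd("run the test suite", sandboxed=False))
```

The point of the conditional is the quote above it: the flag is trivial, and the only interesting decision is whether the surrounding environment earns it.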
Level 02

Treat Context as an Operating Budget

Claude Code can use 1M-token context variants of Opus 4.6 and Sonnet 4.6. That does not remove the need for context management. Once a session carries old logs, abandoned plans, and stale assumptions, autonomy degrades quietly.

People who get more out of Claude Code treat context like a budget: /clear between tasks, /compact before drift sets in, and durable state stored in files rather than in the transcript.

"A larger context window is still a working set, not durable memory."
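The budget framing can be sketched in a few lines of Python. Everything here is illustrative: the `ContextBudget` class, the 50% compaction threshold, and the assumption that a compaction keeps roughly 20% of the transcript are stand-ins, not Claude Code internals.

```python
# Sketch of treating context as a budget: compact when usage drifts past a
# threshold, clear between tasks. All numbers here are illustrative.

class ContextBudget:
    def __init__(self, limit_tokens: int = 1_000_000, compact_at: float = 0.5):
        self.limit = limit_tokens
        self.compact_at = compact_at
        self.used = 0
        self.compactions = 0

    def add(self, tokens: int) -> str:
        self.used += tokens
        if self.used / self.limit >= self.compact_at:
            # /compact: summarize history; durable state belongs in files.
            self.used = int(self.used * 0.2)  # assume compaction keeps ~20%
            self.compactions += 1
            return "compact"
        return "ok"

    def clear(self) -> None:
        # /clear between tasks: the next task starts with an empty working set.
        self.used = 0

budget = ContextBudget()
budget.add(400_000)           # 40% of budget -> "ok"
action = budget.add(200_000)  # crosses 50% -> "compact"
print(action, budget.used, budget.compactions)
```

The mechanism matters less than the habit: usage is tracked per task, and the decision to compact happens before drift, not after it.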

context budget — visualizer: tokens used, tasks run, compactions, decision quality
Level 03

Move Work into Separate Contexts

Each subagent runs in its own context window with its own prompt, tool restrictions, model choice, and permission mode. It keeps high-volume work local and returns only a summary to the main thread.

Instead of one overstuffed conversation, you break the work into roles. One agent explores. Another runs tests. Another reviews. Another implements. The main thread becomes an orchestrator.

"They stop local noise from contaminating the main thread."
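The isolation property is easy to show in miniature. In this sketch, `run_subagent` is a stub standing in for a real Claude Code subagent call: each role generates plenty of local noise, but only a one-line summary crosses back into the orchestrator's context.

```python
# Sketch of context isolation: each subagent keeps its own transcript and
# returns only a summary. run_subagent is a stub, not the real subagent API.

def run_subagent(role: str, task: str) -> str:
    # High-volume output stays inside the subagent's own context...
    transcript = [f"{role} step {i}: ..." for i in range(100)]  # local noise
    # ...and only a short summary returns to the main thread.
    return f"{role}: finished '{task}' in {len(transcript)} steps"

main_thread = []  # the orchestrator's context stays small
for role, task in [("explorer", "map the codebase"),
                   ("tester", "run the suite"),
                   ("reviewer", "check the diff")]:
    main_thread.append(run_subagent(role, task))

print(main_thread)  # three one-line summaries; the noise never left home
```

Three hundred lines of exploration, test output, and review notes happened somewhere, but the main thread only ever saw three sentences.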

subagent orchestrator — main thread plus 🔍 Explorer, 🧪 Tester, 📋 Reviewer, and ⚙️ Implementer subagents
Level 04

Turn Stop into Another Iteration

Anthropic's hooks system intercepts exit attempts and feeds the task back in. The Ralph Wiggum plugin keeps Claude working until a completion phrase appears or an iteration cap is hit. Files, git history, and logs become persistent memory between passes.

The RepoMirror team ran Claude Code headlessly in loops overnight and came back to 1,000+ commits across several projects. The technique works — when the success criteria are clear.

"Ralph-style loops are persistence machinery. They are not verification machinery."
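The loop itself is almost nothing. This sketch captures the shape: re-feed the task on every exit attempt until a completion phrase appears or an iteration cap is hit. The `run_iteration` stub stands in for a real headless Claude Code pass; the phrase and cap are illustrative.

```python
# Sketch of a Ralph-style stop hook: every exit attempt re-feeds the task
# until a completion phrase appears or an iteration cap trips.

COMPLETION_PHRASE = "ALL TESTS PASSING"  # illustrative success criterion
MAX_ITERATIONS = 10                      # cap so the loop cannot run forever

def run_iteration(i: int) -> str:
    # Stub: a real loop would shell out to a headless claude run and
    # read its output; files and git history persist between passes.
    return COMPLETION_PHRASE if i == 3 else "still failing"

iterations = 0
output = ""
while iterations < MAX_ITERATIONS:
    iterations += 1
    output = run_iteration(iterations)
    if COMPLETION_PHRASE in output:
        break  # the stop hook lets this exit through

print(iterations)  # stops as soon as the criterion is met
```

Note what the loop checks: the presence of a phrase, not the truth of it. That gap between persistence and verification is exactly what Level 05 closes.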

ralph loop — iteration hook: 🔨 work → 🛑 stop → 🪝 hook → check (completion criteria: tests passing)
Level 05

Stop Rewarding Activity, Start Rewarding Measured Improvement

Karpathy's autoresearch is a structured evaluation loop. The agent edits train.py (630 lines), runs a fixed 5-minute experiment, checks val_bpb, and decides: keep or revert. ~12 experiments/hour, ~100 overnight.

Replace val_bpb with whatever objective matters in your domain: test pass rate, benchmark latency, bundle size, accessibility score. The important thing is that the metric exists before the loop starts.

"Did that last iteration make things better?"
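The keep-or-revert decision is the whole trick, and it fits in a short sketch. Here `propose_edit` is a stub experiment producing a candidate metric; a real loop would edit train.py, run the fixed experiment, and read val_bpb from the result.

```python
import random

# Sketch of an autoresearch-style loop: propose an edit, measure the
# objective, keep only measured improvements. propose_edit is a stub.

random.seed(0)

def propose_edit(current: float) -> float:
    # Stub experiment: a candidate metric, sometimes better, sometimes worse.
    return current + random.uniform(-0.02, 0.02)

best = 1.000          # lower val_bpb is better
kept = reverted = 0
for _ in range(20):
    candidate = propose_edit(best)
    if candidate < best:
        best = candidate   # keep: the edit measurably improved the objective
        kept += 1
    else:
        reverted += 1      # revert: activity without improvement is discarded

print(round(best, 4), kept, reverted)
```

Swap the stub for any pre-declared metric, test pass rate, latency, bundle size, and the loop stops rewarding motion and starts rewarding progress.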

autoresearch — evaluation loop: experiments kept or reverted; val_bpb plotted over experiments (📄 train.py — 630 lines)