Autonomy is not a single switch. It starts with removing friction and ends with the system knowing whether the last iteration actually improved the work.
The first bottleneck is obvious. If Claude has to ask for approval every time it edits a file, runs a test, or touches the shell, it stops being something you can leave alone for more than a few minutes.
Claude Code exposes several permission modes: default, acceptEdits, dontAsk, plan, and bypassPermissions. The practical move is full bypass, but only when the surrounding environment is disposable enough that mistakes are containable.
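Where that decision lives is configuration. A sketch of a project `settings.json`, assuming the `permissions.defaultMode` key and `Bash(...)` rule syntax from Anthropic's docs (the deny pattern is illustrative; verify against your version):

```json
{
  "permissions": {
    "defaultMode": "acceptEdits",
    "deny": ["Bash(rm:*)"]
  }
}
```

Full bypass is the `--dangerously-skip-permissions` flag, and the name is honest: reserve it for a container or VM you can throw away.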
"The shortcut is the flag. The actual upgrade is the sandbox."
Claude Code can use 1M-token variants of Opus 4.6 and Sonnet 4.6. That does not remove the need for context management. Once a session carries old logs, abandoned plans, and stale assumptions, autonomy degrades quietly.
People who get more out of Claude Code treat context like a budget: /clear between tasks, /compact before the conversation drifts, and durable state stored in files rather than in the transcript.
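"Durable state in files" can be as simple as a handoff note the agent rewrites before each /clear. The filename and headings here are an illustrative convention, not a Claude Code feature:

```markdown
<!-- PROGRESS.md: rewritten by the agent before each /clear -->
## Done
- Migrated auth middleware to the new session API
## In progress
- Flaky test in test_sessions.py (suspect timezone handling)
## Next
- Remove the legacy cookie path once tests are green
```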
"A larger context window is still a working set, not durable memory."
Each subagent runs in its own context window with its own prompt, tool restrictions, model choice, and permission mode. This keeps high-volume work local; only a summary returns to the main thread.
Instead of one overstuffed conversation, you break the work into roles. One agent explores. Another runs tests. Another reviews. Another implements. The main thread becomes an orchestrator.
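A sketch of one such role as a project subagent, assuming the `.claude/agents/` markdown-with-frontmatter format from Anthropic's docs (name, tools, and model values here are illustrative choices, not requirements):

```markdown
---
name: test-runner
description: Runs the test suite and reports failures. Use after any code change.
tools: Bash, Read, Grep
model: haiku
---
You run tests and diagnose failures. Run the suite, summarize pass/fail
counts and the first failing traceback, and suggest a likely cause.
Do not edit files; report back to the main thread.
```

Restricting tools and picking a cheaper model per role is the point: the test-runner never needs edit access, and its log spam dies with its context window.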
"They stop local noise from contaminating the main thread."
Anthropic's hooks system intercepts exit attempts and feeds the task back in. The Ralph Wiggum plugin builds on this: it keeps Claude working until a completion phrase appears or an iteration cap is hit. Files, git history, and logs serve as persistent memory between passes.
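The loop itself is simple persistence machinery. A minimal Python sketch of the shape, with the headless `claude -p` invocation stubbed out as a placeholder function so it runs without the CLI; the completion phrase and iteration cap are assumptions you would choose per task:

```python
MAX_ITERATIONS = 50
COMPLETION_PHRASE = "RALPH_DONE"  # assumption: the prompt instructs the agent to emit this

def run_agent(prompt: str) -> str:
    """Stub for one headless pass (real use: `claude -p <prompt>` via subprocess)."""
    return f"All tasks finished. {COMPLETION_PHRASE}"

def ralph_loop(prompt: str) -> tuple[bool, int]:
    """Re-feed the task until the completion phrase appears or the cap is hit.

    Memory between passes lives outside this loop: in the repo's files,
    git history, and logs, not in the conversation.
    """
    for i in range(1, MAX_ITERATIONS + 1):
        output = run_agent(prompt)
        if COMPLETION_PHRASE in output:
            return True, i        # agent declared completion
    return False, MAX_ITERATIONS  # cap hit; task unfinished

done, iterations = ralph_loop("Fix every failing test, then print RALPH_DONE.")
```

Note what the loop checks: the presence of a phrase, not the quality of the work. That is exactly the gap the next section addresses.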
The RepoMirror team ran Claude Code headlessly in loops overnight and came back to 1,000+ commits across several projects. The technique works — when the success criteria are clear.
"Ralph-style loops are persistence machinery. They are not verification machinery."
Karpathy's autoresearch is a structured evaluation loop. The agent edits train.py (630 lines), runs a fixed 5-minute experiment, checks val_bpb (validation bits per byte; lower is better), and decides: keep or revert. ~12 experiments/hour, ~100 overnight.
Replace val_bpb with whatever objective matters in your domain: test pass rate, benchmark latency, bundle size, accessibility score. The important thing is that the metric exists before the loop starts.
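The keep-or-revert decision fits on one screen. A domain-agnostic Python sketch, with a toy objective standing in for val_bpb; `propose` and `evaluate` are illustrative names, and in the real loop `evaluate` would run the fixed experiment:

```python
import random

def keep_or_revert_loop(state, evaluate, propose, n_iters=100):
    """Greedy hill-climb: keep a change only if the metric improves.

    `evaluate` is the objective chosen before the loop starts (val_bpb,
    bundle size, latency...). Lower is better here, matching val_bpb.
    """
    best_score = evaluate(state)
    for _ in range(n_iters):
        candidate = propose(state)   # the agent's edit
        score = evaluate(candidate)  # the fixed experiment
        if score < best_score:
            state, best_score = candidate, score  # keep
        # else: revert, by discarding the candidate entirely
    return state, best_score

# Toy demo: minimize x^2 by random nudges.
random.seed(0)
final, score = keep_or_revert_loop(
    state=10.0,
    evaluate=lambda x: x * x,
    propose=lambda x: x + random.uniform(-1, 1),
)
```

Because a candidate is kept only when the metric drops, the score is monotonically non-increasing: the loop can stall, but it cannot quietly make the work worse.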
"Did that last iteration make things better?"