LOOPHOLE

Adversarial Moral-Legal Code System

Created by Brendan Hogan · AI/ML Research @ Morgan Stanley · PhD, Cornell University
LOOP CYCLE: Legislator (temp 0.4) · 🔴 Loophole (temp 0.9) · 🔵 Judge (temp 0.3) · 🟡 Overreach (temp 0.9)


Legal System Evolution, Compressed

Real legal systems take decades to develop. Loophole compresses that adversarial evolution into minutes — forcing your moral principles to confront edge cases, loopholes, and overreach.

Real Legal Evolution
1789
Constitution Drafted
Broad principles written by framers
1803
Marbury v. Madison
First judicial review — gaps exploited
1954
Brown v. Board
Overreach discovered, precedent reversed
1973
Roe v. Wade
Loophole found in privacy doctrine
Today
Still Evolving
235+ years of adversarial refinement
TIME SPAN
235+ Years
Loophole Compression
T+0:00
Input Principles
You provide moral principles in plain text
T+0:30
Legislator Drafts
Formal legal code created automatically
T+1:00
Attacks Begin
Loophole + Overreach agents attack the code
T+2:00
Human Escalation
Genuine dilemmas escalated for your decision
T+5:00
Robust Code
10 rounds of adversarial refinement complete
TIME SPAN
~5 Minutes
STEP 01

Write Principles

You articulate your moral beliefs in plain language — "privacy is a fundamental right", "no surveillance without consent"

STEP 02

Draft Code

The Legislator agent translates your principles into structured legal code with articles, definitions, and exceptions

STEP 03

Attack

Two adversarial agents relentlessly probe for loopholes (legal-but-wrong) and overreach (illegal-but-acceptable)

STEP 04

Patch & Repeat

Resolvable cases get patched; unresolvable ones are escalated to you. Each resolved case becomes binding precedent

Article 3.1 (Data Collection Consent): No entity shall collect personal data without explicit consent.
Article 3.2 (Data Sale Prohibition): Personal data may not be sold to third parties under any circumstances.
Article 3.3 (Aggregated Data Exception): Anonymized aggregate statistics are exempt from collection restrictions.
Article 3.4 (Research Exception): Academic research institutions may collect data with IRB approval.
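The four steps above can be sketched as a single loop. This is a toy, rule-based sketch: `draft`, `attack`, and `judge` are hypothetical stand-ins for the repo's LLM-backed agents, and the function names are illustrative, not the project's actual API.

```python
from dataclasses import dataclass

@dataclass
class Verdict:
    resolvable: bool
    revision: str = ""

# Toy, rule-based stand-ins for the agents; the real system prompts an LLM
# for each role.
def draft(principles):
    return [f"Article {i + 1}: {p}" for i, p in enumerate(principles)]

def attack(code):
    # one loophole and one overreach attack per round in this toy version
    return ["loophole: technically permitted but wrong",
            "overreach: prohibited but morally acceptable"]

def judge(case, code, precedents):
    # toy rule: overreach cases are genuine dilemmas and must be escalated
    return Verdict(resolvable=not case.startswith("overreach"),
                   revision=f"patch for: {case}")

def run_cycle(principles, rounds=2):
    code = draft(principles)              # Step 2: Legislator drafts code
    precedents, escalated = [], []
    for _ in range(rounds):
        for case in attack(code):         # Step 3: adversarial attacks
            verdict = judge(case, code, precedents)   # Step 4: evaluate
            if verdict.resolvable:
                code.append(verdict.revision)   # patch the code
                precedents.append(case)         # becomes binding precedent
            else:
                escalated.append(case)          # human decides
    return code, precedents, escalated

code, precedents, escalated = run_cycle(["privacy is a fundamental right"])
```

The key structural point the sketch captures: every resolved case is appended to a precedent list that future judgments must respect, while unresolvable cases exit the loop toward the human.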

Four Minds in Adversarial Loop

Each agent has a precisely tuned temperature, a distinct adversarial role, and a carefully engineered system prompt.

⚖️
Legislator
temp: 0.4
🔴
Loophole Finder
temp: 0.9
🟡
Overreach Finder
temp: 0.9
Judge
temp: 0.3
⚖️

The Legislator

DRAFTER · CONSERVATIVE
Temperature: 0.4 — Conservative. Low temperature = consistent, precise, formal output. The Legislator should not be creative — it should be exacting. Legal language demands precision over novelty.

The Legislator is the foundation. It takes human moral principles expressed in plain language and translates them into a formal, structured legal code. It numbers articles, defines terms, specifies prohibitions and permissions, and handles revisions while maintaining consistency with all prior resolved cases. Think of it as a constitutional drafter — methodical, thorough, unambiguous.

Precise · Formal · Consistent · Conservative · Structured
🔴

The Loophole Finder

RED TEAM · AMORAL RULE-LAWYER
Temperature: 0.9 — Highly Creative. Maximum creativity to surface unexpected, lateral attack vectors. Low-temperature models find predictable gaps — you need creative chaos to find the real loopholes.

The Loophole Finder is an amoral rule-lawyer. It reads the legal code like a contract attorney hunting for technical exploitation — not what the law means, but what it literally says. It finds scenarios that are technically permitted by the code but violate the spirit of the user's moral principles. It has no ethical constraints of its own; it exists purely to break things.

Amoral · Lateral Thinker · Literal Reader · Creative · Adversarial
🟡

The Overreach Finder

RED TEAM · GOOD SAMARITAN
Temperature: 0.9 — Highly Creative. Finding overreach requires imagining sympathetic scenarios where rigid rules produce morally repugnant outcomes. Creativity reveals situations the drafter never anticipated.

The Overreach Finder is the inverse adversary — it finds cases where the legal code is too strict. It looks for scenarios the code prohibits but that the user would consider morally acceptable, praiseworthy, or even obligatory. Good Samaritan situations. Emergency exceptions. Professional duties. Situations where following the letter of the code leads to catastrophic outcomes.

Good Samaritan · Emergency Thinker · Sympathetic · Empathetic · Creative

The Judge

ARBITER · ULTRA-CONSERVATIVE
Temperature: 0.3 — Ultra-Conservative. The Judge must be cautious. Incorrectly declaring a case RESOLVABLE when it isn't creates contradictory precedent that corrupts all future reasoning. When in doubt, escalate.

The Judge is the gatekeeper of coherence. Given a case (loophole or overreach), it determines: can this be fixed with a minimal code revision that doesn't contradict any prior resolved cases? If yes → propose the revision. If no → escalate to the human. It also runs a validation step: after each Legislator revision, the Judge re-checks every prior resolved case to ensure no regressions. This creates a growing test suite that constrains future revisions.

Conservative · Precedent-Aware · Systematic · Gatekeeper · Validator
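The Judge's validation step can be sketched as a regression gate over the resolved-case test suite. This is a minimal illustration: `still_holds` is a hypothetical, rule-based stand-in for the LLM re-check, and the dict field names are assumptions, not the repo's actual data model.

```python
# Sketch of the Judge's regression gate: after each Legislator revision,
# every prior resolved case must still hold, or the revision is rejected.
# `still_holds` is a toy stand-in for the LLM-based re-check.

def still_holds(case, code_text):
    # toy check: the clause the resolution depends on must still be present
    return case["required_clause"] in code_text

def validate_revision(new_code, resolved_cases):
    regressions = [c["name"] for c in resolved_cases
                   if not still_holds(c, new_code)]
    return (len(regressions) == 0), regressions

resolved = [
    {"name": "statistical-profile", "required_clause": "re-identified"},
    {"name": "coercive-consent",    "required_clause": "Coercive Consent Design"},
]
v2 = ("Personal Data includes data that can be re-identified. "
      "Consent via Coercive Consent Design is void.")
ok, regressions = validate_revision(v2, resolved)
```

A revision that silently dropped either clause would fail the gate and never be promoted, which is exactly how the growing case log constrains future drafts.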

One Cycle, Step by Step

Walk through a complete adversarial cycle using real privacy principles. Every step reveals the system's inner workings.

STEP 1 OF 6

Input: Privacy Moral Principles

These are the raw moral principles you'd provide to Loophole. They're written in natural language — no formal structure needed. The system will handle formalization.

🔒Privacy is a fundamental human right, not a commodity to be sold or exchanged. Every person has inherent authority over their personal information.
✍️No data collection or sale without explicit, informed, freely-given consent. Pre-checked boxes and buried terms do not constitute consent.
⚖️Government surveillance requires warrants supported by probable cause. Mass surveillance programs violate the presumption of innocence.
📷No facial recognition tracking in public spaces. Anonymity in public is a civil liberty, not a privilege.
📰Strong protections for journalists and whistleblowers. The press must be able to communicate with sources without surveillance.
🏥Medical, financial, and communications data receive extra protection as especially sensitive categories.
👶Children receive stronger protections. No behavioral tracking or profiling of minors online.
🚨Privacy is not absolute: credible, imminent safety threats can override with minimal intrusion, oversight, and sunset clauses.

📁 Source: These mirror the privacy_principles.txt from the Loophole repository's example domain. The system accepts any moral domain — you could use parenting ethics, workplace fairness, environmental policy, etc.

STEP 2 OF 6

⚖️ Legislator Drafts Legal Code

The Legislator (temp: 0.4) takes the 8 principles and drafts a formal legal code. Watch it being written below:

Legislator — drafting v1... (temp: 0.4)

💡 Why temp 0.4? Legal drafting requires consistency and precision, not creativity. Lower temperature = more deterministic, formal language. The same principles should produce essentially the same code every run.
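The four temperatures stated in this walkthrough can be gathered into a config-style mapping. The key names and the `call_agent` helper below are hypothetical; the repo's actual config.yaml layout and LLM wrapper may differ.

```python
# The four temperatures from the walkthrough, as a config-style mapping.
# Key names are hypothetical illustrations, not the repo's actual schema.
AGENT_TEMPS = {
    "legislator": 0.4,        # conservative: precise, repeatable drafting
    "loophole_finder": 0.9,   # creative: lateral attack vectors
    "overreach_finder": 0.9,  # creative: sympathetic edge cases
    "judge": 0.3,             # ultra-conservative: cautious verdicts
}

def call_agent(role, prompt, llm):
    # in this sketch, temperature is the only sampling knob that differs
    # per role; the role's system prompt would be added here as well
    return llm(prompt, temperature=AGENT_TEMPS[role])
```

The design choice this encodes: creativity is reserved for the attackers, while the roles that must stay consistent with precedent (Legislator, Judge) sample near-deterministically.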

STEP 3 OF 6

🔴 Loophole Finder Attacks

The Loophole Finder (temp: 0.9) reads the full legal code and generates 3 attacks per round. Here are the attacks from Round 1:

LOOPHOLE #1 Round 1 — Technical Consent Bypass

The Dark Pattern Compliance Trap

Article 2.1 requires "explicit, informed consent." A company redesigns its consent flow as a 47-screen onboarding wizard where "agree" buttons are large and colorful but "decline" requires navigating through 12 sub-menus. Users do technically consent. The code requires explicit consent but says nothing about consent obtained through deliberately confusing UX. Result: total surveillance achieved "legally."

LOOPHOLE #2 Round 1 — Definitional Exploit

The "Statistical Profile" Loophole

Article 1.1 defines "Personal Data" as information that "identifies or can be linked to a natural person." A data broker creates probabilistic profiles with 99.7% accuracy that technically never store a name or ID — only statistical vectors. They argue these "statistical profiles" are not Personal Data because they are never definitionally linked to an identity, only probabilistically correlated. The definition needs a "re-identification risk" clause.

LOOPHOLE #3 Round 1 — Jurisdiction Hop

The Foreign Entity Shell

Article 3.1 restricts "entities" from conducting surveillance. A domestic corporation transfers its data operations to a foreign subsidiary incorporated in a jurisdiction with no privacy laws. The foreign entity collects the data and sells aggregate reports back to the domestic parent. No domestic entity ever collected the data. The code needs to address corporate control, not just direct collection.

STEP 4 OF 6

🟡 Overreach Finder Attacks

Simultaneously, the Overreach Finder (temp: 0.9) attacks from the other direction — finding scenarios the code prohibits that seem morally acceptable:

OVERREACH #1 Round 1 — Medical Emergency

The Unconscious Patient

Article 2.1 prohibits collecting or accessing personal data without prior explicit consent. An unconscious car accident victim is rushed to the ER. Doctors need to access their medication history to avoid a fatal drug interaction. The patient cannot consent. Under strict code interpretation, accessing medical records is prohibited. Most people would consider this access not only acceptable but morally obligatory.

OVERREACH #2 Round 1 — Child Safety

The Missing Child Alert

Article 7.1 prohibits facial recognition in public spaces. An 8-year-old goes missing in a crowded city. Police want to use the city's camera network with facial recognition to locate the child. The strict code prohibition covers all cases — there is no emergency exception even for active child abduction. Preventing this use seems morally wrong, but creating an exception risks undermining the entire ban.

OVERREACH #3 Round 1 — Research Chilling

The Epidemiologist Dilemma

Articles 2.1-2.3 restrict all health data collection without explicit consent. An epidemiologist studying a new infectious disease needs to analyze patient records retroactively to trace outbreak patterns. Many affected patients are deceased and cannot consent. Their estates argue privacy rights survive death. The research could prevent thousands of deaths from future outbreaks, but the code as written makes it impossible.

STEP 5 OF 6

⚖ Judge Evaluates Each Case

The Judge (temp: 0.3) evaluates each of the 6 cases. The hardest one this round is the Missing Child Alert:

CASE UNDER REVIEW OVERREACH #2 — Missing Child Alert

Should facial recognition be allowed to find a missing child?

The code has a blanket public facial recognition ban (Principle #4 / Article 7.1). If we create a "missing person" exception, we've opened a door: who defines "missing"? How soon after disappearance? What age threshold? Could this exception be weaponized to track adults fleeing abuse? Prior resolved Case #3 established that government surveillance exceptions require "imminent physical threat" — does a missing child qualify?

ROUND 1 CASE SUMMARY: 4 auto-resolved · 2 escalated · next version: v2
STEP 6 OF 6

🔧 Code Patched → Round 2 Begins

For the 4 resolvable cases, the Legislator revises the code. Here's the diff for the Loophole #2 fix (Statistical Profile patch):

DIFF — privacy_code.txt v1 → v2  |  Article 1.1: Personal Data Definition
Article 1: Definitions

- 1.1 "Personal Data" means any information that
-     identifies or can be directly linked to a
-     natural person.
+ 1.1 "Personal Data" means any information that
+     identifies, can be directly linked to, or can
+     be re-identified with a natural person with
+     greater than 10% probability using any means
+     reasonably available to a competent adversary.
+     This includes probabilistic profiles, inferred
+     attributes, and derived data.

  1.2 "Explicit Consent" means informed, affirmative,

+ 1.3 "Coercive Consent Design" means any UI/UX
+     pattern that obscures, discourages, or penalizes
+     the exercise of privacy rights, including but
+     not limited to: disproportionate friction on
+     decline options, false urgency, and confusing
+     consent flows. Consent obtained via Coercive
+     Consent Design is null and void.

✓ VALIDATION PASSED

Judge re-ran all 4 resolved cases against v2. All pass. No regressions. Code promoted to current version. Round 2 begins.

→ WHAT YOU LEARNED

You discovered that your privacy principle requires re-identification risk thresholds and coercion definitions — things you never explicitly stated, but clearly believe.

"The real output of Loophole isn't the legal code — it's what you discover about your own moral beliefs under adversarial pressure." — Brendan Hogan, Loophole README

System Architecture

Each component's details, data structures, and role in the pipeline are described below.

Pipeline: Human Input → Config → Session State → Agents Loop → HTML Report
⚙️ Config (config.yaml)
📦 Session State (models.py)
🤖 Agents (agents/)
📜 Code History (LegalCode versions)
⚖️ Case Log (Case[])
🔌 LLM Wrapper (llm.py)
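The Session State, Code History, and Case Log components suggest data structures along these lines. This is a sketch with hypothetical field names; the actual models.py may define these differently.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class LegalCode:
    version: int
    text: str

@dataclass
class Case:
    kind: str                       # "loophole" or "overreach"
    round: int
    title: str
    description: str
    verdict: Optional[str] = None   # e.g. "RESOLVED" or "ESCALATED"
    revision: Optional[str] = None  # patch text when resolved

@dataclass
class SessionState:
    principles: list
    code_history: list = field(default_factory=list)  # LegalCode versions
    case_log: list = field(default_factory=list)      # Case[]

    def current_code(self):
        # the newest version is authoritative; older versions are kept
        # so the Judge can audit how the code evolved
        return self.code_history[-1] if self.code_history else None

state = SessionState(principles=["privacy is a fundamental right"])
state.code_history.append(LegalCode(version=1, text="Article 1 ..."))
state.case_log.append(Case(kind="loophole", round=1,
                           title="Statistical Profile",
                           description="probabilistic re-identification"))
```

Keeping the full version history, rather than overwriting the code in place, is what makes the diff view and the regression re-checks possible.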

Connections to Bigger Ideas

Loophole isn't just a coding tool. It's a lens for understanding AI alignment, legal philosophy, security research, and moral epistemology.

🤖 Constitutional AI — Same Architecture, Transparent

Anthropic's Constitutional AI (CAI) is the method behind Claude's values. It uses a "constitution" of principles to guide AI behavior through adversarial self-critique and revision. Loophole implements the same structural logic — but makes it interactive, transparent, and human-in-the-loop.

  • Constitution = Moral Principles — Both systems start with high-level principles. CAI's constitution; Loophole's plain-text input.
  • Red-Teaming = Loophole/Overreach Agents — CAI uses adversarial prompting to find harmful outputs; Loophole uses dedicated agents to find logical failures.
  • Preference Data = Test Suite — CAI's accumulated human preference signals map to Loophole's growing corpus of resolved cases.
  • RLHF = Human Escalation — When automated resolution fails in both systems, a human provides the signal that shapes future behavior.
  • Key Difference: Transparency — CAI is a training process you can't watch. Loophole lets you see every attack and every patch in real-time.
"Loophole makes Constitutional AI interactive — you don't just define the constitution, you watch it get stress-tested and evolve in response to adversarial pressure. You see exactly where your beliefs break down." — Brendan Hogan, Loophole Repository

⚖️ Common Law Evolution — 235 Years in 5 Minutes

The Anglo-American common law system evolved through exactly the mechanism Loophole simulates: adversarial parties find edge cases, judges rule, those rulings become binding precedent (stare decisis), and subsequent rulings must remain consistent with the accumulated body of precedent.

  • Precedent = Resolved Cases — Every resolved case in Loophole becomes binding on future revisions, exactly like judicial precedent.
  • Adversarial System = Loophole/Overreach Agents — Prosecution and defense in real courts mirror the two adversarial agents — one finding what's permitted that shouldn't be, one finding what's prohibited that shouldn't be.
  • Judicial Review = Judge Agent — The Judge's validation step mirrors constitutional courts checking new legislation against prior rulings.
  • Human Escalation = Legislature — Cases that can't be resolved by courts get sent to the legislature (humans) to make new law.
  • Loophole finds in minutes what took centuries — The same adversarial process that took 235 years to produce modern privacy law can be run in a single session.

🔐 AI Red-Teaming — Security-Style Testing for Ethics

Red-teaming in cybersecurity means hiring ethical hackers to break your defenses before malicious actors do. AI red-teaming applies the same logic to AI systems — adversarially probing for harmful outputs before deployment. Loophole is AI red-teaming for moral frameworks.

  • Penetration Testing = Adversarial Agents — Just as pen testers systematically find exploits, Loophole's agents systematically find moral exploits.
  • CVE Database = Case Log — Security vulnerabilities get catalogued; Loophole catalogs moral vulnerabilities and their patches.
  • Regression Testing = Judge Validation — After patching a vulnerability, security teams retest all previous exploits to ensure no regressions. Loophole does the same.
  • Zero-Day Exploits = Genuine Dilemmas — Some security vulnerabilities are unfixable without breaking core functionality. UNRESOLVABLE cases are the moral equivalent.
  • Continuous Integration = Multi-Round Loops — Modern security uses continuous testing pipelines. Loophole's 10-round adversarial loop is the CI-pipeline analogue for ethics.

🧠 Moral Philosophy — Discovering What You Actually Believe

The deepest insight in Loophole is philosophical: you don't fully know your own moral beliefs until they face adversarial pressure. The real output isn't the legal code — it's self-knowledge.

  • Socratic Method — Socrates tested beliefs through adversarial questioning. Loophole is automated Socratic dialogue applied to ethics.
  • Reflective Equilibrium — John Rawls argued that moral reasoning involves cycling between intuitions and principles. Loophole implements this as a computational loop.
  • Trolley Problems at Scale — The escalated cases are structurally identical to philosophical thought experiments — but generated from your specific principles, not abstract scenarios.
  • Hidden Priors — You will discover beliefs you hold but never explicitly stated. Privacy means something different to you than you thought, and you only find out when an adversarial agent breaks your code.
  • Consistency Checking — The precedent system forces consistency. You can't say "yes" to the missing child case but "no" to a structurally identical adult case without explicitly justifying the difference.
"You might start thinking you believe in absolute privacy, and discover through the process that you actually believe privacy is a strong default right with narrow exceptions. That's a genuinely different view — and you might not have known without running this." — Brendan Hogan, Loophole Repository

Taking It Further

Each extension represents a research direction that could meaningfully advance how AI systems encode human values.

🤖

Multi-Model Battles

Different LLMs for different adversarial roles

Currently all four agents use Claude. What if the Loophole Finder was GPT-4o (known for literal, logical reasoning), the Overreach Finder was Gemini (known for creative synthesis), and the Judge was Claude (known for careful judgment)? Each model has different failure modes — using them adversarially could surface attacks that a single-model system would never find. This mirrors how diverse teams outperform homogeneous ones in security research.

Research Question: Does model diversity produce qualitatively different attacks, or do all frontier models find the same loopholes?


🏆

Tournament Mode

Pit moral frameworks against each other

Run Loophole simultaneously on multiple moral frameworks — utilitarian, deontological, virtue ethics, contractarian. After 10 rounds each, compare: which framework produced the most robust code? Which escalated the most cases? Which found genuine dilemmas the others resolved? This isn't just academic — it provides empirical evidence about which ethical frameworks are internally consistent under adversarial pressure.

Research Question: Is any moral framework "more complete" in the sense of fewer UNRESOLVABLE escalations?


🎯

Coalition Attacks

Multiple adversarial agents collaborating

What if instead of one Loophole Finder, there were three — each with different attack strategies (one specializing in definitional exploits, one in technical workarounds, one in compound scenarios)? They could share findings and build on each other's attacks, the way red teams collaborate. The coalition would likely find loopholes that escape a single adversary, more closely mirroring how actual bad actors operate in coordinated groups.

Hypothesis: Coalition attacks find qualitatively different vulnerabilities — compound loopholes that require chaining multiple code weaknesses.


📚

Historical Injection

Seed with real legal edge cases

Pre-seed the simulation with actual landmark court cases as initial test cases. For privacy: Katz v. United States (wiretapping), Carpenter v. United States (cell phone location), GDPR enforcement cases. The generated code must handle all these historical cases from Round 1, accelerating convergence to a robust framework and grounding the simulation in real-world complexity rather than purely hypothetical scenarios.

Application: Could reproduce the evolution of US privacy law from scratch, or test whether AI systems "rediscover" legal principles that took decades to establish.


📊

Complexity Metrics

Track code entropy and loophole density

Instrument the system to track code evolution quantitatively: word count per version, number of exceptions per article, readability scores, number of defined terms, entropy of the definition graph. Plot these metrics across rounds to answer: does the code get more complex over time? Is there a complexity "ceiling"? Do some moral frameworks produce more complex code than others? This could reveal whether some principles are inherently more expressible in rule form than others.

Prediction: Complexity grows monotonically through rounds 1-6 then plateaus as all edge cases are captured. Some moral domains are inherently harder to codify than others.
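Toy versions of the proposed metrics are straightforward to compute. The heuristics below (regex patterns, word-level entropy) are illustrative assumptions, not the project's actual instrumentation.

```python
import math
import re
from collections import Counter

# Toy implementations of the metrics proposed above. The regexes are
# illustrative heuristics, not the project's actual instrumentation.
def code_metrics(code_text):
    words = re.findall(r"[A-Za-z']+", code_text.lower())
    exceptions = len(re.findall(r"\bexcept(?:ion)?s?\b", code_text.lower()))
    defined_terms = len(re.findall(r'"[^"]+"\s+means', code_text))
    counts = Counter(words)
    total = sum(counts.values())
    # Shannon entropy of the word distribution, in bits
    entropy = -sum((c / total) * math.log2(c / total)
                   for c in counts.values())
    return {"words": total, "exceptions": exceptions,
            "defined_terms": defined_terms,
            "word_entropy_bits": round(entropy, 2)}

v1 = ('1.1 "Personal Data" means any information that identifies a person. '
      "Anonymized aggregates are an exception.")
metrics = code_metrics(v1)
```

Running this over every version in the code history would yield the per-round curves needed to test the plateau prediction.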


🌐

Community Platform

Share sessions, compare frameworks

A platform where users publish their Loophole sessions — the principles they started with, every attack, every resolution, and every escalation decision. A semantic search engine over the corpus of escalated cases would let researchers find structurally similar dilemmas across different users. A "moral consistency score" could compare two users' decision patterns. The aggregate data would constitute the world's largest empirical dataset on human moral reasoning under adversarial pressure.

Dataset potential: millions of human decisions on adversarially-generated moral dilemmas — unprecedented training data for value-aligned AI systems.


🔢

Formal Verification

Mathematical proofs of moral consistency

After a Loophole session produces a stable legal code, translate it into formal logic (modal logic, deontic logic) and use theorem provers to check properties: Is it internally consistent? Does it entail any unintended consequences when combined with standard background assumptions? This bridges LLM-generated natural language with rigorous mathematical verification — the first step toward provably consistent AI ethics systems.

Technical challenge: mapping natural language legal code to deontic logic is itself a hard NLP problem. But LLMs are increasingly capable of this translation.
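A flavor of the idea in miniature: encode two clauses as boolean constraints over actions and brute-force a truth table for satisfiability. Real deontic-logic verification would need a proper theorem prover; this toy only illustrates what "internally consistent" means formally.

```python
from itertools import product

# Toy consistency check: clauses are boolean predicates over a world of
# actions; the set is consistent iff some assignment satisfies them all.
def consistent(clauses, variables):
    for values in product([False, True], repeat=len(variables)):
        world = dict(zip(variables, values))
        if all(clause(world) for clause in clauses):
            return True
    return False

variables = ["collects_data", "has_consent"]
clauses = [
    lambda w: (not w["collects_data"]) or w["has_consent"],  # collection requires consent
    lambda w: w["collects_data"],                            # collection occurs
]
ok = consistent(clauses, variables)       # satisfiable: collect with consent

contradictory = clauses + [lambda w: not w["has_consent"]]   # forbid consent
bad = consistent(contradictory, variables)  # no satisfying assignment
```

The unsatisfiable case is the formal analogue of a Loophole session surfacing a genuine dilemma: no world satisfies all the stated rules at once.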


🔄

Meta-Constitutional

Let the system improve its own constitution

Currently, humans provide the initial principles. What if a meta-level agent watched the escalation patterns across many sessions and proposed amendments to the initial moral principles themselves? "Users who wrote Principle X consistently resolve escalated cases in ways that imply Principle Y is also held. Consider adding it explicitly." This is recursive constitutional refinement — the constitution improving itself based on revealed preferences.

Risk: recursive self-modification without stable axioms could converge to arbitrary values. Requires careful human oversight at the meta-level.


🔀

Cross-Domain Transfer

Apply privacy learnings to speech, property

The loopholes discovered in a privacy session often have structural analogues in other moral domains. The "re-identification" loophole in privacy has a speech analogue (technically anonymous statements that are clearly attributable). Can the system automatically identify these cross-domain parallels and pre-populate new sessions with structurally similar test cases? This would accelerate convergence and reveal deep structural patterns in moral reasoning across domains.

Research insight: If privacy and speech domains share loophole structure, this suggests underlying patterns in how humans reason about rights and exceptions.


🎮

Real-Time Multiplayer

Multiple humans debate escalated cases

When a case is escalated, instead of one human deciding, put it to a panel of 5-100 humans who debate and vote. Show them each other's arguments. Let them see how their vote compares to aggregate results. The final decision could be a supermajority (67%), or a consensus mechanism, or a weighted vote by domain expertise. This makes Loophole a platform for collective moral deliberation — more like a constitutional convention than individual decision-making.

Democratic theory implication: this implements a form of deliberative democracy for AI value alignment — the AI constitution is built by actual collective human deliberation.
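The supermajority mechanism sketched above is simple to state precisely. The 67% threshold and the three-way outcome are taken from the proposal; everything else here is a hypothetical illustration.

```python
# Toy supermajority aggregation for one escalated case. The 2/3 threshold
# mirrors the 67% figure proposed above; outcomes are illustrative.
def decide(votes, threshold=2 / 3):
    yes = sum(1 for v in votes if v)
    frac = yes / len(votes)
    if frac >= threshold:
        return "ALLOW"
    if (1 - frac) >= threshold:
        return "PROHIBIT"
    return "NO_CONSENSUS"   # re-debate, re-vote, or escalate further

panel = [True, True, True, False, True]   # 4 of 5 panelists vote allow
outcome = decide(panel)
```

The interesting design question is the middle branch: a `NO_CONSENSUS` result is itself a signal that the case is a genuine dilemma, not merely an under-specified rule.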


File Explorer

Each file is annotated with what it does, key code excerpts, and how it connects to the rest of the system. This is the actual project structure.

📁 loophole/
📂 loophole/
📄 main.py
📄 models.py
📄 llm.py
📄 prompts.py
📄 session.py
📄 visualize.py
📂 agents/
📄 base.py
📄 legislator.py
📄 loophole_finder.py
📄 overreach_finder.py
📄 judge.py
📂 sessions/
⚙️ config.yaml
📘 README.md