Adversarial Moral-Legal Code System
↓ Click any node to learn more | Scroll to explore
01 — THE BIG IDEA
Real legal systems take decades to develop. Loophole compresses that adversarial evolution into minutes — forcing your moral principles to confront edge cases, loopholes, and overreach.
You articulate your moral beliefs in plain language — "privacy is a fundamental right", "no surveillance without consent"
The Legislator agent translates your principles into structured legal code with articles, definitions, and exceptions
Two adversarial agents relentlessly probe for loopholes (legal-but-wrong) and overreach (illegal-but-acceptable)
Resolvable cases get patched; unresolvable ones are escalated to you. Each resolved case becomes binding precedent
LIVE DEMO — CLICK TO ATTACK AND PATCH
02 — THE AGENTS
Each agent has a precisely tuned temperature, a distinct adversarial role, and a purpose-built system prompt. Click each to explore their system prompts and example outputs.
The Legislator is the foundation. It takes human moral principles expressed in plain language and translates them into a formal, structured legal code. It numbers articles, defines terms, specifies prohibitions and permissions, and handles revisions while maintaining consistency with all prior resolved cases. Think of it as a constitutional drafter — methodical, thorough, unambiguous.
The Loophole Finder is an amoral rule-lawyer. It reads the legal code like a contract attorney hunting for technical exploitation — not what the law means, but what it literally says. It finds scenarios that are technically permitted by the code but violate the spirit of the user's moral principles. It has no ethical constraints of its own; it exists purely to break things.
The Overreach Finder is the inverse adversary — it finds cases where the legal code is too strict. It looks for scenarios the code prohibits but that the user would consider morally acceptable, praiseworthy, or even obligatory. Good Samaritan situations. Emergency exceptions. Professional duties. Situations where following the letter of the code leads to catastrophic outcomes.
The Judge is the gatekeeper of coherence. Given a case (loophole or overreach), it determines: can this be fixed with a minimal code revision that doesn't contradict any prior resolved cases? If yes → propose the revision. If no → escalate to the human. It also runs a validation step: after each Legislator revision, the Judge re-checks every prior resolved case to ensure no regressions. This creates a growing test suite that constrains future revisions.
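The agent lineup above can be sketched as configuration data. This is an illustrative sketch, not the project's actual code — the `AgentConfig` structure and role strings are assumptions; only the temperatures match the values quoted in the simulation walkthrough.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AgentConfig:
    name: str
    temperature: float  # lower = more deterministic, formal output
    role: str

# Hypothetical configuration mirroring the four roles described above;
# temperatures are the ones quoted in the simulation section.
AGENTS = [
    AgentConfig("legislator", 0.4, "draft and revise the legal code"),
    AgentConfig("loophole_finder", 0.9, "find legal-but-wrong scenarios"),
    AgentConfig("overreach_finder", 0.9, "find illegal-but-acceptable scenarios"),
    AgentConfig("judge", 0.3, "patch, escalate, and re-check precedent"),
]
```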
03 — THE SIMULATION
Walk through a complete adversarial cycle using real privacy principles. Navigate forward and back — every step reveals the system's inner workings.
These are the raw moral principles you'd provide to Loophole. They're written in natural language — no formal structure needed. The system will handle formalization.
📁 Source: These mirror the privacy_principles.txt from the Loophole repository's example domain. The system accepts any moral domain — you could use parenting ethics, workplace fairness, environmental policy, etc.
The Legislator (temp: 0.4) takes the 8 principles and drafts a formal legal code. Watch it being written below:
💡 Why temp 0.4? Legal drafting requires consistency and precision, not creativity. Lower temperature = more deterministic, formal language. The same principles should produce essentially the same code every run.
The Loophole Finder (temp: 0.9) reads the full legal code and generates 3 attacks per round. Here are the attacks from Round 1:
Article 2.1 requires "explicit, informed consent." A company redesigns its consent flow as a 47-screen onboarding wizard where "agree" buttons are large and colorful but "decline" requires navigating through 12 sub-menus. Users do technically consent. The code requires explicit consent but says nothing about consent obtained through deliberately confusing UX. Result: total surveillance achieved "legally."
Article 1.1 defines "Personal Data" as information that "identifies or can be linked to a natural person." A data broker creates probabilistic profiles with 99.7% accuracy that technically never store a name or ID — only statistical vectors. They argue these "statistical profiles" are not Personal Data because they are never definitionally linked to an identity, only probabilistically correlated. The definition needs a "re-identification risk" clause.
Article 3.1 restricts "entities" from conducting surveillance. A domestic corporation transfers its data operations to a foreign subsidiary incorporated in a jurisdiction with no privacy laws. The foreign entity collects the data and sells aggregate reports back to the domestic parent. No domestic entity ever collected the data. The code needs to address corporate control, not just direct collection.
Simultaneously, the Overreach Finder (temp: 0.9) attacks from the other direction — finding scenarios the code prohibits that seem morally acceptable:
Article 2.1 prohibits collecting or accessing personal data without prior explicit consent. An unconscious car accident victim is rushed to the ER. Doctors need to access their medication history to avoid a fatal drug interaction. The patient cannot consent. Under strict code interpretation, accessing medical records is prohibited. Most people would consider this access not only acceptable but morally obligatory.
Article 7.1 prohibits facial recognition in public spaces. An 8-year-old goes missing in a crowded city. Police want to use the city's camera network with facial recognition to locate the child. The strict code prohibition covers all cases — there is no emergency exception even for active child abduction. Preventing this use seems morally wrong, but creating an exception risks undermining the entire ban.
Articles 2.1-2.3 restrict all health data collection without explicit consent. An epidemiologist studying a new infectious disease needs to analyze patient records retroactively to trace outbreak patterns. Many affected patients are deceased and cannot consent. Their estates argue privacy rights survive death. The research could prevent thousands of deaths from future outbreaks, but the code as written makes it impossible.
The Judge (temp: 0.3) evaluates each of the 6 cases. Click the verdict for the Missing Child Alert case — the hardest one this round:
The code has a blanket public facial recognition ban (Principle #4 / Article 7.1). If we create a "missing person" exception, we've opened a door: who defines "missing"? How soon after disappearance? What age threshold? Could this exception be weaponized to track adults fleeing abuse? Prior resolved Case #3 established that government surveillance exceptions require "imminent physical threat" — does a missing child qualify?
ROUND 1 CASE SUMMARY
For the 4 resolvable cases, the Legislator revises the code. Here's the diff for the Loophole #2 fix (Statistical Profile patch):
✓ VALIDATION PASSED
Judge re-ran all 4 resolved cases against v2. All pass. No regressions. Code promoted to current version. Round 2 begins.
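The Judge's regression step — re-running every prior resolved case against each revision — is structurally a test-suite loop. A minimal sketch, assuming the Judge can be wrapped as a boolean check (`judge_check` is a hypothetical stand-in for an LLM call, not the project's real interface):

```python
def validate_revision(judge_check, resolved_cases, new_code):
    """Re-run every prior resolved case against a proposed code revision.

    judge_check(new_code, case) stands in for a Judge agent call: True means
    the case is still handled correctly under the revised code.
    """
    regressions = [c for c in resolved_cases if not judge_check(new_code, c)]
    if regressions:
        return False, regressions  # revision rejected; Legislator must retry
    return True, []                # no regressions; code promoted to current
```

Because `resolved_cases` only ever grows, each round's revision faces a strictly larger suite — the constraint that keeps later patches from silently undoing earlier ones.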
→ WHAT YOU LEARNED
You discovered that your privacy principles require re-identification risk thresholds and a definition of coerced consent — things you never explicitly stated, but clearly believe.
04 — THE ARCHITECTURE
Click any component to expand its details, data structures, and role in the pipeline. Every component is interactive.
05 — WHY IT MATTERS
Loophole isn't just a coding tool. It's a lens for understanding AI alignment, legal philosophy, security research, and moral epistemology.
Anthropic's Constitutional AI (CAI) is the training method behind Claude's values. It uses a "constitution" of principles to guide AI behavior through self-critique and revision. Loophole implements the same structural logic — but makes it interactive, transparent, and human-in-the-loop.
The Anglo-American common law system evolved through exactly the mechanism Loophole simulates: adversarial parties find edge cases, judges rule, those rulings become binding precedent (stare decisis), and subsequent rulings must remain consistent with the accumulated body of precedent.
Red-teaming in cybersecurity means hiring ethical hackers to break your defenses before malicious actors do. AI red-teaming applies the same logic to AI systems — adversarially probing for harmful outputs before deployment. Loophole is AI red-teaming for moral frameworks.
The deepest insight in Loophole is philosophical: you don't fully know your own moral beliefs until they face adversarial pressure. The real output isn't the legal code — it's self-knowledge.
06 — EXTENSIONS
Click any card to expand the full vision. Each extension represents a research direction that could meaningfully advance how AI systems encode human values.
Currently all four agents use Claude. What if the Loophole Finder were GPT-4o (known for literal, logical reasoning), the Overreach Finder were Gemini (known for creative synthesis), and the Judge were Claude (known for careful judgment)? Each model has different failure modes — using them adversarially could surface attacks that a single-model system would never find. This mirrors how diverse teams outperform homogeneous ones in security research.
Research Question: Does model diversity produce qualitatively different attacks, or do all frontier models find the same loopholes?
Run Loophole simultaneously on multiple moral frameworks — utilitarian, deontological, virtue ethics, contractarian. After 10 rounds each, compare: which framework produced the most robust code? Which escalated the most cases? Which found genuine dilemmas the others resolved? This isn't just academic — it provides empirical evidence about which ethical frameworks are internally consistent under adversarial pressure.
Research Question: Is any moral framework "more complete" in the sense of fewer UNRESOLVABLE escalations?
What if instead of one Loophole Finder, there were three — each with different attack strategies (one specializing in definitional exploits, one in technical workarounds, one in compound scenarios)? They could share findings and build on each other's attacks, the way red teams collaborate. The coalition would likely find loopholes that escape a single adversary, more closely mirroring how actual bad actors operate in coordinated groups.
Hypothesis: Coalition attacks find qualitatively different vulnerabilities — compound loopholes that require chaining multiple code weaknesses.
Pre-seed the simulation with actual landmark court cases as initial test cases. For privacy: Katz v. United States (wiretapping), Carpenter v. United States (cell phone location), GDPR enforcement cases. The generated code must handle all these historical cases from Round 1, accelerating convergence to a robust framework and grounding the simulation in real-world complexity rather than purely hypothetical scenarios.
Application: Could reproduce the evolution of US privacy law from scratch, or test whether AI systems "rediscover" legal principles that took decades to establish.
Instrument the system to track code evolution quantitatively: word count per version, number of exceptions per article, readability scores, number of defined terms, entropy of the definition graph. Plot these metrics across rounds to answer: does the code get more complex over time? Is there a complexity "ceiling"? Do some moral frameworks produce more complex code than others? This could reveal whether some principles are inherently more expressible in rule form than others.
Prediction: Complexity grows monotonically through rounds 1-6, then plateaus as the remaining edge cases are captured. Some moral domains are inherently harder to codify than others.
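A minimal sketch of this instrumentation, assuming the legal code is plain text. The regex heuristics below (quoted capitalized phrases as defined terms, keyword counts for exceptions) are illustrative stand-ins for real structural parsing:

```python
import re

def complexity_metrics(code_text: str) -> dict:
    """Compute simple per-version complexity metrics for a legal code.

    Heuristics only: "defined terms" are quoted capitalized phrases, and
    exception clauses are counted by keyword.
    """
    defined_terms = set(re.findall(r'"([A-Z][^"]*)"', code_text))
    return {
        "word_count": len(code_text.split()),
        "defined_terms": len(defined_terms),
        "exception_clauses": len(re.findall(r"\bexcept\b|\bunless\b", code_text, re.I)),
        "articles": len(re.findall(r"\bArticle \d+", code_text)),
    }
```

Running this over every version in a session and plotting the four series across rounds gives the growth curves the prediction talks about.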
A platform where users publish their Loophole sessions — the principles they started with, every attack, every resolution, and every escalation decision. A semantic search engine over the corpus of escalated cases would let researchers find structurally similar dilemmas across different users. A "moral consistency score" could compare two users' decision patterns. The aggregate data would constitute the world's largest empirical dataset on human moral reasoning under adversarial pressure.
Dataset potential: millions of human decisions on adversarially generated moral dilemmas — unprecedented training data for value-aligned AI systems.
After a Loophole session produces a stable legal code, translate it into formal logic (modal logic, deontic logic) and use theorem provers to check properties: Is it internally consistent? Does it entail any unintended consequences when combined with standard background assumptions? This bridges LLM-generated natural language with rigorous mathematical verification — the first step toward provably consistent AI ethics systems.
Technical challenge: mapping natural language legal code to deontic logic is itself a hard NLP problem. But LLMs are increasingly capable of this translation.
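As a toy illustration of the verification step only (not the natural-language translation, which the note above identifies as the hard part), a propositional deontic representation can be checked for direct conflicts. Everything here — the tuple encoding, the operator names, the example norms — is an assumption for illustration:

```python
# Toy deontic check: each norm is (operator, action) with operator
# "O" (obligatory), "P" (permitted), or "F" (forbidden). A direct conflict
# is an action that is forbidden yet also obligatory or permitted.
# Conditions, agents, and time are all ignored -- that is the hard part.

def inconsistent_pairs(norms):
    forbidden = {a for op, a in norms if op == "F"}
    return sorted({a for op, a in norms if op in ("O", "P") and a in forbidden})

norms = [
    ("F", "collect_biometrics"),     # blanket prohibition in the code
    ("O", "collect_biometrics"),     # emergency duty added by a later patch
    ("P", "access_medical_records"),
]
```

A real pipeline would hand a richer encoding to an off-the-shelf theorem prover; this flat conflict scan just shows what "internally consistent" means at the smallest scale.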
Currently, humans provide the initial principles. What if a meta-level agent watched the escalation patterns across many sessions and proposed amendments to the initial moral principles themselves? "Users who wrote Principle X consistently resolve escalated cases in ways that imply Principle Y is also held. Consider adding it explicitly." This is recursive constitutional refinement — the constitution improving itself based on revealed preferences.
Risk: recursive self-modification without stable axioms could converge to arbitrary values. Requires careful human oversight at the meta-level.
The loopholes discovered in a privacy session often have structural analogues in other moral domains. The "re-identification" loophole in privacy has a speech analogue (technically anonymous statements that are clearly attributable). Can the system automatically identify these cross-domain parallels and pre-populate new sessions with structurally similar test cases? This would accelerate convergence and reveal deep structural patterns in moral reasoning across domains.
Research insight: If privacy and speech domains share loophole structure, this suggests underlying patterns in how humans reason about rights and exceptions.
When a case is escalated, instead of one human deciding, put it to a panel of 5-100 humans who debate and vote. Show them each other's arguments. Let them see how their vote compares to the aggregate. The final decision could require a supermajority (67%), a consensus mechanism, or a vote weighted by domain expertise. This makes Loophole a platform for collective moral deliberation — more like a constitutional convention than individual decision-making.
Democratic theory implication: this implements a form of deliberative democracy for AI value alignment — the AI constitution is built by actual collective human deliberation.
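The supermajority mechanic can be sketched in a few lines. The 2/3 threshold follows the 67% figure mentioned above; the vote encoding and function shape are simplifying assumptions:

```python
from collections import Counter

def panel_decision(votes, supermajority=2 / 3):
    """Resolve an escalated case by panel vote.

    votes maps panelist id -> option (e.g. "allow" / "prohibit"). Returns
    the leading option if it clears the supermajority threshold, otherwise
    None: no consensus, and the case remains escalated.
    """
    tally = Counter(votes.values())
    option, count = tally.most_common(1)[0]
    return option if count / len(votes) >= supermajority else None
```

With five panelists, four "allow" votes (80%) clear the 2/3 bar; a 3-2 split (60%) does not, so the case stays open — which is itself informative data about where genuine moral disagreement lives.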
07 — THE CODE
Click any file to see what it does, key code excerpts, and how it connects to the rest of the system. This is the actual project structure.