pepperd — papers, but you'll actually finish them

§1 brainrot tldr

what if the AI didn't just learn to do the task better? what if it rewrote the part of itself that decides how to learn, and then rewrote that too? hyperagents are self-referential programs where the task agent (does the thing) and the meta agent (improves the thing) are folded into one editable blob of code. crucially, the meta-level improvement process is itself editable. so the system isn't just learning: it's learning how to improve the way it improves. and across the domains the paper tests, it actually works.

§2 key findings

—hyperagents fold task agent and meta agent into a single editable program — the system can rewrite both how it solves problems and how it improves itself
—the meta-modification procedure is itself editable: genuine recursive self-improvement, not just fine-tuning on feedback loops
—extends the Darwin Gödel Machine (DGM) and drops its domain-specific alignment assumption, i.e. that getting better at the task also makes the agent better at modifying itself; in principle this allows self-improvement on any computable task
—outperforms baselines that lack self-improvement across diverse domains: coding benchmarks, robotics reward design, paper review pipelines
—emergent engineering tools appeared without explicit instruction — persistent memory and performance tracking were invented by the improvement loop, not the researchers
—meta-level gains transfer across domains and stack across runs: improvements compound rather than reset
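
to make the first two findings concrete, here is a minimal sketch, assuming a toy setup where the whole hyperagent is just two Python source strings. the names `Hyperagent`, `task_src`, `meta_src`, `solve`, `improve` and `self_modify` are hypothetical, and the hard-coded string patch stands in for whatever edit an LLM-driven meta agent would actually propose. it is not the paper's implementation.

```python
from dataclasses import dataclass


@dataclass
class Hyperagent:
    """One editable program: task agent and meta agent are both source text."""
    task_src: str
    meta_src: str

    @staticmethod
    def load(src: str, name: str):
        """Execute a source string and return the function called `name`."""
        namespace: dict = {}
        exec(src, namespace)
        return namespace[name]

    def solve(self, x):
        """Run the current task agent."""
        return self.load(self.task_src, "solve")(x)

    def self_modify(self) -> None:
        """Run the current meta agent on the whole program, itself included."""
        improve = self.load(self.meta_src, "improve")
        self.task_src, self.meta_src = improve(self.task_src, self.meta_src)


# Toy task agent, wrong on purpose: the hypothetical task is to double x.
TASK_SRC = "def solve(x):\n    return x + 1\n"

# Toy meta agent. A real one would propose edits; this one applies a fixed patch.
META_SRC = '''
def improve(task_src, meta_src):
    # Edit the task agent.
    new_task = task_src.replace("x + 1", "x * 2")
    # Edit the meta agent's own source by appending a revision marker.
    new_meta = meta_src + "# revised once\\n"
    return new_task, new_meta
'''

agent = Hyperagent(TASK_SRC, META_SRC)
before = agent.solve(3)   # 4: the original task agent computes x + 1
agent.self_modify()
after = agent.solve(3)    # 6: the patched task agent computes x * 2
```

the point of the design is that `improve` receives and returns `meta_src` as well as `task_src`, so nothing in the program is off limits to editing. the self-edit here is only a comment, which keeps the toy deterministic.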

§3 interactive visual

figure 1 — the self-improvement loop

task agent (solves the problem)
  ↓ performance score logged
  ↓ meta agent reads code + history
meta agent (improves the system)
  ↙ edits task agent
  ↺ edits itself

task agent runs: the task agent attempts the problem and produces an output. its performance is scored and logged for the meta agent to read.
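
the cycle in figure 1 can be sketched as a plain loop. this is a toy under loud assumptions: the "task" is guessing a hidden number, the agent is three numeric fields rather than real code, and `Agent`, `TARGET`, `run_task`, `meta_edit` and `improvement_loop` are all made-up names. it keeps every edit with no selection step, so treat it as a picture of the control flow only, not of how DGM-H is built.

```python
from dataclasses import dataclass, replace

TARGET = 10.0  # hypothetical task: output a number close to TARGET


@dataclass(frozen=True)
class Agent:
    guess: float = 0.0      # task-agent part: the answer it outputs
    step: float = 4.0       # meta-agent part: how large its edits are
    direction: float = 1.0  # meta-agent part: which way its edits push


def run_task(agent: Agent) -> float:
    """Task agent attempts the problem; scores closer to 0 are better."""
    return -abs(agent.guess - TARGET)


def meta_edit(agent: Agent, history: list[float]) -> Agent:
    """Meta agent reads the agent and the score log, then edits both parts."""
    step, direction = agent.step, agent.direction
    if len(history) >= 2 and history[-1] <= history[-2]:
        # Edits itself: the last edit did not help, so reverse and halve the step.
        step, direction = step / 2, -direction
    # Edits the task agent, using the (possibly revised) editing policy.
    return replace(agent, guess=agent.guess + direction * step,
                   step=step, direction=direction)


def improvement_loop(agent: Agent, cycles: int) -> tuple[Agent, list[float]]:
    history: list[float] = []
    for _ in range(cycles):
        history.append(run_task(agent))    # task agent runs, score is logged
        agent = meta_edit(agent, history)  # meta agent reads agent + history
    return agent, history


final, log = improvement_loop(Agent(), cycles=7)
```

after seven cycles `final.guess` sits on the target and `final.step` has shrunk from 4.0 to 1.0, so the editing policy changed along with the answer. `log` is the score history the meta agent was reading the whole time.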

figure 2 — gains compound across runs

snapshot after 5 improvement runs:

baseline (no self-improvement): 56%
DGM (task-level only): 65%
DGM-H (hyperagent): 72%

illustrative curves based on paper trends — not exact reported numbers

§4 comprehension check

peer review quiz

[REVIEWER 2 DEMANDS YOU ANSWER THESE]

question 1

what makes a hyperagent fundamentally different from a regular self-improving AI?

question 2

the paper drops an assumption made by the original Darwin Gödel Machine. which one?

question 3

which emergent behaviors appeared in DGM-Hyperagents (DGM-H) without being explicitly programmed?

question 4

what happens to meta-level improvements across different domains and runs?