the obvious question when building multi-agent AI systems is: how do you organize them? you could have one boss agent that delegates to workers (centralized), let every agent do whatever they want (fully autonomous), or something in between. the obvious answer is that you should design the structure carefully. this paper says: actually, no.
after 25,000 tasks, 8 models, 4–256 agents, and 8 coordination protocols, the winner is what the paper calls "Sequential" — a hybrid where the order agents speak in is fixed, but which role each agent plays is chosen by the agents themselves. this beats full central control by 14% and fully autonomous protocols by 44%. the authors call this the endogeneity paradox: the optimal protocol is neither designed nor free — it's constrained freedom.
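a minimal sketch of the sequential idea in python, under a toy setup: `Agent`, `run_sequential`, and `make_policy` are hypothetical names, not the paper's code, and the role-picking policy is a hard-coded stub standing in for what would be an llm call in a real system. the point is the shape of the loop. the speaking order is the fixed list order, each agent picks its own role after seeing the transcript so far, and an agent can return `None` to sit its turn out.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# a transcript entry: (agent name, chosen role, contribution text)
Turn = tuple[str, str, str]
# hypothetical policy signature: given the task and the transcript so far,
# return a role name, or None to abstain. a real system would prompt an llm here.
RolePolicy = Callable[[str, list[Turn]], Optional[str]]


@dataclass
class Agent:
    name: str
    choose_role: RolePolicy


def run_sequential(task: str, agents: list[Agent]) -> list[Turn]:
    """one pass of a sequential-style protocol: fixed order, self-chosen roles."""
    transcript: list[Turn] = []
    for agent in agents:  # the speaking order is the list order and never changes
        role = agent.choose_role(task, transcript)  # the agent, not a boss, picks the role
        if role is None:
            continue  # abstain: this agent has nothing useful to add
        transcript.append((agent.name, role, f"{agent.name} as {role}: work on {task!r}"))
    return transcript


def make_policy(preferred: list[str]) -> RolePolicy:
    """stub policy: take the first preferred role nobody has claimed yet, else abstain."""
    def policy(task: str, transcript: list[Turn]) -> Optional[str]:
        taken = {role for _, role, _ in transcript}
        for role in preferred:
            if role not in taken:
                return role
        return None
    return policy


agents = [
    Agent("a1", make_policy(["planner", "critic"])),
    Agent("a2", make_policy(["planner", "solver"])),
    Agent("a3", make_policy(["planner", "solver"])),
]
for name, role, _ in run_sequential("prove the lemma", agents):
    print(name, role)
# a1 planner
# a2 solver
```

what's fixed and what isn't is the whole design: `run_sequential` never reorders `agents`, but it never tells anyone what to do either. a3 finds both of its preferred roles already taken and abstains.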
what makes this more interesting than a benchmark win is the emergent behavior. from just 8 agents, the system invented 5,006 unique roles across tasks, including agents that voluntarily chose not to contribute when they had nothing useful to add. spontaneous hierarchy appeared without anyone programming it in.
the three architectures differ in who decides which agent does what. in the centralized one, a coordinator agent breaks the task into subtasks, assigns a role to each worker, and aggregates the results. it's clean and predictable, but the coordinator is a bottleneck and role mismatches are common.
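to make the bottleneck and mismatch problems concrete, here's a toy centralized loop in python. everything in it is an assumption for illustration: `Worker`, `decompose`, and `run_centralized` are hypothetical names, the decomposition is hard-coded, and round-robin assignment is only a stand-in for however a real coordinator would hand out roles.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    name: str
    skills: set[str]


def decompose(task: str) -> list[tuple[str, str]]:
    """hypothetical fixed split into (role, subtask); a real coordinator would ask a model."""
    return [
        ("research", f"gather facts for {task}"),
        ("write", f"draft an answer for {task}"),
        ("review", f"check the draft for {task}"),
    ]


def run_centralized(task: str, workers: list[Worker]) -> tuple[str, list[str]]:
    """coordinator decomposes, assigns roles top-down, then aggregates."""
    outputs: list[str] = []
    mismatches: list[str] = []
    for i, (role, subtask) in enumerate(decompose(task)):
        worker = workers[i % len(workers)]  # round-robin: the worker is never consulted
        if role not in worker.skills:
            mismatches.append(f"{worker.name} assigned {role}")
        outputs.append(f"[{role}] {worker.name}: {subtask}")
    # aggregation: every result funnels back through this single coordinator
    return "\n".join(outputs), mismatches


answer, mismatches = run_centralized(
    "summarize the paper", [Worker("w1", {"research"}), Worker("w2", {"review"})]
)
print(mismatches)  # ['w2 assigned write', 'w1 assigned review']
```

because the coordinator assigns roles without asking, a worker can land in a role it's bad at: in this run two of the three subtasks go to the wrong worker, and nothing gets combined until the coordinator does it.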
as for scaling, adding more agents keeps helping each protocol, but with diminishing returns.
[figure: illustrative scaling curves per protocol, based on paper trends, not exact reported numbers]
[REVIEWER 2 DEMANDS YOU ANSWER THESE]
what is the 'endogeneity paradox' described in the paper?
which protocol won across the 25,000-task experiment?
what happens to the relationship between structure and performance when a model falls below the capability threshold?
which emergent behaviors appeared without being explicitly programmed?