Lessons from 30 Years of Open Source for Running a Swarm of Coding Agents

Community Article Published April 9, 2026

The people behind Linux, SQLite, Django, and Rust spent decades building open source ecosystems that flourish with unpredictable contributors. If you're trying to scale to dozens of parallel coding agents, they solved your problem first. Here is how.


Running open source projects with dozens of contributors taught me that the quality of what contributors produce depends less on who they are than on the environment you build for them. The tests, the review process, the structure that guides work before it starts. Those are what scale.

When I started running dozens of coding agents in parallel, I recognized the same dynamic. The contributor changed. The problem didn't.

So I went back to the people who solved this at genuine scale, the maintainers of Linux, SQLite, Django, and Rust, and tried to translate what they learned into something useful for the rest of us.


In 1991, Linus Torvalds posted to the comp.os.minix newsgroup announcing a hobby operating system he was building. "Just a hobby, won't be big and professional," he said. Over the next three decades, what grew around that codebase became the most complex collaborative engineering project in human history. Millions of lines of code. Thousands of contributors. A set of practices for maintaining quality across all of it, developed slowly, expensively, and mostly by watching things go wrong. The hard part wasn't getting thousands of people to contribute. It was figuring out how to let them, without destroying what was already there, what others were building at the same time, or where the whole thing was supposed to go.

Most developers using coding agents today are recreating that situation in miniature, without the thirty years.

When you run a single agent on a task, you are a developer with a fast assistant. When you run five in parallel, each opening pull requests on separate branches, something more fundamental changes. You are no longer primarily writing code. You are managing contributors. That is a role with its own distinct demands, its own failure modes, and a rich body of knowledge attached to it, one that almost no one brings to the conversation about agents. As we will see, better coding agents will not solve this. Better agents just make the coordination problem arrive sooner and more intensely.


The illusion of scale

Running more agents feels like producing more output. It isn't. It's taking on more review burden, and the two feel identical right up until they don't.

Nadia Eghbal spent four years studying this dynamic, first independently with Ford Foundation funding, then at GitHub, then at Protocol Labs, interviewing hundreds of developers and analyzing open source projects by contribution volume [1]. Her central finding cuts against the intuition that drives most people toward parallel agents:

"I started to see the problem is not that there's a dearth of people who want to contribute to an open source project, but rather that there are too many contributors, or they're the wrong kind of contributors."

One study she cites found that in more than 85% of the GitHub projects researchers examined, fewer than 5% of developers were responsible for over 95% of code and social interactions [1]. Adding contributors to these projects didn't improve that ratio. It made it worse. Each new contributor added review burden without reliably adding review capacity. The maintainers who understood this stopped treating contribution volume as a success metric. The ones who didn't burned out.

A developer who spins up five parallel agents has, in that moment, created a small open source project with all the coordination overhead that implies. The bottleneck moves to the human reviewer immediately. Nothing about agent velocity changes that.


Everything that isn't code

There is a persistent fantasy about open source: that the best projects run themselves, held together by shared enthusiasm and the invisible hand of meritocracy. Karl Fogel spent years watching that fantasy collide with reality.

Fogel managed the creation of Subversion, the version control system that was widely adopted across the open source world before Git arrived. His book Producing Open Source Software [2] is the most systematic account of what actually keeps a project coherent, written by someone who had to figure it out in practice:

"Things won't simply run themselves, much as we might wish otherwise. But the management, though it may be quite active, is often informal and subtle."

The projects that looked self-organizing were always the ones where someone was quietly making hard decisions: what to reject, what to prioritize, what the project was not going to become. That work doesn't show up in commit history. It doesn't get celebrated. But it is what holds a project together, and nothing in the way most people run agents replaces it.

Agents execute with precision the direction they are given. When that direction is absent or vague, each agent reasons from what it has: the immediate task, the surrounding code, its training. Those individual reasonings are coherent in isolation. Across five parallel agents working without shared context, they diverge. The resulting codebase compiles, the tests pass, and nothing quite fits together.


Direction before code

The most important part of that invisible work is making direction explicit before anyone starts building. The Rust programming language built a structural answer to this into its governance from the start. Every substantial change to the language requires a written RFC (Request for Comments) [3], a proposal argued in prose before anyone writes implementation code. The reason is simple: a pull request is the worst possible place to discover that the approach is wrong. By the time code exists, the author is invested, the reviewer is reading a diff rather than evaluating a direction, and reverting is expensive.

"A hastily-proposed RFC can hurt its chances of acceptance."

The process asks that substantial changes go through a structured review and produce consensus among the community before implementation begins.

Making an agent write a plan before touching any files is not overhead. It is the conversation that should have happened before the code existed.

What the RFC system also does, and what is easy to overlook, is make rejection legitimate. When the Rust team closes an RFC after a formal comment period, the contributor can be disappointed without feeling personally dismissed. The process absorbed the rejection. Without that structure, every "no" falls on an individual who has to explain themselves, defend their judgment, and absorb the frustration of the person whose work they're turning down.

Jacob Kaplan-Moss co-created Django, which was open-sourced in 2005, and led it for nearly a decade. When he stepped down as co-BDFL (Benevolent Dictator for Life, the person with final say on all decisions), his reasoning illuminated the cost of concentrating all those decisions in one person [4]:

"When I stepped down as BDFL, it was because I believed that the BDFL model of governance had run its course. I believed that for the project to continue to grow, it needed a more democratic leadership model."

Open source projects don't fail from too few contributions. They fail from too many undirected ones, and from the burnout of the people managing them. The work of deciding what not to build, of saying no to good ideas that are wrong for this codebase, lands entirely on the developer reviewing agent output. It is invisible, uncelebrated, and compounds with every agent running.


What survives

Return to a codebase your agents worked on three weeks ago. Much of it will feel opaque. The intent behind it is gone, and there is no one to ask. This is a problem open source communities solved long ago, and the solution is not what most people expect.

In a large project, the person submitting a patch today will never speak to the person who inherits that code in three years. No meeting, no handoff, no shared context beyond what is written down. This forces a discipline that most individual developers never develop, because they never need to. Kaplan-Moss stated the sharpest version of it [5]:

"Code without tests is broken by design."

The line gets quoted as a rule about quality. It is a statement about communication. When a contributor writes a test, they leave a message for everyone who comes after: this is what this code is supposed to do, this is the behavior you cannot break, this is the edge case that matters. That message survives refactors, maintainer turnover, and years. Comments rarely do. Django and CPython require tests with contributions not because they distrust contributors but because the test suite is the only durable record of intent in a codebase that thousands of people will touch over decades.

For agents, this is not aspirational. It is structural. The test suite is the primary channel through which intent travels. An agent working in a well-tested codebase has a rich record of what the code is supposed to do and what it is not allowed to break. An agent working in a codebase with weak coverage is working from the code alone, without the layer of documented expectation that makes the code's purpose legible.
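As a concrete sketch of what "tests as messages" means: assuming a small, hypothetical `parse_duration` helper, one set of tests can carry all three messages at once, what the code is supposed to do, what must not break, and which edge case matters.

```python
# Hypothetical example: a tiny duration parser and the regression tests
# that document its intent for whoever (human or agent) touches it next.

def parse_duration(text: str) -> int:
    """Parse strings like '2h', '30m', '45s' into seconds."""
    units = {"h": 3600, "m": 60, "s": 1}
    value, unit = text[:-1], text[-1]
    if unit not in units or not value.isdigit():
        raise ValueError(f"unrecognized duration: {text!r}")
    return int(value) * units[unit]

# Message 1: this is what the code is supposed to do.
def test_parses_common_units():
    assert parse_duration("2h") == 7200
    assert parse_duration("30m") == 1800

# Message 2: this is the behavior you cannot break; callers rely on seconds.
def test_returns_seconds():
    assert parse_duration("1m") == 60

# Message 3: this is the edge case that matters; a past bug (hypothetical
# here) silently accepted inputs like '10x'.
def test_rejects_unknown_units():
    try:
        parse_duration("10x")
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for unknown unit")
```

An agent that reads these tests before editing `parse_duration` inherits the intent directly, with no comment or conversation required.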


Simulating open source in fast motion

The strongest open source projects are not shaped by their developers alone. They are shaped by their users. Every issue filed is a signal: what people actually use, what they expected to happen, what they care about enough to report. The issue tracker is the richest source of user intent a project has, and in the best projects, that intent gets encoded permanently. Each issue becomes a regression test. Over time, the test suite becomes a living record of what users actually needed, not what developers imagined they might.

SQLite is the purest expression of this. Richard Hipp's project, now running on billions of devices, maintains a 590:1 ratio of test code to library code [8]. Their policy is absolute: a bug is not considered fixed until new test cases that would exhibit the bug have been added to the test suite. Every real user problem becomes a permanent test. When SQLite achieved full MC/DC (Modified Condition/Decision Coverage, an aviation-grade testing standard) coverage driven by real-world Android deployment, they essentially stopped receiving bug reports [8].

Greg Kroah-Hartman, who has maintained the Linux kernel stable branch for over two decades [6], captured why this matters from the user's side:

"Users will not update if they are afraid you will break their current system."

Linux still runs decades-old binaries. Not because backward compatibility is easy but because breaking it would destroy trust that took years to build. The implicit contracts of a codebase, the ones that live in user scripts and tutorials and years of established behavior, are not visible in the code itself. They only become visible when a real user runs into them.

When you are developing fast with a team of agents, the community that would surface these problems does not exist yet. You are shipping faster than your users can react. So you have to build that feedback loop yourself. Before every release, have agents simulate real users: exercise the workflows people depend on, file issues for every problem they hit, and spawn agents to fix each one. You are running an open source community in fast motion, with agents playing the role of the users who haven't arrived yet.
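The shape of that loop can be sketched in a few lines. Everything here is a hypothetical stand-in for your own agent tooling: `run_workflow` would drive a simulated-user agent through one workflow, `spawn_fix_agent` would hand each reported issue to a fixer agent. Both are stubbed so only the loop's structure is shown.

```python
# A sketch, not an implementation: the pre-release feedback loop with all
# agent calls replaced by stubs.

from dataclasses import dataclass

@dataclass
class Issue:
    workflow: str
    description: str

def run_workflow(name: str) -> list[Issue]:
    # Stub: in practice, an agent exercises the workflow like a real user
    # and reports every problem it hits as an issue.
    simulated_findings = {
        "export-report": ["CSV export drops the header row"],
    }
    return [Issue(name, d) for d in simulated_findings.get(name, [])]

def spawn_fix_agent(issue: Issue) -> bool:
    # Stub: in practice, an agent replicates the issue as a test, fixes it,
    # and reports whether the fix landed.
    return True

def prerelease_check(workflows: list[str]) -> list[Issue]:
    """Exercise every user-facing workflow; return issues left unfixed."""
    unresolved = []
    for name in workflows:
        for issue in run_workflow(name):
            if not spawn_fix_agent(issue):
                unresolved.append(issue)
    return unresolved
```

The point of the structure is that nothing ships while `prerelease_check` returns a non-empty list: every simulated-user complaint either gets fixed or blocks the release.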


Earning the right to touch it

The Linux kernel does not give new contributors access to critical subsystems on day one. They start with small, bounded changes, often in a dedicated staging area that Kroah-Hartman maintains specifically as an on-ramp: lower bar, messier code accepted, a place to demonstrate that you understand the process before you are trusted with something that matters more.

The subsystem maintainer who accepts a patch is not just accepting code. They are vouching for it to the people above them in the review chain, staking their own reputation on someone else's work. Of the tens of thousands of patches in a typical kernel release, Linus Torvalds directly handles a small fraction. The rest flows through a hierarchy of earned trust, built through demonstrated track record in specific parts of a specific codebase, over time.

A skilled engineer new to a project is still new to that project. They might be technically brilliant and still not know which interfaces have undocumented assumptions, which architectural decisions look wrong but were made deliberately, which files are intentionally pristine and which are intentionally rough. That knowledge comes from working in a codebase incrementally, making small mistakes with small consequences, earning the right to larger responsibilities before taking them.

Every agent begins fresh in a codebase. The situational knowledge described above is not in the code. It accumulates through working in a project over time, and it is not something an agent arrives with. Giving an agent a large cross-cutting task without supplying that context is the equivalent of asking someone who joined the company this morning to redesign the authentication system. The capability may well be there. But capability without context is just speed in the wrong direction.

Diomidis Spinellis and colleagues confirmed empirically in 2024, across large C and Java codebases, that the quality of existing code in a file shapes the quality of new contributions to that file, measurably and consistently [7]. As they put it: "disorder sets descriptive norms and signals behavior that further increases it." The codebase you hand to an agent shapes what it returns. Cleaning up before delegating is not housekeeping. It determines what you receive.


What this looks like in practice

These lessons translate into concrete habits:

  • Document plans and principles where agents can find them. Direction should be explicit and written down, not discovered across a series of pull requests. If the approach and the reasoning behind it haven't been argued in prose before code exists, your agents will each invent their own direction.
  • Start agents on small tasks and expand as your process holds. Your first parallel agents will expose every gap in your workflow: missing tests, brittle continuous integration, unclear review criteria. Fix those gaps first. Scale only after the process survives contact with real agent output.
  • Make agents communicate through tests. Agents working in parallel cannot talk to each other, but they can read and write tests. Instruct each agent to replicate the issue as a test, solve it, and write more tests around every assumption it made. The next agent that touches that code inherits those messages. This is how parallel agent work stays coherent.
  • Have agents simulate your users. In open source, every issue filed is a signal: what users care about, how they use the software, what they expected to happen. Your agents can play that role. Throughout development, have agents exercise real user workflows, file issues for every problem they hit, and spawn agents to fix each one. You are building your own open source community in miniature.
  • Require agents to leave code cleaner than they found it. Configure your agents to flag code smells, not contribute to them. Disorder compounds. An agent that silently works around a mess guarantees the next agent will too.
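As one illustration of the first habit, the written plan doesn't need ceremony. A short document in the repository, argued in prose before any code exists, is enough. The filename, feature, and test names below are all hypothetical:

```markdown
# PLAN: rate limiting for the public API

## Approach
Token bucket per API key, enforced in the gateway, not in each service.

## Why not the alternatives
- Per-service limits: each agent would invent its own policy; limits drift.
- Fixed windows: bursts at window edges would break the mobile client's sync.

## Out of scope
Billing-tier limits. Do not touch the pricing code.

## Tests that must keep passing
- test_burst_within_bucket_allowed
- test_sustained_overload_rejected_with_429
```

Every agent that reads this before starting inherits the same direction, the same rejected alternatives, and the same boundaries, which is exactly what an RFC gives human contributors.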

Fogel watched collaboration collapse from invisible management failures. Kroah-Hartman built the trust infrastructure that now handles thousands of kernel patches a week without any single person reviewing them all. Kaplan-Moss ran Django long enough to see the model he had built stop working, and had the honesty to dismantle it.

The hard problems were never technical. Version control, continuous integration, code review: these solved the mechanical parts. What persisted, what required years of iteration to address, were coordination problems. How do you maintain quality and direction when you cannot control who contributes, when they contribute, or how much they understand about what they are changing?

Coding agents do not dissolve these problems. They surface them faster, at greater scale, and without the natural pacing that comes with human collaboration: the deliberation before a big change, the conversation before something irreversible, the shared sense that a decision deserves more thought. Building that pacing back into the workflow deliberately is part of the work.

The developers who navigate this well won't be the ones with the best tools. They'll be the ones who recognize, the moment they run a second agent, what kind of problem they've actually signed up for.


[1] Nadia Eghbal, Working in Public: The Making and Maintenance of Open Source Software (Stripe Press, 2020).

[2] Karl Fogel, Producing Open Source Software (O'Reilly, 2005; second edition 2017–2020).

[3] Rust core team, rust-lang/rfcs (2014–present).

[4] Jacob Kaplan-Moss, "Django, Teknival, and Governance" (2020).

[5] Jacob Kaplan-Moss, quoted in Django documentation.

[6] Greg Kroah-Hartman, "Lessons for Developers from 20 Years of Linux Kernel Work" (The New Stack, 2020).

[7] Diomidis Spinellis et al., "Broken Windows: Exploring the Applicability of a Controversial Theory on Code Quality" (ICSME, 2024).

[8] SQLite project, "How SQLite Is Tested". Richard Hipp, CoRecursive podcast, episode 066.
