Every project has a contribution threshold. Contributions above it help the project. Contributions below it hurt it. There is no neutral zone. A change either moves the project forward or adds drag that someone else has to compensate for later.
The contribution threshold is never automatically verifiable. No CI pipeline, no linter, no type checker can tell you whether a change actually helps. They can tell you it does not break anything they know about. That is a much weaker statement.
The contribution threshold is the actual bar. Does this change make the project better? Is the abstraction right? Does the naming communicate intent? Is the scope appropriate? Will this be maintainable in six months? No tool answers these questions. Humans do.
The automation threshold is what your project-specific tooling catches. Your test suite, your linters, your static analysis, your mutation testing. Everything you have built or configured on top of what the language gives you. This is where CI lives.
The ecosystem threshold is what your language gives you for free. In Ruby, the baseline is: it parses and boots. In TypeScript, the compiler enforces some structural contracts. In Rust, lifetimes and ownership are checked. In Haskell, the type system encodes more invariants still. The ecosystem threshold varies dramatically across languages. You inherit it the moment you choose your stack.
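To make the baseline difference concrete, here is a toy Python sketch (function and values are purely illustrative). In a dynamically typed language the ecosystem check is roughly "it parses and imports": this call is accepted without complaint and only fails when it actually executes, whereas a typed language's compiler would reject it before any code ran.

```python
def total_cents(amounts):
    """Sum a list of prices given in integer cents."""
    return sum(amounts)

# The interpreter's baseline check is "it parses and loads".
# This mistyped call passes that check and fails only at runtime.
try:
    total_cents(["12.50", "3.10"])  # strings, not ints
except TypeError as err:
    print(f"caught at runtime, not at load time: {err}")

print(total_cents([1250, 310]))  # the correct call works: 1560
```

Everything a compiler would have caught here becomes work for your automation threshold instead: a test has to exist that exercises this path with bad input, or the error ships.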
The space between the automation threshold and the contribution threshold is filled entirely by human discipline. Code review, architectural judgment, taste, experience. This is the most volatile and expensive resource on any team.
Human discipline is not a constant. It varies across people, across days, and across hours. After a bad night of sleep it is low. After deep focus it peaks. One disruptive meeting and it drops again. You cannot budget for it because you cannot predict it.
The wider the discipline gap, the more your project quality becomes a function of how well someone slept. That is not engineering. That is luck.
Once you see the three thresholds, every tooling decision becomes a gap question. Does this investment shrink the discipline gap or not?
Adding mutation testing to CI raises the automation threshold. Adding a type system raises the ecosystem threshold. Both directly reduce the discipline gap. Both make your project less dependent on human consistency.
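A minimal sketch of what mutation testing buys you (the predicate and the mutant are hand-written here for illustration; a real tool such as mutmut or PIT generates mutants automatically). A test suite can pass while still leaving a boundary unverified, and the surviving mutant is how the tool surfaces that gap:

```python
def is_adult(age):
    return age >= 18

# A weak test suite: both assertions pass for the original...
assert is_adult(25)
assert not is_adult(10)

# ...and both also pass for this mutant (>= mutated to >),
# so the mutant "survives" and the boundary is flagged as untested.
def is_adult_mutant(age):
    return age > 18

assert is_adult_mutant(25)
assert not is_adult_mutant(10)

# The test that kills the mutant is the boundary case:
assert is_adult(18)  # True for the original, False for the mutant
```

The mutant forces a human-quality question ("did you actually test the boundary?") into the automation threshold, which is exactly the direction the essay argues for.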
An unenforced style guide does not raise either threshold. It adds one more thing humans have to remember, widening the gap with more surface area for discipline to fail on. Enforce it with a linter and it becomes an automation threshold investment. Leave it as a document and it is a discipline tax.

Hiring experienced developers puts better humans in the gap. Their discipline is real and valuable. But it is wasted if it is spent primarily on filling the gap instead of closing it. The highest-leverage use of experienced developers is raising the automation threshold: better tooling, better abstractions, better ecosystem choices. Spending their consistency on manual review that automation could handle is a misallocation.
The question for every process, tool, and decision: does this move the automation threshold closer to the contribution threshold, or does it add more surface area that discipline has to cover?
The ecosystem threshold is the one decision that determines how much work everything else has to do. A language with a weak baseline means your automation has more ground to cover just to reach parity with what another language gives for free.
In a dynamically typed language, your CI has to verify things that a compiler would catch in a typed language. Your test suite has to cover type errors, nil checks, structural mismatches. These are not your domain problems. They are the tax you pay for your ecosystem choice.
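Here is what that tax looks like in practice, as a hedged Python sketch (names are illustrative). The "missing user" case must be covered by hand-written tests; in a language where the return type is declared as optional, the compiler forces every caller to handle it and these tests largely disappear:

```python
def find_user(user_id, users):
    """Return the user record, or None if absent."""
    return users.get(user_id)

users = {1: {"name": "Ada"}}

# Tests that exist only because the ecosystem does not enforce
# the absent case -- this is the tax, not a domain problem.
def test_missing_user_returns_none():
    assert find_user(99, users) is None

def test_present_user():
    assert find_user(1, users) == {"name": "Ada"}

test_missing_user_returns_none()
test_present_user()
```

None of these tests encode business logic. They re-verify, per call site, a structural contract that a stricter ecosystem threshold would have given you for free.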
This is not an abstract preference. It is an engineering budget decision. Every hour spent building automation to compensate for a low ecosystem threshold is an hour not spent pushing the automation threshold closer to the contribution threshold. The gap that actually matters gets the least investment because the baseline demands so much.
Everything above was already true before LLMs. It was survivable because the rate of contributions was bounded by human typing speed. The discipline gap was expensive but the volume flowing through it was manageable.
LLMs changed two things at once. They increased the volume of contributions. And they made each contribution non-deterministic in quality.
A human contributor has a relatively stable skill level. Their discipline varies, but you can at least calibrate for the skill part. You know what a given person is capable of on a good day. You just cannot guarantee they are having one.
An LLM has no stable skill level. The same prompt can produce a contribution above the contribution threshold or below it. You cannot predict which. The output is neither good nor bad. It is uncertain.
This means the discipline gap now has to absorb more volume and more variance per contribution. Human review was already the bottleneck. Now each review also requires evaluating whether the contribution clears the threshold at all, not just whether it passes CI.
LLMs can absolutely help get above the contribution threshold. They can produce good abstractions, clean implementations, well-structured code. But they can also produce plausible code that passes every automated check and still hurts the project. The problem is that both outcomes look the same until a human applies judgment.
The discipline gap does not scale. The number of contributions flowing through it does, and each one carries more uncertainty than before. The answer is not more reviewers. It is a higher automation threshold. The less you leave to human judgment, the less the non-determinism of the contributor matters, whether that contributor is a tired human or a stochastic model.
For more on what happens when LLM-generated code meets weak verification, see Pattern Parrots and the Semantic Knot.
Be conscious that the contribution threshold exists. It is invisible, it is never automatically verifiable, and every contribution lands on one side of it or the other.
Measure your tooling decisions against the discipline gap. The goal is not more process. The goal is less surface area where human consistency is the only thing between your project and regression. Types, mutation testing, static analysis: these are not luxuries. They are the mechanisms that push the automation threshold closer to the contribution threshold.
And when you get to choose your stack: choose one where the ecosystem does more of the work. The gap between automation and contribution is hard enough to close. Do not start from a lower floor than you have to. If you are already on a weak ecosystem, at a minimum be conscious of it and invest in mitigating it. The worst position is a low ecosystem threshold that nobody is actively compensating for.
The next post in this series will explore why the contribution threshold cannot be evaluated from a distance, and why leadership demands dirty hands.