Fast Feedback Beats Late Inspection: The Economics of Developer Testing

Eight years ago, I sat through a project management lecture on systems thinking. It seemed abstract, theoretical, one of those academic frameworks you nod along with but never actually use. Then Will Larson reintroduced me to it through his blog, and suddenly it clicked. Systems thinking isn’t just theory, it’s a useful tool for understanding complex dynamics.

Recently, I encountered one of those debates that every engineering organization eventually faces: do we need a dedicated QA role? Not whether testing matters, everyone agrees it does, but whether we need separate people whose job is to test what developers build. The conversation went exactly where these conversations often go: passionate opinions, competing anecdotes, no clear resolution.

So I decided to do something different. Instead of arguing from intuition, I’d model both approaches using systems dynamics and see what the data actually shows.

The Real Question Isn’t “To Test or Not to Test”

Let me be clear upfront: Adding a QA organization doesn’t automatically improve software quality. Quality is created during development. Developers must test their own work. QA is an additional gate, and whether that gate helps or hurts depends entirely on whether it enables or replaces developer ownership.

But here’s where it gets interesting: the value of that additional gate depends heavily on context. Are you distributing software to a single customer who does their own rigorous testing? Are you shipping microservices to millions of users where real-world feedback is the only meaningful test? When I started my apprenticeship sixteen years ago, we burned software releases onto DVDs and shipped them to hundreds of customers. That’s a fundamentally different scenario than continuous deployment to cloud infrastructure.

The point is, just slapping a QA step into your workflow doesn’t solve quality problems. The question is: what system behaviors emerge from different testing approaches?

Building the Models

I used Will Larson’s systems dynamics language to model two distinct scenarios. These aren’t recommendations. They visualize and describe reality, modeling different philosophies about how quality happens.

Important
These models use systems dynamics simulation to explore how different quality practices create different system behaviors over time. The numbers are illustrative, but the research consensus is clear: developers who test their own code produce significantly higher quality software, and fast feedback loops are inherently more effective than slow ones.

Scenario 1: Developers Own Quality

This model assumes developers can and will invest in quality. They work at a sustainable pace of 10 items per round and maintain a reasonable defect rate of 20%. This isn’t wishful thinking; it reflects an environment where developers have time to think, test, and validate their work.

When defects are created, 75% get caught immediately through developer testing. Rework happens quickly at 8 items per round. Defects caught in development are dramatically cheaper to fix than those found later, and the tight feedback loop keeps work-in-progress low.

This assumes developers have the tools, skills, and culture to test their own work effectively. It’s not automatic, it requires investment.

Even with good practices, some defects (50% of what escapes dev) reach customers. No process is perfect. However, customer defects are processed fairly quickly at 0.5 items per round.

[Backlog] > Dev @ 10

Dev > CleanItems @ Rate(inf)
Dev > DefectiveItems @ Leak(0.2)

CleanItems > ReadyToDeploy @ Rate(8)
DefectiveItems > ReadyToDeploy @ Rate(inf)
DefectiveItems > Rework @ Leak(0.75)

Rework > Backlog @ Rate(8)

ReadyToDeploy > Done @ Rate(inf)
ReadyToDeploy > CustomerDefects @ Rate(0.5)

CustomerDefects > Backlog @ Rate(0.5)

(Figure: Scenario 1 stocks and flows)

This represents teams with high quality standards, automated tests, continuous integration, and strong code review practices. It assumes adequate time, testing infrastructure, a culture that values quality over speed, and proactive technical debt management.
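If you want to poke at these numbers yourself, here is a minimal hand-rolled sketch of Scenario 1 in Python. It is not Will Larson’s systems package; the flow ordering and leak handling are simplifying assumptions of mine, so the absolute figures will differ from the charts in this post. The point is only to make the feedback loop tangible.

# Minimal hand-rolled approximation of the Scenario 1 model.
# This is NOT the `systems` package: flow ordering and leak handling are
# simplified, so absolute numbers differ from the charts in this post.

def simulate_dev_owned(rounds=50):
    clean = defective = rework = ready = customer = done = 0.0
    returned = 0.0  # work flowing back into the backlog (rework + customer defects)

    for _ in range(rounds):
        # [Backlog] > Dev @ 10 (infinite backlog, plus anything sent back earlier)
        dev = 10 + returned
        returned = 0.0

        # Dev > DefectiveItems @ Leak(0.2); the rest becomes CleanItems
        defective += dev * 0.2
        clean += dev * 0.8

        # CleanItems > ReadyToDeploy @ Rate(8)
        moved = min(clean, 8)
        clean -= moved
        ready += moved

        # DefectiveItems > Rework @ Leak(0.75); uncaught defects slip into ReadyToDeploy
        rework += defective * 0.75
        ready += defective * 0.25
        defective = 0.0

        # Rework > Backlog @ Rate(8): fast feedback, fixes flow back quickly
        fixed = min(rework, 8)
        rework -= fixed
        returned += fixed

        # ReadyToDeploy > CustomerDefects @ Rate(0.5); the rest is Done
        escaped = min(ready, 0.5)
        customer += escaped
        done += ready - escaped
        ready = 0.0

        # CustomerDefects > Backlog @ Rate(0.5)
        back = min(customer, 0.5)
        customer -= back
        returned += back

    return {"done": round(done, 1),
            "in_rework": round(rework, 1),
            "wip": round(clean + rework + customer, 1)}

if __name__ == "__main__":
    print(simulate_dev_owned())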

Let’s contrast this with what happens when the system is structured differently. Not because people are less skilled, but because incentives and organizational structure create different behaviors.

Scenario 2: QA as Safety Net

This model doesn’t assume developers are careless or unskilled. Rather, it models what happens when organizational structure, incentives, and pressures create a system where QA becomes the primary quality mechanism regardless of developer intent.

Development still proceeds at 10 items per round, but defect creation increases to 40%. This isn’t because developers suddenly become worse at their jobs. It models the systemic effects when:

  • Developers face pressure to deliver features quickly because “QA will catch issues anyway”
  • Testing is seen as someone else’s job, not core development work
  • Deadlines emphasize handoff to QA rather than validated, working code
  • Developers receive little feedback on the quality of their testing practices
  • The organization measures developer productivity by features delivered, not defects prevented

This mirrors the scenario described in the introduction: teams where developers struggle with quality practices and dedicated QA is positioned as THE solution to quality problems. The promise is seductive: developers focus on features, QA ensures quality, everyone does what they’re best at.

But system dynamics reveal the hidden costs. When developers know QA will inspect their work, rational adaptation follows. Why spend time on comprehensive unit tests when that effort could go toward the next feature? Why refactor for testability when QA will test it anyway? These aren’t wrong decisions in isolation, they’re predictable responses to incentive structures.

QA acts as a safety net, catching 95% of defects, more effective than dev testing’s 75%. This higher catch rate comes from dedicated focus, systematic test cases, and independent verification. But here’s the catch: the feedback loop is slow. QA rework flows back to the backlog at 3 items per round versus 8. This reflects context switching costs: developers must reload mental models from work completed weeks earlier, often while juggling current priorities.

The quality gate creates a bottleneck. Work piles up in the QA stage. The separation between dev and QA creates handoff delays. Work-in-progress accumulates.

[Backlog] > Dev @ 10

Dev > CleanItems @ Rate(inf)
Dev > DefectiveItems @ Leak(0.4)

CleanItems > QA @ Rate(inf)
DefectiveItems > QA @ Rate(inf)

QA > ReadyToDeploy @ Rate(inf)
QA > QARework @ Leak(0.38)

QARework > Backlog @ Rate(3)

ReadyToDeploy > Done @ Rate(inf)
ReadyToDeploy > CustomerDefects @ Rate(0.2)

CustomerDefects > Backlog @ Rate(0.2)

(Figure: Scenario 2 stocks and flows)

This represents traditional phase-gate development with developer pressure to deliver quickly, reliance on QA as quality gatekeepers, weak developer testing practices, “throw it over the wall” mentality, and potential mistrust between dev and QA.
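The same kind of hand-rolled sketch (again my own simplified approximation, not the systems package) makes the slow rework loop in Scenario 2 explicit:

# Minimal hand-rolled approximation of the Scenario 2 model, parallel in
# structure to the Scenario 1 sketch above. Same caveat: simplified semantics,
# illustrative numbers only.

def simulate_qa_gated(rounds=50):
    qa = qa_rework = ready = customer = done = 0.0
    returned = 0.0  # work flowing back into the backlog (QA rework + customer defects)

    for _ in range(rounds):
        # [Backlog] > Dev @ 10 (infinite backlog, plus anything sent back earlier)
        dev = 10 + returned
        returned = 0.0

        # CleanItems and DefectiveItems both flow straight into QA (Rate(inf)),
        # so this sketch tracks them as a single QA stock
        qa += dev

        # QA > QARework @ Leak(0.38): QA catches 95% of the 40% defective items
        caught = qa * 0.38
        qa_rework += caught
        ready += qa - caught
        qa = 0.0

        # QARework > Backlog @ Rate(3): slow feedback, context already lost
        fixed = min(qa_rework, 3)
        qa_rework -= fixed
        returned += fixed

        # ReadyToDeploy > CustomerDefects @ Rate(0.2); the rest is Done
        escaped = min(ready, 0.2)
        customer += escaped
        done += ready - escaped
        ready = 0.0

        # CustomerDefects > Backlog @ Rate(0.2)
        back = min(customer, 0.2)
        customer -= back
        returned += back

    return {"done": round(done, 1),
            "in_rework": round(qa_rework, 1),
            "wip": round(qa_rework + customer, 1)}

if __name__ == "__main__":
    print(simulate_qa_gated())

Run side by side with the Scenario 1 sketch, the pattern already shows: the QA rework stock keeps growing because the Rate(3) return flow can’t keep up with the Leak(0.38) inflow, while the fast Rate(8) loop in Scenario 1 drains almost as quickly as it fills.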

Note

About the Model Parameters

These numbers are illustrative, not prescriptive. Real-world values vary widely based on context, technology, and team maturity. However, they’re not arbitrary:

Defect Rates (20% vs 40%): These represent different approaches to quality management rather than absolute measurements. A 2x difference models the observable gap between teams with strong testing discipline versus those relying primarily on downstream inspection. The specific numbers matter less than capturing the directional effect: when developers don’t own quality, initial defect creation increases.

Detection Rates (75% dev testing vs 95% QA): These reflect the fundamental difference between testing your own work versus independent verification. QA’s higher catch rate reflects fresh eyes, systematic test cases, and dedicated focus. However, QA can’t catch what isn’t there to catch: if developers ship code that doesn’t meet basic requirements, QA finds the obvious breaks but misses subtle integration issues or architectural problems.

Rework Rates (8 vs 3 items/round): This 2.67x slowdown models context-switching costs. When developers fix code they wrote yesterday, context is fresh. Fixing code from two weeks ago after QA review requires reconstructing mental models, reviewing decisions, and often debugging in less familiar code.

The Key Insight: These specific numbers matter less than the relationships between them. Halving the defect rate difference or adjusting detection percentages changes the magnitude of effects but not the directional findings. The system dynamics remain: late feedback costs more, fast feedback enables learning, and prevention beats inspection. You could adjust these parameters to match your organization’s data and still observe similar patterns.

What the Models Reveal

The Throughput Paradox

Scenario 2 delivers 21.7% fewer items despite starting with the same input rate. Rework and quality issues create drag on the system. WIP accumulation slows everything down.

(Figure: model comparison, throughput)
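A back-of-the-envelope check makes that gap plausible. Assume (my simplification, not something the models encode directly) that every defective item is eventually reworked, and that reworked items can be defective again with the same probability; the expected number of passes per delivered item is then a geometric series, and the two defect rates alone put the gap in the same ballpark:

# Back-of-the-envelope check on the throughput gap. Simplifying assumption
# (mine, not encoded in the models): every defect is eventually reworked, and
# rework can be defective again with the same probability, so the expected
# number of passes per delivered item is 1 / (1 - defect_rate).

def expected_passes(defect_rate):
    return 1 / (1 - defect_rate)

ratio = expected_passes(0.2) / expected_passes(0.4)   # Scenario 2 vs Scenario 1
print(f"Scenario 2 throughput ≈ {ratio:.0%} of Scenario 1")    # ≈ 75%
print(f"i.e. roughly {1 - ratio:.0%} fewer items delivered")   # ≈ 25%

The approximation predicts roughly 25% fewer items; the simulated 21.7% is a bit smaller, plausibly because the rate-limited customer-defect returns keep some rework outside the measured horizon.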

Here’s what really surprised me: even after increasing Scenario 2’s input rate by 20% to account for developers moving faster and completing more items per round, the “faster” approach is still slower in the long run.

(Figure: model comparison, throughput adjusted for increased dev speed)

The Lead Time Penalty

Items take 67% longer to complete in Scenario 2. Batch handoffs between dev and QA, context-switching costs when fixing old work, and queue time waiting for QA capacity all compound.

(Figure: model comparison, lead time)
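Little’s Law ties the throughput and WIP effects together: average lead time equals average work-in-progress divided by average throughput. The inputs below are hypothetical, picked only so the ratio matches the 67% figure above; they are not outputs of the simulations.

# Little's Law: average lead time = average WIP / average throughput.
# Hypothetical inputs for illustration only, not simulation outputs.

def lead_time(wip, throughput):
    return wip / throughput

fast = lead_time(wip=12, throughput=8.0)   # fast-feedback team: 1.5 rounds
gated = lead_time(wip=16, throughput=6.4)  # QA-gated team:      2.5 rounds
print(f"items take {gated / fast - 1:.0%} longer")  # 67% longer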

The Rework Trap

Scenario 2 maintains 3x more items in rework at steady state. Late detection means more expensive fixes. Slow feedback loops keep defects in the system longer. The “quality gate” becomes a permanent fixture rather than a temporary filter.

(Figure: model comparison, rework)

What’s Missing from These Models

These models capture system mechanics but miss important human dimensions. Team morale and burnout affect quality in ways no defect rate can capture. Psychological safety determines whether people speak up about quality concerns before they become production incidents.

Technical debt compounds like interest: each shortcut makes the next change harder, creating drag that accumulates until teams can barely move forward. Customer impact varies wildly: a bug during a critical demo has outsized consequences, and reputation damage accumulates non-linearly.

Organizational dynamics shape everything. Power structures can perpetuate “dev vs QA” mindsets, which I have already witnessed more than once. Incentive systems that reward features over quality create predictable behaviors. Communication patterns determine whether problems are addressed early or hidden until they explode.

The Real-World Implications

These scenarios reflect what people actually see happening:

“Going Faster” Often Means Going Slower. Rushing through development creates more rework than it saves in time. The cost of poor quality is invisible until it accumulates. Short-term thinking creates long-term pain.

Quality Gates Can’t Fix Broken Processes. QA cannot test quality into a product. Inspection finds defects, but prevention avoids them. The earlier you catch issues, the cheaper they are to fix.

The Deployment Frequency Ceiling. Scenario 2 caps how often you can ship. Every release waits for the QA cycle. Want to deploy daily? You need daily QA cycles, which means either incomplete testing or massive QA teams. Scenario 1’s fast feedback enables continuous deployment naturally. High-performing teams show both high deployment frequency and developer-owned testing because they’re mechanically linked.

What the Research Says

The data backs this up. Beller et al. monitored 2,443 software engineers over 2.5 years and found troubling gaps [1]:

  • Half of developers in the study don’t test at all
  • Developers spend only 25% of their time testing but believe they spend 50%
  • Most programming sessions end without any test execution

Research on continuous integration provides additional support. Hilton et al. analyzed 34,544 open-source GitHub projects and found CI adoption correlates with more frequent releases, better quality, and improved developer confidence [2]. They identified automated testing as “the cornerstone of CI benefits.”

Google’s DORA program, surveying over 39,000 professionals, consistently identifies testing automation as a key differentiator [3]. High-performing teams demonstrate 224x faster deployment frequency, 30x lower change failure rates, and automate 75% or more of their testing.

Perhaps most striking is the cost research. Stecklein et al. found that if fixing a requirements error during requirements phase costs 1 unit, that same error costs 3-8 units during design, 7-16 units during build, 21-78 units during integration testing, and 29 to over 1500 units during operations [4]. The economic argument for early testing isn’t subtle, it’s overwhelming.

The Bottom Line

These aren’t recommendations, they’re observations. They capture two different ways teams work:

Scenario 1 assumes quality is built into the process, developers are trusted and skilled, and feedback loops are fast.

Scenario 2 assumes quality is inspected after the fact, developers are resource units optimized for throughput, and QA is the last line of defense.

The models’ outputs show us the systemic consequences of the second approach: lower throughput, longer lead times, more defects, higher WIP, lower efficiency.

The question isn’t whether to test. The question is: when do you want to find your defects? The answer shapes your culture, your costs, and ultimately, your success.

Developer ownership isn’t just helpful, it’s necessary for quality. The 100x cost difference between early and late bug fixes provides overwhelming economic justification. Industry leaders like Google make developers responsible for quality, not as punishment, but as the most effective approach.

This doesn’t mean QA disappears. It means QA evolves. QA teams shift to enablement, tackling complex testing scenarios, building test infrastructure, and doing exploratory work that requires deep expertise. They become multipliers, not gatekeepers.

Continuous feedback loops enable faster delivery without sacrificing quality. But they only work when developers own quality from the start.

References

  1. M. Beller, G. Gousios, A. Panichella, S. Proksch, S. Amann, and A. Zaidman, Developer Testing in the IDE: Patterns, Beliefs, and Behavior, IEEE Transactions on Software Engineering, vol. 45, no. 3, pp. 261–284, 2019. doi:10.1109/TSE.2017.2776152
  2. M. Hilton, T. Tunnell, K. Huang, D. Marinov, and D. Dig, Usage, costs, and benefits of continuous integration in open-source projects, Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering, pp. 426–437, 2016. doi:10.1145/2970276.2970358
  3. DORA, Accelerate State of DevOps Report 2024. [Online]. Available: https://dora.dev [Accessed: Nov. 25, 2025].
  4. J. Stecklein, J. Dabney, B. Dick, B. Haskins, R. Lovell, and G. Moroney, Error Cost Escalation Through the Project Life Cycle, 2004. [Online]. Available: https://ntrs.nasa.gov/citations/20100036670 [Accessed: Nov. 25, 2025].