What If We Didn't Need a QA Team Anymore?
I’ve been researching this for weeks. The tools exist. The playbooks exist. Big tech has done it. But most of us haven’t. Here’s what I’ve found — and what I think it’ll actually take.
I manage a fullstack engineering team. We own web, mobile web, native apps, and a bunch of backend microservices. And like most teams, we have QA engineers who manually test features before they ship.
It works. Sort of. The feedback loop is slow — 2 to 3 days between a dev finishing a feature and QA finishing their pass. Bugs bounce back and forth. Releases pile up waiting for sign-off. And every time we try to speed up, quality takes a hit because QA can’t keep up with the pace.
So I started asking a question that felt almost taboo: what would it take to not need a separate QA function at all?
Not zero quality — I want to be clear about that. Zero manual testing effort. Zero QA bottleneck. Developers own quality end-to-end, supported by AI tools, automated pipelines, and production observability.
I’ve spent weeks going deep on this. Reading papers, trying tools, studying how Netflix, Google, and Microsoft operate. And I’ve come away thinking: this is genuinely achievable. The pieces are all there. But I also don’t think most of us — me included — have actually done the hard work of putting them together in a real team with real deadlines and real production pressure.
This article is me sharing what I’ve found, being honest about what I haven’t tried yet, and thinking out loud about what it would take to actually pull this off.
The Case That Got Me Thinking
Netflix hasn’t had a traditional QA team for years. They call their model the “Full Cycle Developer” — one team owns everything from design to production support. Google treats quality as a dev responsibility. PRs without tests don’t get merged. Period.
But the one that really got me was Microsoft. They had a dedicated SDET role with a 2:1 developer-to-tester ratio. Then in 2014, they retired the role entirely. Brian Harry, a Technical Fellow there, said the dedicated tester function was creating “lack of accountability on developers” and “making refactoring very hard.” After the transition, the Skype for Web team went from bi-weekly to daily shipping.
These aren’t startups with 5 engineers. These are some of the largest, most complex software organizations in the world. And they concluded that a separate QA function was slowing them down, not helping them.
Now — I know what you’re thinking. “We’re not Netflix.” Fair. But the tools and practices that made this possible for them are now available to everyone. That’s what’s changed.
The Frontend Layer: This Is Where I’m Most Convinced
The frontend testing space has changed so much in the last 18 months that I almost didn’t believe it when I started digging in.
Visual regression testing actually works now. Not the old pixel-diff tools that generated 500 false positives. Tools like Applitools Eyes use what they call Visual AI — it understands layout semantics, so it knows the difference between a real CSS regression and a harmless rendering variation. Percy by BrowserStack does something similar with DOM snapshots and just launched an AI review agent that filters out about 40% of false positives.
Chromatic turns your Storybook stories into visual tests with zero configuration. Every PR gets compared across browsers automatically. I haven’t tried it on our codebase yet, but the concept is solid — if you already have Storybook, the incremental effort is close to nothing.
But here’s the part that really surprised me: natural language E2E testing. Tools like testRigor, Momentic, and Autify let you describe test flows in plain English. Not pseudo-code, not CSS selectors — actual sentences like “click Submit Order and verify the confirmation page appears.” The AI figures out how to locate elements at runtime.
testRigor has a case study where IDT Corporation went from 34% to 91% test automation in 9 months using their existing manual QA staff. I’ll admit I’m skeptical of vendor case studies, but even if the real number is half that, it’s a massive shift.
The self-healing angle is interesting too. Traditional E2E tests break constantly because a developer changes a CSS class or renames a div. Intent-based tools like Momentic and Zerocheck don’t use selectors at all — they describe elements by what they are (“the login button”) and resolve them using AI at runtime. The promise is near-zero test maintenance. I haven’t validated this claim at scale, but the logic is sound.
My honest take: For web frontend, I think we can get to zero manual QA with current tools. Visual regression + natural language E2E + accessibility scanning (axe-core catches about 57% of issues automatically) covers the vast majority of what a manual QA engineer does. The remaining gap — UX quality, confusing workflows, screen reader testing — would need periodic human attention, but not a dedicated team.
Mobile: The Hardest Problem, But Cracking Open
Mobile testing is where my confidence drops a bit. The fragmentation is brutal — 20,000+ Android device types, different OS versions, gesture interactions, performance on low-end hardware in bad network conditions. This is where most QA teams spend the bulk of their manual effort, and for good reason.
That said, the tooling has gotten dramatically better.
Maestro is the one I keep hearing about. Open source, declarative YAML, works across Android, iOS, React Native, and Flutter. Microsoft, Meta, and DoorDash use it. It has built-in flakiness tolerance — auto-waits, pixel-by-pixel retries — and Todoist reports hitting 99% test reliability after switching to it. I want to try this on our native app flows.
KaneAI by LambdaTest ($299/month) is positioning itself as the first GenAI-native mobile testing agent. You describe objectives in natural language, it generates and runs tests on real devices. I’ve only seen demos, not production results from teams I trust, so I’m cautious.
For device coverage, BrowserStack (20,000+ real devices) and Pcloudy (5,000+ devices with an AI agent platform) handle the fragmentation problem in theory. But running your full test suite across hundreds of device/OS combos in CI is expensive and slow. How teams manage this tradeoff in practice — what subset of devices is “enough” — is something I still need to figure out.
The Uber paper is fascinating though. Their ICSE 2026 paper describes a system called DragonCrawl that uses LLMs to drive mobile apps while fault injection breaks backend services. They ran 180,000+ automated chaos tests and found crash-causing bugs that only appeared through mobile chaos testing — not through backend testing alone. Their insight: in a distributed system, you need to test what happens to the user experience when services fail, not just whether each layer works in isolation.
Now, Uber has resources most of us don’t. But the idea — AI-driven test execution combined with chaos engineering on mobile — feels like the direction things are heading.
My honest take: Mobile is achievable but harder than web. I think Maestro + a real device cloud + visual regression gets you 70-80% of the way. The remaining 20% — OS-specific edge cases, gesture interactions on weird devices, performance on budget phones — might still need some human attention. Not a QA team, but maybe periodic focused testing sprints.
Backend: This One I’m Actually Fairly Confident About
Backend microservices testing has always been messy. The traditional approach — spin up 8 services in a shared staging environment, run E2E tests, spend 3 hours debugging why the test failed because someone else deployed a broken build to staging — doesn’t scale.
But contract testing changes the game completely, and this isn’t new or unproven. Pact lets each service define and verify its API contracts independently. Consumer defines expectations, provider verifies it can meet them. The can-i-deploy CLI check gates deployments when contracts break. This replaces the need for integrated E2E environments for most service-to-service validation. It runs in seconds, not minutes. It’s deterministic, not flaky.
For API edge cases, Schemathesis auto-generates thousands of test cases from your OpenAPI or GraphQL schema. It typically finds 5-15 issues on first run — 500 errors, validation bypasses, schema violations. It’s open source and requires zero per-endpoint maintenance since it adapts as schemas evolve.
For resilience testing, chaos engineering tools like Gremlin and LitmusChaos (CNCF project, free) let you inject failures — network latency, pod crashes, timeout scenarios — and verify that retry logic, circuit breakers, and fallback mechanisms actually work. This catches the class of bugs that only appear under failure conditions, which is exactly the class of bugs that manual QA almost never finds.
And for deployment safety, progressive delivery with canary analysis is basically automated smoke testing. Argo Rollouts shifts traffic in small increments (1% >> 5% >> 25% >> 100%), compares error rates and latency against baselines, and auto-rolls back if metrics breach thresholds. No human in the loop.
My honest take: Backend is the layer where zero manual QA feels most achievable today. Contract testing + schema-based fuzzing + chaos engineering + canary deployments covers the territory. The tooling is mature, the patterns are well-documented, and the ROI is clear. If I were starting the zero-QA transition somewhere, I’d start here.
The AI Testing Landscape: Real But Not Magic
I need to talk about the AI angle honestly because there’s a lot of hype mixed in with real progress.
Gartner published its first-ever Magic Quadrant for AI-Augmented Software Testing Tools in October 2025. That’s significant — it means the category is real, not just vendor marketing.
The distinction that matters is AI-assisted vs. AI-agentic testing. AI-assisted means an LLM helps you write tests faster — you still decide what to test. AI-agentic means a system observes your app, reasons about what to test, generates tests, executes them, and reports findings with minimal human guidance.
We’re somewhere in between. Tools like Tricentis’s agentic testing, testRigor, and Qodo can generate tests from requirements and execute them. Tricentis reports one customer achieving an 85% reduction in manual effort. These are real tools shipping real value.
But fully autonomous, unsupervised AI QA? That’s not here yet. The current model is what I’d call supervised autonomy — AI agents generate and explore, humans review before committing to regression suites. The 2025 World Quality Report says AI “still requires oversight and is not yet reliable for fully unsupervised operation.” I think that’s accurate.
What concerns me about AI-generated code: Tricentis’s CEO noted that over 40% of code written last year was AI-generated. But a Stack Overflow survey found 88% of respondents weren’t confident deploying AI-generated code, and a GitLab survey found 29% had to roll back releases due to AI errors. More code, shipped faster, with less confidence. That actually increases the need for automated quality gates, not decreases it.
My honest take: AI testing tools are a force multiplier, not a replacement for strategy. They’ll cut your test creation and maintenance effort by 50-80%. But someone still needs to define what quality means for your product, decide what to test at what layer, and interpret results. AI is the engine, not the driver. At least not yet.
The Culture Problem Nobody Wants to Solve
Here’s the uncomfortable truth: tools are maybe 40% of this problem. The other 60% is culture.
If you fire your QA team tomorrow and tell developers “you own quality now” — without investing in tooling, training, and accountability — quality will collapse. This has happened. It’s not theoretical.
The companies that made this work all followed the same sequence:
- Invest in tooling first. Netflix built Spinnaker for deployments, Atlas for monitoring — “paved road” infrastructure that makes doing the right thing easier than doing the wrong thing. Only after the tools were in place did they change the org structure.
- Train developers to think like testers. This isn’t natural for most devs. They test the happy path and move on. You need bootcamps, pairing sessions, and code review practices that explicitly check for test quality — not just test existence.
- Make accountability real. At Netflix, if your service breaks in production, you fix it. No handoffs. No excuses. At Google, PRs without tests get rejected. These aren’t just policies — they’re cultural norms enforced by peers.
- Transition QA people, don’t just cut them. The people who understand your product’s edge cases, your users’ weird behaviors, your historical bug patterns — that knowledge is incredibly valuable. Transition them to Quality Engineers who build test infrastructure, or Quality Ops who own production monitoring, or Product Quality who focus on accessibility and UX.
I keep coming back to this line: “The companies that failed at this eliminated QA headcount. The companies that succeeded transformed the QA function.”
I haven’t done this transition yet. I’m being honest about that. But I’ve seen enough teams attempt it — some successfully, some not — to know that the culture work is the hard part, not the tooling.
What the Pipeline Would Look Like
If I were designing the zero-manual-QA pipeline for my team, here’s what I’d build:
On PR creation: Automated linting, static analysis (SonarQube), type checking. All on pre-commit hooks and CI triggers. No human involvement.
Pre-merge CI: Build >> unit tests with coverage thresholds >> contract tests (Pact) >> security scanning (Snyk, Semgrep) >> critical-path E2E smoke tests. Under 10 minutes total.
Post-merge: Full Playwright E2E regression >> visual regression (Chromatic or Percy) >> performance tests (Lighthouse CI, k6) >> DAST security scanning. Maybe 20-30 minutes.
Deployment: Canary at 1-5% traffic >> automated metric comparison >> progressive rollout >> auto-rollback on SLO breach. Feature flags for instant killswitch.
Production: Real User Monitoring (Datadog RUM) >> synthetic monitoring (Checkly) >> error tracking (Sentry) >> automated post-deploy smoke tests.
Every stage is automated. Every gate has clear pass/fail criteria. No “QA sign-off” step.
I think this is buildable in 9-12 months for a team that’s willing to invest. The rough cost — tooling plus 2-3 quality/platform engineers to maintain it — runs $400K-$800K annually. A 10-person QA team costs $1.5M+. The math works even before you factor in the speed improvements.
What I’m Still Unsure About
I want to be honest about the gaps in my thinking.
Flaky tests at scale. Slack found that 57% of their failing builds were caused by flaky tests before they built automated detection. If your automated pipeline is unreliable, developers will stop trusting it — which is worse than having no pipeline. I think Playwright is much better than Selenium here, but I haven’t operated a 500+ test suite at scale to say for sure.
The exploratory testing gap. Automated tests verify what you expect. They don’t find what you don’t expect. Exploratory testing — a human poking around the app with curiosity — catches the weird stuff. AI is starting to do this (Momentic’s autonomous crawling, for example), but I’m not convinced it matches a skilled human tester’s intuition yet.
Accessibility beyond automation. axe-core catches maybe 57% of WCAG issues. Screen reader testing, cognitive accessibility, confusing navigation flows — these need human judgment. I don’t see a path to fully automating this.
The “developers test the happy path” problem. This is a real cognitive bias. Devs build the feature, so they unconsciously test the paths they designed. Breaking this requires culture change, AI-generated edge cases, and mutation testing (Stryker) to catch weak tests. I believe it’s solvable but I haven’t solved it.
Regulated industries. If you’re in fintech, healthcare, or anything with compliance requirements, “zero QA” might need different guardrails. Audit trails, SOC 2 compliance, formal test evidence — these create constraints I haven’t fully thought through.
How I’d Roll This Out (If I Were Starting Tomorrow)
I wouldn’t do a big bang. I’d do it in phases, prove value at each stage, and use data to justify the next investment.
Months 1-3 — Foundation: SonarQube quality gates. Unit test coverage enforcement (start at 70%, ratchet up). Sentry for error tracking. Snyk for dependency scanning. QA team continues as-is while developers start learning automation tools.
Months 3-6 — Automation build-out: Playwright E2E framework. Pact contract tests for top 10 service boundaries. Chromatic or Percy for visual regression on the web layer. Maestro for mobile smoke tests. QA engineers retitled to “Quality Engineers” and start building framework.
Months 6-9 — Production observability: Datadog RUM or equivalent. Feature flags (LaunchDarkly or Flagsmith). Canary deployments with Argo Rollouts. Define SLOs and error budgets. Manual regression reduced by 50%.
Months 9-12 — Full pipeline: Mutation testing (Stryker) on critical modules. Complete canary pipeline with automated rollback. Manual regression sign-off eliminated. Quality Engineers focus on exploratory testing, chaos engineering, accessibility, and strategy.
Would I measure success with DORA metrics? Yes — deployment frequency, lead time, change failure rate, recovery time. But I’d also track defect escape rate (bugs found in production vs. pre-production), flaky test rate (budget: max 2%), and developer satisfaction with the testing process.
Why I Think This Matters Now
67% of teams have already shifted primary testing responsibility to developers. The AI testing market hit $1 billion in 2025. Over 40% of code is now AI-generated — which means more code, shipped faster, with potentially more edge cases.
The traditional QA model was designed for a world where humans wrote all the code, releases happened monthly, and you could afford 3-day testing cycles. That world doesn’t exist anymore.
I’m not saying every team should eliminate QA next quarter. I’m saying every engineering leader should be seriously thinking about what their quality strategy looks like in a world where AI writes code, AI tests code, and deployment happens multiple times a day.
The leaders who figure this out — who build the pipeline, drive the culture shift, and manage the human transition with empathy — are going to be the ones that top companies fight to hire.
I haven’t fully figured it out yet. But I’m going to try. And I’ll write about what I learn.
If you’re thinking about this transition for your team, I’d genuinely love to compare notes. What tools have you tried? What worked? What blew up in your face? Drop a comment or connect — I’m especially interested in hearing from teams at 50-500 engineers who’ve attempted this.