Remember that feeling when you finish writing a great piece of code, commit it, and then… wait? Wait for CI to finish. Wait for a staging deploy slot. Wait for the 40-minute test suite. Wait for someone else’s conflicting changes to get sorted out. Then wait some more.

If you’re a developer, you probably know this pain all too well. Staging environments have been part of our workflow for so long that we’ve just accepted them as a necessary evil. But here’s the uncomfortable truth: they’re not necessary anymore. In fact, they might be actively holding your team back.

The Staging Problem Nobody Wants to Talk About

Let me paint you a picture. Your team has 50 developers, all trying to get their work validated before it hits production. Everyone’s code has to go through staging first because, well, that’s just how it’s done. But here’s what actually happens:

You merge your code and deploy to staging. Everything looks good. Then suddenly, your tests start failing. But it’s not your code that’s broken—it’s because three other developers deployed their changes at the same time, and now everything’s interfering with each other. So you’re back to square one, trying to figure out whose change broke what.

With that many people merging, staging stops being a test environment and becomes a shared queue, where a red build tells you more about timing than about your code. It’s like trying to have 50 people edit the same Google Doc simultaneously. Chaos, right?

But the problems run deeper than just queue management. The real kicker is this: staging environments are supposed to mimic production, but they never really do. The data doesn’t match. The traffic patterns are different. The security policies aren’t quite the same. You call it “production-like,” but let’s be honest—it’s more like a distant cousin who vaguely resembles production at family gatherings.

And this fidelity gap? That’s where the truly dangerous bugs hide. The ones that sail right through your staging tests and then blow up in production because, surprise, production actually has real user behavior, real data volumes, and real edge cases that staging never accounted for.

The Hidden Costs of “Safety”

Think about what staging is actually costing you. I’m not just talking about the infrastructure costs (though those add up fast). I’m talking about developer productivity.

Every time a developer has to wait hours for feedback from a staging environment, they lose their flow state. That deep focus where you’re really in the zone? Gone. Replaced by context switching, checking Slack, and basically doing anything else while waiting for validation that might not even be accurate.

Nobody maintains staging properly either. It becomes this dumping ground for unstable builds, getting more and more divergent from production over time. Teams treat it like that junk drawer in your kitchen—you know, the one where you throw random stuff and hope you never have to actually find anything in it.

We’ve accepted this broken workflow for 20 years simply because we believed it was the only way.

The Paradigm Shift: Testing Where It Matters

Here’s where things get interesting. Some forward-thinking teams are doing something that sounds absolutely crazy at first: they’re killing their staging environments entirely and testing directly in production instead.

Before you close this tab thinking I’ve lost my mind, hear me out. This isn’t cowboy coding. This isn’t reckless. When done with the right guardrails, testing in production is actually safer and more reliable than testing in a fake environment that never quite matches reality.

The key is something called request-level isolation. Here’s how it works: instead of deploying your whole stack to a separate environment, you deploy just the service you changed alongside its stable version in production. When a test request comes in (tagged with a unique identifier), it gets routed to your new code. The clever part is what happens next: as your service calls its dependencies, those calls get routed back to the stable production services.

Your test request stays isolated as it travels through the system, while all real user traffic flows normally through the stable version. Only the services you changed run sandboxed code; everything they depend on is the same stable baseline that real users hit.
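To make the routing idea concrete, here’s a minimal Python sketch of the decision a routing layer has to make. Treat it as an illustration under assumptions, not as how any particular tool works: the `x-sandbox-id` header, the `Router` class, and the service addresses are all invented for this example.

```python
from dataclasses import dataclass, field
from typing import Dict

# Hypothetical header used to tag test requests; not taken from any real tool.
ROUTING_HEADER = "x-sandbox-id"


@dataclass
class Router:
    """Maps each service to its stable baseline plus any sandboxed versions."""
    baseline: Dict[str, str]
    # sandbox id -> {service name -> address of the sandboxed version}
    sandboxes: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def resolve(self, service: str, headers: Dict[str, str]) -> str:
        """Return the address that should handle this call."""
        sandbox_id = headers.get(ROUTING_HEADER)
        overrides = self.sandboxes.get(sandbox_id, {}) if sandbox_id else {}
        # Use the sandboxed version only for services this sandbox changed;
        # every other dependency falls back to the stable baseline.
        return overrides.get(service, self.baseline[service])


def outgoing_headers(incoming: Dict[str, str]) -> Dict[str, str]:
    """Propagate the routing tag so downstream calls stay in the same sandbox."""
    if ROUTING_HEADER in incoming:
        return {ROUTING_HEADER: incoming[ROUTING_HEADER]}
    return {}


# Example setup (placeholder addresses): one sandbox that changes only "checkout".
router = Router(
    baseline={"checkout": "checkout.prod:8080", "payments": "payments.prod:8080"},
    sandboxes={"pr-123": {"checkout": "checkout-pr-123.prod:8080"}},
)
```

A request tagged `x-sandbox-id: pr-123` resolves `checkout` to the sandboxed address and `payments` to the baseline, while an untagged request never sees the sandbox at all. The part that is easy to forget is `outgoing_headers`: if a service drops the tag on its outbound calls, isolation breaks one hop later.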

Think about what this gives you: you’re testing with real data, real network policies, real everything. But without any of the contention, queues, or infrastructure duplication of staging environments. You get high-fidelity testing without the downsides of shared environments.

Feature Flags: Your New Best Friend

Another powerful tool in the “test in production” toolkit is feature flags. These let you deploy code to production but keep it hidden behind a toggle until you’re ready to expose it to users.

The beauty of feature flags is the control they give you. You can:

  • Turn on a feature just for yourself during development to make sure it works
  • Enable it for your internal team to get feedback
  • Gradually roll it out to 10% of users, then 25%, then 50%
  • Run A/B tests to see which version performs better
  • Switch the feature off immediately (the “kill switch”) if something goes wrong
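Here’s a rough Python sketch of how a flag can cover most of that list: an allowlist for yourself or your team, a percentage rollout, and a kill switch. The `FeatureFlag` class and its field names are hypothetical, and hashing users into 100 buckets is just one common way to do percentage rollouts, not a description of any specific flag product.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Set


@dataclass
class FeatureFlag:
    name: str
    rollout_percent: int = 0  # 0-100, share of users who see the feature
    allowed_users: Set[str] = field(default_factory=set)  # you, your internal team
    killed: bool = False  # kill switch; overrides everything else

    def _bucket(self, user_id: str) -> int:
        """Deterministically map a user to a bucket from 0 to 99."""
        digest = hashlib.sha256(f"{self.name}:{user_id}".encode()).hexdigest()
        return int(digest, 16) % 100

    def is_enabled(self, user_id: str) -> bool:
        if self.killed:
            return False
        if user_id in self.allowed_users:
            return True
        # A user's bucket never changes, so raising the percentage only adds
        # users; nobody who already has the feature loses it mid-rollout.
        return self._bucket(user_id) < self.rollout_percent
```

Because the bucket is derived from the flag name and the user ID, the same user gets the same answer on every request, and different flags split the user base differently. Going from 10% to 25% is a one-field change, and setting `killed = True` turns the feature off for everyone, allowlist included.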

Large companies such as Facebook have reportedly operated this way for years: ship a new feature to one market first (Brazil is the example usually given), check the error rates, and only then roll out globally. Not because they didn’t care about Brazilian users, but because real production data from a limited, easier-to-monitor segment was far more valuable than staging environment tests.

But What About the Risks?

I know what you’re thinking. “This sounds great, but what if something breaks and affects real users?”

Valid concern. But here’s the thing: properly implemented testing in production with feature flags and request isolation is often safer than traditional staging approaches. Why? Because you’re testing against reality, not a pale imitation of it.

Plus, with good monitoring and observability tools, you can catch issues quickly. If a problem shows up when you roll out to 10% of users, you can roll it back before it reaches the other 90%. Compare that to staging, where a problem might not show up at all until you deploy to production and suddenly discover that your database migration works fine with 1,000 test records but locks up completely with 10 million real records.
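That “roll out, watch, roll back” loop is simple enough to sketch in a few lines of Python. This is a simplified illustration: `set_percent` and `error_rate` are hypothetical callbacks standing in for your flag system and your monitoring, the 1% threshold is an arbitrary placeholder, and a real rollout would wait at each stage long enough to collect meaningful data before reading the error rate.

```python
from typing import Callable, Sequence


def staged_rollout(
    stages: Sequence[int],
    set_percent: Callable[[int], None],
    error_rate: Callable[[], float],
    max_error_rate: float = 0.01,
) -> bool:
    """Advance through rollout stages; roll back to 0% on the first bad reading.

    Returns True if every stage stayed under the threshold, False if the
    rollout was rolled back.
    """
    for percent in stages:
        set_percent(percent)
        if error_rate() > max_error_rate:
            set_percent(0)  # roll back before widening exposure any further
            return False
    return True
```

Called with stages like `[10, 25, 50, 100]`, the function never widens exposure past a stage that looks unhealthy, which is exactly the property staging can’t give you: the check runs against real traffic.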

Many teams have learned this lesson the hard way. They’ve had code that passed all staging tests with flying colors, only to cause catastrophic failures in production because of data discrepancies or scale issues that staging never exposed.

The Middle Ground: Review Apps

Not every team is ready to jump straight to testing in production, and that’s okay. There’s a middle ground that’s worth considering: review apps.

A review app is a standalone copy of your application that gets automatically generated for each pull request. It lets you test changes in isolation before they even get merged. This approach combines many of the advantages of staging (isolated testing, stakeholder feedback) with fewer of the drawbacks (no shared queue, no mysterious state issues).

The workflow becomes: write code, open a pull request, get an automatic review app spun up, let QA and product folks test it, gather feedback, and iterate—all before the code even reaches the main branch. It tightens the feedback loop dramatically.
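The lifecycle is the important part: one environment per pull request, created when the PR opens and torn down when it closes. Here’s a small Python sketch of just that bookkeeping. The `ReviewApps` class and the `review.example.com` domain are placeholders, the action names follow GitHub’s `pull_request` webhook events, and the actual provisioning (containers, databases, DNS) is deliberately left out.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ReviewApps:
    """In-memory stand-in for whatever actually provisions environments."""
    base_domain: str = "review.example.com"  # placeholder domain
    active: Dict[int, str] = field(default_factory=dict)  # PR number -> URL

    def on_pull_request(self, number: int, action: str) -> None:
        if action in ("opened", "reopened", "synchronize"):
            # One environment per PR; new commits reuse the same URL.
            self.active[number] = f"https://pr-{number}.{self.base_domain}"
        elif action == "closed":
            # Tear down on merge or close so idle copies do not pile up.
            self.active.pop(number, None)
```

Keeping the URL stable across pushes means QA and product folks can bookmark one link per PR, and tearing down on close keeps the number of running copies tied to the number of open PRs.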

Sure, review apps have some challenges. The tooling can be complex for sophisticated systems, and running multiple review apps simultaneously can get pricey. But for many teams, the productivity gains far outweigh the costs.

What This Really Means

The move away from staging isn’t just about adopting a new tool or technique. It represents a fundamental shift in how we think about software development and deployment.

Teams deprecating staging environments represent a broader trend: the rejection of approximation in favor of reality. Staging environments are artifacts from when duplicating infrastructure was harder than coordinating humans around shared resources. That era is ending.

Modern development is fast-paced. Your competitors aren’t waiting around for multi-hour staging validation cycles. They’re shipping features, gathering real user feedback, and iterating quickly. If your process forces developers to wait for slow, unreliable staging environments, you’re not just frustrating your team—you’re losing competitive ground.

Making the Transition

So how do you actually move away from staging? Here are some practical steps:

Start by auditing your current process. How much time does your team actually spend waiting for staging? How often do staging tests fail for reasons that have nothing to do with the change under test, and how often do they pass a change that later breaks production? What’s the real ROI of your staging environment?
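If you can export even a rough log of staging runs, the audit comes down to a few lines of arithmetic. The sketch below assumes a hypothetical `StagingRun` record; the field names are invented, and in practice you’d fill them in from your CI history and incident tracker.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class StagingRun:
    wait_minutes: float            # time from merge to a staging verdict
    passed: bool                   # did staging report success?
    bug_found_in_prod: bool        # did the change later cause a production bug?
    failure_was_real: bool = True  # for failed runs: was this change at fault?


def audit(runs: List[StagingRun]) -> Dict[str, float]:
    """Summarize waiting time and how trustworthy staging verdicts were."""
    failed = [r for r in runs if not r.passed]
    passed = [r for r in runs if r.passed]
    return {
        "total_wait_hours": sum(r.wait_minutes for r in runs) / 60,
        # Share of failures caused by something other than the change itself.
        "false_alarm_rate": (
            sum(not r.failure_was_real for r in failed) / len(failed)
            if failed else 0.0
        ),
        # Share of green runs that still led to a production bug.
        "missed_bug_rate": (
            sum(r.bug_found_in_prod for r in passed) / len(passed)
            if passed else 0.0
        ),
    }
```

A high false alarm rate means staging is costing you time without telling you much about your code; a high missed bug rate means it isn’t catching what it’s supposed to catch.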

Then, build the necessary infrastructure. You need excellent automated testing, robust monitoring and observability in production, and feature flag capabilities. These are your safety nets.

Consider implementing request-level isolation for your microservices. Tools exist that can help you set this up without massive engineering effort.

Start small. Maybe test one low-risk service directly in production first. Build confidence. Learn what works for your team. Then gradually expand the approach.

The goal isn’t to be reckless. It’s to be more efficient and more accurate in your testing. It’s about acknowledging that an imperfect copy of production (staging) will never be as valuable as careful, controlled testing in the actual production environment.

Insights

Staging environments made sense 20 years ago when our tools and practices were different. But technology has evolved. We have better isolation techniques, more sophisticated deployment strategies, and vastly improved monitoring capabilities.

The question isn’t “Can we afford to test in production?” It’s “Can we afford not to?” Every hour your developers spend waiting for staging validation is an hour they’re not building new features. Every bug that staging misses but production catches is proof that your safety net has holes in it.

The teams making this shift aren’t just moving faster—they’re shipping more reliable code. They’ve stopped accepting the broken status quo and started embracing a new paradigm that matches how modern software should be built and deployed.

Your staging environment isn’t protecting you as much as you think it is. It might be time to kill it and embrace something better.


What’s your team’s experience with staging environments? Have you tried testing in production, or are you considering it? The conversation around this shift is just getting started, and there’s no one-size-fits-all solution. But one thing’s clear: the old ways of doing things aren’t cutting it anymore.
