Seven hours, one bad assumption
On Monday our QA manager reported a performance regression in an upcoming release. A feature flag that was supposed to improve query performance was actually making things worse.
The feature controlled automatic date filtering - ensuring that queries always included a 90-day window at most, which meant the database could use its date-based partitions instead of scanning entire tables. In production, where data spans years, that's a meaningful optimization. The queries would hit one or two partitions instead of all of them.
But in the test environment, turning the flag on made performance worse. The QA manager flagged it, and we pulled a group together to figure out what was going on.
There were five new configuration options in this release, all with some impact on the endpoint in question. We spent the next seven hours on a call working through combinations - toggling flags, comparing results against the baseline, trying to isolate which setting was causing the regression. None of the combinations we tried explained the regression. The feature that should have been our biggest improvement was consistently slower.
Eventually I pulled up the execution plan. Full table scan. The partition pruning wasn't happening at all, which shouldn't have been possible with a 90-day date filter in the query. So I checked the test data directly.
Every row in the test database had a date within the same month.
When all your data fits inside a single partition, adding a date filter doesn't help the query planner - it's already reading the only partition that exists. Worse, the overhead of the filter logic itself made things marginally slower. The optimization was working exactly as designed. It just had nothing to optimize against.
Someone had told us the test data was representative of production. We took that at face value and spent seven hours chasing a problem that didn't exist. Reseeding the database with dates spread across multiple years - matching what production actually looks like - immediately showed the performance improvement we expected.
The frustrating part isn't that the data was wrong. Test environments drift. Seed scripts get written once and never updated. That's normal. The frustrating part is that we had an explicit statement that the data was accurate, and nobody verified it before we started testing. Including me.
"Trust but verify" is one of those phrases that sounds obvious until you're in the middle of a debugging session and someone tells you the environment is good. It's easy to take that at face value when there's pressure to find the real problem. Stopping to question whether the test setup is valid feels like a detour when you're already hours into an investigation.
But seven hours is a lot of detour. If any of us had spent 15 minutes looking at the shape of the test data before we started, we would have found the problem immediately. The fix was a database reseed. Everything after that confirmed the feature was working.
I don't think the takeaway is that you can't trust your colleagues. It's that verifying foundational assumptions should be the first step, not the last resort. Check the data. Check the environment. Check the thing everyone agrees is fine, because that's usually where the problem is hiding.