Too Important to Shut Down
A friend of mine told me a story recently that's been stuck in my head.
His company made an acquisition and inherited the acquired company's infrastructure along with it. Most of it was fine. But one system stood out: a desktop NAS with a couple of drives striped in RAID 0, running the financial system.
For anyone unfamiliar, RAID 0 stripes data across the drives purely for capacity and speed; if a single drive fails, all the data is gone. No redundancy, no recovery. This wasn't a dev environment or a staging box. This was production financial data. His team did the investigation, put together a proposal to migrate it onto their standard platform, and presented the risks. Management said no. Too expensive for what they considered a legacy system.
So they moved the NAS into their datacenter, plugged it into UPSes and generators, and left it running. For about three years.
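To put a rough number on that gamble: I don't know what drives were actually in the box, but assuming a fairly typical annualized failure rate of around 3% per drive (my assumption, not a detail from the story), a quick back-of-the-envelope sketch looks something like this:

```python
# Back-of-the-envelope: probability a RAID 0 array loses data,
# assuming independent drive failures at a fixed annualized rate.
# The 3% AFR, 2 drives, and 3 years are illustrative assumptions.
afr = 0.03     # assumed annualized failure rate per drive
drives = 2     # drives striped in RAID 0 (no redundancy)
years = 3      # how long the array ran

p_drive_survives = (1 - afr) ** years
p_array_survives = p_drive_survives ** drives
p_data_loss = 1 - p_array_survives

print(f"Chance of losing the array over {years} years: {p_data_loss:.1%}")
# ~17% with these assumptions
```

Under those assumptions, that's roughly a one-in-six chance of losing regulated financial data over the three years it sat there.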
Here's the part that sticks with me. Industry regulations make retaining legacy production and order data effectively mandatory. The data had to be available, but the system it lived on wasn't worth investing in, according to the people making those decisions. Too important to shut down, not important enough to protect.
The thing is, nobody actually decided to accept that risk. There was no meeting where someone said "we understand a drive failure would mean permanent data loss, and we're okay with that." What happened was a proposal got rejected, and the absence of action became the policy. The risk was accepted by default.
I think this pattern is more common than people realize. It's not that organizations deliberately choose to run critical systems on fragile infrastructure. It's that fixing those systems requires someone to approve a project, allocate budget, and prioritize it against everything else competing for attention. And when the system is currently working, "currently working" tends to win.
The people closest to the problem - the engineers who can see the RAID 0 configuration and know what happens when a drive fails - aren't the ones who control the budget. And the people who control the budget are evaluating the proposal against a dozen other proposals, most of which have more visible upside than "prevent a disaster that hasn't happened yet."
My friend's manager probably agreed with the proposal. I don't know that for certain, but in my experience, first-line managers usually understand the risk their teams are flagging. The problem is that understanding the risk and being able to do something about it are two different things. Managers are constrained by the priorities set above them, and "keep a legacy system from catastrophically failing" doesn't compete well against revenue-generating projects on a roadmap.
Eventually, the data got moved as part of a larger project to refactor several legacy applications into a reporting database. But for three years, the company sat one drive failure away from a regulatory problem, and the implicit answer was "we'll deal with it if it happens."
I've seen versions of this at every company I've worked at. A system that everyone knows is fragile, that everyone agrees should be fixed, that nobody can get prioritized. The risk just sits there, getting a little worse every day, until either something breaks or someone finds a way to bundle the fix into a project that leadership already cares about.