“Why worry about something that isn’t going to happen?”
KGB Chairman Charkov’s question to inorganic chemist Valery Legasov in HBO’s “Chernobyl” miniseries makes a fitting epitaph for the hundreds of software development, modernization, and operational failures I’ve covered for IEEE Spectrum since my first contribution, to its September 2005 special issue on learning, or rather not learning, from software failures. I noted then, and it’s still true twenty years later: Software failures play no favorites. They happen in every country, to large companies and small. They happen in commercial, nonprofit, and governmental organizations, regardless of status or reputation.
Global IT spending has more than tripled in constant 2025 dollars since 2005, from US $1.7 trillion to $5.6 trillion, and continues to rise. Despite the added spending, software success rates haven’t markedly improved over the past two decades. The result is that the business and societal costs of failure continue to grow as software proliferates, permeating and interconnecting every aspect of our lives.
For those hoping AI software tools and coding copilots will quickly make large-scale IT software projects successful, forget it. For the foreseeable future, there are hard limits on what AI can bring to the table in controlling and managing the myriad intersections and trade-offs among systems engineering, project, financial, and business management, and especially the organizational politics involved in any large-scale software project. Few IT projects are displays of rational decision-making from which AI can or should learn. As software practitioners know, IT projects suffer from enough management hallucinations and delusions without AI adding to them.
As I noted 20 years ago, the drivers of software failure frequently are failures of human imagination, unrealistic or unarticulated project goals, the inability to handle the project’s complexity, or unmanaged risks, to name a few that still regularly cause IT failures today. Numerous others go back decades, such as those identified by Stephen Andriole, the chair of business technology at Villanova University’s School of Business, in the diagram below, first published in Forbes in 2021. Uncovering a software system failure that has gone off the rails in a unique, previously undocumented way would be surprising, because the overwhelming majority of software-related failures involve avoidable, known failure-inducing factors documented over decades in hundreds of after-action reports, academic studies, and technical and management books. Failure déjà vu dominates the literature.
The question is, why haven’t we applied what we have repeatedly been forced to learn?
Steve Andriole

The Phoenix That Never Rose
Many of the IT development and operational failures I’ve analyzed over the past 20 years have each had their own Chernobyl-like meltdowns, spreading reputational radiation everywhere and contaminating the lives of those affected for years. Each typically has a story that strains belief. A prime example is the Canadian government’s CA $310 million Phoenix payroll system, which went live in April 2016 and shortly after went supercritical.
Phoenix project executives believed they could deliver a modernized pay system, customizing PeopleSoft’s off-the-shelf payroll package to comply with 80,000 pay rules spanning 105 collective agreements with federal public-service unions. The project also attempted to implement 34 human-resource system interfaces across the 101 government agencies and departments required for sharing employee data. Further, the government’s development team thought it could accomplish all this for less than 60 percent of the vendor’s proposed budget. They would save by removing or deferring critical payroll functions, reducing system and integration testing, cutting the number of contractors and government staff working on the project, and forgoing essential pilot testing, among a host of other overly optimistic proposals.
Phoenix’s payroll meltdown was preordained. As a result, over the past nine years, around 70 percent of the 430,000 current and former Canadian federal government employees paid through Phoenix have endured paycheck errors. Even as recently as fiscal year 2023–2024, a third of all employees experienced paycheck errors. The ongoing financial stress and anxiety for thousands of employees and their families have been immeasurable. Not only are recurring paycheck troubles sapping worker morale, but in at least one documented case, a coroner blamed an employee’s suicide on the unbearable financial and emotional strain she suffered.
By the end of March 2025, when the Canadian government had promised that the backlog of Phoenix errors would finally be cleared, over 349,000 were still unresolved, with 53 percent pending for more than a year. In June, the Canadian government once again committed to significantly reducing the backlog, this time by June 2026. Given the earlier promises, skepticism is warranted.
The question is, why haven’t we applied what we have repeatedly been forced to learn?
What percentage of software projects fail, and what failure even means, has been an ongoing debate within the IT community stretching back decades. Without diving into that debate, it’s clear that software development remains one of the riskiest technological endeavors to undertake. Indeed, according to Bent Flyvbjerg, professor emeritus at the University of Oxford’s Saïd Business School, comprehensive data shows that not only are IT projects risky, they are the riskiest from a cost perspective.
The CISQ report estimates that organizations in the United States spend more than $520 billion annually supporting legacy software systems, with 70 to 75 percent of organizational IT budgets devoted to legacy maintenance. A 2024 report by services company NTT DATA found that 80 percent of organizations concede that “insufficient or outdated technology is holding back organizational progress and innovation efforts.” Moreover, the report says that nearly all C-level executives believe legacy infrastructure thwarts their ability to respond to the market. Even so, given that the cost of replacing legacy systems is often many multiples of the cost of supporting them, business executives hesitate to replace them until doing otherwise is no longer operationally feasible or cost-effective. The other reason is a well-founded fear that replacing them will turn into a debacle like Phoenix or others.
Nevertheless, there have been ongoing attempts to improve software development and sustainment processes. For example, we have seen increasing adoption of iterative and incremental ways to develop and sustain software systems through Agile approaches, DevOps methods, and other related practices.
The goal is to deliver usable, dependable, and affordable software to end users in the shortest feasible time. DevOps strives to accomplish this continuously throughout the entire software life cycle. While Agile and DevOps have proved successful for many organizations, they also have their share of controversy and pushback. Provocative reports claim that Agile projects have a failure rate of up to 65 percent, while others claim that up to 90 percent of DevOps projects fail to meet organizational expectations.
You should be wary of those claims while also acknowledging that successfully implementing Agile or DevOps methods takes consistent leadership, organizational discipline, patience, investment in training, and culture change. However, the same requirements have always applied when introducing any new software platform. Given the historic lack of organizational resolve to instill proven practices, it’s not surprising that novel approaches for developing and sustaining ever more complex software systems, no matter how effective they may be, will also frequently fall short.
Persisting in Stupid Mistakes
The frustrating and perpetual question is why basic IT project-management and governance mistakes in software development and operations continue to occur so often, given the near-total societal reliance on reliable software and an extensively documented history of failures to learn from. Next to the electrical infrastructure, with which IT is increasingly merging into a mutually codependent relationship, the failure of our computing systems is an existential threat to modern society.
Frustratingly, the IT community stubbornly fails to learn from prior failures. IT project managers routinely claim that their project is somehow different or unique and, thus, that lessons from earlier failures are irrelevant. That’s the excuse of the arrogant, though usually not the ignorant. In Phoenix’s case, for example, it was the government’s second payroll-system replacement attempt, the first effort having ended in failure in 1995. Phoenix project managers ignored the well-documented causes of the first failure because they claimed its lessons weren’t applicable, which did nothing to keep the managers from repeating them. As has been said, we learn more from failure than from success, but repeated failures are damn expensive.
Not all software development failures are bad; some failures are even desirable. When pushing the boundaries of developing new kinds of software products, technologies, or practices, as is happening with AI-related efforts, potential failure is an accepted risk. With failure, experience increases, new insights are gained, fixes are made, constraints are better understood, and technological innovation and progress continue. However, most IT failures today are not about pushing the innovative frontiers of the computing art but the edges of the mundane. They do not represent Austrian economist Joseph Schumpeter’s “gales of creative destruction.” They are more like gales of financial destruction. Just how many more enterprise resource planning (ERP) project failures are needed before success becomes routine? Such failures should be called IT blunders, as learning anything new from them is doubtful at best.
Was Phoenix a failure or a blunder? I argue strongly for the latter, but at the very least, Phoenix serves as a master class in IT project mismanagement. The question is whether the Canadian government learned from this experience any more than it did from 1995’s payroll-project fiasco. The government maintains it will learn, which may prove true, given the Phoenix failure’s high political profile. But will Phoenix’s lessons extend to the thousands of outdated Canadian government IT systems needing replacement or modernization? Hopefully, but hope is not a strategy, and purposeful action will be necessary.
The IT community has striven mightily for decades to make the incomprehensible routine.
Repeatedly making the same mistakes and expecting a different result is not learning. It’s a farcical absurdity. Paraphrasing Henry Petroski in his book To Engineer Is Human: The Role of Failure in Successful Design (Vintage, 1992), we may have learned how to calculate the software failure due to risk, but we have not learned how to calculate away the failure of the mind. There is a plethora of examples of projects like Phoenix that failed in part because of bumbling management, yet it is extremely difficult to find software projects that were managed professionally and still failed. Finding examples of what could be termed “IT heroic failures” is like Diogenes searching for one honest man.
The consequences of not learning from blunders will be much greater and more insidious as society grapples with the growing effects of artificial intelligence, or more accurately, “intelligent” algorithms embedded into software systems. Hints of what might happen if past lessons go unheeded are found in the spectacular early automated decision-making failures of Michigan’s MiDAS unemployment system and Australia’s Centrelink “Robodebt” welfare system. Both used questionable algorithms to identify deceptive payment claims without human oversight. State officials used MiDAS to accuse tens of thousands of Michiganders of unemployment fraud, while Centrelink officials falsely accused hundreds of thousands of Australians of being welfare cheats. Untold numbers of lives will never be the same because of what happened. Government officials in Michigan and Australia placed far too much trust in these algorithms. They had to be dragged, kicking and screaming, to acknowledge that something was amiss, even after it was clearly demonstrated that the software was untrustworthy. Even then, officials tried to downplay the errors’ impact on people, then fought against paying compensation to those adversely affected by the errors. While such behavior is legally termed “maladministration,” administrative evil is closer to reality.
So, we are left with only a professional and personal obligation to reemphasize the obvious: Ask what you do know, what you should know, and how big the gap is between them before embarking on creating an IT system. If no one else has ever successfully built your system with the schedule, budget, and functionality you asked for, please explain why your organization thinks it can. Software is inherently fragile; building complex, secure, and resilient software systems is hard, detailed, and time-consuming work. Small errors have outsize effects, each with an almost infinite number of ways it can manifest, from causing a minor functional error to a system outage to allowing a cybersecurity threat to penetrate the system. The more complex and interconnected the system, the more opportunities for errors and their exploitation. A nice start would be for the senior managers who control the purse strings to finally treat software and systems development, operations, and sustainment efforts with the respect they deserve. That means providing not only the personnel, financial resources, and leadership support and commitment, but also the professional and personal accountability they demand.
It is well known that honesty, skepticism, and ethics are essential to achieving project success, yet they are often absent. Only senior management can demand that they exist. For instance, honesty begins with a forthright accounting of the myriad risks involved in any IT endeavor, not their rationalization. It is a common “secret” that it is far easier to get funding to fix a troubled software development effort than to ask up front for what is required to manage the risks involved. Vendor puffery may be legal, but that means the IT customer needs a healthy skepticism of the often too-good-to-be-true promises vendors make. Once the contract is signed, it’s too late. Furthermore, computing’s malleability, complexity, speed, low cost, and ability to reproduce and store information combine to create ethical situations that require deep reflection about computing’s consequences for individuals and society. Alas, ethical concerns have routinely lagged when technological progress and profits are to be made. This practice must change, especially as AI is routinely injected into automated systems.
In the AI community, there has been a movement toward the idea of human-centered AI, meaning AI systems that prioritize human needs, values, and well-being. This means trying to anticipate where and when AI can go wrong, moving to eliminate those situations, and building in ways to mitigate the effects if they do happen. That concept deserves application to every IT system effort, not just AI.
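The principle here, anticipate the failure modes and build in the mitigation before deployment, can be made concrete with a small sketch. Assuming a hypothetical claims-screening pipeline (every name and threshold below is invented for illustration, not drawn from MiDAS, Centrelink, or any real system), the structural point is that the software is only ever allowed to clear a claim or escalate it to a person, never to accuse anyone on its own:

```python
# Minimal sketch of a human-review gate for automated decisions, the kind
# of safeguard MiDAS and Robodebt lacked. All names and thresholds are
# illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Decision:
    claim_id: str
    fraud_score: float  # model output in [0, 1]
    outcome: str        # "cleared" or "needs_human_review"

def triage(claim_id: str, fraud_score: float, clear_below: float = 0.2) -> Decision:
    """Never auto-accuse: only clearly benign claims are auto-cleared;
    everything else, including the highest scores, goes to a reviewer."""
    if fraud_score < clear_below:
        return Decision(claim_id, fraud_score, "cleared")
    return Decision(claim_id, fraud_score, "needs_human_review")

decisions = [triage(f"C{i}", s) for i, s in enumerate([0.05, 0.5, 0.99])]
```

The design choice worth noting is that the high-confidence branch and the uncertain branch converge on the same outcome, human review; automation here only ever reduces a reviewer’s queue, it never replaces the reviewer’s judgment.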
Given the historic lack of organizational resolve to instill proven practices…novel approaches for developing and sustaining ever more complex software systems…will also frequently fall short.
Finally, project cost-benefit justifications of software developments rarely consider the financial and emotional distress placed on end users of IT systems when something goes wrong, including the long-term aftereffects of failure. If those costs had to be taken fully into account, as in the cases of Phoenix, MiDAS, and Centrelink, perhaps there could be more realism about what is required managerially, financially, technologically, and experientially to create a successful software system. It may be a forlorn request, but surely it’s time the IT community stopped repeatedly making the same ridiculous mistakes it has made since at least 1968, when the term “software crisis” was coined. Make new ones, damn it. As the Roman orator Cicero said in Philippic 12, “Anyone can make a mistake, but only an idiot persists in his error.”
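The arithmetic behind this point can be sketched in a few lines. Every figure below is invented for illustration (only loosely shaped like Phoenix’s CA $310 million build and its multibillion-dollar remediation); the point is that once the probability of failure and the downstream harm to users enter the calculation, the “cheap” plan can cost more in expectation than the careful one:

```python
# Back-of-the-envelope expected-cost model for an IT project.
# All amounts are in millions of dollars and are hypothetical;
# p_failure is a subjective probability of project failure.
def expected_cost(budget, p_failure, remediation, user_harm):
    # Expected cost = committed budget plus the chance of failure
    # times the remediation bill and the harm imposed on end users.
    return budget + p_failure * (remediation + user_harm)

# A corner-cutting plan with a high failure risk...
cheap_plan = expected_cost(budget=310, p_failure=0.7, remediation=2600, user_harm=1000)
# ...versus a costlier plan that buys the risk down.
careful_plan = expected_cost(budget=520, p_failure=0.2, remediation=2600, user_harm=1000)
```

With these made-up numbers, the corner-cutting plan’s expected cost is more than double the careful plan’s, which is precisely the comparison that never appears in a business case that prices only the happy path.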
Special thanks to Steve Andriole, Hal Berghel, Matt Eisler, John L. King, Roger Van Scoy, and Lee Vinsel for their invaluable critiques and insights.
This article appears in the December 2025 print issue as “The Trillion-Dollar Cost of IT’s Willful Ignorance.”