14 Essential Lessons Tech Leaders Should Learn From High-Profile Tech Failures
Technology has become a cornerstone of almost every industry, and when it fails, the effects can be devastating. In late 2022 and early 2023, the aviation industry learned this lesson well when high-profile tech failures at Southwest Airlines and the FAA caused massive flight delays and cancellations for travelers across the U.S.
These significant disruptions highlight the need for robust technological systems and processes—something that applies to businesses regardless of their field or sector. Below, 14 Forbes Technology Council members share important lessons tech leaders and companies can learn from these highly publicized incidents and how they can apply them to prevent such high-impact problems in their organizations.
1. Understand That Failures Will Occur
Technology failures are not a matter of if but when. Tech leaders need to prepare proactively and have contingency plans in place so they can respond quickly and effectively and minimize the impact of these failures. It’s equally important to learn from failures, whether they happen in your own organization or at a company in the news. – Liat Hayun, Eureka Security
2. Regularly Upgrade Your Tech
It’s imperative to upgrade the tech you use on a regular basis. Just because a computer can run an old program for 20 years doesn’t mean you should wait 20 years to replace your computer. Yes, this takes time and money, but if it keeps your business running smoothly and your customers happy, it’s certainly worth the investment. – Syed Ahmed, Act-On Software
3. Balance New Tech Investments And Reliability
Organizations must balance staying current with delivering reliable systems. Perform regular maintenance, testing and contingency planning to secure systems, and you will maintain customer trust. The key is to avoid both being perceived as a “laggard” and investing in tech for its own sake. Companies should carefully evaluate new tech investments for business alignment and value. A balanced approach is necessary for success. – Renaldo Arciola, Fonicom
4. Emphasize Disaster Recovery, Contingency Plans And Regular Maintenance
Disaster recovery and contingency planning are crucial. They should involve making regular backups of essential data, testing backup and recovery systems to verify they work and defining methods for communicating with stakeholders. IT executives must emphasize disaster recovery, contingency planning, and regular maintenance and upgrades to keep systems reliable and robust, even during outages. – Shelli Brunswick, Space Foundation
5. Plan For System Usage Spikes
Tech leaders must prioritize system reliability and redundancy. Companies should ensure that their systems are able to handle unexpected spikes in traffic or usage and should have multiple backup systems in place in case of an emergency. Companies should strive to create a culture of accountability to ensure that systems are up to date and reliable. – Marc Fischer, Dogtown Media LLC
6. Invest In Quality Data Monitoring And Analysis
Tech failures teach us that modern data infrastructure is fallible. Monitoring tools on their own frequently won’t help companies avoid outages. This leads to the next lesson: Data collection is less important than data-based insights. Moving forward, tech leaders will need to dissect their data more thoughtfully. Perhaps the real lesson is to invest in quality monitoring and analysis, as it costs less than reputational damage. – Phil Tee, Moogsoft
7. Don’t Rely On A Single Platform
Simply put, you always need to have a business continuity plan; you cannot rely on only one platform for such high-stakes work as scheduling flights, monitoring healthcare or powering the stock exchange. A backup plan needs to be the norm, and your security operations and network operations centers need to be ready to implement it fast. – Sergio Tang, Vivela
8. Develop A Detailed Business Continuity Plan
Failing to plan is planning to fail. In such incidents, the last thing you should have to do is think. An organization must have well-prescribed business continuity plans across the entire supply chain, including all internal and external parties. These plans should detail who does what, by when and for what result. Business continuity planning is the process of creating systems of prevention and recovery to deal with potential threats. – Spiros Liolis, Micro Focus
9. Perform A (True) Stress Test
Having worked with many enterprises in extremely large environments, I see these situations pointing to a challenge: There’s really no way to test how your code and infrastructure work in a crisis without causing a crisis. Just as disaster recovery “tests” are inferior to actually moving production around, you have to truly stress a system to know how it will respond. Stress testing of this kind is an underutilized way for the cloud to empower IT. – Matthew Wallace, Faction, Inc.
10. Maintain An Accurate Asset Inventory
The Southwest Airlines situation and the FAA computer outage demonstrate the absolute dependency modern companies have on computer systems and applications to run their business. Failures in a single system can often have a cascading effect, resulting in major issues and outages, so it is critical that companies maintain an accurate asset inventory of all applications and the interdependencies between them. – Carlos Morales, Neustar Security Services
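To make the idea of tracking interdependencies concrete, here is a minimal Python sketch, assuming an inventory kept as a simple mapping from each application to the applications it depends on. The application names and the `impacted_by` helper are hypothetical and purely illustrative, not drawn from any real airline or FAA system. The sketch inverts the dependency edges and walks them breadth-first to list every system that a single failure could cascade into.

```python
from collections import defaultdict, deque

# Hypothetical asset inventory: each application maps to the applications it depends on.
# Names are illustrative only and do not describe any real organization's systems.
INVENTORY: dict[str, list[str]] = {
    "crew_scheduling": ["crew_db", "flight_planning"],
    "flight_planning": ["weather_feed"],
    "reservations": ["flight_planning"],
    "booking_site": ["reservations"],
    "crew_db": [],
    "weather_feed": [],
}


def impacted_by(failed: str, inventory: dict[str, list[str]]) -> set[str]:
    """Return every application that directly or transitively depends on `failed`."""
    # Invert the edges so we can walk from a dependency to the apps that rely on it.
    dependents: dict[str, list[str]] = defaultdict(list)
    for app, deps in inventory.items():
        for dep in deps:
            dependents[dep].append(app)

    # Breadth-first walk; the `impacted` set prevents revisiting apps (and loops on cycles).
    impacted: set[str] = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for app in dependents[current]:
            if app not in impacted:
                impacted.add(app)
                queue.append(app)
    return impacted


if __name__ == "__main__":
    # A failure in one upstream feed reaches customer-facing systems.
    print(sorted(impacted_by("weather_feed", INVENTORY)))
```

In this made-up inventory, an outage of `weather_feed` reaches `booking_site`, `crew_scheduling`, `flight_planning` and `reservations`, which is exactly the kind of cascade an up-to-date inventory helps a team anticipate.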
11. Place QA Teams On The Same Level As Product Creators And Devs
It is a fact that the speed of system development, coupled with the complexity of the business problems to be solved, is the enemy of reliability and security. As a technological society, we need to come to terms with this dilemma and bring the teams that perform quality assurance on systems to the same decision-maker level as product creators and developers. – Emmanuel Ramos, OZ Digital Consulting
12. Communicate Openly And Clearly During Failures
Tech leaders must have robust backup plans and perform regular system maintenance and improvement to minimize the risk of tech failures. They should also be transparent and communicate clearly during failures. Tech is critical and must be taken seriously. – Ankush Sabharwal, CoRover
13. Consider Timing Carefully When Implementing Changes
Timing is everything. Don’t patch at noon. Don’t upgrade during customer implementations. Unpredictable conditions (the weather in Southwest Airlines’ case) combined with high travel volumes created the perfect storm for a miserable outcome. Be aware of external and internal conditions before carrying out technology updates, and plan carefully! – Saryu Nayyar, Gurucul
14. Prioritize Connecting Siloed Systems
The Southwest and FAA incidents should serve as a learning moment for CIOs. Many organizations have experienced similar problems because they rely on outdated technology, which can lead to siloed data that delays processes or hurts decision-making. Leaders need to prioritize connecting systems, applications and data, which will ultimately help businesses avoid these types of fiascos. – Ed Macosky, Boomi