Information security fundamentally revolves around managing information risk. The recent global outage caused by CrowdStrike rendered numerous information systems unusable, and the unexpected, prolonged downtime left many companies unable to access vital business data.
This disruption affected not only the information systems themselves but also the processes associated with information management. It can be classified as both an information risk event and an information security incident. The repercussions were significant across operational, financial, reputational, legal, technological, and regulatory dimensions.
The CrowdStrike incident clearly showcases a failure in digital trust between vendors and their clients. So, what steps are necessary to rebuild this trust?
The Missteps
A critical area where many organizations, especially startups, often falter is automated testing. Startups commonly automate their internal testing and then release updates to fix bugs, essentially relying on their customers for quality assurance. This practice is becoming increasingly prevalent, justified as part of ‘agile methodologies’ or as a way to make CI/CD pipelines more efficient.
Many security companies are now marketing themselves as ‘Swiss army knife’ solutions, offering automated updates and taking over maintenance tasks for businesses. The danger arises when an automated update ships with issues that inadequate automated testing failed to detect. The result can be substantial outages for businesses operating in essential sectors, potentially jeopardizing public safety.
The Right Approach
In the wake of major security incidents, many companies are becoming increasingly transparent with their stakeholders, including customers, partners, employees, and investors. CrowdStrike exemplified this approach by acknowledging its fault and promptly collaborating with its own teams and Microsoft to develop a solution. Its executive management took the initiative to reach out to several customers to offer remediation and recovery support while conducting a comprehensive root cause analysis (RCA) and maintaining transparency about the failures in its security controls.
The company has outlined an action plan focusing on enhancements to personnel, processes, and technology. It appears to be gearing up for heightened regulatory scrutiny, especially as the incident considerably impacted critical sectors. In the EU, legislation such as the Digital Operational Resilience Act (DORA), the Network and Information Security (NIS2) Directive, and the Cyber Resilience Act will require CrowdStrike to assure lawmakers that similar incidents will not occur in the future.
Enhancing Future Practices
Moving forward, organizations should prioritize business resilience with a specific emphasis on business continuity management (BCM), disaster recovery, third-party risk management (TPRM), and incident management. Risk monitoring for various scenarios, including supply chain disruptions, should be among the first steps taken. These scenarios should be incorporated into existing risk registers, business impact analyses (BIAs), and risk and control self-assessments (RCSAs).
Comprehensive risk treatment plans should include scrutinizing product security within third-party risk assessments, implementing more thorough testing for vendor updates (even those from endpoint detection tools), disabling automatic updates where possible, staggering deployments of vendor updates, updating incident response protocols and disaster recovery plans to mitigate third-party risks, and integrating risk simulations for third-party security incidents into cybersecurity drills.
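To make the idea of staggered deployments concrete, the following is a minimal Python sketch rather than a description of how any particular vendor or tool works. It assumes an organization groups its hosts into rollout rings and has some way to apply an update and check host health afterward. The names Ring, staged_rollout, apply_update, is_healthy, and max_failure_rate are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass(frozen=True)
class Ring:
    """A hypothetical rollout ring: a named group of hosts updated together."""
    name: str
    hosts: Sequence[str]


def staged_rollout(
    rings: Sequence[Ring],
    apply_update: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    max_failure_rate: float = 0.0,
) -> Dict[str, object]:
    """Update one ring at a time and halt when a ring exceeds the failure budget.

    apply_update and is_healthy are placeholders (assumptions) for whatever
    deployment and monitoring tooling the organization actually uses.
    """
    completed: List[str] = []
    for ring in rings:
        # Push the vendor update to every host in the current ring only.
        for host in ring.hosts:
            apply_update(host)

        # Check health before any later ring is touched.
        failed = [host for host in ring.hosts if not is_healthy(host)]
        failure_rate = len(failed) / len(ring.hosts) if ring.hosts else 0.0

        if failure_rate > max_failure_rate:
            # Stop here so later (larger) rings never receive the update.
            return {
                "status": "halted",
                "halted_at": ring.name,
                "failed_hosts": failed,
                "completed_rings": completed,
            }
        completed.append(ring.name)

    return {
        "status": "complete",
        "halted_at": None,
        "failed_hosts": [],
        "completed_rings": completed,
    }


if __name__ == "__main__":
    rings = [
        Ring("canary", ["test-vm-1"]),
        Ring("early", ["branch-pc-1", "branch-pc-2"]),
        Ring("broad", ["hq-pc-1", "hq-pc-2", "hq-pc-3"]),
    ]
    # Simulated tooling: the update breaks one host in the "early" ring.
    result = staged_rollout(
        rings,
        apply_update=lambda host: None,
        is_healthy=lambda host: host != "branch-pc-2",
    )
    print(result)
```

The design choice worth noting is that the health check runs between rings, so a faulty update reaches only the smaller early groups before the rollout stops. In practice the failure budget, the ring sizes, and the waiting period between rings would come from the organization's own risk appetite and business impact analyses.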
Niel Harper serves as vice chair of the ISACA board of directors.