Friday, October 18, 2024

Security Think Tank: Rebuilding Trust Through Smarter Strategies

In a typical organization, responsibilities are clearly defined: the IT team manages IT systems, while the security team oversees security systems. However, conflicts can arise, particularly when security tools are deployed on end-user devices, servers, and active network components. Firewall administrators, for example, often face pushback from IT teams claiming that “the firewall is slowing down the network.”

Among the security tools impacting IT-managed systems are anti-malware drivers that interact closely with kernel processes. As cyber threats become more sophisticated, anti-malware capabilities evolve as well. To operate effectively, these tools require elevated access to the core functions of operating systems and applications, leading to potential technical challenges, responsibility overlaps, and incident management issues. To address these complications, IT and security teams must collaborate rather than work at cross-purposes.

Consider a security tool that must be installed on IT-managed systems, whether they are end-user computers or servers. The security team should not unilaterally impose such software on the IT team without thorough justification, nor should it expect blind trust in the assertion that “this software is safe.” Instead, the IT team should insist on a proper evaluation and performance impact analysis. It’s essential to assess how these security tools, which are governed by the security team, affect the Recovery Time Objectives (RTOs) and Recovery Point Objectives (RPOs) that the IT team has agreed with the broader business.

Unfortunately, many organizations, even within regulated industries, have neglected this approach. One notable incident involved CrowdStrike, which in July 2024 distributed a flawed channel update that crashed Windows systems and disrupted operations at many companies worldwide. For instance, while other U.S. airlines resumed operations within two days of the fix being issued, Delta Air Lines was unable to return to normal operations for five days. CrowdStrike’s own blog post indicates that this delay reflected shared accountability between the airline’s IT and security teams.

I’m not suggesting CrowdStrike should bear all the blame; the scale of the operational disruption also highlights a breakdown in collaboration between IT and security teams within affected organizations. The IT team’s primary responsibility is to ensure the availability and performance of critical IT systems, while the security team’s focus is on minimizing the risk that a cyber event has a material impact. However, the CrowdStrike incident was not a cyber event; it was an IT failure caused by a security vendor. Similar issues have also arisen from Microsoft errors on numerous occasions.

The failure to restore normal operations within the defined RTOs and RPOs tarnishes the reputations of both IT and security teams among their business peers. Once trust is lost, it can be difficult to rebuild. As an industry, we must learn from these experiences and work more effectively. Here are three key takeaways from this significant incident:

1. Prioritize recovery testing based on established RTOs and RPOs. Security teams should require IT teams to conduct recovery tests that address scenarios in which a security tool renders the operating system unbootable.

2. CIOs and CISOs should present a united front to business executives, articulating the necessity for specialized security tools while also providing assurances that recovery processes have been tested within the defined parameters (e.g., RTOs and RPOs).

3. Collaborate with the company’s legal counsel and procurement to review security vendor contracts, identifying any one-sided terms that favor the vendor and could limit compensation for service failures.
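
To make the first takeaway concrete, here is a minimal sketch of how the outcome of a recovery test might be checked against the agreed RTO and RPO. It is an illustration rather than a prescribed method: the class names, the `find_breaches` function, the system name, and all the durations are hypothetical and would need to be replaced with an organization's own targets and test data.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RecoveryTarget:
    """Targets agreed with the business for one system (hypothetical model)."""
    system: str
    rto: timedelta  # maximum tolerable time to restore service
    rpo: timedelta  # maximum tolerable window of lost data


@dataclass(frozen=True)
class RecoveryTestResult:
    """What was actually measured during a recovery test."""
    system: str
    scenario: str
    time_to_restore: timedelta
    data_loss_window: timedelta


def find_breaches(target: RecoveryTarget, result: RecoveryTestResult) -> list[str]:
    """Return one message per missed target; an empty list means the test passed."""
    breaches = []
    if result.time_to_restore > target.rto:
        breaches.append(
            f"RTO breached in '{result.scenario}': "
            f"restore took {result.time_to_restore}, target is {target.rto}"
        )
    if result.data_loss_window > target.rpo:
        breaches.append(
            f"RPO breached in '{result.scenario}': "
            f"data loss window was {result.data_loss_window}, target is {target.rpo}"
        )
    return breaches


# Illustrative values only: a test of the scenario from takeaway 1.
target = RecoveryTarget(
    system="example-server",
    rto=timedelta(hours=4),
    rpo=timedelta(minutes=15),
)
result = RecoveryTestResult(
    system="example-server",
    scenario="security tool leaves the operating system unbootable",
    time_to_restore=timedelta(hours=6),
    data_loss_window=timedelta(minutes=5),
)

for breach in find_breaches(target, result):
    print(breach)
```

A report built from checks like this gives the CIO and CISO something specific to show business executives: which scenarios were tested, and whether recovery stayed within the agreed limits.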