Operational resilience: More than disaster recovery
To contend with the explosion of cybercrime and its impact on business operations, many organisations are updating their disaster recovery plans to include cyber incident response. Many of the processes and guidelines in traditional disaster recovery plans have changed little in years, sometimes in more than a decade, which makes them ill-suited to cyber disasters. More importantly, at a business level, disaster recovery is just one aspect of a larger discipline: operational resilience.
Organisations need to view disaster recovery in the context of the overall viability of the business, including cyberattack prevention, detection, and response.
Disaster recovery is fairly narrow in its definition and is typically viewed over a short timeframe. Operational resilience is much broader, including aspects such as the governance you've put in place; how you manage operational risk; your business continuity plans; and cyber, information, and third-party supplier risk management.
In other words, disaster recovery plans are chiefly concerned with recovery. Operational resilience looks at the bigger picture: your entire ecosystem and what can be done to keep your business operational during disruptive events.
Mending a broken chain
The broader focus of operational resilience requires organisation-wide participation. You cannot simply leave it to a single department or team. Instead, everyone needs to be involved, from executives and the board of directors to individual employees in multiple departments.
In today's climate, it's not just your own organisation that's under threat. Your suppliers, partners, and vendors are targets, too. If a major supplier is compromised or taken down, your business might go down with them.
Leadership needs to understand risk and to know the company's risk tolerance and risk appetite. That understanding extends to procurement functions and agreements with third-party suppliers. Resilience must be built into everything, down to everyday workflows, and if relying on a single supplier leaves too much risk, then diversity of supply is a must.
There are many cases where a cyber event at a supplier has left multiple organisations unable to fulfil their business outcomes. For instance, consider a retail organisation that relies on a logistics provider to get products to its stores. If that provider is disrupted by a cyber incident, the retailer's stores run out of stock. Avoiding such scenarios requires a broader perspective. In the context of operational resilience, every risk management scenario and process must consider the supply chain.
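One way to make the supply chain visible in risk planning is to map each business outcome to the suppliers it depends on, then ask which outcomes stop if a given supplier goes down. The following is a minimal sketch of that idea, not a prescribed method. The outcome names, capability names, supplier names, and both function names are hypothetical and chosen only to mirror the retail example above.

```python
# Hypothetical dependency map: business outcome -> required capability -> alternative suppliers.
# An outcome needs every capability; a capability works if at least one of its suppliers is up.
DEPENDENCIES = {
    "stock stores": {
        "logistics": ["Logistics Provider A"],
    },
    "take card payments": {
        "payment processing": ["Processor A", "Processor B"],
    },
    "fulfil online orders": {
        "logistics": ["Logistics Provider A"],
        "hosting": ["Cloud Host A", "Cloud Host B"],
    },
}


def single_points_of_failure(deps):
    """Return (outcome, capability, supplier) for each capability with exactly one supplier."""
    return [
        (outcome, capability, suppliers[0])
        for outcome, capabilities in deps.items()
        for capability, suppliers in capabilities.items()
        if len(suppliers) == 1
    ]


def outcomes_disrupted_by(deps, failed_supplier):
    """Return outcomes where some capability has no supplier left once failed_supplier is down."""
    disrupted = []
    for outcome, capabilities in deps.items():
        for suppliers in capabilities.values():
            remaining = [s for s in suppliers if s != failed_supplier]
            if suppliers and not remaining:
                disrupted.append(outcome)
                break  # one broken capability is enough to stop the outcome
    return disrupted


if __name__ == "__main__":
    for outcome, capability, supplier in single_points_of_failure(DEPENDENCIES):
        print(f"{outcome}: {capability} depends solely on {supplier}")
    print(outcomes_disrupted_by(DEPENDENCIES, "Logistics Provider A"))
```

In this illustrative data, losing the sole logistics provider stops two outcomes, while losing one of two payment processors stops none. That is the case for diversity of supply expressed as a simple check.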
Putting the "operation" in operational resilience
The U.S. Department of Transportation proposed a fine of nearly $1 million against Colonial Pipeline for "control room management failures" in the 2021 cyberattack that disrupted fuel delivery in the Eastern U.S., adding to the company's revenue losses from the attack itself. The government's take is that the company ignored operational resilience: instead of planning how to manage and limit the scope of an incident, the organisation simply shut down its process control networks the instant malware hit its systems.
Sadly, this scenario is not unique; many organisations don't fully understand the impact of operational technology in a cyber incident.
Ideally, organisations managing national infrastructure or critical supply would think much more about business continuity management and mitigating controls. Such thinking starts with knowing their risk profile and planning appropriately to manage it. Organisations must also test what they can do in terms of shutting down their networks, ensuring they have the capability to sever the connection between information technology and operational technology, so malware doesn't bring everything to a grinding halt.
Bridging the gap between IT and OT
The technology running systems such as pipelines and refineries is distinctly different from that found in a typical office environment. It involves different network protocols, a different approach to the security stack, and a greater concern for safety-critical issues. One of the biggest sources of friction for industrial businesses, and the reason operational resilience efforts so often fail, is a disconnect between information technology (IT) and operational technology (OT).
Neither department fully understands the other's workflows and challenges. That needs to change, and the change begins with a shift in perception.
Part of the issue is that cyber is still seen as special. The discussion always seems to conclude with the assumption that the security team or IT department is managing a particular risk, so no one else needs to worry about it. Cybersecurity needs to be demystified. Only with proper business understanding and risk ownership can you put proper resilience mechanisms in place.
What worked well at bp was bringing engineering into cyber and cyber into engineering, giving each team expertise and perspective that it previously lacked.
The truth is that different teams have different priorities. The engineering team might be aware of the importance of cybersecurity but needs to prioritise procedural elements and safety-critical matters. By encouraging interdepartmental collaboration, businesses can determine how to facilitate the rollout of controls and strategies across each environment.
It's ultimately all about context: What is the business trying to achieve, and what outcomes is it trying to fulfil? How does it support those outcomes? What technology does it use? What matters to it in terms of confidentiality, integrity, and availability?
The role of Active Directory in building operational resilience
Active Directory (and Azure AD, in hybrid identity environments) has a central place in the quest for operational resilience.
Across all of these dimensions, the single most important application is Active Directory. Without it, you cannot fulfil any of your business outcomes. Because Active Directory is at the very core of your ability to operate, it needs to be part of your operational resilience strategy instead of being treated as an island.
Take an active role in operational resilience
Disaster recovery plans that focus on natural disasters are insufficient for dealing with modern threats to operational resilience. Because the organisation's identity system is critical to keeping operations running—and is the prime target for cyberattacks—protecting it is paramount. By prioritising the resilience of the identity system, organisations can address one of the most serious threats to operational resilience.