SecurityBrief UK - Technology news for CISOs & cybersecurity decision-makers

ControlMonkey adds observability recovery for cloud tools

Wed, 25th Mar 2026

ControlMonkey has expanded its cloud configuration disaster recovery platform to cover observability and monitoring systems, adding support for Datadog, New Relic, Dynatrace, Grafana Cloud, and Splunk.

The update lets customers take daily snapshots of observability configurations, including dashboards, alert rules, monitors, escalation policies, and service monitoring definitions. Those snapshots can be used to restore environments during incidents and outages.

The addition extends ControlMonkey's recovery offering beyond infrastructure and network settings into another part of the cloud control plane that operations teams rely on when systems fail. Observability tools are often central to diagnosing faults and managing incident response, but the configurations behind them are frequently created manually and may not be included in formal recovery plans.

That can leave engineering teams exposed if dashboards or alerting rules are changed, deleted, or misconfigured. ControlMonkey argues that restoring the observability layer can be as important as restoring workloads or data when operators are trying to regain visibility into a disrupted system.

Amir Regev, Global Director of Partnerships and Cloud Alliances at ControlMonkey, outlined how the company distinguishes its approach from conventional backup and infrastructure-as-code tools.

"ControlMonkey is built specifically for configuration disaster recovery, not just for backup, code management, or visibility," said Regev.

He said the system continuously captures infrastructure configuration, converts it into deployable definitions, and stores each snapshot as a versioned record in a customer's Git environment. According to Regev, that supports recovery of individual resources as well as whole environments, including systems not fully managed through infrastructure-as-code frameworks.
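The article does not describe how ControlMonkey serializes snapshots. As an illustrative sketch only, the general idea of storing a configuration as a versioned, Git-friendly record can be shown by writing the configuration in a deterministic form, so that identical settings always produce identical text and a version history shows only real changes. The function name and the sample monitor fields below are hypothetical, not part of any vendor API.

```python
import hashlib
import json


def snapshot_record(config):
    """Serialize a configuration deterministically and fingerprint it.

    Hypothetical helper: sorting keys means two captures of the same
    settings yield byte-identical text, which keeps version diffs clean.
    """
    body = json.dumps(config, sort_keys=True, indent=2) + "\n"
    digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
    return body, digest


# Two captures of the same (made-up) monitor, with keys in different order.
capture_a = {"name": "High error rate", "query": "errors > 5", "notify": ["on-call"]}
capture_b = {"notify": ["on-call"], "query": "errors > 5", "name": "High error rate"}

body_a, digest_a = snapshot_record(capture_a)
body_b, digest_b = snapshot_record(capture_b)
```

Under these assumptions, the two captures produce the same text and the same fingerprint, so a daily job would only record a new version when something actually changed.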

"That distinction becomes critical during incidents," Regev said. "Traditional backup solutions focus primarily on data and workload recovery, while IaC tools assume environments are fully codified and up to date. In reality, teams often discover during an outage that they cannot accurately reconstruct infrastructure state and are forced into manual recovery under pressure. Observability and version-history tools can help identify what changed, but they do not provide a deterministic, deployable recovery path for production configurations."

Recovery Focus

ControlMonkey says each daily configuration snapshot serves as a versioned recovery point, and that restores account for dependencies between resources and the order in which they must be recreated. The process is designed to reduce the need for engineers to manually rebuild interdependent settings across multiple services while systems are under strain.
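The company has not published how it orders restores. A common general technique for dependency-aware ordering is a topological sort, sketched here with Python's standard-library graphlib module. The resource names and the dependency map are invented for illustration and do not reflect ControlMonkey's implementation.

```python
from graphlib import TopologicalSorter


def restore_order(dependencies):
    """Return resources in an order where each dependency comes first.

    `dependencies` maps a resource to the set of resources it needs.
    Raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical observability resources and what each one relies on.
deps = {
    "dashboard": {"monitor"},
    "monitor": {"notification_channel"},
    "escalation_policy": {"notification_channel"},
    "notification_channel": set(),
}

order = restore_order(deps)
```

In this made-up example, the notification channel is restored before the monitor and escalation policy that reference it, and the dashboard comes after the monitor it displays.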

"In a real incident, the priority is not simply understanding what changed, but restoring systems quickly and correctly," Regev said. "We maintain daily configuration snapshots as versioned recovery points and support recovery with dependency handling and ordering, so teams are not manually stitching together configurations across services during an outage."

According to Regev, the challenge becomes more pronounced in larger environments where dependencies span identity systems, networking, and monitoring tools.

"This is especially important in complex environments where dependencies span identity providers, networking, and monitoring systems," he said. "Observability tools can help assess impact, but they do not provide a repeatable mechanism to restore configurations to a known-good state. ControlMonkey is designed to deliver predictable recovery under pressure, ensuring organizations can regain operational visibility and control when it matters most."

The company also highlighted broader functions for configuration monitoring and governance. Users can track changes in observability environments, detect configuration drift across providers, and view a resilience score intended to show coverage and restore readiness across infrastructure, network, and observability layers.
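Configuration drift, in general terms, is the difference between a saved known-good state and what is currently live. The sketch below shows one simple way to compute it, by comparing two dictionaries keyed by resource name. The function and the sample data are hypothetical and are not drawn from ControlMonkey's product.

```python
def detect_drift(snapshot, live):
    """Compare a saved snapshot with the live state, keyed by resource name.

    Returns the names of resources that were added, removed, or changed
    since the snapshot was taken.
    """
    added = sorted(live.keys() - snapshot.keys())
    removed = sorted(snapshot.keys() - live.keys())
    changed = sorted(
        name for name in snapshot.keys() & live.keys()
        if snapshot[name] != live[name]
    )
    return {"added": added, "removed": removed, "changed": changed}


# Made-up example: one alert threshold edited, one dashboard deleted,
# one monitor created since the last snapshot.
snapshot = {
    "cpu-alert": {"threshold": 80},
    "latency-dashboard": {"widgets": 6},
}
live = {
    "cpu-alert": {"threshold": 95},
    "disk-monitor": {"threshold": 90},
}

drift = detect_drift(snapshot, live)
```

Here the result would flag the new monitor as added, the missing dashboard as removed, and the edited alert as changed.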

Service Providers

ControlMonkey also said the product is relevant for managed security service providers and other managed service operators working across multiple customer environments.

"For managed security service providers, the value is in having a multi-tenant control and recovery layer rather than another monitoring surface," Regev said. "ControlMonkey provides a centralized control plane with governance, guardrails, Git-centric workflows, and integrated operations that align with managed service delivery models."

He said the approach is intended to give providers a common way to discover configurations, assess recovery coverage, and carry out restoration across cloud and software-as-a-service environments.

"This allows MSSPs to standardize how they discover configurations, assess coverage, and execute recovery across environments spanning cloud providers and SaaS platforms," Regev said. "Instead of relying on fragmented tooling across observability platforms, they get a consistent, repeatable approach to configuration recovery, along with clear audit evidence and disaster recovery readiness reporting."

One customer reference came from HoneyBook, which uses Datadog in its software operations.

"As a SaaS platform, observability is critical to how we operate and respond to incidents," said Doron Gutman, Director of DevOps and DevSecOps at HoneyBook. "Our Datadog dashboards, monitors, and alerting policies represent critical operational knowledge. ControlMonkey gives us confidence that our observability configuration is versioned and recoverable, ensuring we maintain visibility during incidents."