
Microsoft unveils MDASH to find Windows security flaws

Thu, 14th May 2026

Microsoft has introduced MDASH, a multi-model AI security system that helped identify 16 vulnerabilities in the Windows networking and authentication stack.

The findings included four critical remote code execution flaws in areas including the Windows kernel TCP/IP stack and the IKEv2 service.

Built by Microsoft's Autonomous Code Security team, the system is already in use by internal security engineering teams, while a small group of customers is testing it in a limited private preview.

MDASH, the codename for the Microsoft Security multi-model agentic scanning harness, uses more than 100 specialised AI agents and combines frontier and distilled models in a staged process to find, assess, and verify software flaws.

Microsoft described the system as a pipeline that prepares a target code base, scans for possible weaknesses, validates findings through a separate set of debating agents, removes duplicates, and then, where possible, attempts to prove the flaw with triggering inputs.
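The staged flow can be sketched as a simple function chain. This is a minimal illustration that rests on stated assumptions. `Finding`, `run_pipeline`, and the scanner, validator, and prover callables are hypothetical names and are not MDASH's API. A majority vote also stands in for the debating agents, because Microsoft has not described their actual protocol here.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Finding:
    """A candidate flaw; frozen so identical findings hash equal for dedup."""
    file: str
    line: int
    kind: str


# Hypothetical agent signatures used only for this sketch.
Scanner = Callable[[list[str]], list[Finding]]
Validator = Callable[[Finding], bool]
Prover = Callable[[Finding], Optional[bytes]]


def run_pipeline(code_base: list[str], scanners: list[Scanner],
                 validators: list[Validator],
                 prover: Prover) -> list[tuple[Finding, Optional[bytes]]]:
    # Prepare: a stand-in that just normalises the file list.
    prepared = sorted(set(code_base))
    # Scan: every scanner agent proposes candidates independently.
    candidates = [f for scan in scanners for f in scan(prepared)]
    # Validate: keep a candidate only if a strict majority of validators accept
    # it (a simple vote standing in for the debating agents).
    validated = [f for f in candidates
                 if 2 * sum(v(f) for v in validators) > len(validators)]
    # Deduplicate: identical findings from different scanners collapse to one,
    # preserving first-seen order.
    unique = list(dict.fromkeys(validated))
    # Prove: try to build a triggering input; None means "not proven".
    return [(f, prover(f)) for f in unique]


bug = Finding("tcpip.c", 120, "use-after-free")
noise = Finding("util.c", 7, "overflow")
result = run_pipeline(
    ["tcpip.c", "util.c"],
    scanners=[lambda files: [bug, noise], lambda files: [bug]],
    validators=[lambda f: f.kind == "use-after-free",
                lambda f: True,
                lambda f: f.file != "util.c"],
    prover=lambda f: b"\x00" if f.kind == "use-after-free" else None,
)
print(result)  # one finding, reported once, with its triggering input
```

The vote threshold is the main tuning choice in this sketch. Raising it reduces false positives at the cost of recall.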

According to Microsoft, this approach differs from single-model systems, which can miss bugs that require reasoning across several files, complex execution paths, or concurrent processes.

Researchers using MDASH found 16 vulnerabilities that were included in a Patch Tuesday security release covering Windows networking components and related services. Of those, 10 were in kernel-mode software and six in user-mode software, with most reachable from a network position without credentials.

Benchmark results

Microsoft also published benchmark data for MDASH. The company said the system found 21 out of 21 deliberately planted vulnerabilities, with no false positives, in StorageDrive, a private sample device driver it uses when interviewing offensive security researchers.

In retrospective testing against historical Microsoft Security Response Center cases, MDASH achieved a 96% recall rate on 28 confirmed bugs in clfs.sys reported over five years. It achieved a 100% recall rate on seven confirmed bugs in tcpip.sys over the same period.

The tool also scored 88.45% on the public CyberGym benchmark, which covers 1,507 real-world vulnerability reproduction tasks from 188 OSS-Fuzz projects. Microsoft said that was the highest published score on the leaderboard at the time, about five points ahead of the next entry.
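The reported percentages can be checked with simple arithmetic. This sketch assumes recall is the share of known bugs MDASH rediscovered. It also assumes the CyberGym score is a plain solved-to-total ratio, although the benchmark may aggregate results differently. Of the possible counts, only 27 of 28 rounds to 96%, because 26 of 28 is about 92.9%. The clfs.sys figure therefore implies 27 bugs found. The CyberGym figure works out to about 1,333 of 1,507 tasks.

```python
def recall(found: int, total: int) -> float:
    """Share of known bugs that were rediscovered."""
    return found / total


def precision(true_pos: int, false_pos: int) -> float:
    """Share of reported findings that were real bugs."""
    return true_pos / (true_pos + false_pos)


# StorageDrive: 21 of 21 planted bugs found, no false positives.
print(f"{recall(21, 21):.0%} recall, {precision(21, 0):.0%} precision")  # 100% recall, 100% precision
# clfs.sys: only 27 of 28 rounds to the reported 96%.
print(f"{recall(27, 28):.1%}")  # 96.4%
# tcpip.sys: 7 of 7.
print(f"{recall(7, 7):.0%}")    # 100%
# CyberGym, assuming a plain solved/total ratio over 1,507 tasks.
solved = round(0.8845 * 1507)
print(solved, f"{solved / 1507:.2%}")  # 1333 88.45%
```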

Microsoft argued that the results show much of the performance comes from the surrounding orchestration system rather than from any single model.

Two examples

Microsoft highlighted two vulnerabilities as examples of the kinds of bug it believes MDASH can uncover more effectively than simpler AI scanning tools.

One, tracked as CVE-2026-33827, involved a remote, unauthenticated use-after-free flaw in tcpip.sys linked to Strict Source and Record Route processing in the Windows IPv4 receive path. According to Microsoft, the issue stemmed from improper lifetime management of a reference-counted object, creating conditions in which freed memory could later be accessed in kernel context.

Microsoft described the flaw as difficult to detect because the lifetime violation was not obvious within a single local code segment and depended on non-trivial control flow, reference ownership semantics, and concurrent cleanup routines elsewhere in the networking stack.
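A toy model can make this kind of lifetime problem concrete. The sketch is not Windows code. `ToyHeap`, `RefCounted`, and `receive_path` are hypothetical, and the "concurrent" cleanup is replayed as one fixed sequential interleaving. It shows why a path can read freed memory if it uses a reference-counted object without holding its own reference. The read fails once another component drops the last reference.

```python
class ToyHeap:
    """Hypothetical allocator that tracks live blocks so misuse is detectable."""

    def __init__(self) -> None:
        self.live: set[int] = set()
        self._next = 0

    def alloc(self) -> int:
        self._next += 1
        self.live.add(self._next)
        return self._next

    def free(self, handle: int) -> None:
        self.live.discard(handle)

    def read(self, handle: int) -> None:
        if handle not in self.live:
            raise RuntimeError(f"use after free of block {handle}")


class RefCounted:
    """Object whose block is freed when its reference count drops to zero."""

    def __init__(self, heap: ToyHeap) -> None:
        self.heap = heap
        self.handle = heap.alloc()
        self.refs = 1  # the owning component's reference

    def add_ref(self) -> None:
        self.refs += 1

    def release(self) -> None:
        self.refs -= 1
        if self.refs == 0:
            self.heap.free(self.handle)


def receive_path(heap: ToyHeap, take_ref: bool) -> None:
    shared = RefCounted(heap)   # created with one reference held by its owner
    if take_ref:
        shared.add_ref()        # correct: the receive path pins the object first
    shared.release()            # interleaved cleanup drops the owner's reference
    heap.read(shared.handle)    # the receive path then reads the object
    if take_ref:
        shared.release()        # the receive path drops its own reference


heap = ToyHeap()
receive_path(heap, take_ref=True)
print(heap.live)  # set()
try:
    receive_path(ToyHeap(), take_ref=False)
except RuntimeError as err:
    print(err)    # use after free of block 1
```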

The second example, CVE-2026-33824, affected the IKEEXT service used for IKE and AuthIP keying for IPsec. Microsoft said a remote, unauthenticated attacker could trigger a deterministic double-free over UDP/500 on a host configured as an IKEv2 responder by sending crafted packets tied to IKE_SA_INIT and fragmentation handling.

Because IKEEXT runs as LocalSystem inside svchost.exe, the flaw created a pre-authentication remote code execution path into a highly privileged Windows context, the company said.

Microsoft said this bug spanned six files and involved an ownership error caused by a shallow memory copy that duplicated pointers to heap allocations without duplicating the underlying memory. As a result, two parts of the service believed they owned the same allocation and later freed it separately.
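The shallow-copy ownership error can be modelled in the same way. In this hypothetical sketch, an integer handle plays the role of a heap pointer, and `copy.copy` plays the role of a struct-level memory copy. `FragmentState` is an invented name and not an IKEEXT structure. The shallow clone shares the original's block, so the second `destroy` frees that block twice. The deep clone allocates its own block, so both frees succeed.

```python
import copy
from dataclasses import dataclass


class ToyHeap:
    """Hypothetical allocator that refuses to free a block twice."""

    def __init__(self) -> None:
        self.blocks: dict[int, bytes] = {}
        self._next = 0

    def alloc(self, data: bytes) -> int:
        self._next += 1
        self.blocks[self._next] = data
        return self._next

    def free(self, handle: int) -> None:
        if handle not in self.blocks:
            raise RuntimeError(f"double free of block {handle}")
        del self.blocks[handle]


@dataclass
class FragmentState:
    buffer: int  # handle ("pointer") to a heap block this struct believes it owns

    def destroy(self, heap: ToyHeap) -> None:
        heap.free(self.buffer)


def deep_clone(state: FragmentState, heap: ToyHeap) -> FragmentState:
    # Allocate a fresh block and copy the bytes, so each struct owns its own.
    return FragmentState(heap.alloc(heap.blocks[state.buffer]))


def demo(deep: bool) -> str:
    heap = ToyHeap()
    original = FragmentState(heap.alloc(b"fragment"))
    # copy.copy duplicates only the handle, like copying a struct of pointers.
    clone = deep_clone(original, heap) if deep else copy.copy(original)
    try:
        original.destroy(heap)
        clone.destroy(heap)  # the second "owner" frees what it thinks is its block
    except RuntimeError as err:
        return str(err)
    return "ok"


print(demo(deep=False))  # double free of block 1
print(demo(deep=True))   # ok
```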

Internal effort

The work was carried out by the Autonomous Code Security team in collaboration with Windows Attack Research and Protection, which focuses on advanced Windows offensive research.

Several members of the Autonomous Code Security team previously worked on Team Atlanta, the group that won the DARPA AI Cyber Challenge with an autonomous system designed to find and patch bugs in open-source software.

Microsoft said its software estate presents unusual challenges for automated security auditing because much of the code is proprietary and therefore absent from public model training data, while the operational environment leaves little room for false positives in core systems.

Plugins can be added to the MDASH pipeline to inject specialist knowledge that general-purpose models may not infer on their own, including kernel calling conventions, lock rules, IPC trust boundaries, and file-system structures.
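One way such a plugin could work is by prepending domain notes to the code an agent examines. This is purely an assumption for illustration, because Microsoft has not published a plugin interface. `KnowledgePlugin`, `LockRulesPlugin`, and `build_context` are hypothetical, as are the file and lock names.

```python
from typing import Protocol


class KnowledgePlugin(Protocol):
    """Hypothetical plugin interface: supplies domain notes for a source file."""

    name: str

    def notes_for(self, path: str) -> list[str]: ...


class LockRulesPlugin:
    """Example plugin that injects per-file locking rules."""

    name = "lock-rules"

    def __init__(self, rules: dict[str, list[str]]) -> None:
        self.rules = rules

    def notes_for(self, path: str) -> list[str]:
        return [f"[{self.name}] {rule}" for rule in self.rules.get(path, [])]


def build_context(path: str, source: str, plugins: list[KnowledgePlugin]) -> str:
    # Prepend every plugin's notes to the code a scanning agent will see.
    notes = [note for plugin in plugins for note in plugin.notes_for(path)]
    return "\n".join(notes + ["---", source])


plugin = LockRulesPlugin({"ike_frag.c": ["hold the SA lock before touching fragment lists"]})
print(build_context("ike_frag.c", "void reassemble(void);", [plugin]))
```

A plugin that proves findings, such as the CLFS log-file builder, would need a different hook that returns triggering inputs rather than notes. This sketch covers only the knowledge-injection side.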

One such extension was built for CLFS to help construct triggering log files from candidate findings. Microsoft said that proving layer was important in moving from a possible issue to a validated vulnerability that engineering teams can fix.

"The Microsoft Security multi-model agentic scanning harness (codename MDASH) is helping our engineering teams meaningfully improve security outcomes using generally available AI models-today," said Taesoo Kim, vice president of Agentic Security at Microsoft.
