SecurityBrief UK - Technology news for CISOs & cybersecurity decision-makers

F5 Labs unveils monthly AI model security leaderboards

Mon, 2nd Mar 2026

F5 Labs has launched monthly AI security leaderboards that score and rank widely used AI models using two new metrics designed to compare model risk and resistance under attack.

The rankings, the Comprehensive AI Security Index (CASI) and the Agentic Resistance Score (ARS), come with monthly research notes that explain score changes and cover recent AI security developments. The programme aims to give security leaders a consistent way to assess models during selection and deployment.

The launch comes as businesses expand pilots of generative AI and AI agents into customer service, software development, data analysis, and internal operations. That shift has sharpened focus on how models behave under adversarial inputs, including attempts to bypass safety controls and extract sensitive information.

F5's approach tests models with attack prompts that mirror techniques used by threat actors. F5 evaluates major models and publishes monthly top-10 rankings that show relative movement over time, alongside threat intelligence and analysis from F5 Labs researchers.

Two measures

CASI is positioned as a baseline index that combines security testing with performance measures. It includes an "Average Performance" score for standardised tasks under normal conditions, a "Risk-to-Performance Ratio" that reflects trade-offs between safety and performance, and a "Cost of Security" measure that compares inference cost with the CASI score.
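F5 has not published the formula behind CASI, but the three components described above can be illustrated with a hypothetical sketch. The weights, scales, and field names below are assumptions for illustration only, not F5's actual methodology.

```python
# Hypothetical illustration of how a composite index like CASI might
# combine its published components. F5's actual formula is not public;
# the weights and input scales here are assumptions.

def composite_security_index(avg_performance: float,
                             attack_block_rate: float,
                             inference_cost_usd: float) -> dict:
    """Blend performance and security into one comparable 0-100 score."""
    # Risk-to-Performance Ratio: how much safety is traded for capability.
    # Risk is modelled as the share of attack prompts NOT blocked.
    risk = 1.0 - attack_block_rate
    risk_to_performance = risk / max(avg_performance, 1e-9)

    # Baseline index: weight security over raw performance (weights assumed).
    index = 100.0 * (0.6 * attack_block_rate + 0.4 * avg_performance)

    # Cost of Security: inference spend per index point, so a cheaper model
    # with the same score ranks as more cost-efficient.
    cost_of_security = inference_cost_usd / max(index, 1e-9)

    return {
        "casi_like_index": round(index, 1),
        "risk_to_performance": round(risk_to_performance, 3),
        "cost_of_security": round(cost_of_security, 5),
    }

# Example: a model that blocks 92% of attack prompts, scores 0.81 on
# standardised tasks, and costs $0.40 per unit of inference.
print(composite_security_index(0.81, 0.92, 0.40))
# → {'casi_like_index': 87.6, 'risk_to_performance': 0.099, 'cost_of_security': 0.00457}
```

The point of such a composite is that two models with identical task scores can diverge sharply once attack resistance and inference cost are folded in, which is the trade-off CASI is described as capturing.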

ARS measures how well a system withstands sustained attacks carried out by an AI agent working towards a goal over multiple steps. Rather than relying on a single prompt, the attacker conducts longer interactions and varies tactics to try to bypass safeguards.

ARS also assesses "Required Sophistication", described as the minimum ingenuity needed for a successful compromise; "Defensive Endurance", reflecting how long a system remains secure during extended attacks; and "Counter-Intelligence", which measures whether failed attempts expose signals or behaviour that could aid later exploits.
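The multi-step attacker that ARS describes can be sketched as a loop in which an agent varies tactics each turn against a target, recording how long the target holds out and how many distinct tactics were needed. F5's actual harness is not public; the tactic list, the toy target, and the scoring below are illustrative assumptions.

```python
# Hypothetical sketch of a multi-step agentic resistance test in the
# spirit of ARS. Everything here (tactics, toy target, scoring) is an
# illustrative assumption, not F5's implementation.

TACTICS = ["direct_request", "role_play", "obfuscation", "multi_turn_pivot"]

def toy_target(prompt: str) -> bool:
    """Stand-in for a model under test: True means a safeguard was
    bypassed. This toy only falls to the most sophisticated tactic."""
    return "multi_turn_pivot" in prompt

def agentic_resistance_test(max_turns: int = 20) -> dict:
    """Attacker agent works towards its goal over multiple turns,
    varying tactics rather than relying on a single prompt."""
    tried = set()
    for turn in range(1, max_turns + 1):
        tactic = TACTICS[(turn - 1) % len(TACTICS)]  # vary tactics each turn
        tried.add(tactic)
        if toy_target(f"[{tactic}] attempt {turn}"):
            return {
                "compromised": True,
                # Defensive Endurance: full turns survived before compromise.
                "defensive_endurance": turn - 1,
                # Required Sophistication: distinct tactics the attacker needed.
                "required_sophistication": len(tried),
            }
    return {"compromised": False,
            "defensive_endurance": max_turns,
            "required_sophistication": len(tried)}

print(agentic_resistance_test())
# → {'compromised': True, 'defensive_endurance': 3, 'required_sophistication': 4}
```

A real harness would also log what each failed attempt reveals about the target's refusal behaviour, which is the signal the "Counter-Intelligence" component is described as measuring.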

F5 also described the leaderboards as a way to identify "paths of least resistance" and the minimum compute resources needed for simple and complex attacks. The framing reflects a wider industry debate about how to compare AI safety claims across model providers using reproducible tests.

Attack library

F5 linked the new benchmarks to assets gained through its acquisition of CalypsoAI. The combined resources include an AI vulnerability library that is updated with more than 10,000 new attack prompts each month and draws on more than a year of accumulated attack data.

F5 presented the library and scoring system as complementary inputs into model evaluation, designed to show which models are improving and which are falling behind, as well as the trade-offs between performance and resilience under attack.

Security teams have struggled with a lack of consistent yardsticks for comparing model risk, particularly when AI systems are embedded in workflows that touch proprietary data and critical business processes. Model providers often publish safety notes and benchmarking results, but enterprises typically need independent validation against their threat models and usage patterns.

"Deploying unverified AI models into critical infrastructure is not innovation; it is negligence. Organisations need a way to continuously quantify resilience. F5 Labs AI Leaderboards offer that standard. These rankings isolate specific weaknesses in the model layer, giving security teams the intelligence they need to govern inference and block attacks before they happen," said Kunal Anand, Chief Product Officer, F5.

Related tools

The leaderboards sit alongside other F5 products for AI security testing and runtime controls. The benchmarks complement the general availability of F5 AI Guardrails and F5 AI Red Team.

F5 AI Guardrails applies controls that govern how AI interacts with users and data. F5 AI Red Team uses autonomous AI agents to run repeated tests that adapt over time.

F5 also linked the leaderboards to its broader application and API security portfolio, including DDoS protection and security operations features such as inspection and logging. It described the work as part of a wider effort to provide visibility into AI system behaviour and governance at runtime.

IDC Research Director Grace Trinidad described the acquisition and associated research as expanding F5's coverage from traffic-focused security into AI-specific controls.

"F5's existing traffic inspection and logging capabilities are expanded with the acquisition of CalypsoAI, adding runtime AI governance capabilities and visibility into AI system behaviour. Paired with their security research arm, F5 Labs, that updates CASI and ARS rankings every month, F5 customers are provided with both a useful set of security metrics to consider when selecting an AI model, as well as the tools needed to constrain how AI systems are used, what data they may access, and how outputs are generated," said Trinidad.

F5 Labs said it will continue to update the CASI and ARS rankings monthly and to publish AI Security Insights alongside each release, explaining score movements, emerging attack techniques, and related research.