AI Research in Security Operations
Pushing the frontier of AI agents for real security work
Benchmarks
macOS Threat Investigation
BlueBench-Intrusion-001: Real macOS infostealer intrusion spanning incident response, threat hunting, and detection engineering
36 samples · 9 models · Mar 2026
NYU CTF Bench
Real CTF challenges from CSAW competitions covering reverse engineering, forensics, and miscellaneous problem-solving
81 samples · 11 models · Feb 2026
Cybench (Defensive Subset)
Defensive security CTF challenges testing forensics, reverse engineering, and miscellaneous security skills
18 samples · 10 models · Jan 2026
BOTSv3 Blue Team CTF
Blue team CTF scenarios testing incident response and threat hunting
51 samples · 15 models · Dec 2025
Sigma Detection Classification
Multi-label classification of MITRE ATT&CK tactics and techniques from Sigma rules
2,733 samples · 12 models · Jan 2026
CyberMetric
Multiple-choice cybersecurity knowledge evaluation across 10,000 questions
10,180 samples · 13 models · Feb 2026
AI for the blue team.
Scale detection, response, and threat hunting beyond headcount. Build AI agents across your entire security stack.