BountyBench Research

Last year, I worked in Percy Liang and Dan Boneh's research lab on evaluating the cybersecurity capabilities of LLM agents on real-world tasks drawn from bug bounty and white-hat hacker sites.

See our paper and poster, published at NeurIPS.

Project Page

BountyBench: Dollar Impact of AI Agent Attackers and Defenders on Real-World Cybersecurity Systems

NeurIPS Poster