Red-Teaming for Generative AI: Silver Bullet or Security Theater?

Published in arXiv, 2024

Recommended citation: Feffer, M., Sinha, A., Lipton, Z. C., & Heidari, H. (2024). Red-Teaming for Generative AI: Silver Bullet or Security Theater? arXiv preprint arXiv:2401.15897. https://arxiv.org/abs/2401.15897

Download paper here

In this work, we identify recent cases of red-teaming activities in the AI industry and conduct an extensive survey of the relevant research literature to characterize the scope, structure, and criteria for AI red-teaming practices. Our analysis reveals that prior methods and practices of AI red-teaming diverge along several axes, including the purpose of the activity (which is often vague), the artifact under evaluation, the setting in which the activity is conducted (e.g., actors, resources, and methods), and the resulting decisions it informs (e.g., reporting, disclosure, and mitigation).