Robust feature-level adversaries are interpretability tools
Casper S, Nadeau M, Hadfield-Menell D, Kreiman G
NeurIPS (2022)