Robust feature-level adversaries are interpretability tools

Casper S, Nadeau M, Hadfield-Menell D, Kreiman G

NeurIPS (2022)

You must agree to the terms and conditions specified in this link before downloading any material from the Kreiman Lab web site. Downloading any material from the Kreiman Lab web site implies your agreement with this license.

GitHub Repository with all data and code: https://github.com/thestephencasper/feature_level_adv

Manuscript PDF

Supplementary Material PDF
