Aleksandar Makelov
Visiting Scientist, Guide Labs
Verified email at guidelabs.ai
Title
Cited by
Year
Towards deep learning models resistant to adversarial attacks
A Madry, A Makelov, L Schmidt, D Tsipras, A Vladu
ICLR 2018, 2018
Cited by: 14349; Year: 2018
Towards deep learning models resistant to adversarial attacks
A Mądry, A Makelov, L Schmidt, D Tsipras, A Vladu
stat 1050 (9), 2017
Cited by: 56; Year: 2017
Towards principled evaluations of sparse autoencoders for interpretability and control
A Makelov, G Lange, N Nanda
Secure and Trustworthy Large Language Models Workshop, ICLR 2024, 2024
Cited by: 21; Year: 2024
Rethinking backdoor attacks
A Khaddaj*, G Leclerc*, A Makelov*, K Georgiev, H Salman, A Ilyas, ...
International Conference on Machine Learning, 16216-16236, 2023
Cited by: 19; Year: 2023
Is this the subspace you are looking for? an interpretability illusion for subspace activation patching
A Makelov, G Lange, N Nanda
ICLR 2024, 2023
Cited by: 14; Year: 2023
Expansion in lifts of graphs
A Makelov
Cited by: 7; Year: 2015
Sparse Autoencoders Match Supervised Features for Model Steering on the IOI Task
A Makelov
ICML 2024 Workshop on Mechanistic Interpretability Spotlight, 2024
Cited by: 2; Year: 2024
Backdoor or Feature? A New Perspective on Data Poisoning
A Khaddaj, G Leclerc, A Makelov, K Georgiev, A Ilyas, H Salman, A Madry
Cited by: 2
Evaluating Sparse Autoencoders for Controlling Open-Ended Text Generation
A Makelov, N Monson, J Adebayo
Second NeurIPS Workshop on Attributing Model Behavior at Scale, 2024
Year: 2024
mandala: Compositional Memoization for Simple & Powerful Scientific Data Management
A Makelov
Python in Science (SciPy) Conference 2024, 2024
Year: 2024