Latent-space Attacks for Refusal Evasion in Language Models

Published in arXiv, 2026

Recommended citation: Piras, G., Mura, R., Brau, F., Pintor, M., Oneto, L., Roli, F., & Biggio, B. (2026). "Latent-space Attacks for Refusal Evasion in Language Models." arXiv preprint arXiv:2605.21706.