LatentBreak: Jailbreaking Large Language Models through Latent Space Feedback
Published in arXiv, 2025
Recommended citation: Mura, R., Piras, G., Lukosiute, K., Pintor, M., Karbasi, A., & Biggio, B. (2025). "LatentBreak: Jailbreaking Large Language Models through Latent Space Feedback." arXiv preprint arXiv:2510.08604.
