Gitstar Ranking
Fetched on 2026/03/02 15:12
BatsResearch/self-jailbreaking
[ICLR'26] Official code for "Self-Jailbreaking: Language Models Can Reason Themselves Out of Safety Alignment After Benign Reasoning Training"
https://arxiv.org/abs/2510.20956
Star: 11
Rank: 1482100