Gitstar Ranking
BatsResearch
Fetched on 2025/12/19 06:15
BatsResearch/self-jailbreaking
Official code repository for "Self-Jailbreaking: Language Models Can Reason Themselves Out of Safety Alignment After Benign Reasoning Training" - https://arxiv.org/abs/2510.20956
Star: 7
Rank: 1878810