Weakly Supervised Automated Language Model Red-Teaming to Identify Likely Toxic Prompts