A proof-of-concept eval testing whether models continue with (or mention) pre-filled malign behaviours.