Early proof of concept: operationalizing trust and safety policies as evals at scale, using model-written examples.
Stars: 2
Rank: 3685777