UKGovernmentBEIS/misalignment-continuation

UKGovernmentBEIS

Fetched on 2026/03/03 01:26

A proof-of-concept eval testing whether models continue with (or mention) pre-filled malign behaviours.

Rank: 14024084