Classify a prompt as safe or prompt-injection/jailbreak. Fine-tuned ModernBERT-base, ~570 MB, ~6 ms per prompt on Apple Silicon, sub-20 ms on CPU. Drop it in front of your LLM calls as a cheap first-pass filter; you decide what to do with the verdict. -
View it on GitHub