A toy but end-to-end RL drug-optimization pipeline that combines real drug–target activity data, a learned promiscuity→cytotoxicity safety signal (Discovery2), and a clear path to production-scale RL via Mixture-of-Experts (MoE) models with MILES-style distributed rollouts.