gmh5225/MasterThesis - Gitstar Ranking

gmh5225

Fetched on 2026/05/08 11:55

LLM fine-tuning pipeline that includes a separate critic reward model, implements PPO and GRPO, and evaluates each individually

Star

Rank: 13993518
