This is an evaluation set for the problem of directed (targeted) test input generation. We use it to benchmark the ability of Large Language Models to generate inputs that reach a given code location or produce a particular result.