This is an evaluation set for directed (targeted) test input generation. We use it to benchmark how well Large Language Models can generate inputs that reach a given code location or produce a particular result.
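To make the task concrete, here is a minimal sketch of what "reaching a code location" can mean. It is an illustration only and is not taken from the evaluation set: the function `classify`, the `"target"` marker, and the helper `reaches_target` are all hypothetical names chosen for this example.

```python
def classify(values):
    """Hypothetical program under test (not part of the evaluation set)."""
    total = 0
    for v in values:
        if v < 0:
            return "negative"
        total += v
    if total == 42 and len(values) == 3:
        return "target"  # the code location a generated input should reach
    return "other"


def reaches_target(candidate):
    """Return True if the candidate input drives classify() to the target line."""
    try:
        return classify(candidate) == "target"
    except Exception:
        # An input that crashes the program does not count as reaching the target.
        return False


if __name__ == "__main__":
    # Candidate inputs, e.g. as a model might propose them.
    for candidate in ([1, 2, 3], [-1, 43, 0], [40, 1, 1]):
        print(candidate, reaches_target(candidate))
```

Under these assumptions, a generator is scored on whether its proposed input passes a check like `reaches_target`. The "produce a particular result" variant would compare the return value against an expected output in the same way.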