The MUG dataset contains user-agent interactions (language instructions and screen clicks) that guide an agent toward selecting the intended object. It is used to train and evaluate UI grounding models in our paper (https://arxiv.org/abs/2209.15099).