Conceptual 12M is a dataset containing (image-URL, caption) pairs collected for vision-and-language pre-training.
Stars: 322
Rank: 95328
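To make the (image-URL, caption) pair structure concrete, here is a minimal sketch of reading such pairs in Python. It assumes a tab-separated layout with the URL in the first column and the caption in the second; the description above does not specify the file format, so treat that layout as an assumption. The `Pair` type, the `read_pairs` helper, and the `example.com` URLs are hypothetical names and data chosen for illustration.

```python
import csv
import io
from typing import Iterable, Iterator, NamedTuple


class Pair(NamedTuple):
    """One (image-URL, caption) record. Hypothetical type for illustration."""
    url: str
    caption: str


def read_pairs(lines: Iterable[str]) -> Iterator[Pair]:
    """Yield (url, caption) pairs from tab-separated lines.

    Assumed layout: URL, a tab, then the caption. Rows that do not
    have exactly two fields are skipped.
    """
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 2:
            continue  # skip blank or malformed rows
        url, caption = row
        yield Pair(url.strip(), caption.strip())


# Made-up sample rows, not taken from the dataset.
sample = io.StringIO(
    "https://example.com/a.jpg\ta dog running on a beach\n"
    "this line has no tab and is skipped\n"
    "https://example.com/b.jpg\ta red bicycle against a wall\n"
)
pairs = list(read_pairs(sample))
for pair in pairs:
    print(pair.url, "->", pair.caption)
```

Because the records hold image URLs, a pre-training pipeline would presumably fetch each URL in a separate step; that step is left out of this sketch.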