Code for the ICML 2021 (long talk) paper: "ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision"