ROSITA: Enhancing Vision-and-Language Semantic Alignments via Cross- and Intra-modal Knowledge Integration - View it on GitHub
Star
0
Rank
12125866