Implementation of MaMMUT, a simple vision-encoder text-decoder architecture for multimodal tasks from Google, in Pytorch - View it on GitHub
Star
98
Rank
258064