Implementation of MaMMUT, a simple vision-encoder text-decoder architecture for multimodal tasks from Google, in Pytorch - View it on GitHub
Star
92
Rank
248360