A Python implementation to extract multimodal features (visual and textual).
Stars: 7 | Rank: 1939327