A novel Multimodal Large Language Model (MLLM) architecture designed to structurally align visual and textual embeddings.
Stars: 1, Rank: 5980881