jackaduma/SpeakerRecognition-ResNet-GhostVLAD

jackaduma

Fetched on 2026/06/23 00:55

Utterance-level Aggregation for Speaker Recognition in the Wild: speaker recognition using a "thin-ResNet" trunk architecture and a dictionary-based NetVLAD or GhostVLAD layer that aggregates features across time. The whole model can be trained end-to-end.
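
The aggregation step described above can be sketched in plain Python. This is a minimal illustration of GhostVLAD-style pooling under stated assumptions, not the repository's code: the function names (ghostvlad_pool, softmax, l2_normalize), the toy inputs, and the choice to reuse the cluster centers as assignment weights are all hypothetical. In a trained model the centers, weights, and biases would be learned, and the frame features would come from the thin-ResNet trunk.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def l2_normalize(vec, eps=1e-12):
    # Scale a vector to unit L2 norm (an all-zero vector stays zero).
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / (norm + eps) for v in vec]

def ghostvlad_pool(frames, centers, weights, biases, num_ghost):
    """Pool a variable number of frame features into one fixed-size vector.

    frames:  T feature vectors of dimension D (one per time step)
    centers: K + G cluster centers; the last num_ghost are "ghost" clusters
    weights, biases: parameters of the soft-assignment layer (assumed given)
    """
    num_real = len(centers) - num_ghost
    dim = len(frames[0])
    vlad = [[0.0] * dim for _ in range(num_real)]
    for x in frames:
        # Soft-assign the frame to all clusters, ghosts included.
        scores = [sum(wi * xi for wi, xi in zip(w, x)) + b
                  for w, b in zip(weights, biases)]
        assign = softmax(scores)
        # Accumulate residuals for real clusters only; ghost clusters
        # absorb assignment mass but contribute nothing to the output.
        for k in range(num_real):
            for d in range(dim):
                vlad[k][d] += assign[k] * (x[d] - centers[k][d])
    # Normalize each cluster's residual, flatten, then normalize again.
    flat = [v for row in vlad for v in l2_normalize(row)]
    return l2_normalize(flat)

# Toy example: 3 frames of dimension 2, with 2 real clusters and 1 ghost.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
centers = [[0.5, 0.5], [0.0, 0.0], [1.0, 1.0]]
weights = centers            # assumption made only for this illustration
biases = [0.0, 0.0, 0.0]
embedding = ghostvlad_pool(frames, centers, weights, biases, num_ghost=1)
print(len(embedding))
```

The output length is the number of real clusters times the feature dimension, regardless of how many frames are passed in, which is what lets utterances of different durations map to embeddings of the same size. Setting num_ghost to 0 reduces the sketch to NetVLAD-style pooling.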
