A concise but complete full-attention transformer with a set of promising experimental features from various papers.