Gitstar Ranking
Fetched on 2026/06/26 11:03
lucidrains/coordinate-descent-attention
Implementation of an Attention layer where each head can attend to more than just one token, using coordinate descent to pick topk
Stars: 47
Rank: 545731