Gitstar Ranking
Fetched on 2026/01/13 13:54
lucidrains/coordinate-descent-attention
Implementation of an Attention layer where each head can attend to more than just one token, using coordinate descent to pick top-k
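To make the idea in the description concrete, here is a minimal sketch of one common way to compute a soft top-k selection by coordinate descent. It is not taken from the repository: the function name `soft_topk`, the parameters `eps` and `n_iters`, and the specific update rule (alternating updates of a shared offset `a` and per-item offsets `b` for an entropy-regularized relaxation) are all assumptions made for illustration, and the repository may differ.

```python
import math


def soft_topk(scores, k, eps=0.1, n_iters=50):
    """Hypothetical helper: soft top-k weights via coordinate descent.

    Returns one weight in (0, 1] per score. Larger scores never get
    smaller weights, and the weights sum to roughly k once the
    iteration has converged. Assumed formulation, not the repo's code.
    """
    a = 0.0
    b = [-s for s in scores]  # per-item offsets, initialised to -s
    for _ in range(n_iters):
        # Coordinate step on a: shared offset from a stable log-sum-exp.
        z = [(s + bi) / eps for s, bi in zip(scores, b)]
        m = max(z)
        lse = m + math.log(sum(math.exp(v - m) for v in z))
        a = eps * (math.log(k) - lse)
        # Coordinate step on b: clamp so that s + a + b never exceeds 0,
        # which keeps every weight at or below 1.
        b = [-max(0.0, s + a) for s in scores]
    return [math.exp((s + a + bi) / eps) for s, bi in zip(scores, b)]


# Usage: weights over four attention scores, softly selecting two of them.
weights = soft_topk([2.0, -1.0, 0.5, 3.0], k=2)
```

In an attention layer, weights like these could stand in for a hard one-hot choice so that a head attends to about k tokens while staying differentiable. A smaller `eps` pushes the weights closer to a hard 0/1 mask but usually needs more iterations.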
Stars: 47
Rank: 506084