Gitstar Ranking
Fetched on 2025/01/09 06:29
lucidrains/coordinate-descent-attention
Implementation of an Attention layer where each head can attend to more than just one token, using coordinate descent to pick topk -
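The description names the technique but not the math. Purely as an illustration, here is a minimal NumPy sketch of one common way to get soft top-k weights by coordinate descent: an entropy-regularized selection problem whose two dual variables are updated in turn. It is not taken from the repository. The helper names `coordinate_descent_topk` and `topk_attention`, the defaults for `eps` and `n_iters`, and the choice to use the weights without renormalizing are all assumptions.

```python
import numpy as np


def logsumexp(x, axis=-1):
    # Numerically stable log(sum(exp(x))) along `axis`, keeping that axis for broadcasting.
    m = np.max(x, axis=axis, keepdims=True)
    return m + np.log(np.sum(np.exp(x - m), axis=axis, keepdims=True))


def coordinate_descent_topk(s, k, eps=0.1, n_iters=100):
    """Soft top-k weights along the last axis of `s` (hypothetical helper).

    Approximately solves, row by row,
        maximize   s . y + eps * H(y),  H(y) = -sum(y * log(y))
        subject to sum(y) = k,  0 <= y <= 1
    by alternating exact updates of two dual variables:
        a (one per row)   rescales the weights so they sum to k,
        b (one per entry) caps each weight at 1.
    """
    s = np.asarray(s, dtype=np.float64)
    if not 0 < k < s.shape[-1]:
        raise ValueError("k must be strictly between 0 and the number of entries")
    if n_iters < 1:
        raise ValueError("n_iters must be at least 1")
    b = np.zeros_like(s)
    for _ in range(n_iters):
        # a-update: afterwards sum(exp((s + a + b) / eps)) == k exactly.
        a = eps * (np.log(k) - logsumexp((s + b) / eps))
        # b-update: afterwards s + a + b <= 0 everywhere, i.e. every weight <= 1.
        b = -np.maximum(s + a, 0.0)
    # Every weight is <= 1; row sums stay <= k and approach k as the updates converge.
    return np.exp((s + a + b) / eps)


def topk_attention(q, keys, values, k, eps=0.1, n_iters=100):
    # Hypothetical single-head attention: the softmax (rows summing to 1) is
    # replaced by soft top-k weights (rows summing to at most k), so each query
    # can give near-full weight to up to k keys at once.
    scores = q @ keys.T / np.sqrt(q.shape[-1])
    weights = coordinate_descent_topk(scores, k, eps=eps, n_iters=n_iters)
    return weights @ values, weights


# Usage: 4 queries attending over 6 keys, selecting roughly 2 keys per query.
rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))
keys = rng.normal(size=(6, 8))
values = rng.normal(size=(6, 8))
out, w = topk_attention(q, keys, values, k=2)
print(out.shape, w.sum(axis=-1))
```

On a single row such as `[3.0, 1.0, 2.0, -1.0, 0.5]` with `k=2`, the weights come out close to `[1, 0, 1, 0, 0]`. A smaller `eps` makes the selection harder (closer to a true top-k) but typically needs more iterations to converge.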
Stars: 46
Rank: 459477