Gitstar Ranking
Fetched on 2025/02/19 04:27
lucidrains/coordinate-descent-attention
Implementation of an attention layer where each head can attend to more than just one token, using coordinate descent to pick the top-k.
Stars: 46
Rank: 484352