lucidrains/coordinate-descent-attention

lucidrains

Fetched on 2026/06/26 11:03

Implementation of an attention layer in which each head can attend to more than one token, using coordinate descent to pick the top-k.
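To make the description concrete, here is a minimal NumPy sketch of one common way to compute a soft top-k by coordinate descent. It is an illustration only and is not taken from this repository: the function names `logsumexp` and `coor_descent_topk`, the parameters `n_iters` and `eps`, and the update rule itself are all assumptions. The idea is to alternate between two dual variables, `a` and `b`, so that the resulting scores lie in (0, 1] and sum to roughly `k`.

```python
import numpy as np

def logsumexp(x):
    # Numerically stable log-sum-exp over the last axis (keeps that axis).
    m = np.max(x, axis=-1, keepdims=True)
    return m + np.log(np.sum(np.exp(x - m), axis=-1, keepdims=True))

def coor_descent_topk(s, k, n_iters=50, eps=0.1):
    # Hypothetical helper: relaxed top-k selection scores for the logits `s`.
    # `eps` sets how sharp the relaxation is (smaller is closer to hard top-k).
    s = np.asarray(s, dtype=np.float64)
    a = np.zeros(s.shape[:-1] + (1,))
    b = -s
    for _ in range(n_iters):
        # Update a: rescale so the scores would sum to k for the current b.
        a = eps * (np.log(k) - logsumexp((s + b) / eps))
        # Update b: clamp so that no individual score exceeds 1.
        b = -np.maximum(s + a, 0.0)
    return np.exp((s + a + b) / eps)

# Example: softly select 2 of 4 attention logits.
logits = np.array([3.0, 1.0, 0.2, -1.0])
scores = coor_descent_topk(logits, k=2)
```

In an attention layer, scores like these could act as a differentiable mask over candidate tokens in place of a hard `topk`. How the repository actually wires this into each head is not stated in this summary.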

Rank: 545731
