Implementation of Compositional Attention from MILA, a multi-head attention variant reframed as a two-step attention process that disentangles search (query-key matching) from retrieval (value aggregation), in PyTorch.
Stars: 50 · Rank: 396983
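To make the two-step idea concrete, below is a minimal sketch of compositional attention: search heads produce attention patterns, every pattern is applied to every retrieval's values, and a second softmax lets each search head softly select among the retrievals. This is an illustrative reading of the mechanism, not the repository's actual API; names such as `CompositionalAttention`, `num_searches`, and `num_retrievals` are assumptions.

```python
import math
import torch
import torch.nn as nn

class CompositionalAttention(nn.Module):
    def __init__(self, dim, num_searches=8, num_retrievals=8, head_dim=64):
        super().__init__()
        self.S, self.R, self.d = num_searches, num_retrievals, head_dim
        self.to_q = nn.Linear(dim, num_searches * head_dim, bias=False)
        self.to_k = nn.Linear(dim, num_searches * head_dim, bias=False)
        self.to_v = nn.Linear(dim, num_retrievals * head_dim, bias=False)
        # second-step "value query" / "value key" used to select among retrievals
        self.to_vq = nn.Linear(dim, num_searches * head_dim, bias=False)
        self.to_vk = nn.Linear(head_dim, head_dim, bias=False)
        self.to_out = nn.Linear(num_searches * head_dim, dim, bias=False)

    def forward(self, x):
        b, n, _ = x.shape
        S, R, d = self.S, self.R, self.d

        q = self.to_q(x).view(b, n, S, d).transpose(1, 2)    # (b, S, n, d)
        k = self.to_k(x).view(b, n, S, d).transpose(1, 2)    # (b, S, n, d)
        v = self.to_v(x).view(b, n, R, d).transpose(1, 2)    # (b, R, n, d)

        # step 1: search — one attention pattern per search head
        attn = (q @ k.transpose(-1, -2)) / math.sqrt(d)      # (b, S, n, n)
        attn = attn.softmax(dim=-1)

        # apply every search pattern to every retrieval's values
        out = attn.unsqueeze(2) @ v.unsqueeze(1)              # (b, S, R, n, d)

        # step 2: retrieval — each search head softly picks among the R retrievals
        vq = self.to_vq(x).view(b, n, S, d).transpose(1, 2)   # (b, S, n, d)
        vk = self.to_vk(out)                                  # (b, S, R, n, d)
        sel = torch.einsum('bsnd,bsrnd->bsnr', vq, vk) / math.sqrt(d)
        sel = sel.softmax(dim=-1)                             # (b, S, n, R)
        out = torch.einsum('bsnr,bsrnd->bsnd', sel, out)      # (b, S, n, d)

        out = out.transpose(1, 2).reshape(b, n, S * d)
        return self.to_out(out)


# usage: drop-in replacement for a standard multi-head attention layer
layer = CompositionalAttention(dim=512)
x = torch.randn(2, 128, 512)
y = layer(x)  # (2, 128, 512)
```

The key difference from standard multi-head attention is that search heads and retrievals are no longer rigidly paired; the number of retrievals can differ from the number of search heads, and the pairing is learned per token via the second attention step.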