Fork for testing my novel self-attention mechanisms on some common Hugging Face transformers in maximal update parametrization (µP).