Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware.