CHAI is a library for dynamic pruning of attention heads for efficient LLM inference.
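As a rough illustration of the general idea behind head pruning (this is a hypothetical sketch, not CHAI's actual algorithm or API), one can score each attention head by some importance metric and keep only the top fraction:

```python
import numpy as np

def prune_heads(head_scores, keep_ratio=0.5):
    """Return a boolean mask selecting the highest-scoring heads.

    head_scores: array of shape (num_heads,) with one importance score
    per head (e.g. mean attention-weight magnitude seen at inference).
    keep_ratio: fraction of heads to retain.
    """
    num_heads = head_scores.shape[0]
    k = max(1, int(num_heads * keep_ratio))
    keep = np.zeros(num_heads, dtype=bool)
    # argsort is ascending, so the last k indices are the top-k heads.
    keep[np.argsort(head_scores)[-k:]] = True
    return keep

# Example: 8 heads, keep the top half by score.
scores = np.array([0.9, 0.1, 0.4, 0.8, 0.05, 0.6, 0.3, 0.7])
mask = prune_heads(scores, keep_ratio=0.5)
print(mask.sum())  # number of heads retained
```

Pruned heads can then be skipped entirely in the attention computation, reducing both FLOPs and KV-cache memory for those heads.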