CHAI is a library for dynamic pruning of attention heads for efficient LLM inference.
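The core idea, pruning less useful attention heads at inference time, can be sketched as follows. This is a minimal illustration assuming a simple importance score (the norm of each head's output), not CHAI's actual algorithm; the function names and the `k` parameter are hypothetical.

```python
import numpy as np

def head_importance(head_outputs: np.ndarray) -> np.ndarray:
    """Score each head by the magnitude of its output.

    head_outputs: array of shape (num_heads, seq_len, head_dim).
    Returns one scalar score per head.
    """
    return np.linalg.norm(head_outputs, axis=(1, 2))

def select_heads(head_outputs: np.ndarray, k: int) -> np.ndarray:
    """Return indices of the k highest-scoring heads to keep."""
    scores = head_importance(head_outputs)
    return np.argsort(scores)[::-1][:k]

# Toy example: 8 heads, sequence length 16, head dimension 64.
rng = np.random.default_rng(0)
outputs = rng.normal(size=(8, 16, 64))
kept = select_heads(outputs, k=4)
print(sorted(kept.tolist()))  # indices of the 4 heads that survive pruning
```

Dropping the remaining heads skips their query/key/value projections entirely, which is where the inference savings come from.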