CHAI is a library for dynamic pruning of attention heads for efficient LLM inference.