CHAI is a library for dynamic pruning of attention heads for efficient LLM inference.
Stars: 23
Rank: 909698