A high-throughput and memory-efficient inference and serving engine for LLMs. (Fork for contributing; all changes are intended for upstream PRs.)