Gitstar Ranking
Fetched on 2025/07/26 22:47
sayakpaul/count-tokens-hf-datasets
This project shows how to derive the total number of training tokens in a large text dataset from 🤗 Datasets using Apache Beam and Dataflow.
Stars: 27
Rank: 726408