This project shows how to compute the total number of training tokens in a large text dataset from 🤗 Datasets using Apache Beam and Dataflow.
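The counting idea can be sketched as a map step followed by a global sum. The sketch below is not the project's code: it uses only the Python standard library, assumes whitespace splitting as the tokenizer, and uses a hypothetical in-memory list in place of the dataset's text column. In an Apache Beam pipeline the same two steps would plausibly be expressed with `beam.Map` and `beam.CombineGlobally(sum)`, with Dataflow as the runner.

```python
from functools import reduce
from typing import Iterable


def count_tokens(text: str) -> int:
    """Per-example step: count whitespace-separated tokens.

    Whitespace splitting is an assumption; a real run would likely
    use the tokenizer of the model being trained.
    """
    return len(text.split())


def total_tokens(examples: Iterable[str]) -> int:
    """Map each example to its token count, then add the counts together."""
    per_example = map(count_tokens, examples)
    return reduce(lambda a, b: a + b, per_example, 0)


# Hypothetical stand-in for the text column of a dataset.
sample = ["hello world", "apache beam counts tokens", ""]
print(total_tokens(sample))
```

Because addition is associative and commutative, the per-example counts can be summed in any order and in partial groups, which is what lets a distributed runner split the work across many workers.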