Optimize dataset preprocessing before model training/evaluation
This MR optimizes the dataset preprocessing before model training/evaluation by applying the following changes:
- Use HF `map` operations to preprocess the dataset on rank 0 only and cache the result. The other ranks load the preprocessed dataset from that cache instead of repeating the work.
- Support both full preprocessing and streaming. Streaming is set to True by default, because caching our fully preprocessed dataset requires at least 4 TB of storage.
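
The two modes above can be sketched roughly as follows. This is a minimal illustration, not the code from this MR: the names `rank_zero_first`, `load_preprocessed`, and `tokenize_fn` are hypothetical, and the sketch assumes that `raw_dataset` is a Hugging Face `datasets` object, that `barrier` is something like `torch.distributed.barrier`, and that all ranks share the same cache directory so the cache written by rank 0 is visible to the others.

```python
from contextlib import contextmanager
from typing import Any, Callable, Iterator


@contextmanager
def rank_zero_first(rank: int, barrier: Callable[[], None]) -> Iterator[None]:
    """Run the wrapped block on rank 0 before any other rank (hypothetical helper)."""
    if rank != 0:
        barrier()  # non-zero ranks wait here until rank 0 has finished
    try:
        yield
    finally:
        if rank == 0:
            barrier()  # rank 0 is done; release the waiting ranks


def load_preprocessed(
    raw_dataset: Any,
    tokenize_fn: Callable[[Any], Any],
    rank: int,
    barrier: Callable[[], None],
    streaming: bool = True,
) -> Any:
    """Preprocess lazily (streaming) or fully with a rank-0-first cache (hypothetical)."""
    if streaming:
        # Assumes raw_dataset is an IterableDataset: map() is applied lazily
        # while iterating, so no preprocessed copy is written to disk.
        return raw_dataset.map(tokenize_fn, batched=True)

    with rank_zero_first(rank, barrier):
        # Assumes raw_dataset is a regular Dataset: rank 0 computes the result
        # and writes the cache; the other ranks then make the same call and
        # are expected to hit that cache instead of recomputing.
        return raw_dataset.map(tokenize_fn, batched=True, load_from_cache_file=True)
```

In a real job, `rank` would typically come from `torch.distributed.get_rank()` and `barrier` would be `torch.distributed.barrier`. Whether the other ranks actually reuse the cache depends on the transform being identical on every rank, so that the cache lookup matches.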
Edited by Alexander Chueshev