Use BlendedDataset to select samples from different languages with different probabilities.
Introduces BlendedDataset, a blend of multiple datasets, each of which contains code in a single language. BlendedDataset draws samples from each dataset with a given probability.
The goal is to train on all 13 languages while giving higher sampling probabilities to the new languages.
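
For illustration only, the sampling idea can be sketched as below. This is not the BlendedDataset code from this change: the class name WeightedBlend, its constructor arguments, and the sample() method are hypothetical, and the sketch assumes each per-language dataset is an indexable, non-empty sequence.

```python
import random
from typing import Any, Sequence, Tuple


class WeightedBlend:
    """Hypothetical stand-in for a blended dataset (not the real class).

    Picks a source dataset according to its weight, then picks one
    sample uniformly from that dataset.
    """

    def __init__(self, datasets: Sequence[Sequence[Any]],
                 weights: Sequence[float], seed: int = 0) -> None:
        if len(datasets) != len(weights):
            raise ValueError("need exactly one weight per dataset")
        if any(w < 0 for w in weights) or sum(weights) <= 0:
            raise ValueError("weights must be non-negative with a positive sum")
        total = sum(weights)
        self.datasets = datasets
        # Normalize so the weights can be read as probabilities.
        self.probs = [w / total for w in weights]
        self.rng = random.Random(seed)

    def sample(self) -> Tuple[int, Any]:
        # First choose which dataset to draw from, by probability.
        idx = self.rng.choices(range(len(self.datasets)),
                               weights=self.probs, k=1)[0]
        ds = self.datasets[idx]
        # Then choose one item uniformly within that dataset.
        return idx, ds[self.rng.randrange(len(ds))]


# Made-up example data: an "old" language and a "new" language
# that should be seen three times as often.
old_lang = ["def f(): pass", "def g(): pass"]
new_lang = ["fn f() {}", "fn g() {}"]
blend = WeightedBlend([old_lang, new_lang], weights=[1.0, 3.0])
source_index, text = blend.sample()
```

With weights of 1 and 3, roughly three quarters of the draws should come from the second dataset, which is how new languages can be emphasized without dropping the existing ones.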