Prepend language identifier to every training sequence. Do not add special tokens.
Fix 3 bugs:
- Prepend language identifier to every training sequence.
- Do not add any special tokens; just use plain English as the language identifier for now.
- Fix random sampling
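
A minimal sketch of the first two items, assuming each training example arrives as a (language, text) pair and that the identifier is joined to the text with ": ". The function names, the pair layout, and the separator are hypothetical and are not taken from the actual change; the only point illustrated is that the identifier is ordinary text, so the tokenizer vocabulary stays untouched.

```python
from typing import Iterable, List, Tuple


def prepend_language_id(text: str, language: str) -> str:
    """Prefix one training sequence with a plain-English language name.

    The "<language>: <text>" layout is an assumption made for illustration.
    No special token is introduced; the prefix is tokenized like any other text.
    """
    return f"{language}: {text}"


def tag_sequences(pairs: Iterable[Tuple[str, str]]) -> List[str]:
    """Apply the prefix to every (language, text) pair, skipping none."""
    return [prepend_language_id(text, language) for language, text in pairs]


# Hypothetical sample data, not from the real training set.
examples = [("English", "Hello world"), ("French", "Bonjour le monde")]
tagged = tag_sequences(examples)
```

Because the prefix is plain text, it can later be swapped for a dedicated token without changing how the sequences are assembled.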
Edited by Hongtao Yang