Prepend language identifier to every training sequence. Do not add special tokens.

Hongtao Yang requested to merge hotfix_prefix_newtoken_random into main

Fix 3 bugs:

  1. Prepend language identifier to every training sequence.
  2. Do not add any special tokens; use plain English as the language identifier for now.
  3. Fix random sampling
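
For reviewers, here is a minimal sketch of what fixes 1 to 3 could look like together. It is not taken from this merge request's diff. The function names (`add_language_prefix`, `build_training_sequences`), the `"<language>: <text>"` prefix format, and the seeded sampler are all assumptions made for illustration.

```python
import random


def add_language_prefix(text: str, language: str) -> str:
    """Prepend a plain-English language identifier (hypothetical format).

    No special token is added to the vocabulary; the identifier is ordinary
    text that the existing tokenizer can already encode.
    """
    return f"{language}: {text}"


def build_training_sequences(examples, sample_size, seed=0):
    """Sample (language, text) pairs without replacement and prefix each one.

    A dedicated random.Random instance keeps sampling reproducible and
    independent of the global random state (an assumed design choice).
    """
    rng = random.Random(seed)
    sampled = rng.sample(examples, k=min(sample_size, len(examples)))
    # Every sampled sequence gets the prefix, not just a subset.
    return [add_language_prefix(text, language) for language, text in sampled]
```

With this shape, `add_language_prefix("hello", "English")` yields `"English: hello"`, and repeated calls to `build_training_sequences` with the same seed return the same sample.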
Edited by Hongtao Yang
