Faster gettext extractor

Lukas Eipert requested to merge leipert-move-js-gettext-parser into master

What does this MR do and why?

The rake task gettext:regenerate takes around 50 seconds on my machine to extract all externalized strings from Ruby, HAML, ERB, JS and Vue sources. It is implemented in a blocking way, so parsing roughly 20000 files takes a long time.

We are introducing a new tooling script tooling/bin/gettext_extractor which is 3x faster by making the following improvements:

  1. We parallelize the extraction of ruby, haml and erb with the parallel gem.
  2. Instead of passing files through a parser stack and checking which files a parser can parse, we directly call the parser for each file type. For example, the original implementation checked for every file whether it is a glade file (whatever that is), which took a long time.
  3. js and vue files are still parsed by a shell-out to the pre-existing node script: scripts/frontend/extract_gettext_all.js
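
The second point can be illustrated with a minimal sketch. It is not the code from this MR: the parser lambdas and the names PARSERS, parse_file and parse_all are hypothetical stand-ins. The idea is to pick the parser from the file extension alone, rather than asking every parser in a stack whether it can handle the file. With the parallel gem, the loop in parse_all is the part that would be handed to something like Parallel.map.

```ruby
# Hypothetical parser callables; stand-ins for the real Ruby, HAML and ERB
# parsers. Each one just tags the path so the dispatch is observable.
PARSERS = {
  '.rb'   => ->(path) { [:ruby, path] },
  '.haml' => ->(path) { [:haml, path] },
  '.erb'  => ->(path) { [:erb, path] }
}.freeze

# Looks up the parser by file extension. Returns nil for unsupported files
# instead of probing every parser to see whether it accepts the file.
def parse_file(path)
  parser = PARSERS[File.extname(path)]
  parser&.call(path)
end

# Parses every supported file and drops the unsupported ones.
# This per-file loop is what could be parallelized.
def parse_all(paths)
  paths.filter_map { |path| parse_file(path) }
end
```

A hash lookup per file keeps the dispatch cost constant, no matter how many file types are supported.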

This new parser is now used under the hood for the rake tasks gettext:regenerate and gettext:updated_check.

There is still room for improvement, and we should look into the following ideas:

  1. We currently scan ee/spec, which we probably should not scan. We still scan it for now in order to have parity in results.
  2. The shell-out to node can be changed to stream the data rather than blocking until all frontend files are scanned. However, initial tests have not shown any performance improvements from that.
  3. The HamlParser probably could use GetText::RubyParser instead of RubyGettextExtractor under the hood.
  4. We can likely improve parsing performance a lot by adding guard checks to see whether a file actually contains the literal names of the gettext methods, e.g. _(, n_( or N_(. We are already doing that in the frontend script and improved performance by at least 20 percent there: !115561 (comment 1324725274)
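
The guard check from the last idea could look roughly like the sketch below. It is an assumption about how such a check might be written, not code from this MR or from the frontend script: GETTEXT_MARKERS, may_contain_gettext? and extract_with_guard are hypothetical names, and the parser is passed in as a plain callable.

```ruby
# Literal call prefixes named above. As a plain substring, '_(' already
# covers the other two; the list is kept explicit for readability.
GETTEXT_MARKERS = ['_(', 'n_(', 'N_('].freeze

# Cheap pre-check. False positives are harmless because the real parser
# still runs on those files; the check must never produce false negatives.
def may_contain_gettext?(source)
  GETTEXT_MARKERS.any? { |marker| source.include?(marker) }
end

# Skips the expensive parser entirely when no marker is present.
def extract_with_guard(source, parser)
  return [] unless may_contain_gettext?(source)

  parser.call(source)
end
```

The substring scan is deliberately conservative: it only decides whether parsing is worth attempting and leaves the actual extraction to the parser.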

Screenshots or screen recordings

N/A

How to set up and validate locally

  1. tooling/bin/gettext_extractor locale/gitlab.pot should create no diff.
  2. bin/rake gettext:regenerate and bin/rake gettext:updated_check should also create no diff.
  3. Change an externalized string.
  4. All the commands above should now pick up the change.

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Peter Leitzen
