Faster gettext extractor
What does this MR do and why?
The rake task gettext:regenerate takes around 50 seconds on my machine to extract all externalized strings from ruby, haml, erb, js and vue sources. It is implemented in a blocking way and therefore parsing roughly 20000 files takes a long time.
We are introducing a new tooling script, tooling/bin/gettext_extractor, which is 3x faster by making the following improvements:
- We parallelize the extraction of ruby, haml and erb with the parallel gem.
- Instead of passing files through a parser stack and checking which files a parser can parse, we directly call the parser for each file type. The original implementation, for example, checked for every file whether it is a glade file (whatever that is), which took a long time.
- js and vue files are still parsed by a shell-out to the pre-existing node script: scripts/frontend/extract_gettext_all.js
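As an illustration of the first two points, here is a minimal sketch of choosing the parser directly from the file extension and spreading the work across workers. Everything in it is hypothetical: PARSERS, SIMPLE_EXTRACTOR, GETTEXT_CALL and extract_all are made-up names, the regex is a toy stand-in for the real parsers, and plain stdlib threads stand in for the parallel gem so that the snippet runs without extra dependencies. Threads in MRI will not speed up CPU-bound parsing the way separate processes can, so this only shows the shape of the approach, not its performance.

```ruby
require 'etc'

# Toy pattern: matches _('...') or _("...") and captures the string (assumption,
# not the real parser).
GETTEXT_CALL = /\b_\(\s*["']([^"']+)["']\s*\)/

# Stand-in extractor shared by all file types in this sketch.
SIMPLE_EXTRACTOR = ->(source) { source.scan(GETTEXT_CALL).flatten }

# Direct dispatch: one parser per extension, no "can you parse this?" probing.
PARSERS = {
  '.rb' => SIMPLE_EXTRACTOR,
  '.haml' => SIMPLE_EXTRACTOR,
  '.erb' => SIMPLE_EXTRACTOR
}.freeze

# Returns a hash of path => extracted msgids for every supported file.
def extract_all(paths, workers: Etc.nprocessors)
  queue = Queue.new
  paths.each { |path| queue << path }
  queue.close # pop returns nil once the queue is drained

  results = {}
  lock = Mutex.new

  Array.new(workers) do
    Thread.new do
      while (path = queue.pop)
        parser = PARSERS[File.extname(path)]
        next unless parser # unsupported file type: skip it outright

        msgids = parser.call(File.read(path))
        lock.synchronize { results[path] = msgids }
      end
    end
  end.each(&:join)

  results
end
```

Files with an extension that has no entry in PARSERS (for example .glade) are never handed to any parser, which is the point of the second improvement.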
This new parser is now used under the hood for the rake tasks gettext:regenerate and gettext:updated_check.
There is still room for improvement, and we should look into the following ideas:
- We currently scan ee/spec, which we probably should not scan. We still scan it for now, in order to have parity in results.
- The shell-out to node can be changed to stream the data, rather than blocking until all frontend files are scanned. However, initial tests have not shown any performance improvements from that.
- The HamlParser probably could use GetText::RubyParser instead of RubyGettextExtractor under the hood.
- We likely can improve parsing performance a lot by adding guard checks to see whether a file actually contains the literal names of the gettext methods, e.g. _(, n_( or N_(. We are already doing that in the Frontend script and improved performance by at least 20 percent there: !115561 (comment 1324725274)
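The guard check from the last idea could look roughly like the sketch below. It is not part of this MR: GETTEXT_MARKERS, may_contain_gettext? and extract_with_guard are hypothetical names, and the marker list only covers the three methods mentioned above.

```ruby
# Literal method names to look for before running an expensive parser
# (assumption: only the three methods named in the description).
GETTEXT_MARKERS = ['_(', 'n_(', 'N_('].freeze

# Cheap substring check; true if the source might contain a gettext call.
def may_contain_gettext?(source)
  GETTEXT_MARKERS.any? { |marker| source.include?(marker) }
end

# Runs the given block (the real parser) only when the guard passes.
def extract_with_guard(source)
  return [] unless may_contain_gettext?(source)

  yield(source)
end
```

The check can produce false positives (for example a marker inside a comment), which is fine because the real parser still decides; it must never produce false negatives for the methods it covers.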
Screenshots or screen recordings
N/A
How to set up and validate locally
- tooling/bin/gettext_extractor locale/gitlab.pot should create no diff.
- bin/rake gettext:regenerate and bin/rake gettext:updated_check should also create no diff.
- Change an externalized string
- All the commands above should find it
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.
Edited by Peter Leitzen