ci: Download assets from generic package
What does this MR do and why?
The goal is to implement something similar to !79766 (merged) but for assets.
Related to #371244 (closed).
Current state
This essentially reproduces the built-in caching mechanism that we already have, but with this MR the "cache key" includes the hash sum of all asset files.
Currently, the assets cache has the key assets-debian-${DEBIAN_VERSION}-ruby-${RUBY_VERSION}-node-${NODE_ENV}-v2
and we check the hash sum of all asset files to determine if the cache can be used as-is, or if we need to recompile all assets.
This is pretty inefficient as the asset files change often, so in a lot of cases, an MR cannot use the latest cache from master
since it's rebuilt only every 2 hours.
What this MR improves
With this new strategy, we build a "cache" package on each master pipeline as soon as any asset file is touched. Cache building is also available as a manual job on other master commits and on MRs.
That way, MRs are more likely to find a fresh cache, and nothing is downloaded from the package registry if the cache package doesn't exist (since we compute the hash sum beforehand instead of downloading "the latest cache" and then checking whether it's usable).
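To illustrate the idea of computing the hash sum before any download, here is a minimal sketch. It is not the script used by this MR: the function names, the choice of SHA-256, and the way the digest is appended to a base key are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path
from typing import Iterable


def assets_hash(paths: Iterable[Path]) -> str:
    """Return one SHA-256 digest covering every given file.

    Hypothetical helper: files are processed in sorted order so the
    result does not depend on the order in which they were listed.
    """
    digest = hashlib.sha256()
    for path in sorted(paths):
        # Mix in the path so that renaming a file also changes the digest.
        digest.update(path.as_posix().encode("utf-8"))
        digest.update(b"\0")
        digest.update(path.read_bytes())
    return digest.hexdigest()


def package_version(base_key: str, digest: str) -> str:
    """Build a package identifier from a base key and the assets digest.

    The exact naming scheme is an assumption, not the one from the MR.
    """
    return f"{base_key}-{digest}"
```

Because the identifier is known before any network call, a job can ask the registry for exactly that package and skip the download entirely when it is absent.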
Notes:
- The new strategy is gated by CACHE_ASSETS_AS_PACKAGE being set to true, to ensure we can fall back to the legacy strategy in case we have any issue.
- We continue to download the "legacy" cache until we switch 100% to this new strategy, so that if we fall back to the legacy strategy, everything continues to work as it does today. This means in some cases we'd download an outdated cache, and then download a fresh cache from the package registry, but that's temporary.
- If we were able to include all the asset files as dependencies for the cache key, we could use the native caching feature, but it's currently limited to 2 files.
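The fallback behavior described in these notes can be sketched as a small decision function. This is only one reading of the description, not the actual CI script: the function names and the returned labels are hypothetical, and only the CACHE_ASSETS_AS_PACKAGE variable comes from the MR.

```python
import os
from typing import Mapping


def new_strategy_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """The new strategy is active only when the variable is exactly "true"."""
    return env.get("CACHE_ASSETS_AS_PACKAGE") == "true"


def choose_assets_source(
    enabled: bool, package_exists: bool, legacy_cache_fresh: bool
) -> str:
    """Pick where compiled assets come from (labels are hypothetical).

    Assumed order: a matching package wins when the new strategy is on,
    then a fresh legacy cache, and a full compilation as the last resort.
    """
    if enabled and package_exists:
        return "package"
    if legacy_cache_fresh:
        return "legacy-cache"
    return "compile"
```

With the gate disabled, the function ignores the package entirely, which matches the intent of keeping the legacy path working unchanged.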
Performance improvements
The new strategy removes the need to run gettext:po_to_json before checking whether the cache is fresh. The reason is that gettext:po_to_json generates files under app/assets/javascripts/locale/**/app.js, which are currently part of the hash sum calculation. With the new strategy, we calculate the hash sum before downloading the cache, so the app/assets/javascripts/locale/**/app.js files aren't part of the cache key, and thus don't need to be generated before calculating it. Note that locale/gitlab.pot was added to the dependency files for the cache key since app/assets/javascripts/locale/**/app.js depends on it.
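The change to the hash inputs can be sketched as a filter over candidate file paths. The function name and the use of fnmatch are assumptions for illustration; only the two paths come from the description above.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List

# Generated by gettext:po_to_json, so excluded from the hash inputs.
# Note: fnmatch's "*" also matches "/" characters.
GENERATED_LOCALE_PATTERN = "app/assets/javascripts/locale/*/app.js"
# Source file the generated locale files depend on.
POT_FILE = "locale/gitlab.pot"


def cache_key_inputs(candidate_files: Iterable[str]) -> List[str]:
    """Return the sorted files that feed the hash sum (hypothetical helper).

    Generated locale files are dropped and locale/gitlab.pot is added,
    so translation changes still invalidate the cache key.
    """
    inputs = {
        path
        for path in candidate_files
        if not fnmatchcase(path, GENERATED_LOCALE_PATTERN)
    }
    inputs.add(POT_FILE)
    return sorted(inputs)
```

Since none of the remaining inputs are generated, the hash can be computed on a fresh checkout without running gettext:po_to_json first.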
This should save 34 seconds for test assets compilation, and 48 seconds for production assets compilation.
With a fresh cache from package, the performance should be similar to what's currently happening with a fresh legacy cache, but the point here is that a lot more MRs will be able to use a fresh cache.
Test matrix
{
"fields" : [
{"key": "a", "label": "Legacy cache"},
{"key": "b", "label": "New strategy"},
{"key": "c", "label": "Assets package"},
{"key": "d", "label": "`compile-test-assets` job"},
{"key": "e", "label": "Duration", "sortable": true}
],
"items" : [
{"a": "Empty", "b": "Disabled", "c": "N/A", "d": "https://gitlab.com/gitlab-org/gitlab/-/jobs/3070853080", "e": "7m 49s"},
{"a": "Empty", "b": "Enabled", "c": "Absent", "d": "https://gitlab.com/gitlab-org/gitlab/-/jobs/3071663274", "e": "8m 6s"},
{"a": "Empty", "b": "Enabled", "c": "Present (caching job: https://gitlab.com/gitlab-org/gitlab/-/jobs/3071742224)", "d": "https://gitlab.com/gitlab-org/gitlab/-/jobs/3071776607", "e": "2m 56s"},
{"a": "Fresh (caching job: https://gitlab.com/gitlab-org/gitlab/-/jobs/3074737586)", "b": "Disabled", "c": "N/A", "d": "https://gitlab.com/gitlab-org/gitlab/-/jobs/3074844815", "e": "1m 59s"},
{"a": "Fresh", "b": "Enabled", "c": "Absent", "d": "https://gitlab.com/gitlab-org/gitlab/-/jobs/3075027161", "e": "2m 33s"},
{"a": "Fresh", "b": "Enabled", "c": "Present (caching job: https://gitlab.com/gitlab-org/gitlab/-/jobs/3075027146)", "d": "https://gitlab.com/gitlab-org/gitlab/-/jobs/3075150789", "e": "2m 48s"}
]
}
Previous tests
Legacy cache | New strategy | Assets package | compile-test-assets job | Duration |
---|---|---|---|---|
Empty | Disabled | N/A | https://gitlab.com/gitlab-org/gitlab/-/jobs/2941006395 | 7m 34s |
Empty | Enabled | Absent | https://gitlab.com/gitlab-org/gitlab/-/jobs/2941033418 | 8m 26s |
Empty | Enabled | Present | https://gitlab.com/gitlab-org/gitlab/-/jobs/2941118730 | 2m 26s |
Fresh | Disabled | N/A | https://gitlab.com/gitlab-org/gitlab/-/jobs/2941371887 | 3m 33s |
Fresh | Enabled | Absent | https://gitlab.com/gitlab-org/gitlab/-/jobs/2941376574 | 2m 25s |
Fresh | Enabled | Present | https://gitlab.com/gitlab-org/gitlab/-/jobs/2941399334 | 2m 56s |
- With no legacy cache
Next steps
- Periodically delete old packages: https://docs.gitlab.com/ee/api/packages.html#delete-a-project-package => #375606 (closed)
- What about mirrors (including on other instances?) and forks: these should always try to download from the canonical package registry, and never upload.
- Stop downloading/uploading the "legacy" cache (for now we keep downloading/uploading it so that we can fall back to it if needed), and remove the CACHE_ASSETS_AS_PACKAGE == "true" gatekeeper
- We could generate cache packages for every MR commit, to maximize the chances for MRs to use a fresh cache, but that would increase the number of cache packages a lot (so we should also be aggressive in deleting cache packages if we do that)
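For the periodic cleanup, selecting which packages to delete could look like the sketch below. It is only a starting point under stated assumptions: each package is a dict with id and created_at keys (an ISO 8601 timestamp), the retention window is arbitrary, and the actual deletion call is left out.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List


def stale_package_ids(
    packages: List[Dict], now: datetime, max_age: timedelta
) -> List[int]:
    """Return the ids of packages created before now - max_age.

    Hypothetical helper: `now` must be timezone-aware, and a trailing
    "Z" in created_at is treated as UTC.
    """
    cutoff = now - max_age
    stale = []
    for package in packages:
        created = datetime.fromisoformat(
            package["created_at"].replace("Z", "+00:00")
        )
        if created < cutoff:
            stale.append(package["id"])
    return stale
```

The returned ids would then be passed to whatever deletion mechanism the cleanup job ends up using.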
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.