ci: Download assets from generic package (!96297) · Merge requests · GitLab.org / GitLab

Rémy Coutable requested to merge retrieve-and-store-assets-as-generic-packages into master Aug 25, 2022

What does this MR do and why?

The goal is to implement something similar to !79766 (merged) but for assets.

Current state

This essentially reproduces the built-in caching mechanism that we already have, but with this MR the "cache key" includes the hash sum of all asset files.

Currently, the assets cache has the key assets-debian-${DEBIAN_VERSION}-ruby-${RUBY_VERSION}-node-${NODE_ENV}-v2 and we check the hash sum of all asset files to determine if the cache can be used as-is, or if we need to recompile all assets.

This is pretty inefficient as the asset files change often, so in a lot of cases, an MR cannot use the latest cache from master since it's rebuilt only every 2 hours.

What this MR improves

With this new strategy, we build a "cache" package on each master pipeline as soon as any asset file is touched. Cache building is also available as a manual job on other master commits and on MRs.

That way, an MR is more likely to find a fresh cache, and nothing is downloaded if the cache package doesn't exist (since we compute the hash sum beforehand instead of downloading "the latest cache" and then checking whether it's usable).
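As an illustration of computing the hash sum before any download, here is a minimal sketch. The function name `assets_hash`, the choice of SHA-256, and the way file names are mixed into the digest are all assumptions made for this example; the MR itself only says that a hash sum of all asset files is used.

```python
import hashlib
from pathlib import Path
from typing import Iterable


def assets_hash(root: Path, relative_paths: Iterable[str]) -> str:
    """Return one SHA-256 hex digest covering the listed files' names and contents.

    Hypothetical helper: the real CI scripts may hash differently.
    """
    digest = hashlib.sha256()
    # Sort so the result does not depend on the order files were listed in.
    for relative_path in sorted(relative_paths):
        digest.update(relative_path.encode("utf-8"))
        digest.update(b"\0")  # separator between a file name and its contents
        digest.update((root / relative_path).read_bytes())
        digest.update(b"\0")  # separator between consecutive files
    return digest.hexdigest()
```

Because the digest is known up front, a lookup for a package named after it either hits or misses, and a miss costs no download.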

Notes:

The new strategy is gated by CACHE_ASSETS_AS_PACKAGE being set to true, so that we can fall back to the legacy strategy if we hit any issue.
We continue to download the "legacy" cache until we switch 100% to this new strategy so that if we fall back to the legacy strategy, everything continues to work as it does today. This means that in some cases we'd download an outdated cache, and then download a fresh cache from the package registry, but that's temporary.
If we were able to include all the asset files as dependencies for the cache key, we could use the native caching feature, but it's currently limited to 2 files.
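The gating described in these notes can be sketched roughly as follows. The function `cache_downloads` and the labels it returns are hypothetical; only the CACHE_ASSETS_AS_PACKAGE variable and its "true" value come from this MR.

```python
from typing import List, Mapping


def cache_downloads(env: Mapping[str, str], package_exists: bool) -> List[str]:
    """List which caches a job would download, in order (illustrative only)."""
    # The legacy cache is still downloaded during the transition period.
    downloads = ["legacy cache"]
    # The package is fetched only when the new strategy is enabled and a
    # package matching the precomputed hash sum actually exists.
    if env.get("CACHE_ASSETS_AS_PACKAGE") == "true" and package_exists:
        downloads.append("assets package")
    return downloads
```

With the variable unset or set to anything other than "true", the sketch returns only the legacy cache, which matches the fallback behaviour described above.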

Performance improvements

The new strategy removes the need to run gettext:po_to_json before checking whether the cache is fresh. The reason is that gettext:po_to_json generates files under app/assets/javascripts/locale/**/app.js, which are currently part of the hash sum calculation. With the new strategy, we calculate the hash sum before downloading the cache, so the app/assets/javascripts/locale/**/app.js files aren't part of the cache key and thus don't need to be generated before calculating it. Note that locale/gitlab.pot was added to the dependency files for the cache key, since app/assets/javascripts/locale/**/app.js depends on it.

This should save 34 seconds for test assets compilation, and 48 seconds for production assets compilation.
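The change to the hash inputs described above (generated locale files out, locale/gitlab.pot in) could look something like this sketch. The helper names are hypothetical, and the prefix/suffix check is a simplification of the app/assets/javascripts/locale/**/app.js glob.

```python
from typing import Iterable, List

GENERATED_LOCALE_PREFIX = "app/assets/javascripts/locale/"
POT_FILE = "locale/gitlab.pot"


def is_generated_locale_file(path: str) -> bool:
    """True for files that gettext:po_to_json would generate (simplified match)."""
    return path.startswith(GENERATED_LOCALE_PREFIX) and path.endswith("/app.js")


def hash_inputs(candidate_paths: Iterable[str]) -> List[str]:
    """Return the sorted list of files that feed the hash sum (illustrative)."""
    inputs = {p for p in candidate_paths if not is_generated_locale_file(p)}
    # The generated files depend on the .pot file, so it stands in for them.
    inputs.add(POT_FILE)
    return sorted(inputs)
```

Since none of the remaining inputs is generated at job time, the hash sum can be computed without running gettext:po_to_json first.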

With a fresh cache from package, the performance should be similar to what's currently happening with a fresh legacy cache, but the point here is that a lot more MRs will be able to use a fresh cache.

Test matrix

{
    "fields" : [
        {"key": "a", "label": "Legacy cache"},
        {"key": "b", "label": "New strategy"},
        {"key": "c", "label": "Assets package"},
        {"key": "d", "label": "`compile-test-assets` job"},
        {"key": "e", "label": "Duration", "sortable": true}
    ],
    "items" : [
      {"a": "Empty", "b": "Disabled", "c": "N/A", "d": "https://gitlab.com/gitlab-org/gitlab/-/jobs/3070853080", "e": "7m 49s"},
      {"a": "Empty", "b": "Enabled", "c": "Absent", "d": "https://gitlab.com/gitlab-org/gitlab/-/jobs/3071663274", "e": "8m 6s"},
      {"a": "Empty", "b": "Enabled", "c": "Present (caching job: https://gitlab.com/gitlab-org/gitlab/-/jobs/3071742224)", "d": "https://gitlab.com/gitlab-org/gitlab/-/jobs/3071776607", "e": "2m 56s"},
      {"a": "Fresh (caching job: https://gitlab.com/gitlab-org/gitlab/-/jobs/3074737586)", "b": "Disabled", "c": "N/A", "d": "https://gitlab.com/gitlab-org/gitlab/-/jobs/3074844815", "e": "1m 59s"},
      {"a": "Fresh", "b": "Enabled", "c": "Absent", "d": "https://gitlab.com/gitlab-org/gitlab/-/jobs/3075027161", "e": "2m 33s"},
      {"a": "Fresh", "b": "Enabled", "c": "Present (caching job: https://gitlab.com/gitlab-org/gitlab/-/jobs/3075027146)", "d": "https://gitlab.com/gitlab-org/gitlab/-/jobs/3075150789", "e": "2m 48s"}
    ]
}
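To compare rows of the matrix above numerically, the durations can be converted to seconds. This is a small sketch; `to_seconds` is a hypothetical helper, and the two durations are copied from the "Empty / Disabled / N/A" and "Empty / Enabled / Present" rows.

```python
import re


def to_seconds(duration: str) -> int:
    """Convert a duration such as '7m 49s' to a number of seconds."""
    match = re.fullmatch(r"(\d+)m (\d+)s", duration)
    if match is None:
        raise ValueError(f"unexpected duration format: {duration!r}")
    minutes, seconds = (int(group) for group in match.groups())
    return minutes * 60 + seconds


no_cache = to_seconds("7m 49s")      # empty legacy cache, new strategy disabled
from_package = to_seconds("2m 56s")  # empty legacy cache, assets package present
print(no_cache - from_package)       # prints 293
```

In these single runs, a job with an empty legacy cache finished 293 seconds (4m 53s) sooner when an assets package was present.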

Previous tests

Legacy cache	New strategy	Assets package	`compile-test-assets` job	Duration
Empty	Disabled	N/A	https://gitlab.com/gitlab-org/gitlab/-/jobs/2941006395	7m 34s
Empty	Enabled	Absent	https://gitlab.com/gitlab-org/gitlab/-/jobs/2941033418	8m 26s
Empty	Enabled	Present	https://gitlab.com/gitlab-org/gitlab/-/jobs/2941118730	2m 26s
Fresh	Disabled	N/A	https://gitlab.com/gitlab-org/gitlab/-/jobs/2941371887	3m 33s
Fresh	Enabled	Absent	https://gitlab.com/gitlab-org/gitlab/-/jobs/2941376574	2m 25s
Fresh	Enabled	Present	https://gitlab.com/gitlab-org/gitlab/-/jobs/2941399334	2m 56s


Next steps

Periodically delete old packages: https://docs.gitlab.com/ee/api/packages.html#delete-a-project-package => #375606 (closed)
What about mirrors (including on other instances) and forks? These should always try to download from the canonical package registry, and never upload.
Stop downloading/uploading the "legacy" cache (for now we keep downloading/uploading it so that we can fall back to it if needed), and remove the CACHE_ASSETS_AS_PACKAGE == "true" gatekeeper.
We could generate cache packages for every MR commit, to maximize the chances for MRs to use a fresh cache, but that would increase the number of cache packages a lot (so we should also be aggressive in deleting cache packages if we do that)
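For the periodic cleanup mentioned in the first item, one possible way to pick the packages to delete is sketched below. It assumes each package is a dict with `id` and `created_at` keys, as in the packages API list response. The helper name and the retention window are hypothetical, and the actual deletion call is left out.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List


def stale_package_ids(packages: List[Dict], now: datetime, max_age: timedelta) -> List[int]:
    """Return the ids of packages created more than `max_age` before `now`."""
    cutoff = now - max_age
    stale = []
    for package in packages:
        # Timestamps are assumed to be ISO 8601 with a trailing "Z" for UTC.
        created_at = datetime.fromisoformat(package["created_at"].replace("Z", "+00:00"))
        if created_at < cutoff:
            stale.append(package["id"])
    return stale


example = [
    {"id": 1, "created_at": "2022-08-01T10:00:00.000Z"},
    {"id": 2, "created_at": "2022-09-27T10:00:00.000Z"},
]
now = datetime(2022, 9, 28, tzinfo=timezone.utc)
print(stale_package_ids(example, now, timedelta(days=30)))  # prints [1]
```

If cache packages were generated for every MR commit, a shorter `max_age` would be the knob to turn.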

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

I have evaluated the MR acceptance checklist for this MR.

Edited Sep 28, 2022 by Rémy Coutable
