
Caching logic in Maven virtual registry (cold cache)

David Fernandez requested to merge 467983-cache-logic into master

Context

With Maven virtual registry MVC (single upstream and... (&14137), we're starting the work on Virtual Registries. Virtual Registries is a feature that could be described as the evolution of the dependency proxy idea: having the GitLab instance play man in the middle between clients and artifact registries. Artifacts can be of any kind, but we're going to focus on packages and container images, starting with Maven packages specifically.

In other words, the GitLab instance can be configured to contact a set of upstreams and expose a specific virtual registry url that "talks" the artifact type API, in this case the Maven API. When a request hits this API, we'll check with the set of upstreams and the first one to answer successfully "wins". We will pull the response from that upstream, cache it in the GitLab instance and return it to the client.
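
The "first upstream to answer successfully wins" behavior described above can be sketched in plain Ruby. This is only an illustration under assumptions: Upstream, VirtualRegistrySketch, fetch and download are hypothetical names that do not come from the GitLab codebase, and the upstreams are in-memory hashes rather than real HTTP registries.

```ruby
# Hypothetical in-memory stand-in for an upstream registry (not GitLab code).
Upstream = Struct.new(:name, :files) do
  # Returns the file body, or nil when this upstream does not have the path.
  def fetch(path)
    files[path]
  end
end

# Hypothetical sketch of a virtual registry that queries upstreams in order.
class VirtualRegistrySketch
  attr_reader :cache

  def initialize(upstreams)
    @upstreams = upstreams
    @cache = {}
  end

  # Asks each upstream in turn; the first one that answers "wins".
  # The body is stored in the cache and returned to the caller.
  def download(path)
    @upstreams.each do |upstream|
      body = upstream.fetch(path)
      next if body.nil?

      @cache[path] = { upstream: upstream.name, body: body }
      return body
    end
    nil # no upstream had the file
  end
end

empty = Upstream.new('empty', {})
central = Upstream.new('central', { '/a/b/1.0/b-1.0.pom' => '<project/>' })
registry = VirtualRegistrySketch.new([empty, central])
body = registry.download('/a/b/1.0/b-1.0.pom')
```

In this sketch, the 'empty' upstream is skipped and the file is served from (and cached for) 'central'. In the first iteration described below, a registry has a single upstream, so the loop would only ever run once.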

At the time of this writing, we're in the first iteration of the implementation. To avoid working on a very large scope, we restricted it to:

  • Only Maven packages are supported.
  • Works at the root Group level only.
  • A root Group can only have one Maven virtual registry.
  • A Maven virtual registry can only have one upstream.
  • This being a multi MR implementation, everything is gated behind a feature flag.

The Maven package API exposed by virtual registries can be pretty complex (upstream and caching handling), so its implementation has been split into steps.

In Add a basic maven virtual registry download end... (!160891 - merged), we added a basic download endpoint. We would get the file from the upstream and we would return it to the client. That's it.

With Maven Virtual Registry: Cache logic (#467983 - closed), we're starting the caching logic implementation. As described in #467983 (closed), we have two cases: a cold cache situation and a warm cache situation. We're not going to implement everything in a single MR, as that would be too large.

This MR deals with the cold cache situation: the client requests a file that has not been put in the cache yet. We will download the file from the upstream and return it to the client while we upload it to GitLab to create the cache entry. All of this is done thanks to the send_dependency logic in workhorse.

🤔 What does this MR do and why?

  • Update the maven virtual registry download endpoint to:
    • check the file on the upstream.
    • if the check succeeds, use the workhorse send_dependency logic to return and upload the file at the same time.
  • Add the upload endpoints (authorize and finalize) to the maven virtual registry API class so that send_dependency can upload the file to GitLab.
    • The finalize endpoint will create the cached_response record associated with the related upstream object.
  • Add or update all the related specs.
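
The flow listed above (check the upstream, then return and upload the file at the same time, then create the cache entry) can be simulated in plain Ruby. This is a rough sketch under assumptions, not the workhorse send_dependency implementation: FakeUpstream, CachedResponseSketch and cold_cache_download are hypothetical names, the upstream is an in-memory hash, and the authorize and finalize endpoints are collapsed into a single step.

```ruby
require 'stringio'

# Hypothetical record mirroring a few attributes of a cache entry.
CachedResponseSketch = Struct.new(:relative_path, :size, :upstream_etag, :content,
                                  keyword_init: true)

# Hypothetical in-memory upstream: path => { etag:, body: }.
class FakeUpstream
  def initialize(files)
    @files = files
  end

  # Stand-in for the "check the file on upstream" step.
  def head(path)
    file = @files[path]
    file && { etag: file[:etag] }
  end

  # Yields the body in small chunks, like a streamed HTTP response.
  def stream(path, chunk_size: 4)
    body = @files.fetch(path)[:body]
    body.scan(/.{1,#{chunk_size}}/m).each { |chunk| yield chunk }
  end
end

# Cold cache: stream each chunk to the client and to the upload in one pass,
# then create the cache entry (the part played by the finalize endpoint).
def cold_cache_download(upstream, path, client_io, cache)
  meta = upstream.head(path)
  return :not_found unless meta

  upload = StringIO.new
  upstream.stream(path) do |chunk|
    client_io.write(chunk) # returned to the client
    upload.write(chunk)    # uploaded to create the cache entry
  end

  cache[path] = CachedResponseSketch.new(
    relative_path: path,
    size: upload.string.bytesize,
    upstream_etag: meta[:etag],
    content: upload.string
  )
  :ok
end

pom_path = '/org/example/demo/1.0/demo-1.0.pom' # hypothetical path
pom_body = '<project></project>'
upstream = FakeUpstream.new({ pom_path => { etag: '"abc"', body: pom_body } })
client = StringIO.new
cache = {}
status = cold_cache_download(upstream, pom_path, client, cache)
```

The point of the sketch is that the file is read from the upstream only once: each chunk goes to the client and to the upload, so the cache entry exists by the time the response has been fully sent.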

The Maven virtual registry is still being implemented and is behind a feature flag: #474863.

🏁 MR acceptance checklist

Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

🦄 Screenshots or screen recordings

No UI changes

How to set up and validate locally

We're not going to use a fully fledged Maven client to verify this MR. Instead, we will simply use $ curl to simulate calls made by Maven clients.

We're going to set up an upstream that will target the public Maven registry.

  1. Enable the feature flag: Feature.enable(:virtual_registry_maven).

  2. Have a PAT and a root group (any visibility) ready.

  3. There is no UI or API for the virtual registry settings (yet), so we need to create them in a rails console:

    r = ::VirtualRegistries::Packages::Maven::Registry.create!(group: <root_group>)
    u = ::VirtualRegistries::Packages::Maven::Upstream.create!(group: <root_group>, url: 'https://repo1.maven.org/maven2')
    VirtualRegistries::Packages::Maven::RegistryUpstream.create!(group: <root_group>, registry: r, upstream: u)
  4. Pull a package file:

    $ curl --header "Private-Token: <PAT>" "http://gdk.test:8000/api/v4/virtual_registries/packages/maven/<r.id>/org/springframework/spring-web/6.1.12/spring-web-6.1.12.pom"
    • Check that you get the .pom file back.
  5. In the rails console:

    ::VirtualRegistries::Packages::Maven::CachedResponse.last
    => #<VirtualRegistries::Packages::Maven::CachedResponse:0x00000001689326a0
     id: 5,
     group_id: 811,
     upstream_id: 10,
     upstream_checked_at: Fri, 23 Aug 2024 06:26:27.558326000 UTC +00:00,
     downloaded_at: Fri, 23 Aug 2024 06:26:27.558326000 UTC +00:00,
     created_at: Fri, 23 Aug 2024 06:26:27.579483000 UTC +00:00,
     updated_at: Fri, 23 Aug 2024 06:26:27.579483000 UTC +00:00,
     file_store: 1,
     size: 2398,
     downloads_count: 1,
     relative_path: "/org/springframework/spring-web/6.1.12/spring-web-6.1.12.pom",
     file: "upload",
     object_storage_key: "[FILTERED]",
     upstream_etag: "\"54ce07f4124259b2ea58548e9d620004\"",
     content_type: "[FILTERED]">

Cache entry created! 🎉

Note that re-executing the $ curl command will not use the cache entry yet. That's the warm cache situation and it will be implemented in a follow-up MR. 😸

Edited by David Fernandez
