Maven virtual registries: improve the send-dependency workhorse command
🎈 Context
The Maven virtual registry feature relies heavily on the workhorse `send-dependency` command. This command is pretty powerful. It instructs workhorse to:
- Get the contents from a remote url (passed in the parameters).
- Ask rails for an upload location.
- Stream the contents from the remote url and pipe them into a "T junction" (see https://about.gitlab.com/blog/2024/02/15/compose-readers-and-writers-in-golang-applications/#reader--%3E-reader-%2B-writer for more details) to send them to the external client and the upload location at the same time.
- Confirm the upload to rails.
From a rails point of view, workhorse will hit two endpoints for:
- Getting the upload location. This is called the upload authorization.
- Confirming that the file was sent to the upload location. This is mainly used to create or persist whatever business objects we need on the rails side. For example, link this upload with an Active Record instance.
In #451242 (comment 1890233598), we raised the following Wild Idea:
The upload authorization comes from the upload handling in workhorse. This handling is mainly used when external clients upload a file to GitLab. However, here, the upload is not initiated by an external client. It's initiated by the rails backend sending the `send-dependency` command to workhorse. Thus, the question is: why should workhorse contact rails again for the upload authorization if rails could send this authorization directly along with the `send-dependency` command? This would help with the overall execution time and also lower the load on the rails backend.
The upload authorization endpoints in rails are usually used to validate that an upload can be done. For example, during the upload authorization, user permissions are checked to make sure that the current user can upload a file. Another check can be the file size, to avoid accepting super large files. In the `send-dependency` command case, rails is already doing part of these validations (such as user permissions). There is no need to tell workhorse to ask for an upload authorization: we can simply build the upload authorization response structure and send it along with the `send-dependency` command. Workhorse can then read this authorization, upload the file to the target destination and confirm the upload.
This optimization has been described in #461561 (closed) for virtual registries and this MR implements it.
🔬 What does this MR do and why?
- Update the workhorse `send-dependency` command. The `UploadConfig` structure now has a new field: `AuthorizedUploadResponse`. This new field holds the upload authorization structure (essentially, the destination to upload the file to (remote or local), the maximum file size and the digest functions that are allowed).
  - Notice that the `send-dependency` command is used by several different features. Thus, by default, this new field is not set and the workhorse part behaves as it currently does.
- Update the related `send-dependency` tests.
- Update the rails workhorse helper to support the new field.
- Update the Maven virtual registry to set this new field when using the `send-dependency` command.
- Update the related rails specs.
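To illustrate the opt-in behavior described above, here is a minimal sketch of how an optional field can drive the decision to skip the authorize request. Only the names `UploadConfig` and `AuthorizedUploadResponse` come from this MR. The inner fields, the JSON layout and the `needsAuthorizeRequest` helper are assumptions made for the example and may not match the actual workhorse code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// AuthorizedUploadResponse sketches the upload authorization structure.
// The field names below are hypothetical.
type AuthorizedUploadResponse struct {
	TempPath      string   // hypothetical: local destination
	RemoteURL     string   // hypothetical: remote destination
	MaximumSize   int64    // hypothetical: maximum file size in bytes
	HashFunctions []string // hypothetical: allowed digest functions
}

// UploadConfig only shows the new optional field. A nil pointer means
// "not set", which keeps the existing behavior for other features.
type UploadConfig struct {
	AuthorizedUploadResponse *AuthorizedUploadResponse
}

// needsAuthorizeRequest is a hypothetical helper: it reports whether
// workhorse would still have to ask rails for an upload authorization.
func needsAuthorizeRequest(raw []byte) (bool, error) {
	var cfg UploadConfig
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return false, err
	}
	return cfg.AuthorizedUploadResponse == nil, nil
}

func main() {
	legacy := []byte(`{}`)
	preAuthorized := []byte(`{"AuthorizedUploadResponse":{"TempPath":"/tmp/uploads","MaximumSize":1024}}`)

	for _, raw := range [][]byte{legacy, preAuthorized} {
		needed, err := needsAuthorizeRequest(raw)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s -> authorize request needed: %v\n", raw, needed)
	}
}
```

Using a pointer for the optional field makes the "not set" case explicit, so callers that never send the field are unaffected.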
🗒 MR acceptance checklist
Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.
🖥 Screenshots or screen recordings
No UI changes.
⚗ How to set up and validate locally
$ cd gitlab/workhorse
$ make
$ gdk restart workhorse
Let's set up a maven virtual registry and pull a file through it. This pull will use the `send-dependency` command and upload the file to the GitLab instance. This upload should not need to ping the upload authorize endpoint.
Have a root group and a PAT ready.
In a rails console:
Feature.enable(:virtual_registry_maven) # enable the maven virtual registry
root_group = Group.find(<root group id>)
r = ::VirtualRegistries::Packages::Maven::Registry.create!(group: root_group)
u = ::VirtualRegistries::Packages::Maven::Upstream.create!(group: root_group, url: 'https://repo1.maven.org/maven2')
VirtualRegistries::Packages::Maven::RegistryUpstream.create!(group: root_group, registry: r, upstream: u)
Now, from the gitlab folder, watch the rails logs with:
$ tail -f log/development.log | grep "Started "
Pull a file with:
$ curl --header "Private-Token: <PAT>" "http://gdk.test:8000/api/v4/virtual_registries/packages/maven/<r.id>/org/springframework/spring-web/6.1.12/spring-web-6.1.12.pom"
Check the logs:
Started GET "/api/v4/virtual_registries/packages/maven/<r.id>/org/springframework/spring-web/6.1.12/spring-web-6.1.12.pom" for 172.16.123.1 at 2024-09-24 16:54:15 +0200
Started POST "/api/v4/virtual_registries/packages/maven/<r.id>/org/springframework/spring-web/6.1.12/spring-web-6.1.12.pom/upload" for 127.0.0.1 at 2024-09-24 16:54:15 +0200
As we can see, there are two requests:
- The request received by the virtual registry endpoint. This request returns the workhorse `send-dependency` command with the upload authorization.
- Workhorse uploads the file to its destination and confirms the upload. That's the second request (on `/upload`).
We completely avoided the upload authorization endpoint (url ending with `/authorize`).
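If you prefer to check the log lines programmatically, the following sketch scans `Started ...` lines and reports whether any of them targets a path ending with `/authorize`. The `hitsAuthorize` helper is hypothetical, and the sample lines reuse the log format shown above with `1` as an assumed registry id.

```go
package main

import (
	"fmt"
	"strings"
)

// hitsAuthorize reports whether any log line of the form
//   Started <METHOD> "<path>" for <ip> at <timestamp>
// targets a path ending with /authorize. Query strings are not handled.
func hitsAuthorize(logLines []string) bool {
	for _, line := range logLines {
		fields := strings.Fields(line)
		if len(fields) < 3 || fields[0] != "Started" {
			continue // not a request line
		}
		path := strings.Trim(fields[2], `"`)
		if strings.HasSuffix(path, "/authorize") {
			return true
		}
	}
	return false
}

func main() {
	// Sample lines in the format shown above; the registry id 1 is an assumption.
	lines := []string{
		`Started GET "/api/v4/virtual_registries/packages/maven/1/org/springframework/spring-web/6.1.12/spring-web-6.1.12.pom" for 172.16.123.1 at 2024-09-24 16:54:15 +0200`,
		`Started POST "/api/v4/virtual_registries/packages/maven/1/org/springframework/spring-web/6.1.12/spring-web-6.1.12.pom/upload" for 127.0.0.1 at 2024-09-24 16:54:15 +0200`,
	}
	fmt.Println("authorize endpoint hit:", hitsAuthorize(lines))
}
```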
Because the file is now cached, running `$ curl` again will not use the `send-dependency` command. To reset the cache:
u.reload.cached_responses.destroy_all
🚀 Performance review
For the performance review, we're going to use the following scenario:
- Use this dummy maven application.
- Configure a virtual registry in our local GitLab instance that targets maven central, which is the official public registry.
- Configure the maven application so that we replace the maven central reference with the virtual registry endpoint. We will use this `settings.xml` file:
    <settings>
      <mirrors>
        <mirror>
          <id>gitlab-maven</id>
          <name>GitLab proxy of central repo</name>
          <url>http://gdk.test:8000/api/v4/virtual_registries/packages/maven/<registry.id></url>
          <mirrorOf>central</mirrorOf>
        </mirror>
      </mirrors>
      <servers>
        <server>
          <id>gitlab-maven</id>
          <configuration>
            <httpHeaders>
              <property>
                <name>Private-Token</name>
                <value><PAT></value>
              </property>
            </httpHeaders>
          </configuration>
        </server>
      </servers>
    </settings>
- Run `$ mvn compile -s settings.xml`, which will pull packages (through the virtual registry) and compile the application.
  - In this configuration, we will pull close to `1000` files.
- To keep the analysis focused on the web requests done between workhorse and rails, the object storage is disabled in our local GitLab instance (the file system will be used to store the uploaded files).
- This is by no means a highly accurate performance analysis. The goal here is to get a glimpse of the improvements.
Between each scenario, we will make sure that:
- the virtual registry is completely empty (no cached entries) to make sure that all packages are downloaded from Maven central and go through the `send-dependency` command.
- `~/.m2/repository` is removed to make sure that `$ mvn` will pull the packages and not use the ones that are present in the local maven cache.
Here are the results:
Metric | On `master` | With this MR | Improvement |
---|---|---|---|
Execution time reported by `$ mvn` | 04:16 min | 03:58 min | 7 % |
Amount of files pulled | 916 | 916 | - |
Amount of requests done to rails | 2748 | 1832 | 33.33 % |
We can say that the change has a positive impact.