Sort maven metadata results by recent file
🏛 Context
Maven (Java) packages are made up of a variety of files. Each package has a maven-metadata.xml
file that is unique not to the specific package version, but is used to store metadata for all versions. So if I have package Foo
with versions 1.0, 2.0, 2.1, and 3.0, there is a maven-metadata.xml
file that describes all versions of the package. So you could say the maven-metadata.xml
file is a "versionless" file that belongs to many versions of the same package.
In the GitLab package registry each version is it's own Package
record. Every time a user publishes a new Maven package, their local Maven client will update the maven-metadata.xml
file, and upload it as part of the package. Because this file has no version attached to it, it is assigned to a versionless package record in the GitLab database. As more package versions are published, more maven-metadata.xml
files are uploaded to the versionless package.
When a user installs a Maven package, the Maven client first requests the maven-metadata.xml
to understand what versions are available. Currently, the way GitLab finds this file is we find the matching package with no version and return the last uploaded maven-metadata.xml
file.
There are occasions where customers have had different versions of the same package uploaded to different projects, sometimes in different subgroups. This means that for each project, a versionless package will exist with the maven-metadata.xml
files that were included when those versions were published. The problem is, if a user searches the group for a package (using Maven's group-level endpoint), we may or may not return the most recent maven-metadata.xml
file.
🔎 What does this MR do?
Currently, in this situation, projects are sorted by ID (by default), so the last matching versionless package will be used to fetch maven-metadata.xml
file. But if it is not the last one to have had a file uploaded to it, it will have an older version of the maven-metadata.xml
file.
This MR updates the Maven package finder so when we are dealing with versionless packages, we don't just return the .last
package, but we return the package with the most recently created PackageFile
.
🐘 Database
A new scope: .order_by_package_file
is introduced.
This test query finds a package that is nested several namespaces deep.
Query without the scope:
Explain plan: https://postgres.ai/console/shared/831f855a-7489-4627-8a5c-c7d475d98cf0
SELECT "packages_packages".*
FROM "packages_packages"
INNER JOIN "packages_maven_metadata" ON "packages_maven_metadata"."package_id" = "packages_packages"."id"
INNER JOIN "packages_package_files" ON "packages_package_files"."package_id" = "packages_packages"."id"
WHERE "packages_packages"."project_id" IN (
SELECT "projects"."id"
FROM "projects"
WHERE "projects"."namespace_id" in (
WITH RECURSIVE "base_and_descendants" AS ((SELECT "namespaces".* FROM "namespaces" WHERE "namespaces"."id" = 785414)
UNION
(SELECT "namespaces".* FROM "namespaces", "base_and_descendants" WHERE "namespaces"."parent_id" = "base_and_descendants"."id")) SELECT "id" FROM "base_and_descendants" AS "namespaces"
) AND (
EXISTS (
SELECT 1 FROM "project_authorizations"
WHERE "project_authorizations"."user_id" = 3983112
AND (project_authorizations.project_id = projects.id)
)
OR projects.visibility_level IN (0,10,20)
)
)
AND "packages_maven_metadata"."path" = 'com/sandbox/bananas'
ORDER BY "packages_packages"."id" DESC
LIMIT 1;
Time: 17.614 ms
- planning: 2.582 ms
- execution: 15.032 ms
- I/O read: N/A
- I/O write: N/A
Shared buffers:
- hits: 5027 (~39.30 MiB) from the buffer pool
- reads: 0 from the OS file cache, including disk I/O
- dirtied: 0
- writes: 0
Query with the scope:
Explain plan: https://postgres.ai/console/shared/ab7926e4-d7b9-497e-8236-5bc9a7deed4a
The scope changes final ORDER BY
and adds an additional JOIN
against packages_package_files
SELECT "packages_packages".*
FROM "packages_packages"
INNER JOIN "packages_maven_metadata" ON "packages_maven_metadata"."package_id" = "packages_packages"."id"
INNER JOIN "packages_package_files" ON "packages_package_files"."package_id" = "packages_packages"."id"
WHERE "packages_packages"."project_id" IN (
SELECT "projects"."id"
FROM "projects"
WHERE "projects"."namespace_id" in (
WITH RECURSIVE "base_and_descendants" AS ((SELECT "namespaces".* FROM "namespaces" WHERE "namespaces"."id" = 785414)
UNION
(SELECT "namespaces".* FROM "namespaces", "base_and_descendants" WHERE "namespaces"."parent_id" = "base_and_descendants"."id")) SELECT "id" FROM "base_and_descendants" AS "namespaces"
) AND (
EXISTS (
SELECT 1 FROM "project_authorizations"
WHERE "project_authorizations"."user_id" = 3983112
AND (project_authorizations.project_id = projects.id)
)
OR projects.visibility_level IN (0,10,20)
)
)
AND "packages_maven_metadata"."path" = 'com/sandbox/bananas'
ORDER BY packages_package_files.created_at DESC
LIMIT 1;
Time: 25.130 ms
- planning: 2.939 ms
- execution: 22.191 ms
- I/O read: N/A
- I/O write: N/A
Shared buffers:
- hits: 5779 (~45.10 MiB) from the buffer pool
- reads: 0 from the OS file cache, including disk I/O
- dirtied: 0
- writes: 0
📸 Screenshots (strongly suggested)
Does this MR meet the acceptance criteria?
Conformity
-
Changelog entry - [-] Documentation (if required)
-
Code review guidelines -
Merge request performance guidelines -
Style guides -
Database guides - [-] Separation of EE specific content
Availability and Testing
-
Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process. - [-] Tested in all supported browsers
- [-] Informed Infrastructure department of a default or new setting change, if applicable per definition of done
Security
If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:
- [-] Label as security and @ mention
@gitlab-com/gl-security/appsec
- [-] The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
- [-] Security reports checked/validated by a reviewer from the AppSec team
Related to #285334 (closed)