PyPI group-level package API
🏛 Context
Users can publish Python packages to the PyPI package registry of their GitLab projects.
Currently, when users install packages from their registry, they must specify the project in which each package resides. If their packages are published across several projects, they need to provide a remote for every project that contains packages:
pip install \
--extra-index-url https://$USERNAME:$PASSWORD@gitlab.com/api/v4/projects/1/packages/pypi/simple \
--extra-index-url https://$USERNAME:$PASSWORD@gitlab.com/api/v4/projects/2/packages/pypi/simple \
--extra-index-url https://$USERNAME:$PASSWORD@gitlab.com/api/v4/projects/3/packages/pypi/simple \
my-pkg1 my-pkg2 other-pkg and-another-pkg
A single group-level remote is much more user-friendly:
pip install \
--extra-index-url https://$USERNAME:$PASSWORD@gitlab.com/api/v4/groups/1/-/packages/pypi/simple \
my-pkg1 my-pkg2 other-pkg and-another-pkg
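Because pip maps its command-line options to PIP_* environment variables, the group remote can also be set once per shell session instead of being repeated on every command. A minimal sketch, assuming a hypothetical helper named gitlab_group_index_url and reusing the placeholder host and group ID from the example above:

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the group-level PyPI "simple" index URL.
# Arguments: username, password or token, GitLab host, group ID.
gitlab_group_index_url() {
  local user="$1" password="$2" host="$3" group_id="$4"
  # Project remotes look like .../projects/<id>/packages/pypi/simple;
  # the group remote adds the "/-/" separator: .../groups/<id>/-/packages/pypi/simple.
  printf 'https://%s:%s@%s/api/v4/groups/%s/-/packages/pypi/simple\n' \
    "$user" "$password" "$host" "$group_id"
}

# pip reads PIP_EXTRA_INDEX_URL as if --extra-index-url had been passed.
# gitlab.com and group ID 1 are placeholders taken from the example above.
PIP_EXTRA_INDEX_URL="$(gitlab_group_index_url "${USERNAME:-}" "${PASSWORD:-}" gitlab.com 1)"
export PIP_EXTRA_INDEX_URL

# Later installs in this shell can then resolve packages from the whole group:
#   pip install my-pkg1 my-pkg2 other-pkg and-another-pkg
```

Credentials that contain characters such as @ or : need to be percent-encoded before they are placed in the URL.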
And that is exactly what this MR does!
🔎 What does this MR do?
- Adds two new API endpoints that let users install PyPI packages through a group-level remote.
- Updates the related documentation.
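The two endpoint paths can be read off the curl calls in the screenshots further down: one serves the simple index page for a package, the other serves a single file addressed by its SHA-256 and file name. The helper names below are hypothetical and only build the URLs; they are a sketch of the path layout, not GitLab's route definitions.

```shell
#!/usr/bin/env bash

# Hypothetical helpers for the two group-level endpoints, with paths inferred
# from the screenshots. $1 is the API base URL (e.g. https://gitlab.com/api/v4)
# and $2 is the group ID.

# Endpoint 1: the PEP 503 style "simple" page listing every file of one package.
group_pypi_simple_url() {
  printf '%s/groups/%s/-/packages/pypi/simple/%s\n' "$1" "$2" "$3"
}

# Endpoint 2: download of a single file, addressed by its SHA-256 and file name.
group_pypi_file_url() {
  printf '%s/groups/%s/-/packages/pypi/files/%s/%s\n' "$1" "$2" "$3" "$4"
}

# Example (needs a running instance and a token):
#   curl --user "root:$TOKEN" "$(group_pypi_simple_url http://gdk.test:3001/api/v4 167 my.pypi.package)"
```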
🐘 Database
The Packages::Pypi::PackageFinder processes two types of queries:
- Project-level searches: this query does not change in this MR.
- Group-level searches: the finder was recently set up to accept group-level queries in preparation for this MR, but it has never been used for them, so this is a new query.
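The group-level query first collects the base group and all of its subgroups with a recursive CTE, then looks up their projects. As a rough mental model (not GitLab code), the sketch below evaluates the same recursion level by level; the function name and the "group_id parent_id" input format are invented for illustration. In the explain plan below, the WorkTable Scan shows loops=4, meaning the recursive term was evaluated four times.

```shell
#!/usr/bin/env bash

# Toy model of the "base_and_descendants" recursive CTE (hypothetical helper).
# stdin:  one "group_id parent_id" pair per line, standing in for the
#         namespaces rows with type = 'Group'.
# $1:     the base group ID (785414 in the real query).
# stdout: the base group followed by every descendant, one ID per line.
group_and_descendants() {
  local base="$1" edges seen frontier next id child
  edges="$(cat)"
  # Non-recursive term: the base group itself.
  seen=" $base "
  frontier="$base"
  printf '%s\n' "$base"
  # Recursive term: each pass joins only the previous pass's rows (the
  # "working table") against parent_id, until a pass adds nothing new.
  while [ -n "$frontier" ]; do
    next=""
    for id in $frontier; do
      for child in $(printf '%s\n' "$edges" | awk -v p="$id" '$2 == p { print $1 }'); do
        case "$seen" in
          *" $child "*) ;;  # UNION (not UNION ALL) discards rows already found
          *)
            seen="$seen$child "
            next="$next $child"
            printf '%s\n' "$child"
            ;;
        esac
      done
    done
    frontier="$next"
  done
}
```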
Visual Explain Plan: https://explain.depesz.com/s/ySoR
SQL Query
SELECT "packages_packages".*
FROM "packages_packages"
INNER JOIN "packages_package_files" ON "packages_package_files"."package_id" = "packages_packages"."id"
WHERE "packages_packages"."project_id" IN (
  SELECT "projects"."id"
  FROM "projects"
  WHERE "projects"."namespace_id" IN (
    WITH RECURSIVE "base_and_descendants" AS (
      (
        SELECT "namespaces".*
        FROM "namespaces"
        WHERE "namespaces"."type" = 'Group'
          AND "namespaces"."id" = 785414
      )
      UNION
      (
        SELECT "namespaces".*
        FROM "namespaces", "base_and_descendants"
        WHERE "namespaces"."type" = 'Group'
          AND "namespaces"."parent_id" = "base_and_descendants"."id"
      )
    )
    SELECT id FROM "base_and_descendants" AS "namespaces"
  )
)
AND "packages_packages"."status" = 0
AND "packages_packages"."package_type" = 5
AND "packages_packages"."version" IS NOT NULL
AND "packages_package_files"."file_name" = 'mypkg-0.1.tar.gz'
AND "packages_package_files"."file_sha256" = '\x66633964663031326136386538663436323834333631633962623137376331613561363336333134616663313532363033663732383665316666343533653033';
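The file_sha256 literal in the query is hex-encoded twice: decoding its bytes gives the ASCII text fc9df012a68e8f46284361c9bb177c1a5a636314afc152603f7286e1ff453e03, a 64-character hex digest. This suggests the column holds the digest's hex text rather than the 32 raw digest bytes. When reproducing the plan for another file, a helper along these lines (the name is hypothetical) produces the matching literal:

```shell
#!/usr/bin/env bash

# Hypothetical helper: build the bytea literal used for file_sha256 in the
# query above from a hex SHA-256 digest. Each ASCII character of the digest
# becomes two hex digits, e.g. "f" -> 66, "c" -> 63.
sha256_bytea_literal() {
  # od -An -tx1 -v dumps one hex byte per field without offsets or "*"
  # line folding; tr strips the spaces and newlines od puts between them.
  printf '\\x%s\n' "$(printf '%s' "$1" | od -An -tx1 -v | tr -d ' \n')"
}

# Reproduces the exact literal from the query:
#   sha256_bytea_literal fc9df012a68e8f46284361c9bb177c1a5a636314afc152603f7286e1ff453e03
```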
Explain plan (cold cache on production replica)
QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Nested Loop (cost=3015.05..5049.26 rows=1 width=83) (actual time=45.048..60.175 rows=2 loops=1)
   Buffers: shared hit=7017 read=159
   I/O Timings: read=51.173
   -> Nested Loop (cost=3014.49..5008.48 rows=13 width=83) (actual time=30.825..59.150 rows=2 loops=1)
         Buffers: shared hit=7012 read=154
         I/O Timings: read=50.207
         -> HashAggregate (cost=3014.06..3051.20 rows=3714 width=4) (actual time=2.685..3.255 rows=1661 loops=1)
               Group Key: projects.id
               Buffers: shared hit=2178
               -> Nested Loop (cost=1590.73..3004.77 rows=3714 width=4) (actual time=0.669..2.223 rows=1661 loops=1)
                     Buffers: shared hit=2178
                     -> HashAggregate (cost=1590.29..1592.20 rows=191 width=4) (actual time=0.653..0.669 rows=60 loops=1)
                           Group Key: namespaces.id
                           Buffers: shared hit=305
                           -> CTE Scan on base_and_descendants namespaces (cost=1584.08..1587.90 rows=191 width=4) (actual time=0.059..0.632 rows=60 loops=1)
                                 Buffers: shared hit=305
                                 CTE base_and_descendants
                                   -> Recursive Union (cost=0.43..1584.08 rows=191 width=348) (actual time=0.056..0.557 rows=60 loops=1)
                                         Buffers: shared hit=305
                                         -> Index Scan using index_namespaces_on_type_and_id_partial on namespaces namespaces_1 (cost=0.43..3.45 rows=1 width=348) (actual time=0.029..0.030 rows=1 loops=1)
                                               Index Cond: (((type)::text = 'Group'::text) AND (id = 785414))
                                               Buffers: shared hit=4
                                         -> Nested Loop (cost=0.56..157.68 rows=19 width=348) (actual time=0.016..0.098 rows=15 loops=4)
                                               Buffers: shared hit=301
                                               -> WorkTable Scan on base_and_descendants (cost=0.00..0.20 rows=10 width=4) (actual time=0.000..0.002 rows=15 loops=4)
                                               -> Index Scan using index_namespaces_on_parent_id_and_id on namespaces namespaces_2 (cost=0.56..15.73 rows=2 width=348) (actual time=0.004..0.006 rows=1 loops=60)
                                                     Index Cond: (parent_id = base_and_descendants.id)
                                                     Filter: ((type)::text = 'Group'::text)
                                                     Buffers: shared hit=301
                     -> Index Only Scan using index_projects_on_namespace_id_and_id on projects (cost=0.44..7.21 rows=19 width=8) (actual time=0.005..0.022 rows=28 loops=60)
                           Index Cond: (namespace_id = namespaces.id)
                           Heap Fetches: 238
                           Buffers: shared hit=1873
         -> Index Scan using index_packages_packages_on_project_id_and_package_type on packages_packages (cost=0.43..0.50 rows=3 width=83) (actual time=0.033..0.033 rows=0 loops=1661)
               Index Cond: ((project_id = projects.id) AND (package_type = 5))
               Filter: ((version IS NOT NULL) AND (status = 0))
               Buffers: shared hit=4834 read=154
               I/O Timings: read=50.207
   -> Index Scan using index_packages_package_files_on_package_id_and_file_name on packages_package_files (cost=0.56..3.13 rows=1 width=8) (actual time=0.266..0.508 rows=1 loops=2)
         Index Cond: ((package_id = packages_packages.id) AND ((file_name)::text = 'mypkg-0.1.tar.gz'::text))
         Filter: (file_sha256 = '\x66633964663031326136386538663436323834333631633962623137376331613561363336333134616663313532363033663732383665316666343533653033'::bytea)
         Buffers: shared hit=5 read=5
         I/O Timings: read=0.966
 Planning Time: 7.949 ms
 Execution Time: 60.838 ms
(45 rows)
📽 Screenshots
→ pip3 install --index-url http://root:$TOKEN@gdk.test:3001/api/v4/groups/167/-/packages/pypi/simple --no-deps my.pypi.package --trusted-host gdk.test
Looking in indexes: http://root:****@gdk.test:3001/api/v4/groups/167/-/packages/pypi/simple
Collecting my.pypi.package
Downloading http://gdk.test:3001/api/v4/groups/167/-/packages/pypi/files/3f37017232013c8ac80647f4ca34b8b726f6cba62d055cd747844ed95b3c65ff/my.pypi.package-0.0.1-py3-none-any.whl (1.6 kB)
Installing collected packages: my.pypi.package
Successfully installed my.pypi.package-0.0.1
→ curl --user root:$TOKEN "http://gdk.test:3001/api/v4/groups/167/-/packages/pypi/simple/my.pypi.package"
<!DOCTYPE html>
<html>
<head>
<title>Links for pypi-package-1</title>
</head>
<body>
<h1>Links for pypi-package-1</h1>
<a href="http://gdk.test:3001/api/v4/groups/167/-/packages/pypi/files/3f37017232013c8ac80647f4ca34b8b726f6cba62d055cd747844ed95b3c65ff/my.pypi.package-0.0.1-py3-none-any.whl#sha256=3f37017232013c8ac80647f4ca34b8b726f6cba62d055cd747844ed95b3c65ff" data-requires-python=">=3.6">my.pypi.package-0.0.1-py3-none-any.whl</a><br><a href="http://gdk.test:3001/api/v4/groups/167/-/packages/pypi/files/5afa611b0bcd52b709ec052084e33a5517ffca96f7728ddd9f8866a30cdf76f2/my.pypi.package-0.0.1.tar.gz#sha256=5afa611b0bcd52b709ec052084e33a5517ffca96f7728ddd9f8866a30cdf76f2" data-requires-python=">=3.6">my.pypi.package-0.0.1.tar.gz</a><br>
</body>
</html>
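Each link on the simple page carries the file's digest in a "#sha256=" URL fragment, the hash-fragment form that PEP 503 allows. A small sketch (hypothetical function name, regex scraping rather than a real HTML parser) that turns such a page into "digest  file name" lines:

```shell
#!/usr/bin/env bash

# Hypothetical helper: read a PEP 503 "simple" page on stdin and print one
# "<sha256>  <file name>" line per link, taken from each href's
# "#sha256=" fragment. This is regex scraping, not a real HTML parser.
simple_index_digests() {
  grep -o 'href="[^"]*"' |
    sed -e 's/^href="//' -e 's/"$//' |
    sed -n 's|^.*/\([^/#]*\)#sha256=\([0-9a-f]*\)$|\2  \1|p'
}

# Example against the endpoint shown above:
#   curl --silent --user "root:$TOKEN" \
#     "http://gdk.test:3001/api/v4/groups/167/-/packages/pypi/simple/my.pypi.package" |
#     simple_index_digests
```

The two-space output format is the one GNU sha256sum --check reads, so the list can be checked directly against files downloaded into the current directory.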
→ curl --user root:$TOKEN "http://gdk.test:3001/api/v4/groups/167/-/packages/pypi/files/5afa611b0bcd52b709ec052084e33a5517ffca96f7728ddd9f8866a30cdf76f2/my.pypi.package-0.0.1.tar.gz#sha256=5afa611b0bcd52b709ec052084e33a5517ffca96f7728ddd9f8866a30cdf76f2" >> pkg.tar.gz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1163  100  1163    0     0   3313      0 --:--:-- --:--:-- --:--:--  3313
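The downloaded archive can be checked against the digest that appears both in the file URL path and in its "#sha256=" fragment. A minimal sketch with a hypothetical function name; it uses sha256sum where available and falls back to shasum (for example on macOS):

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeed only if the file's SHA-256 equals the expected
# hex digest (for example the "#sha256=" value from the simple page).
verify_sha256() {
  local file="$1" expected="$2" actual
  if command -v sha256sum >/dev/null 2>&1; then
    actual="$(sha256sum < "$file" | cut -d ' ' -f 1)"       # GNU coreutils
  else
    actual="$(shasum -a 256 < "$file" | cut -d ' ' -f 1)"   # Perl shasum fallback
  fi
  [ "$actual" = "$expected" ]
}

# Example, downloading with -o so a rerun overwrites the file instead of
# appending to it (as ">>" would):
#   curl --fail --user "root:$TOKEN" -o pkg.tar.gz \
#     "http://gdk.test:3001/api/v4/groups/167/-/packages/pypi/files/5afa611b0bcd52b709ec052084e33a5517ffca96f7728ddd9f8866a30cdf76f2/my.pypi.package-0.0.1.tar.gz"
#   verify_sha256 pkg.tar.gz 5afa611b0bcd52b709ec052084e33a5517ffca96f7728ddd9f8866a30cdf76f2 && echo OK
```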
☑ Does this MR meet the acceptance criteria?
Conformity
- 📋 Does this MR need a changelog?
  - I have included a changelog entry.
  - [-] I have not included a changelog entry because _____.
- Documentation (if required)
- Code review guidelines
- Merge request performance guidelines
- Style guides
- [-] Database guides
- [-] Separation of EE specific content
Availability and Testing
- Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process.
- [-] Tested in all supported browsers
- [-] Informed Infrastructure department of a default or new setting change, if applicable per definition of done
Security
If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:
- [-] Label as security and @ mention @gitlab-com/gl-security/appsec
- [-] The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
- [-] Security reports checked/validated by a reviewer from the AppSec team
Related to #225545 (closed)