Geo: Ensure Package File checksums on primary
What does this MR do?
- Runs background jobs that continuously backfill
packages_package_files.verification_checksum
on Geo primaries. - Concurrent verification jobs are run up to
max_verification_capacity
setting for that Geo node- Actually, up to 1/2 capacity since the setting includes project/wiki repo verification as well
- Concurrent workers do not pick up the same records
- Failures are retried
- Not resolved in this MR, for reviewability: Lost or killed jobs can cause records to become stuck in the "started" state. The next MR handles that !48006 (merged)
Closes #13839 (closed)
Screenshots (strongly suggested)
DB queries
Packages::PackageFile.verification_pending_batch(batch_size: 10)
UPDATE packages_package_files
SET "verification_state" = 1, "verification_started_at" = NOW()
WHERE id IN (
SELECT "packages_package_files"."id"
FROM "packages_package_files"
WHERE (
"packages_package_files"."verification_state" IN (0)
)
ORDER BY verified_at ASC NULLS FIRST
LIMIT 10
)
RETURNING id
Before new index added for this query:
https://explain.depesz.com/s/LZAw
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Update on packages_package_files (cost=52425.97..52450.14 rows=10 width=814) (actual time=173.424..179.375 rows=10 loops=1)
-> Nested Loop (cost=52425.97..52450.14 rows=10 width=814) (actual time=170.930..173.547 rows=10 loops=1)
-> HashAggregate (cost=52425.54..52425.64 rows=10 width=40) (actual time=170.112..170.123 rows=10 loops=1)
Group Key: "ANY_subquery".id
-> Subquery Scan on "ANY_subquery" (cost=52425.39..52425.52 rows=10 width=40) (actual time=170.091..170.096 rows=10 loops=1)
-> Limit (cost=52425.39..52425.42 rows=10 width=16) (actual time=170.013..170.015 rows=10 loops=1)
-> Sort (cost=52425.39..53916.22 rows=596329 width=16) (actual time=170.012..170.013 rows=10 loops=1)
Sort Key: packages_package_files_1.verified_at NULLS FIRST
Sort Method: top-N heapsort Memory: 25kB
-> Index Scan using index_packages_package_files_on_verification_state on packages_package_files packages_package_files_1 (cost=0.43..39538.94 rows=596329 width=16) (actual time=0.014..114.827 rows=600000 loops=1)
Index Cond: (verification_state = 0)
-> Index Scan using packages_package_files_pkey on packages_package_files (cost=0.43..2.45 rows=1 width=740) (actual time=0.340..0.340 rows=1 loops=10)
Index Cond: (id = "ANY_subquery".id)
Planning Time: 0.583 ms
Execution Time: 179.548 ms
(15 rows)
After new index added for this query:
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Update on packages_package_files (cost=1.67..25.84 rows=10 width=814) (actual time=1.746..8.166 rows=10 loops=1)
-> Nested Loop (cost=1.67..25.84 rows=10 width=814) (actual time=1.080..3.792 rows=10 loops=1)
-> HashAggregate (cost=1.24..1.34 rows=10 width=40) (actual time=0.045..0.052 rows=10 loops=1)
Group Key: "ANY_subquery".id
-> Subquery Scan on "ANY_subquery" (cost=0.42..1.22 rows=10 width=40) (actual time=0.029..0.041 rows=10 loops=1)
-> Limit (cost=0.42..1.12 rows=10 width=16) (actual time=0.027..0.037 rows=10 loops=1)
-> Index Scan using packages_packages_pending_verification on packages_package_files packages_package_files_1 (cost=0.42..41269.65 rows=596337 width=16) (actual time=0.026..0.035 rows=10 loops=1)
-> Index Scan using packages_package_files_pkey on packages_package_files (cost=0.43..2.45 rows=1 width=740) (actual time=0.372..0.372 rows=1 loops=10)
Index Cond: (id = "ANY_subquery".id)
Planning Time: 0.209 ms
Execution Time: 8.229 ms
(11 rows)
Packages::PackageFile.verification_failed_batch(batch_size: 10)
UPDATE packages_package_files
SET "verification_state" = 1, "verification_started_at" = NOW()
WHERE id IN (
SELECT "packages_package_files"."id"
FROM "packages_package_files"
WHERE (
"packages_package_files"."verification_state" IN (3)
)
ORDER BY verification_retry_at ASC NULLS FIRST
LIMIT 10
)
RETURNING id
Before new index added for this query:
https://explain.depesz.com/s/og3T
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Update on packages_package_files (cost=52660.46..52684.63 rows=10 width=814) (actual time=201.895..210.828 rows=10 loops=1)
-> Nested Loop (cost=52660.46..52684.63 rows=10 width=814) (actual time=200.978..204.428 rows=10 loops=1)
-> HashAggregate (cost=52660.03..52660.13 rows=10 width=40) (actual time=199.984..199.998 rows=10 loops=1)
Group Key: "ANY_subquery".id
-> Subquery Scan on "ANY_subquery" (cost=52659.88..52660.01 rows=10 width=40) (actual time=199.971..199.976 rows=10 loops=1)
-> Limit (cost=52659.88..52659.91 rows=10 width=16) (actual time=199.967..199.969 rows=10 loops=1)
-> Sort (cost=52659.88..54163.66 rows=601509 width=16) (actual time=199.966..199.967 rows=10 loops=1)
Sort Key: packages_package_files_1.verification_retry_at NULLS FIRST
Sort Method: top-N heapsort Memory: 25kB
-> Index Scan using index_packages_package_files_on_verification_state on packages_package_files packages_package_files_1 (cost=0.43..39661.49 rows=601509 width=16) (actual time=0.041..143.964 rows=600000 loops=1)
Index Cond: (verification_state = 3)
-> Index Scan using packages_package_files_pkey on packages_package_files (cost=0.43..2.45 rows=1 width=740) (actual time=0.440..0.440 rows=1 loops=10)
Index Cond: (id = "ANY_subquery".id)
Planning Time: 0.215 ms
Execution Time: 210.886 ms
(15 rows)
After new index added for this query:
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Update on packages_package_files (cost=1.67..25.84 rows=10 width=814) (actual time=2.404..7.620 rows=10 loops=1)
-> Nested Loop (cost=1.67..25.84 rows=10 width=814) (actual time=1.381..3.501 rows=10 loops=1)
-> HashAggregate (cost=1.24..1.34 rows=10 width=40) (actual time=0.040..0.049 rows=10 loops=1)
Group Key: "ANY_subquery".id
-> Subquery Scan on "ANY_subquery" (cost=0.42..1.21 rows=10 width=40) (actual time=0.017..0.034 rows=10 loops=1)
-> Limit (cost=0.42..1.11 rows=10 width=16) (actual time=0.013..0.027 rows=10 loops=1)
-> Index Scan using packages_packages_failed_verification on packages_package_files packages_package_files_1 (cost=0.42..41360.49 rows=601517 width=16) (actual time=0.012..0.024 rows=10 loops=1)
-> Index Scan using packages_package_files_pkey on packages_package_files (cost=0.43..2.45 rows=1 width=740) (actual time=0.343..0.343 rows=1 loops=10)
Index Cond: (id = "ANY_subquery".id)
Planning Time: 0.748 ms
Execution Time: 7.721 ms
(11 rows)
Packages::PackageFile.needs_verification_count(limit: 1000)
SELECT COUNT(*)
FROM (
SELECT 1 AS one
FROM "packages_package_files"
WHERE (
(
"packages_package_files"."verification_state" IN (0)
)
OR (
"packages_package_files"."verification_state" IN (3)
)
) LIMIT 1000
) subquery_for_count
Before adding the partial index:
https://explain.depesz.com/s/Eats
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=73.02..73.03 rows=1 width=8) (actual time=50.320..50.321 rows=1 loops=1)
-> Limit (cost=0.00..60.52 rows=1000 width=4) (actual time=49.970..50.256 rows=1000 loops=1)
-> Seq Scan on packages_package_files (cost=0.00..62159.93 rows=1027029 width=4) (actual time=49.969..50.158 rows=1000 loops=1)
Filter: ((verification_state = 0) OR (verification_state = 3))
Rows Removed by Filter: 300025
Planning Time: 0.117 ms
Execution Time: 50.348 ms
(7 rows)
After adding the partial index:
https://explain.depesz.com/s/Kz96
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=57.94..57.95 rows=1 width=8) (actual time=0.681..0.682 rows=1 loops=1)
-> Limit (cost=0.43..45.44 rows=1000 width=4) (actual time=0.084..0.598 rows=1000 loops=1)
-> Index Only Scan using packages_packages_needs_verification on packages_package_files (cost=0.43..46226.42 rows=1027043 width=4) (actual time=0.083..0.481 rows=1000 loops=1)
Heap Fetches: 1000
Planning Time: 3.037 ms
Execution Time: 0.736 ms
(6 rows)
Does this MR meet the acceptance criteria?
Conformity
- [-] Changelog entry => behind feature flag
-
Documentation (if required) => !49146 (merged) -
Code review guidelines -
Merge request performance guidelines -
Style guides -
Database guides -
Separation of EE specific content
Availability and Testing
-
Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process.
Edited by Michael Kozono