Geo framework: Fix retry time of verification failures
What does this MR do?
Problems
staging.gitlab.com has package file checksum failures which surfaced the following problems with failure retry logic:
- When an error is raised in
track_checksum_result!
while attempting to transition toverification_succeeded
, some fields have already been cleared in-memory during thebefore_transition
block. This causes that kind of failure to always setverification_retry_count
to 1, instead of incrementing it. -
verification_failed_batch_relation
andneeds_verification_relation
are not excluding things which have averification_retry_at
in the future
To do
-
Call reset
before callingverification_failed_with_message!
in the rescue https://gitlab.com/gitlab-org/gitlab/blob/master/ee/lib/gitlab/geo/verification_state.rb#L222? -
Scope verification_failed_batch_relation
andneeds_verification_relation
towhere("verification_retry_at < NOW()")
-
Produce query plans with a lot of test data (primary side: https://gitlab.com/-/snippets/2042944 and secondary side: https://gitlab.com/-/snippets/2057444)
Resolves #300255 (closed)
Query plans
Packages::PackageFile.verification_failed_batch
Test data setup: https://gitlab.com/-/snippets/2042944
https://explain.depesz.com/s/lU5T
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Update on packages_package_files (cost=1.17..8.12 rows=3 width=782) (actual time=4.667..6.325 rows=3 loops=1)
-> Nested Loop (cost=1.17..8.12 rows=3 width=782) (actual time=1.541..1.992 rows=3 loops=1)
-> HashAggregate (cost=0.74..0.77 rows=3 width=40) (actual time=0.731..0.733 rows=3 loops=1)
Group Key: "ANY_subquery".id
-> Subquery Scan on "ANY_subquery" (cost=0.42..0.74 rows=3 width=40) (actual time=0.661..0.712 rows=3 loops=1)
-> Limit (cost=0.42..0.71 rows=3 width=16) (actual time=0.583..0.632 rows=3 loops=1)
-> Index Scan using packages_packages_failed_verification on packages_package_files packages_package_files_1 (cost=0.42..42921.84 rows=459104 width=16) (actual time=0.582..0.631 rows=3 loops=1)
Filter: ((verification_retry_at IS NULL) OR (verification_retry_at < '2021-02-02 05:44:30.114031-08'::timestamp with time zone))
-> Index Scan using packages_package_files_pkey on packages_package_files (cost=0.43..2.45 rows=1 width=740) (actual time=0.417..0.417 rows=1 loops=3)
Index Cond: (id = "ANY_subquery".id)
Planning Time: 2.559 ms
Execution Time: 6.499 ms
(12 rows)
UPDATE packages_package_files
SET
"verification_state" = 1,
"verification_started_at" = NOW()
WHERE id IN (
SELECT "packages_package_files"."id"
FROM "packages_package_files"
WHERE (
"packages_package_files"."verification_state" IN (3)
)
AND (
"packages_package_files"."verification_retry_at" IS NULL
OR "packages_package_files"."verification_retry_at" < '2021-02-02 05:44:30.114031'
)
ORDER BY verification_retry_at ASC NULLS FIRST
LIMIT 10
)
RETURNING id
Packages::PackageFile.needs_verification_count
Test data setup: https://gitlab.com/-/snippets/2042944
https://explain.depesz.com/s/XOOV
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=71.62..71.63 rows=1 width=8) (actual time=2.610..2.611 rows=1 loops=1)
-> Limit (cost=0.43..59.12 rows=1000 width=4) (actual time=1.547..2.521 rows=1000 loops=1)
-> Index Scan using packages_packages_needs_verification on packages_package_files (cost=0.43..54432.96 rows=927361 width=4) (actual time=1.546..2.386 rows=1000 loops=1)
Filter: ((verification_state = 0) OR ((verification_state = 3) AND ((verification_retry_at IS NULL) OR (verification_retry_at < '2021-02-02 05:49:25.612157-08'::timestamp with time zone))))
Planning Time: 0.605 ms
Execution Time: 2.648 ms
(6 rows)
SELECT COUNT(*)
FROM (
SELECT 1 AS one
FROM "packages_package_files"
WHERE (
(
"packages_package_files"."verification_state" IN (0)
) OR (
"packages_package_files"."verification_state" IN (3)
) AND (
"packages_package_files"."verification_retry_at" IS NULL
OR "packages_package_files"."verification_retry_at" < '2021-02-02 05:49:25.612157')
) LIMIT 1000
)
subquery_for_count
PackageFileRegistry.verification_failed_batch
Test data setup: https://gitlab.com/-/snippets/2057444
UPDATE package_file_registry
SET
"verification_state" = 1,
"verification_started_at" = NOW()
WHERE package_file_id IN (
SELECT "package_file_registry"."package_file_id"
FROM "package_file_registry"
WHERE ("package_file_registry"."verification_state" IN (3))
AND (
"package_file_registry"."verification_retry_at" IS NULL
OR "package_file_registry"."verification_retry_at" < '2021-02-02 06:23:50.006618'
)
AND (
"package_file_registry"."state" IN (2)
)
ORDER BY verification_retry_at ASC NULLS FIRST
LIMIT 10
) RETURNING package_file_id
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Update on package_file_registry (cost=17.70..74.56 rows=10 width=858) (actual time=2.184..11.697 rows=20 loops=1)
-> Nested Loop (cost=17.70..74.56 rows=10 width=858) (actual time=1.633..4.013 rows=20 loops=1)
-> HashAggregate (cost=17.27..17.33 rows=6 width=32) (actual time=0.402..0.413 rows=10 loops=1)
Group Key: "ANY_subquery".package_file_id
-> Subquery Scan on "ANY_subquery" (cost=0.42..17.24 rows=10 width=32) (actual time=0.115..0.389 rows=10 loops=1)
-> Limit (cost=0.42..17.14 rows=10 width=12) (actual time=0.110..0.377 rows=10 loops=1)
-> Index Scan using package_file_registry_failed_verification on package_file_registry package_file_registry_1 (cost=0.42..112897.16 rows=67495 width=12) (actual time=0.109..0.374 rows=10 loops=1)
Filter: ((verification_retry_at IS NULL) OR (verification_retry_at < '2021-02-02 06:23:50.006618-08'::timestamp with time zone))
-> Index Scan using index_package_file_registry_on_repository_id on package_file_registry (cost=0.43..9.51 rows=2 width=820) (actual time=0.326..0.357 rows=2 loops=10)
Index Cond: (package_file_id = "ANY_subquery".package_file_id)
Planning Time: 0.420 ms
Execution Time: 11.809 ms
(12 rows)
PackageFileRegistry.needs_verification_count
Test data setup: https://gitlab.com/-/snippets/2057444
SELECT COUNT(*)
FROM (
SELECT 1 AS one
FROM "package_file_registry"
WHERE (
("package_file_registry"."verification_state" IN (0))
OR ("package_file_registry"."verification_state" IN (3)
)
AND (
"package_file_registry"."verification_retry_at" IS NULL
OR "package_file_registry"."verification_retry_at" < '2021-02-02 06:23:50.199724')
)
AND (
"package_file_registry"."state" IN (2)
)
LIMIT 1000
)
subquery_for_count
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=165.26..165.27 rows=1 width=8) (actual time=1.283..1.284 rows=1 loops=1)
-> Limit (cost=0.43..152.76 rows=1000 width=4) (actual time=0.045..1.165 rows=1000 loops=1)
-> Index Scan using index_package_file_registry_on_state on package_file_registry (cost=0.43..61739.31 rows=405295 width=4) (actual time=0.044..0.990 rows=1000 loops=1)
Index Cond: (state = 2)
Filter: ((verification_state = 0) OR ((verification_state = 3) AND ((verification_retry_at IS NULL) OR (verification_retry_at < '2021-02-02 06:23:50.199724-08'::timestamp with time zone))))
Rows Removed by Filter: 524
Planning Time: 0.456 ms
Execution Time: 1.322 ms
(8 rows)
Does this MR meet the acceptance criteria?
Conformity
- [-] Changelog entry => behind feature flag
-
Documentation (if required) -
Code review guidelines -
Merge request performance guidelines -
Style guides -
Database guides -
Separation of EE specific content
Availability and Testing
-
Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process.
Edited by Michael Kozono