Check for plans in update all mirrors worker
What does this MR do?
When we disable the free_period_for_pull_mirroring
feature flag, we try to process a lot of mirrors on free private projects, that can no longer be processed. If we do the check in Ruby, we put a lot of strain on the worker, and make things worse for paying and public projects.
We could mark the mirrors as failed if they don't have a license, or not as mirrors at all, but then you'd need to re-enable your mirror after starting a subscription, which isn't ideal.
This change makes the query only select projects that are either public or on paid plans. This prevents us from needing to iterate through a bunch of mirrors that were set up on free private projects when pull mirrors were free, and still allows those projects to just work if they do start using a paid plan.
Query plans
Old:
gitlabhq_production=> EXPLAIN ANALYZE SELECT "projects".* FROM "projects" INNER JOIN project_mirror_data import_state ON import_state.project_id = projects.id WHERE "projects"."archived" = FALSE AND "projects"."mirror" = TRUE AND "import_state"."status" NOT IN ('scheduled', 'started') AND (import_state.next_execution_timestamp <= '2020-03-23 13:43:55.245995') AND (import_state.retry_count <= 14) ORDER BY import_state.next_execution_timestamp LIMIT 500;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.99..144891.78 rows=500 width=761) (actual time=167.322..185.315 rows=4 loops=1)
-> Nested Loop (cost=0.99..357301.67 rows=1233 width=761) (actual time=167.321..185.313 rows=4 loops=1)
-> Index Scan using index_mirror_data_on_next_execution_and_retry_count on project_mirror_data import_state (cost=0.56..102274.28 rows=95083 width=12) (actual time=0.498..81.842 rows=19303 loops=1)
Index Cond: ((next_execution_timestamp <= '2020-03-23 13:43:55.245995+00'::timestamp with time zone) AND (retry_count <= 14))
Filter: ((status)::text <> ALL ('{scheduled,started}'::text[]))
Rows Removed by Filter: 180
-> Index Scan using projects_pkey on projects (cost=0.43..2.67 rows=1 width=753) (actual time=0.005..0.005 rows=0 loops=19303)
Index Cond: (id = import_state.project_id)
Filter: ((NOT archived) AND mirror)
Rows Removed by Filter: 1
Planning time: 0.709 ms
Execution time: 185.500 ms
(12 rows)
New:
gitlabhq_production=> EXPLAIN ANALYZE SELECT "projects".* FROM "projects" INNER JOIN project_mirror_data import_state ON import_state.project_id = projects.id INNER JOIN namespaces AS root_namespaces ON root_namespaces.id = (WITH RECURSIVE "base_and_ancestors" AS ((SELECT "namespaces".* FROM "namespaces" WHERE (id = projects.namespace_id))
gitlabhq_production(> UNION
gitlabhq_production(> (SELECT "namespaces".* FROM "namespaces", "base_and_ancestors" WHERE "namespaces"."id" = "base_and_ancestors"."parent_id")) SELECT "id" FROM "base_and_ancestors" AS "namespaces" WHERE "namespaces"."parent_id" IS NULL) LEFT JOIN gitlab_subscriptions ON gitlab_subscriptions.namespace_id = root_namespaces.id LEFT JOIN plans ON plans.id = gitlab_subscriptions.hosted_plan_id WHERE "projects"."archived" = FALSE AND "projects"."mirror" = TRUE AND "import_state"."status" NOT IN ('scheduled', 'started') AND (import_state.next_execution_timestamp <= '2020-03-23 13:44:00.840198') AND (import_state.retry_count <= 14) AND (plans.name IN ('bronze','silver','gold','early_adopter') OR projects.visibility_level = 20) ORDER BY import_state.next_execution_timestamp LIMIT 500;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=357.34..455398.41 rows=500 width=761) (actual time=158.396..176.673 rows=2 loops=1)
-> Nested Loop Left Join (cost=357.34..800319.53 rows=879 width=761) (actual time=158.395..176.671 rows=2 loops=1)
Join Filter: (plans.id = gitlab_subscriptions.hosted_plan_id)
Rows Removed by Join Filter: 30
Filter: (((plans.name)::text = ANY ('{bronze,silver,gold,early_adopter}'::text[])) OR (projects.visibility_level = 20))
Rows Removed by Filter: 4
-> Nested Loop Left Join (cost=357.34..800148.33 rows=1238 width=765) (actual time=158.228..176.613 rows=6 loops=1)
-> Nested Loop (cost=356.91..799566.69 rows=1238 width=765) (actual time=158.212..176.539 rows=6 loops=1)
-> Nested Loop (cost=0.99..358474.72 rows=1238 width=761) (actual time=158.127..176.354 rows=6 loops=1)
-> Index Scan using index_mirror_data_on_next_execution_and_retry_count on project_mirror_data import_state (cost=0.56..102625.89 rows=95458 width=12) (actual time=0.503..76.648 rows=19305 loops=1)
Index Cond: ((next_execution_timestamp <= '2020-03-23 13:44:00.840198+00'::timestamp with time zone) AND (retry_count <= 14))
Filter: ((status)::text <> ALL ('{scheduled,started}'::text[]))
Rows Removed by Filter: 174
-> Index Scan using projects_pkey on projects (cost=0.43..2.67 rows=1 width=753) (actual time=0.005..0.005 rows=0 loops=19305)
Index Cond: (id = import_state.project_id)
Filter: ((NOT archived) AND mirror)
Rows Removed by Filter: 1
-> Index Only Scan using namespaces_pkey on namespaces root_namespaces (cost=355.92..356.28 rows=1 width=4) (actual time=0.003..0.003 rows=1 loops=6)
Index Cond: (id = (SubPlan 2))
Heap Fetches: 0
SubPlan 2
-> CTE Scan on base_and_ancestors namespaces_2 (cost=353.47..355.49 rows=1 width=4) (actual time=0.023..0.025 rows=1 loops=6)
Filter: (parent_id IS NULL)
CTE base_and_ancestors
-> Recursive Union (cost=0.43..353.47 rows=101 width=326) (actual time=0.019..0.021 rows=1 loops=6)
-> Index Scan using namespaces_pkey on namespaces (cost=0.43..3.45 rows=1 width=326) (actual time=0.010..0.010 rows=1 loops=6)
Index Cond: (id = projects.namespace_id)
-> Nested Loop (cost=0.43..34.80 rows=10 width=326) (actual time=0.001..0.002 rows=0 loops=6)
-> WorkTable Scan on base_and_ancestors (cost=0.00..0.20 rows=10 width=4) (actual time=0.001..0.001 rows=1 loops=6)
-> Index Scan using namespaces_pkey on namespaces namespaces_1 (cost=0.43..3.45 rows=1 width=326) (actual time=0.000..0.001 rows=0 loops=6)
Index Cond: (id = base_and_ancestors.parent_id)
-> Index Scan using index_gitlab_subscriptions_on_namespace_id on gitlab_subscriptions (cost=0.43..0.46 rows=1 width=8) (actual time=0.011..0.011 rows=1 loops=6)
Index Cond: (namespace_id = root_namespaces.id)
-> Materialize (cost=0.00..4.09 rows=6 width=11) (actual time=0.003..0.005 rows=6 loops=6)
-> Seq Scan on plans (cost=0.00..4.06 rows=6 width=11) (actual time=0.009..0.009 rows=6 loops=1)
Planning time: 9.320 ms
Execution time: 177.614 ms
(37 rows)
Note that these can change depending on the mirrors scheduled, but it looks pretty good so far.
For #212074 (closed).