Include empty/no repo in projects_not_indexed rake task
What does this MR do and why?
Related to #374536 (closed)
This MR adds fixes for elastic rake tasks: projects_not_indexed
and index_projects_status
- Projects with empty or missing repositories should be included in counts
- Respect index limiting setting to determine project list to use
- Add missing specs
- Fix rubocop todos for both
elastic.rake
andelastic_rake_spec.rb
Database
Before
Project.all.count
Query: Note: this query still exists for self managed instances that do not use index limiting
- explain plan (postgresai): https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/19874/commands/64951
select count(*) from projects
IndexStatus.count
Query: - explain plan (local): https://explain.depesz.com/s/gPxS
- explain plan (postgresai): https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/19874/commands/64958
SELECT COUNT(*) FROM "index_statuses"
After/New queries
IndexStatus.for_project(Project.all).count
Query: - explain plan (postgresai): https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/19874/commands/64952
SELECT COUNT(*) FROM "index_statuses" WHERE "index_statuses"."project_id" IN (SELECT "projects"."id" FROM "projects")
::Gitlab::CurrentSettings.elasticsearch_limited_projects.count
Query: - explain plan (local): https://explain.depesz.com/s/pvXW
- explain plan (postgresai): https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/19874/commands/64954
SELECT
COUNT(*)
FROM ((
SELECT
"projects".*
FROM
"projects"
WHERE
"projects"."namespace_id" IN ( WITH "descendants_base_cte" AS MATERIALIZED (
SELECT
"namespaces"."id",
"namespaces"."traversal_ids"
FROM
"namespaces"
WHERE
"namespaces"."id" IN (
SELECT
"elasticsearch_indexed_namespaces"."namespace_id"
FROM
"elasticsearch_indexed_namespaces"
)
),
"superset" AS (
SELECT
d1.traversal_ids
FROM
descendants_base_cte d1
WHERE
NOT EXISTS (
SELECT
1
FROM
descendants_base_cte d2
WHERE
d2.id = ANY (d1.traversal_ids)
AND d2.id <> d1.id))
SELECT DISTINCT
"id"
FROM
"superset",
"namespaces"
WHERE
next_traversal_ids_sibling ("superset"."traversal_ids") > "namespaces"."traversal_ids"
AND "superset"."traversal_ids" <= "namespaces"."traversal_ids"))
UNION (
SELECT
"projects".*
FROM
"projects"
WHERE
"projects"."id" IN (
SELECT
"elasticsearch_indexed_projects"."project_id"
FROM
"elasticsearch_indexed_projects"))) projects
IndexStatus.for_project(::Gitlab::CurrentSettings.elasticsearch_limited_projects).count
Query: - explain plan (local): https://explain.depesz.com/s/yk1F#html
- explain plan (postgresai): https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/19874/commands/64959
SELECT
COUNT(*)
FROM
"index_statuses"
WHERE
"index_statuses"."project_id" IN (
SELECT
"projects"."id"
FROM ((
SELECT
"projects".*
FROM
"projects"
WHERE
"projects"."namespace_id" IN ( WITH "descendants_base_cte" AS MATERIALIZED (
SELECT
"namespaces"."id",
"namespaces"."traversal_ids"
FROM
"namespaces"
WHERE
"namespaces"."id" IN (
SELECT
"elasticsearch_indexed_namespaces"."namespace_id"
FROM
"elasticsearch_indexed_namespaces"
)
),
"superset" AS (
SELECT
d1.traversal_ids
FROM
descendants_base_cte d1
WHERE
NOT EXISTS (
SELECT
1
FROM
descendants_base_cte d2
WHERE
d2.id = ANY (d1.traversal_ids)
AND d2.id <> d1.id))
SELECT DISTINCT
"id"
FROM
"superset",
"namespaces"
WHERE
next_traversal_ids_sibling ("superset"."traversal_ids") > "namespaces"."traversal_ids"
AND "superset"."traversal_ids" <= "namespaces"."traversal_ids"))
UNION (
SELECT
"projects".*
FROM
"projects"
WHERE
"projects"."id" IN (
SELECT
"elasticsearch_indexed_projects"."project_id"
FROM
"elasticsearch_indexed_projects"))) projects)
Screenshots or screen recordings
N/A
How to set up and validate locally
- setup gdk for elasticsearch and index everything
- run the rake tasks and verify everything is indexed:
bundle exec rake gitlab:elastic:index_projects_status Indexing is 100.00% complete (36/36 projects) bundle exec rake gitlab:elastic:projects_not_indexed All projects are currently indexed
- enable index limiting in the admin UI for 1 project: http://gdk.test:3000/admin/application_settings/advanced_search
- run the rake tasks and verify everything is indexed and only 1 project is checked:
bundle exec rake gitlab:elastic:index_projects_status Indexing is 100.00% complete (1/1 projects) bundle exec rake gitlab:elastic:projects_not_indexed All projects are currently indexed
- delete index_status record for that project:
::Gitlab::CurrentSettings.elasticsearch_limited_projects.first.index_status.delete
- run the rake tasks and verify nothing is indexed and only 1 project is checked
bundle exec rake gitlab:elastic:index_projects_status Indexing is 0.00% complete (0/1 projects) bundle exec rake gitlab:elastic:projects_not_indexed Project 'gitlab-org/gitlab-shell' (ID: 3) isn't indexed. 1 out of 1 non-indexed projects shown.
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
-
I have evaluated the MR acceptance checklist for this MR.
Edited by Terri Chu