Backfill Epics into Elasticsearch
What does this MR do and why?
Context
Currently, all epic searches use Basic Search. We want to allow Advanced Search to be used when Elasticsearch is available, for faster and better searching.
To achieve this, we need the following:
- Create the index in Elasticsearch: Create Epic Elasticsearch index (!121635 - merged)
- Make sure epics are created/updated/deleted when needed: Keep epics index up to date in Elasticsearch (!123526 - merged)
- Enable the feature flag and remove it from the code
- Backfill all epics 👈 This MR
- Perform advanced search using the new epic index: !124072 (merged)
Details
Adds an Elastic migration to backfill Epics.
- If elastic namespace limiting is disabled: indexes all epics
- If elastic namespace limiting is enabled: indexes epics belonging to groups that are indexed
NOTE: Before this migration is merged, the elastic_index_epics feature flag has to be fully enabled.
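The two limiting cases above can be sketched in plain Ruby. This is only an illustration of the selection rule, not the migration's actual code: `EpicRow`, `epics_to_index`, and `indexed_group_ids` are hypothetical names, and the real migration resolves indexed groups through database lookups rather than an in-memory set.

```ruby
require "set"

# Hypothetical stand-in for an epic record (not the real Epic model).
EpicRow = Struct.new(:id, :group_id)

# Returns the epics that should be queued for indexing.
# limiting_enabled:   whether Elasticsearch namespace limiting is on (assumed flag).
# indexed_group_ids:  ids of groups that count as indexed when limiting is on.
def epics_to_index(epics, limiting_enabled:, indexed_group_ids: Set.new)
  # Without limiting, every epic is indexed.
  return epics unless limiting_enabled

  # Check each unique group once, then keep only epics in allowed groups.
  allowed = epics.map(&:group_id).uniq.select { |gid| indexed_group_ids.include?(gid) }.to_set
  epics.select { |epic| allowed.include?(epic.group_id) }
end

epics = [EpicRow.new(1, 9970), EpicRow.new(2, 9970), EpicRow.new(3, 42)]

p epics_to_index(epics, limiting_enabled: false).map(&:id)
# prints [1, 2, 3]
p epics_to_index(epics, limiting_enabled: true, indexed_group_ids: Set[9970]).map(&:id)
# prints [1, 2]
```

Checking each unique group once, rather than once per epic, mirrors the "for each unique group" wording in the query list below.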
Estimated run time
For the total number of epic records on GitLab.com, this migration would require 11 migration runs. However, namespace limiting is enabled there, so fewer epics need to be indexed and the migration will finish in fewer than 11 migration runs.
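As a rough sketch of how such an estimate can be derived, assume each run processes at most BATCH_SIZE × ITERATIONS_PER_RUN epics. The helper name `estimated_runs` is hypothetical, and the values used are the small example values from the Logs section, not the migration's production constants, which are not stated here.

```ruby
# Ceiling division: number of runs needed to cover total_epics when each
# run handles at most batch_size * iterations_per_run records.
def estimated_runs(total_epics, batch_size:, iterations_per_run:)
  per_run = batch_size * iterations_per_run
  (total_epics + per_run - 1) / per_run
end

puts estimated_runs(48, batch_size: 10, iterations_per_run: 2) # prints 3
```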
Database queries
When limiting is enabled (GitLab.com has limiting enabled)
Getting batch of epics: SELECT "epics"."id" FROM "epics" WHERE (epics.id > 0) AND "epics"."id" >= 1 ORDER BY "epics"."id" ASC LIMIT 1 OFFSET 1000
41.248 ms
https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/20627/commands/67679
Getting the last id: SELECT "epics".* FROM "epics" WHERE (epics.id > 0) AND "epics"."id" >= 1 ORDER BY "epics"."id" DESC LIMIT 1
10.122 ms
https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/20627/commands/67680
For each unique group in the list of epics:
1: Find the group: SELECT "namespaces".* FROM "namespaces" WHERE "namespaces"."type" = 'Group' AND "namespaces"."id" = 9970 LIMIT 1
4.694 ms
https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/20627/commands/67683
2: Check if the group or its descendants are in ElasticsearchIndexedNamespace: SELECT 1 AS one FROM "elasticsearch_indexed_namespaces" WHERE "elasticsearch_indexed_namespaces"."namespace_id" IN (9970, 9971, 9972)
0.695 ms
https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/20627/commands/67684
Checking if there are any epics (run once per migration): SELECT 1 AS one FROM "epics" LIMIT 1
2.492 ms
https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/20627/commands/67681
Finding the maximum Epic id (run once per migration): SELECT MAX("epics"."id") FROM "epics"
0.622 ms
https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/20627/commands/67682
When limiting is not enabled
Getting batch of epics: SELECT "epics"."id" FROM "epics" WHERE (epics.id > 0) AND "epics"."id" >= 1 ORDER BY "epics"."id" ASC LIMIT 1 OFFSET 1000
41.248 ms
https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/20627/commands/67679
Getting the last id: SELECT "epics".* FROM "epics" WHERE (epics.id > 0) AND "epics"."id" >= 1 ORDER BY "epics"."id" DESC LIMIT 1
10.122 ms
https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/20627/commands/67680
Checking if there are any epics (run once per migration): SELECT 1 AS one FROM "epics" LIMIT 1
2.492 ms
https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/20627/commands/67681
Finding the maximum Epic id (run once per migration): SELECT MAX("epics"."id") FROM "epics"
0.622 ms
https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/20627/commands/67682
Logs
MigrationWorker: migration[BackfillEpics] executing migrate method
[Elastic::Migration: 20230614090600] Indexing epics starting from id = 0
[Elastic::Migration: 20230614090600] Executing iteration 1 with last epic id: 48
[Elastic::Migration: 20230614090600] Setting migration_state to {"max_processed_id":48}
[Elastic::Migration: 20230614090600] Migration completed? max_processed_id(48); maximum_epic_id(48)
MigrationWorker: migration[BackfillEpics] updating with completed: true
As an example, with BATCH_SIZE set to 10 and ITERATIONS_PER_RUN set to 2:
MigrationWorker: migration[BackfillEpics] executing migrate method
[Elastic::Migration: 20230614090600] Indexing epics starting from id = 0
[Elastic::Migration: 20230614090600] Executing iteration 1 with last epic id: 10
[Elastic::Migration: 20230614090600] Executing iteration 2 with last epic id: 20
[Elastic::Migration: 20230614090600] Setting migration_state to {"max_processed_id":20}
[Elastic::Migration: 20230614090600] Migration completed? max_processed_id(20); maximum_epic_id(48)
MigrationWorker: migration[BackfillEpics] updating with completed: false
MigrationWorker: migration[BackfillEpics] kicking off next migration batch
MigrationWorker: migration[BackfillEpics] executing migrate method
[Elastic::Migration: 20230614090600] Indexing epics starting from id = 20
[Elastic::Migration: 20230614090600] Executing iteration 1 with last epic id: 30
[Elastic::Migration: 20230614090600] Executing iteration 2 with last epic id: 40
[Elastic::Migration: 20230614090600] Setting migration_state to {"max_processed_id":40}
[Elastic::Migration: 20230614090600] Migration completed? max_processed_id(40); maximum_epic_id(48)
MigrationWorker: migration[BackfillEpics] updating with completed: false
MigrationWorker: migration[BackfillEpics] kicking off next migration batch
MigrationWorker: migration[BackfillEpics] executing migrate method
[Elastic::Migration: 20230614090600] Indexing epics starting from id = 40
[Elastic::Migration: 20230614090600] Executing iteration 1 with last epic id: 48
[Elastic::Migration: 20230614090600] Setting migration_state to {"max_processed_id":48}
[Elastic::Migration: 20230614090600] Migration completed? max_processed_id(48); maximum_epic_id(48)
MigrationWorker: migration[BackfillEpics] updating with completed: true
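The run-by-run progression in these logs can be reproduced with a small standalone simulation. This is a sketch under assumptions, not the migration itself: `run_once` and `completed?` are hypothetical helpers, epic ids are assumed to be the contiguous range 1..48, and no Elasticsearch or database calls are made.

```ruby
# Simulates one migration run: processes up to iterations_per_run batches
# of batch_size ids, resuming after state[:max_processed_id], and returns
# the new state. ids must be sorted in ascending order.
def run_once(ids, state, batch_size:, iterations_per_run:)
  last_id = state[:max_processed_id]
  puts "Indexing epics starting from id = #{last_id}"

  iterations_per_run.times do |i|
    batch = ids.select { |id| id > last_id }.first(batch_size)
    break if batch.empty? # nothing left to process in this run

    last_id = batch.last
    puts "Executing iteration #{i + 1} with last epic id: #{last_id}"
  end

  { max_processed_id: last_id }
end

# The backfill is complete once the highest existing id has been processed.
def completed?(ids, state)
  state[:max_processed_id] >= ids.max
end

ids = (1..48).to_a
state = { max_processed_id: 0 }
history = []

loop do
  state = run_once(ids, state, batch_size: 10, iterations_per_run: 2)
  history << state[:max_processed_id]
  break if completed?(ids, state)
end

p history # prints [20, 40, 48]
```

Persisting only `max_processed_id` between runs is what lets each run resume where the previous one stopped, as the "Setting migration_state" lines show.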
How to set up and validate locally
- Disable elastic index limiting
- Execute the migration worker a few times:
Elastic::MigrationWorker.new.perform
- Check that Epic.count records are enqueued:
Elastic::ProcessBookkeepingService.queue_size
- Enable elastic index limiting and add a group containing epics
- Delete the migration record from Elasticsearch:
curl -X "DELETE" "http://localhost:9200/gitlab-development-migrations/_doc/20230614090600"
- Execute the migration worker a few times:
Elastic::MigrationWorker.new.perform
- Check that the number of epics in the limited group is equal to the number of records enqueued:
Elastic::ProcessBookkeepingService.queue_size
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.
Related to #250699 (closed)