Add a migration to reindex commits to fix repository_access_level
What does this MR do and why?
In this MR we reindex all commit docs to fix `repository_access_level`. The approach is as follows:
- Aggregate the docs that are missing `schema_version` by the field `rid` and take 100 project_ids. The docs missing `schema_version` are the target docs: every doc that has `schema_version` is a new doc, so it already has the correct value of `repository_access_level`.
- Run `update_by_query` on these `100` projects in batches of `10_000` docs.
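The two steps above can be sketched as Elasticsearch request bodies. This is a minimal sketch, not the migration's actual code: the helper names, the aggregation name `project_ids`, and the caller-supplied `script` are assumptions for illustration. Only `rid`, `schema_version`, the 100-project limit, and the 10_000 batch size come from the description.

```ruby
# Hypothetical helpers; the real migration may build these bodies differently.

# Matches docs that do not yet have `schema_version` (the target docs).
MISSING_SCHEMA_VERSION = {
  bool: { must_not: [{ exists: { field: 'schema_version' } }] }
}.freeze

# Step 1: body for a _search request that groups the target docs by `rid`
# and returns at most `limit` buckets (one per project).
def aggregation_body(limit: 100)
  {
    size: 0, # only the aggregation buckets are needed, not the hits
    query: MISSING_SCHEMA_VERSION,
    aggs: { project_ids: { terms: { field: 'rid', size: limit } } }
  }
end

# Step 2: body for an _update_by_query request restricted to the chosen
# projects and capped at `batch_size` docs per call.
# `script` is supplied by the caller; its content is not part of the description.
def update_by_query_body(project_ids, script:, batch_size: 10_000)
  {
    max_docs: batch_size,
    query: {
      bool: {
        filter: [{ terms: { rid: project_ids } }],
        must_not: MISSING_SCHEMA_VERSION[:bool][:must_not]
      }
    },
    script: script
  }
end
```

Because both bodies filter on the missing `schema_version`, repeating step 2 only makes progress if the script causes the updated docs to stop matching that filter; how the migration achieves this is not shown here.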
How to set up and validate locally
- Make sure Elasticsearch is enabled
- Open the Rails console
bundle exec rails c
- Check ES for commit docs that are missing `schema_version`
curl -XGET "http://localhost:9200/gitlab-development-commits/_search" -H "kbn-xsrf: reporting" -H "Content-Type: application/json" -d'
{
"query": {
"bool": {
"must_not": [
{
"exists": {
"field": "schema_version"
}
}
]
}
}
}' | json_pp
- Run the migration from the Rails console
require_relative 'ee/elastic/migrate/20230628112233_reindex_commits_to_fix_permissions.rb'
ReindexCommitsToFixPermissions.new(20230628112233).migrate
- Check the docs again with the curl command above. The search should now return no results.
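The same check can also be done from Ruby with the `_count` API instead of inspecting search hits. This is a sketch under assumptions: the helper names are hypothetical, and the default URL and index name are simply the ones used in the curl command above.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Builds a POST request for the _count API that matches docs without `schema_version`.
def remaining_docs_request(base_url: 'http://localhost:9200', index: 'gitlab-development-commits')
  uri = URI.join(base_url, "/#{index}/_count")
  request = Net::HTTP::Post.new(uri, 'Content-Type' => 'application/json')
  request.body = JSON.generate(
    { query: { bool: { must_not: [{ exists: { field: 'schema_version' } }] } } }
  )
  request
end

# Sends the request and returns the number of docs that still lack `schema_version`.
def remaining_docs_count(**options)
  request = remaining_docs_request(**options)
  response = Net::HTTP.start(request.uri.hostname, request.uri.port) do |http|
    http.request(request)
  end
  JSON.parse(response.body).fetch('count')
end
```

Calling `remaining_docs_count` before and after the migration should show the count dropping to zero once all target docs have been processed.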
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.
Approximate time to completion
~ 8 days
[17] pry(main)> number_of_documents = 1_250_072_954
=> 1250072954
[18] pry(main)> batch_size = 10_000
=> 10000
[19] pry(main)> throttle_delay = 5.seconds
=> 5 seconds
[20] pry(main)> ((number_of_documents / batch_size) * throttle_delay / 86400.seconds)
=> 7
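The console session above uses integer division, which is why it prints `7` while the estimate is stated as roughly 8 days. A plain-Ruby version with floating-point division makes the rounding explicit. It is a sketch that assumes one batch is processed per throttle interval and ignores the time each `update_by_query` call itself takes; the method name is hypothetical.

```ruby
# Plain-Ruby version of the estimate; no ActiveSupport required.
SECONDS_PER_DAY = 86_400

# One batch is processed per throttle interval, so the total time is
# (number of batches) * (delay between batches).
def estimated_days(number_of_documents:, batch_size:, throttle_delay_seconds:)
  batches = (number_of_documents.to_f / batch_size).ceil
  batches * throttle_delay_seconds / SECONDS_PER_DAY.to_f
end

days = estimated_days(
  number_of_documents: 1_250_072_954,
  batch_size: 10_000,
  throttle_delay_seconds: 5
)
puts format('%.2f days', days) # prints "7.23 days"
```

Rounding 7.23 up to whole days gives the 8-day figure.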
Query plan
There is an SQL query here: https://gitlab.com/gitlab-org/gitlab/-/blob/caf3eae36a79e423a1ae841829a56c78793f32b7/ee/elastic/migrate/20230703112233_reindex_commits_to_fix_permissions.rb#L40
The maximum number of `project_ids_to_work` can be 100. Here is the query plan for this: https://console.postgres.ai/gitlab/gitlab-production-ci/sessions/20037/commands/65370
Related to #410777