Force reindex commits for leftover projects
What does this MR do and why?
This MR looks for commit documents in the main index and aggregates the results by commit.rid. It then calls ElasticCommitIndexerWorker, passing each rid as the project_id. After this, an update_by_query call is fired to set the schema_version, because none of the commit documents in the main index have a schema_version. We had to do this because it looks like some commits that exist in the main index are missing from the separate index.
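The two Elasticsearch requests described above could look roughly like the following request bodies. This is a minimal sketch, not the migration's actual code: the helper names, the aggregation name `rids`, the use of `BATCH_SIZE` as the aggregation size, and the `SCHEMA_VERSION` value are all assumptions made for illustration. Only the field names (`type`, `commit.rid`, `schema_version`) come from this description.

```ruby
require 'json'

# Hypothetical values for illustration only; the MR text does not state them.
BATCH_SIZE = 200
SCHEMA_VERSION = 1

# Filter shared by both requests: commit documents that have no schema_version yet.
def commits_without_schema_version(extra_filters = [])
  {
    bool: {
      filter: [{ term: { type: 'commit' } }] + extra_filters,
      must_not: { exists: { field: 'schema_version' } }
    }
  }
end

# Search body: return no hits, only the distinct commit.rid values (one bucket per project).
def rid_aggregation_body
  {
    size: 0,
    query: commits_without_schema_version,
    aggs: { rids: { terms: { field: 'commit.rid', size: BATCH_SIZE } } }
  }
end

# update_by_query body: set schema_version on the leftover commits of the given projects.
def update_schema_version_body(rids)
  {
    query: commits_without_schema_version([{ terms: { 'commit.rid' => rids } }]),
    script: {
      lang: 'painless',
      source: 'ctx._source.schema_version = params.schema_version',
      params: { schema_version: SCHEMA_VERSION }
    }
  }
end

puts JSON.pretty_generate(rid_aggregation_body)
puts JSON.pretty_generate(update_schema_version_body([1, 2]))
```

Each bucket key returned by the aggregation would be a project id to hand to the indexer worker before the update_by_query call is issued for the same ids.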
How to set up and validate locally
Make sure Elasticsearch is enabled in GDK.
- Open the rails console
bundle exec rails c
- Populate commits in the main index
project = Project.last
::Gitlab::Search::Client.new.index(index: 'gitlab-development', routing: "project_#{project.id}", refresh: true,
body: { commit: { type: 'commit',
author: { name: 'F L', email: 't@t.com', time: Time.now.strftime('%Y%m%dT%H%M%S+0000') },
committer: { name: 'F L', email: 't@t.com', time: Time.now.strftime('%Y%m%dT%H%M%S+0000') },
rid: project.id, message: 'test' },
join_field: { name: 'commit', parent: "project_#{project.id}" },
repository_access_level: project.repository_access_level, type: 'commit',
visibility_level: project.visibility_level })
project = Project.first
::Gitlab::Search::Client.new.index(index: 'gitlab-development', routing: "project_#{project.id}", refresh: true,
body: { commit: { type: 'commit',
author: { name: 'F L', email: 't@t.com', time: Time.now.strftime('%Y%m%dT%H%M%S+0000') },
committer: { name: 'F L', email: 't@t.com', time: Time.now.strftime('%Y%m%dT%H%M%S+0000') },
rid: project.id, message: 'test' },
join_field: { name: 'commit', parent: "project_#{project.id}" },
repository_access_level: project.repository_access_level, type: 'commit',
visibility_level: project.visibility_level })
- Ensure there is at least one commit without a schema_version in the main index by running the following curl command in bash
curl -XGET "http://localhost:9200/gitlab-development/_count" -H "kbn-xsrf: reporting" -H "Content-Type: application/json" -d'
{
"query": {
"bool": {
"filter": [
{ "term": { "type": "commit" } }
],
"must_not": {
"exists": {
"field": "schema_version"
}
}
}
}
}'
The count should be greater than 0.
- Now run the following command in the rails console
Elastic::DataMigrationService[20230901120542].send(:migration).migrate
- Run the curl command again and ensure the count is 0
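The count check in the steps above can also be done from Ruby instead of curl. The following is a minimal sketch using only the standard library. The helper names `build_count_request` and `leftover_commit_count` are hypothetical, and the URL and index name are the local GDK defaults used in the curl command.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Same query as the curl command: commit documents with no schema_version.
COUNT_QUERY = {
  query: {
    bool: {
      filter: [{ term: { type: 'commit' } }],
      must_not: { exists: { field: 'schema_version' } }
    }
  }
}.freeze

# Builds the POST request for the _count endpoint (hypothetical helper name).
def build_count_request(uri)
  request = Net::HTTP::Post.new(uri)
  request['Content-Type'] = 'application/json'
  request.body = JSON.generate(COUNT_QUERY)
  request
end

# Sends the request and returns the "count" field of the response.
def leftover_commit_count(base_url: 'http://localhost:9200', index: 'gitlab-development')
  uri = URI("#{base_url}/#{index}/_count")
  response = Net::HTTP.start(uri.host, uri.port) { |http| http.request(build_count_request(uri)) }
  JSON.parse(response.body).fetch('count')
end

# Usage (requires a running Elasticsearch instance):
#   puts leftover_commit_count
```

Calling `leftover_commit_count` before and after the migration should show the value dropping to 0 if the migration worked as described.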
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.
Completion time
~105 minutes
[6] pry(main)> number_of_projects = 7159
=> 7159
[7] pry(main)> throttle_delay = 3.minutes
=> 3 minutes
[8] pry(main)> batch_size = 200
=> 200
[9] pry(main)> ((number_of_projects / batch_size) * throttle_delay)
=> 105 minutes
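Note that the console arithmetic uses integer division, so 7159 / 200 evaluates to 35 and the final partial batch is not counted. The sketch below redoes the estimate in plain Ruby (no ActiveSupport) and shows both the rounded-down and rounded-up figures. It assumes that every batch, including a partial last one, incurs one throttle delay, which this MR does not confirm. The variable names with a `_minutes` or `_estimate` suffix are introduced here for illustration.

```ruby
# Inputs taken from the console session above.
number_of_projects = 7159
batch_size = 200
throttle_delay_minutes = 3

# Integer division drops the final partial batch; this reproduces the 105 figure.
full_batches = number_of_projects / batch_size
# Rounding up counts the partial last batch as well.
total_batches = (number_of_projects.to_f / batch_size).ceil

lower_estimate = full_batches * throttle_delay_minutes
upper_estimate = total_batches * throttle_delay_minutes

puts "#{lower_estimate}-#{upper_estimate} minutes" # prints "105-108 minutes"
```

Either way the estimate stays in the same range, so ~105 minutes remains a reasonable approximation.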
Related to #419781 (closed)