Force reindex commits for leftover projects

What does this MR do and why?

This MR looks for commit documents in the main index and aggregates the results by commit.rid. It then calls ElasticCommitIndexerWorker for each rid, passing the rid as the project_id. After that, an update_by_query call sets the schema_version on those documents, since none of the commit documents in the main index have a schema_version. We had to do this because it looks like some commits that exist in the main index are missing from the separate index.
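
The flow described above could look roughly like the following in plain Ruby. This is only a sketch under assumptions, not the migration's actual code: the helper names (`leftover_commits_query`, `project_ids_from`, `schema_version_update_body`), the use of a `terms` aggregation, the Painless script, and the placeholder schema version are all hypothetical. Only the `type`, `commit.rid`, and `schema_version` fields come from this MR.

```ruby
# Hypothetical helper: search body that matches commit documents without a
# schema_version and groups them by commit.rid (assumed: a terms aggregation).
def leftover_commits_query(batch_size)
  {
    size: 0,
    query: {
      bool: {
        filter: [{ term: { type: 'commit' } }],
        must_not: { exists: { field: 'schema_version' } }
      }
    },
    aggs: { rids: { terms: { field: 'commit.rid', size: batch_size } } }
  }
end

# Hypothetical helper: extracts project ids from the aggregation buckets.
# Returns an empty array when the response has no such aggregation.
def project_ids_from(response)
  response.dig('aggregations', 'rids', 'buckets').to_a.map { |bucket| bucket['key'].to_i }
end

# Hypothetical helper: update_by_query body that stamps a schema_version on
# the leftover commit documents of the given projects.
def schema_version_update_body(project_ids, schema_version)
  {
    query: {
      bool: {
        filter: [
          { term: { type: 'commit' } },
          { terms: { 'commit.rid' => project_ids } }
        ],
        must_not: { exists: { field: 'schema_version' } }
      }
    },
    script: {
      lang: 'painless',
      source: 'ctx._source.schema_version = params.schema_version',
      params: { schema_version: schema_version }
    }
  }
end

# Made-up response used only to exercise the helpers.
sample_response = {
  'aggregations' => {
    'rids' => { 'buckets' => [{ 'key' => '12', 'doc_count' => 3 }, { 'key' => '7', 'doc_count' => 1 }] }
  }
}

PLACEHOLDER_SCHEMA_VERSION = 1 # placeholder, not the value used by the MR

project_ids = project_ids_from(sample_response)
update_body = schema_version_update_body(project_ids, PLACEHOLDER_SCHEMA_VERSION)
```

In the real migration, each id in `project_ids` would be handed to ElasticCommitIndexerWorker before the update_by_query request is sent; that part is omitted here because it needs a running GitLab instance.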

How to set up and validate locally

Make sure Elasticsearch is enabled in GDK.

  1. Open the rails console
bundle exec rails c
  1. Populate commits in the main index
# Index one test commit document into the main index for each of two projects
[Project.last, Project.first].each do |project|
  timestamp = Time.now.strftime('%Y%m%dT%H%M%S+0000')
  ::Gitlab::Search::Client.new.index(
    index: 'gitlab-development', routing: "project_#{project.id}", refresh: true,
    body: {
      commit: {
        type: 'commit',
        author: { name: 'F L', email: 't@t.com', time: timestamp },
        committer: { name: 'F L', email: 't@t.com', time: timestamp },
        rid: project.id, message: 'test'
      },
      join_field: { name: 'commit', parent: "project_#{project.id}" },
      repository_access_level: project.repository_access_level, type: 'commit',
      visibility_level: project.visibility_level
    }
  )
end
  1. Ensure there is at least one commit in the main index by running the following curl command in bash
curl -XGET "http://localhost:9200/gitlab-development/_count" -H "kbn-xsrf: reporting" -H "Content-Type: application/json" -d'
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "type": "commit" } }
      ],
      "must_not": {
        "exists": {
          "field": "schema_version"
        }
      }
    }
  }
}'

The count should be greater than 0.

  1. Now run the following command in the rails console
 Elastic::DataMigrationService[20230901120542].send(:migration).migrate
  1. Run the curl command again and ensure the count is 0
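
Since the same count check is run before and after the migration, it may be convenient to run it from Ruby instead of bash. The following is a minimal sketch using only the standard library. The helper names `build_count_request` and `leftover_commit_count` are hypothetical, and the host and index are the ones from the curl command above.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Same query as the curl command: commit documents without schema_version.
COUNT_QUERY = {
  query: {
    bool: {
      filter: [{ term: { type: 'commit' } }],
      must_not: { exists: { field: 'schema_version' } }
    }
  }
}.freeze

# Hypothetical helper: builds the _count request without sending it.
def build_count_request(base_url, index)
  uri = URI.join(base_url, "/#{index}/_count")
  request = Net::HTTP::Post.new(uri)
  request['Content-Type'] = 'application/json'
  request.body = JSON.generate(COUNT_QUERY)
  [uri, request]
end

# Hypothetical helper: sends the request and returns the "count" field.
# Needs a reachable Elasticsearch instance.
def leftover_commit_count(base_url: 'http://localhost:9200', index: 'gitlab-development')
  uri, request = build_count_request(base_url, index)
  response = Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
  JSON.parse(response.body).fetch('count')
end
```

Calling `leftover_commit_count` before the migration should return a value greater than 0, and 0 afterwards.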

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Completion time

~105 minutes

[6] pry(main)> number_of_projects = 7159
=> 7159
[7] pry(main)> throttle_delay = 3.minutes
=> 3 minutes
[8] pry(main)> batch_size = 200
=> 200
[9] pry(main)> ((number_of_projects / batch_size) * throttle_delay)
=> 105 minutes
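
The console session uses integer division, so the final partial batch is not counted. A small sketch of the same arithmetic in plain Ruby (minutes as integers rather than ActiveSupport durations) gives both bounds. It assumes the migration waits one throttle delay per batch, which this MR does not state explicitly.

```ruby
number_of_projects = 7159
batch_size = 200
throttle_delay_minutes = 3

# Integer division, as in the console session: only full batches are counted.
full_batches = number_of_projects / batch_size
# Rounding up also counts the last, partially filled batch.
all_batches = (number_of_projects.to_f / batch_size).ceil

lower_estimate = full_batches * throttle_delay_minutes
upper_estimate = all_batches * throttle_delay_minutes

puts "#{lower_estimate} to #{upper_estimate} minutes" # prints "105 to 108 minutes"
```

Either way, the estimate stays in line with the ~105 minutes quoted above.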

Related to #419781 (closed)

#423929 (closed)

Edited by Ravi Kumar