
Do not requeue the indexing worker if failures occur

Background

Related to #413524 (closed)

Currently the bulk cron worker re-enqueues itself whenever records remain in the queue, instead of waiting for the next scheduled run. The worker runs on a 1-minute schedule. This poses a problem when many records fail to index: the worker keeps re-enqueueing itself and retrying the same failing records in a tight loop.
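The problematic behavior described above can be sketched as follows. This is a hedged simulation with hypothetical names (`CurrentBulkCronWorkerSketch`, `run`, `process_batch`), not the actual GitLab worker code:

```ruby
# Hedged sketch of the current (pre-MR) requeue behavior; class and
# method names are illustrative, not the real worker implementation.
class CurrentBulkCronWorkerSketch
  def initialize(queue)
    @queue = queue # simulated bookkeeping queue of record references
  end

  # Returns how many times the worker re-enqueued itself in one
  # simulated run. Records that fail to index go back on the queue,
  # so persistent failures produce a tight requeue loop instead of
  # waiting for the 1-minute cron schedule.
  def run(process_batch)
    requeues = 0
    until @queue.empty?
      @queue = process_batch.call(@queue)
      break if @queue.empty?

      requeues += 1            # re-enqueue because records remain
      break if requeues >= 100 # safety cap for the simulation
    end
    requeues
  end
end
```

With a batch processor that always fails, the simulated worker re-enqueues until the cap; with one that drains the queue, it never re-enqueues.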

What does this MR do and why?

This MR introduces a change in the requeue logic:

  1. return the number of records that failed to index to the worker
  2. do not re-queue if any failures happened during indexing
  3. update the specs accordingly
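The new requeue decision can be sketched as follows. This is a minimal illustration with hypothetical names (`BulkCronWorkerSketch`, `perform`, and a service assumed to return a records count and a failures count), not the actual worker or service code:

```ruby
# Hedged sketch of the requeue logic after this change; names are
# illustrative, not the real GitLab classes.
class BulkCronWorkerSketch
  # `service` is assumed to respond to #execute and return
  # [records_processed, failures_count] after this MR's change.
  def perform(service)
    records_count, failures_count = service.execute

    # Before: re-enqueue whenever any records remain in the queue.
    # After: never re-enqueue if any record failed to index; the
    # 1-minute cron schedule handles the retry instead, which avoids
    # the tight requeue loop on persistent failures.
    records_count > 0 && failures_count == 0
  end
end
```

The key design choice is that failures no longer trigger an immediate retry; the scheduled cron run becomes the natural backoff.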

Screenshots or screen recordings

N/A - all work is in background jobs

How to set up and validate locally

  1. set up GDK for Elasticsearch
  2. check out the master branch; it is probably a good idea to restart background jobs: gdk restart rails-background-jobs
  3. introduce a new field in the issue mapping to force indexing failures:
    diff --git a/ee/lib/elastic/latest/issue_instance_proxy.rb b/ee/lib/elastic/latest/issue_instance_proxy.rb
    index 0b054b3acf89..3d694d2b7540 100644
    --- a/ee/lib/elastic/latest/issue_instance_proxy.rb
    +++ b/ee/lib/elastic/latest/issue_instance_proxy.rb
    @@ -28,6 +28,8 @@ def as_indexed_json(options = {})
             data['namespace_ancestry_ids'] = target.namespace_ancestry
             data['label_ids'] = target.label_ids.map(&:to_s)
    
    +        data['i_am_missing'] = 'TEST'
    +
             if ::Elastic::DataMigrationService.migration_has_finished?(:add_hashed_root_namespace_id_to_issues)
               data['hashed_root_namespace_id'] = target.project.namespace.hashed_root_namespace_id
             end
    
  4. reindex everything from scratch: bundle exec rake gitlab:elastic:index
  5. open rails console and start the initial bulk cron worker: ElasticIndexInitialBulkCronWorker.new.perform
  6. Elastic::ProcessInitialBookkeepingService.queue_size should stay non-zero even though the cron worker is running
  7. you should see the ElasticIndexInitialBulkCronWorker continue to re-queue itself in the Sidekiq logs, and the indexing attempts will show up in elasticsearch.log
  8. check out this branch (make sure you still have the new field added in the issue mapping)
  9. restart background jobs: gdk restart rails-background-jobs
  10. reindex everything from scratch: bundle exec rake gitlab:elastic:index
  11. open rails console and start the initial bulk cron worker: ElasticIndexInitialBulkCronWorker.new.perform
  12. you should NOT see the ElasticIndexInitialBulkCronWorker re-queue itself in the Sidekiq logs; it should run only every minute as scheduled.

Note: 16 shards get processed, so you will see a message for each shard with the shard number sent in args, but the message should not repeat for each shard:

{"severity":"INFO","time":"2023-05-30T16:18:05.807Z","retry":0,"queue":"default","backtrace":true,"version":0,"queue_namespace":"cronjob","args":["13"],"class":"ElasticIndexInitialBulkCronWorker

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Terri Chu
