Do not requeue the indexing worker if failures occur
Background
Related to #413524 (closed)
Currently, the bulk cron worker requeues itself whenever records remain in the queue, instead of waiting for the next scheduled run. The worker is on a 1-minute schedule. This is a problem when many records fail to index: the failed records stay in the queue, so the worker keeps requeueing itself immediately and retrying them.
What does this MR do and why?
This MR introduces a change in the requeue logic:
- return the number of records that failed to index to the worker
- do not requeue if any failures happened during indexing
- update specs
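The requeue rule described in the bullets above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual GitLab code: `SketchBookkeepingService`, `SketchBulkCronWorker`, the `fails` key, and the `requeued` flag are hypothetical names. It also assumes that failed records go back on the queue, and it sets a flag where the real worker would re-enqueue itself through Sidekiq.

```ruby
# Hypothetical stand-in for the bookkeeping service: it processes one batch
# from an in-memory queue and reports how many records it handled and how
# many of those failed to index.
class SketchBookkeepingService
  def initialize(queue, batch_size: 2)
    @queue = queue
    @batch_size = batch_size
  end

  def queue_size
    @queue.size
  end

  # Returns [records_count, failures_count].
  # Assumption: failed records are put back on the queue for a later attempt.
  def execute
    batch = @queue.shift(@batch_size)
    failed = batch.select { |record| record[:fails] }
    @queue.concat(failed)
    [batch.size, failed.size]
  end
end

# Hypothetical stand-in for the cron worker.
class SketchBulkCronWorker
  attr_reader :requeued

  def initialize(service)
    @service = service
    @requeued = false
  end

  def perform
    records_count, failures_count = @service.execute

    # Requeue immediately only when the batch was clean and work remains.
    # After any failure, wait for the next scheduled run instead.
    @requeued = records_count > 0 && failures_count.zero? && @service.queue_size > 0
  end
end
```

In this sketch, a clean batch with records still queued leaves `requeued` set to true, while a batch containing even one failing record leaves it false, so the next attempt happens only on the regular schedule.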
Screenshots or screen recordings
N/A - all work is in background jobs
How to set up and validate locally
- set up GDK for Elasticsearch
- check out the master branch; it is probably a good idea to restart background jobs:
gdk restart rails-background-jobs
- introduce a new field in issue mapping
diff --git a/ee/lib/elastic/latest/issue_instance_proxy.rb b/ee/lib/elastic/latest/issue_instance_proxy.rb
index 0b054b3acf89..3d694d2b7540 100644
--- a/ee/lib/elastic/latest/issue_instance_proxy.rb
+++ b/ee/lib/elastic/latest/issue_instance_proxy.rb
@@ -28,6 +28,8 @@ def as_indexed_json(options = {})
         data['namespace_ancestry_ids'] = target.namespace_ancestry
         data['label_ids'] = target.label_ids.map(&:to_s)
 
+        data['i_am_missing'] = 'TEST'
+
         if ::Elastic::DataMigrationService.migration_has_finished?(:add_hashed_root_namespace_id_to_issues)
           data['hashed_root_namespace_id'] = target.project.namespace.hashed_root_namespace_id
         end
- reindex everything from scratch:
bundle exec rake gitlab:elastic:index
- open rails console and start the initial bulk cron worker:
ElasticIndexInitialBulkCronWorker.new.perform
- Elastic::ProcessInitialBookkeepingService.queue_size should continue to have records in it despite the cron worker running. You should see the ElasticIndexInitialBulkCronWorker worker continue to requeue itself in the Sidekiq logs, and the indexing attempts will show up in elasticsearch.log
- check out this branch (make sure you still have the new field added in the issue mapping)
- restart background jobs:
gdk restart rails-background-jobs
- reindex everything from scratch:
bundle exec rake gitlab:elastic:index
- open rails console and start the initial bulk cron worker:
ElasticIndexInitialBulkCronWorker.new.perform
- you should NOT see the ElasticIndexInitialBulkCronWorker worker requeue itself in the Sidekiq logs; it should run only once every minute, as scheduled.
Note: 16 shards get processed, so you will see a message for each shard, with the shard number sent in args, BUT it should not happen repeatedly for each shard:
{"severity":"INFO","time":"2023-05-30T16:18:05.807Z","retry":0,"queue":"default","backtrace":true,"version":0,"queue_namespace":"cronjob","args":["13"],"class":"ElasticIndexInitialBulkCronWorker
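To check whether messages repeat for each shard, it may help to count log entries per shard rather than scan the log by eye. The following is a rough sketch, not part of this MR: `parse_log_line` and `shard_message_counts` are hypothetical helpers, and the only assumption about the log format is that each line is a JSON object with the `class` and `args` fields seen in the log entry above.

```ruby
require 'json'

# Parses one log line, returning nil for anything that is not valid JSON.
def parse_log_line(line)
  JSON.parse(line)
rescue JSON::ParserError
  nil
end

# Counts log entries per shard for a given worker class.
# `lines` is any enumerable of log lines; the shard is taken from the
# first element of the "args" array.
def shard_message_counts(lines, worker_class: 'ElasticIndexInitialBulkCronWorker')
  counts = Hash.new(0)

  lines.each do |line|
    entry = parse_log_line(line)
    next unless entry.is_a?(Hash) && entry['class'] == worker_class

    shard = Array(entry['args']).first
    counts[shard] += 1 if shard
  end

  counts
end
```

One way to use it is `shard_message_counts(File.foreach(path_to_sidekiq_log))`, where `path_to_sidekiq_log` is a placeholder for wherever your Sidekiq log lives. A count that climbs quickly for the same shard between scheduled runs suggests the worker is still requeueing itself.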
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.
Edited by Terri Chu