Enable de-duplication of the ElasticCommitIndexerWorker jobs
What does this MR do?
This MR changes how the commit and blob indexer dispatches work to the underlying worker.
Before this change, each indexing event (push, commit, …) would enqueue a job carrying the range of commits to index.
To improve job handling, this MR defers the selection of the commit range until the job runs, so that indexing always covers LAST_INDEXED_COMMIT..HEAD.
With that change, we can now mark the job queue as idempotent and thus de-duplicate redundant jobs. Because the range is computed at run time, a single run covers every commit pushed before it starts, so running one of several pending jobs for the same project is enough.
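The run-time range selection can be illustrated with a minimal shell sketch. It assumes the last indexed commit SHA is passed in as an argument (in the MR it comes from the stored index status), and the function name `commit_range_to_index` is hypothetical. It also shows why de-duplication is safe: once HEAD has been indexed, a second job resolves an empty range and has nothing left to do.

```shell
#!/bin/sh
# Hypothetical helper (not from the MR): resolve the commit range at run time.
# $1: last indexed commit SHA, as read from the index status
#     (empty if the project has never been indexed).
# Prints a range for `git rev-list`, or nothing when HEAD is already indexed.
commit_range_to_index() {
  last_indexed="$1"
  head_sha=$(git rev-parse HEAD) || return 1
  if [ -z "$last_indexed" ]; then
    # Never indexed: everything reachable from HEAD.
    echo "$head_sha"
  elif [ "$last_indexed" != "$head_sha" ]; then
    # Only the commits added since the last run.
    echo "$last_indexed..$head_sha"
  fi
  # last_indexed == HEAD: print nothing, there is no work left.
}
```

A caller would do `range=$(commit_range_to_index "$last_sha")` and then run `git rev-list "$range"` only when `$range` is non-empty. Since the job itself carries no range, every queued job for a project is identical, which is what allows the queue to collapse them.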
Future iterations
Use a git ref instead of the index_status
Instead of writing the index status to the database, could we create a git ref in the repository that records the indexing status?
That would remove the need to query the database and would let the full processing happen in gitlab-elasticsearch-indexer.
# create the ref
git update-ref refs/elasticsearch/master $(git rev-parse master)
# then you can use it as a Git object
git diff refs/elasticsearch/master..master
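Building on the commands above, the following sketch shows how an indexer could use the ref as its checkpoint. It assumes the branch is `master` and reuses the ref name `refs/elasticsearch/master` from the example; `index_commit` and `index_branch` are hypothetical names, and `index_commit` is only a placeholder for the real Elasticsearch update. The final `git update-ref` passes the old value, which makes it a compare-and-swap: it fails if another run moved the ref in the meantime, and an empty old value asserts that the ref does not exist yet.

```shell
#!/bin/sh
# Sketch (hypothetical): a git ref as the indexing checkpoint instead of a database row.
# Assumes the current directory is inside the repository.
STATUS_REF=refs/elasticsearch/master
BRANCH=master

# Hypothetical placeholder for the real Elasticsearch update.
index_commit() {
  echo "indexing $1"
}

index_branch() {
  # Last indexed commit, or empty if the ref does not exist yet.
  old=$(git rev-parse --verify -q "$STATUS_REF") || old=""
  new=$(git rev-parse --verify "$BRANCH") || return 1
  [ "$old" = "$new" ] && return 0  # already up to date

  if [ -n "$old" ]; then range="$old..$new"; else range="$new"; fi
  # Oldest commit first.
  for sha in $(git rev-list --reverse "$range"); do
    index_commit "$sha" || return 1
  done

  # Compare-and-swap: only advance the ref if nobody else moved it.
  git update-ref "$STATUS_REF" "$new" "$old"
}
```

Because the ref only advances after every commit in the range has been processed, a failed run leaves the checkpoint where it was and the next run retries the same range.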
Buffer the index updates
We should implement the same logic as Gitlab::Elastic::BulkIndexer, so that Elasticsearch updates are collected into a buffer that is flushed periodically.
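As a rough illustration of the buffering idea (not the actual Gitlab::Elastic::BulkIndexer logic), the sketch below reads one `<id> <json document>` pair per line, appends each as an action line plus a source line in the NDJSON format used by the Elasticsearch `_bulk` API, and flushes whenever the buffer holds `BULK_LIMIT` documents, plus once more at the end of the input. `BULK_LIMIT`, `send_bulk`, `bulk_index` and `ES_URL` are hypothetical, and a document-count threshold stands in here for a periodic or size-based flush. `send_bulk` only prints the payload.

```shell
#!/bin/sh
# Sketch (hypothetical): buffer index updates and flush them in bulk.
BULK_LIMIT=${BULK_LIMIT:-2}

# Placeholder flush. A real implementation could POST the file instead, e.g.
#   curl -s -H 'Content-Type: application/x-ndjson' --data-binary @"$1" "$ES_URL/_bulk"
send_bulk() {
  cat "$1"
  echo "# flushed batch of $2"
}

# Reads "<id> <json document>" lines on stdin.
bulk_index() {
  buf=$(mktemp) || return 1
  count=0
  while read -r id doc; do
    # Each document becomes an action line plus a source line (NDJSON).
    printf '{"index":{"_id":"%s"}}\n%s\n' "$id" "$doc" >> "$buf"
    count=$((count + 1))
    if [ "$count" -ge "$BULK_LIMIT" ]; then
      send_bulk "$buf" "$count"
      : > "$buf"  # empty the buffer
      count=0
    fi
  done
  # Flush whatever is left once the input ends.
  if [ "$count" -gt 0 ]; then send_bulk "$buf" "$count"; fi
  rm -f "$buf"
}
```

Every line in the payload, including the last, ends with a newline, which the `_bulk` endpoint requires. Batching this way trades a little latency for far fewer HTTP round trips to Elasticsearch.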
Does this MR meet the acceptance criteria?
Conformity
- Changelog entry
- [-] Documentation (if required)
- Code review guidelines
- Merge request performance guidelines
- Style guides
- Database guides
- Separation of EE specific content
Availability and Testing
- Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process.
Related to #205178