
Enable de-duplication of the ElasticCommitIndexerWorker jobs

What does this MR do?

This MR changes how the commit and blob indexer hands work to the underlying worker.

Before this change, each indexation event (push, commit, …) would enqueue a job with the range of commits to index.

To improve the handling of jobs, this MR defers the selection of the commit range until the job runs, so that indexation always runs for LAST_INDEXED_COMMIT..HEAD.

With that change, jobs no longer carry their own commit range, so we can mark the job as idempotent and thus de-duplicate redundant jobs in the queue.
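
The idea above can be illustrated with a minimal, self-contained sketch. It is not the code from this MR: `DedupQueue`, `HISTORY`, `INDEX_STATUS`, and `index_commits` are hypothetical names, and the in-memory hashes stand in for the repository and the index status. It only shows why a job keyed by project alone can be safely de-duplicated once the range is computed at run time.

```ruby
require "set"

# Hypothetical stand-in for a job queue that drops jobs identical to a pending one.
class DedupQueue
  def initialize
    @pending = []
    @keys = Set.new
  end

  # Returns true if the job was enqueued, false if an identical job is pending.
  def push(worker, *args)
    key = [worker, args]
    return false unless @keys.add?(key)

    @pending << key
    true
  end

  def size
    @pending.size
  end
end

# Hypothetical data: commit history per project (oldest first) and the
# last indexed commit per project.
HISTORY = { 1 => %w[a1 b2 c3 d4] }
INDEX_STATUS = { 1 => "a1" }

# The job receives only the project id. The range LAST_INDEXED_COMMIT..HEAD
# is resolved here, when the job runs, not when it is enqueued.
def index_commits(project_id)
  commits = HISTORY.fetch(project_id)
  position = commits.index(INDEX_STATUS[project_id])
  to_index = position ? commits.drop(position + 1) : commits
  INDEX_STATUS[project_id] = commits.last unless commits.empty?
  to_index
end
```

In this sketch, pushing `(:index, 1)` twice leaves a single pending job, and running `index_commits(1)` a second time finds nothing left to index, which is the property that makes dropping the duplicate harmless.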

Future iterations

Use a git ref instead of the index_status

Instead of writing the index status to the database, could we create a git ref in the repository that records the indexation status?

That would avoid having to query the database and would allow the full processing to happen in the gitlab-elasticsearch-indexer.

# create the ref
git update-ref refs/elasticsearch/master $(git rev-parse master)

# then you can use it as a Git object
git diff refs/elasticsearch/master..master

Buffer the index updates

We should implement the same logic as Gitlab::Elastic::BulkIndexer, so that the Elasticsearch updates are accumulated in a buffer that gets flushed periodically.
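
As a rough illustration of the buffering idea, here is a minimal sketch. It does not reproduce Gitlab::Elastic::BulkIndexer: the `BufferedIndexer` class, its `flush_size:` parameter, and the block-based sink are all hypothetical, and "periodically" is approximated by a size threshold rather than a timer.

```ruby
# Hypothetical buffer: collects updates and hands them to a sink in batches.
class BufferedIndexer
  # flush_size: number of buffered updates that triggers a flush (assumption).
  # The block receives each flushed batch as an array.
  def initialize(flush_size:, &sink)
    @flush_size = flush_size
    @sink = sink
    @buffer = []
  end

  # Buffer one update and flush once the threshold is reached.
  def add(update)
    @buffer << update
    flush if @buffer.size >= @flush_size
  end

  # Send whatever is buffered to the sink. Call once more at the end of a run
  # so a partially filled buffer is not lost.
  def flush
    return if @buffer.empty?

    batch = @buffer
    @buffer = []
    @sink.call(batch)
  end
end
```

With `flush_size: 2`, adding three updates sends the first two as one batch, and a final explicit `flush` sends the third.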

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Related to #205178

Edited by 🤖 GitLab Bot 🤖
