GitLab should not index conflicting commit ranges in parallel
Summary
Multiple ElasticCommitIndexerWorker jobs can run in parallel for the same project. Depending on timing, an older job may finish after a newer one and overwrite the index with stale content.
Running several of these jobs for the same project at the same time is also very wasteful. Worse, Elasticsearch will often fail the bulk requests of the earlier jobs when a newer job updates the same document at the same time. The failed jobs then retry, hit the same conflict again, and can loop indefinitely without ever completing a single bulk insert for the project.
Steps to reproduce
Push a very large project to GitLab repeatedly, so that each new push arrives before the first indexing run has finished. The earlier indexing jobs will likely fail due to conflicts and retry. If the repository is big enough and the pushes are frequent enough, indexing may never succeed.
What is the current bug behavior?
Multiple workers run at the same time for the same project.
What is the expected correct behavior?
We should add a Redis lock to ElasticCommitIndexerWorker so that only one job processes a given project at a time. If another indexing job for the same project is still in flight, the new job can simply fail, allowing retries to pick it up later in case it included extra updates that needed indexing.
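A minimal sketch of what such a per-project lock could look like. This is not GitLab's actual implementation. FakeLockStore is a hypothetical in-memory stand-in for Redis so the example runs without a server: try_lock mimics the semantics of the Redis command SET key token NX EX ttl, and unlock mimics a compare-and-delete. The key format, the one-hour TTL, the ProjectAlreadyIndexing error and the with_project_index_lock helper are all assumptions made for illustration.

```ruby
require 'securerandom'

# Hypothetical in-memory stand-in for Redis, used only to keep the sketch runnable.
class FakeLockStore
  def initialize
    @locks = {}
    @mutex = Mutex.new
  end

  # Mimics `SET key token NX EX ttl`: succeeds only if no unexpired lock exists.
  def try_lock(key, token, ttl)
    @mutex.synchronize do
      entry = @locks[key]
      if entry && entry[:expires_at] > Time.now
        false
      else
        @locks[key] = { token: token, expires_at: Time.now + ttl }
        true
      end
    end
  end

  # Deletes the lock only if it is still held by the caller's token.
  def unlock(key, token)
    @mutex.synchronize do
      entry = @locks[key]
      @locks.delete(key) if entry && entry[:token] == token
    end
  end
end

# Hypothetical error raised when another job already holds the project's lock.
class ProjectAlreadyIndexing < StandardError; end

# Runs the block only if no other job is indexing the same project.
# Raising makes the job fail, so the queue's retry mechanism can run it later.
def with_project_index_lock(store, project_id, ttl: 3600)
  key = "elastic_commit_indexer:#{project_id}" # assumed key format
  token = SecureRandom.uuid
  unless store.try_lock(key, token, ttl)
    raise ProjectAlreadyIndexing, "project #{project_id} is already being indexed"
  end

  begin
    yield
  ensure
    store.unlock(key, token)
  end
end

# Usage: a second job for project 42 is rejected while the first is in flight.
store = FakeLockStore.new
results = []
with_project_index_lock(store, 42) do
  results << :first_job_ran
  begin
    with_project_index_lock(store, 42) { results << :second_job_ran }
  rescue ProjectAlreadyIndexing
    results << :second_job_rejected
  end
end
# Once the first job releases the lock, a retry can proceed.
with_project_index_lock(store, 42) { results << :retry_ran }
```

The TTL is there so that a job that crashes without releasing the lock cannot block indexing for the project forever. The unique token keeps a slow job whose lock has already expired from deleting a lock that now belongs to a newer job.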