Roll out `elastic_bulk_incremental_updates` feature flag on GitLab.com
What
Remove the `:elastic_bulk_incremental_updates` feature flag
Owners
- Team: group::global search
- Most appropriate Slack channel to reach out to: `#g_global_search`
- Best individual to reach out to: @nick.thomas or @DylanGriffith
Expectations
### What are we expecting to happen?
Enabling this feature flag moves from one `ElasticIndexerWorker` Sidekiq job per individual document update in Elasticsearch to a queue-based approach using Redis `ZSET`s for all resources that are within project scope. The queue is filled by `after_commit` hooks (instead of enqueuing those Sidekiq jobs), and emptied by an `ElasticIndexBulkCronWorker`, which pulls 1000 documents each minute and submits them as a single bulk request. The idea is that bulk processing is more efficient and should lead to less pressure on Redis, Sidekiq, and the Elasticsearch cluster itself.
Project index/update/delete operations remain as separate `ElasticIndexerWorker` jobs, as does all backfill.
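To make the fill-and-drain flow above concrete, here is a minimal in-memory sketch. It is not the GitLab implementation: `BulkQueueSketch` and its methods are hypothetical names, and a plain Hash stands in for the Redis sorted set. It only illustrates two properties the description relies on: set members are unique, so repeated updates to one document occupy a single queue slot, and the periodic worker takes at most a fixed number of entries per run and submits them as one batch.

```ruby
# Illustrative, in-memory stand-in for a Redis ZSET-backed queue.
# BulkQueueSketch is a hypothetical name, not a GitLab class.
class BulkQueueSketch
  def initialize
    @scores = {}     # member => score, mimicking sorted-set semantics
    @next_score = 0
  end

  # Called where an after_commit hook would otherwise enqueue a Sidekiq job.
  # Re-adding an existing member only updates its score, so repeated updates
  # to the same document collapse into one queue entry.
  def track(ref)
    @next_score += 1
    @scores[ref] = @next_score
  end

  def size
    @scores.size
  end

  # Called by the periodic worker: take up to `limit` of the lowest-scored
  # entries, hand them to the block as one batch, then remove them.
  # Returns the number of entries processed.
  def drain(limit: 1000)
    batch = @scores.sort_by { |_ref, score| score }.first(limit).map(&:first)
    return 0 if batch.empty?

    yield batch
    batch.each { |ref| @scores.delete(ref) }
    batch.size
  end
end

queue = BulkQueueSketch.new
queue.track("Issue 1")
queue.track("Note 7")
queue.track("Issue 1") # same document again: still one queue entry

# The block stands in for a single bulk request to Elasticsearch.
queue.drain(limit: 1000) { |batch| puts "bulk request with #{batch.size} documents" }
```

In this sketch the queue holds two entries before the drain, so one simulated bulk request replaces what would have been three separate jobs.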
### What might happen if this goes wrong?
There are a few possible scenarios:
- We might have a bug that leads to some indexing not happening when it should.
- We might not empty the queue as fast as it is filled. This would have a negative impact on Redis. Queue size can be determined from a Rails console with `Elastic::ProcessBookkeepingService.queue_size`.
- We might misbehave when processing a project that hasn't yet had initial backfill applied (I did test this locally, but you never know).
- Other issues might manifest in a highly concurrent environment that are not obvious from a single-node dev deployment.
- We might DESTROY REDIS. But hopefully not.
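One rough way to check the "queue fills faster than it empties" scenario is to compare queue size readings over time. The helper below is a hypothetical sketch, not part of GitLab: it assumes you have noted a few `[seconds, queue_size]` pairs by hand (for example by calling `Elastic::ProcessBookkeepingService.queue_size` in a Rails console at known times) and computes the net change per minute between the first and last reading.

```ruby
# Hypothetical helper: net queue growth per minute, computed from
# [time_in_seconds, queue_size] samples ordered oldest first.
# A positive result means the queue is filling faster than it is drained.
def queue_growth_per_minute(samples)
  raise ArgumentError, "need at least two samples" if samples.size < 2

  (t0, s0), (t1, s1) = samples.first, samples.last
  elapsed_minutes = (t1 - t0) / 60.0
  raise ArgumentError, "samples must span a positive interval" unless elapsed_minutes.positive?

  (s1 - s0) / elapsed_minutes
end

# Made-up readings: 5,000 entries at t=0s and 7,500 entries at t=300s.
puts queue_growth_per_minute([[0, 5_000], [300, 7_500]]) # prints 500.0
```

With these made-up numbers the queue grows by 500 entries per minute even after the worker has pulled its documents, which would be a signal to pause the rollout.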
### What can we monitor to detect problems with this?
Beta groups/projects
If applicable, any groups/projects that are happy to have this feature turned on early. Some organizations may wish to test big changes they are interested in with a small subset of users ahead of time, for example.
- `gitlab-org/gitlab` project
- `gitlab-org`/`gitlab-com` groups
- ...
Roll Out Steps
- Confirm the new worker is running on dedicated elasticsearch sidekiq fleet
- Enable on staging
- Test on staging
- Ensure that documentation has been updated
- Enable the feature flag for `gitlab-org/gitlab` project and verify behaviour
- Enable the feature flag globally and verify behaviour
- Announce on the issue that the flag has been enabled
- Remove feature flag and add changelog entry
- After the flag removal is deployed, clean up the feature flag by running chatops command in `#production` channel