elastic: Allow maximum bulk request size to be configured
Problem to solve
Elasticsearch exposes a Bulk API, which we use extensively when submitting repository data for indexing. Each cluster enforces a maximum request size (http.max_content_length), which varies by cluster. In particular, on AWS, smaller Elasticsearch instances are limited to a maximum bulk request size of 10MiB, while larger instances allow 100MiB.
Currently, we hard-code the bulk request size on the client to 10MiB, so that we fit within the resource limits of "small" AWS Elasticsearch nodes. However, a 100MiB maximum is far more common than 10MiB on normally-sized nodes.
Intended users
Instance administrators wrestling with Elasticsearch
Further details
Bulk request queue slots are a limited resource - by default, each node's bulk queue holds 200 requests, and once it is full, the bulk API endpoint starts rejecting requests. Rejections are expensive and need retries, which we don't currently implement (https://gitlab.com/gitlab-org/gitlab-ee/issues/12372). Artificially restricting the request size makes this worse: the same volume of data consumes more queue slots. Further reading on this is here: https://www.elastic.co/blog/why-am-i-seeing-bulk-rejections-in-my-elasticsearch-cluster
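To make the trade-off concrete, here is a minimal sketch (not GitLab's actual code; all names are illustrative) of greedily packing serialized documents into bulk requests capped at a configurable byte limit. Raising the cap means fewer requests competing for the queue's slots for the same volume of data:

```ruby
MIB = 1024 * 1024

# Pack documents into as few bulk requests as possible, each no larger
# than max_bytes. Returns an array of requests (arrays of documents).
def pack_bulk_requests(docs, max_bytes)
  requests = [[]]
  size = 0

  docs.each do |doc|
    if size + doc.bytesize > max_bytes && !requests.last.empty?
      requests << []  # start a new bulk request
      size = 0
    end
    requests.last << doc
    size += doc.bytesize
  end

  requests
end

docs = Array.new(100) { "x" * MIB }          # 100 one-MiB documents
pack_bulk_requests(docs, 10 * MIB).length    # => 10 requests
pack_bulk_requests(docs, 100 * MIB).length   # => 1 request
```

With a 100MiB cap, the same payload occupies a tenth as many queue slots, which is exactly the pressure reduction we're after.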
Proposal
Add a setting to configure the maximum bulk request size in the Elasticsearch section of the admin panel. This would just be another field next to the existing "Elasticsearch Shards" and replicas settings. Administrators can then set a value that makes sense for their Elasticsearch cluster.
People can set up their own Elasticsearch clusters with arbitrary limits, so I don't think restricting the choices to the AWS tiers makes much sense. We should default to 10MiB, though, so we work in the broadest range of cases.
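A sketch of how the default could be resolved - the setting key and helper name here are hypothetical, not the actual GitLab implementation:

```ruby
# 10 MiB default keeps small AWS Elasticsearch nodes working out of the box;
# administrators with bigger clusters can raise it (e.g. to 100 MiB).
DEFAULT_MAX_BULK_SIZE_BYTES = 10 * 1024 * 1024

# settings is a Hash standing in for the application settings store.
def max_bulk_size_bytes(settings)
  settings[:elasticsearch_max_bulk_size_bytes] || DEFAULT_MAX_BULK_SIZE_BYTES
end

max_bulk_size_bytes({})                                             # => 10485760
max_bulk_size_bytes(elasticsearch_max_bulk_size_bytes: 104_857_600) # => 104857600
```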
On GitLab.com, I'd love to set it to 100MiB - that would drastically reduce the number of bulk-indexing failures we see, which is super-valuable at initial backfill indexing time, as well as under high git push load.
I don't think this limit is introspectable from the Elasticsearch cluster. If it were, we could autodetect it instead of making it configurable.
The setting needs to be respected both by the es_import method (so Gitlab::Elastic::Client.build needs to take it into account) and by gitlab-elasticsearch-indexer (so we pass it through in ELASTIC_CONNECTION_INFO, overriding the currently hardcoded numbers here: https://gitlab.com/gitlab-org/gitlab-elasticsearch-indexer/blob/master/elastic/client.go#L28).
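A sketch of the pass-through to the Go indexer, assuming a hypothetical max_bulk_size_bytes field in the JSON blob (the actual field name would need to be agreed with gitlab-elasticsearch-indexer):

```ruby
require 'json'

# Build the JSON handed to gitlab-elasticsearch-indexer via the
# ELASTIC_CONNECTION_INFO environment variable, carrying the configured
# limit so the Go side can drop its hardcoded constants.
def elastic_connection_info(urls, index_name, max_bulk_size_bytes)
  {
    url: urls,
    index_name: index_name,
    max_bulk_size_bytes: max_bulk_size_bytes # hypothetical field
  }.to_json
end

env = {
  'ELASTIC_CONNECTION_INFO' =>
    elastic_connection_info(['http://localhost:9200'], 'gitlab', 100 * 1024 * 1024)
}
```

The indexer would then parse this field when decoding the connection info, falling back to its current defaults if it is absent, so old and new GitLab versions stay compatible.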
Permissions and Security
Only instance admins should be able to change this setting.
Documentation
We'll need to update https://docs.gitlab.com/ee/integration/elasticsearch.html to document the new setting.
Testing
Unit testing in both the gitlab-ee and gitlab-elasticsearch-indexer projects
What does success look like, and how can we measure that?
We can index projects against large AWS Elasticsearch clusters more efficiently, with fewer bulk rejections.
What is the type of buyer?
Links / references
cc @phikai