Make sure we use retry_on_failure option for search requests with Elasticsearch integration
Summary
The customer (internal) is asking why we are not using the retry_on_failure option for search requests. They came to that conclusion because they have 3 replicas configured in the cluster, which they specify as URLs. When one is down, they start getting random 500 errors during search, which they would not expect if retry_on_failure were configured. It seems we do set it when we run migrations (example), and it is also declared in the Client module, but config[:retry_on_failure] does not appear to be set by default and I don't see an option to enable it:
My test instance doesn't have the retry_on_failure option available in its config:
irb(main):001:0> ::Gitlab::CurrentSettings.elasticsearch_config
=> {:url=>[{:scheme=>"http", :host=>"x.x.x.x", :path=>"", :port=>9200}], :aws=>false, :aws_access_key=>"", :aws_region=>"us-east-1", :max_bulk_size_bytes=>10485760, :max_bulk_concurrency=>10}
There is some history in #297204 (closed) and !67273 (merged).
Possible fixes
We should consider making retry_on_failure configurable for search requests, or document how to enable it if that is already possible.
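To make the first option concrete, here is a minimal sketch of how a configurable value could be turned into client options. It is not taken from the GitLab codebase: the helper name search_client_options, the DEFAULT_RETRY_ON_FAILURE constant, and the idea of reading :retry_on_failure from the same hash returned by elasticsearch_config are all assumptions made for illustration. The only thing relied on from the Elasticsearch Ruby client is that it accepts hosts and retry_on_failure options.

```ruby
# Hypothetical helper, not part of GitLab: builds the options hash that
# would be handed to the Elasticsearch Ruby client for search requests.

# Assumption: retries stay disabled unless the setting is present.
DEFAULT_RETRY_ON_FAILURE = 0

def search_client_options(config)
  # Turn each URL hash (same shape as in elasticsearch_config) into a string.
  hosts = config.fetch(:url).map do |u|
    "#{u[:scheme]}://#{u[:host]}:#{u[:port]}#{u[:path]}"
  end

  options = { hosts: hosts }

  # Only pass retry_on_failure through when it is a positive integer.
  retries = config.fetch(:retry_on_failure, DEFAULT_RETRY_ON_FAILURE)
  options[:retry_on_failure] = retries if retries.is_a?(Integer) && retries.positive?

  options
end

# Example input; the host names are placeholders.
config = {
  url: [
    { scheme: "http", host: "es1.example", path: "", port: 9200 },
    { scheme: "http", host: "es2.example", path: "", port: 9200 },
    { scheme: "http", host: "es3.example", path: "", port: 9200 }
  ],
  retry_on_failure: 2
}

options = search_client_options(config)
# The result could then be passed on, e.g. Elasticsearch::Client.new(**options).
```

With a config like the one from my test instance, where the key is absent, the helper leaves retry_on_failure out entirely, so existing behaviour would not change unless an admin opts in.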