Add search index pruning
What does this MR do and why?
Automatically trims read-only indices that were rolled over by search curation, by reindexing their documents into the current write index. This keeps all of our indices roughly within the sizing guidelines that we recommend: https://docs.gitlab.com/ee/integration/advanced_search/elasticsearch.html#tuning.
When bloated read-only indices are present, Search::IndexPruningWorker continuously schedules itself to reindex those documents into the current write index. When there are no bloated read-only indices, Search::IndexPruningWorker checks again every 30 minutes.
Note: this only addresses the case where rolled-over indices are too big. Rolled-over indices can also become too small over time, but that will be addressed in another iteration.
The changes here are behind the feature flag search_index_pruning_worker.
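The scheduling behavior described above can be sketched roughly as follows. This is a minimal illustration, not the worker's actual code: `next_run_delay` and `RECHECK_INTERVAL_SECONDS` are hypothetical names, and treating "continuously schedule itself" as a zero-second delay is an assumption.

```ruby
# Hypothetical stand-in for the worker's scheduling decision.
# Only the 30-minute interval comes from the description above.
RECHECK_INTERVAL_SECONDS = 30 * 60

# Returns the delay in seconds before the next run.
# Assumption: "continuously" means re-enqueueing with no delay.
def next_run_delay(bloated_indices)
  bloated_indices.empty? ? RECHECK_INTERVAL_SECONDS : 0
end

puts next_run_delay([])
puts next_run_delay(['gitlab-development-notes-20230204-2003'])
```

The point of the split is that the worker only does back-to-back work while there is something to prune, and otherwise falls back to the periodic check.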
Screenshots or screen recordings
Curator + Pruner results from running locally in a development environment. Indices are roughly the same size.
GET _cat/indices/gitlab-development-*?v&s=index
How to set up and validate locally
- (Optional) Start tailing the advanced search log file in another pane:
tail -f log/elasticsearch.log
- Ensure you have a rolled-over index locally by checking:
GET _cat/indices/gitlab-development*?v
You should have one large read-only index and one almost empty write index.
- If you don't have any rolled-over indices yet, run this in the console (it ignores the curation feature flags):
Gitlab::Search::IndexCurator.curate(dry_run: false)
- Verify that the following setting is 100 in your local dev environment:
Gitlab::CurrentSettings.search_pruning_max_docs
- Manually trigger the pruner (example below).
- There should now be 100 fewer documents in the read-only index and 100 more in the write index.
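The document-count check in the last step can also be scripted. The sketch below reads the JSON form of the `_cat/indices` listing (`format=json`) and maps each index name to its document count. The helper names `doc_counts` and `fetch_cat_indices` are hypothetical, and the base URL `http://localhost:9200` is an assumption about the local setup.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Parses the JSON form of `_cat/indices` into { index name => document count }.
# The API returns counts as strings, so they are converted to integers.
def doc_counts(cat_indices_json)
  JSON.parse(cat_indices_json).to_h do |row|
    [row.fetch('index'), row.fetch('docs.count').to_i]
  end
end

# Fetches the listing from a local cluster.
# Assumption: Elasticsearch is reachable at base_url without authentication.
def fetch_cat_indices(pattern, base_url: 'http://localhost:9200')
  Net::HTTP.get(URI("#{base_url}/_cat/indices/#{pattern}?format=json&s=index"))
end

# Usage against a running cluster:
#   before = doc_counts(fetch_cat_indices('gitlab-development-notes*'))
#   # trigger the pruner, then refresh the index
#   after = doc_counts(fetch_cat_indices('gitlab-development-notes*'))
```

Comparing the `before` and `after` hashes avoids reading the counts off the tabular output by eye.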
Pruning is done in reverse alphabetical order on index_name. In my local environment, the notes index gets pruned first:
Before
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
yellow open gitlab-development-20230204-2003 xczKTZJHTMes4a-03Zx71w 5 1 98 0 117.5kb 117.5kb
yellow open gitlab-development-commits-20230204-2003 YyKwDGvcR6iG9f4cXFgZxg 5 1 0 0 1kb 1kb
yellow open gitlab-development-issues-20230204-2003 I6y5XrhUT3Ct4zEIhMTawA 5 1 461 0 155.6kb 155.6kb
yellow open gitlab-development-issues-20230204-2004 9UOD3domS6CY9BsEG_fP_w 5 1 0 0 1kb 1kb
yellow open gitlab-development-merge_requests-20230204-2003 J8EosgbPRbKUhndJ_GYaOw 5 1 141 0 105.2kb 105.2kb
yellow open gitlab-development-merge_requests-20230204-2004 V12_1H45SdiKm_rTAH1ppg 5 1 0 0 1kb 1kb
yellow open gitlab-development-migrations aOjdQ2EWQ72lokYmDGqoCw 1 1 31 0 6.2kb 6.2kb
yellow open gitlab-development-notes-20230204-2003 uHna-nu5Ss6wJeh8svLxLg 5 1 937 0 154.6kb 154.6kb
yellow open gitlab-development-notes-20230204-2004 dwx4oT0XSQig6i4bwEbaqA 5 1 0 0 1kb 1kb
yellow open gitlab-development-users-20230204-2003 kz6tzHs8SEa6o7ycqy_-tw 5 1 47 0 61.9kb 61.9kb
pry(main)> p = ::Gitlab::Search::Curation::Pruner.new(curator_settings: {ignore_patterns: []}, max: Gitlab::CurrentSettings.search_pruning_max_docs)
=> #<Gitlab::Search::Curation::Pruner:0x0000000138fed3f8
@curator=
#<Gitlab::Search::IndexCurator:0x0000000138fed3d0
@settings={:dry_run=>true, :debug=>false, :force=>false, :max_shard_size_gb=>1, :max_docs_denominator=>100, :min_docs_before_rollover=>50, :max_docs_shard_count=>5, :ignore_patterns=>[], :include_patterns=>[], :index_pattern=>"gitlab-development*"}>,
@debug=false,
@max=100,
@pct=0.2>
pry(main)> p.bloated_readonly_indices
=> [{:reasons=>["too many docs"], :info=>{"health"=>"yellow", "status"=>"open", "index"=>"gitlab-development-notes-20230204-2003", "uuid"=>"uHna-nu5Ss6wJeh8svLxLg", "pri"=>"5", "rep"=>"1", "docs.count"=>"937", "docs.deleted"=>"0", "store.size"=>"0", "pri.store.size"=>"0"}},
{:reasons=>["too many docs"], :info=>{"health"=>"yellow", "status"=>"open", "index"=>"gitlab-development-merge_requests-20230204-2003", "uuid"=>"J8EosgbPRbKUhndJ_GYaOw", "pri"=>"5", "rep"=>"1", "docs.count"=>"141", "docs.deleted"=>"0", "store.size"=>"0", "pri.store.size"=>"0"}},
{:reasons=>["too many docs"], :info=>{"health"=>"yellow", "status"=>"open", "index"=>"gitlab-development-issues-20230204-2003", "uuid"=>"I6y5XrhUT3Ct4zEIhMTawA", "pri"=>"5", "rep"=>"1", "docs.count"=>"461", "docs.deleted"=>"0", "store.size"=>"0", "pri.store.size"=>"0"}}]
[24] pry(main)> p.prune(p.bloated_readonly_indices.first)
=> true
After refreshing the notes index with POST gitlab-development-notes*/_refresh and checking the sizes with GET _cat/indices/gitlab-development-notes*?v, 100 documents have moved from the read-only index to the write index:
After
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
yellow open gitlab-development-notes-20230204-2003 uHna-nu5Ss6wJeh8svLxLg 5 1 837 0 154.6kb 154.6kb
yellow open gitlab-development-notes-20230204-2004 dwx4oT0XSQig6i4bwEbaqA 5 1 100 0 1kb 1kb
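The ordering and the counts shown above can be reproduced with a small sketch. It uses the index names and document counts from the "Before" listing. `pruning_order` and `counts_after_prune` are hypothetical helpers, not methods of the Pruner class, and capping the move at the number of available documents is a simplifying assumption.

```ruby
# Index names and document counts taken from the "Before" listing.
bloated = {
  'gitlab-development-issues-20230204-2003' => 461,
  'gitlab-development-merge_requests-20230204-2003' => 141,
  'gitlab-development-notes-20230204-2003' => 937
}

# Reverse alphabetical order on the index name.
def pruning_order(index_names)
  index_names.sort.reverse
end

# Moves at most `max` documents from the read-only index to the write index.
# Assumption: the move is capped by what the read-only index holds.
def counts_after_prune(read_only_docs, write_docs, max)
  moved = [read_only_docs, max].min
  [read_only_docs - moved, write_docs + moved]
end

order = pruning_order(bloated.keys)
puts order.first                                  # gitlab-development-notes-20230204-2003
p counts_after_prune(bloated[order.first], 0, 100) # [837, 100]
```

This matches the run above: the notes index sorts last alphabetically, so it is pruned first, and a max of 100 takes it from 937 to 837 documents.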
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.