Switch Search Elasticsearch index to use default english stemmer
What does this MR do?
Related to #277390 (closed)
The index is currently set up to use the `light_english` stemmer, which causes results to be missed when searching for plural words. Example: if an issue title contains the word `buttons`, a search for `button` would not find the issue. I could not find any history explaining why `light_english` was chosen ~4 years ago. The `english` stemmer is the default (and recommended) stemmer for English in Elasticsearch, so this MR switches to the default.
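As a rough sketch of what the switch amounts to, the Ruby hash below mirrors the shape of an Elasticsearch index `analysis` settings body with a `stemmer` token filter. The names `my_stemmer`, `my_analyzer`, and the `stemmer_settings` helper are hypothetical and are not taken from GitLab's actual index configuration; only the `stemmer` filter type and its `language` values `light_english` and `english` are Elasticsearch's own.

```ruby
require 'json'

# Hypothetical helper: builds an index settings fragment in which the
# only variable part is the stemmer language.
def stemmer_settings(language)
  {
    analysis: {
      filter: {
        # Elasticsearch "stemmer" token filter; `language` selects the algorithm.
        my_stemmer: { type: 'stemmer', language: language }
      },
      analyzer: {
        # Illustrative analyzer that lowercases tokens and then stems them.
        my_analyzer: {
          tokenizer: 'standard',
          filter: %w[lowercase my_stemmer]
        }
      }
    }
  }
end

# Before this MR (light_english) and after it (english).
puts JSON.pretty_generate(stemmer_settings('light_english'))
puts JSON.pretty_generate(stemmer_settings('english'))
```

In this sketch the two settings bodies differ only in the `language` value of the stemmer filter; everything else in the analyzer chain stays the same.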
Note: This requires a reindex to take effect, so existing indexes are not affected until administrators choose to reindex.
Added specs for the specific search which did not work in the associated issue.
Screenshots (strongly suggested)
I took some snapshots of the index size using `bundle exec rake gitlab:elastic:test:index_size` to see what the effect would be on index size requirements:
Before index size (using `light_english` stemmer)
===== Size stats for index: gitlab-development =====
{"docs"=>{"count"=>350832, "deleted"=>0},
"store"=>{"size_in_bytes"=>262224696, "reserved_in_bytes"=>0}}
After index size (using default `english` stemmer)
===== Size stats for index: gitlab-development =====
{"docs"=>{"count"=>350832, "deleted"=>0},
"store"=>{"size_in_bytes"=>259723137, "reserved_in_bytes"=>0}}
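To compare the two snapshots, the short Ruby calculation below takes the two `size_in_bytes` values reported above and prints the difference. The constant and method names are only illustrative.

```ruby
# size_in_bytes values copied from the two snapshots.
BEFORE_BYTES = 262_224_696 # light_english stemmer
AFTER_BYTES  = 259_723_137 # english stemmer

# Returns the absolute reduction in bytes and the relative reduction
# as a percentage of the "before" size, rounded to two decimals.
def size_reduction(before_bytes, after_bytes)
  delta = before_bytes - after_bytes
  percent = (delta.to_f / before_bytes * 100).round(2)
  [delta, percent]
end

delta, percent = size_reduction(BEFORE_BYTES, AFTER_BYTES)
puts "Index shrank by #{delta} bytes (#{percent}%)"
```

On this development index the document count is unchanged at 350832, and the store is 2501559 bytes (about 0.95%) smaller with the `english` stemmer.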
Does this MR meet the acceptance criteria?
Conformity
- Changelog entry
- Documentation (if required)
- Code review guidelines
- Merge request performance guidelines
- Style guides
- [-] Database guides
- Separation of EE specific content
Availability and Testing
- Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process.
- [-] Tested in all supported browsers
- [-] Informed Infrastructure department of a default or new setting change, if applicable per definition of done
Security
If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:
- [-] Label as security and @ mention `@gitlab-com/gl-security/appsec`
- [-] The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
- [-] Security reports checked/validated by a reviewer from the AppSec team