Switch Search Elasticsearch index to use default english stemmer

What does this MR do?

Related to #277390 (closed)

The index is currently set up to use the light_english stemmer, which causes results to be missed when a search involves plural words. Example: if an issue title contains the word buttons, a search for button does not find the issue. I could not find any history explaining why light_english was chosen ~4 years ago. The english stemmer is the default (and recommended) stemmer for English in Elasticsearch, so this MR switches to it.
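For illustration, here is a minimal sketch of how the stemmer choice appears in Elasticsearch analysis settings, written as a Ruby hash. The names `analysis_settings`, `my_analyzer`, and `my_stemmer` are hypothetical and are not taken from the GitLab codebase; only the `stemmer` filter type and its `language` values (`light_english`, `english`) come from Elasticsearch itself.

```ruby
# Sketch only: builds index analysis settings with a configurable stemmer.
# `analysis_settings`, `my_analyzer`, and `my_stemmer` are hypothetical names.
def analysis_settings(stemmer_language)
  {
    analyzer: {
      my_analyzer: {
        type: 'custom',
        tokenizer: 'standard',
        filter: %w[lowercase my_stemmer]
      }
    },
    filter: {
      my_stemmer: {
        type: 'stemmer',            # Elasticsearch stemmer token filter
        language: stemmer_language  # 'light_english' before, 'english' after
      }
    }
  }
end

before_settings = analysis_settings('light_english')
after_settings  = analysis_settings('english')

puts before_settings[:filter][:my_stemmer][:language]
puts after_settings[:filter][:my_stemmer][:language]
```

Under this sketch, the only difference between the two configurations is the `language` value of the stemmer filter, which is why a reindex is needed before existing documents pick up the change.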

Note: This requires a reindex to take effect, so existing indexes are not affected until administrators choose to reindex.

Added specs for the specific search that did not work in the associated issue.

Screenshots (strongly suggested)

I took some snapshots of the index size using bundle exec rake gitlab:elastic:test:index_size to see what the effect would be on index size requirements:

Before index size (using light_english stemmer)

===== Size stats for index: gitlab-development =====
{"docs"=>{"count"=>350832, "deleted"=>0},
 "store"=>{"size_in_bytes"=>262224696, "reserved_in_bytes"=>0}}

After index size (using default english stemmer)

===== Size stats for index: gitlab-development =====
{"docs"=>{"count"=>350832, "deleted"=>0},
 "store"=>{"size_in_bytes"=>259723137, "reserved_in_bytes"=>0}}
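The two snapshots can be compared directly. The sketch below uses only the `size_in_bytes` values reported above; the variable names are illustrative. On this development index the difference is about 2.5 MB, slightly under 1% smaller with the english stemmer.

```ruby
# Compare the store sizes reported by the two index_size snapshots above.
before_bytes = 262_224_696 # light_english stemmer
after_bytes  = 259_723_137 # default english stemmer

saved_bytes   = before_bytes - after_bytes
saved_percent = (saved_bytes.to_f / before_bytes * 100).round(2)

puts "Index shrank by #{saved_bytes} bytes (#{saved_percent}%)"
```

Both snapshots contain the same 350832 documents, so the size difference is attributable to the analyzer change rather than to a change in indexed data.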

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • [-] Label as security and @ mention @gitlab-com/gl-security/appsec
  • [-] The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • [-] Security reports checked/validated by a reviewer from the AppSec team
Edited by Terri Chu
