Use prefix search instead of ngrams for sha fields
What does this MR do?
Currently, SHAs are indexed using ngrams of 5 to 40 characters. This means that each SHA is split into roughly 35 separate terms, which takes up a lot of storage. SHAs are already nearly unique after their first 4-5 characters, so a simple prefix search is fast enough and just as effective as ngrams with term matching.
This MR replaces the current ngram analyzers with prefix search.
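The trade-off described above can be illustrated with a small, self-contained sketch. It is not the code from this MR: it assumes the ngrams are leading substrings (edge-style ngrams) of each SHA, and the function names and sample SHAs are hypothetical. The first approach stores one term per prefix length, while the second stores each SHA once and matches by prefix at query time.

```python
import bisect


def edge_ngrams(sha: str, min_gram: int = 5, max_gram: int = 40) -> list[str]:
    """Return every leading substring of `sha` from min_gram to max_gram characters.

    Assumption: the ngrams are edge-style (prefixes only); the MR text just says "ngrams".
    """
    upper = min(len(sha), max_gram)
    return [sha[:n] for n in range(min_gram, upper + 1)]


def build_ngram_index(shas: list[str]) -> dict[str, set[str]]:
    """Index every prefix term of every SHA: fast exact-term lookup, many stored terms."""
    index: dict[str, set[str]] = {}
    for sha in shas:
        for term in edge_ngrams(sha):
            index.setdefault(term, set()).add(sha)
    return index


def ngram_lookup(index: dict[str, set[str]], query: str) -> set[str]:
    """Term match: the query must be exactly one of the stored prefix terms."""
    return index.get(query, set())


def prefix_lookup(sorted_shas: list[str], query: str) -> list[str]:
    """Prefix search over a sorted list of whole SHAs: one stored term per SHA."""
    start = bisect.bisect_left(sorted_shas, query)
    matches = []
    for sha in sorted_shas[start:]:
        if not sha.startswith(query):
            break  # sorted order guarantees no later entry can match
        matches.append(sha)
    return matches


# Hypothetical 40-character SHAs, for illustration only.
shas = sorted([
    "a1b2c3d4e5" + "0" * 30,
    "a1b2c9ffff" + "1" * 30,
    "deadbeef00" + "2" * 30,
])
ngram_index = build_ngram_index(shas)
```

With this sample data, `ngram_lookup(ngram_index, "a1b2c3")` and `prefix_lookup(shas, "a1b2c3")` find the same SHA, but the ngram index holds far more terms than the three SHAs stored for the prefix search.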
Testing different options on a project from gitlabhq_export.tar.gz:

Option | Size, MB | % of ngrams size | Change |
---|---|---|---|
ngrams | 899.1 | 100.00% | |
prefix search | 788.67 | 87.71% | -12.29% |
Does this MR meet the acceptance criteria?
Conformity
- Changelog entry
- [-] Documentation (if required)
- Code review guidelines
- Merge request performance guidelines
- Style guides
- [-] Database guides
- Separation of EE specific content
Availability and Testing
- Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process.
- [-] Tested in all supported browsers
- [-] Informed Infrastructure department of a default or new setting change, if applicable per definition of done
Security
If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:
- [-] Label as security and @ mention @gitlab-com/gl-security/appsec
- [-] The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
- [-] Security reports checked/validated by a reviewer from the AppSec team