Sanitize string provided to to_tsvector
What does this MR do and why?
Closes #428428 (closed).
We sanitize full-text search queries to remove non-allowed characters, but we don't do the same for the string used to generate the search vector itself.
When non-allowed characters are passed to `to_tsvector`, they can affect the resulting search vector.
Example: passing the string `<gitlab>` to `to_tsvector` results in an empty tsvector.
gitlabhq_production=# SELECT setweight(to_tsvector('english', '<gitlab>'), 'A');
setweight
-----------
(1 row)
Sanitizing out the non-allowed characters allows words surrounded by them to be included in the search vector:
gitlabhq_development=# SELECT setweight(to_tsvector('english', ' gitlab '), 'A');
setweight
-------------
'gitlab':1A
(1 row)
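The sanitization step described above could look roughly like the following. This is a minimal sketch, not the code in this MR: the constant `DISALLOWED_CHARACTERS`, the helper name `sanitize_for_tsvector`, and the exact set of allowed characters are assumptions made for illustration.

```ruby
# Hypothetical allow-list: letters, digits, whitespace, and a few punctuation
# marks. The exact set is an assumption, not the one used by this MR.
DISALLOWED_CHARACTERS = /[^a-zA-Z0-9\s.@\/\-_"]/

# Replace each disallowed character with a space (rather than deleting it)
# so that words on either side remain separate tokens.
def sanitize_for_tsvector(text)
  text.to_s.gsub(DISALLOWED_CHARACTERS, ' ')
end

sanitize_for_tsvector('<gitlab>')                   # => " gitlab "
sanitize_for_tsvector('the <rain> is falling down') # => "the  rain  is falling down"
```

The sanitized string would then be passed to `to_tsvector` in place of the raw one, which matches the `' gitlab '` input shown in the second query above.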
How to set up and validate locally
Pre-change:
- Create an issue with the title `the <rain> is falling down`.
- Check the issue search data's search vector:
  Issue.where(title: 'the <rain> is falling down').first.search_data.search_vector
  - The search vector does not contain the word `rain`.
- Search for the term `rain` using Basic Search in the project.
  - There are no results in Issues.
Post-change:
This assumes the issue created in the pre-change steps already exists.
- Update the issue's search data:
  Issue.where(title: 'the <rain> is falling down').first.update_search_data!
- Check the issue search data's search vector:
  Issue.where(title: 'the <rain> is falling down').first.search_data.search_vector
  - The search vector includes the word `rain`.
- Search for the term `rain` using Basic Search in the project.
  - There is a result matching the issue.
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.