Add project check in index repair service
What does this MR do and why?
Related to #214601 (closed)
This iteration adds a check for whether the project document exists in the index.
I decided to add this check after debugging an issue where a customer's code search came up empty for a specific project. The code/blob documents were in the index, but the project document was missing. When the project document is missing from the index, the code search query returns no results, because project searches in Elasticsearch include a parent_join.
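To illustrate why a missing parent hides the blobs, here is a minimal Ruby sketch of the shape such a query might take. It is an assumption, not the query the application actually builds: the helper name blob_query_for_project, the blob.content field, and the exact nesting are illustrative. Only has_parent and parent_type are standard Elasticsearch query DSL. Because the blob clause is filtered by a condition on the parent, no blob can match when no project document with the given id exists.

```ruby
require 'json'

# Hypothetical helper: builds a search body that only matches child (blob)
# documents whose parent project document exists and has the given id.
def blob_query_for_project(project_id, term)
  {
    query: {
      bool: {
        must: [
          { match: { 'blob.content' => term } } # field name is an assumption
        ],
        filter: [
          {
            # If the parent project document is missing, this filter
            # matches nothing, so the whole search comes back empty.
            has_parent: {
              parent_type: 'project',
              query: { term: { id: { value: project_id } } }
            }
          }
        ]
      }
    }
  }
end

puts JSON.pretty_generate(blob_query_for_project(1, 'smoke'))
```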
The MR also includes some refactoring in the search index repair service:
- split checks into methods to increase readability and allow adding/removing checks quickly
- add routing to all of the Elasticsearch queries to speed them up
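The two refactoring points can be sketched roughly as follows. This is not the code from the MR: the class name IndexRepairSketch, the check names, the client interface, and the project_<id> routing format are all assumptions made for illustration. The idea is that each check is a small method listed in one place, and every query passes a routing value so Elasticsearch only has to consult one shard.

```ruby
# Hypothetical sketch; names and the client interface are assumptions.
class IndexRepairSketch
  # Adding or removing a check means editing this list and one method.
  CHECKS = %i[check_project_document check_blob_documents].freeze

  def initialize(project_id:, client:)
    @project_id = project_id
    @client = client # anything responding to count(routing:, body:)
  end

  # Runs every check and returns the names of those that failed.
  def execute
    CHECKS.reject { |check| send(check) }
  end

  private

  def routing
    "project_#{@project_id}" # assumed routing key format
  end

  def check_project_document
    count(type: 'project', field: :id).positive?
  end

  def check_blob_documents
    count(type: 'blob', field: :project_id).positive?
  end

  def count(type:, field:)
    body = { query: { bool: { filter: [
      { term: { type: type } },
      { term: { field => @project_id } }
    ] } } }
    @client.count(routing: routing, body: body)
  end
end

# In-memory stand-in for an Elasticsearch client, for demonstration only.
class FakeClient
  attr_reader :routings

  def initialize(counts)
    @counts = counts
    @routings = []
  end

  def count(routing:, body:)
    @routings << routing
    type = body.dig(:query, :bool, :filter, 0, :term, :type)
    @counts.fetch(type, 0)
  end
end

client = FakeClient.new('blob' => 12) # blobs indexed, project document missing
p IndexRepairSketch.new(project_id: 1, client: client).execute
# prints [:check_project_document]
```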
Screenshots or screen recordings
N/A
How to set up and validate locally
- set up GDK for Elasticsearch
- index your gdk data:
bundle exec rake gitlab:elastic:index
- enable Advanced Search in admin ui: http://gdk.test:3000/admin/application_settings/advanced_search
- enable the FF search_index_integrity:
Feature.enable(:search_index_integrity)
- find a project with repository data and note the project ID
- delete the project document from the index (replace the id 1 with the project ID from the step above):
curl --request POST \
  --url http://localhost:9200/gitlab-development/_delete_by_query \
  --header 'Content-Type: application/json' \
  --data '{
    "query": {
      "bool": {
        "must": [
          {
            "term": {
              "type": {
                "value": "project"
              }
            }
          },
          {
            "term": {
              "id": {
                "value": 1
              }
            }
          }
        ]
      }
    }
  }'
- run the repair service from the rails console:
[1] pry(main)> project = Project.find(1)
Project Load (2.3ms) SELECT "projects".* FROM "projects" WHERE "projects"."id" = 1 LIMIT 1 /*application:console,db_config_name:main,console_hostname:terrichus-MBP.localdomain,console_username:terrichu,line:(pry):1:in `__pry__'*/
Route Load (0.6ms) SELECT "routes".* FROM "routes" WHERE "routes"."source_id" = 1 AND "routes"."source_type" = 'Project' LIMIT 1 /*application:console,db_config_name:main,console_hostname:terrichus-MBP.localdomain,console_username:terrichu,line:/app/models/concerns/routable.rb:141:in `block in full_attribute'*/
=> #<Project id:1 toolbox/gitlab-smoke-tests>>
[2] pry(main)> ::Search::IndexRepairService.execute(project)
Namespace Load (1.3ms) SELECT "namespaces".* FROM "namespaces" WHERE "namespaces"."id" = 22 LIMIT 1 /*application:console,db_config_name:main,console_hostname:terrichus-MBP.localdomain,console_username:terrichu,line:/app/models/project.rb:2911:in `root_namespace'*/
=> true
- verify in log/elasticsearch.log that the service logs that the project document is missing:
{"severity":"WARN","time":"2023-03-30T18:42:52.897Z","correlation_id":null,"class":"Search::IndexRepairService","message":"project document missing from index","namespace_id":22,"root_namespace_id":22,"project_id":1}
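The last verification step can also be scripted. The following is a minimal sketch, assuming the log file contains one JSON object per line as in the entry above; the helper name missing_project_entries is hypothetical and is not part of the application.

```ruby
require 'json'

MISSING_MESSAGE = 'project document missing from index'

# Returns the parsed log entries that report a missing project document
# for the given project id. Lines that are not valid JSON are skipped.
def missing_project_entries(lines, project_id)
  lines.filter_map do |line|
    entry = JSON.parse(line)
    next unless entry.is_a?(Hash)

    entry if entry['message'] == MISSING_MESSAGE && entry['project_id'] == project_id
  rescue JSON::ParserError
    nil
  end
end

# Sample entry copied from the log output shown in the steps.
sample = '{"severity":"WARN","time":"2023-03-30T18:42:52.897Z","correlation_id":null,' \
         '"class":"Search::IndexRepairService","message":"project document missing from index",' \
         '"namespace_id":22,"root_namespace_id":22,"project_id":1}'

puts missing_project_entries([sample, 'not json'], 1).length

# Against a real file, something like this should work from the Rails root:
#   missing_project_entries(File.foreach('log/elasticsearch.log'), 1)
```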
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.
Edited by Terri Chu