Add index integrity worker
What does this MR do and why?
Related to #214601 (closed)
Initial work to create an index integrity worker. This MR introduces:
- a new index integrity worker
- a new index repair service
- a new after_action for search controller (in EE context only)
- specs for everyone!
Screenshots or screen recordings
N/A - all backend work
How to set up and validate locally
get all blobs for a project
note: Replace project_id with the project you are working with
curl --request POST \
--url http://localhost:9200/gitlab-development/_search \
--header 'Content-Type: application/json' \
--cookie 'perf_bar_enabled=true; experimentation_subject_id=IjQwMjUxOWZlLWIwYWItNDZlNi1hY2VkLTRjMWE0NzZkMjAyNCI%253D--dc985bd87edc1f47a1018fbc26fdc35dbeab34ba; BetterErrors-2.9.1-CSRF-Token=67dca20f-92f6-4685-8085-56fa84085f14' \
--data '{
"query": {
"bool": {
"must": [
{
"term": {
"type": {
"value": "blob"
}
}
},
{
"term": {
"project_id": {
"value": 7
}
}
}
]
}
}
}'
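When only the number of blobs matters (for the before and after checks in the validation steps), the same query body can be sent to Elasticsearch's `_count` endpoint instead of `_search`. The following is a minimal Ruby sketch, not part of this MR. It assumes the same local URL and index name as the curl command above, and the helper names `blob_query`, `blob_count_request`, and `blob_count` are made up for illustration.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Assumption: same local Elasticsearch address and index as the curl command above.
ES_URL = 'http://localhost:9200'
INDEX = 'gitlab-development'

# Same bool/must query as the curl body, with the project ID as a parameter.
def blob_query(project_id)
  {
    query: {
      bool: {
        must: [
          { term: { type: { value: 'blob' } } },
          { term: { project_id: { value: project_id } } }
        ]
      }
    }
  }
end

# Builds (but does not send) a POST request to the _count endpoint.
def blob_count_request(project_id)
  uri = URI("#{ES_URL}/#{INDEX}/_count")
  request = Net::HTTP::Post.new(uri)
  request['Content-Type'] = 'application/json'
  request.body = JSON.generate(blob_query(project_id))
  request
end

# Sends the request and returns the "count" field of the JSON response.
def blob_count(project_id)
  request = blob_count_request(project_id)
  response = Net::HTTP.start(request.uri.hostname, request.uri.port) do |http|
    http.request(request)
  end
  JSON.parse(response.body).fetch('count')
end
```

Running `puts blob_count(7)` from `irb` should print the number of blob documents indexed for project 7, provided Elasticsearch is reachable at that address.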
remove blob data for a project
note: Replace project_id with the project you are working with
curl --request POST \
--url http://localhost:9200/gitlab-development/_delete_by_query \
--header 'Content-Type: application/json' \
--cookie 'perf_bar_enabled=true; experimentation_subject_id=IjQwMjUxOWZlLWIwYWItNDZlNi1hY2VkLTRjMWE0NzZkMjAyNCI%253D--dc985bd87edc1f47a1018fbc26fdc35dbeab34ba; BetterErrors-2.9.1-CSRF-Token=67dca20f-92f6-4685-8085-56fa84085f14' \
--data '{
"query": {
"bool": {
"must": [
{
"term": {
"type": {
"value": "blob"
}
}
},
{
"term": {
"project_id": {
"value": 7
}
}
}
]
}
}
}'
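The JSON that `_delete_by_query` returns reports how many documents matched (`total`), how many were removed (`deleted`), and any `failures`, which gives a quick first check before re-running the search query. A minimal Ruby sketch follows, not part of this MR. The helper name `delete_by_query_clean?` is hypothetical, and the sample response is invented for illustration and trimmed to the fields used here.

```ruby
require 'json'

# Returns true when every matched document was deleted and nothing failed.
def delete_by_query_clean?(response_body)
  result = JSON.parse(response_body)
  result.fetch('failures', []).empty? &&
    result.fetch('version_conflicts', 0).zero? &&
    result.fetch('deleted') == result.fetch('total')
end

# Hypothetical response body; the numbers are made up, not taken from a real run.
sample = '{"took":12,"timed_out":false,"total":3,"deleted":3,"version_conflicts":0,"failures":[]}'

puts delete_by_query_clean?(sample)
```

A `false` result would suggest some blobs are still in the index, in which case the deletion step may need to be repeated before validating the worker.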
- make sure that GDK is set up for Elasticsearch, the indexes are created, and advanced search is enabled
- enable the feature flag:
Feature.enable(:search_index_integrity)
- perform a code search for one of the projects (I chose flightjs/flight)
- verify results come back
- verify how many blobs exist (use the "get all blobs for a project" query above) against the Elasticsearch instance (runs on localhost:9200 in GDK)
- delete all of those blobs from the index (use the "remove blob data for a project" query above) and verify they are gone against the Elasticsearch instance (runs on localhost:9200 in GDK)
- run a project search in GDK, verify no results
- verify the index integrity worker runs for the project:
gdk tail rails-background-jobs
2023-02-24_16:46:44.17683 rails-background-jobs : {"severity":"INFO","time":"2023-02-24T16:46:44.176Z","retry":25,"queue":"default","backtrace":true,"version":0,"args":["[FILTERED]","7"],"class":"Search::IndexIntegrityWorker","jid":"a4ec773b907c8dc992937286","created_at":"2023-02-24T16:46:44.169Z","correlation_id":"01GT253QZTNSJYMDJ5YWXQH059","meta.caller_id":"SearchController#show","meta.remote_ip":"127.0.0.1","meta.feature_category":"global_search","meta.user":"root","meta.user_id":1,"meta.project":"flightjs/Flight","meta.root_namespace":"flightjs","meta.client_id":"user/1","meta.root_caller_id":"SearchController#show","worker_data_consistency":"delayed","wal_locations":{},"wal_location_source":"primary","idempotency_key":"resque:gitlab:duplicate:default:9ee35784997d131608e87df5fe6834da84cd1db76ad5694e968d4bba514b5386","size_limiter":"validated","enqueued_at":"2023-02-24T16:46:44.175Z","job_size_bytes":8,"pid":22084,"message":"Search::IndexIntegrityWorker JID-a4ec773b907c8dc992937286: start","job_status":"start","scheduling_latency_s":0.000779}
- verify the index repair service adds a log entry in the elasticsearch.log file
{"severity":"WARN","time":"2023-02-24T16:46:49.796Z","correlation_id":"01GT253XHS8ENKNJ51FQ7FNFHN","class":"Search::IndexRepairService","message":"blob documents missing from index for project","project_id":7,"project_commit":{"id":"f15b32277d2c55c6c595845a87109b09c913c556","message":"v1.5.2\n","parent_ids":["8749d49930866a4871fa086adbd7d2057fcc3ebb"],"authored_date":"2017-06-19T14:39:45.000-07:00","author_name":"Andrew Lunny","author_email":"alunny@twitter.com","committed_date":"2017-06-19T14:39:53.000-07:00","committer_name":"Andrew Lunny","committer_email":"alunny@twitter.com","trailers":{}},"project_last_repository_updated_at":"2023-02-17T20:09:02.537Z","index_status_last_commit":"f15b32277d2c55c6c595845a87109b09c913c556","index_status_indexed_at":"2023-02-22T19:10:45.757Z","repository_size":765460}
- run a group search, verify no results
- verify the index integrity worker runs for the namespace and queues up a new worker for the project
2023-02-24_16:50:11.83565 rails-background-jobs : {"severity":"INFO","time":"2023-02-24T16:50:11.835Z","retry":25,"queue":"default","backtrace":true,"version":0,"args":["33","[FILTERED]"],"class":"Search::IndexIntegrityWorker","jid":"a61110823e8d8580a50ac776","created_at":"2023-02-24T16:50:11.817Z","correlation_id":"01GT25A2ZFSNBBD0ZQPCP1X7BJ","meta.caller_id":"SearchController#show","meta.remote_ip":"127.0.0.1","meta.feature_category":"global_search","meta.user":"root","meta.user_id":1,"meta.root_namespace":"flightjs","meta.client_id":"user/1","meta.root_caller_id":"SearchController#show","worker_data_consistency":"delayed","wal_locations":{},"wal_location_source":"primary","idempotency_key":"resque:gitlab:duplicate:default:09ee9c6ffd8ed564b18d6501bbb59805758a83e5f8fae3df2f57d9a483697b0f","size_limiter":"validated","enqueued_at":"2023-02-24T16:50:11.818Z","job_size_bytes":9,"pid":22084,"message":"Search::IndexIntegrityWorker JID-a61110823e8d8580a50ac776: start","job_status":"start","scheduling_latency_s":0.014567}
2023-02-24_16:50:11.90737 rails-background-jobs : {"severity":"INFO","time":"2023-02-24T16:50:11.906Z","retry":25,"queue":"default","backtrace":true,"version":0,"args":["33","7"],"class":"Search::IndexIntegrityWorker","jid":"1a0d4184e9609e27926d9bc6","created_at":"2023-02-24T16:50:11.863Z","correlation_id":"01GT25A2ZFSNBBD0ZQPCP1X7BJ","meta.caller_id":"Search::IndexIntegrityWorker","meta.remote_ip":"127.0.0.1","meta.feature_category":"global_search","meta.user":"root","meta.user_id":1,"meta.root_namespace":"flightjs","meta.client_id":"user/1","meta.root_caller_id":"SearchController#show","worker_data_consistency":"delayed","wal_locations":{},"wal_location_source":"primary","idempotency_key":"resque:gitlab:duplicate:default:decdc9edc9ae65ad6e33020e998c181f40b6759e672bcf2a0b857f1a822a5707","size_limiter":"validated","enqueued_at":"2023-02-24T16:50:11.864Z","job_size_bytes":6,"pid":22084,"message":"Search::IndexIntegrityWorker JID-1a0d4184e9609e27926d9bc6: done: 0.041452 sec","job_status":"done","scheduling_latency_s":0.000683,"gitaly_calls":1,"gitaly_duration_s":0.012291,"redis_calls":4,"redis_duration_s":0.000812,"redis_read_bytes":215,"redis_write_bytes":281,"redis_queues_calls":2,"redis_queues_duration_s":0.000289,"redis_queues_read_bytes":2,"redis_queues_write_bytes":186,"redis_repository_cache_calls":2,"redis_repository_cache_duration_s":0.000523,"redis_repository_cache_read_bytes":213,"redis_repository_cache_write_bytes":95,"elasticsearch_calls":1,"elasticsearch_duration_s":0.006586,"elasticsearch_timed_out_count":0,"db_count":4,"db_write_count":0,"db_cached_count":0,"db_replica_count":0,"db_primary_count":4,"db_main_count":4,"db_ci_count":0,"db_main_replica_count":0,"db_ci_replica_count":0,"db_replica_cached_count":0,"db_primary_cached_count":0,"db_main_cached_count":0,"db_ci_cached_count":0,"db_main_replica_cached_count":0,"db_ci_replica_cached_count":0,"db_replica_wal_count":0,"db_primary_wal_count":0,"db_main_wal_count":0,"db_ci_wal_count":0,"db_main_replica_wal_count":0,"db_ci_replica_wal_count":0,"db_replica_wal_cached_count":0,"db_primary_wal_cached_count":0,"db_main_wal_cached_count":0,"db_ci_wal_cached_count":0,"db_main_replica_wal_cached_count":0,"db_ci_replica_wal_cached_count":0,"db_replica_duration_s":0.0,"db_primary_duration_s":0.007,"db_main_duration_s":0.007,"db_ci_duration_s":0.0,"db_main_replica_duration_s":0.0,"db_ci_replica_duration_s":0.0,"cpu_s":0.016403,"worker_id":"sidekiq_0","rate_limiting_gates":[],"duration_s":0.041452,"completed_at":"2023-02-24T16:50:11.906Z","load_balancing_strategy":"primary_no_wal","db_duration_s":0.00312}
- verify the index integrity worker runs for the namespace:
gdk tail rails-background-jobs
{"severity":"INFO","time":"2023-02-24T16:50:11.845Z","correlation_id":"01GT25A2ZFSNBBD0ZQPCP1X7BJ","class":"Search::IndexIntegrityWorker","message":"enqueueing all projects for namespace","namespace_id":33}
- verify the index repair service adds a log entry in the <gdk_dir>/gitlab/log/elasticsearch.log file
{"severity":"WARN","time":"2023-02-24T16:50:11.905Z","correlation_id":"01GT25A2ZFSNBBD0ZQPCP1X7BJ","class":"Search::IndexRepairService","message":"blob documents missing from index for project","project_id":7,"project_commit":{"id":"f15b32277d2c55c6c595845a87109b09c913c556","message":"v1.5.2\n","parent_ids":["8749d49930866a4871fa086adbd7d2057fcc3ebb"],"authored_date":"2017-06-19T14:39:45.000-07:00","author_name":"Andrew Lunny","author_email":"alunny@twitter.com","committed_date":"2017-06-19T14:39:53.000-07:00","committer_name":"Andrew Lunny","committer_email":"alunny@twitter.com","trailers":{}},"project_last_repository_updated_at":"2023-02-17T20:09:02.537Z","index_status_last_commit":"f15b32277d2c55c6c595845a87109b09c913c556","index_status_indexed_at":"2023-02-22T19:10:45.757Z","repository_size":765460}
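The log checks in the steps above can also be scripted rather than read by eye. The following is a minimal Ruby sketch, not part of this MR. It assumes each log line ends with a single JSON object, as in the entries above. The helper names `parse_log_line`, `entries_for`, and `index_status_current?` are made up for illustration, and the sample lines are trimmed copies of the entries shown above.

```ruby
require 'json'

# Extracts the JSON payload from a log line, skipping any prefix such as
# "2023-02-24_16:50:11.83565 rails-background-jobs : ". Returns nil when the
# line holds no parseable JSON object.
def parse_log_line(line)
  json_start = line.index('{')
  return nil if json_start.nil?

  JSON.parse(line[json_start..-1])
rescue JSON::ParserError
  nil
end

# Parsed entries whose "class" field equals the given class name.
def entries_for(lines, class_name)
  lines.map { |line| parse_log_line(line) }
       .compact
       .select { |entry| entry['class'] == class_name }
end

# True when the index status records the same commit as the project's latest commit.
def index_status_current?(entry)
  entry.dig('project_commit', 'id') == entry['index_status_last_commit']
end

# Trimmed copies of the log entries above, plus one non-JSON line.
SAMPLE_LINES = [
  '2023-02-24_16:50:11.83565 rails-background-jobs : {"severity":"INFO","args":["33","[FILTERED]"],"class":"Search::IndexIntegrityWorker","job_status":"start"}',
  '{"severity":"WARN","class":"Search::IndexRepairService","message":"blob documents missing from index for project","project_id":7,"project_commit":{"id":"f15b32277d2c55c6c595845a87109b09c913c556"},"index_status_last_commit":"f15b32277d2c55c6c595845a87109b09c913c556"}',
  'not a JSON line'
].freeze

workers = entries_for(SAMPLE_LINES, 'Search::IndexIntegrityWorker')
repairs = entries_for(SAMPLE_LINES, 'Search::IndexRepairService')

puts "worker entries: #{workers.size}"
repairs.each do |entry|
  puts "project #{entry['project_id']}: #{entry['message']} " \
       "(index status current: #{index_status_current?(entry)})"
end
```

In the repair service entry above, `index_status_last_commit` matches `project_commit.id`, which suggests the index status alone would not have revealed the missing blobs.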
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.
Edited by Terri Chu