Add index integrity worker

What does this MR do and why?

Related to #214601 (closed)

Initial work to create an index integrity worker. This MR introduces:

  • a new index integrity worker
  • a new index repair service
  • a new after_action for the search controller (in EE context only)
  • specs for everyone!
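
As a rough illustration of how these pieces appear to fit together, judging from the log output in the validation steps: a namespace-level run enqueues one job per project, and a project-level run logs a warning when blob documents are missing. The sketch below is a toy model in plain Ruby, not the code in this MR. `IntegrityCheck`, `PROJECTS_BY_NAMESPACE`, the argument order, and the "zero blob documents means missing" condition are all assumptions made for illustration.

```ruby
# Toy model of the fan-out seen in the validation logs; NOT the code from this MR.
# Every name below is hypothetical.

# Assumed lookup: namespace ID => project IDs (33 and 7 mirror the sample logs).
PROJECTS_BY_NAMESPACE = { 33 => [7] }.freeze

class IntegrityCheck
  attr_reader :queue, :warnings

  def initialize(blob_counts)
    @blob_counts = blob_counts # project_id => blob documents found in the index
    @queue = []                # stands in for the background job queue
    @warnings = []             # stands in for elasticsearch.log
  end

  # Called with a namespace ID only (group search) or with a project ID.
  def perform(namespace_id, project_id)
    if project_id.nil?
      # Namespace run: enqueue one job per project in the namespace.
      PROJECTS_BY_NAMESPACE.fetch(namespace_id, []).each do |id|
        @queue << [namespace_id, id]
      end
    elsif @blob_counts.fetch(project_id, 0).zero?
      # Project run: assumption that zero blob documents counts as "missing".
      @warnings << {
        message: 'blob documents missing from index for project',
        project_id: project_id
      }
    end
  end

  # Runs queued jobs until the queue is empty.
  def drain
    perform(*@queue.shift) until @queue.empty?
  end
end
```

Calling `perform(33, nil)` followed by `drain` on an instance built with `{ 7 => 0 }` records one warning for project 7, which matches the shape of the namespace scenario in the steps below.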

Screenshots or screen recordings

N/A: all backend work

How to set up and validate locally

Get all blobs for a project (note: replace the project_id value with the ID of the project you are working with):
curl --request POST \
  --url http://localhost:9200/gitlab-development/_search \
  --header 'Content-Type: application/json' \
  --cookie 'perf_bar_enabled=true; experimentation_subject_id=IjQwMjUxOWZlLWIwYWItNDZlNi1hY2VkLTRjMWE0NzZkMjAyNCI%253D--dc985bd87edc1f47a1018fbc26fdc35dbeab34ba; BetterErrors-2.9.1-CSRF-Token=67dca20f-92f6-4685-8085-56fa84085f14' \
  --data '{
	"query": {
		"bool": {
			"must": [
				{
					"term": {
						"type": {
							"value": "blob"
						}
					}
				},
				{
					"term": {
						"project_id": {
							"value": 7
						}
					}
				}
			]
		}
	}
}'
Remove blob data for a project (note: replace the project_id value with the ID of the project you are working with):
curl --request POST \
  --url http://localhost:9200/gitlab-development/_delete_by_query \
  --header 'Content-Type: application/json' \
  --cookie 'perf_bar_enabled=true; experimentation_subject_id=IjQwMjUxOWZlLWIwYWItNDZlNi1hY2VkLTRjMWE0NzZkMjAyNCI%253D--dc985bd87edc1f47a1018fbc26fdc35dbeab34ba; BetterErrors-2.9.1-CSRF-Token=67dca20f-92f6-4685-8085-56fa84085f14' \
  --data '{
	"query": {
		"bool": {
			"must": [
				{
					"term": {
						"type": {
							"value": "blob"
						}
					}
				},
				{
					"term": {
						"project_id": {
							"value": 7
						}
					}
				}
			]
		}
	}
}'
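
The two curl commands above share the same query body and differ only in the endpoint. A minimal Ruby sketch of that shared query, parameterized by project ID, might look like the following. `ES_URL` and `INDEX` are taken from the curl commands; `blob_query` and `es_post` are hypothetical helper names, not part of this MR.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Assumption: default GDK Elasticsearch URL and index name, as in the curl commands.
ES_URL = 'http://localhost:9200'
INDEX  = 'gitlab-development'

# Builds the same bool/must query as the curl commands, for any project ID.
def blob_query(project_id)
  {
    query: {
      bool: {
        must: [
          { term: { type: { value: 'blob' } } },
          { term: { project_id: { value: project_id } } }
        ]
      }
    }
  }
end

# Hypothetical helper: POSTs the query to an endpoint such as
# '_search' or '_delete_by_query' and returns the parsed JSON response.
def es_post(endpoint, project_id)
  uri = URI("#{ES_URL}/#{INDEX}/#{endpoint}")
  body = JSON.generate(blob_query(project_id))
  response = Net::HTTP.post(uri, body, 'Content-Type' => 'application/json')
  JSON.parse(response.body)
end
```

With a local Elasticsearch running, `es_post('_search', 7)` and `es_post('_delete_by_query', 7)` would correspond to the two curl commands. Elasticsearch also has a `_count` endpoint that accepts the same query body, which may be a quicker way to check how many blobs remain.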
  1. Make sure GDK is set up for Elasticsearch, the indexes are created, and advanced search is enabled.
  2. Enable the feature flag: Feature.enable(:search_index_integrity)
  3. Perform a code search for one of the projects (I chose flightjs/flight).
  4. Verify results come back.
  5. Verify how many blobs exist by running the "get all blobs for a project" query above against the Elasticsearch instance (localhost:9200 in GDK).
  6. Delete all of those blobs from the index with the "remove blob data for a project" query above, and verify they are gone from the Elasticsearch instance (localhost:9200 in GDK).
  7. Run a project search in GDK and verify there are no results.
  8. Verify the index integrity worker runs for the project: gdk tail rails-background-jobs
2023-02-24_16:46:44.17683 rails-background-jobs : {"severity":"INFO","time":"2023-02-24T16:46:44.176Z","retry":25,"queue":"default","backtrace":true,"version":0,"args":["[FILTERED]","7"],"class":"Search::IndexIntegrityWorker","jid":"a4ec773b907c8dc992937286","created_at":"2023-02-24T16:46:44.169Z","correlation_id":"01GT253QZTNSJYMDJ5YWXQH059","meta.caller_id":"SearchController#show","meta.remote_ip":"127.0.0.1","meta.feature_category":"global_search","meta.user":"root","meta.user_id":1,"meta.project":"flightjs/Flight","meta.root_namespace":"flightjs","meta.client_id":"user/1","meta.root_caller_id":"SearchController#show","worker_data_consistency":"delayed","wal_locations":{},"wal_location_source":"primary","idempotency_key":"resque:gitlab:duplicate:default:9ee35784997d131608e87df5fe6834da84cd1db76ad5694e968d4bba514b5386","size_limiter":"validated","enqueued_at":"2023-02-24T16:46:44.175Z","job_size_bytes":8,"pid":22084,"message":"Search::IndexIntegrityWorker JID-a4ec773b907c8dc992937286: start","job_status":"start","scheduling_latency_s":0.000779}
  9. Verify the index repair service adds a log entry to the elasticsearch.log file
{"severity":"WARN","time":"2023-02-24T16:46:49.796Z","correlation_id":"01GT253XHS8ENKNJ51FQ7FNFHN","class":"Search::IndexRepairService","message":"blob documents missing from index for project","project_id":7,"project_commit":{"id":"f15b32277d2c55c6c595845a87109b09c913c556","message":"v1.5.2\n","parent_ids":["8749d49930866a4871fa086adbd7d2057fcc3ebb"],"authored_date":"2017-06-19T14:39:45.000-07:00","author_name":"Andrew Lunny","author_email":"alunny@twitter.com","committed_date":"2017-06-19T14:39:53.000-07:00","committer_name":"Andrew Lunny","committer_email":"alunny@twitter.com","trailers":{}},"project_last_repository_updated_at":"2023-02-17T20:09:02.537Z","index_status_last_commit":"f15b32277d2c55c6c595845a87109b09c913c556","index_status_indexed_at":"2023-02-22T19:10:45.757Z","repository_size":765460}
  10. Run a group search and verify there are no results.
  11. Verify the index integrity worker runs for the namespace and queues up a new worker for the project
2023-02-24_16:50:11.83565 rails-background-jobs : {"severity":"INFO","time":"2023-02-24T16:50:11.835Z","retry":25,"queue":"default","backtrace":true,"version":0,"args":["33","[FILTERED]"],"class":"Search::IndexIntegrityWorker","jid":"a61110823e8d8580a50ac776","created_at":"2023-02-24T16:50:11.817Z","correlation_id":"01GT25A2ZFSNBBD0ZQPCP1X7BJ","meta.caller_id":"SearchController#show","meta.remote_ip":"127.0.0.1","meta.feature_category":"global_search","meta.user":"root","meta.user_id":1,"meta.root_namespace":"flightjs","meta.client_id":"user/1","meta.root_caller_id":"SearchController#show","worker_data_consistency":"delayed","wal_locations":{},"wal_location_source":"primary","idempotency_key":"resque:gitlab:duplicate:default:09ee9c6ffd8ed564b18d6501bbb59805758a83e5f8fae3df2f57d9a483697b0f","size_limiter":"validated","enqueued_at":"2023-02-24T16:50:11.818Z","job_size_bytes":9,"pid":22084,"message":"Search::IndexIntegrityWorker JID-a61110823e8d8580a50ac776: start","job_status":"start","scheduling_latency_s":0.014567}
2023-02-24_16:50:11.90737 rails-background-jobs : {"severity":"INFO","time":"2023-02-24T16:50:11.906Z","retry":25,"queue":"default","backtrace":true,"version":0,"args":["33","7"],"class":"Search::IndexIntegrityWorker","jid":"1a0d4184e9609e27926d9bc6","created_at":"2023-02-24T16:50:11.863Z","correlation_id":"01GT25A2ZFSNBBD0ZQPCP1X7BJ","meta.caller_id":"Search::IndexIntegrityWorker","meta.remote_ip":"127.0.0.1","meta.feature_category":"global_search","meta.user":"root","meta.user_id":1,"meta.root_namespace":"flightjs","meta.client_id":"user/1","meta.root_caller_id":"SearchController#show","worker_data_consistency":"delayed","wal_locations":{},"wal_location_source":"primary","idempotency_key":"resque:gitlab:duplicate:default:decdc9edc9ae65ad6e33020e998c181f40b6759e672bcf2a0b857f1a822a5707","size_limiter":"validated","enqueued_at":"2023-02-24T16:50:11.864Z","job_size_bytes":6,"pid":22084,"message":"Search::IndexIntegrityWorker JID-1a0d4184e9609e27926d9bc6: done: 0.041452 sec","job_status":"done","scheduling_latency_s":0.000683,"gitaly_calls":1,"gitaly_duration_s":0.012291,"redis_calls":4,"redis_duration_s":0.000812,"redis_read_bytes":215,"redis_write_bytes":281,"redis_queues_calls":2,"redis_queues_duration_s":0.000289,"redis_queues_read_bytes":2,"redis_queues_write_bytes":186,"redis_repository_cache_calls":2,"redis_repository_cache_duration_s":0.000523,"redis_repository_cache_read_bytes":213,"redis_repository_cache_write_bytes":95,"elasticsearch_calls":1,"elasticsearch_duration_s":0.006586,"elasticsearch_timed_out_count":0,"db_count":4,"db_write_count":0,"db_cached_count":0,"db_replica_count":0,"db_primary_count":4,"db_main_count":4,"db_ci_count":0,"db_main_replica_count":0,"db_ci_replica_count":0,"db_replica_cached_count":0,"db_primary_cached_count":0,"db_main_cached_count":0,"db_ci_cached_count":0,"db_main_replica_cached_count":0,"db_ci_replica_cached_count":0,"db_replica_wal_count":0,"db_primary_wal_count":0,"db_main_wal_count":0,"db_ci_wal_count":0,"db_main_replica_wal_count":0,"db_ci_replica_wal_count":0,"db_replica_wal_cached_count":0,"db_primary_wal_cached_count":0,"db_main_wal_cached_count":0,"db_ci_wal_cached_count":0,"db_main_replica_wal_cached_count":0,"db_ci_replica_wal_cached_count":0,"db_replica_duration_s":0.0,"db_primary_duration_s":0.007,"db_main_duration_s":0.007,"db_ci_duration_s":0.0,"db_main_replica_duration_s":0.0,"db_ci_replica_duration_s":0.0,"cpu_s":0.016403,"worker_id":"sidekiq_0","rate_limiting_gates":[],"duration_s":0.041452,"completed_at":"2023-02-24T16:50:11.906Z","load_balancing_strategy":"primary_no_wal","db_duration_s":0.00312}
  12. Verify the index integrity worker runs for the namespace: gdk tail rails-background-jobs
{"severity":"INFO","time":"2023-02-24T16:50:11.845Z","correlation_id":"01GT25A2ZFSNBBD0ZQPCP1X7BJ","class":"Search::IndexIntegrityWorker","message":"enqueueing all projects for namespace","namespace_id":33}
  13. Verify the index repair service adds a log entry to the <gdk_dir>/gitlab/log/elasticsearch.log file
{"severity":"WARN","time":"2023-02-24T16:50:11.905Z","correlation_id":"01GT25A2ZFSNBBD0ZQPCP1X7BJ","class":"Search::IndexRepairService","message":"blob documents missing from index for project","project_id":7,"project_commit":{"id":"f15b32277d2c55c6c595845a87109b09c913c556","message":"v1.5.2\n","parent_ids":["8749d49930866a4871fa086adbd7d2057fcc3ebb"],"authored_date":"2017-06-19T14:39:45.000-07:00","author_name":"Andrew Lunny","author_email":"alunny@twitter.com","committed_date":"2017-06-19T14:39:53.000-07:00","committer_name":"Andrew Lunny","committer_email":"alunny@twitter.com","trailers":{}},"project_last_repository_updated_at":"2023-02-17T20:09:02.537Z","index_status_last_commit":"f15b32277d2c55c6c595845a87109b09c913c556","index_status_indexed_at":"2023-02-22T19:10:45.757Z","repository_size":765460}
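
Scanning these logs by eye is tedious because each entry is a long JSON line. As a convenience, the check can be scripted. The sketch below collects the project IDs flagged by `Search::IndexRepairService`, assuming each log line is a JSON object optionally preceded by a prefix (as in the `gdk tail` output above). `parse_entry` and `flagged_project_ids` are hypothetical helper names, not part of this MR.

```ruby
require 'json'

REPAIR_CLASS = 'Search::IndexRepairService'

# Parses the JSON payload of a log line, skipping any prefix before the first '{'.
# Returns nil for lines that carry no parsable JSON.
def parse_entry(line)
  start = line.index('{')
  return nil unless start

  JSON.parse(line[start..-1])
rescue JSON::ParserError
  nil
end

# Returns the unique project IDs from WARN entries logged by the repair service.
def flagged_project_ids(lines)
  ids = lines.map do |line|
    entry = parse_entry(line)
    next unless entry.is_a?(Hash)
    next unless entry['class'] == REPAIR_CLASS && entry['severity'] == 'WARN'

    entry['project_id']
  end
  ids.compact.uniq
end
```

For example, `flagged_project_ids(File.foreach(path))`, with `path` pointing at the elasticsearch.log file under the GDK directory, would return `[7]` for the entries shown above.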

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Terri Chu
