Backfill gitlab issue embeddings
What does this MR do and why?
Backfills embeddings for GitLab group issues on GitLab.com, limited to issues updated within the last year in the gitlab-org/gitlab project.
The expected runtime is 5 hours.
The migration runs only on GitLab.com and is skipped on all other instances.
MR acceptance checklist
Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.
Logs
{"severity":"INFO","time":"2024-06-11T11:05:51.509Z","class":"Elastic::MigrationWorker","message":"MigrationWorker: migration[BackfillInitialEmbeddings] executing migrate method","job_status":"running","queue":"default","jid":null}
{"severity":"INFO","time":"2024-06-11T11:05:51.549Z","class":"BackfillInitialEmbeddings","message":"[Elastic::Migration: 20240610133559] Setting migration_state to {\"remaining_count\":24}"}
{"severity":"INFO","time":"2024-06-11T11:05:51.581Z","class":"BackfillInitialEmbeddings","field_names":["embedding","embedding_version"],"remaining_count":24,"message":"[Elastic::Migration: 20240610133559] Checking the number of documents without fields"}
{"severity":"INFO","time":"2024-06-11T11:05:51.584Z","class":"BackfillInitialEmbeddings","field_names":["embedding","embedding_version"],"index_name":"gitlab-development-issues","batch_size":200,"message":"[Elastic::Migration: 20240610133559] Start backfilling fields"}
{"severity":"DEBUG","time":"2024-06-11T11:05:51.833Z","class":"Search::Elastic::ProcessEmbeddingBookkeepingService","message":"track_items","meta.indexing.redis_set":"elastic:embedding:updates:0:zset","meta.indexing.count":10,"meta.indexing.tracked_items_encoded":"[[1,\"Embedding|Issue|547|project_13\"],[2,\"Embedding|Issue|553|project_13\"],[3,\"Embedding|Issue|322|project_13\"],[4,\"Embedding|Issue|546|project_13\"],[5,\"Embedding|Issue|549|project_13\"],[6,\"Embedding|Issue|551|project_13\"],[7,\"Embedding|Issue|554|project_13\"],[8,\"Embedding|Issue|535|project_12\"],[9,\"Embedding|Issue|537|project_12\"],[10,\"Embedding|Issue|294|project_12\"]]"}
{"severity":"DEBUG","time":"2024-06-11T11:05:51.833Z","class":"Search::Elastic::ProcessEmbeddingBookkeepingService","message":"track_items","meta.indexing.redis_set":"elastic:embedding:updates:1:zset","meta.indexing.count":14,"meta.indexing.tracked_items_encoded":"[[1,\"Embedding|Issue|548|project_13\"],[2,\"Embedding|Issue|552|project_13\"],[3,\"Embedding|Issue|314|project_13\"],[4,\"Embedding|Issue|545|project_13\"],[5,\"Embedding|Issue|550|project_13\"],[6,\"Embedding|Issue|295|project_12\"],[7,\"Embedding|Issue|536|project_12\"],[8,\"Embedding|Issue|539|project_12\"],[9,\"Embedding|Issue|541|project_12\"],[10,\"Embedding|Issue|544|project_12\"],[11,\"Embedding|Issue|538|project_12\"],[12,\"Embedding|Issue|540|project_12\"],[13,\"Embedding|Issue|542|project_12\"],[14,\"Embedding|Issue|543|project_12\"]]"}
{"severity":"INFO","time":"2024-06-11T11:05:51.833Z","class":"BackfillInitialEmbeddings","field_names":["embedding","embedding_version"],"index_name":"gitlab-development-issues","documents_count":24,"message":"[Elastic::Migration: 20240610133559] Backfilling batch has been processed"}
{"severity":"INFO","time":"2024-06-11T11:05:51.842Z","class":"BackfillInitialEmbeddings","message":"[Elastic::Migration: 20240610133559] Setting migration_state to {\"remaining_count\":24}"}
{"severity":"INFO","time":"2024-06-11T11:05:51.872Z","class":"BackfillInitialEmbeddings","field_names":["embedding","embedding_version"],"remaining_count":24,"message":"[Elastic::Migration: 20240610133559] Checking the number of documents without fields"}
{"severity":"INFO","time":"2024-06-11T11:05:51.875Z","class":"Elastic::MigrationWorker","message":"MigrationWorker: migration[BackfillInitialEmbeddings] updating with completed: false","job_status":"running","queue":"default","jid":null}
{"severity":"INFO","time":"2024-06-11T11:05:51.922Z","class":"BackfillInitialEmbeddings","message":"[Elastic::Migration: 20240610133559] Setting migration_state to {\"remaining_count\":24}"}
{"severity":"INFO","time":"2024-06-11T11:05:51.950Z","class":"BackfillInitialEmbeddings","field_names":["embedding","embedding_version"],"remaining_count":24,"message":"[Elastic::Migration: 20240610133559] Checking the number of documents without fields"}
{"severity":"INFO","time":"2024-06-11T11:05:51.954Z","class":"Elastic::MigrationWorker","message":"MigrationWorker: migration[BackfillInitialEmbeddings] kicking off next migration batch","job_status":"running","queue":"default","jid":null}
{"severity":"INFO","time":"2024-06-11T11:06:11.752Z","class":"Search::Elastic::ProcessEmbeddingBookkeepingService","message":"bulk_indexing_start","meta.indexing.redis_set":"elastic:embedding:updates:0:zset","meta.indexing.records_count":10,"meta.indexing.first_score":1.0,"meta.indexing.last_score":10.0}
{"severity":"INFO","time":"2024-06-11T11:06:11.752Z","class":"Search::Elastic::ProcessEmbeddingBookkeepingService","message":"bulk_indexing_start","meta.indexing.redis_set":"elastic:embedding:updates:1:zset","meta.indexing.records_count":14,"meta.indexing.first_score":1.0,"meta.indexing.last_score":14.0}
{"severity":"INFO","time":"2024-06-11T11:07:00.400Z","message":"bulk_submitted","meta.indexing.body_size_bytes":393525,"meta.indexing.bulk_count":24,"meta.indexing.errors_count":0}
{"severity":"INFO","time":"2024-06-11T11:07:00.404Z","class":"Search::Elastic::ProcessEmbeddingBookkeepingService","message":"bulk_indexer_flushed","meta.indexing.search_flushing_duration_s":0.04783699999097735,"meta.indexing.search_indexed_bytes_per_second":8089}
{"severity":"INFO","time":"2024-06-11T11:07:00.447Z","class":"Search::Elastic::ProcessEmbeddingBookkeepingService","message":"bulk_indexing_end","meta.indexing.redis_set":"elastic:embedding:updates:0:zset","meta.indexing.records_count":10,"meta.indexing.first_score":1.0,"meta.indexing.last_score":10.0,"meta.indexing.failures_count":0,"meta.indexing.bulk_execution_duration_s":48.695372}
{"severity":"INFO","time":"2024-06-11T11:07:00.448Z","class":"Search::Elastic::ProcessEmbeddingBookkeepingService","message":"bulk_indexing_end","meta.indexing.redis_set":"elastic:embedding:updates:1:zset","meta.indexing.records_count":14,"meta.indexing.first_score":1.0,"meta.indexing.last_score":14.0,"meta.indexing.failures_count":0,"meta.indexing.bulk_execution_duration_s":48.695644}
How to set up and validate locally
- Set the migration's skip condition to false to simulate GitLab.com
- Change the group IDs to groups in your local environment that have public issues
- Make sure you can generate embeddings (reference)
- Execute the migration:
Elastic::MigrationWorker.new.perform
Related to #456918 (closed)
Edited by Madelein van Niekerk