Allow Elasticsearch and OpenSearch specific mappings
Context
Create abstraction layer to support Elasticsear... (#454764 - closed) and Move embeddings from issues index to workitems ... (#476537 - closed) are being done together in this order:
- !163009 (merged)
- 👈 this MR - !163946 (merged)
- !164059 (merged)
- #479776 (closed)
- #479778 (closed)
- #479777 (closed)
What does this MR do and why?
This MR adds a mapping for embeddings in the workitems index when the index is created from scratch. For existing indices, the migration will add the mapping.
It also introduces a helper function that checks whether Elasticsearch or OpenSearch is running and whether it meets a given minimum version. This check will be used wherever the code paths for Elasticsearch and OpenSearch diverge.
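As a rough illustration of what such a check could look like, here is a minimal, self-contained sketch. The `ServerInfo` struct and the `distribution_at_least?` method are hypothetical names made up for this example and are not the helper added in this MR; a real implementation would read the distribution and version from the cluster rather than from a struct.

```ruby
# Hypothetical stand-in for the info a running search server reports.
ServerInfo = Struct.new(:distribution, :version, keyword_init: true)

# Returns true when the server is the given distribution and, if
# min_version is provided, its version is at or above min_version.
def distribution_at_least?(info, distribution, min_version: nil)
  return false unless info.distribution == distribution
  return true if min_version.nil?

  # Gem::Version compares dotted version strings numerically, segment by segment.
  Gem::Version.new(info.version) >= Gem::Version.new(min_version)
end

es7 = ServerInfo.new(distribution: :elasticsearch, version: '7.17.0')
es8 = ServerInfo.new(distribution: :elasticsearch, version: '8.11.4')
os2 = ServerInfo.new(distribution: :opensearch, version: '2.11.0')

distribution_at_least?(es7, :elasticsearch, min_version: '8') # => false
distribution_at_least?(es8, :elasticsearch, min_version: '8') # => true
distribution_at_least?(os2, :opensearch)                      # => true
```

Comparing with `Gem::Version` rather than plain strings avoids lexical pitfalls such as `'10.0' < '8.0'`.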
Embeddings are supported only on Elasticsearch 8+ and on all OpenSearch versions. For OpenSearch, we set the same HNSW settings as for Elasticsearch.
The embedding field is called embedding_0 in accordance with Support for multiple embedding models (#471983 - closed), which will be implemented later.
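The divergence between the two mappings can be sketched as follows. This is only an illustration, not the code in this MR: the `embedding_mapping` method and its arguments are hypothetical, and the field settings are copied from the mapping output shown in the validation steps below.

```ruby
# Hypothetical sketch: choose the embedding_0 mapping by distribution.
# Returns nil when embeddings are not supported (Elasticsearch < 8).
def embedding_mapping(distribution, major_version)
  case distribution
  when :elasticsearch
    # dense_vector embeddings are only used from Elasticsearch 8 onwards.
    return nil if major_version < 8

    { embedding_0: { type: 'dense_vector', dims: 768, index: true, similarity: 'cosine' } }
  when :opensearch
    # All OpenSearch versions are supported; the HNSW parameters are set explicitly.
    {
      embedding_0: {
        type: 'knn_vector',
        dimension: 768,
        method: {
          engine: 'nmslib',
          space_type: 'cosinesimil',
          name: 'hnsw',
          parameters: { ef_construction: 100, m: 16 }
        }
      }
    }
  end
end

embedding_mapping(:elasticsearch, 7) # => nil
embedding_mapping(:elasticsearch, 8)[:embedding_0][:type] # => "dense_vector"
embedding_mapping(:opensearch, 2)[:embedding_0][:type]    # => "knn_vector"
```

Returning `nil` for unsupported versions is an assumption made for the example; the caller would then skip adding the field.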
MR acceptance checklist
Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.
How to set up and validate locally
(Optional) Elasticsearch:
- Create a workitem index from scratch:
::Gitlab::Elastic::Helper.default.create_standalone_indices(target_classes: [WorkItem])
- Note that the embedding_0 field exists in the workitems index:
::Gitlab::Elastic::Helper.default.get_mapping(index_name: "gitlab-development-work_items")
=> {"archived"=>{"type"=>"boolean"},
"assignee_id"=>{"type"=>"integer"},
"author_id"=>{"type"=>"integer"},
"confidential"=>{"type"=>"boolean"},
"created_at"=>{"type"=>"date"},
"description"=>{"type"=>"text", "analyzer"=>"code_analyzer"},
"due_date"=>{"type"=>"date"},
"embedding_0"=>{"type"=>"dense_vector", "dims"=>768, "index"=>true, "similarity"=>"cosine"},
(Optional) OpenSearch:
- Connect to OpenSearch
- Create a workitem index from scratch:
::Gitlab::Elastic::Helper.default.create_standalone_indices(target_classes: [WorkItem])
- Note that the embedding_0 field exists in the workitems index:
::Gitlab::Elastic::Helper.default.get_mapping(index_name: "gitlab-development-work_items")
=> {"archived"=>{"type"=>"boolean"},
"assignee_id"=>{"type"=>"integer"},
"author_id"=>{"type"=>"integer"},
"confidential"=>{"type"=>"boolean"},
"created_at"=>{"type"=>"date"},
"description"=>{"type"=>"text", "analyzer"=>"code_analyzer"},
"due_date"=>{"type"=>"date"},
"embedding_0"=>{"type"=>"knn_vector", "dimension"=>768, "method"=>{"engine"=>"nmslib", "space_type"=>"cosinesimil", "name"=>"hnsw", "parameters"=>{"ef_construction"=>100, "m"=>16}}},
Related to #454764 (closed)