Allow Elasticsearch and OpenSearch specific mappings

Madelein van Niekerk requested to merge 454764-opensearch-compatibility into master

Context

Create abstraction layer to support Elasticsear... (#454764 - closed) and Move embeddings from issues index to workitems ... (#476537 - closed) are being done together in this order:

  1. !163009 (merged) 👈 this MR
  2. !163946 (merged)
  3. !164059 (merged)
  4. #479776 (closed)
  5. #479778 (closed)
  6. #479777 (closed)

What does this MR do and why?

This MR adds a mapping for embeddings in the workitems index when the index is created from scratch. For existing indices, the migration will add the mapping.

It also introduces a helper function that checks whether Elasticsearch or OpenSearch is running and whether it meets a given minimum version. This check will be used wherever the code paths for Elasticsearch and OpenSearch diverge.
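
As a rough illustration of the kind of check described above, here is a minimal sketch in plain Ruby. The method name `matching_distribution?`, the shape of the `server_info` hash, and the `min_version:` keyword are all assumptions made for this example; they are not taken from the MR's diff and may differ from the real helper.

```ruby
require "rubygems" # provides Gem::Version for version comparison

# Hypothetical sketch, not the MR's actual helper.
# server_info is assumed to look like:
#   { distribution: :elasticsearch, version: "8.11.0" }
# and to be nil or empty when the search cluster cannot be reached.
def matching_distribution?(server_info, distribution, min_version: nil)
  # Treat an unreachable cluster as "not matching".
  return false if server_info.nil? || server_info.empty?
  return false unless server_info[:distribution] == distribution

  # Without a minimum version, any version of the distribution matches.
  return true if min_version.nil?

  Gem::Version.new(server_info[:version]) >= Gem::Version.new(min_version)
end
```

A caller could then branch on `matching_distribution?(info, :elasticsearch, min_version: "8.0.0")` versus `matching_distribution?(info, :opensearch)` wherever the two backends need different handling.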

Embeddings are supported only on Elasticsearch 8 and later, and on all OpenSearch versions. For OpenSearch, we set the same HNSW settings as for Elasticsearch.
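
The divergence between the two backends could be sketched as follows. The field definitions are copied from the mapping output shown in the validation steps of this description; the method name `embedding_mapping`, its arguments, and the choice to return an empty hash on unsupported versions are assumptions for illustration only, not the code in this MR.

```ruby
# Hypothetical sketch: choose the embedding_0 mapping per distribution.
# distribution is :elasticsearch or :opensearch; major_version is an Integer.
def embedding_mapping(distribution, major_version, dims: 768)
  case distribution
  when :elasticsearch
    # Embeddings are only supported from Elasticsearch 8 onwards.
    return {} if major_version < 8

    {
      "embedding_0" => {
        "type" => "dense_vector",
        "dims" => dims,
        "index" => true,
        "similarity" => "cosine"
      }
    }
  when :opensearch
    # All OpenSearch versions get a knn_vector field with explicit HNSW settings.
    {
      "embedding_0" => {
        "type" => "knn_vector",
        "dimension" => dims,
        "method" => {
          "engine" => "nmslib",
          "space_type" => "cosinesimil",
          "name" => "hnsw",
          "parameters" => { "ef_construction" => 100, "m" => 16 }
        }
      }
    }
  else
    # Unknown backend: add no embedding field.
    {}
  end
end
```

In this sketch, the caller would merge the returned hash into the other workitems index properties, so an unsupported backend simply ends up without the embedding field.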

The embedding field is called embedding_0 in accordance with Support for multiple embedding models (#471983 - closed) which will be implemented later.

MR acceptance checklist

Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

How to set up and validate locally

(Optional) Elasticsearch:

  1. Create a workitem index from scratch: ::Gitlab::Elastic::Helper.default.create_standalone_indices(target_classes: [WorkItem])
  2. Note that the embedding_0 field exists in the workitems index
::Gitlab::Elastic::Helper.default.get_mapping(index_name: "gitlab-development-work_items")
=> {"archived"=>{"type"=>"boolean"},
 "assignee_id"=>{"type"=>"integer"},
 "author_id"=>{"type"=>"integer"},
 "confidential"=>{"type"=>"boolean"},
 "created_at"=>{"type"=>"date"},
 "description"=>{"type"=>"text", "analyzer"=>"code_analyzer"},
 "due_date"=>{"type"=>"date"},
 "embedding_0"=>{"type"=>"dense_vector", "dims"=>768, "index"=>true, "similarity"=>"cosine"},

(Optional) OpenSearch:

  1. Connect to OpenSearch
  2. Create a workitem index from scratch: ::Gitlab::Elastic::Helper.default.create_standalone_indices(target_classes: [WorkItem])
  3. Note that the embedding_0 field exists in the workitems index
::Gitlab::Elastic::Helper.default.get_mapping(index_name: "gitlab-development-work_items")
=> {"archived"=>{"type"=>"boolean"},
 "assignee_id"=>{"type"=>"integer"},
 "author_id"=>{"type"=>"integer"},
 "confidential"=>{"type"=>"boolean"},
 "created_at"=>{"type"=>"date"},
 "description"=>{"type"=>"text", "analyzer"=>"code_analyzer"},
 "due_date"=>{"type"=>"date"},
 "embedding_0"=>{"type"=>"knn_vector", "dimension"=>768, "method"=>{"engine"=>"nmslib", "space_type"=>"cosinesimil", "name"=>"hnsw", "parameters"=>{"ef_construction"=>100, "m"=>16}}},

Related to #454764 (closed)

Edited by Madelein van Niekerk
