Create abstraction layer to support Elasticsearch and OpenSearch
Since both OpenSearch and Elasticsearch will be supported for now, we want to create an abstraction layer that selects mappings, search code, etc. based on whether ES or OS is used.
Solution validation
Where do the paths diverge between Elasticsearch and OpenSearch, and between different versions of ES/OS?
- Index creation
  - Specific mappings and/or settings in the `*Config` class or `Types::` class
- Updating index
  - Changing mapping: requires an ES migration, which can be skipped and can have different mappings
- Indexing documents (calling `.track!`)
  - `as_indexed_json` could be different
  - Sometimes `track!` should not be called if the index doesn't support a ref type
- Searching
  - Search query could be different
- Administration
  - Advanced Search admin page has different cluster connection options for OS vs. ES
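To illustrate the `track!` point in the list above, here is a minimal sketch of a guard that drops refs the current index cannot store. `Ref`, `GuardedTracker`, and the list of supported types are all hypothetical names made up for this example; the real bookkeeping code may look quite different.

```ruby
# Hypothetical reference to a document that should be indexed.
Ref = Struct.new(:type, :id)

# Hypothetical wrapper that only queues refs the current index can store.
class GuardedTracker
  attr_reader :queued

  # supported_types: ref types the active platform/version can index.
  def initialize(supported_types)
    @supported_types = supported_types
    @queued = []
  end

  # Queues supported refs and returns the ones that were skipped.
  def track!(*refs)
    supported, skipped = refs.partition { |ref| @supported_types.include?(ref.type) }
    @queued.concat(supported)
    skipped
  end
end

# Example: an index that supports issue refs but not embedding refs.
tracker = GuardedTracker.new(%i[issue])
skipped = tracker.track!(Ref.new(:issue, 1), Ref.new(:embedding, 1))
```

Returning the skipped refs keeps the decision visible to the caller, which could log them or retry after a cluster upgrade.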
So basically we have a few places that are likely to diverge:
- Mappings/settings during index creation
- Mapping updates in migrations
- `as_indexed_json`
- Search queries
And then there might be places in code where we need checks for the platform used.
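As an illustration of how a search query could diverge, the sketch below builds a vector search body for each platform. It assumes the top-level `knn` search option of Elasticsearch 8 and the `knn` query clause of the OpenSearch k-NN plugin. The method name `knn_search_body` and the `num_candidates` heuristic are made up for this example.

```ruby
# Hypothetical helper: returns a k-nearest-neighbour search body for the given platform.
def knn_search_body(platform:, field:, vector:, k: 10)
  case platform
  when :elasticsearch
    # Elasticsearch 8: top-level `knn` section of the search request.
    # num_candidates = k * 10 is an arbitrary choice for illustration.
    { knn: { field: field, query_vector: vector, k: k, num_candidates: k * 10 } }
  when :opensearch
    # OpenSearch k-NN plugin: `knn` query keyed by the field name.
    { size: k, query: { knn: { field => { vector: vector, k: k } } } }
  else
    raise ArgumentError, "unsupported platform: #{platform.inspect}"
  end
end
```

A caller would pick the platform once, for example from the helper, and pass the resulting hash to the search client unchanged.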
What else needs to be done in order to upgrade/remove the ES gems?
- TBD
How do we determine which path to serve?
The helper class already has some methods for checking the platform in use. For vectors we use `Gitlab::Elastic::Helper.default.vectors_supported?(:elasticsearch)`, which is `info[:distribution] == 'elasticsearch' && info[:version].to_f >= 8`. Or we could use CurrentSettings.
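A minimal sketch of such a platform check, assuming the `info` hash carries `:distribution` and `:version` as in the expression above. `PlatformInfo` is a hypothetical stand-in for the real helper, and the OpenSearch version threshold is an assumption made only for illustration.

```ruby
# Hypothetical stand-in for the real helper; `info` mimics the hash
# with :distribution and :version keys described above.
class PlatformInfo
  def initialize(info)
    @info = info
  end

  def distribution
    @info[:distribution].to_s
  end

  # Major version as an integer, e.g. "8.11.3" -> 8.
  def major_version
    @info[:version].to_s.split('.').first.to_i
  end

  def elasticsearch?
    distribution == 'elasticsearch'
  end

  def opensearch?
    distribution == 'opensearch'
  end

  # The Elasticsearch branch mirrors the check quoted above.
  # The OpenSearch threshold (>= 1) is an assumption.
  def vectors_supported?(platform)
    case platform
    when :elasticsearch then elasticsearch? && major_version >= 8
    when :opensearch then opensearch? && major_version >= 1
    else false
    end
  end
end
```

Passing the platform as an argument, rather than asking "are vectors supported at all", lets each call site state which mapping or query shape it is about to use.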
How do we test on different versions and platforms?
QA tests. We run QA tests on different versions of OS and ES.
We also need to think about blobs/wikis. Their JSON data is determined by the indexer, so the indexer would also have diverging paths. We can pass extra options to the run command.
Implementation: inline if-else
Easiest would be to have a few methods in the ES helper similar to `vectors_supported?` (which should be cached for performance), and we call these methods whenever there is a divergence.
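One way the caching could look is sketched below. `CachedCapabilities` and its `fetch_info` callable are hypothetical and not the actual `Gitlab::Elastic::Helper`. The Elasticsearch threshold comes from the check quoted earlier; the OpenSearch one is an assumption.

```ruby
# Hypothetical cached capability checks; not the actual Gitlab::Elastic::Helper.
class CachedCapabilities
  # Minimum version with vector support. The OpenSearch value is an assumption.
  MIN_VECTOR_VERSION = { elasticsearch: 8, opensearch: 1 }.freeze

  # fetch_info: callable returning e.g.
  # { distribution: 'elasticsearch', version: '8.11.3' }
  def initialize(fetch_info)
    @fetch_info = fetch_info
    @cache = {}
  end

  def vectors_supported?(platform)
    cached([:vectors, platform]) do
      info = @fetch_info.call
      minimum = MIN_VECTOR_VERSION.fetch(platform, Float::INFINITY)
      info[:distribution] == platform.to_s && info[:version].to_f >= minimum
    end
  end

  private

  # Uses key? instead of ||= so that a false result is cached as well.
  def cached(key)
    return @cache[key] if @cache.key?(key)

    @cache[key] = yield
  end
end
```

A cache like this would have to be expired when the cluster is upgraded or the connection settings change, so a real implementation would probably want a TTL or an explicit reset.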
Example index mapping:
def self.mappings
  # Properties shared by every platform and version.
  properties = {
    type: { type: 'keyword' },
    id: { type: 'integer' },
    # ...
  }

  # Pick the vector field definition supported by the current cluster.
  if helper.quantized_vectors_supported?(:elasticsearch)
    properties[:embedding] = {
      type: 'dense_vector',
      dims: 768,
      similarity: 'cosine',
      index: true,
      index_options: {
        type: 'int8_hnsw'
      }
    }
  elsif helper.vectors_supported?(:elasticsearch)
    properties[:embedding] = {
      type: 'dense_vector',
      dims: 768,
      similarity: 'cosine',
      index: true
    }
  elsif helper.vectors_supported?(:opensearch)
    properties[:embedding] = {
      type: 'knn_vector',
      dimension: 768,
      method: {
        name: 'hnsw'
      }
    }
  end

  {
    dynamic: 'strict',
    properties: properties
  }
end
Example mapping migration:
class AddEmbeddingToIssues < Elastic::Migration
  include Elastic::MigrationUpdateMappingsHelper

  # Skip the migration entirely when the cluster does not support vectors.
  skip_if -> { !Gitlab::Elastic::Helper.default.vectors_supported? }

  DOCUMENT_TYPE = Issue

  private

  # Each vector type gets its own field name.
  def new_mappings
    if helper.quantized_vectors_supported?(:elasticsearch)
      {
        embedding_2: {
          type: 'dense_vector',
          dims: 768,
          similarity: 'cosine',
          index: true,
          index_options: {
            type: 'int8_hnsw'
          }
        }
      }
    elsif helper.vectors_supported?(:elasticsearch)
      {
        embedding_0: {
          type: 'dense_vector',
          dims: 768,
          similarity: 'cosine',
          index: true
        }
      }
    else
      {
        embedding_1: {
          type: 'knn_vector',
          dimension: 768,
          method: {
            name: 'hnsw'
          }
        }
      }
    end
  end
end
Note that every different model/dimension/vector type has a different field name. This is in accordance with #471983 (closed).
Example `as_indexed_json`:
def as_indexed_json
  data = {
    routing: routing
  }

  # Write the embedding to the field that matches the active embedding version.
  if helper.quantized_vectors_supported?(:elasticsearch)
    data["embedding_#{EmbeddingVersion.active.for_type(:elasticsearch, :quantized).id}"] = embedding
  elsif helper.vectors_supported?(:elasticsearch)
    data["embedding_#{EmbeddingVersion.active.for_type(:elasticsearch).id}"] = embedding
  elsif helper.vectors_supported?(:opensearch)
    data["embedding_#{EmbeddingVersion.active.for_type(:opensearch).id}"] = embedding
  end

  data
end
Con: we need to continue supporting older versions, so the if statement will keep growing until we decide to remove support for a version.
Also create an Architecture Design Document.