Change the TanukiBot's distance function
What does this MR do and why?
This is the first MR for https://gitlab.com/gitlab-org/gitlab/-/issues/410581.
It changes TanukiBot's distance function from inner_product to cosine, as recommended by the OpenAI docs.
Follow-up MR: Add index to embeddings (!122035 - closed)
Screenshots or screen recordings
After running the following commands, we got these results:
current_user = User.first
client = ::Gitlab::Llm::OpenAi::Client.new(current_user)
question = 'What is Fork?'
embeddings_result = client.embeddings(input: question)
question_embedding = embeddings_result['data'].first['embedding']
Embedding::TanukiBotMvc.neighbor_for(question_embedding, limit: 7).pluck(:id)
| Before | After |
|---|---|
| Using inner_product as distance function | Using cosine as distance function |
| [6909, 12665, 6913, 6910, 7125, 6912, 6825] | [6909, 12665, 6913, 6910, 7125, 6912, 6825] |
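
The identical ID lists are what one would expect if the stored embeddings are unit-length, which OpenAI's documentation states for its embedding models. In that case the inner product of two vectors equals their cosine similarity, so both distance functions rank neighbors the same way. The following is a minimal sketch of that property in plain Ruby. The helper methods and the toy 3-dimensional vectors are hypothetical and are not part of the GitLab codebase or of this MR.

```ruby
# Hypothetical helpers for illustration only; not taken from the GitLab codebase.

# Sum of element-wise products of two equal-length vectors.
def inner_product(a, b)
  a.zip(b).sum { |x, y| x * y }
end

# Euclidean length of a vector.
def norm(v)
  Math.sqrt(inner_product(v, v))
end

# Inner product divided by the product of the two lengths.
def cosine_similarity(a, b)
  inner_product(a, b) / (norm(a) * norm(b))
end

# Scale a vector to length 1.
def normalize(v)
  n = norm(v)
  v.map { |x| x / n }
end

# Toy vectors standing in for real embeddings (assumption: real ones are unit-length).
query = normalize([1.0, 2.0, 3.0])
docs = {
  a: normalize([1.0, 2.0, 2.5]),
  b: normalize([3.0, -1.0, 0.5]),
  c: normalize([0.5, 1.0, 4.0])
}

# Rank documents from most to least similar under each measure.
by_inner_product = docs.keys.sort_by { |k| -inner_product(query, docs[k]) }
by_cosine = docs.keys.sort_by { |k| -cosine_similarity(query, docs[k]) }

puts by_inner_product == by_cosine
```

If the embeddings were not normalized, the two rankings could diverge, because the inner product also rewards vector length while cosine similarity does not.
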
How to set up and validate locally
N/A
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.
Edited by Bojan Marjanovic