Add similarity sort to search projects API (!64342) · GitLab.org / GitLab

Terri Chu requested to merge 332890-projects-api-find-projects-support-similarity-order into master Jun 17, 2021

What does this MR do?

~~Adjust similarity sort for projects to move exact matches on path or name to the top. This should improve the result ordering.~~ (removed because it did not work as I expected)

Add a similarity sort option for order_by to the Projects API. To use the new option, you must also send the search parameter. It only returns projects that the user is authorized to see. For logged-in users, we send the membership parameter set to true. For anonymous users, only public projects are shown.
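A minimal sketch of how a client might build the request described above. The helper name projects_search_url is hypothetical (it is not part of GitLab), and the base URL is the local GDK address used in the examples in this description. The guard is a client-side check only; how the server responds when search is missing is not covered here.

```python
from urllib.parse import urlencode

# Assumption: local GDK address, as used in the curl examples of this MR.
API_BASE = "http://localhost:3000/api/v4"


def projects_search_url(search=None, order_by=None):
    """Build a Projects API URL (hypothetical helper for illustration)."""
    if order_by == "similarity" and not search:
        # Client-side guard: similarity ordering needs a search term.
        raise ValueError("order_by=similarity requires the search parameter")
    params = {}
    if search:
        params["search"] = search
    if order_by:
        params["order_by"] = order_by
    query = urlencode(params)
    return f"{API_BASE}/projects?{query}" if query else f"{API_BASE}/projects"


print(projects_search_url(search="gitlab", order_by="similarity"))
```

Authentication (for example the Private-Token header) is left out of the sketch on purpose; it only shows how the two query parameters go together.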

Update documentation and add new specs.

This will be used in a future MR by the Search project dropdown when searching for projects while All Groups is selected in the group dropdown.

Database

As a non-admin

Note: tested with my user_id from GitLab.com

SQL

SELECT
    "projects".*,
    ROUND(CAST((( /* gitlab/database/similarity_score */ SIMILARITY (COALESCE("projects"."path", ''), 'gitlab') * CAST('1' AS numeric)) + ( /* gitlab/database/similarity_score */ SIMILARITY (COALESCE("projects"."name", ''), 'gitlab') * CAST('0.7' AS numeric)) + ( /* gitlab/database/similarity_score */ SIMILARITY (COALESCE("projects"."description", ''), 'gitlab') * CAST('0.2' AS numeric))) AS numeric), 2) AS similarity
FROM
    "projects"
    INNER JOIN "project_authorizations" ON "projects"."id" = "project_authorizations"."project_id"
WHERE
    "project_authorizations"."user_id" = 5708766
    AND (("projects"."path" ILIKE '%gitlab%'
            OR "projects"."name" ILIKE '%gitlab%')
        OR "projects"."description" ILIKE '%gitlab%')
    AND "projects"."pending_delete" = FALSE
ORDER BY
    ROUND(CAST((( /* gitlab/database/similarity_score */ SIMILARITY (COALESCE("projects"."path", ''), 'gitlab') * CAST('1' AS numeric)) + ( /* gitlab/database/similarity_score */ SIMILARITY (COALESCE("projects"."name", ''), 'gitlab') * CAST('0.7' AS numeric)) + ( /* gitlab/database/similarity_score */ SIMILARITY (COALESCE("projects"."description", ''), 'gitlab') * CAST('0.2' AS numeric))) AS numeric), 2) DESC,
    "projects"."id" DESC
LIMIT 20 OFFSET 0

Plan: https://explain.depesz.com/s/eazu

Cold cache:

Time: 1.637 s
  - planning: 13.891 ms
  - execution: 1.624 s
    - I/O read: 4.525 s
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 10624 (~83.00 MiB) from the buffer pool
  - reads: 11628 (~90.80 MiB) from the OS file cache, including disk I/O
  - dirtied: 402 (~3.10 MiB)
  - writes: 0

Warm cache:

Time: 70.494 ms
  - planning: 7.821 ms
  - execution: 62.673 ms
    - I/O read: 0.000 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 22174 (~173.20 MiB) from the buffer pool
  - reads: 0 from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0
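The similarity expression in the query above sums three weighted SIMILARITY() values (path × 1, name × 0.7, description × 0.2) and rounds the total to two decimals. The sketch below reproduces that arithmetic in Python to show how the ordering comes about. The trigram function is only an approximation of PostgreSQL's pg_trgm behavior, and the sample paths and descriptions are invented for illustration; they are not taken from the database used in this MR.

```python
import re

# Weights taken from the CAST('1'), CAST('0.7'), CAST('0.2') terms in the query.
WEIGHTS = {"path": 1.0, "name": 0.7, "description": 0.2}


def trigrams(text):
    """Approximate pg_trgm: lowercase, split into alphanumeric words,
    pad each word with two leading spaces and one trailing space."""
    grams = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams of both strings."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def similarity_score(project, query):
    """Weighted sum over path, name, description; None is treated as ''
    (mirroring COALESCE), and the result is rounded to 2 decimals."""
    total = sum(w * similarity(project.get(f) or "", query) for f, w in WEIGHTS.items())
    return round(total, 2)


# Invented sample rows (assumption, for illustration only).
projects = [
    {"id": 1, "path": "gitlab-development-kit", "name": "GitLab Development Kit", "description": None},
    {"id": 2, "path": "gitlab", "name": "GitLab", "description": None},
    {"id": 3, "path": "monitoring", "name": "Monitoring", "description": "GitLab monitoring"},
]

# ORDER BY similarity DESC, id DESC
ranked = sorted(projects, key=lambda p: (similarity_score(p, "gitlab"), p["id"]), reverse=True)
print([p["name"] for p in ranked])
```

A project that matches only in its description can still appear in the results, but the 0.2 weight keeps it below projects whose path or name match the search term.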

Screenshots (strongly suggested)

Using an API call against my local GDK, where I have data set up for this type of test (I have two GitLab projects, one under the GitLab.org group and one under the Administrator group).

Before (using created_at desc sort)

❯ curl --request GET \
  --url 'http://localhost:3000/api/v4/projects?search=gitlab' \
  --header 'Private-Token: TOKEN' \
  --cookie 'perf_bar_enabled=true; experimentation_subject_id=IjQwMjUxOWZlLWIwYWItNDZlNi1hY2VkLTRjMWE0NzZkMjAyNCI%253D--dc985bd87edc1f47a1018fbc26fdc35dbeab34ba; BetterErrors-2.9.1-CSRF-Token=67dca20f-92f6-4685-8085-56fa84085f14' | jq '.[] .name'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 56408    0 56408    0     0  26925      0 --:--:--  0:00:02 --:--:-- 26925
"GitLab Development Kit"
"GitLab CI"
"gitlab core team"
"gitlab fox and hound"
"gitlab-experiment"
"gitlab-omnibus"
"gitlab-omnibus"
"GitLab Pry Byebye"
"GitLab"
"GitLab"
"GitLab Pry Byebug"
"GitLab fork and spoon"
"developer.gitlab.com"
"Monitoring"
"Gitlab Shell"
"Gitlab Test"

After (using similarity sort)

➜ curl --request GET \
  --url 'http://localhost:3000/api/v4/projects?search=gitlab&order_by=similarity' \
  --header 'Private-Token: TOKEN' \
  --cookie 'perf_bar_enabled=true; experimentation_subject_id=IjQwMjUxOWZlLWIwYWItNDZlNi1hY2VkLTRjMWE0NzZkMjAyNCI%253D--dc985bd87edc1f47a1018fbc26fdc35dbeab34ba; BetterErrors-2.9.1-CSRF-Token=67dca20f-92f6-4685-8085-56fa84085f14' | jq '.[] .name'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 56408    0 56408    0     0  26030      0 --:--:--  0:00:02 --:--:-- 26030
"GitLab"
"GitLab"
"GitLab CI"
"Gitlab Test"
"Gitlab Shell"
"gitlab-omnibus"
"gitlab-omnibus"
"gitlab core team"
"GitLab Pry Byebye"
"gitlab-experiment"
"GitLab Pry Byebug"
"gitlab fox and hound"
"developer.gitlab.com"
"GitLab fork and spoon"
"GitLab Development Kit"
"Monitoring"

Does this MR meet the acceptance criteria?

Conformity

I have included changelog trailers, or none are needed. (Does this MR need a changelog?)
I have added/updated documentation, or it's not needed. (Is documentation required?)
I have properly separated EE content from FOSS, or this MR is FOSS only. (Where should EE code go?)
I have added information for database reviewers in the MR description, or it's not needed. (Does this MR have database related changes?)
I have self-reviewed this MR per code review guidelines.
This MR does not harm performance, or I have asked a reviewer to help assess the performance impact. (Merge request performance guidelines)
I have followed the style guides.
This change is backwards compatible across updates, or this does not apply.

Availability and Testing

I have added/updated tests following the Testing Guide, or it's not needed. (Consider all test levels. See the Test Planning Process.)
I have tested this MR in all supported browsers, or it's not needed.
I have informed the Infrastructure department of a default or new setting change per definition of done, or it's not needed.

Security

Does this MR contain changes to processing or storing of credentials or tokens, authorization and authentication methods or other items described in the security review guidelines? If not, then delete this Security section.

[-] Label as security and @ mention @gitlab-com/gl-security/appsec
[-] The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
[-] Security reports checked/validated by a reviewer from the AppSec team

Edited Jul 21, 2021 by Mayra Cabrera
