Resolving cross joins on GroupUsersFinder
What does this MR do and why?
Resolving cross joins when Autocomplete::UsersFinder
calls Autocomplete::GroupUsersFinder
.
- This MR only addresses 1 of the 2 scenarios in the issue. There is still another scenario where
Autocomplete::GroupUsersFinder
is called, that will be addressed in a separate MR. We might create a separate issue for that.
Related issue: #420387 (closed)
IMPORTANT NOTE: This MR only addresses the Autocomplete in the Search bar on the top. For example, on the Group Merge Requests page. For example: https://gitlab.com/groups/gitlab-org/-/merge_requests. It doesn't affect the autocomplete while writing an issue description for example
What's going on in this MR?
As part of the Cells project, we are removing database cross joins between different database tables that should belong to different databases in the future. For this issue, we are removing a relatively complex autocomplete related query that involves both tables users
and members
. Both of these tables belong to different database schemas, which means they will end up on two different database tables in the future.
The old query used to issue a query that searches for users based on some query, that matches users emails
, username
or name
, and also belong to some group directly, or indirectly. By looking up the relations between users and groups via the members
table.
We have looked into different ways into resolving this, but we decided to go for the 4th suggested solution (see the thread), and releasing behind a new feature flag, so that we can measure the impact of the new change.
Answering some anticipated questions from the reviews:
What's the plan to rollout this FF?
- We will start by rolling out to a few developers in the grouptenant scale team so that they can test it for a while, then we release to the rest of the team later if no complaints. But this will be monitored closely via Kibana Logs, to see if the endpoints are responding slow when the feature flag is
true
. This is the rollout issue: #430268 (closed)
What do we proceed if it turns out that the new changes introduce a performance regression?
- We will rollout the MR, and look into other ways to solve this cross joins. Please see this thread
🤞
What do we proceed if it turns out that the new changes worked out as expected without performance regression? - We can consider extracting the logic included in the
Autocomplete::UsersFinder
into a new module, so that this idea of batch reduction can be reused on any other finders, to solve the cross join problem. - We will progress into resolving the other cross join in the addressed issue.
- We continue with the rollout of the FF, by removing the old code. Obviously.
How does these new queries solve the problem?
The original idea to solve the problem was to pluck
all the user_id'
from the Member
relation, and inject them into a new query to search for the users on the users
table. But doing so might risk sending a very big query to the database, in case the group has a very large number of users. Using unlimited pluck
is not advisable to solve these cross join problems. See here https://docs.gitlab.com/ee/development/database/multiple_databases.html#limited-pluck-followed-by-a-find
How this MR solves the problem is by splitting the range of users that belong to a group, directly or indirectly into several batches of size 1000
. We read the user_id
s from the members table via each_batch
, then do n queries on the users table with the old query, where n is number of batches. For each query we only return the first 20
users that match the query. The current query only returns the first 20 users, so we have a guarantee that from each batch, the users that don't make it to the top 20, will be excluded for sure.
We gather the top 20 users from each batch, then do 1 final query on them to sort them using the first original query.
New database queries
Notes
- Please don't consider the times of the queries, as much as the plans and the number of Shared Buffers hits/reads instead. As times can be affected by cached pages.
Autocomplete::Usersfinder
, which is currently on master.
For comparison, this is the old query that used to query users in - Existing query on
master
https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/23764/commands/76254
New queries introduced in this new MR.
- Getting user_ids in batches using
pluck_user_ids
https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/23764/commands/76251
- Getting users that match some query, within a lot of user_ids that are plucked from the previous query:
https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/23764/commands/76252
How to set up and validate locally
- Check out this branch
420387-fixing-cross-joins-in-autocomplete-finder
- Make sure to enable the feature flag via Rails Console
Feature.enable(:group_users_autocomplete_using_batch_reduction)
- Create or add a lot of users to a Group. To avoid adding more than 1000 of users, feel free to temporarily reduce the size of newly introduced constant
Autocomplete::UsersFinder::BATCH_SIZE
to a smaller size while testing. - Go to the merge requests page of the Group.
- In the top search bar, filter by the author. Try different users by
email
,username
ornickname
, and see that the autocomplete returns the expected results.
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
-
I have evaluated the MR acceptance checklist for this MR.
Related to #420387 (closed)