Introduce batch updates/inserts when renaming name and path
What does this MR do and why?
This MR aims to improve the performance of renaming a group's path and/or name.
This is what happens when a group's path and/or name is renamed.
Explanation
Every group and project has an associated `Route` record. This route record stores the full path and the name of the group/project.
Whenever a group's name or path is updated, the full path and name of all its descendants (that is, subgroups across all depths and projects within these subgroups) also need to be updated. For this, the `Route` record of each of its descendants needs to be updated.
As of today, this is done on an individual basis using the following code:
The code is pretty self-explanatory:
- For each descendant route record, we:
  - update `route.path` if the path has changed.
  - update `route.name` if the name has changed.
  - call `route.update_columns` with the changed values, including `route.updated_at`.
- If `path` has changed, we also create a `redirect_routes` record for the old path, so that a request to the old full path redirects correctly to the current path. For example, if `https://gitlab.com/gitlab-org/gitlab` is path renamed to `https://gitlab.com/gitlab-org/gitlab-ee`, a `redirect_routes` entry will be created for path `gitlab.com/gitlab-org/gitlab`, so that `https://gitlab.com/gitlab-org/gitlab` is still accessible and redirects correctly with a flash message.
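The per-record steps listed above can be sketched in plain Ruby as follows. This is only an illustration: `Route` and `RedirectRoute` here are simplified, hypothetical stand-ins (plain structs with no database behind them), and the method name and its parameters are assumptions, not code taken from GitLab.

```ruby
# Hypothetical stand-ins for the real models; plain structs, no database.
Route = Struct.new(:path, :name, :updated_at)
RedirectRoute = Struct.new(:path)

# Renames descendants one record at a time and returns the redirect
# records that would be created for the old paths.
def rename_descendants_one_by_one(routes, old_path:, new_path:, old_name:, new_name:, now: Time.now)
  redirects = []

  routes.each do |route|
    previous_path = route.path

    # Replace the leading part of the full path only if the path changed.
    if old_path != new_path
      route.path = route.path.sub(/\A#{Regexp.escape(old_path)}/) { new_path }
    end

    # Replace the leading part of the full name only if the name changed.
    if old_name != new_name
      route.name = route.name.sub(/\A#{Regexp.escape(old_name)}/) { new_name }
    end

    # Stands in for one UPDATE per record (update_columns in the description).
    route.updated_at = now

    # Stands in for one INSERT per record when the path changed.
    redirects << RedirectRoute.new(previous_path) if previous_path != route.path
  end

  redirects
end
```

In this sketch every descendant costs one simulated UPDATE and, when the path changes, one simulated INSERT, which is the per-record cost the next section discusses.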
Performance problem
While the above approach works well for renaming descendant paths and names, a problem arises when a group has a very large number of descendants.
For example, if a group has 50k descendant subgroups and projects, the above loop has to run 50k times, and each iteration executes an SQL UPDATE (to update the `routes` record) and an SQL INSERT (to insert the `redirect_routes` record).
On such groups, this can lead to a `500` error: executing these queries for 50k descendants takes more than 60 seconds, and by that time the web server has timed out.
Solution
The solution in this MR aims to prevent such timeouts by batching these INSERTs and UPDATEs.
This MR processes descendants in batches of 100, executing a single UPDATE query and a single INSERT query per batch of 100 descendants, which considerably improves performance.
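To make the effect of batching concrete, here is a rough sketch that only counts statements. `FakeDatabase` and both method names are hypothetical and exist purely for this illustration. The sketch assumes the path changed, so every descendant needs both a `routes` update and a `redirect_routes` insert.

```ruby
BATCH_SIZE = 100 # batch size stated in this MR

# Hypothetical stand-in for a database connection; it only counts statements.
class FakeDatabase
  attr_reader :statements

  def initialize
    @statements = 0
  end

  # Each call represents one SQL statement, regardless of how many rows it covers.
  def execute(_rows)
    @statements += 1
  end
end

# Existing approach: one UPDATE and one INSERT per descendant.
def rename_one_by_one(db, descendant_ids)
  descendant_ids.each do |id|
    db.execute([id]) # UPDATE routes ...
    db.execute([id]) # INSERT INTO redirect_routes ...
  end
end

# Batched approach: one UPDATE and one INSERT per batch of descendants.
def rename_in_batches(db, descendant_ids)
  descendant_ids.each_slice(BATCH_SIZE) do |batch|
    db.execute(batch) # single UPDATE covering the whole batch
    db.execute(batch) # single INSERT covering the whole batch
  end
end

ids = (1..50_000).to_a

one_by_one = FakeDatabase.new
rename_one_by_one(one_by_one, ids)

batched = FakeDatabase.new
rename_in_batches(batched, ids)

puts one_by_one.statements # 100000
puts batched.statements    # 1000
```

The statement count drops by a factor of 100 in this toy model. Wall-clock time does not necessarily scale the same way, since each batched statement does more work.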
!139782 (comment 1698960193) measured this performance improvement on a local GDK for a group with 56k descendants, and the new approach turned out to be roughly 10x faster than the existing approach.
More details have been included in the code comments.
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.
Related to #432065 (closed)