
Introduce batch updates/inserts when renaming name and path

What does this MR do and why?

This MR aims to improve the performance of renaming a group's path and/or name.

This is what happens when a group's path and/or name is renamed.

[Diagram: Untitled_Diagram.drawio__2_]

Explanation

Every group and project has an associated Route record. This route record stores the full path and the name of the group/project.

Whenever a group's name or path is updated, the full path and name of all its descendants (i.e., subgroups at every depth and the projects within them) also need to be updated. This means the Route record of every descendant has to be updated.

As of today, this is done on an individual basis using the following code:

[Screenshot of the code: Screenshot_2023-12-27_at_12.45.35_PM]

The code is pretty self-explanatory:

  • For each descendant route record, we:

      • update route.path if the path has changed.

      • update route.name if the name has changed.

      • call route.update_columns with the changed values, including route.updated_at.

      • if the path has changed, also create a redirect_routes record for the old path, so that the old full path still redirects to the current one. For example, if https://gitlab.com/gitlab-org/gitlab is renamed to https://gitlab.com/gitlab-org/gitlab-ee, a redirect_routes entry is created for the path gitlab.com/gitlab-org/gitlab, so that https://gitlab.com/gitlab-org/gitlab remains accessible and redirects correctly, with a flash message like:

[Screenshot of the redirect flash message: Screenshot_2023-12-27_at_12.49.59_PM]
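
The per-record flow described in the list above can be sketched in plain Ruby. This is only an illustration under stated assumptions, not the code from the screenshot: the `Route` and `RedirectRoute` structs, the `rename_descendants_individually` method and the `queries` counter are hypothetical in-memory stand-ins, and the counter simply marks where the real code would issue an UPDATE or an INSERT.

```ruby
# Hypothetical in-memory stand-ins; these are not the actual GitLab models.
Route = Struct.new(:path, :name, :updated_at)
RedirectRoute = Struct.new(:path)

# Renames each descendant route one at a time. In a real application every
# iteration would cost one UPDATE, plus one INSERT when the path has changed.
def rename_descendants_individually(routes, old_path:, new_path:, old_name:, new_name:, now: Time.now)
  path_changed = old_path != new_path
  name_changed = old_name != new_name
  redirects = []
  queries = 0

  routes.each do |route|
    next unless path_changed || name_changed

    old_full_path = route.path

    if path_changed
      # Replace only the leading parent segment(s) of the full path.
      route.path = route.path.sub(%r{\A#{Regexp.escape(old_path)}(?=/|\z)}) { new_path }
    end

    if name_changed
      route.name = route.name.sub(/\A#{Regexp.escape(old_name)}/) { new_name }
    end

    route.updated_at = now
    queries += 1 # stands in for route.update_columns(...)

    if path_changed
      redirects << RedirectRoute.new(old_full_path)
      queries += 1 # stands in for creating the redirect_routes record
    end
  end

  { redirects: redirects, queries: queries }
end

routes = [Route.new("gitlab-org/gitlab/sub", "GitLab Org / GitLab / Sub", nil)]
result = rename_descendants_individually(
  routes,
  old_path: "gitlab-org/gitlab", new_path: "gitlab-org/gitlab-ee",
  old_name: "GitLab Org / GitLab", new_name: "GitLab Org / GitLab"
)
puts routes.first.path # prints "gitlab-org/gitlab-ee/sub"
puts result[:queries]  # prints 2
```

The point of the sketch is the shape of the cost: the number of statements grows linearly with the number of descendants, at up to two statements per descendant.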

Performance problem

While this works well for renaming descendant paths and names, a problem arises when a group has a very large number of descendants.

For example, if a group has 50k descendant subgroups and projects, the above loop runs 50k times, and each iteration executes an SQL UPDATE (to update the routes record) and, when the path has changed, an SQL INSERT (to insert the redirect_routes record).

On such groups, the rename can fail with a 500 error: executing these queries for 50k descendants takes more than 60 seconds, by which time the web server has timed out.

Solution

This MR aims to prevent such timeouts by batching these INSERTs and UPDATEs.

Descendants are processed in batches of 100, so each batch needs only a single UPDATE and a single INSERT instead of one of each per descendant, which greatly improves performance.
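
To make the effect of batching concrete, here is a minimal Ruby sketch that compares statement counts and shows how rows might be grouped per batch. It is an illustration only, not this MR's implementation: `individual_query_count`, `batched_query_count` and `redirect_batches` are hypothetical helpers, and the only value taken from the MR is the batch size of 100.

```ruby
BATCH_SIZE = 100 # batch size stated in this MR

# Statements needed when every descendant gets its own UPDATE and INSERT.
def individual_query_count(descendant_count)
  descendant_count * 2
end

# Statements needed when each batch shares one UPDATE and one INSERT.
def batched_query_count(descendant_count, batch_size: BATCH_SIZE)
  batches = (descendant_count + batch_size - 1) / batch_size # integer ceiling
  batches * 2
end

# Hypothetical helper: groups old full paths into per-batch row payloads,
# one array of row hashes per multi-row INSERT.
def redirect_batches(old_paths, batch_size: BATCH_SIZE)
  old_paths.each_slice(batch_size).map do |slice|
    slice.map { |path| { path: path } }
  end
end

puts individual_query_count(50_000) # prints 100000
puts batched_query_count(50_000)    # prints 1000

old_paths = (1..250).map { |i| "group/project-#{i}" }
puts redirect_batches(old_paths).map(&:size).inspect # prints [100, 100, 50]
```

In a Rails application, an array of row hashes like the ones built here is the kind of input a bulk insert such as `insert_all` accepts; whether this MR uses that call is an assumption, not something stated in the description. Fewer statements does not translate one-to-one into wall-clock time, so the statement counts above should not be read as a predicted speedup.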

!139782 (comment 1698960193) measured this performance improvement on local GDK for a group with 56k descendants, and the new approach turned out to be a ~10x improvement over the existing approach.

More details have been included in the code comments.

Related to #432065 (closed)

Edited by Manoj M J
