Skip to content

Rake Task to truncate Legacy tables on Main and CI Database

Omar Qunsul requested to merge 366787-rake-task-to-truncate-legacy-tables into master

What does this MR do and why?

This MR introduced Rake task, to truncate the rest of the CI tables on the Main database, and the rest of the Main tables on the CI database.

#366787 (closed)

How do we truncate tables in this rake task?

Let's say that we want to truncate tables (table1, table2, table3, ... table9). We start by truncating the tables [table1, table2, table3], which don't depend on the other tables. Then we truncate the next 3 tables [table4, table5, table6]. But we still have to include the previously truncated 3 tables in the Truncate SQL statement, even if the tables have been truncated already.

We do the truncation in stages, to avoid any IO problems. Because the tables are huge. It might take 1 - 2 minutes to truncate some tables on Production. See here: gitlab-com/gl-infra/production#7511 (comment 1054799933)

How to set up and validate locally

  1. Make sure you are running in Decomposed Database Setup, which is the default option in GDK now. #356353 (closed)

  2. Lock the CI tables on the Main database, and the Main tables on the CI Database, using the rake task rake gitlab:db:lock_writes

Make sure that you cannot truncate a locked table using PSQL

gdk psql -c "truncate ci_pipelines"
ERROR:  cannot truncate a table referenced in a foreign key constraint
DETAIL:  Table "ci_builds" references "ci_pipelines".
HINT:  Truncate table "ci_builds" at the same time, or use TRUNCATE ... CASCADE.
  1. Now run the rake tasks to truncate the tables in DRY_RUN mode, see the commands that are going to run on both databases to truncate the tables.
DRY_RUN=true rake gitlab:db:truncate_legacy_tables:main

This will show truncation stages. In each stage it will truncate a new set of tables + all the tables that have been truncated in the previous stages. You can see the truncated tables in each stage with the message New tables to truncate:

Let's say that we want to truncate only the first 3 stages (around 15 tables). We can run the command

DRY_RUN=true UNTIL_TABLE=ci_unit_test_failures rake gitlab:db:truncate_legacy_tables:main

Remove the DRY_RUN=true to do the actual truncation.

If you don't specify UNTIL_TABLE, it will truncate all the tables. But 5 tables in each stage. To avoid truncating all the tables within 1 transaction.

If you wnat to change the number of the tables in each stage, you can pass the parameter to the rake task, for example:

DRY_RUN=true rake gitlab:db:truncate_legacy_tables:main[\30\]

This will truncate 30 new tables at least in each stage.

  1. The same for the Main Tables on the CI database
DRY_RUN=true rake gitlab:db:truncate_legacy_tables:ci

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #366787 (closed)

Edited by Omar Qunsul

Merge request reports

Loading