Update old temporary NuGet packages to processing
🌳 Context
We recently introduced a status
column to the Packages::Package
model and table to identify what state a package is in (default, processing, error).
One package type, NuGet, was previously depending on logic around the package name to determine if it was in the processing
state. When a user uploads a NuGet package, a package record gets created with the name 'NuGet.Temporary.Package'
. Then it eventually is updated to the actual package name after the internal package metadata has been processed. With the new column, when we create the package, we also set the status to processing
, and then update it to default
after the package has been fully processed.
When filtering packages, we previously would filter out packages where(name: 'NuGet.Temporary.Package')
, but now, we can simply filter where(status: :processing)
.
When the status
column was added, it was given a default value of default
, which is used for valid, usable packages. This means that NuGet 'NuGet.Temporary.Package'
packages that existed prior to that addition would have a default
status rather than processing
🔍 What does this MR do?
This MR updates NuGet packages with the temporary name that currently have the default
status to use the processing
status. This will allow us to follow up and remove the scopes and filters that were very specific to NuGet and just use the status filter instead.
A small side note, you may be wondering why we have so many older packages that are going to be in the
processing
state. It is likely that these packages are "stuck" in processing because they have failed due to errors. At a future time, we will be looking at if and how we can remove some of these stale packages, but for now, we just want to be sure they have the correct status so they are not accidentally returned when users are trying to search or install packages.
🐘 Database
Up migration output:
== 20210513155447 AddTemporaryPackageIndexForNugetDataMigration: migrating ==== -- transaction_open?() -> 0.0001s -- index_exists?(:packages_packages, :id, {:where=>"package_type = 4 AND name = 'NuGet.Temporary.Package' AND status = 0", :name=>"tmp_index_packages_on_id_where_nuget_default_temp_package", :algorithm=>:concurrently}) -> 0.0352s -- execute("SET statement_timeout TO 0") -> 0.0053s -- add_index(:packages_packages, :id, {:where=>"package_type = 4 AND name = 'NuGet.Temporary.Package' AND status = 0", :name=>"tmp_index_packages_on_id_where_nuget_default_temp_package", :algorithm=>:concurrently}) -> 0.0360s -- execute("RESET ALL") -> 0.0021s == 20210513155447 AddTemporaryPackageIndexForNugetDataMigration: migrated (0.0829s)== 20210513155546 BackfillNugetTemporaryPackagesToProcessingStatus: migrating = == 20210513155546 BackfillNugetTemporaryPackagesToProcessingStatus: migrated (0.0918s)
== 20210513155635 RemoveTemporaryPackageIndexForNugetDataMigration: migrating = -- transaction_open?() -> 0.0000s -- indexes(:packages_packages) -> 0.0234s -- remove_index(:packages_packages, {:algorithm=>:concurrently, :name=>"tmp_index_packages_on_id_where_nuget_default_temp_package"}) -> 0.0117s == 20210513155635 RemoveTemporaryPackageIndexForNugetDataMigration: migrated (0.0404s)
Down migration output:
== 20210513155635 RemoveTemporaryPackageIndexForNugetDataMigration: reverting = -- transaction_open?() -> 0.0000s -- index_exists?(:packages_packages, :id, {:where=>"package_type = 4 AND name = 'NuGet.Temporary.Package' AND status = 0", :name=>"tmp_index_packages_on_id_where_nuget_default_temp_package", :algorithm=>:concurrently}) -> 0.0112s == 20210513155635 RemoveTemporaryPackageIndexForNugetDataMigration: reverted (0.0127s)== 20210513155546 BackfillNugetTemporaryPackagesToProcessingStatus: reverting = == 20210513155546 BackfillNugetTemporaryPackagesToProcessingStatus: reverted (0.0000s)
== 20210513155447 AddTemporaryPackageIndexForNugetDataMigration: reverting ==== -- transaction_open?() -> 0.0000s -- indexes(:packages_packages) -> 0.0102s -- execute("SET statement_timeout TO 0") -> 0.0011s -- remove_index(:packages_packages, {:algorithm=>:concurrently, :name=>"tmp_index_packages_on_id_where_nuget_default_temp_package"}) -> 0.0050s -- execute("RESET ALL") -> 0.0011s == 20210513155447 AddTemporaryPackageIndexForNugetDataMigration: reverted (0.0191s)
Queries
This migration updates all packages_packages
records meeting this condition:
package_type = 4 AND name = 'NuGet.Temporary.Package' AND status = 0
There are 5298 records that meet this condition. This number will not change, since new NuGet packages are assigned the processing
status when they are created. These are all older historic packages.
gitlabhq_production=> select count(*) from packages_packages where package_type = 4 and name = 'NuGet.Temporary.Package' and status = 0;
count
-------
5298
(1 row)
First we create a new index to speed up all queries used in the data migration:
CREATE INDEX CONCURRENTLY index_packages_on_id_where_nuget_default
ON packages_packages USING btree(id)
WHERE package_type = 4
AND status = 0
AND name = 'NuGet.Temporary.Package'
This took 3.808s
on postgres.ai
Now we use EachBatch
to iterate over the relation. With a batch of 100, these are the queries it will use:
-
Find lower bound ID of the batch - Explain Plan -
2.4ms
SELECT id FROM packages_packages WHERE package_type = 4 AND name = 'NuGet.Temporary.Package' AND status = 0 ORDER BY id ASC LIMIT 1;
-
Find upper bound ID of the batch - Explain Plan -
1ms
SELECT "packages_packages"."id" FROM packages_packages WHERE package_type = 4 AND name = 'NuGet.Temporary.Package' AND status = 0 AND id >= 126508 ORDER BY "packages_packages"."id" ASC LIMIT 1 OFFSET 100;
-
Update the batch - Explain Plan -
31.3ms
UPDATE packages_packages SET status = 2 WHERE package_type = 4 AND name = 'NuGet.Temporary.Package' AND status = 0 AND id >= 126508 AND id < 288784;
Estimated total migration time:
5298 records / 100 per batch = 53 batches
53 batches * (2.4ms + 1ms + 31.3ms) = 1874ms => ~2s
☑ Does this MR meet the acceptance criteria?
Conformity
-
I have included a changelog entry, or it's not needed. (Does this MR need a changelog?) -
I have added/updated documentation, or it's not needed. (Is documentation required?) -
I have properly separated EE content from FOSS, or this MR is FOSS only. (Where should EE code go?) -
I have added information for database reviewers in the MR description, or it's not needed. (Does this MR have database related changes?) -
I have self-reviewed this MR per code review guidelines. -
This MR does not harm performance, or I have asked a reviewer to help assess the performance impact. (Merge request performance guidelines) -
I have followed the style guides.
Availability and Testing
-
I have added/updated tests following the Testing Guide, or it's not needed. (Consider all test levels. See the Test Planning Process.) -
I have tested this MR in all supported browsers, or it's not needed. -
I have informed the Infrastructure department of a default or new setting change per definition of done, or it's not needed.
Related to #324206 (closed)