Skip to content

Add background migration to fix packages_sizes in project statistics

David Fernandez requested to merge 378433-fix-incoherent-packages-size into master

🚀 Context

In the project statistics object, we have a metric called packages_size that is the aggregation (sum) of all the package files size.

In #363010 (closed), we discovered that some packages_size seemed to be "out of sync". For example, a package_size of 10MB but the sum of all package files would be 5KB. The presumed root cause of this desync is concurrent updates: the packages_size metric could be updated in a burst. During that burst race conditions could appear and make some updates getting lost.

To solve this, we switched the packages_size metric updates to a "buffered" counter. In very short words, instead of updating the database column directly we use a redis entry where we incrby. Then, once in a while (each 10 minutes max), a job is responsible to move (or "flush") the updates from redis to the database. This way, redis acts as a "protection layer" for the database.

That's all nice but what about the existing out of sync data? Well, we don't have a lot of choices here, we need background migrations to detect incoherent packages_size and fix them. That's Background migration to fix incorrect statistic... (#378433 - closed).

We originally had this plan of having 2 background migrations to loop over different sets of project statistics but the database review spotted a possibility of merging the 2 migrations into a single one. This allows to have a more reliable migration as we have even less chances to miss a project statistic that might be updated between the 2 migrations. This MR is the combination of:

🔬 What does this MR do and why?

  • Introduce a background migration to deal with incoherent packages_size on project statistics.
  • Introduce 1 temporary index to support that migration.
  • Add the related specs.

📺 Screenshots or screen recordings

None

How to set up and validate locally

Let's create some packages for the first project. In a rails console:

def fixture_file_upload(*args, **kwargs)
  Rack::Test::UploadedFile.new(*args, **kwargs)
end

10.times { FactoryBot.create(:npm_package, project: Project.first) }

Wait at least 10 minutes, so that updates to packages_size are flushed.

Now, check the original packages_size and update it to make it incoherent.

Project.first.statistics.packages_size # 4096000 note this.

Project.first.statistics.update!(packages_size: 50)

Let's run the migration:

rails db:migrate

Wait for at least 10 minutes and check the packages_size again. It should be fixed.

Project.first.statistics.packages_size # 4096000 as it was originally

The background migration is working as expected 🎉

🚥 MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

💾 Database review

Migrations up

main: == 20230119123908 AddTemporarySizeIndexToPackageFiles: migrating ==============
main: -- transaction_open?()
main:    -> 0.0000s
main: -- view_exists?(:postgres_partitions)
main:    -> 0.0105s
main: -- index_exists?(:packages_package_files, [:package_id, :size], {:where=>"size IS NOT NULL", :name=>"tmp_idx_package_files_on_non_zero_size", :algorithm=>:concurrently})
main:    -> 0.0152s
main: -- execute("SET statement_timeout TO 0")
main:    -> 0.0012s
main: -- add_index(:packages_package_files, [:package_id, :size], {:where=>"size IS NOT NULL", :name=>"tmp_idx_package_files_on_non_zero_size", :algorithm=>:concurrently})
main:    -> 0.0071s
main: -- execute("RESET statement_timeout")
main:    -> 0.0011s
main: == 20230119123908 AddTemporarySizeIndexToPackageFiles: migrated (0.0712s) =====

main: == 20230119123937 QueueFixIncoherentPackagesSizeOnProjectStatistics: migrating 
main: == 20230119123937 QueueFixIncoherentPackagesSizeOnProjectStatistics: migrated (0.0108s) 

Migrations down

main: == 20230119123937 QueueFixIncoherentPackagesSizeOnProjectStatistics: reverting 
main: == 20230119123937 QueueFixIncoherentPackagesSizeOnProjectStatistics: reverted (0.0674s) 

main: == 20230119123908 AddTemporarySizeIndexToPackageFiles: reverting ==============
main: -- transaction_open?()
main:    -> 0.0000s
main: -- view_exists?(:postgres_partitions)
main:    -> 0.1881s
main: -- indexes(:packages_package_files)
main:    -> 0.0114s
main: -- execute("SET statement_timeout TO 0")
main:    -> 0.0004s
main: -- remove_index(:packages_package_files, {:algorithm=>:concurrently, :name=>"tmp_idx_package_files_on_non_zero_size"})
main:    -> 0.0055s
main: -- execute("RESET statement_timeout")
main:    -> 0.0006s
main: == 20230119123908 AddTemporarySizeIndexToPackageFiles: reverted (0.2232s) =====

Queries

There is no UPDATE query as the update will go through the buffered counter.

🏎 Background migration execution time analysis for gitlab.com

We have 28465981 project statistics.

With batch (job) size of 17 000 rows, we will need 28465981 / 17000 = 1675 jobs (batches).

We're using a delay of 2 minutes, so we will need:

Total time = 1675 jobs * 2 minute delay = 3350min => ~2.33 days

to complete the background migration.

Edited by David Fernandez

Merge request reports

Loading