Pre-generate package dependencies
What does this MR do and why?
The aim of this MR is to get rid of the loop over Packages::DependencyLink when generating a package's metadata.
Why? Because when a package has 1k versions and each version has about 100 dependencies, we have to loop over 100k dependency links. If we can avoid loading all dependency links for every package and looping through them, and instead load aggregated data (dependency ids grouped by dependency type), it should significantly speed up metadata generation and the metadata endpoint in general.
So this MR introduces two hashes: dependencies and dependency_ids.
The first, dependencies, holds dependencies with the required attributes and acts as a cache between batches of packages.
It looks like <dependency id> : { <dependency name> : <dependency version_pattern> }
The second, dependency_ids, keeps the relation between a package and its dependencies and looks like:
<package id> : { <dependency type> => [<dependency 1 id>, <dependency 2 id>, ...], ... }
Then, when generating a package's metadata, we can use those two hashes to build up the package's dependencies and avoid looping through the package's dependency links.
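To make the two hash shapes concrete, here is a minimal plain-Ruby sketch of how they could be combined. It is not the code from this MR: the sample rows, ids, package names, and the helper names (build_dependency_ids, missing_dependency_ids, build_package_dependencies) are all hypothetical, and the rows stand in for whatever the aggregated query actually returns.

```ruby
# Hypothetical pre-aggregated rows: [package_id, dependency_type, dependency_ids].
# In the real service these would come from the database; here they are made up.
rows = [
  [10, 'dependencies', [1, 2]],
  [10, 'devDependencies', [3]],
  [11, 'dependencies', [1]]
]

# Builds <package id> => { <dependency type> => [<dependency id>, ...] }.
def build_dependency_ids(rows)
  rows.each_with_object({}) do |(package_id, type, ids), acc|
    (acc[package_id] ||= {})[type] = ids
  end
end

# Returns the dependency ids that are not cached yet, so that only those
# need to be loaded for the current batch of packages.
def missing_dependency_ids(dependency_ids, dependencies)
  dependency_ids.values.flat_map(&:values).flatten.uniq - dependencies.keys
end

# Builds { <dependency type> => { <name> => <version_pattern> } } for one
# package using only hash lookups, with no loop over dependency links.
def build_package_dependencies(package_id, dependencies, dependency_ids)
  dependency_ids.fetch(package_id, {}).transform_values do |ids|
    ids.each_with_object({}) do |id, merged|
      merged.merge!(dependencies.fetch(id, {}))
    end
  end
end

dependency_ids = build_dependency_ids(rows)

# The cache: <dependency id> => { <dependency name> => <version_pattern> }.
# Pretend dependency 1 was already loaded by a previous batch.
dependencies = { 1 => { 'lodash' => '^4.17.0' } }
to_load = missing_dependency_ids(dependency_ids, dependencies)

# Hypothetical attributes for the ids that were missing from the cache.
dependencies[2] = { 'express' => '~4.18.0' }
dependencies[3] = { 'jest' => '^29.0.0' }

package_dependencies = build_package_dependencies(10, dependencies, dependency_ids)
```

The point of the sketch is that the per-package work becomes a handful of hash lookups keyed by dependency id, while the dependencies hash only grows by the ids a batch has not seen before.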
Screenshots or screen recordings
Benchmarks
To benchmark the service I prepared the following data locally:
# 2723 package versions. Yes, this is a real case.
# 348545 package dependency links.
# 129 package dependencies for every package version.
# generate_metadata_ips.rb
require 'benchmark/ips'
require_relative 'config/environment'
Benchmark.ips do |x|
  x.report('Packages::Npm::GenerateMetadataService#execute') do
    name = 'XXX'
    packages = Packages::Package.where(name: name)
    Packages::Npm::GenerateMetadataService.new(name, packages).execute
  end
end
Before
➜ gitlab git:(392448-generate-p...) ✗ ruby generate_metadata_ips.rb
Warming up --------------------------------------
Packages::Npm::GenerateMetadataService#execute
1.000 i/100ms
Calculating -------------------------------------
Packages::Npm::GenerateMetadataService#execute
0.156 (± 0.0%) i/s - 1.000 in 6.425440s
After
➜ gitlab git:(392448-generate-p...) ✗ ruby generate_metadata_ips.rb
Warming up --------------------------------------
Packages::Npm::GenerateMetadataService#execute
1.000 i/100ms
Calculating -------------------------------------
Packages::Npm::GenerateMetadataService#execute
1.284 (± 0.0%) i/s - 7.000 in 5.527915s
I wasn't quite sure about the benchmarks, so I also created screen recordings:
Before: it ends up with a Timeout error after 60s.
After: quite fast.
How to set up and validate locally
- The feature is behind a feature flag, so the first step is to enable it:
  Feature.enable(:npm_optimize_metadata_generation)
- Create a package with dependencies:
  def fixture_file_upload(*args, **kwargs)
    Rack::Test::UploadedFile.new(*args, **kwargs)
  end
  p = FactoryBot.create(:npm_package, project: Project.first, name: 'test')
  FactoryBot.create(:packages_dependency) do |d|
    FactoryBot.create(:packages_dependency_link, package: p, dependency: d)
  end
- Query the package's metadata:
  $ curl --header "PRIVATE-TOKEN: <PAT>" "http://gdk.test:3000/api/v4/projects/<project_id>/packages/npm/test"
  The server should return the generated package metadata.
Database analysis
For all database query analysis I used an existing package that has 2742 versions with 326399 dependency links and 129 dependencies.
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.
Related to #392448 (closed)