Update occurrences rather than appending new ones
What does this MR do and why?
Describe in detail what your merge request does and why.
Related to: #373781 (closed)
The initial implementation of sbom_occurrences
involved tracking all
the dependencies in all of the pipelines. When we enabled the feature on
production, the table grew far too quickly (almost 1 million rows per
hour). In order to allow us to ship an MVC, we're going to take a more
minimal approach. When new SBoM reports are ingested, we'll create new
records, update the existing ones, and delete the no longer present
ones. The sbom_occurrences
table will only hold the current state of
the default branch.
💾
Database Aside from the migrations, there are two new queries introduced in ee/app/services/sbom/ingestion/post_ingestion_service.rb
.
Finding the not_present_occurrences
:
SELECT
"sbom_occurrences".*
FROM
"sbom_occurrences"
WHERE
"sbom_occurrences"."project_id" = 2
AND "sbom_occurrences"."id" IN (
SELECT
"sbom_occurrences"."id"
FROM
"sbom_occurrences"
WHERE
"sbom_occurrences"."project_id" = 2
AND "sbom_occurrences"."id" NOT IN (
...
)
)
Then deleting them in batches.
🔼
Migrating up These migrations delete data. Since the feature is currently behind a feature flag,
it's easier for us to delete all of the data in the sbom_occurrences
table and start anew
than it is to attempt to migrate that data to the new schema. These deletions won't be visible
to users. This will result in ~3 million rows being deleted from sbom_occurrences
in production,
and ~1000 rows in staging.
bin/rails db:migrate:main RAILS_ENV=test
main: == 20221209174132 RemoveSbomOccurrencesUniqueIndex: migrating =================
main: -- transaction_open?()
main: -> 0.0000s
main: -- view_exists?(:postgres_partitions)
main: -> 0.5687s
main: -- indexes(:sbom_occurrences)
main: -> 0.0039s
main: -- execute("SET statement_timeout TO 0")
main: -> 0.0004s
main: -- remove_index(:sbom_occurrences, {:algorithm=>:concurrently, :name=>"index_sbom_occurrences_on_ingestion_attributes"})
main: -> 0.0032s
main: -- execute("RESET statement_timeout")
main: -> 0.0005s
main: == 20221209174132 RemoveSbomOccurrencesUniqueIndex: migrated (0.5882s) ========
main: == 20221209174157 TruncateSbomOccurrences: migrating ==========================
main: -- execute("TRUNCATE sbom_occurrences")
main: -> 0.0040s
main: == 20221209174157 TruncateSbomOccurrences: migrated (0.0043s) =================
main: == 20221212192452 AddUuidColumnToSbomOccurrences: migrating ===================
main: -- add_column(:sbom_occurrences, :uuid, :uuid, {:null=>false})
main: -> 0.0018s
main: == 20221212192452 AddUuidColumnToSbomOccurrences: migrated (0.0021s) ==========
main: == 20221212192527 IndexSbomOccurrencesOnUuid: migrating =======================
main: -- transaction_open?()
main: -> 0.0000s
main: -- view_exists?(:postgres_partitions)
main: -> 0.0009s
main: -- index_exists?(:sbom_occurrences, :uuid, {:unique=>true, :name=>"index_sbom_occurrences_on_uuid", :algorithm=>:concurrently})
main: -> 0.0024s
main: -- add_index(:sbom_occurrences, :uuid, {:unique=>true, :name=>"index_sbom_occurrences_on_uuid", :algorithm=>:concurrently})
main: -> 0.0017s
main: == 20221212192527 IndexSbomOccurrencesOnUuid: migrated (0.0102s) ==============
🔽
Migrating down bin/rails db:migrate:down:main RAILS_ENV=test VERSION=20221212192527
bin/rails db:migrate:down:main RAILS_ENV=test VERSION=20221212192452
bin/rails db:migrate:down:main RAILS_ENV=test VERSION=20221209174157
bin/rails db:migrate:down:main RAILS_ENV=test VERSION=20221209174132
main: == 20221212192527 IndexSbomOccurrencesOnUuid: reverting =======================
main: -- transaction_open?()
main: -> 0.0000s
main: -- view_exists?(:postgres_partitions)
main: -> 0.0009s
main: -- indexes(:sbom_occurrences)
main: -> 0.0043s
main: -- execute("SET statement_timeout TO 0")
main: -> 0.0004s
main: -- remove_index(:sbom_occurrences, {:algorithm=>:concurrently, :name=>"index_sbom_occurrences_on_uuid"})
main: -> 0.0033s
main: -- execute("RESET statement_timeout")
main: -> 0.0005s
main: == 20221212192527 IndexSbomOccurrencesOnUuid: reverted (0.0203s) ==============
main: == 20221212192452 AddUuidColumnToSbomOccurrences: reverting ===================
main: -- remove_column(:sbom_occurrences, :uuid, :uuid, {:null=>false})
main: -> 0.0018s
main: == 20221212192452 AddUuidColumnToSbomOccurrences: reverted (0.0097s) ==========
main: == 20221209174157 TruncateSbomOccurrences: reverting ==========================
main: == 20221209174157 TruncateSbomOccurrences: reverted (0.0051s) =================
main: == 20221209174132 RemoveSbomOccurrencesUniqueIndex: reverting =================
main: -- transaction_open?()
main: -> 0.0000s
main: -- view_exists?(:postgres_partitions)
main: -> 0.5789s
main: -- index_exists?(:sbom_occurrences, [:project_id, :component_id, :component_version_id, :source_id, :commit_sha], {:unique=>true, :name=>"index_sbom_occurrences_on_ingestion_attributes", :algorithm=>:concurrently})
main: -> 0.0048s
main: -- execute("SET statement_timeout TO 0")
main: -> 0.0010s
main: -- add_index(:sbom_occurrences, [:project_id, :component_id, :component_version_id, :source_id, :commit_sha], {:unique=>true, :name=>"index_sbom_occurrences_on_ingestion_attributes", :algorithm=>:concurrently})
main: -> 0.0064s
main: -- execute("RESET statement_timeout")
main: -> 0.0009s
main: == 20221209174132 RemoveSbomOccurrencesUniqueIndex: reverted (0.6062s) ========
How to set up and validate locally
Numbered steps to set up and validate the change are strongly suggested.
-
Enable the
cyclonedx_sbom_ingestion
feature flag using the rails console:Feature.enable(:cyclonedx_sbom_ingestion)
-
Create a new project from a template, use the NodeJS/Express template.
-
Create a
.gitlab-ci.yml
file with this configuration:include: - template: Security/Dependency-Scanning.gitlab-ci.yml
-
Verify that the
gemnasium-dependency_scanning
outputs agl-sbom-npm-npm.cdx.json
artifact -
Connect to the DB with
gdk psql
. Run this query:select name, version, purl_type from sbom_components inner join sbom_component_versions on sbom_components.id = sbom_component_versions.component_id inner join sbom_occurrences on sbom_component_versions.id = sbom_occurrences.component_version_id where pipeline_id = YOUR_PIPELINE_ID;
-
Remove dependencies from package.json / package-lock.json
-
Run the query again; Removed dependencies should be deleted
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
-
I have evaluated the MR acceptance checklist for this MR.