Ingest advisory and affected package data into the database
Why are we doing this work
The work in this issue covers adding data imported from the export bucket (or directory) into the database.
Relevant links
- license package metadata ingestion:
- issue for the sync service from which ingestion will be invoked: Add worker to trigger package metadata advisory... (#370780 - closed)
Non-functional requirements
- Documentation: n/a
- Feature flag: n/a
- Performance: n/a
- Testing: n/a
Proposal
Add a DB ingestion service which can bulk insert imported data in a manner similar to PackageMetadata::Ingestion::CompressedPackage::IngestionService.
The service can be broken up into 2 tasks:
- Task to upsert advisory data.
- Task to upsert affected package data using the id from task 1 as the foreign key.
Ingesting data objects
Data objects are an abstraction shared between database ingestion and JSON data import to structure imported data for use in instantiating models. The ingestion service will be invoked with a list of AdvisoryDataObjects, which will be turned into PackageMetadata::Advisory and PackageMetadata::AffectedPackage model instances.
The structure of the emitted data objects will be:
module PackageMetadata
  class AdvisoryDataObject
    # affected_packages holds the AffectedPackageDataObjects for this advisory.
    attr_accessor :uuid, :source, :published_date, :title, :description, :cvss_v2, :cvss_v3, :urls, :identifiers,
      :affected_packages
  end
end
affected_packages is a list of PackageMetadata::AffectedPackageDataObject with the following structure:
module PackageMetadata
  class AffectedPackageDataObject
    # pm_advisory_id is the foreign key to the stored advisory.
    attr_accessor :purl_type, :package_name, :distro_version, :solution, :affected_range, :fixed_versions,
      :pm_advisory_id
  end
end
Transforming PackageMetadata::AdvisoryDataObject to PackageMetadata::Advisory
These PackageMetadata::AdvisoryDataObject fields have a 1-to-1 mapping to the model:
title
description
cvss_v2
cvss_v3
published_date
urls
identifiers
advisory_xid
source_xid
affected_packages is a list of PackageMetadata::AffectedPackageDataObjects affected by this advisory.
See the exporter's data format description for more info.
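The field mapping above could be sketched as follows. This is a minimal, Rails-free illustration rather than the actual implementation: advisory_attributes and DIRECT_ADVISORY_FIELDS are hypothetical names, and the sketch assumes that uuid feeds advisory_xid and source feeds source_xid, which this issue does not state explicitly.

```ruby
module PackageMetadata
  # Plain-Ruby stand-in for the data object described in this issue.
  class AdvisoryDataObject
    attr_accessor :uuid, :source, :published_date, :title, :description,
                  :cvss_v2, :cvss_v3, :urls, :identifiers, :affected_packages
  end
end

# Fields copied as-is from the data object to the model.
DIRECT_ADVISORY_FIELDS = %i[
  title description cvss_v2 cvss_v3 published_date urls identifiers
].freeze

# Hypothetical helper: builds the attribute hash for PackageMetadata::Advisory.
# ASSUMPTION: uuid maps to advisory_xid and source maps to source_xid.
# affected_packages is deliberately left out; it is handled by a separate task.
def advisory_attributes(data_object)
  attrs = DIRECT_ADVISORY_FIELDS.to_h do |field|
    [field, data_object.public_send(field)]
  end
  attrs.merge(advisory_xid: data_object.uuid, source_xid: data_object.source)
end
```

The resulting hash could then be passed to PackageMetadata::Advisory.new when building the list of instances to upsert.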
Transforming AffectedPackageDataObject to PackageMetadata::AffectedPackage
PackageMetadata::AdvisoryDataObject.affected_packages stores a list of data objects of type PackageMetadata::AffectedPackageDataObject which correspond to the advisory. Note this list will only hold 1 package for advisories with source type glad.
These PackageMetadata::AffectedPackageDataObject fields have a 1-to-1 mapping to the model:
affected_range
solution
fixed_versions
package_name
purl_type
pm_advisory_id is set on each affected package data object after the advisory model has been stored in the database and its id is available. It provides the foreign key.
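As a rough sketch of how the foreign key might be propagated once advisory ids are known: assign_advisory_ids is a hypothetical helper, and ids_by_uuid stands in for whatever the advisory upsert actually returns (assumed here to be a hash keyed by advisory uuid).

```ruby
module PackageMetadata
  # Plain-Ruby stand-ins for the data objects described in this issue.
  class AdvisoryDataObject
    attr_accessor :uuid, :source, :published_date, :title, :description,
                  :cvss_v2, :cvss_v3, :urls, :identifiers, :affected_packages
  end

  class AffectedPackageDataObject
    attr_accessor :purl_type, :package_name, :distro_version, :solution,
                  :affected_range, :fixed_versions, :pm_advisory_id
  end
end

# Hypothetical helper: copies each stored advisory id onto its affected
# package data objects and returns the packages that are ready for upsert.
# ASSUMPTION: ids_by_uuid maps advisory uuid => database id.
def assign_advisory_ids(advisory_data_objects, ids_by_uuid)
  advisory_data_objects.flat_map do |advisory|
    id = ids_by_uuid[advisory.uuid]
    next [] if id.nil? # advisory was not stored, so skip its packages

    Array(advisory.affected_packages).each { |pkg| pkg.pm_advisory_id = id }
  end
end
```

Packages belonging to an advisory that was discarded (for example, after failing validation) are dropped here, so the second task never sees a package without a foreign key.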
Implementation plan
- Update the following models to support bulk upsert via BulkInsertSafe (example).
  - PackageMetadata::Advisory
  - PackageMetadata::AffectedPackage
- Add PackageMetadata::Ingestion::Advisory::IngestionService.
  - #execute is the entrypoint and is called with a list of instances of type PackageMetadata::AdvisoryDataObject.
- Add PackageMetadata::Ingestion::Advisory::AdvisoryIngestionTask.
  - #execute is the entrypoint and is called with a list of data objects of type PackageMetadata::AdvisoryDataObject.
  - Create a list of PackageMetadata::Advisory instances instantiated from the corresponding data objects.
  - Filter instantiated objects by using JSON schema validation to only use valid objects. Discard and log the invalid objects (example).
  - Upsert using PackageMetadata::Advisory.bulk_upsert!.
  - For each inserted advisory set PackageMetadata::AffectedPackageDataObject.pm_advisory_id to the id returned from the query. Affected package data objects corresponding to the inserted advisory are under PackageMetadata::AdvisoryDataObject.affected_packages.
- Add PackageMetadata::Ingestion::Advisory::AffectedPackageIngestionTask.
  - #execute is the entrypoint and is called with a list of data objects of type PackageMetadata::AffectedPackageDataObject.
  - Create a list of PackageMetadata::AffectedPackage instances instantiated from the corresponding data objects.
  - Filter instantiated objects by using JSON schema validation to only use valid objects. Discard and log the invalid objects (example).
  - Upsert using PackageMetadata::AffectedPackage.bulk_upsert!.
Note: CompressedPackage::IngestionService is an example of bulk-upserting using 2 tasks.
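The two-task flow in the plan above can be illustrated with a self-contained, in-memory sketch. Everything in the IngestionSketch module is hypothetical: Table stands in for a model's bulk_upsert!, partition_valid stands in for JSON schema validation, and both the required fields and the unique keys are assumptions chosen for illustration only.

```ruby
module IngestionSketch
  # Reduced data objects (only a few of the fields listed in this issue).
  Advisory = Struct.new(:uuid, :source, :title, :affected_packages, keyword_init: true)
  AffectedPackage = Struct.new(:purl_type, :package_name, :affected_range,
                               :pm_advisory_id, keyword_init: true)

  # Hypothetical in-memory table standing in for a bulk upsert.
  class Table
    attr_reader :rows

    def initialize
      @rows = {}   # unique key => row hash (including :id)
      @next_id = 0
    end

    # Inserts or updates each row by its unique key; returns ids in input order.
    def upsert(rows, unique_by:)
      rows.map do |row|
        key = row.values_at(*unique_by)
        id = @rows.dig(key, :id) || (@next_id += 1)
        @rows[key] = row.merge(id: id)
        id
      end
    end
  end

  # Stand-in for schema validation: every required field must be non-empty.
  # Invalid objects are logged to stderr and discarded.
  def self.partition_valid(objects, required)
    valid, invalid = objects.partition do |object|
      required.none? { |field| object[field].to_s.empty? }
    end
    invalid.each { |object| warn "discarding invalid object: #{object.to_h}" }
    valid
  end

  # Task 1: upsert advisories, then copy each returned id onto its packages.
  def self.ingest_advisories(advisories, table)
    valid = partition_valid(advisories, %i[uuid source title])
    rows = valid.map { |a| { advisory_xid: a.uuid, source_xid: a.source, title: a.title } }
    ids = table.upsert(rows, unique_by: %i[advisory_xid source_xid])
    valid.zip(ids).flat_map do |advisory, id|
      advisory.affected_packages.each { |pkg| pkg.pm_advisory_id = id }
    end
  end

  # Task 2: upsert affected packages, which now carry the foreign key.
  def self.ingest_affected_packages(packages, table)
    valid = partition_valid(packages, %i[purl_type package_name affected_range pm_advisory_id])
    table.upsert(valid.map(&:to_h), unique_by: %i[pm_advisory_id purl_type package_name])
  end

  # Service entrypoint: runs the two tasks in order.
  def self.execute(advisories, advisory_table, package_table)
    packages = ingest_advisories(advisories, advisory_table)
    ingest_affected_packages(packages, package_table)
  end
end
```

Running the advisory task first matters because the affected package rows cannot be built until the advisory ids exist. Re-running the sketch with the same input updates the existing rows instead of duplicating them, which is the property a bulk upsert is expected to provide.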