Update checkpoint data_types
What does this MR do and why?
Existing checkpoints in an instance's database have their `data_type` set to `advisories`. This is incorrect: the `data_type` should be `licenses`. As a result, `PackageMetadata::SyncService` does not find the last checkpoint and restarts the sync from scratch.
The sync is non-destructive, so no data is lost as part of this bug, but re-syncing the entire dataset is unnecessary and costly in terms of resources.
Bug timeline
- Add fields to Checkpoint (!118939 - merged) is applied
  - adds the `data_type` column
  - adds `Enums::PackageMetadata::DATA_TYPES`, which is `{ advisories: 1, licenses: 2 }`
  - sets existing checkpoint entries to `data_type=advisories` (or `1`)
    - but this does not affect sync, since the unique key on checkpoints only uses `purl_type`
- Add package metadata ingestion for version form... (!120027 - merged) is applied
  - changes the unique key from `purl_type` to `(data_type, version_format, purl_type)`
  - the next time sync runs, it looks for checkpoints with `data_type=licenses` (or `data_type=2`) as well as `purl_type`
    - the query changes from `select * from pm_checkpoints where purl_type = X` to `select * from pm_checkpoints where purl_type = X and data_type = 2 ...`
  - the correct (mislabeled) checkpoint is not found, so a new one is created starting at `sequence: 0, chunk: 0`
- after this MR is applied
  - the "mislabeled" checkpoints are now `data_type=2` and will be found the next time sync runs
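The lookup change in the timeline can be traced with a small in-memory sketch. `Checkpoint`, `find_by_purl_type`, `find_by_composite_key` and `relabel_as_licenses` are hypothetical stand-ins, not GitLab's model or the migration in this MR; the row values reuse the example numbers from this MR.

```ruby
# Sketch only: an in-memory stand-in for pm_checkpoints, not GitLab's model.
DATA_TYPES = { advisories: 1, licenses: 2 }.freeze

Checkpoint = Struct.new(:purl_type, :data_type, :version_format, :sequence, :chunk,
                        keyword_init: true)

# Before !120027: the unique key is purl_type alone, so data_type is ignored.
def find_by_purl_type(rows, purl_type)
  rows.find { |r| r.purl_type == purl_type }
end

# After !120027: the unique key is (data_type, version_format, purl_type).
def find_by_composite_key(rows, data_type:, version_format:, purl_type:)
  rows.find do |r|
    r.data_type == data_type && r.version_format == version_format && r.purl_type == purl_type
  end
end

# Effect of this MR's data fix as described above: mislabeled advisories
# checkpoints become licenses checkpoints. Returns new rows; input is untouched.
def relabel_as_licenses(rows)
  rows.map do |r|
    next r unless r.data_type == DATA_TYPES[:advisories]

    fixed = r.dup
    fixed.data_type = DATA_TYPES[:licenses]
    fixed
  end
end

# A checkpoint written before !120027 and backfilled to data_type=advisories.
rows = [Checkpoint.new(purl_type: 1, data_type: DATA_TYPES[:advisories],
                       version_format: 1, sequence: 1111, chunk: 0)]
licenses_key = { data_type: DATA_TYPES[:licenses], version_format: 1, purl_type: 1 }

puts find_by_purl_type(rows, 1).sequence       # old lookup finds it: 1111
p find_by_composite_key(rows, **licenses_key)   # new lookup misses: nil, so sync restarts
puts find_by_composite_key(relabel_as_licenses(rows), **licenses_key).sequence # after fix: 1111
```

The composite key is what makes the backfilled `advisories` label matter: under the old `purl_type`-only key the label was invisible to sync.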
Example checkpoint lifecycle
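A toy model helps show why a missed checkpoint is costly. It assumes the dataset is an ordered list of `(sequence, chunk)` positions and that a checkpoint stores the last position already ingested; `Position`, `remaining` and the positions themselves are made up for illustration.

```ruby
# Sketch only: positions are ordered by (sequence, chunk); values are made up.
Position = Struct.new(:sequence, :chunk) do
  include Comparable

  def <=>(other)
    [sequence, chunk] <=> [other.sequence, other.chunk]
  end
end

# Positions a sync would still ingest. With no checkpoint found, sync starts
# from sequence 0, chunk 0 and ingests everything again.
def remaining(positions, checkpoint)
  return positions if checkpoint.nil?

  positions.select { |pos| pos > checkpoint }
end

available = [[0, 0], [0, 1], [1111, 0], [1111, 1], [1112, 0]].map { |s, c| Position.new(s, c) }

puts "resume from checkpoint: #{remaining(available, Position.new(1111, 0)).size} chunks" # 2
puts "restart from scratch: #{remaining(available, nil).size} chunks"                     # 5
```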
How to set up and validate locally
- Add `advisories` checkpoints manually: `gdk psql -c 'insert into pm_checkpoints (sequence, chunk, data_type, purl_type, version_format, created_at, updated_at) values (1111, 0, 1, 1, 1, now(), now())'`
- Run the migration. The data above should be removed.
- Run ingestion, wait for at least one new checkpoint to appear.
- Verify that the checkpoints are of type `licenses`, with the int value being `2`.
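For the last step, a small helper can decode the integer `data_type` values against the enum from the timeline, so any leftover `advisories` (`1`) rows stand out. This is a hypothetical sketch: `non_license_checkpoints` is not part of GitLab, and the rows would need to be fetched separately (for example with `gdk psql`).

```ruby
# Sketch only: DATA_TYPES mirrors Enums::PackageMetadata::DATA_TYPES as
# described in this MR; the row hashes are illustrative stand-ins for
# pm_checkpoints rows.
DATA_TYPES = { advisories: 1, licenses: 2 }.freeze
DATA_TYPE_NAMES = DATA_TYPES.invert.freeze

# Returns every row whose data_type is not licenses, tagged with the decoded
# enum name (or :unknown for values outside the enum).
def non_license_checkpoints(rows)
  rows
    .reject { |row| row[:data_type] == DATA_TYPES[:licenses] }
    .map { |row| row.merge(data_type_name: DATA_TYPE_NAMES.fetch(row[:data_type], :unknown)) }
end

rows = [
  { purl_type: 1, data_type: 2, sequence: 1111, chunk: 0 },
  { purl_type: 2, data_type: 1, sequence: 0, chunk: 0 }
]
p non_license_checkpoints(rows).map { |row| row[:data_type_name] } # [:advisories]
```

After the migration and a fresh ingestion run, passing the real rows should return an empty list.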
How to run ingestion
Run ingestion via `rails runner` with the `ingest.rb` script:
`bundle exec rails runner ingest.rb`
Sync progress can be seen in `log/application_json.log`, where the sync URL is indicated.
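To pick the sync entries out of `log/application_json.log`, a line-by-line filter over the JSON log is enough. This is a hedged sketch: it assumes one JSON object per line, and `matching_log_entries` and the search string are hypothetical.

```ruby
require "json"

# Sketch only: yields parsed JSON objects from a JSON-lines log whose raw line
# contains `needle`. The key that carries the sync URL is not documented here,
# so this matches on the raw line instead of guessing a field name.
def matching_log_entries(io, needle)
  return enum_for(__method__, io, needle) unless block_given?

  io.each_line do |line|
    next unless line.include?(needle)

    entry = begin
      JSON.parse(line)
    rescue JSON::ParserError
      nil # skip truncated or non-JSON lines
    end
    yield entry if entry.is_a?(Hash)
  end
end

# Usage; "package_metadata" is an assumed search string, not a known log field:
# File.open("log/application_json.log") do |f|
#   matching_log_entries(f, "package_metadata") { |entry| puts entry }
# end
```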
Note: The `PM_SYNC_INDEV` environment flag controls whether sync runs in the development environment. It is `false` by default. Enable sync with `export PM_SYNC_INDEV=true` before running `ingest.rb`.
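The flag's documented behavior can be summarized as a one-line check. `pm_sync_in_dev_enabled?` is a hypothetical name, not the code GitLab uses, and accepting only the exact string `true` is an assumption.

```ruby
# Hypothetical helper, not GitLab's actual implementation: mirrors the
# described PM_SYNC_INDEV behavior (off by default, on when set to "true").
# Treating only the exact string "true" as enabled is an assumption.
def pm_sync_in_dev_enabled?(env = ENV)
  env.fetch("PM_SYNC_INDEV", "false") == "true"
end

p pm_sync_in_dev_enabled?({})                            # false
p pm_sync_in_dev_enabled?({ "PM_SYNC_INDEV" => "true" }) # true
```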
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.
Related to #414977 (closed)