Clean up orphan Software licenses
This is a follow up from https://gitlab.com/gitlab-org/gitlab/-/issues/337329
We store Software licenses in the shared table. If the user adds a new unknown in SPDX license, it stores there forever.
When we address issues with cleaning up newly added licenses, we need to clean up what we have in software_licenses
table
Proposed implementation plan
- Create a background migration to clean up orphan licenses
- Get all licenses in
software_licenses
that havespdx_identifier: nil
and does not belong to any projects.
SoftwareLicense
.where(spdx_identifier: nil)
.where.not(
id: SoftwareLicensePolicy.where(
software_license_id: SoftwareLicense.where(spdx_identifier: nil)
)
.pluck(:software_license_id))
- Remove those licenses
Verification steps
- Access the postgres.ai as described here
- Copy the command
EXPLAIN SELECT * FROM "software_licenses"
WHERE "software_licenses"."spdx_identifier" IS NULL
AND "software_licenses"."id" NOT IN (
SELECT
"software_license_policies"."software_license_id"
FROM
"software_license_policies")
- Check for the number of rows returned in the Plain with execution section as described here
- Verify the actual row count is 0.
Edited by Marcos Rocha