Migration to fix duplicate software licenses in license policies table
What does this MR do and why?
The software_licenses table contains duplicated software licenses. As identified in the issue, several rows share the same spdx_identifier but have different name values.
The major cause of the duplication was a very old backfill migration whose context is not clear: !17004.
The software_license_policies records referencing duplicated software_licenses have to be fixed before the duplicates can be deleted.
The migration in this MR:
- Finds duplicated licenses.
- Identifies the original license by matching the license name against the official name in https://spdx.org/licenses/licenses.json.
- Updates all software_license_policies referencing duplicated licenses to use the original license.
- Deletes duplicated licenses.
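
The steps in this list can be sketched in plain Ruby. This is a minimal in-memory illustration, not the migration code from this MR: the License and Policy structs, the dedupe_licenses method, and the official_names hash (a mapping from spdx_identifier to the official name, as it might be derived from licenses.json) are all hypothetical names introduced for the example.

```ruby
# Hypothetical in-memory stand-ins for rows of software_licenses and
# software_license_policies; the real migration works on database tables.
License = Struct.new(:id, :name, :spdx_identifier)
Policy  = Struct.new(:id, :software_license_id)

# official_names is assumed to look like { "MIT" => "MIT License", ... }.
# Returns the ids of the licenses that were removed as duplicates.
def dedupe_licenses(licenses, policies, official_names)
  deleted_ids = []

  licenses
    .reject { |license| license.spdx_identifier.nil? }
    .group_by(&:spdx_identifier)
    .each do |spdx_id, group|
      # A group of one is not a duplicate.
      next if group.size < 2

      # The original is the row whose name matches the official SPDX name.
      original = group.find { |license| license.name == official_names[spdx_id] }

      # No original found: leave this group untouched.
      next if original.nil?

      duplicate_ids = group.map(&:id) - [original.id]

      # Repoint policies from the duplicates to the original license.
      policies.each do |policy|
        if duplicate_ids.include?(policy.software_license_id)
          policy.software_license_id = original.id
        end
      end

      deleted_ids.concat(duplicate_ids)
    end

  # Delete the duplicates only after every policy has been repointed.
  licenses.reject! { |license| deleted_ids.include?(license.id) }
  deleted_ids
end
```

Repointing the policies before deleting anything mirrors the ordering constraint described above: a duplicate can only be removed once nothing references it.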
This MR corrects data in the software_license_policies table for records whose software_license_id column references a duplicated license.
This is the first of two migrations planned to address the issue.
- backend database DB migration to update the software_license_policies table, replacing each duplicate software_license_id with the original license, and to delete the duplicated licenses. Duplicated licenses for which no original license can be found are ignored.
- backend database Create a unique index on spdx_identifier in the software_licenses table, covering only rows where spdx_identifier is not null, provided that no duplicates are left.
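
Because the first migration skips duplicates with no identifiable original, the second step depends on whether any duplicates remain. A rough sketch of that precondition follows. The method names are hypothetical and the check runs on a plain array rather than the database. Nil values are dropped first, since the planned index only covers rows where spdx_identifier is not null.

```ruby
# Returns the spdx_identifier values that still occur more than once.
# nil entries are ignored because the planned index excludes them.
def remaining_duplicate_identifiers(spdx_identifiers)
  spdx_identifiers.compact.tally.select { |_id, count| count > 1 }.keys
end

# A unique index on non-null identifiers can only be created when
# no identifier is repeated.
def unique_index_safe?(spdx_identifiers)
  remaining_duplicate_identifiers(spdx_identifiers).empty?
end
```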
Database
Queries
Migration
> bundle exec rake db:migrate VERSION=20230608133450
main: == [advisory_lock_connection] object_id: 228180, pg_backend_pid: 69105
main: == 20230608133450 UpdateDuplicateLicensesInSoftwareLicensePolicies: migrating =
main: == 20230608133450 UpdateDuplicateLicensesInSoftwareLicensePolicies: migrated (47.4008s)
main: == [advisory_lock_connection] object_id: 228180, pg_backend_pid: 69105
ci: == [advisory_lock_connection] object_id: 232400, pg_backend_pid: 69225
ci: == 20230608133450 UpdateDuplicateLicensesInSoftwareLicensePolicies: migrating =
ci: -- The migration is skipped since it modifies the schemas: [:gitlab_main].
ci: -- This database can only apply migrations in one of the following schemas: [:gitlab_ci, :gitlab_internal, :gitlab_shared].
ci: == 20230608133450 UpdateDuplicateLicensesInSoftwareLicensePolicies: migrated (0.0056s)
ci: == [advisory_lock_connection] object_id: 232400, pg_backend_pid: 69225
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.
Related to #395776 (closed)