Update licenses schema limits
What does this MR do and why?
Update the PackageMetadata::Package.licenses JSON schema limits to account for outliers found in the last ingestion test: #409732 (comment 1384402080).
How to reproduce the errors
Check out the license validation script repo: https://gitlab.com/ifrenkel/license-schema-validation
To create the list of all validation errors: bundle exec ruby check_schema.rb
(you can supply the schema URL to this script, e.g. to compare master against this branch).
To summarize errors: bundle exec ruby analyze_errors.rb
The table below gives more detail on the validation errors (generated by the scripts above):
purl type | error type | times seen | max value | avg mem size (KB) | max mem size (KB) | total mem size (KB) | location in schema
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
go | maxItems | 5852 | 23724 | 164.24 | 197.09 | 328.48 | /definitions/versions
go | maxItems | 363 | 116 | 0.63 | 0.95 | 3.76 | /definitions/license_ids
go | maxItems | 11 | 20 | 0.16 | 0.20 | 0.66 | /definitions/non_default_licenses
maven | maxItems | 3908 | 2469 | 4.79 | 25.98 | 38.36 | /definitions/versions
npm | maxItems | 5183 | 11440 | 19.16 | 131.40 | 191.59 | /definitions/versions
npm | null | 1 | 256 | 0.41 | 0.41 | 0.41 | /definitions/lowest_version/oneOf/0
npm | null | 1 | 256 | 0.41 | 0.41 | 0.41 | /definitions/highest_version/oneOf/0
npm | maxLength | 2 | 256 | 0.41 | 0.41 | 0.41 | /definitions/version
nuget | maxItems | 2800 | 2647 | 13.35 | 25.98 | 80.07 | /definitions/versions
packagist | maxItems | 1234 | 1199 | 4.65 | 11.56 | 41.84 | /definitions/versions
packagist | maxItems | 1 | 13 | 0.14 | 0.14 | 0.14 | /definitions/license_ids
pypi | maxItems | 816 | 1902 | 7.38 | 17.33 | 51.66 | /definitions/versions
rubygem | maxItems | 490 | 1125 | 5.19 | 11.56 | 46.71 | /definitions/versions
rubygem | maxItems | 11 | 12 | 0.13 | 0.13 | 0.13 | /definitions/license_ids
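A minimal sketch of how a summary like the table above could be produced from a flat list of validation errors. This is not the code in analyze_errors.rb: the ValidationError struct, its field names, and the sample values are hypothetical and only illustrate grouping by purl type, error type, and schema location.

```ruby
# Hypothetical error record; the real scripts may store errors differently.
ValidationError = Struct.new(:purl_type, :err_type, :location, :value, keyword_init: true)

# Group errors by (purl type, error type, schema location) and report
# how often each group occurred and the largest offending value.
def summarize(errors)
  errors
    .group_by { |e| [e.purl_type, e.err_type, e.location] }
    .map do |(purl_type, err_type, location), group|
      {
        purl_type: purl_type,
        err_type: err_type,
        times_seen: group.size,
        max_value: group.map(&:value).max,
        location: location
      }
    end
    .sort_by { |row| [row[:purl_type], -row[:times_seen]] }
end

# Made-up sample data, not taken from the ingestion test.
sample_errors = [
  ValidationError.new(purl_type: "npm", err_type: "maxItems", location: "/definitions/versions", value: 120),
  ValidationError.new(purl_type: "npm", err_type: "maxItems", location: "/definitions/versions", value: 340),
  ValidationError.new(purl_type: "go", err_type: "maxItems", location: "/definitions/license_ids", value: 12)
]

summarize(sample_errors).each { |row| puts row.values.join(" | ") }
```

Sorting by purl type first and then by descending count mirrors the row order of the table, where the most frequent error for each package type comes first.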
Histograms
To visualize the distribution of errors (mostly maxItems) grouped by error type and schema location, run: bundle exec ruby histogram.rb
error type: maxItems (schema location: /definitions/license_ids)
0..50: ******* (342)
50..100: * (31)
100..150: * (2)
error type: maxItems (schema location: /definitions/non_default_licenses)
0..50: * (11)
error type: maxItems (schema location: /definitions/versions)
50..100: ************************************************************************************************************************************************************************************************************************************* (11429)
100..150: ************************************************************************************* (4200)
150..200: *********************************** (1703)
200..250: *************** (730)
250..300: ************ (553)
300..350: ******** (354)
350..400: ***** (205)
400..450: **** (182)
450..500: *** (117)
500..550: *** (135)
550..600: ** (67)
600..650: * (44)
650..700: ** (55)
700..750: * (49)
750..800: * (17)
800..850: * (37)
850..900: * (32)
900..950: * (24)
950..1000: * (34)
1000..1050: * (21)
1050..1100: * (26)
1100..1150: * (15)
1150..1200: * (23)
1200..1250: * (10)
1250..1300: * (19)
1300..1350: * (7)
1350..1400: * (6)
1400..1450: * (10)
1450..1500: * (7)
1500..1550: * (13)
1550..1600: * (6)
1600..1650: * (6)
1650..1700: * (9)
1700..1750: * (4)
1750..1800: * (6)
1800..1850: * (4)
1850..1900: * (1)
1900..1950: * (5)
1950..2000: * (7)
2000..: *** (111)
error type: maxLength (schema location: /definitions/version)
250..300: * (2)
error type: null (schema location: /definitions/highest_version/oneOf/0)
250..300: * (1)
error type: null (schema location: /definitions/lowest_version/oneOf/0)
250..300: * (1)
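The bucketing behind the output above can be sketched as follows. This is an illustration, not the code in histogram.rb. It assumes fixed-width buckets of 50 that include the lower bound and exclude the upper bound, a single overflow bucket starting at 2000, and one star per 50 occurrences, rounded up. The constant and method names are hypothetical.

```ruby
BUCKET_WIDTH = 50      # assumed bucket size, inferred from labels such as 50..100
OVERFLOW_START = 2000  # assumed start of the open-ended "2000.." bucket
PER_STAR = 50          # assumed scale: one star per 50 occurrences, rounded up

# Map a value to the label of the bucket it falls into.
def bucket_label(value)
  return "#{OVERFLOW_START}.." if value >= OVERFLOW_START

  low = (value / BUCKET_WIDTH) * BUCKET_WIDTH
  "#{low}..#{low + BUCKET_WIDTH}"
end

# Return one formatted line per non-empty bucket, in ascending order.
def histogram(values)
  # Sorting first makes the tally hash list buckets from lowest to highest.
  counts = values.sort.map { |v| bucket_label(v) }.tally
  counts.map do |label, count|
    stars = "*" * (count.to_f / PER_STAR).ceil
    "#{label}: #{stars} (#{count})"
  end
end

puts histogram([3, 60, 75, 2500])
```

Empty buckets are skipped, which matches the output above where, for example, the versions histogram starts at 50..100.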
Percentiles
For percentiles by error type: bundle exec ruby percentile.rb
error type: maxItems (schema location: /definitions/license_ids)
percentile 0.5, value: 15.0
percentile 0.75, value: 31.0
percentile 0.95, value: 89.0
percentile 0.99, value: 89.0
percentile 0.999, value: 115.62599999999998
error type: maxItems (schema location: /definitions/non_default_licenses)
percentile 0.5, value: 13.0
percentile 0.75, value: 18.0
percentile 0.95, value: 19.5
percentile 0.99, value: 19.9
percentile 0.999, value: 19.990000000000002
error type: maxItems (schema location: /definitions/versions)
percentile 0.5, value: 90.0
percentile 0.75, value: 143.0
percentile 0.95, value: 430.8999999999978
percentile 0.99, value: 1297.1800000000003
percentile 0.999, value: 5603.826000000081
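The fractional values above suggest that the percentiles are linearly interpolated between neighbouring sorted values. A minimal sketch under that assumption follows. It is not the code in percentile.rb, and the method name is hypothetical.

```ruby
# Percentile with linear interpolation between the two closest ranks.
# p is a fraction between 0 and 1 (for example 0.95).
def percentile(values, p)
  raise ArgumentError, "values must not be empty" if values.empty?
  raise ArgumentError, "p must be between 0 and 1" unless (0.0..1.0).cover?(p)

  sorted = values.sort
  rank = p * (sorted.length - 1)  # fractional index into the sorted list
  lower = sorted[rank.floor]
  upper = sorted[rank.ceil]
  lower + (upper - lower) * (rank - rank.floor)
end

puts percentile([10, 20, 30, 40], 0.5)
```

With the 11 non_default_licenses errors, this formula gives a rank of 9.5 for the 0.95 percentile. If the two largest values are 19 and 20 (the table lists 20 as the maximum), interpolating between them gives the reported 19.5.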
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.
Related to #408901 (closed)