Port FindLicense to Go
FindLicense is currently handled by the Gitaly-Ruby sidecar, and should be ported to Go for performance reasons. This could be handled by shelling out to Git, leveraging Git2Go, or another way.
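To make the "shelling out to Git" option concrete, here is a minimal sketch of listing the root tree entries of a repository with `git ls-tree`. It reads only the object database, so it should also work on bare repositories. The helper names `listRootEntries` and `splitNUL` are hypothetical, and the sketch uses plain `os/exec` rather than whatever command wrapper Gitaly itself would use.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"os/exec"
)

// listRootEntries returns the names of the entries in the root tree of
// revision, for the repository whose Git directory is gitDir.
// Hypothetical helper: it shells out to the git binary found on PATH.
func listRootEntries(gitDir, revision string) ([]string, error) {
	// -z terminates each name with NUL, so unusual file names stay unquoted.
	cmd := exec.Command("git", "--git-dir="+gitDir, "ls-tree", "-z", "--name-only", revision)
	out, err := cmd.Output()
	if err != nil {
		return nil, fmt.Errorf("git ls-tree: %w", err)
	}
	return splitNUL(out), nil
}

// splitNUL splits NUL-terminated output into names, skipping empty fields.
func splitNUL(out []byte) []string {
	var names []string
	for _, field := range bytes.Split(out, []byte{0}) {
		if len(field) > 0 {
			names = append(names, string(field))
		}
	}
	return names
}

func main() {
	if len(os.Args) != 3 {
		fmt.Println("usage: lstree <git-dir> <revision>")
		return
	}
	names, err := listRootEntries(os.Args[1], os.Args[2])
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, name := range names {
		fmt.Println(name)
	}
}
```

The resulting names could then be filtered for license-like files, and the chosen blob read with `git cat-file` before being handed to a classifier.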
Implementation options
So far we've found a few options which might be used to detect the license of a project.
github.com/go-enry/go-license-detector
The use of this package was implemented in !3797 (merged). It was partially rolled out behind a feature flag, but we had to halt the rollout due to a mismatch in license names between that package and the Licensee gem used in GitLab Rails. (More details in create-stage#12914 (closed))
Because of that, we decided to move all license detection and parsing to Gitaly and phase out the Licensee gem on the Rails side. This effort is tracked in &7874 (closed).
Shortcomings:
This package seems to work, but it's missing information: it only provides the license name, and we need more details from the license DB. We've filed an issue on the upstream project, but it hasn't gotten any traction yet.
As Pavlo pointed out, we could fetch the DB from the same SPDX source as the package does, but then we'd need to build that part ourselves.
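As a rough idea of what "building that part ourselves" could look like, the sketch below decodes an SPDX-style license list and indexes it by license ID, so that the name reported by a detector can be enriched with further details. The JSON field names are my assumption about the layout of the SPDX `licenses.json` file, and `loadLicenseDB`, `spdxLicense`, and `sampleList` are hypothetical names used only for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// spdxLicense holds a subset of one entry in the SPDX license list.
// The JSON field names are assumed from the published licenses.json layout.
type spdxLicense struct {
	ID         string   `json:"licenseId"`
	Name       string   `json:"name"`
	Reference  string   `json:"reference"`
	SeeAlso    []string `json:"seeAlso"`
	Deprecated bool     `json:"isDeprecatedLicenseId"`
}

// spdxLicenseList is the top-level document wrapping the entries.
type spdxLicenseList struct {
	Version  string        `json:"licenseListVersion"`
	Licenses []spdxLicense `json:"licenses"`
}

// loadLicenseDB decodes the list and indexes it by lowercased license ID.
func loadLicenseDB(data []byte) (map[string]spdxLicense, error) {
	var list spdxLicenseList
	if err := json.Unmarshal(data, &list); err != nil {
		return nil, fmt.Errorf("decoding SPDX license list: %w", err)
	}
	db := make(map[string]spdxLicense, len(list.Licenses))
	for _, l := range list.Licenses {
		db[strings.ToLower(l.ID)] = l
	}
	return db, nil
}

// sampleList is a hand-written stand-in for the real file, for the demo only.
const sampleList = `{
  "licenseListVersion": "0.0-example",
  "licenses": [
    {
      "licenseId": "MIT",
      "name": "MIT License",
      "reference": "https://spdx.org/licenses/MIT.html",
      "seeAlso": ["https://opensource.org/licenses/MIT"],
      "isDeprecatedLicenseId": false
    }
  ]
}`

func main() {
	db, err := loadLicenseDB([]byte(sampleList))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// A detector reports only an identifier; the DB supplies the rest.
	if l, ok := db[strings.ToLower("MIT")]; ok {
		fmt.Println(l.ID, "|", l.Name, "|", l.Reference)
	}
}
```

Lowercasing the key is a design choice for the sketch: it makes lookups tolerant of casing differences between a detector's output and the list.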
github.com/google/licenseclassifier
https://github.com/google/licenseclassifier is another package I've found which is able to detect a license.
Shortcomings:
In contrast to the previous package, this one provides (AFAICT) all the details we need, but it does not locate the license file on its own. So if we want to use this package, we'd need to implement that ourselves.
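Locating the license file ourselves might not need much code. The sketch below picks the most likely license file from a list of root-level file names using a simple base-name heuristic. The candidate list and the name `findLicenseFile` are hypothetical, and the heuristic is deliberately naive compared to what Licensee or go-license-detector do.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// licenseBaseNames lists base names treated as license files, in order of
// preference. This list is an illustrative assumption, not a standard.
var licenseBaseNames = []string{"license", "licence", "copying", "unlicense"}

// findLicenseFile returns the entry whose base name (case-insensitive,
// extension stripped) ranks best in licenseBaseNames, if there is one.
func findLicenseFile(entries []string) (string, bool) {
	best, bestRank := "", len(licenseBaseNames)
	for _, name := range entries {
		lower := strings.ToLower(name)
		base := strings.TrimSuffix(lower, path.Ext(lower))
		for rank, candidate := range licenseBaseNames {
			// The strict comparison keeps the first entry on equal rank.
			if base == candidate && rank < bestRank {
				best, bestRank = name, rank
			}
		}
	}
	return best, best != ""
}

func main() {
	entries := []string{"README.md", "COPYING", "LICENSE.md", "main.go"}
	if name, ok := findLicenseFile(entries); ok {
		fmt.Println("license file:", name)
	} else {
		fmt.Println("no license file found")
	}
}
```

The content of the selected file would then be passed to the classifier to identify the license.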
github.com/go-enry/go-license-detector + github.com/google/licenseclassifier
The two projects seem to fill each other's gaps.
Shortcomings:
I'm worried we'd get ourselves into the same trouble we had with go-license-detector + Licensee.
go.elastic.co/go-licence-detector
This is another package I've found, built around github.com/google/licenseclassifier. It seems to have some code to locate the license file.
Shortcomings:
While it has code to locate the license file, it doesn't seem to support bare git repos. So I'm not sure it provides any value.
github.com/awslabs/yesiscan
I wasn't able to fully grasp what the features of this package are, but it's interesting that it supports detecting a license in a Git repo. I think it can also provide all the license details, and that it uses github.com/google/licenseclassifier under the hood.
Shortcomings:
TBD? (documentation isn't clear)