Report number of lines per language in repository charts
Promoted to epic
Description
Currently, repository charts report the percentage of each language in the repo. First, it is not obvious how this percentage is computed (number of files? number of lines? bytes? what about comments? libraries?). Second, I would love to see some absolute data (number of files, lines, bytes).
Proposal
As a user I would like to see the number of lines of code per language. Ideally, excluding blank lines and comments, but that is optional.
Documentation blurb
As for use cases:
- better understand the repo structure
- if this is your repo, being able to report the number of lines of your main language
- this is one of the metrics employers would like to know (I was personally surprised by this question in an interview and could not give a clear answer)
- all the use cases that already apply to general repo graphs (like a pie chart of languages)
Details
Just so it's stated explicitly, the language bar on the project's overview page is based on bytes. Iterating over each blob to count the number of lines will be quite expensive, and making it performant at gitlab.com scale will be quite the challenge. 😄 Bytes were chosen because Git stores the size of each blob with its name. So if a blob has the path path/to/file.rb, it can take the extension and detect that it's Ruby. It already has the number of bytes, so it can move on.
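To make the byte-based approach concrete, here is a minimal sketch, not the actual GitLab or Gitaly implementation. It assumes the input is the text printed by `git ls-tree -r -l HEAD` (mode, type, object ID, size, then a tab and the path). The extension map and the function name `language_byte_share` are hypothetical; real language detection is far richer than a lookup on the file extension.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Hypothetical, deliberately tiny extension map (assumption for illustration).
EXTENSION_TO_LANGUAGE = {
    ".rb": "Ruby",
    ".py": "Python",
    ".go": "Go",
    ".js": "JavaScript",
}


def language_byte_share(ls_tree_output: str) -> dict[str, float]:
    """Return the percentage of bytes per language from `git ls-tree -r -l` text."""
    totals: dict[str, int] = defaultdict(int)
    for line in ls_tree_output.splitlines():
        if "\t" not in line:
            continue
        meta, path = line.split("\t", 1)
        fields = meta.split()
        # Expect: mode, type, object ID, size. Skip non-blobs (e.g. submodules).
        if len(fields) != 4 or fields[1] != "blob":
            continue
        size = fields[3]
        language = EXTENSION_TO_LANGUAGE.get(PurePosixPath(path).suffix)
        if language is None:
            continue  # unknown extension: ignored in this sketch
        # No blob content is read here; the size comes straight from the listing.
        totals[language] += int(size)

    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {lang: 100.0 * n / grand_total for lang, n in totals.items()}
```

In a checkout, the input could come from `subprocess.run(["git", "ls-tree", "-r", "-l", "HEAD"], capture_output=True, text=True, check=True).stdout`. The point of the sketch is that no blob content is ever opened, which is why this stays cheap.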
Lines, however, are harder, as now you'd have to either iterate over each blob each time, or be clever with caching combined with diffing, which in turn might lead to race conditions.
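As a rough illustration of why lines cost more, the sketch below has to load every blob's content at least once. It uses one possible caching scheme, keyed by blob object ID (a blob's content never changes for a given ID); this is an assumption for illustration, not the caching-plus-diffing approach described above and not something Gitaly is known to do. The names `count_nonblank_lines`, `read_blob`, and `cached_line_count` are hypothetical, and comment stripping is not attempted.

```python
import subprocess

# In-memory cache keyed by blob object ID (assumption: a real service
# would need a shared, persistent store instead).
_line_count_cache: dict[str, int] = {}


def count_nonblank_lines(content: bytes) -> int:
    """Count lines that contain something other than whitespace."""
    return sum(1 for line in content.splitlines() if line.strip())


def read_blob(oid: str, repo: str = ".") -> bytes:
    """Read the full blob content; this is the expensive step."""
    result = subprocess.run(
        ["git", "-C", repo, "cat-file", "blob", oid],
        check=True,
        capture_output=True,
    )
    return result.stdout


def cached_line_count(oid: str, load=read_blob) -> int:
    """Return the non-blank line count for a blob, loading it at most once."""
    if oid not in _line_count_cache:
        _line_count_cache[oid] = count_nonblank_lines(load(oid))
    return _line_count_cache[oid]
```

Even with such a cache, the first pass over a large repository still reads every blob, which is the cost the byte-based bar avoids.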
That all being said, this would require a new RPC to Gitaly, and gitaly-proto changes. Happy to review MRs there! 🐱
Potential Workarounds
- Run scc in a GitLab CI pipeline to generate SLOC/etc. reports: https://github.com/boyter/scc
- If API use is acceptable (instead of UI), parts of its output stats can be stored as custom attributes on the project: https://docs.gitlab.com/ee/api/custom_attributes.html#set-custom-attribute
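The two workarounds could be combined roughly as follows. This is a sketch under several assumptions: scc is run with JSON output (for example `scc --format json`) and each entry carries `Name` and `Code` fields (worth verifying against your scc version); the attribute key scheme `sloc_<language>`, the function name `build_attribute_requests`, and the example host are made up for illustration. The custom attributes endpoint may require administrator access; see the linked docs.

```python
import json
import urllib.parse
import urllib.request


def build_attribute_requests(
    scc_json: str, api_base: str, project_id: int, token: str
) -> list[urllib.request.Request]:
    """Build one PUT request per language, storing its code-line count."""
    requests = []
    for entry in json.loads(scc_json):
        # Hypothetical key scheme, e.g. "sloc_ruby".
        key = "sloc_" + entry["Name"].lower().replace(" ", "_")
        url = (
            f"{api_base}/projects/{project_id}/custom_attributes/"
            f"{urllib.parse.quote(key, safe='')}"
        )
        body = urllib.parse.urlencode({"value": str(entry["Code"])}).encode()
        requests.append(
            urllib.request.Request(
                url,
                data=body,
                method="PUT",
                headers={"PRIVATE-TOKEN": token},
            )
        )
    return requests
```

Nothing is sent until each request is passed to `urllib.request.urlopen`, so the list can be inspected in a CI job before anything is written to the project.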