Visibility into which dependency config file classes produce frequent errors and require updating
Summary
In !162313 (merged), we introduced the ConfigFiles::Base class, with the intention that each dependency manager config file type is represented by a child class. The child class contains the parsing logic to extract a list of libraries and their versions from the file.
Originally, this class was designed to send an event to Sentry when a parsing error occurs. The idea is that if we frequently see the same errors for a particular Config File class across multiple projects, then it would indicate that the parsing logic needs to be updated.
However, after rolling out the feature flag to 5%, it was observed that the Sentry errors were already occurring more frequently than expected. Moreover, the Sentry approach had these issues:
- The same error event would occur every time a new commit was merged to the default branch of the same projects.
- This frequency is a concern since Sentry has had issues with event load in the past.
- Sentry does not provide an easy way to visualize the frequency of errors (# of unique projects) per Config File class.
- Their dashboard/widget feature only allows grouping by certain preset metrics and possibly custom tags. So it might work if we send in a custom tag (not "extra" field) named project_id, but I don't think we should utilize this feature since it's uncommon at GitLab.
Given the above, I believe we need a different error tracking approach that can handle the volume. Consider these options:
- Log more granular errors into Kibana and create a visualization for the error data.
- Utilize Internal Events.
Proposal
We will proceed with logging more granular errors in Kibana. This may allow us to later integrate the data as a visualization in Grafana (and possibly have it considered as part of our error budget).
- Refactor the error handling in ConfigFiles::Base.
- Create a separate Error class for each unique error message. This is so that we can record the Error class instead of the error message string (the latter is verbose and much more likely to change).
- Append the entire error object to @errors instead of just the error message.
- Log each individual parsing error.
- Consider ignoring the error if it's just "file empty" as it's not actually an indicator of invalid parsing logic.
- Ensure recorded metrics can be grouped together by these dimensions at minimum:
- ConfigFile class name
- Error class name
- Create a new visualization in Elastic that shows the number of unique projects per ConfigFile class/error class over time.
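
The refactoring steps in this proposal could look roughly like the sketch below. It is only an illustration under stated assumptions, not the actual ConfigFiles::Base implementation: the module name ConfigFilesSketch, the error classes, the JsonManifest child class, the parse! method, and the log field names are all hypothetical, and a plain Ruby Logger stands in for whatever structured logger ends up feeding Kibana.

```ruby
require 'json'
require 'logger'
require 'stringio'

# Hypothetical namespace; none of these names are taken from the real codebase.
module ConfigFilesSketch
  # One error class per failure mode, so logs can record a stable class name
  # instead of a verbose message string that is likely to change.
  class ParsingError < StandardError; end
  class EmptyFileError < ParsingError; end
  class InvalidFormatError < ParsingError; end

  class Base
    attr_reader :errors

    def initialize(content, project_id:, logger:)
      @content = content
      @project_id = project_id
      @logger = logger
      @errors = []
    end

    # Returns the extracted libraries, or an empty list if parsing failed.
    def parse!
      raise EmptyFileError, 'file empty' if @content.to_s.strip.empty?

      extract_libs
    rescue ParsingError => e
      @errors << e # keep the whole error object, not just its message
      log_error(e)
      []
    end

    private

    # Child classes implement the parsing logic for their file type.
    def extract_libs
      raise NotImplementedError
    end

    def log_error(error)
      # An empty file says nothing about the parsing logic, so it is not logged.
      return if error.is_a?(EmptyFileError)

      payload = {
        message: 'dependency config file parsing error',
        config_file_class: self.class.name, # grouping dimension 1
        error_class: error.class.name,      # grouping dimension 2
        project_id: @project_id             # lets a visualization count unique projects
      }
      @logger.info(JSON.generate(payload))
    end
  end

  # Example child class for an imaginary JSON manifest format.
  class JsonManifest < Base
    private

    def extract_libs
      data = JSON.parse(@content)
      deps = data.is_a?(Hash) ? data['dependencies'] : nil
      raise InvalidFormatError, 'missing dependencies object' unless deps.is_a?(Hash)

      deps.map { |name, version| { name: name, version: version } }
    rescue JSON::ParserError
      raise InvalidFormatError, 'content is not valid JSON'
    end
  end
end

# Usage: log to an in-memory buffer so the emitted fields can be inspected.
io = StringIO.new
logger = Logger.new(io)
logger.formatter = ->(_severity, _time, _progname, msg) { "#{msg}\n" }

file = ConfigFilesSketch::JsonManifest.new('not json', project_id: 42, logger: logger)
file.parse!
puts io.string
```

Because each log entry carries the ConfigFile class name, the error class name, and the project ID as separate fields, a visualization should be able to group by the first two and count distinct values of the third, without depending on the wording of any error message.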