Usage data counter interface
Harden Usage Ping - Consolidate all counters into four main counters with fail safes
- Add comment for usage_data.rb on the usage
Problem
In its current state, the entire usage ping payload breaks if any single counter raises an uncaught error. For example, avg_cycle_analytics
was raising an uncaught error: !26381 (merged). This was fixed in 12.9.
Result
Teams can add metrics to usage ping without breaking the entire payload.
Proposal V1
Propose the MVC required to ensure usage ping does not break due to uncaught errors.
Per @jeromezng's comment:
Isolation will ensure robustness while parallelization ensures speed. Currently robustness is more important than speed (GitLab.com usage ping takes about 11 hours to run and with query optimizations this is reduced to ~6 hours).
MVC for robustness:
- Have a defined list of usage pings in usage_data.rb or a yaml file
- Cron job cycles through this entire list sequentially
- Each job calls a get_counter(attributes, etc) method, which can fail individually.
- Each get_counter method saves counter result to database
- Each get_counter method sends an atomic payload to Versions OR we wait for all get_counter methods to finish then query the database to build and send a single large payload to Versions.
- We can optionally parallelize by breaking this into three jobs: usage_activity_by_stage, usage_activity_by_stage_monthly, other counters
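The MVC steps above can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: example_counters, run_all_counters, the store hash (standing in for the database), and the -1 fallback value are all assumptions made for the example. Only the get_counter name comes from the proposal.

```ruby
# Hypothetical counter list; in the proposal this would be defined
# in usage_data.rb or a YAML file, under source control.
def example_counters
  {
    issues: -> { 42 },
    avg_cycle_analytics: -> { raise StandardError, 'uncaught query error' },
    projects: -> { 7 }
  }
end

# Runs one counter and saves its result to the store (a Hash standing in
# for the database). A failure is recorded as the fallback value (an
# assumed convention) instead of propagating to the caller.
def get_counter(name, counter, store, fallback: -1)
  store[name] = counter.call
rescue StandardError
  store[name] = fallback
end

# Cycles through the entire list sequentially and returns the stored
# results, from which a single payload could be built.
def run_all_counters(counters, store = {})
  counters.each { |name, counter| get_counter(name, counter, store) }
  store
end

payload = run_all_counters(example_counters)
puts payload.inspect
```

Because each counter is isolated, the failing avg_cycle_analytics entry only affects its own key, and the remaining counters are still collected.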
Other ideas we discussed:
- The idea about having a "Defined list of usage pings" in a separate database isn't a great option as it introduces state which needs to be configured. I'd rather have counters be stateless and defined in source control.
- The idea to be able to set_counters via chatops is something we can explore in the future, but also requires configuration with varying states.
Proposal V2
12.10: Consolidate all counters into four main counters with fail safes. Create an example of the add_usage_data method.
- Consolidate all counters into four main counters. We've already added ~90% to these: Batch Count and Distinct Count.
- The four main counters will be: Batch Count, Distinct Count, Redis Count, Alternative Count.
- Convert non-batch counters to batch counters: #208923 (closed)
- Redis Counters: this includes anything that uses Redis. However, this may be a "Russian doll" where usage ping calculations are temporarily stored in Redis or span multiple usage pings. We will need to spend time tracing this.
- Alternative Counter: this includes anything miscellaneous such as https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/usage_data.rb#L26
- The four main counters will each have their own rescue fail safe, similar to what is currently done in Batch Count and Distinct Count.
- We will then have an add_usage_data method which is used to append data to the JSON payload.
- The add_usage_data method will wrap all four main counters with rescue fail safes.
- For 12.10, add_usage_data will have four examples, one for each of the main counters.
- Work with stage teams to reimplement their counters using the four main counter methods:
- Jira Usage https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/usage_data.rb#L198
- cycle_analytics
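One way the 12.10 plan could look in code is sketched below. Everything here is an assumption made for illustration: the with_fail_safe helper, the method signatures, the -1 fallback, and the example keys are hypothetical, and the four counter methods are stand-ins that simply run the given block rather than querying the database or Redis.

```ruby
# Hypothetical shared fail safe: returns the fallback when the block raises.
def with_fail_safe(fallback: -1)
  yield
rescue StandardError
  fallback
end

# Stand-ins for the four main counters. Real versions would run batched
# SQL, distinct SQL, Redis reads, or miscellaneous lookups respectively.
def batch_count(&query)
  with_fail_safe(&query)
end

def distinct_count(&query)
  with_fail_safe(&query)
end

def redis_count(&query)
  with_fail_safe(&query)
end

def alternative_count(&query)
  with_fail_safe(&query)
end

# Appends one key to the payload hash (assumed signature). The outer
# fail safe covers errors raised outside the counter methods themselves.
def add_usage_data(payload, key)
  payload[key] = with_fail_safe { yield }
  payload
end

# Four examples, one per main counter; keys and values are made up.
payload = {}
add_usage_data(payload, :issues) { batch_count { 120 } }
add_usage_data(payload, :issue_authors) { distinct_count { 35 } }
add_usage_data(payload, :web_ide_views) { redis_count { raise IOError, 'Redis unavailable' } }
add_usage_data(payload, :version) { alternative_count { '12.10.0' } }
puts payload.inspect
```

In this sketch the failing Redis counter yields the fallback value for its own key only, so the rest of the payload is still built.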
13.0: Expand add_usage_data to all 400 counters
- Expand add_usage_data from four examples to all 400 counters.