Draft: Preliminary work on filtering based on additional properties
What does this MR do and why?
Note that I do not intend to merge this MR.
I created it to get feedback on the metric definition format and the new Redis key format needed to support filters. This MR contains no code changes.
The background for why we need filters is described in #435338 (closed).
In this MR I:
- Defined a new event with two additional properties. It looks like this:
    ---
    description: Packaged pushed to the registry
    internal_events: true
    action: push_package_to_registry
    identifiers:
    - project
    - namespace
    - user
    additional_properties:
      label:
        description: The name of the package type
      property:
        description: The auth type. Either 'guest', 'user' or 'deploy_token'
    product_section: ci
    product_stage: package
    product_group: package_registry
    milestone: '17.0'
    introduced_by_url: TODO
    distributions:
    - ce
    - ee
    tiers:
    - free
    - premium
    - ultimate
- Modified the metric schema, for Internal Events metrics, to allow the filter definition described below.
- Migrated ~30 metrics (both Redis and RedisHLL) in the micro framework built around the package repository. I migrated all metrics for deploy_token, a couple for user, and all of the more general total count metrics. All metrics are defined on the new push_package_to_registry event.
- Added entries to usage_data_counters/hll_redis_key_overrides.yml and usage_data_counters/total_counter_redis_key_overrides.yml to show how this would look when we want to reuse existing Redis keys.
The files usage_data_counters/hll_redis_key_overrides.yml and usage_data_counters/total_counter_redis_key_overrides.yml provide quite a few examples of what I imagine the Redis keys could look like for these filtered metrics.
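One way to picture the role of the override files is a lookup from the key a filtered metric would get by default to an existing key, falling back to the default when no override is configured. The sketch below assumes the overrides can be read as a flat mapping from new key to existing key. The function name `resolve_redis_key` and the legacy key shown are hypothetical and not taken from this MR.

```python
def resolve_redis_key(default_key: str, overrides: dict[str, str]) -> str:
    """Return the existing key configured for default_key, else default_key itself."""
    return overrides.get(default_key, default_key)


# Hypothetical override entry: the legacy key name is made up for illustration.
overrides = {
    "{event_counters}_push_package_to_registry-[label:terraform_module]": "LEGACY_TERRAFORM_PUSH_COUNT",
}

# A metric with an override keeps writing to the existing key.
reused = resolve_redis_key(
    "{event_counters}_push_package_to_registry-[label:terraform_module]", overrides
)

# A metric without an override uses its default, filter-encoded key.
fresh = resolve_redis_key(
    "{event_counters}_push_package_to_registry-[label:helm]", overrides
)
```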
Metric definition
The different options we considered were originally discussed in this thread.
Here are a few examples of how different types of metrics could use a filter:
All time total count with a filter defined on one additional property:
[snip]
time_frame: all
data_source: internal_events
events:
  - name: push_package_to_registry
    filter:
      label: terraform_module
[snip]
Logic interpretation: label == "terraform_module"
Unique count of users with a filter defined on multiple properties:
[snip]
time_frame: 28d
data_source: internal_events
events:
  - name: push_package_to_registry
    unique: user.id
    filter:
      label: terraform_module
      property: deploy_token
[snip]
Logic interpretation: label == "terraform_module" && property == "deploy_token"
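The AND semantics of a single filter can be sketched as a check that every property named in the filter has the required value on the event. This is an illustrative Python sketch, not the implementation. The function name `matches_filter` and the dict-shaped properties are assumptions made for the example.

```python
def matches_filter(additional_properties: dict, filter_: dict) -> bool:
    """Return True when every property named in the filter has the required value.

    Properties that the filter does not mention are ignored. An empty filter
    matches every event, which corresponds to an unfiltered metric.
    """
    return all(
        additional_properties.get(name) == value
        for name, value in filter_.items()
    )


# Additional properties of one hypothetical push_package_to_registry event.
event_properties = {"label": "terraform_module", "property": "deploy_token"}

both_match = matches_filter(
    event_properties, {"label": "terraform_module", "property": "deploy_token"}
)
one_differs = matches_filter(
    event_properties, {"label": "terraform_module", "property": "user"}
)
```

Here `both_match` is `True` and `one_differs` is `False`, because a single mismatching property is enough to exclude the event.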
Unique count of users across multiple events (the same event in this case). A filter is defined for each event:
[snip]
  - name: push_package_to_registry
    unique: user.id
    filter:
      label: conan
      property: deploy_token
  - name: push_package_to_registry
    unique: user.id
    filter:
      label: generic
      property: deploy_token
  - name: push_package_to_registry
    unique: user.id
    filter:
      label: helm
      property: deploy_token
[snip]
Logic interpretation: (label == "conan" && property == "deploy_token") || (label == "generic" && property == "deploy_token") || (label == "helm" && property == "deploy_token")
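To make the OR across event entries concrete, here is a rough Python sketch that counts a user once if any of their events satisfies at least one entry. It uses an exact set where a RedisHLL metric would use an approximate HyperLogLog count, and the event shape (`name`, `user_id`, `additional_properties`) and the function name `unique_users` are assumptions for illustration only.

```python
def unique_users(events: list[dict], rules: list[dict]) -> set:
    """Collect the ids of users with at least one event matching any rule."""
    users = set()
    for event in events:
        props = event.get("additional_properties", {})
        for rule in rules:
            if event["name"] != rule["name"]:
                continue
            wanted = rule.get("filter", {})
            # Within one rule, every filtered property must match (AND).
            if all(props.get(key) == value for key, value in wanted.items()):
                users.add(event["user_id"])
                break  # one matching rule is enough (OR across rules)
    return users


EVENT = "push_package_to_registry"
rules = [
    {"name": EVENT, "filter": {"label": label, "property": "deploy_token"}}
    for label in ("conan", "generic", "helm")
]
events = [
    {"name": EVENT, "user_id": 1, "additional_properties": {"label": "conan", "property": "deploy_token"}},
    {"name": EVENT, "user_id": 1, "additional_properties": {"label": "helm", "property": "deploy_token"}},
    {"name": EVENT, "user_id": 2, "additional_properties": {"label": "generic", "property": "user"}},
    {"name": EVENT, "user_id": 3, "additional_properties": {"label": "npm", "property": "deploy_token"}},
    {"name": EVENT, "user_id": 4, "additional_properties": {"label": "generic", "property": "deploy_token"}},
]

counted = unique_users(events, rules)
```

User 1 matches two entries but is counted once, user 2 fails on the auth type, and user 3 fails on the label, so `counted` contains users 1 and 4.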
Redis key format
I went the verbose way and encoded the entire filter in the Redis key, so that the key is fairly easy to read.
The first two examples above would be stored under the following Redis keys, if nothing is done to override them:
{event_counters}_push_package_to_registry-[label:terraform_module]
{hll_counters}_push_package_to_registry-[label:terraform_module,property:deploy_token]-user
The property names in the filter part of the key are sorted lexicographically to prevent ambiguity.
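A minimal sketch of how such a key could be assembled, with the filter properties sorted by name so that the same filter always yields the same key regardless of the order in which it was written. The function name `filtered_redis_key` and its parameters are hypothetical, and any further key components the real counters may append are left out.

```python
from typing import Optional


def filtered_redis_key(
    prefix: str,
    event_name: str,
    filter_: dict,
    unique_identifier: Optional[str] = None,
) -> str:
    """Build a readable Redis key that encodes the whole filter."""
    key = f"{{{prefix}}}_{event_name}"
    if filter_:
        # Sort by property name so that key order in the definition is irrelevant.
        parts = ",".join(f"{name}:{value}" for name, value in sorted(filter_.items()))
        key += f"-[{parts}]"
    if unique_identifier:
        key += f"-{unique_identifier}"
    return key


total_key = filtered_redis_key(
    "event_counters", "push_package_to_registry", {"label": "terraform_module"}
)

# The filter is deliberately written in reverse order to show the sorting.
hll_key = filtered_redis_key(
    "hll_counters",
    "push_package_to_registry",
    {"property": "deploy_token", "label": "terraform_module"},
    unique_identifier="user",
)
```

Without the sorting step, two metric definitions listing the same properties in a different order would end up under different keys.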
Related to #435338 (closed)