Backend: Component usage instrumentation
Summary
We need to measure how often each component is used on a monthly basis. This metric will let us:
- Measure component popularity
- Let users filter by popularity
Component Usage Instrumentation [2024-05-09 UPDATE]
To clarify: we are taking two different approaches to component usage instrumentation, and they serve different purposes:
1. Monitor component usage trends on Tableau
We do this by sending data via Internal Events. Usage data is tracked with Snowplow or Service Ping and eventually ends up in our data warehouse. This is the data we can visualize in Tableau (#454912 (closed)).
2. Allow users to sort/filter Catalog projects by usage popularity
This is where #452545 (closed) comes into play. This aggregation work is for the purpose of completing Frontend: Show usage statistics and sort option... (#434333 - closed). It is not required for monitoring usage on Tableau.
Component Usage Definition
- "Component usage" = The number of unique projects that included a component in a pipeline within a given time period.
- "Included" specifically means when the include:component keyword is used.
- Only components that have associated metadata records in catalog_resource_components are tracked. In other words, they must be released/versioned components in the CI/CD Catalog.
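The definition above can be illustrated with a minimal sketch. It is not the actual implementation: the event tuple layout, the function name `component_usage`, and the `tracked_component_ids` set (standing in for components that have a record in catalog_resource_components) are all assumptions made for illustration.

```python
from datetime import date


def component_usage(events, tracked_component_ids, start, end):
    """Count unique projects per component for start <= used_on <= end.

    `events` is a hypothetical iterable of (component_id, project_id, used_on)
    tuples, one per include:component occurrence in a pipeline.
    """
    projects_by_component = {}
    for component_id, project_id, used_on in events:
        # Skip components without a catalog metadata record (assumed lookup).
        if component_id not in tracked_component_ids:
            continue
        # Skip usages outside the requested time period.
        if not (start <= used_on <= end):
            continue
        # A set ensures each project is counted once per component.
        projects_by_component.setdefault(component_id, set()).add(project_id)
    return {cid: len(pids) for cid, pids in projects_by_component.items()}


events = [
    (1, 10, date(2024, 5, 1)),
    (1, 10, date(2024, 5, 2)),  # same project again: not counted twice
    (1, 11, date(2024, 5, 3)),
    (2, 10, date(2024, 5, 3)),
    (3, 12, date(2024, 5, 3)),  # component 3 is not tracked
    (1, 13, date(2024, 4, 1)),  # outside the period
]
usage = component_usage(events, {1, 2}, date(2024, 5, 1), date(2024, 5, 31))
print(usage)
```

Note that repeated pipeline runs in the same project do not increase the count; only distinct projects do.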
Proposal
Task 1: Create a partitioned component usage data table and implement the tracking logic.
- Note: We evaluated existing GitLab analytics instrumentation services and they do not fully support our requirements for both GitLab.com and self-managed instances. (See !144932 (comment 1783721915).)
Task 2: Implement a daily worker that computes the number of unique projects that used each component in the last 30 days.
- This effort has been moved to its own issue (post-GA): #452545 (closed)
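As a rough sketch of what the daily aggregation in Task 2 could compute, assuming the usage table yields rows of (component_id, project_id, used_on): the function names and row layout are hypothetical, and the window is assumed to be the 30 calendar days ending on the run date, inclusive. The real worker may define the window differently.

```python
from datetime import date, timedelta


def last_30_day_usage(usage_rows, today):
    """Return {component_id: unique project count} for the 30 days ending today."""
    # Assumption: the window includes `today` and the 29 days before it.
    window_start = today - timedelta(days=29)
    projects_by_component = {}
    for component_id, project_id, used_on in usage_rows:
        if window_start <= used_on <= today:
            projects_by_component.setdefault(component_id, set()).add(project_id)
    return {cid: len(pids) for cid, pids in projects_by_component.items()}


def sort_by_popularity(counts):
    """Order components by usage count, highest first; ties broken by id."""
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


rows = [
    (1, 10, date(2024, 5, 9)),
    (1, 10, date(2024, 5, 1)),   # same project, counted once
    (1, 11, date(2024, 4, 10)),  # first day of the window
    (1, 12, date(2024, 4, 9)),   # one day too old
    (2, 10, date(2024, 4, 10)),
]
counts = last_30_day_usage(rows, date(2024, 5, 9))
ranking = sort_by_popularity(counts)
print(ranking)
```

Storing the resulting count per component once a day would let the Catalog sort or filter by popularity without scanning raw usage rows on each request.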
Task 3: Implement Internal Events for the purpose of monitoring usage trends in Tableau.