Add a definition for all Snowplow events emitted by the GitLab backend
Problem
Some events are being sent to our Snowplow collector without an event definition. These events are effectively undocumented, which makes event discovery harder and makes it difficult to estimate migration efforts and similar work.
The "undocumented" events for the backend can be found with the following query (based on the assumption that backend events don't emit a page_url_path):
SELECT event_action
FROM prod.common_mart.mart_behavior_structured_event
WHERE behavior_at > CURRENT_DATE - 7
AND event_action NOT IN (SELECT action FROM SREHM_PREP.PUBLIC.EVENT_DEFINITIONS)
AND page_url_path IS NULL
AND app_id = 'gitlab'
GROUP BY 1
NOTE: SREHM_PREP.PUBLIC.EVENT_DEFINITIONS is a manual upload of the CSV export from metrics.gitlab.com/events, so it needs to be kept up to date manually; otherwise the above query might include events that are already defined. It should be accessible to anyone with the SNOWFLAKE_ANALYST role.
Alternatively, you can create your own version of the table:
- Go to metrics.gitlab.com/events and click the export button.
- In Snowflake, navigate to Data > Add Data.
- Select a warehouse and select the [your_username]_prep database.
- Under File format, click View options.
- Under Header, select First line contains header.
- Then import the data. You can now access it in your own table.
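Once the export is downloaded, the same comparison can also be run locally as a quick sanity check. The following is a minimal sketch, assuming the CSV export has an `action` column (as the query above implies) and that you already have a list of observed event actions, for example from the query result. The function name and the sample data are hypothetical.

```python
import csv
import io


def undocumented_actions(observed_actions, definitions_csv):
    """Return observed event actions that have no row in the definitions export.

    Assumes the export has an `action` column, as the query above implies.
    """
    reader = csv.DictReader(definitions_csv)
    # Collect every non-empty action found in the export.
    defined = {row["action"].strip() for row in reader if row.get("action")}
    # Anything observed but not defined still needs an event definition.
    return sorted(set(observed_actions) - defined)


# Hypothetical sample data standing in for the real export and query result.
sample_export = io.StringIO("action,product_group\ncreate_issue,project_management\n")
print(undocumented_actions(["create_issue", "some_backend_event"], sample_export))
```

With a real export, pass an open file handle instead of the `io.StringIO` sample.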
Desired Outcome
All events emitted by the GitLab backend are documented.
Potential Solution
- Build a Snowflake query that can check tier and/or identifiers data based on the event name
Then, for each event:
- Manually find the MR that introduced the event [we could try to automate this with a script, but it would still require manual confirmation]
- Based on the MR, write a description
- Write down the MR metadata into attributes like milestone, product_group [it's possible that this could be automated]
- Fill out tier & identifiers based on the Snowflake data and/or the MR diff
Some of the events might require talking with the team that introduced them to fill out attributes like description, unclear product_groups, etc.
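The per-event steps above could be tracked with a small helper that records what is already known about an event and lists what still needs manual follow-up. This is only a sketch: the field names mirror the attributes mentioned in this issue (description, milestone, product_group, tier, identifiers), the exact event definition schema is not confirmed here, and `draft_definition` and all sample values are hypothetical.

```python
# Attributes named in this issue; the real definition schema may require more.
REQUIRED_FIELDS = ("description", "milestone", "product_group", "tier", "identifiers")


def draft_definition(action, **known):
    """Build a draft record for one event and list the fields still missing."""
    draft = {"action": action}
    # Keep only attributes that actually have a value.
    draft.update({key: value for key, value in known.items() if value})
    missing = [field for field in REQUIRED_FIELDS if field not in draft]
    return draft, missing


draft, missing = draft_definition(
    "some_backend_event",  # hypothetical event name
    milestone="16.0",  # hypothetical, would come from the MR
    identifiers=["user", "project"],  # hypothetical, from Snowflake data or the MR diff
)
print(missing)
```

Events with a non-empty `missing` list are the ones that likely need a conversation with the team that introduced them.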