Add background worker to push funnel configurations to configurator endpoint
Problem to solve
With Create new configurator endpoint and migration ... (#438033 - moved) we will have added a new configurator endpoint to store funnel definitions in the Clickhouse database. We now need to push the projects funnel definitions to this endpoint, so Cube can make use of them.
Proposed solution
As laid out in #427860 (comment 1688776812) and #438035 (comment 1730480472), when a new merge or push event takes place within the project's repo, we should:
- Trigger a background worker to retrieve the funnels from
.gitlab/analytics/funnels/
. - Validate each of the funnels to make sure it meets our schema requirements. Ignoring any which don't meet them.
- Send the funnels data needed by the configurator endpoint to meet the endpoint requirements.
Examples
Example patch
Index: config/sidekiq_queues.yml
===================================================================
diff --git a/config/sidekiq_queues.yml b/config/sidekiq_queues.yml
--- a/config/sidekiq_queues.yml (revision HEAD)
+++ b/config/sidekiq_queues.yml (revision Staged)
@@ -559,6 +559,8 @@
- 1
- - product_analytics_post_push
- 1
+- - product_analytics_sync_funnels
+ - 1
- - project_cache
- 1
- - project_destroy
Index: ee/app/services/ee/git/branch_push_service.rb
===================================================================
diff --git a/ee/app/services/ee/git/branch_push_service.rb b/ee/app/services/ee/git/branch_push_service.rb
--- a/ee/app/services/ee/git/branch_push_service.rb (revision HEAD)
+++ b/ee/app/services/ee/git/branch_push_service.rb (revision Staged)
@@ -11,6 +11,7 @@
enqueue_zoekt_indexing
enqueue_update_external_pull_requests
enqueue_product_analytics_event_metrics
+ enqueue_product_analytics_sync_funnels
super
end
@@ -24,6 +25,13 @@
::ProductAnalytics::PostPushWorker.perform_async(project.id, newrev, current_user.id)
end
+ def enqueue_product_analytics_sync_funnels
+ return unless project.product_analytics_enabled?
+ return unless default_branch?
+
+ ::ProductAnalytics::SyncFunnelsWorker.perform_async(project.id, newrev, current_user.id)
+ end
+
def enqueue_elasticsearch_indexing
return unless should_index_commits?
Index: ee/app/workers/product_analytics/sync_funnels_worker.rb
===================================================================
diff --git a/ee/app/workers/product_analytics/sync_funnels_worker.rb b/ee/app/workers/product_analytics/sync_funnels_worker.rb
new file mode 100644
--- /dev/null (revision Staged)
+++ b/ee/app/workers/product_analytics/sync_funnels_worker.rb (revision Staged)
@@ -0,0 +1,28 @@
+# frozen_string_literal: true
+
+module ProductAnalytics
+ class SyncFunnelsWorker
+ include ApplicationWorker
+
+ data_consistency :sticky
+ feature_category :product_analytics_data_management
+ idempotent!
+
+ def perform(project_id, newrev, user_id)
+ @project = Project.find_by_id(project_id)
+ @commit = @project.repository.commit(newrev)
+ @user_id = user_id
+
+ sync_funnels
+ end
+
+ private
+
+ def sync_funnels
+ funnel_files = @commit.deltas.select { |delta| delta.old_path.start_with?(".gitlab/analytics/funnels/") }
+ new_files = funnel_files.select { |file| file.new_file }
+ renamed_files = funnel_files.select { |file| file.renamed_file }
+ deleted_files = funnel_files.select { |file| file.deleted_file }
+ end
+ end
+end
Index: ee/app/workers/all_queues.yml
===================================================================
diff --git a/ee/app/workers/all_queues.yml b/ee/app/workers/all_queues.yml
--- a/ee/app/workers/all_queues.yml (revision HEAD)
+++ b/ee/app/workers/all_queues.yml (revision Staged)
@@ -1686,6 +1686,15 @@
:weight: 1
:idempotent: true
:tags: []
+- :name: product_analytics_sync_funnels
+ :worker_name: ProductAnalytics::SyncFunnelsWorker
+ :feature_category: :product_analytics_data_management
+ :has_external_dependencies: true
+ :urgency: :low
+ :resource_boundary: :unknown
+ :weight: 1
+ :idempotent: true
+ :tags: []
- :name: project_import_schedule
:worker_name: ProjectImportScheduleWorker
:feature_category: :source_code_management
Index: ee/spec/services/ee/git/branch_push_service_spec.rb
===================================================================
diff --git a/ee/spec/services/ee/git/branch_push_service_spec.rb b/ee/spec/services/ee/git/branch_push_service_spec.rb
--- a/ee/spec/services/ee/git/branch_push_service_spec.rb (revision HEAD)
+++ b/ee/spec/services/ee/git/branch_push_service_spec.rb (revision Staged)
@@ -215,13 +215,23 @@
end
with_them do
- it 'enqueues the worker if appropriate' do
+ it 'enqueues the post push worker if appropriate' do
if called
expect(::ProductAnalytics::PostPushWorker).to receive(:perform_async).once
else
expect(::ProductAnalytics::PostPushWorker).not_to receive(:perform_async)
end
+ subject.execute
+ end
+
+ it 'enqueues the sync funnels worker if appropriate' do
+ if called
+ expect(::ProductAnalytics::SyncFunnelsWorker).to receive(:perform_async).once
+ else
+ expect(::ProductAnalytics::SyncFunnelsWorker).not_to receive(:perform_async)
+ end
+
subject.execute
end
end
Example data from commit deltas
# New file
=> [#<Gitlab::Git::Diff:0x00000002944fd0d0
@a_mode="0",
@b_mode="100644",
@deleted_file=false,
@diff="",
@expanded=true,
@generated=nil,
@new_file=true,
@new_path=".gitlab/analytics/funnels/gdk_tracking_dashboards.yaml",
@old_path=".gitlab/analytics/funnels/gdk_tracking_dashboards.yaml",
@overflow=false,
@renamed_file=false>]
# Updated file
=> [#<Gitlab::Git::Diff:0x00000002cf23bf50
@a_mode="100644",
@b_mode="100644",
@deleted_file=false,
@diff="",
@expanded=true,
@generated=nil,
@new_file=false,
@new_path=".gitlab/analytics/funnels/gdk_tracking_dashboards.yaml",
@old_path=".gitlab/analytics/funnels/gdk_tracking_dashboards_1.yaml",
@overflow=false,
@renamed_file=true>]
# Deleted file
=> [#<Gitlab::Git::Diff:0x00000002949f69b0
@a_mode="100644",
@b_mode="0",
@deleted_file=true,
@diff="",
@expanded=true,
@generated=nil,
@new_file=false,
@new_path=".gitlab/analytics/funnels/gdk_tracking_dashboards.yaml",
@old_path=".gitlab/analytics/funnels/gdk_tracking_dashboards.yaml",
@overflow=false,
@renamed_file=false>]
3️⃣
Implementation plan - - Create a new background worker and add it to sidekiq.
- Trigger the new background worker within
::EE::Git::BranchPushService
in the same way asenqueue_product_analytics_event_metrics
.- The worker should only be run when:
- Product analytics is enabled.
- The commit has been applied to the default branch.
- The worker should only be run when:
- The new background worker should find any funnels that were changed by the commit and:
- Validate them against the funnels' schema, ignoring any that are invalid.
- Determine if the project being edited is being used by other projects within the same group for custom dashboards storage.
- If it is, it should collate the list of projects it affects, and send those project IDs to the configurator endpoint instead.
- Send the funnels data needed by the configurator endpoint to meet the endpoint requirements.
- This should use the existing funnels model.
-
note: The funnels model currently has a bug where if you're using a custom dashboard location, it
500
errors because we're not using@config_project
at https://gitlab.com/gitlab-org/gitlab/-/blob/master/ee/app/models/product_analytics/funnel.rb#L46-46.
- Update and add tests.
Proposed endpoint input
{
"funnels": [
{
"name": 'Funnel 1',
"schema": "[ {\"name\": \"completed_purchase\",\"sql\": \" SELECT\n (SELECT max(derived_tstamp) FROM snowplow_events) as x,\n windowFunnel(3600)(toDateTime(derived_tstamp), page_urlpath = '/page1.html', page_urlpath = '/page2.html', page_urlpath = '/page3.html') as step\n FROM gitlab_project_25.snowplow_events\n\",\"steps\": [\"page_urlpath = '/page1.html'\",\"page_urlpath = '/page2.html'\",\"page_urlpath = '/page3.html'\"]}]",,
"state": "created"
},
{
"name": 'Funnel 2',
"state": "deleted"
},
{
"name": 'Funnel 3',
"schema": "FULL NEW SCHEMA OBJECT",
"state": "updated"
},
{
"name": 'Funnel 5',
"schema": "FULL NEW SCHEMA OBJECT",
"state": "updated",
"previous_name": "Funnel 4"
},
]
}
Edited by Halil Coban