Service for counting contributors in a group
What does this MR do and why?
This MR implements a service that returns the number of unique contributors (user IDs) within a given group and its subgroups. The feature uses the optional ClickHouse database. To compute the contributor count, we use the same query rules in the materialized view as the PostgreSQL-based contribution graph query.
Note: the data will be populated manually ("by hand") on production. I have already prepared the table on staging.
Query:
SELECT count(distinct author_id) AS contributor_count
FROM (
  SELECT
    argMax(author_id, contributions.updated_at) AS author_id
  FROM contributions
  WHERE startsWith(path, '9970/')
    AND "contributions"."created_at" >= '2017-01-01'
    AND "contributions"."created_at" <= '2018-12-30'
  GROUP BY id
) contributions
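
The logic of the query above can be sketched in plain Ruby. This is only an illustration on in-memory data, not the service's code: the `Contribution` struct, the `contributor_count` method, and the sample rows are hypothetical. It also assumes that the table may hold several versions of the same `id`, which is presumably why the inner query picks the author with `argMax` over `updated_at` before the distinct count.

```ruby
require "date"

# Hypothetical in-memory stand-in for rows of the `contributions` table.
Contribution = Struct.new(:id, :path, :author_id, :created_at, :updated_at, keyword_init: true)

# Mirrors the query: filter by path prefix and created_at range, keep the
# author of the most recently updated version of each id (the argMax step),
# then count the distinct authors.
def contributor_count(rows, path_prefix:, from:, to:)
  filtered = rows.select do |row|
    row.path.start_with?(path_prefix) && row.created_at >= from && row.created_at <= to
  end

  latest_authors = filtered
    .group_by(&:id)
    .map { |_id, versions| versions.max_by(&:updated_at).author_id }

  latest_authors.uniq.size
end

D = ->(y, m, d) { Date.new(y, m, d) }

ROWS = [
  # Two versions of the same row (id 1); only the newest one is considered.
  Contribution.new(id: 1, path: "9970/", author_id: 10, created_at: D[2017, 3, 1], updated_at: D[2017, 3, 1]),
  Contribution.new(id: 1, path: "9970/", author_id: 10, created_at: D[2017, 3, 1], updated_at: D[2017, 3, 2]),
  # Same author contributing in a subgroup: counted once.
  Contribution.new(id: 2, path: "9970/9971/", author_id: 10, created_at: D[2017, 5, 1], updated_at: D[2017, 5, 1]),
  Contribution.new(id: 3, path: "9970/9971/", author_id: 11, created_at: D[2018, 1, 1], updated_at: D[2018, 1, 1]),
  # Different top-level group: excluded by the path prefix.
  Contribution.new(id: 4, path: "1234/", author_id: 12, created_at: D[2017, 6, 1], updated_at: D[2017, 6, 1]),
  # Outside the date range: excluded.
  Contribution.new(id: 5, path: "9970/", author_id: 13, created_at: D[2019, 1, 1], updated_at: D[2019, 1, 1])
].freeze

COUNT = contributor_count(ROWS, path_prefix: "9970/", from: D[2017, 1, 1], to: D[2018, 12, 30])
puts COUNT
```

With these sample rows, only authors 10 and 11 fall inside the group hierarchy and the date range, so the sketch prints 2.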
Query plan:

| Expression ((Projection + Before ORDER BY)) |
|---------------------------------------------------------------------------------------------------------------|
| Aggregating |
| Expression ((Before GROUP BY + (Projection + Before ORDER BY))) |
| Aggregating |
| Expression (Before GROUP BY) |
| ReadFromMergeTree (gitlab_clickhouse_main_staging.contributions) |
| Indexes: |
| MinMax |
| Keys: |
| created_at |
| Condition: and((created_at in (-Inf, 17895]), (created_at in [17167, +Inf))) |
| Parts: 7/28 |
| Granules: 2062/2382 |
| Partition |
| Keys: |
| toYear(created_at) |
| Condition: and((toYear(created_at) in (-Inf, 2018]), (toYear(created_at) in [2017, +Inf))) |
| Parts: 7/7 |
| Granules: 2062/2062 |
| PrimaryKey |
| Keys: |
| path |
| created_at |
| Condition: and((path in ['9970', '9971')), and((created_at in (-Inf, 17895]), (created_at in [17167, +Inf)))) |
| Parts: 5/7 |
| Granules: 8/2062 |
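
The numeric bounds in the plan's `created_at` conditions (17167 and 17895) can be cross-checked against the dates in the query. The sketch below assumes that `created_at` is compared as a number of days since 1970-01-01, which is how the values in the plan read; the `days_since_epoch` helper is hypothetical and only used for this check.

```ruby
require "date"

EPOCH = Date.new(1970, 1, 1)

# Number of whole days between the Unix epoch and the given date.
def days_since_epoch(date)
  (date - EPOCH).to_i
end

LOWER_BOUND = days_since_epoch(Date.new(2017, 1, 1))   # lower bound used in the query
UPPER_BOUND = days_since_epoch(Date.new(2018, 12, 30)) # upper bound used in the query

puts LOWER_BOUND # 17167
puts UPPER_BOUND # 17895

# Share of granules left after the primary key condition (8 out of 2062).
GRANULE_SHARE = (8.0 / 2062 * 100).round(2)
puts "#{GRANULE_SHARE}%"
```

Both values match the plan, and the primary key narrows the read to 8 of 2062 granules, which is well under 1% of the granules that remain after partition pruning.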
How to validate locally
Enable the feature flags:
Feature.enable(:clickhouse_data_collection)
Feature.enable(:event_sync_worker_for_click_house)
- Ensure that your instance has an Ultimate license
- Ensure that ClickHouse is configured: https://docs.gitlab.com/ee/development/database/clickhouse/clickhouse_within_gitlab.html
- To prepare the database schema, run:
bundle exec rake gitlab:clickhouse:migrate
- If your GDK is seeded, you probably already have some event records. You can sync them to ClickHouse:
ClickHouse::EventsSyncWorker.new.perform
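- The service should return a count. `described_class` is only available inside specs, so in a Rails console replace it with the service class added in this MR: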
described_class.new(
group: Group.find(1),
current_user: User.find(2),
from: Date.new(2020, 1, 1),
to: Date.new(2023, 11, 29)
).execute
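
To show how the `from` and `to` arguments and the group hierarchy relate to the query shown earlier, here is a rough sketch that assembles the same SQL from those inputs. The `contributor_count_sql` helper and its `path_prefix` parameter are hypothetical and not taken from the MR; the actual service may build and bind the query differently.

```ruby
require "date"

# Hypothetical helper, not part of the MR: renders the contributor count
# query for a group path prefix and a created_at date range.
def contributor_count_sql(path_prefix:, from:, to:)
  # Accept only digits and slashes (for example "9970/" or "9970/9971/")
  # so that the prefix can be inlined into the SQL text safely.
  raise ArgumentError, "invalid path prefix" unless path_prefix.match?(%r{\A(\d+/)+\z})

  <<~SQL
    SELECT count(distinct author_id) AS contributor_count
    FROM (
      SELECT argMax(author_id, contributions.updated_at) AS author_id
      FROM contributions
      WHERE startsWith(path, '#{path_prefix}')
        AND contributions.created_at >= '#{from.iso8601}'
        AND contributions.created_at <= '#{to.iso8601}'
      GROUP BY id
    ) contributions
  SQL
end

SQL_EXAMPLE = contributor_count_sql(
  path_prefix: "9970/",
  from: Date.new(2017, 1, 1),
  to: Date.new(2018, 12, 30)
)
puts SQL_EXAMPLE
```

String interpolation is used here only to keep the sketch short; the strict prefix check and the ISO 8601 date formatting are what keep it safe in this limited form.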
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.
Related to #432067 (closed)
Edited by Pedro Pombeiro