Group-level DORA metrics API
What does this MR do?
In the previous MR, we added project-level DORA metrics API support. This MR is to extend the capability to group-level. The API interface is same with project-level API. It simply extracts the subject projects from the given group and execute the same query for daily metrics.
Given that we re-designed this feature for resolving previous performance concerns, the new API improved the timing drastically. You can see the actual query performance below.
A few notes:
- This MR is behind
dora_daily_metrics
feature flag, which is disabled by default. - In favor of this new API, the old APIs will be deprecated due to the performance issue.
- You can see the full PoC in this MR.
- Related #291746 (closed)
Manual QA
Metric: Deployment Frequency
Sample payload (Local)
> # Daily
> curl -H 'Private-token: XXX' "http://local.gitlab.test:8181/api/v4/groups/134/dora/metrics?metric=deployment_frequency&start_date=2021-03-01"
[{"2021-03-01":0},{"2021-03-02":1},{"2021-03-03":0},{"2021-03-04":0},{"2021-03-05":0},{"2021-03-06":0},{"2021-03-07":0},{"2021-03-08":0},{"2021-03-09":2}]
> # Monthly
> curl -H 'Private-token: XXX' "http://local.gitlab.test:8181/api/v4/groups/134/dora/metrics?metric=deployment_frequency&start_date=2021-03-01&interval=monthly"
[{"2021-03-01":3}]
> # All
> curl -H 'Private-token: XXX' "http://local.gitlab.test:8181/api/v4/groups/134/dora/metrics?metric=deployment_frequency&start_date=2021-03-01&interval=all"
3
Sample query
Conditions:
- Date range: 3 months
- Target group: https://gitlab.com/gitlab-org
- Interval: daily
- Executor: database-lab
- Using warm cache for
Group.all_projects
in order to reduce the noise of slow I/O on database-lab. See more details in the slack discussion.
Dummy data insertion (because the data collection has just began and production doesn't have enough date)
DO
$do$
DECLARE
d date;
BEGIN
FOR d in (select generate_series( '2020-12-01'::date, '2021-03-01', '1 day' )::date) LOOP
INSERT INTO dora_daily_metrics (environment_id, date, deployment_frequency, lead_time_for_changes_in_seconds)
VALUES (1178942, d, 10, 5);
END LOOP;
END
$do$;
Query
SELECT date, SUM(deployment_frequency) AS data
FROM "dora_daily_metrics"
WHERE "dora_daily_metrics"."environment_id" IN (
SELECT "environments"."id" FROM "environments" WHERE "environments"."project_id" IN (
SELECT "projects"."id" FROM "projects" WHERE "projects"."namespace_id" IN (
WITH RECURSIVE "base_and_descendants" AS (
(SELECT "namespaces".* FROM "namespaces" WHERE "namespaces"."type" = 'Group' AND "namespaces"."id" = 9970)
UNION
(SELECT "namespaces".* FROM "namespaces", "base_and_descendants" WHERE "namespaces"."type" = 'Group' AND "namespaces"."parent_id" = "base_and_descendants"."id")
)
SELECT id FROM "base_and_descendants" AS "namespaces"
)
) AND "environments"."tier" = 0 AND (tier IS NOT NULL)
) AND "dora_daily_metrics"."date" BETWEEN '2020-12-01' AND '2021-03-01'
GROUP BY "dora_daily_metrics"."date"
ORDER BY date ASC
Plan
- w/cold cache https://explain.depesz.com/s/Zd63H
- w/warm cache https://explain.depesz.com/s/Rh3i
Timing (w/ cold cache)
Time: 238.196 ms
- planning: 4.799 ms
- execution: 233.397 ms (estimated* for prod: 0.012...0.239 s)
- I/O read: 227.539 ms
- I/O write: N/A
Shared buffers:
- hits: 927 (~7.20 MiB) from the buffer pool
- reads: 240 (~1.90 MiB) from the OS file cache, including disk I/O
- dirtied: 1 (~8.00 KiB)
- writes: 0
A few notes:
- While it takes 222.721 msec on the index scan with
index_dora_daily_metrics_on_environment_id_and_date
, it mostly spends time onI/O Timings: read=218.642
. So the timing really depends on the I/O speed on the PostgreSQL instance.database-lab
is typically having a slow I/O.
Timing (w/ warm cache)
Time: 6.843 ms
- planning: 2.735 ms
- execution: 4.108 ms
- I/O read: N/A
- I/O write: N/A
Shared buffers:
- hits: 1292 (~10.10 MiB) from the buffer pool
- reads: 0 from the OS file cache, including disk I/O
- dirtied: 0
- writes: 0
recursive_namespace_lookup_as_inner_join
Bonus: with
Timing (w/ warm cache)
Time: 6.880 ms
- planning: 3.019 ms
- execution: 3.861 ms
- I/O read: N/A
- I/O write: N/A
Shared buffers:
- hits: 245 (~1.90 MiB) from the buffer pool
- reads: 0 from the OS file cache, including disk I/O
- dirtied: 0
- writes: 0
Bonus: Query plan in production replica (not db-lab)
https://explain.depesz.com/s/LF1n
Planning Time: 2.067 ms
Execution Time: 23.167 ms
Does this MR meet the acceptance criteria?
Conformity
-
📋 Does this MR need a changelog?- [-] I have included a changelog entry.
-
I have not included a changelog entry because it's behind a feature flag.
- [-] Documentation (if required) => TO be added at feature flag removal.
- [-] Code review guidelines
- [-] Merge request performance guidelines
- [-] Style guides
-
Database guides -
Separation of EE specific content
Availability and Testing
-
Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process. - [-] Tested in all supported browsers
- [-] Informed Infrastructure department of a default or new setting change, if applicable per definition of done
Security
If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:
- [-] Label as security and @ mention
@gitlab-com/gl-security/appsec
- [-] The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
- [-] Security reports checked/validated by a reviewer from the AppSec team