
Use the new VSA query backend when loading records

Adam Hegyi requested to merge 335391-vsa-records-endpoint into master

What does this MR do and why?

This change updates the value stream analytics records endpoint to optionally use the aggregated backend. The new backend provides much better performance, and we plan to enable it by default in 15.0.

VSA runs a few different queries:

  • Median (already implemented)
  • Count (already implemented)
  • Average (will be implemented as a follow-up)
  • Related records (this MR)


Implementation

We have a central class that builds VSA queries: Gitlab::Analytics::CycleAnalytics::DataCollector (this will go away at some point). Within this class, we optionally call the new queries by invoking Gitlab::Analytics::CycleAnalytics::Aggregated::DataCollector.
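
Roughly, the dispatch looks like the sketch below. This is an illustration only: the class and feature flag names come from this MR, but the method names and the flag actor are assumptions.

  # Simplified sketch of the optional dispatch inside DataCollector.
  # Method names and the flag actor are assumptions for illustration.
  class DataCollector
    def records_fetcher
      if use_aggregated_backend?
        aggregated_data_collector.records_fetcher
      else
        legacy_records_fetcher
      end
    end

    private

    def use_aggregated_backend?
      Feature.enabled?(:use_vsa_aggregated_tables, stage.parent)
    end

    def aggregated_data_collector
      Aggregated::DataCollector.new(stage: stage, params: params)
    end
  end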

The scopes and the base query builder are tested within this MR. The ee/spec/lib/gitlab/analytics/cycle_analytics/data_collector_spec.rb test file, which runs various high-level tests related to VSA, has been modified to cover both cases (current and new).
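
The dual coverage follows the usual pattern of toggling the feature flag around shared examples; a minimal sketch, assuming the standard stub_feature_flags spec helper (the example names are illustrative, not the actual spec contents):

  RSpec.describe Gitlab::Analytics::CycleAnalytics::DataCollector do
    shared_examples 'loads the related records' do
      it 'returns the records for the stage' do
        # high-level assertions on the records returned by the collector
      end
    end

    context 'with the legacy backend' do
      before { stub_feature_flags(use_vsa_aggregated_tables: false) }

      it_behaves_like 'loads the related records'
    end

    context 'with the aggregated backend' do
      before { stub_feature_flags(use_vsa_aggregated_tables: true) }

      it_behaves_like 'loads the related records'
    end
  end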

How to set up and validate locally

  1. Enable the feature
    Feature.enable(:use_vsa_aggregated_tables)
  2. Seed a new VSA project
    SEED_CYCLE_ANALYTICS=true SEED_VSA=true FILTER=cycle_analytics rake db:seed_fu
  3. The seed script prints the project path; copy it and navigate to the project.
  4. Go to the group.
  5. Go to Analytics > Value Stream
  6. Open the top-right dropdown and select Create new Value Stream.
  7. Add a name and save.
  8. Start a rails console and aggregate the data
    group = Group.find(x)
    Analytics::CycleAnalytics::DataLoaderService.new(group: group, model: Issue).execute
    Analytics::CycleAnalytics::DataLoaderService.new(group: group, model: MergeRequest).execute
  9. Load the VSA page again.
  10. Inspect the records endpoint requests; the _stage_events tables should be used (see the console sanity check sketched after this list).
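
As a quick sanity check after step 8, the aggregated tables should contain rows. The model names below are assumptions based on the _stage_events table naming:

  # Run in the rails console; non-zero counts indicate the loaders populated
  # the aggregated stage event tables.
  Analytics::CycleAnalytics::IssueStageEvent.count
  Analytics::CycleAnalytics::MergeRequestStageEvent.count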

Database

Record loading query example: https://explain.depesz.com/s/WJEa

The new query is faster than the current one (current: about 1s). I'm planning to optimize it further as a follow-up using this technique.

It needs a bit more logic since the technique cannot be applied to all queries, for example when filters are added. The query also performs well on the project level: https://explain.depesz.com/s/b54r
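
For context, the record-loading query against the new tables roughly has this shape (illustrative only; the model, column, and variable names here are assumptions, not the exact query):

  # Load the latest finished items for one stage, scoped to a project.
  Analytics::CycleAnalytics::IssueStageEvent
    .where(stage_event_hash_id: stage_event_hash_id) # identifies the stage's start/end event pair
    .where(project_id: project.id)
    .order(end_event_timestamp: :desc)
    .limit(20)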

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #335391 (closed)

