Skip to content

Changes candidate table to Keyset Pagination

Eduardo Bonet requested to merge mlops/add-keyset-pagination-to-candidates into master

What does this MR do and why?

Follow up from !108242 (merged), changes page pagination to keyset pagination

Screenshots

Before after
image image

Database

Fetching second page, with query, ordering by metric, limit 30:

SELECT
    ml_candidates.*,
    latest.value AS metric_value
FROM
    "ml_candidates"
    INNER JOIN ( SELECT DISTINCT ON (candidate_id, name)
            *
        FROM
            "ml_candidate_metrics"
        WHERE
            "ml_candidate_metrics"."name" = 'accuracy'
        ORDER BY
            candidate_id,
            name,
            id DESC) latest ON latest.candidate_id = ml_candidates.id
WHERE
    "ml_candidates"."experiment_id" = 13
    AND (ml_candidates.name LIKE '%andidate%')
    AND ('7.1' IS NULL
        AND "latest"."value" IS NULL
        AND "ml_candidates"."id" < 125
        OR '7.1' IS NOT NULL
        AND "latest"."value" IS NULL
        OR '7.1' IS NOT NULL
        AND "latest"."value" < '7.1'
        OR '7.1' IS NOT NULL
        AND "latest"."value" = '7.1'
        AND "ml_candidates"."id" < 125)
ORDER BY
    "latest"."value" DESC,
    "ml_candidates"."id" DESC
LIMIT 31

Execution Plan https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/14866/commands/51923

Limit  (cost=12.28..12.29 rows=1 width=114) (actual time=2.656..2.659 rows=0 loops=1)
   Buffers: shared hit=9 read=1
   I/O Timings: read=2.584 write=0.000
   ->  Sort  (cost=12.28..12.29 rows=1 width=114) (actual time=2.655..2.657 rows=0 loops=1)
         Sort Key: latest.value DESC, ml_candidates.id DESC
         Sort Method: quicksort  Memory: 25kB
         Buffers: shared hit=9 read=1
         I/O Timings: read=2.584 write=0.000
         ->  Nested Loop  (cost=9.21..12.27 rows=1 width=114) (actual time=2.621..2.623 rows=0 loops=1)
               Buffers: shared hit=3 read=1
               I/O Timings: read=2.584 write=0.000
               ->  Index Scan using index_ml_candidates_on_experiment_id_and_iid on public.ml_candidates  (cost=0.14..3.16 rows=1 width=106) (actual time=2.620..2.620 rows=0 loops=1)
                     Index Cond: (ml_candidates.experiment_id = 13)
                     Filter: (ml_candidates.name ~~ '%andidate%'::text)
                     Rows Removed by Filter: 0
                     Buffers: shared hit=3 read=1
                     I/O Timings: read=2.584 write=0.000
               ->  Subquery Scan on latest  (cost=9.07..9.10 rows=1 width=16) (actual time=0.000..0.000 rows=0 loops=0)
                     Filter: ((latest.value IS NULL) OR (latest.value < '7.1'::double precision) OR (latest.value = '7.1'::double precision))
                     Rows Removed by Filter: 0
                     I/O Timings: read=0.000 write=0.000
                     ->  Unique  (cost=9.07..9.08 rows=1 width=90) (actual time=0.000..0.000 rows=0 loops=0)
                           I/O Timings: read=0.000 write=0.000
                           ->  Sort  (cost=9.07..9.08 rows=1 width=90) (actual time=0.000..0.000 rows=0 loops=0)
                                 Sort Key: ml_candidate_metrics.candidate_id, ml_candidate_metrics.id DESC
                                 I/O Timings: read=0.000 write=0.000
                                 ->  Seq Scan on public.ml_candidate_metrics  (cost=0.00..9.06 rows=1 width=90) (actual time=0.000..0.000 rows=0 loops=0)
                                       Filter: (ml_candidate_metrics.name = 'accuracy'::text)
                                       Rows Removed by Filter: 0
                                       I/O Timings: read=0.000 write=0.000

How to set up and validate locally

  1. Enable the feature flag

    echo "Feature.enable(:ml_experiment_tracking)" | bundle exec rails c
  2. Create an Experiment and 10 candidates

    # Added variables
    user_id = 1
    project_id = 1
    
    exp = Ml::Experiment.create!(name: "Awesome Gitlab Experiment", user_id: user_id, project_id: project_id)
    100.times.each { |i| exp.candidates.create!(user_id: user_id, start_time: 0, name: "candidate_#{i}") }
    exp.candidates.each_with_index { |c, i| c.metrics.create!(name: "auc", value: i*0.1 , tracked_at: Time.zone.now, step: 1)} 
    exp.candidates.each_with_index { |c, i| c.metrics.create!(name: "accuracy", value: i* 0.1+0.1 , tracked_at: Time.zone.now, step: 1)} 
    exp.candidates.each { |c| c.params.create!(name: "algorithm", value: ["LogisticRegression", "DecisionTree"].sample )}
  3. Navigate to <your project>/-/ml/experiments and click on "Awesome Gitlab Experiment"

  4. Verify that pagination uses cursor now

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Eduardo Bonet

Merge request reports

Loading