Add index on events table on project_id, created_at to support Group::AnalyticsController (!17056) · Merge requests · GitLab.org / GitLab FOSS

Gregory Stark requested to merge index-for-groups-analytic-controller-4899 into master Feb 12, 2018

Group::AnalyticsController does a bunch of queries for recent events on a set of projects. For each one we were previously doing a nested loop to retrieve all events for those projects but we only need the last week so for long-lived projects it was very inefficient. Add an index on project_id,created_at to allow fetching only the recent events. It's still an unnecessarily large number of queries each of which does a nested loop from many projects which can have many events but it looks like this will be quite a bit faster, at least for projects older than a month or two.

Note that Group::AnalyticsController is EE-only but I don't see any compelling reason to have the schema diverge here.

stark@tweedle:~/gitlab/gdk/gitlab-development-kit/gitlab$ rake db:migrate
== 20180212121259 IndexForGroupsAnalyticController4899: migrating =============
-- transaction_open?()
   -> 0.0000s
-- execute("SET statement_timeout TO 0")
   -> 0.0004s
-- add_index(:events, [:project_id, :created_at], {:algorithm=>:concurrently})
   -> 0.2084s
== 20180212121259 IndexForGroupsAnalyticController4899: migrated (0.2091s) ====

== 20180212121259 IndexForGroupsAnalyticController4899: reverting =============
-- transaction_open?()
   -> 0.0000s
-- select_one("SELECT current_setting('server_version_num') AS v")
   -> 0.0008s
-- execute("SET statement_timeout TO 0")
   -> 0.0004s
-- remove_index(:events, {:algorithm=>:concurrently, :column=>[:project_id, :created_at]})
   -> 0.0156s
== 20180212121259 IndexForGroupsAnalyticController4899: reverted (0.0172s) ====

The new plan:

gitlabhq_production=# explain (analyze,buffers) SELECT COUNT(*) AS count_all, "events"."author_id" AS events_author_id FROM "events" WHERE (action = 5 OR (target_type IN ('MergeRequest','Issue') AND action IN (1,3,7)) OR (target_type = 'Note' AND action = 6)) AND (created_at > '2018-02-05') AND "events"."project_id" IN (SELECT "projects"."id" FROM "projects" WHERE "projects"."namespace_id" = 9970) AND "events"."target_type" = 'Issue' AND "events"."action" = 3 GROUP BY "events"."author_id";
                                                                                                                                   QUERY PLAN                                                                                                                                   
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=121.69..121.71 rows=1 width=12) (actual time=0.740..0.740 rows=0 loops=1)
   Group Key: events.author_id
   Buffers: shared hit=560
   ->  Sort  (cost=121.69..121.69 rows=1 width=4) (actual time=0.740..0.740 rows=0 loops=1)
         Sort Key: events.author_id
         Sort Method: quicksort  Memory: 25kB
         Buffers: shared hit=560
         ->  Nested Loop  (cost=1.00..121.68 rows=1 width=4) (actual time=0.729..0.729 rows=0 loops=1)
               Buffers: shared hit=560
               ->  Index Scan using index_projects_on_namespace_id on projects  (cost=0.43..38.59 rows=18 width=4) (actual time=0.013..0.137 rows=113 loops=1)
                     Index Cond: (namespace_id = 9970)
                     Buffers: shared hit=103
               ->  Index Scan using stark_tmp_issue_4899 on events  (cost=0.57..4.61 rows=1 width=8) (actual time=0.005..0.005 rows=0 loops=113)
                     Index Cond: ((project_id = projects.id) AND (created_at > '2018-02-05 00:00:00+00'::timestamp with time zone))
                     Filter: (((target_type)::text = 'Issue'::text) AND (action = 3) AND ((action = 5) OR (((target_type)::text = ANY ('{MergeRequest,Issue}'::text[])) AND (action = ANY ('{1,3,7}'::integer[]))) OR (((target_type)::text = 'Note'::text) AND (action = 6))))
                     Rows Removed by Filter: 0
                     Buffers: shared hit=457
 Planning time: 1.238 ms
 Execution time: 0.792 ms
(19 rows)

The old plan was (note the different "Index Cond" on index_events_on_project_id_and_id lacking the created_at column:

gitlabhq_production=# explain SELECT COUNT(*) AS count_all, "events"."author_id" AS events_author_id FROM "events" WHERE (action = 5 OR (target_type IN ('MergeRequest','Issue') AND action IN (1,3,7)) OR (target_type = 'Note' AND action = 6)) AND (created_at > '2018-02-05') AND "events"."project_id" IN (SELECT "projects"."id" FROM "projects" WHERE "projects"."namespace_id" = 9970) AND "events"."target_type" = 'Issue' AND "events"."action" = 3 GROUP BY "events"."author_id";
                                                                                                                                                                      QUERY PLAN                                                                                                                                                                      
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=39008.12..39008.14 rows=1 width=12)
   Group Key: events.author_id
   ->  Sort  (cost=39008.12..39008.13 rows=1 width=4)
         Sort Key: events.author_id
         ->  Nested Loop  (cost=1.00..39008.11 rows=1 width=4)
               ->  Index Scan using index_projects_on_namespace_id on projects  (cost=0.43..30.34 rows=19 width=4)
                     Index Cond: (namespace_id = 9970)
               ->  Index Scan using index_events_on_project_id_and_id on events  (cost=0.57..2051.45 rows=1 width=8)
                     Index Cond: (project_id = projects.id)
                     Filter: ((created_at > '2018-02-05 00:00:00+00'::timestamp with time zone) AND ((target_type)::text = 'Issue'::text) AND (action = 3) AND ((action = 5) OR (((target_type)::text = ANY ('{MergeRequest,Issue}'::text[])) AND (action = ANY ('{1,3,7}'::integer[]))) OR (((target_type)::text = 'Note'::text) AND (action = 6))))
(10 rows)

Database Checklist

When adding migrations:

Updated db/schema.rb
Added a down method so the migration can be reverted
Added the output of the migration(s) to the MR body
Added tests for the migration in spec/migrations if necessary (e.g. when migrating data)

When adding or modifying queries to improve performance:

Included data that shows the performance improvement, preferably in the form of a benchmark
Included the output of EXPLAIN (ANALYZE, BUFFERS) of the relevant queries

When adding foreign keys to existing tables:

Included a migration to remove orphaned rows in the source table before adding the foreign key
Removed any instances of dependent: ... that may no longer be necessary

When adding tables:

Ordered columns based on the Ordering Table Columns guidelines
Added foreign keys to any columns pointing to data in other tables
Added indexes for fields that are used in statements such as WHERE, ORDER BY, GROUP BY, and JOINs

When removing columns, tables, indexes or other structures:

Removed these in a post-deployment migration
Made sure the application no longer uses (or ignores) these structures

General Checklist

Changelog entry added, if necessary
Documentation created/updated
API support added
Tests added for this feature/bug
Review
- Has been reviewed by Backend
- Has been reviewed by Database
Conform by the merge request performance guides
Conform by the style guides
Squashed related commits together

Edited Feb 15, 2018 by Gregory Stark

Add index on events table on project_id, created_at to support Group::AnalyticsController

Database Checklist

General Checklist

Merge request reports