Add index on events table on project_id, created_at to support Group::AnalyticsController
Group::AnalyticsController does a bunch of queries for recent events on a set of projects. For each one we were previously doing a nested loop to retrieve all events for those projects but we only need the last week so for long-lived projects it was very inefficient. Add an index on project_id,created_at to allow fetching only the recent events. It's still an unnecessarily large number of queries each of which does a nested loop from many projects which can have many events but it looks like this will be quite a bit faster, at least for projects older than a month or two.
Note that Group::AnalyticsController is EE-only but I don't see any compelling reason to have the schema diverge here.
stark@tweedle:~/gitlab/gdk/gitlab-development-kit/gitlab$ rake db:migrate
== 20180212121259 IndexForGroupsAnalyticController4899: migrating =============
-- transaction_open?()
-> 0.0000s
-- execute("SET statement_timeout TO 0")
-> 0.0004s
-- add_index(:events, [:project_id, :created_at], {:algorithm=>:concurrently})
-> 0.2084s
== 20180212121259 IndexForGroupsAnalyticController4899: migrated (0.2091s) ====
== 20180212121259 IndexForGroupsAnalyticController4899: reverting =============
-- transaction_open?()
-> 0.0000s
-- select_one("SELECT current_setting('server_version_num') AS v")
-> 0.0008s
-- execute("SET statement_timeout TO 0")
-> 0.0004s
-- remove_index(:events, {:algorithm=>:concurrently, :column=>[:project_id, :created_at]})
-> 0.0156s
== 20180212121259 IndexForGroupsAnalyticController4899: reverted (0.0172s) ====
The new plan:
gitlabhq_production=# explain (analyze,buffers) SELECT COUNT(*) AS count_all, "events"."author_id" AS events_author_id FROM "events" WHERE (action = 5 OR (target_type IN ('MergeRequest','Issue') AND action IN (1,3,7)) OR (target_type = 'Note' AND action = 6)) AND (created_at > '2018-02-05') AND "events"."project_id" IN (SELECT "projects"."id" FROM "projects" WHERE "projects"."namespace_id" = 9970) AND "events"."target_type" = 'Issue' AND "events"."action" = 3 GROUP BY "events"."author_id";
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
GroupAggregate (cost=121.69..121.71 rows=1 width=12) (actual time=0.740..0.740 rows=0 loops=1)
Group Key: events.author_id
Buffers: shared hit=560
-> Sort (cost=121.69..121.69 rows=1 width=4) (actual time=0.740..0.740 rows=0 loops=1)
Sort Key: events.author_id
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=560
-> Nested Loop (cost=1.00..121.68 rows=1 width=4) (actual time=0.729..0.729 rows=0 loops=1)
Buffers: shared hit=560
-> Index Scan using index_projects_on_namespace_id on projects (cost=0.43..38.59 rows=18 width=4) (actual time=0.013..0.137 rows=113 loops=1)
Index Cond: (namespace_id = 9970)
Buffers: shared hit=103
-> Index Scan using stark_tmp_issue_4899 on events (cost=0.57..4.61 rows=1 width=8) (actual time=0.005..0.005 rows=0 loops=113)
Index Cond: ((project_id = projects.id) AND (created_at > '2018-02-05 00:00:00+00'::timestamp with time zone))
Filter: (((target_type)::text = 'Issue'::text) AND (action = 3) AND ((action = 5) OR (((target_type)::text = ANY ('{MergeRequest,Issue}'::text[])) AND (action = ANY ('{1,3,7}'::integer[]))) OR (((target_type)::text = 'Note'::text) AND (action = 6))))
Rows Removed by Filter: 0
Buffers: shared hit=457
Planning time: 1.238 ms
Execution time: 0.792 ms
(19 rows)
The old plan was (note the different "Index Cond" on index_events_on_project_id_and_id
lacking the created_at
column:
gitlabhq_production=# explain SELECT COUNT(*) AS count_all, "events"."author_id" AS events_author_id FROM "events" WHERE (action = 5 OR (target_type IN ('MergeRequest','Issue') AND action IN (1,3,7)) OR (target_type = 'Note' AND action = 6)) AND (created_at > '2018-02-05') AND "events"."project_id" IN (SELECT "projects"."id" FROM "projects" WHERE "projects"."namespace_id" = 9970) AND "events"."target_type" = 'Issue' AND "events"."action" = 3 GROUP BY "events"."author_id";
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
GroupAggregate (cost=39008.12..39008.14 rows=1 width=12)
Group Key: events.author_id
-> Sort (cost=39008.12..39008.13 rows=1 width=4)
Sort Key: events.author_id
-> Nested Loop (cost=1.00..39008.11 rows=1 width=4)
-> Index Scan using index_projects_on_namespace_id on projects (cost=0.43..30.34 rows=19 width=4)
Index Cond: (namespace_id = 9970)
-> Index Scan using index_events_on_project_id_and_id on events (cost=0.57..2051.45 rows=1 width=8)
Index Cond: (project_id = projects.id)
Filter: ((created_at > '2018-02-05 00:00:00+00'::timestamp with time zone) AND ((target_type)::text = 'Issue'::text) AND (action = 3) AND ((action = 5) OR (((target_type)::text = ANY ('{MergeRequest,Issue}'::text[])) AND (action = ANY ('{1,3,7}'::integer[]))) OR (((target_type)::text = 'Note'::text) AND (action = 6))))
(10 rows)
Database Checklist
When adding migrations:
-
Updated db/schema.rb
-
Added a down
method so the migration can be reverted -
Added the output of the migration(s) to the MR body -
Added tests for the migration in spec/migrations
if necessary (e.g. when migrating data)
When adding or modifying queries to improve performance:
-
Included data that shows the performance improvement, preferably in the form of a benchmark -
Included the output of EXPLAIN (ANALYZE, BUFFERS)
of the relevant queries
When adding foreign keys to existing tables:
-
Included a migration to remove orphaned rows in the source table before adding the foreign key -
Removed any instances of dependent: ...
that may no longer be necessary
When adding tables:
-
Ordered columns based on the Ordering Table Columns guidelines -
Added foreign keys to any columns pointing to data in other tables -
Added indexes for fields that are used in statements such as WHERE, ORDER BY, GROUP BY, and JOINs
When removing columns, tables, indexes or other structures:
-
Removed these in a post-deployment migration -
Made sure the application no longer uses (or ignores) these structures
General Checklist
-
Changelog entry added, if necessary -
Documentation created/updated -
API support added -
Tests added for this feature/bug - Review
-
Has been reviewed by Backend -
Has been reviewed by Database
-
-
Conform by the merge request performance guides -
Conform by the style guides -
Squashed related commits together