Metadata query `SELECT .. FROM pg_type` returns 20% of all tuples from GitLab.com's Postgres cluster
While investigating another issue I noticed that query: 2814304093225873107
utilizes a great deal of resources on our postgres instances.
Query
The query is as follows:
gitlabhq_production=# select * from pg_stat_statements where queryid=2814304093225873107;
-[ RECORD 1 ]-------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
userid | 16384
dbid | 16401
queryid | 2814304093225873107
query | SELECT t.oid, t.typname, t.typelem, t.typdelim, t.typinput, r.rngsubtype, t.typtype, t.typbasetype +
| FROM pg_type as t +
| LEFT JOIN pg_range as r ON oid = rngtypid +
| WHERE +
| t.typname IN ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17, $18, $19, $20, $21, $22, $23, $24, $25, $26, $27, $28, $29, $30, $31, $32, $33, $34, $35, $36, $37, $38, $39, $40)+
| OR t.typtype IN ($41, $42, $43) +
| OR t.typinput = $44::regprocedure +
| OR t.typelem != $45
calls | 7819708
total_time | 28423904.384813692
min_time | 2.545977
max_time | 1065.5387460000002
mean_time | 3.6349061096418893
stddev_time | 0.8468267154966969
rows | 6295112473
shared_blks_hit | 5192924371
shared_blks_read | 225
shared_blks_dirtied | 0
shared_blks_written | 0
local_blks_hit | 0
local_blks_read | 0
local_blks_dirtied | 0
local_blks_written | 0
temp_blks_read | 0
temp_blks_written | 0
blk_read_time | 17.147496
blk_write_time | 0
An example with parameters bound looks like this, and returns about 900 rows
SELECT
t.oid,
t.typname,
t.typelem,
t.typdelim,
t.typinput,
r.rngsubtype,
t.typtype,
t.typbasetype
FROM
pg_type AS t
LEFT JOIN pg_range AS r ON oid = rngtypid
WHERE
t.typname IN ('int2', 'int4', 'int8', 'oid', 'float4', 'float8', 'text', 'varchar', 'char', 'name', 'bpchar', 'bool', 'bit', 'varbit', 'timestamptz', 'date', 'money', 'bytea', 'point', 'hstore', 'json', 'jsonb', 'cidr', 'inet', 'uuid', 'xml', 'tsvector', 'macaddr', 'citext', 'ltree', 'line', 'lseg', 'box', 'path', 'polygon', 'circle', 'interval', 'time', 'timestamp', 'numeric')
OR t.typtype IN ('r', 'e', 'd')
OR t.typinput = 'array_in(cstring,oid,integer)'::regprocedure
OR t.typelem != 0
Explain plan:
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Merge Left Join (cost=0.41..321.24 rows=1289 width=86) (actual time=0.022..3.367 rows=809 loops=1)
Merge Cond: (t.oid = r.rngtypid)
Buffers: shared hit=669
-> Index Scan using pg_type_oid_index on pg_type t (cost=0.28..313.25 rows=1289 width=82) (actual time=0.016..3.148 rows=809 loops=1)
Filter: ((typname = ANY ('{int2,int4,int8,oid,float4,float8,text,varchar,char,name,bpchar,bool,bit,varbit,timestamptz,date,money,bytea,point,hstore,json,jsonb,cidr,inet,uuid,xml,tsvector,macaddr,citext,ltree,line,lseg,box,path,polygon,circle,interval,time,timestamp,numeric}'::name[])) OR (typtype = ANY ('{r,e,d}'::"char"[])) OR ((typinput)::oid = '750'::oid) OR (typelem <> '0'::oid))
Rows Removed by Filter: 1907
Buffers: shared hit=667
-> Index Scan using pg_range_rngtypid_index on pg_range r (cost=0.13..4.72 rows=6 width=8) (actual time=0.003..0.006 rows=6 loops=1)
Buffers: shared hit=2
Planning Time: 0.286 ms
Execution Time: 3.453 ms
(11 rows)
Cost
Between 20% and 30% of all rows returned from postgres are for this query.
It is called at a sustained rate of around 8 reqs/second on each postgres node:
Ask
This appears to be static, or relatively static data.
Do we need to query it at this rate? Could we reduce the rate or only do this on startup