Skip to content

Capture queries during rspec using pg auto_explain

Matt Kasa requested to merge 414325-record-auto-explain into master

What does this MR do and why?

Captures each unique sql query with an example query plan during rspec testing, deduplicates them, and creates an artifact containing this data in the new rspec:merge-auto-explain-logs job.

Note: Never runs when pipeline:expedite is present, and currently only runs when the pipeline:record-queries label is present.

This is accomplished by the following procedure:

  1. We enable the auto_explain module in postgres containers that are at least pg14, and configure it to log a plan and query text for every query to postgres's log, using auto_explain's json output format.
  2. At the end of each spec run, we mount the postgres log file as a table, using postgres's file_fdw feature. Note that accessing the postgres log file as a table is an example in the file_fdw documentation!
  3. We find the auto_explain entries by querying this log file for data that looks like json (specifically data that starts with {), since auto_explain is the only system writing json to the log file.
  4. We deduplicate these queries locally in the job by query_id, a cheap hashed representation of the query text that is not portable between rspec jobs or if a table is dropped then recreated.
  5. We calculate a fingerprint for each deduplicated query. This is a hash of the normalized query text that does not depend on the postgres instance running, and is comparable across jobs. We do this in each rspec job because it takes a few minutes if done during the merge process, but is very fast spread across all rspec jobs.
  6. We write an artifact for each database and each job - a gzipped ndjson file with rows that have query, plan, fingerprint, normalized (where normalized is a normalized form of the query, with constants replaced with placeholders).
  7. We merge and deduplicate each of these artifacts, producing a final artifact of ~5.8MB containing roughly 18,000 queries.

This artifact is the first step in enabling a few different advanced analyses of database queries:

  • We can diff it to detect new queries introduced in an MR, and decide if a database review is necessary based on this change.
  • We can inject changes to the database schema, and see how query plans react to those changes, for example to identify queries incompatible with partitioning.

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #414325 (closed)

Edited by Simon Tomlinson

Merge request reports

Loading