ApplicationRecord#with_fast_statement_timeout forces known expensive queries to run on the Postgres primary
ApplicationRecord#with_fast_statement_timeout was introduced in !44184 (merged) and is intended to force a short timeout on a potentially expensive but low-value query.
The change 517a05c7 illustrates how it is used:
On the MR list page, we like to display how many issues were found from the filtered search query in total. However, especially when the filter includes conditions on the MR title or description, this can be very expensive to calculate, and involve reading gigabytes of text data from the database.
As long as the data is already in the page cache, this usually finishes within the 15-second timeout on GitLab.com, but if the database cache is cold, the query usually hits the statement timeout.
More generally, it's not very clever to spend so much time calculating a piece of information with marginal value.
This MR applies a shorter limit to the counting statements and provides for graceful fallback to a '?' value, with a nice tooltip, if the query times out. This means we're able to view the results in a reasonable time, rather than the page taking a long time to load, or not loading at all.
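The fallback behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code from the MR: `count_with_fallback` and `display_count` are hypothetical names, and `QueryCanceled` is a local stand-in for the `ActiveRecord::QueryCanceled` error that Rails raises on a statement timeout, so that the sketch runs without Rails.

```ruby
# Local stand-in for ActiveRecord::QueryCanceled (assumption: used here
# only so the sketch is runnable without Rails or a database).
class QueryCanceled < StandardError; end

# Hypothetical helper: runs the counting block and returns its result,
# or nil if the statement was cancelled by the timeout.
def count_with_fallback
  yield
rescue QueryCanceled
  nil
end

# Hypothetical view helper: renders the count, or '?' when it is unknown.
def display_count(count)
  count.nil? ? '?' : count.to_s
end
```

In the real application the block would presumably wrap the counting query in `ApplicationRecord.with_fast_statement_timeout`, so that a slow count degrades to '?' instead of failing the whole page.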
Forcing traffic onto the primary instance
Unfortunately, using this method has the side effect of degrading the scalability of the system. It opens a transaction, which forces the query to run on the primary. Additionally, any subsequent queries in the same request will also run on the primary.
Putting these known expensive queries onto the primary, even with a shorter timeout, is likely to cause saturation issues.
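The sticking behaviour can be illustrated with a toy model. This is not GitLab's load balancer: `ToySession` and its methods are hypothetical, and the `5s` timeout value is an assumption chosen for illustration. The only thing the sketch takes from the description above is the shape of the problem: the helper opens a transaction, the transaction goes to the primary, and later reads in the same request stay there.

```ruby
# Toy model of a per-request database session (hypothetical, for illustration).
class ToySession
  attr_reader :log

  def initialize
    @sticky = false # becomes true once the request has touched the primary
    @log = []       # [host, sql] pairs, in execution order
  end

  # Reads go to a replica until the session has stuck to the primary.
  def read(sql)
    host = @sticky ? :primary : :replica
    @log << [host, sql]
    host
  end

  # Opening a transaction pins the rest of the request to the primary.
  def transaction
    @sticky = true
    yield
  end

  # Same shape as the helper discussed above: a transaction wrapping a
  # transaction-scoped timeout (the 5s value is an assumption).
  def with_fast_statement_timeout
    transaction do
      @log << [:primary, "SET LOCAL statement_timeout TO '5s'"]
      yield
    end
  end
end

session = ToySession.new
session.read("SELECT * FROM merge_requests LIMIT 20")  # served by a replica
session.with_fast_statement_timeout do
  session.read("SELECT COUNT(*) FROM merge_requests")  # forced onto the primary
end
session.read("SELECT * FROM users WHERE id = 1")       # still on the primary
```

In this model both the expensive count and the cheap lookup that follows it land on the primary, which is the scalability concern raised above.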
Queries 8891767326730827970, -772977607269957440, and 4130712539834136307 all come from IssuableFinder#count_by_state, and as this Thanos query shows, they are all running exclusively on the primary.