Refine long running migration thresholds in the Guard worker
🔥 Problem
From gitlab.com observations, we know that the pre import step is heavier than the import step: it takes longer to execute.
The Guard job is responsible for detecting long running migrations in the pre_importing, pre_import_done, and importing states.
The problem is that we're using the same threshold for all the states, which is 10.minutes for now.
We saw a pre import that took 11 minutes to execute. The Guard flagged the migration as long running and canceled it.
🚒 Solution
We can't simply bump the threshold to a large value because we want to keep the importing duration as low as possible (that's when the image is in read only mode). So the solution is to split the threshold into three:
- One for pre_importing.
- One for pre_import_done.
- One for importing. This one should be as small as possible.
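
As a rough illustration of the split, here is a minimal plain-Ruby sketch. The constant name, the method name, and the threshold values are all assumptions made for the example; they are not the Guard worker's actual code, and the right values would need to come from gitlab.com observations.

```ruby
# Hypothetical per-state thresholds, in seconds. The values are placeholders,
# chosen only so that importing has the smallest one.
STATE_THRESHOLDS = {
  'pre_importing'   => 20 * 60,
  'pre_import_done' => 10 * 60,
  'importing'       => 5 * 60
}.freeze

# Returns true when a migration has been in `state` for longer than the
# threshold configured for that state. States without a threshold are
# never reported as long running.
def long_running?(state, state_started_at, now = Time.now)
  threshold = STATE_THRESHOLDS[state]
  return false if threshold.nil?

  (now - state_started_at) > threshold
end
```

With these placeholder values, an 11 minute pre import would no longer be canceled, while an import stuck for 11 minutes still would be.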
Bonus: put these thresholds as application settings so that we don't need MRs to update them.
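
One possible shape for the bonus, sketched in plain Ruby: look up a per-state override from the settings and fall back to a hard-coded default when none is set. The setting key names (for example `importing_threshold_seconds`), the `settings` hash, and the default values are hypothetical and only stand in for whatever the application settings would actually expose.

```ruby
# Hypothetical defaults, in seconds, used when no setting overrides them.
DEFAULT_THRESHOLD_SECONDS = {
  'pre_importing'   => 20 * 60,
  'pre_import_done' => 10 * 60,
  'importing'       => 5 * 60
}.freeze

# Returns the threshold for `state`, preferring a value from `settings`
# (a hash standing in for the application settings) over the default.
# Raises KeyError for a state that has no default.
def guard_threshold_for(state, settings = {})
  default = DEFAULT_THRESHOLD_SECONDS.fetch(state)
  override = settings["#{state}_threshold_seconds"]

  override.nil? ? default : Integer(override)
end
```

For example, `guard_threshold_for('pre_importing', { 'pre_importing_threshold_seconds' => 1800 })` would use the overridden value without requiring an MR.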