
Increase DestroyAllExpiredService loop limits

drew stachon requested to merge increase-artifact-expiration-pace into master

What does this MR do?

This merge request increases the number of artifacts we expire using our newer, performant expiration code.

Initially, for operational stability, we used limits of 1k and 10k artifacts per execution. To do this, we cap each service execution at either 10 or 100 batches (the loop limit), with each batch fetching 100 artifacts at a time.
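The batching scheme above can be sketched as follows. This is a minimal Python illustration, not the service's actual Ruby code: `fetch_batch` and `destroy_batch` are hypothetical stand-ins for the real query and deletion logic, and the loop simply stops after `loop_limit` batches or when nothing expired is left.

```python
from typing import Callable, List

BATCH_SIZE = 100  # artifacts fetched per batch, per the MR description


def destroy_expired_in_batches(
    fetch_batch: Callable[[int], List[int]],     # hypothetical: returns up to n expired artifact IDs
    destroy_batch: Callable[[List[int]], None],  # hypothetical: deletes the given artifacts
    loop_limit: int,
    batch_size: int = BATCH_SIZE,
) -> int:
    """Destroy at most loop_limit batches of expired artifacts; return the number destroyed."""
    destroyed = 0
    for _ in range(loop_limit):
        batch = fetch_batch(batch_size)
        if not batch:
            break  # nothing left to expire in this execution
        destroy_batch(batch)
        destroyed += len(batch)
    return destroyed
```

For example, with a simulated backlog of 25,000 artifacts and `loop_limit=100`, one call destroys 10,000 artifacts and leaves 15,000 for the next execution.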

Now that we've been operating at the higher loop limit of 100 without any operational issues, I'm proposing we increase the limits to 100 and 500, respectively. The lower limit will be 10k artifacts, exactly what we're doing now, and the upper limit will be 50k artifacts.
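The per-execution caps follow from multiplying the loop limit by the batch size of 100. A small sketch of that arithmetic for the old and new limits; `large_enabled` is a hypothetical stand-in for whichever feature flag state selects the larger limit.

```python
from typing import Dict

BATCH_SIZE = 100

# Loop limits (batches per execution) before and after this MR.
OLD_LIMITS: Dict[str, int] = {"small": 10, "large": 100}
NEW_LIMITS: Dict[str, int] = {"small": 100, "large": 500}


def loop_limit(limits: Dict[str, int], large_enabled: bool) -> int:
    """Pick the loop limit; large_enabled is a hypothetical feature flag state."""
    return limits["large"] if large_enabled else limits["small"]


def max_artifacts_per_run(limits: Dict[str, int], large_enabled: bool,
                          batch_size: int = BATCH_SIZE) -> int:
    """Upper bound on artifacts removed by one execution."""
    return loop_limit(limits, large_enabled) * batch_size
```

This yields 1k/10k artifacts per execution for the old limits and 10k/50k for the new ones, so the new lower limit matches the current upper limit.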

We currently expire 10k artifacts in a bit under a minute, so 50k should still fit within the 5-minute timeout that constrains the service. We'll be removing artifacts at a much faster pace, though, so if it turns out to be too aggressive in some particular way, we can turn it right back down to the current rate.
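A back-of-the-envelope check of the timeout claim, assuming run time scales linearly with the number of artifacts removed. The seconds-per-10k figure is an assumption derived from "a bit under a minute"; rounding it up to 60 s puts 50k exactly at the 300 s timeout, so the real projection is "a bit under 5 minutes" and the headroom is small.

```python
TIMEOUT_SECONDS = 5 * 60  # the service's 5-minute timeout, per the MR
SECONDS_PER_10K = 60      # assumption: "a bit under a minute", rounded up


def projected_seconds(artifacts: int, seconds_per_10k: float = SECONDS_PER_10K) -> float:
    """Linear extrapolation of execution time from the observed 10k-artifact rate."""
    return artifacts / 10_000 * seconds_per_10k


def fits_in_timeout(artifacts: int, seconds_per_10k: float = SECONDS_PER_10K) -> bool:
    """True if the projected execution time stays within the timeout."""
    return projected_seconds(artifacts, seconds_per_10k) <= TIMEOUT_SECONDS
```

With a hypothetical 55 s per 10k, 50k artifacts project to 275 s, inside the timeout.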

The feature flag being switched yesterday, and the current execution logs:

[image]

And why?

On January 4th, we estimated that there were approximately 44 million artifacts directly marked as unlocked that we can remove quickly. At our current upper bound, those records will take about 21 days to remove, plus additional time for records that expire during that period. Increasing our removal rate 5x is a meaningful improvement in how quickly we can move through the broader artifact cleanup process in &7311.
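The effect of the 5x increase on the backlog can be worked out from the figures above. This sketch derives the implied daily removal rate from the 44 million / 21 day estimate and assumes the 5x per-execution increase carries over to the daily rate (same execution frequency, executions reaching the limit); like the 21-day figure, it ignores artifacts that expire during the window.

```python
BACKLOG = 44_000_000  # unlocked artifacts estimated on January 4th
CURRENT_DAYS = 21     # estimated drain time at the current upper bound
SPEEDUP = 5           # upper limit goes from 10k to 50k artifacts per execution


def drain_days(backlog: float, daily_rate: float) -> float:
    """Days needed to remove the backlog at a constant daily rate."""
    return backlog / daily_rate


implied_daily_rate = BACKLOG / CURRENT_DAYS  # roughly 2.1 million artifacts per day
new_days = drain_days(BACKLOG, implied_daily_rate * SPEEDUP)  # roughly 4.2 days
```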

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

