Restore missing container repositories under existing projects (part 1/2)
Context
This is related to Restore missing container repositories under ex... (&9619). The intent is to perform a data repair to restore missing container repositories under existing projects.
The high-level strategy for the data repair is described here. The actual implementation plan was detailed here and split into two parts. This issue is for the implementation of part 1/2.
Implementation
Requirements
To make this happen we'll need a few assets:
- New temporary table with columns `project_id` (FK for `projects`), `missing_count` (`int`), `status` (`text`), and `updated_at`. For brevity, we'll refer to this table as `t`.
- A limited capacity worker to perform the data repair analysis.
- An application setting to control the max concurrency for the worker (default to `2`).
- A feature flag to enable/disable the worker execution.
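As a rough illustration of how the application setting and the feature flag could interact, here is a minimal plain-Ruby sketch. `WorkerSettings` and `available_capacity` are hypothetical names used only for this example; they are not existing GitLab classes or methods, and the real worker would rely on the application's own settings and feature flag helpers.

```ruby
# Hypothetical stand-in for the application setting (max concurrency)
# and the feature flag (enabled/disabled). Not real GitLab APIs.
WorkerSettings = Struct.new(:max_concurrency, :flag_enabled, keyword_init: true)

# Returns how many additional jobs may be started right now.
# When the feature flag is off, no job may start at all.
def available_capacity(settings, running_jobs)
  return 0 unless settings.flag_enabled

  [settings.max_concurrency - running_jobs, 0].max
end

settings = WorkerSettings.new(max_concurrency: 2, flag_enabled: true)
capacity = available_capacity(settings, 1) # one job already running
```

Keeping both switches in one check means disabling the flag stops new work immediately, while lowering the setting only throttles it.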
Logic
The background job should do the following work:
- "Loop over" (cron scheduling) all projects that do not appear in `t` (i.e. skip those that were already analyzed);
- For each project `P`:
  - Query the container registry for the list of non-empty (at least one tag) repositories under `P`'s full path. This should be done by calling the new List Sub Repositories API.
  - For each repository `R` in the returned list (paginated response):
    - Check if `R` exists on the Rails side (`container_repositories` table);
    - If it is missing, increment a counter of "missing repositories" for `P`.
  - Once done iterating over repositories under `P`, insert a row in `t` for `P`. `t.missing_count` should be set to the value of the above counter.

    Note: As we'll be looping over all `projects` (millions of rows) and inserting a record for each in `t` (same quantity), it can be advisable to perform a bulk insert. In this case, we can stash inserts for up to `N` `P`s and only then flush them to the database. However, because we'll be doing `1+N` network requests to the registry for each `P`, we must ensure that we flush any stashed inserts in case an exception occurs (e.g. network timeout). Otherwise, when the worker resumes it will pick `P`s that were already analyzed but not recorded due to a previous failure.
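The steps above, including the flush-on-failure requirement from the note, can be sketched in plain Ruby. This is only an illustration under stated assumptions: `FakeRegistry` is a hypothetical in-memory stand-in for the List Sub Repositories API (its cursor-based pagination is invented for the example), `known_paths` stands in for a lookup against `container_repositories`, and `table` is a hash standing in for `t` (project ID mapped to `missing_count`).

```ruby
require "set"

# Hypothetical in-memory stand-in for the registry's List Sub Repositories API.
# Returns one page of repository paths plus the cursor of the next page (or nil).
class FakeRegistry
  def initialize(repos_by_project, page_size: 2)
    @repos_by_project = repos_by_project
    @page_size = page_size
  end

  def list_sub_repositories(full_path, cursor = 0)
    repos = @repos_by_project.fetch(full_path, [])
    page = repos[cursor, @page_size] || []
    next_cursor = cursor + @page_size
    [page, next_cursor < repos.size ? next_cursor : nil]
  end
end

# Counts registry repositories under `full_path` that are unknown on the Rails side.
def count_missing(registry, known_paths, full_path)
  missing = 0
  cursor = 0
  while cursor
    page, cursor = registry.list_sub_repositories(full_path, cursor)
    missing += page.count { |path| !known_paths.include?(path) }
  end
  missing
end

# Writes all stashed rows to the table stand-in and empties the stash.
def flush(stash, table)
  stash.each { |project_id, missing| table[project_id] = missing }
  stash.clear
end

# Analyzes `projects` (pairs of [id, full_path]), stashing rows and flushing
# them in batches. The `ensure` clause flushes whatever is still stashed,
# even when a registry request raises (e.g. a network timeout).
def analyze(projects, registry, known_paths, table, batch_size: 100)
  stash = []
  projects.each do |project_id, full_path|
    next if table.key?(project_id) # already analyzed, skip

    stash << [project_id, count_missing(registry, known_paths, full_path)]
    flush(stash, table) if stash.size >= batch_size
  end
ensure
  flush(stash, table) if stash
end
```

Putting the flush in `ensure` keeps the bulk-insert optimization while guaranteeing that a failure on one `P` does not discard the results already computed for earlier ones.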
`t.missing_count` will allow us to:

- Identify how many missing repositories were found per project and in total. This will be used to assess the scale of the problem and fine-tune the approach for part 2/2 (the actual data repair);
- Act as the filter for projects so that we can narrow down the data repair loop in part 2/2 to projects whose `t.missing_count > 0`.