Skip to content

Update exception handling during package asset cleanup

David Fernandez requested to merge 363010-improve-logs into master

🌵 Context

Package files are linked to physical files on object storage and those take file space.

As such, there is a project statistic called packages_size that aggregates the total size of all package files linked to a given Project.

On an other hand, package files are not destroyed directly by destructive actions (such as a DELETE request on the package files Rest API). Instead, package files are "marked" as pending_destruction and a background cleanup job takes care of destroying them.

packages_size is updated incrementally which means that when a package file is added or removed, the statistic value is updated with a delta (adding or removing the package file size).

In #363010 (closed), we have reports that some projects have their packages_size statistic wrong. The metric is not longer equal to the total size of all package files.

After some analysis, we found that we might have an issue between transactions and swallowing exceptions.

This MR simply moves exception handling outside of a transaction block and improves logging.

🔬 What does this MR do and why?

  • In Packages::CleanupArtifactWorker,
    • move the rescue StandardError outside the transaction block.
    • log any exception to create a log trail that will help with debugging #363010 (closed).
  • Update the related spec.

Please note that Packages::CleanupArtifactWorker is also used by the Dependency Proxy to also cleanup blobs and manifests objects. Since the changes here are more about increasing observability and having an accurate rollback during exceptions, there is no user facing change. That is why we didn't include a changelog.

🖼 Screenshots or screen recordings

n / a

How to set up and validate locally

It's not easy to trigger an exception when deleting a package file, so the easiest thing to do here is to manually trigger an error.

  1. Update Packages::CleanupArtifactWorker#perform_work to:
    def perform_work
      return unless artifact
    
      begin
        artifact.transaction do
          log_metadata(artifact)
    
          artifact.destroy!
          raise 'KABOOM!'
        end
      rescue StandardError => exception
        unless artifact&.destroyed?
          artifact&.update_column(:status, :error)
        end
    
        Gitlab::ErrorTracking.log_exception(
          exception,
          class: self.class.name
        )
      end
    
      after_destroy
    end
  2. Create a new project
  3. Upload a generic file:
    curl --header "PRIVATE-TOKEN: <pat>" \
      --upload-file <path to a text file with a dummy content> \
      "http://gdk.test:8000/api/v4/projects/<project_id>/packages/generic/my_package/0.0.1/file2.txt"
  4. In a rails console mark the package file as pending destruction:
    Packages::PackageFile.last.pending_destruction!
  5. Run the cleanup job:
    Packages::CleanupPackageFileWorker.perform_with_capacity
  6. Check the exceptions logs, you should see:
    {
       "severity": "ERROR",
       "time": "2022-07-12T12:41:40.432Z",
       "correlation_id": "5cf1daad87b5c465e869db3fc5760f4f",
       "exception.class": "RuntimeError",
       "exception.message": "KABOOM!",
       "extra.class": "Packages::CleanupPackageFileWorker",
       // other fields ...
    }

🚥 MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by David Fernandez

Merge request reports

Loading