Change digest column type from text to bytea

João Pereira requested to merge db-digest-as-bytea into database

Problem

While working on #61 (closed) I found out how inefficient it is, in terms of disk space, to store the digests of manifests, manifest_lists, layers and manifest_configurations using the PostgreSQL text type.

There is also a development guideline that recommends storing hashes as binary.

Solution

We should use the bytea type to store digest hashes, in hex format. Doing so saves 50% in storage space (proof).

We’re currently saving digests as a string in the form of <algorithm>:<hex>, like sha256:2a19abe16897652cdf93a45501809f31f336226fbe28055bbde7ab36018cc42a. To save the digest in hex format, we have to drop the algorithm prefix and store only the hex portion.

We should rename the existing digest column to digest_hex and change its type from text to bytea. In this column we only store the hex portion of the digest and use SHA256 as the algorithm when (de)serializing.

The models' Digest attribute should change from string to digest.Digest. The latter is already used everywhere else in the code, so it makes sense to use it here as well. The encoding and decoding should be handled transparently at the CRUD service layer, and we should still be able to assign it a string in the format of <algorithm>:<hex> (as digest.Digest is of type string).
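As a rough sketch of what that transparent (de)serialization could look like, assuming SHA256 is the only algorithm: the Digest type below is a local stand-in for digest.Digest (also a string type), and encodeDigest and decodeDigest are hypothetical names, not the functions used in this merge request.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// Digest is a local stand-in for digest.Digest, which is also a string type.
type Digest string

// sha256Prefix is the only algorithm prefix this sketch accepts.
const sha256Prefix = "sha256:"

// encodeDigest (hypothetical) returns the raw bytes to write to the
// digest_hex column. It rejects anything that is not a SHA256 digest.
func encodeDigest(d Digest) ([]byte, error) {
	s := string(d)
	if !strings.HasPrefix(s, sha256Prefix) {
		return nil, fmt.Errorf("unsupported digest %q: only sha256 is stored", s)
	}
	raw, err := hex.DecodeString(strings.TrimPrefix(s, sha256Prefix))
	if err != nil {
		return nil, fmt.Errorf("invalid digest %q: %w", s, err)
	}
	if len(raw) != sha256.Size {
		return nil, fmt.Errorf("invalid digest %q: got %d bytes, want %d", s, len(raw), sha256.Size)
	}
	return raw, nil
}

// decodeDigest (hypothetical) rebuilds the Digest from the bytes read
// from the column, re-adding the SHA256 prefix.
func decodeDigest(raw []byte) Digest {
	return Digest(sha256Prefix + hex.EncodeToString(raw))
}
```

Because Digest is a string type, a model field can still be assigned a literal such as "sha256:2a19...", and a store method would call encodeDigest before writing and decodeDigest after reading.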

Regarding the algorithm, the registry specification says:

While the algorithm does allow one to implement a wide variety of algorithms, compliant implementations should use sha256.

So we always use SHA256 as the algorithm. Regardless, we should leave room for different algorithms to be used in the future. If we ever have that need, we can add a new digest_algorithm column to store that information.

Related to &2313 (closed).

Edited by João Pereira
