Change digest column type from text to bytea
Problem
While working on #61 (closed), I found out how inefficient it is (in terms of disk space) to save the digest of manifests, manifest_lists, layers and manifest_configurations using the PostgreSQL type text.
There is also a development guideline that recommends storing hashes as binary.
Solution
We should use the bytea type to store digest hashes, in hex format. Since bytea stores each pair of hex characters as a single byte, we can save 50% in storage space (proof).
We’re currently saving digests as a string in the form of <algorithm>:<hex>, like sha256:2a19abe16897652cdf93a45501809f31f336226fbe28055bbde7ab36018cc42a. To be able to save the digest in hex format, we can’t save it with the algorithm prefix; we can only save the hex itself.
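To make the size difference concrete, here is a minimal Go sketch that splits a digest string into its algorithm prefix and hex portion and decodes the hex into raw bytes. The helper name splitDigest is hypothetical and not part of the registry code; it only illustrates the arithmetic.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"strings"
)

// splitDigest is a hypothetical helper (not from the registry code base).
// It splits "<algorithm>:<hex>" and decodes the hex portion into raw bytes.
func splitDigest(s string) (string, []byte, error) {
	algorithm, hexPart, found := strings.Cut(s, ":")
	if !found {
		return "", nil, fmt.Errorf("invalid digest %q: missing algorithm prefix", s)
	}
	raw, err := hex.DecodeString(hexPart)
	if err != nil {
		return "", nil, fmt.Errorf("invalid digest %q: %w", s, err)
	}
	return algorithm, raw, nil
}

func main() {
	const d = "sha256:2a19abe16897652cdf93a45501809f31f336226fbe28055bbde7ab36018cc42a"
	algorithm, raw, err := splitDigest(d)
	if err != nil {
		panic(err)
	}
	// Compare the length of the full string, the hex portion, and the raw bytes.
	fmt.Printf("algorithm=%s full=%d hex=%d raw=%d\n",
		algorithm, len(d), len(d)-len(algorithm)-1, len(raw))
}
```

For the example digest, the full string is 71 characters, the hex portion is 64 characters, and the decoded value is 32 bytes. These figures cover the payload only and ignore any per-value overhead PostgreSQL adds.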
We should rename the existing digest column to digest_hex and change its type from text to bytea. In this column we only store the hex portion of the digest and use SHA256 as the algorithm when (de)serializing.
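As a rough sketch of what the schema change could look like, the Go snippet below generates the PostgreSQL statements for the four tables named in this issue. It assumes every existing value has the form <algorithm>:<hex> and that the conversion runs in place; the function name migrationStatements is hypothetical, and the project's actual migration tooling may look quite different.

```go
package main

import "fmt"

// digestTables lists the tables named in this issue.
var digestTables = []string{"manifests", "manifest_lists", "layers", "manifest_configurations"}

// migrationStatements is a hypothetical helper that returns the statements
// for one table: first convert the text column to bytea by decoding the part
// after the ':' separator, then rename the column to digest_hex.
func migrationStatements(table string) []string {
	return []string{
		fmt.Sprintf("ALTER TABLE %s ALTER COLUMN digest TYPE bytea USING decode(split_part(digest, ':', 2), 'hex')", table),
		fmt.Sprintf("ALTER TABLE %s RENAME COLUMN digest TO digest_hex", table),
	}
}

func main() {
	for _, table := range digestTables {
		for _, stmt := range migrationStatements(table) {
			fmt.Println(stmt + ";")
		}
	}
}
```

Converting the type before renaming keeps the USING expression referring to the original column name. On large tables the type change rewrites every row, so it would need to be scheduled with that in mind.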
The models' Digest attribute should change from string to digest.Digest. The latter is already used everywhere else in the code, so it makes sense to use it here as well. The encoding and decoding should be handled transparently at the CRUD service layer, and we should still be able to assign it a string in the format <algorithm>:<hex> (as digest.Digest is of type string).
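One possible way to make the conversion transparent is to implement the standard driver.Valuer and sql.Scanner interfaces, sketched below. The local Digest type is only a stand-in for digest.Digest so the example is self-contained; in practice a wrapper type or explicit conversion functions in the CRUD service layer would be needed, since methods can't be added to an external type. This issue doesn't prescribe the mechanism, so treat this as an assumption.

```go
package main

import (
	"database/sql/driver"
	"encoding/hex"
	"fmt"
	"strings"
)

// Digest is a local stand-in for digest.Digest, which is also a string type.
type Digest string

// sha256Prefix is the only algorithm prefix assumed here, as proposed above.
const sha256Prefix = "sha256:"

// Value implements driver.Valuer: it drops the algorithm prefix and returns
// the raw bytes to be stored in the bytea column.
func (d Digest) Value() (driver.Value, error) {
	s := string(d)
	if !strings.HasPrefix(s, sha256Prefix) {
		return nil, fmt.Errorf("unsupported digest %q: expected %q prefix", s, sha256Prefix)
	}
	raw, err := hex.DecodeString(strings.TrimPrefix(s, sha256Prefix))
	if err != nil {
		return nil, fmt.Errorf("invalid digest %q: %w", s, err)
	}
	return raw, nil
}

// Scan implements sql.Scanner: it hex-encodes the stored bytes and restores
// the algorithm prefix.
func (d *Digest) Scan(src interface{}) error {
	raw, ok := src.([]byte)
	if !ok {
		return fmt.Errorf("cannot scan %T into Digest", src)
	}
	*d = Digest(sha256Prefix + hex.EncodeToString(raw))
	return nil
}

func main() {
	// Assigning a plain "<algorithm>:<hex>" string still works.
	in := Digest("sha256:2a19abe16897652cdf93a45501809f31f336226fbe28055bbde7ab36018cc42a")
	stored, err := in.Value()
	if err != nil {
		panic(err)
	}
	var out Digest
	if err := out.Scan(stored); err != nil {
		panic(err)
	}
	fmt.Println(out == in)
}
```

With this shape, callers keep working with the <algorithm>:<hex> string form, while only the raw bytes reach the database.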
Regarding the algorithm, the registry specification says:
While the algorithm does allow one to implement a wide variety of algorithms, compliant implementations should use sha256.
So we always use SHA256 as the algorithm. Regardless, we should leave room for different algorithms to be used in the future. If we ever have that need, we can add a new digest_algorithm column to store that information.
Related to &2313 (closed).