# Object Storage Direct Object Uploader
The Direct Object Uploader is a component that works with Extended CarrierWave to upload objects to Object Storage. The bulk of the upload work is done by Workhorse, to avoid Unicorn processes being locked up during the upload stage. Workhorse saves incoming uploads directly to the object storage backend, then makes an API call to the Ruby backend with a reference to the temporary file.

All components that use Extended CarrierWave should be able to use the Direct Object Uploader. At present, only LFS Object Storage uses Extended CarrierWave, so it is a good first component to move across to Direct Object Storage. Before attachments, CI artifacts/traces, or other uploads can be moved to direct object storage through this component, they must first support Extended CarrierWave.
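To make the division of labor concrete, here is a minimal Go sketch of the Workhorse side of this flow. The function names, the finalize endpoint, and the `remote_object_path` field are illustrative assumptions, not the actual Workhorse code:

```go
package directupload

import (
	"fmt"
	"io"
	"net/http"
	"strings"
)

// uploadToObjectStorage streams the incoming request body straight to a
// presigned PUT URL, so no Unicorn process is tied up during the transfer.
func uploadToObjectStorage(presignedPutURL string, body io.Reader, size int64) error {
	req, err := http.NewRequest(http.MethodPut, presignedPutURL, body)
	if err != nil {
		return err
	}
	req.ContentLength = size // most providers require Content-Length on PutObject

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("object storage PUT failed: %s", resp.Status)
	}
	return nil
}

// finalizeUpload makes the API call to the Ruby backend with a reference to
// the temporary file. The endpoint and JSON field are hypothetical.
func finalizeUpload(finalizeURL, objectPath string) error {
	body := strings.NewReader(fmt.Sprintf(`{"remote_object_path":%q}`, objectPath))
	resp, err := http.Post(finalizeURL, "application/json", body)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("finalize failed: %s", resp.Status)
	}
	return nil
}
```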
## API sequence diagrams
The following two diagrams explain the interactions during LFS and artifact uploads. They do not have the same level of detail, but they can be used as starting points for the implementation. The LFS flow must be a simple subset of the artifacts flow, not two different flows.
### LFS
```mermaid
sequenceDiagram
    participant r as user
    participant w as gitlab-workhorse
    participant u as gitlab-unicorn
    participant os as Object Storage

    activate r
    r->>+w: git push with LFS object

    w->>+u: authorize
    Note over u,os: Presigning URLs for CarrierWave cache files
    u->>+os: pre-sign PutObject
    os-->>-u: presigned_url
    u->>+os: pre-sign RemoveObject
    os-->>-u: presigned_url
    u-->>-w: presigned_urls

    w->>+os: PutObject
    os-->>-w: result

    Note over w,os: Now we hijack the request body with the object path and other metadata

    w->>+u: proxy to finalize_upload
    u->>+os: copy cache object to its final location
    os-->>-u: done
    u-->>-r: done
    deactivate r

    Note over w,os: Now we can delete the cache file
    w->>+os: RemoveObject
    os-->>-w: done
    deactivate w
```
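For the presigning step inside the `authorize` call, here is a sketch using the AWS SDK for Go against an S3-compatible backend. This is only an illustration; the real implementation lives in the Ruby backend, and the bucket and key names are assumptions:

```go
package directupload

import (
	"time"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)

// presignCacheURLs returns one URL the client can PUT the CarrierWave cache
// file to, and one it can later use to remove it.
func presignCacheURLs(bucket, cacheKey string) (putURL, removeURL string, err error) {
	sess := session.Must(session.NewSession())
	svc := s3.New(sess)

	putReq, _ := svc.PutObjectRequest(&s3.PutObjectInput{
		Bucket: aws.String(bucket),
		Key:    aws.String(cacheKey),
	})
	if putURL, err = putReq.Presign(15 * time.Minute); err != nil {
		return "", "", err
	}

	delReq, _ := svc.DeleteObjectRequest(&s3.DeleteObjectInput{
		Bucket: aws.String(bucket),
		Key:    aws.String(cacheKey),
	})
	if removeURL, err = delReq.Presign(15 * time.Minute); err != nil {
		return "", "", err
	}
	return putURL, removeURL, nil
}
```

Because the URLs are presigned, Workhorse can perform both the `PutObject` and the later `RemoveObject` without holding any object storage credentials itself.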
### Artifacts
Artifact upload is considerably more complex than LFS. The diagram is not completely accurate; it only illustrates the interactions and API calls.
```mermaid
sequenceDiagram
    participant r as gitlab-runner
    participant w as gitlab-workhorse
    participant u as gitlab-unicorn
    participant s as sidekiq
    participant os as Object Storage

    r->>+w: upload_artifact
    alt request has Content-Length || Object Storage is Google Cloud Storage
        w->>+u: authorize
        u->>+os: pre-sign PutObject
        os-->>-u: presigned_url
        u-->>-w: presigned_url
        Note over w,os: Only on GCS can we upload without Content-Length (we need to use chunked encoding)
        w->>+os: PutObject
        os-->>-w: result
    else
        loop every 10MB of file
            Note over w: Write parts to disk
            w->>+u: authorize
            u->>+os: pre-sign PutObject
            os-->>-u: presigned_url
            u-->>-w: presigned_url
            w->>+os: PutObject
            os-->>-w: result
        end
    end
    w->>+u: upload summary
    opt more than one part
        u-xs: merge_parts
    end
    u-->>-w: done
    w-->>-r: operation result

    opt more than one part
        Note over s,os: The following API calls are not supported by GCS. In any case, we never upload multiple parts to GCS.
        activate s
        s->>+os: CreateMultipartUpload
        os-->>-s: UploadId
        loop each part
            Note over s,os: No download needed, we refer to parts already uploaded to the bucket
            s->>+os: UploadPartCopy
            os-->>-s: ETag
        end
        s->>+os: CompleteMultipartUpload
        Note over os: Object Storage performs the merge
        os-->>-s: done
        loop each part
            s->>+os: RemoveObject
            os-->>-s: done
        end
        deactivate s
    end
```
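The `merge_parts` step above maps directly onto the S3 multipart API. Here is a sketch in Go with the AWS SDK, assuming the parts were uploaded as separate objects in the same bucket (the real job runs in Sidekiq, in Ruby; the function and variable names are illustrative):

```go
package directupload

import (
	"fmt"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/service/s3"
)

// mergeParts stitches already-uploaded part objects (partKeys, in order)
// into one final object without downloading them.
func mergeParts(svc *s3.S3, bucket, finalKey string, partKeys []string) error {
	mp, err := svc.CreateMultipartUpload(&s3.CreateMultipartUploadInput{
		Bucket: aws.String(bucket),
		Key:    aws.String(finalKey),
	})
	if err != nil {
		return err
	}

	var completed []*s3.CompletedPart
	for i, partKey := range partKeys {
		// UploadPartCopy references the uploaded part in place: no download.
		out, err := svc.UploadPartCopy(&s3.UploadPartCopyInput{
			Bucket:     aws.String(bucket),
			Key:        aws.String(finalKey),
			UploadId:   mp.UploadId,
			PartNumber: aws.Int64(int64(i + 1)),
			CopySource: aws.String(fmt.Sprintf("%s/%s", bucket, partKey)),
		})
		if err != nil {
			return err
		}
		completed = append(completed, &s3.CompletedPart{
			ETag:       out.CopyPartResult.ETag,
			PartNumber: aws.Int64(int64(i + 1)),
		})
	}

	// Object Storage performs the merge server-side.
	if _, err := svc.CompleteMultipartUpload(&s3.CompleteMultipartUploadInput{
		Bucket:          aws.String(bucket),
		Key:             aws.String(finalKey),
		UploadId:        mp.UploadId,
		MultipartUpload: &s3.CompletedMultipartUpload{Parts: completed},
	}); err != nil {
		return err
	}

	// Now the temporary part objects can be removed.
	for _, partKey := range partKeys {
		if _, err := svc.DeleteObject(&s3.DeleteObjectInput{
			Bucket: aws.String(bucket),
			Key:    aws.String(partKey),
		}); err != nil {
			return err
		}
	}
	return nil
}
```

Because `UploadPartCopy` refers to objects already in the bucket, Sidekiq never downloads the parts; the provider performs both the copy and the merge server-side.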
## Provider support
| Provider  | Chunked encoding | MultiPart / UploadPart-Copy |
|-----------|------------------|-----------------------------|
| Google CS | yes              | no                          |
| AWS S3    | no               | yes                         |
| minio     | no               | yes                         |
| ceph      | no               | it should, once ceph!20002 is merged; for more details see ceph#22729 |
| DO spaces | no               | no; they claim to support MultiPart Upload, but their API documentation has no UploadPart-Copy |
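As a note on the chunked-encoding column: an upload without a known Content-Length only works against GCS. In Go, for example, leaving the request body length unknown makes the HTTP transport fall back to chunked transfer-encoding, roughly like this (a minimal sketch, not actual Workhorse code):

```go
package directupload

import (
	"io"
	"net/http"
)

// putChunked PUTs a stream of unknown size to a presigned URL.
func putChunked(presignedPutURL string, body io.Reader) (*http.Response, error) {
	req, err := http.NewRequest(http.MethodPut, presignedPutURL, body)
	if err != nil {
		return nil, err
	}
	// With a body of unknown length, ContentLength stays 0 and the
	// transport sends "Transfer-Encoding: chunked".
	return http.DefaultClient.Do(req)
}
```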