Add gzip writer to CsvBuilder
What does this MR do and why?
Related: #414937 (comment 1461641827)
This MR extends CsvBuilder with a Gzip class that writes a collection to a gzipped CSV file.
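For readers unfamiliar with the approach, here is a minimal sketch of streaming rows into a gzipped CSV tempfile using only the Ruby standard library. It is not the `CsvBuilder::Gzip` implementation from this MR: the helper name `write_gzipped_csv` and the shape of its arguments are assumptions made for illustration.

```ruby
require 'csv'
require 'tempfile'
require 'zlib'

# Hypothetical helper, NOT the CsvBuilder::Gzip implementation from this MR.
# Streams rows from any enumerable into a gzipped CSV tempfile, yields the
# tempfile to the caller, and removes it afterwards.
def write_gzipped_csv(rows, header_to_value)
  tempfile = Tempfile.new(['export', '.csv.gz'])
  tempfile.binmode

  gz = Zlib::GzipWriter.new(tempfile)
  # Header row: one column per key.
  gz.write(CSV.generate_line(header_to_value.keys.map(&:to_s)))

  rows.each do |row|
    # A value is either a callable receiving the row or a method name.
    values = header_to_value.values.map do |value|
      value.respond_to?(:call) ? value.call(row) : row.public_send(value)
    end
    gz.write(CSV.generate_line(values))
  end

  gz.finish      # writes the gzip trailer, leaves the tempfile open
  tempfile.flush
  tempfile.rewind

  yield tempfile
ensure
  tempfile&.close! # close and unlink the tempfile
end
```

Only one CSV line is held in memory at a time, so memory use stays flat regardless of how many rows the enumerable produces.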
Reasoning: we plan to send large volumes of data to ClickHouse. Instead of building a large INSERT
string in memory, we'll attempt to leverage its CSV FORMAT
functionality: https://clickhouse.com/docs/en/integrations/data-formats/csv-tsv
The upload will happen via an HTTP call: the ClickHouse server receives the compressed file and ingests the data.
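As a rough sketch of what such an upload request could look like with `Net::HTTP`, the helper below builds (but does not send) a POST request that streams an already-gzipped file. The helper name, the base URL, the table name, and the lack of authentication are all assumptions. `CSVWithNames` is chosen on the assumption that the file starts with a header row; plain `CSV` would apply otherwise.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: builds an insert request for the ClickHouse HTTP
# interface. base_url and table are assumptions; the table name is
# interpolated as-is, so it must come from trusted code, not user input.
def build_clickhouse_csv_insert(base_url, table, gzipped_io)
  uri = URI(base_url)
  uri.query = URI.encode_www_form(query: "INSERT INTO #{table} FORMAT CSVWithNames")

  request = Net::HTTP::Post.new(uri)
  request['Content-Encoding'] = 'gzip' # the body is already gzip-compressed
  request['Content-Type'] = 'text/csv'
  request['Content-Length'] = gzipped_io.size.to_s
  request.body_stream = gzipped_io     # streamed, not loaded into memory
  request
end

# Sending it (not executed here) could look like:
#   uri = URI('http://localhost:8123/')
#   Net::HTTP.start(uri.hostname, uri.port) { |http| http.request(request) }
```

Using `body_stream` rather than `body` keeps the compressed file out of memory, which matches the motivation for avoiding a large INSERT string.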
Example usage:
scope = Issue.order(:updated_at, :id)
iterator = Gitlab::Pagination::Keyset::Iterator.new(scope: scope)
max_records = 10
record_count = 0
enumerator = Enumerator.new do |yielder|
  iterator.each_batch(of: 5) do |batch|
    batch.each do |row|
      yielder << row
      record_count += 1
      if record_count == max_records
        # maybe store the keyset cursor here
      end
    end
    break if record_count == max_records
  end
end
CsvBuilder::Gzip.new(enumerator, { title: -> (row) { row.title.upcase }, id: :id }).render do |tempfile|
  puts tempfile.path
  puts `zcat #{tempfile.path}`
end
The iteration is controlled outside of the CSV library because at some point we might need to stop the processing and continue later (which, of course, means starting a new CSV file).
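One way to keep that control outside the CSV code is a small wrapper that caps the number of rows and remembers the last row it yielded, so the caller can persist a cursor and resume later. This is only a sketch in plain Ruby: the `BoundedRows` class is hypothetical, and it assumes the batches are available as an enumerable of arrays rather than through the keyset iterator's block API.

```ruby
# Hypothetical wrapper, not part of this MR: yields at most max_records rows
# from an enumerable of batches and remembers the last row that was yielded.
class BoundedRows
  include Enumerable

  attr_reader :last_row

  def initialize(batches, max_records)
    @batches = batches
    @max_records = max_records
    @last_row = nil
  end

  def each
    return to_enum(:each) unless block_given?
    return if @max_records <= 0

    count = 0
    @batches.each do |batch|
      batch.each do |row|
        yield row
        @last_row = row # candidate cursor for the next run
        count += 1
        # Stop right after the limit so no further batch is loaded.
        return if count >= @max_records
      end
    end
  end
end
```

After the CSV has been rendered, `last_row` holds the last exported record, from which a cursor for the next file could be derived. Whether `CsvBuilder::Gzip` accepts an arbitrary Enumerable like this in place of an Enumerator is an assumption.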
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.