Commit graph

3 commits

Author SHA1 Message Date
Florian Klink
46964f6d8f fix(users/flokli/archaeology): don't use file but column compression
Clickhouse also has column compression, configurable with the
output_format_parquet_compression_method setting.

It defaults to lz4, and the previous setting got a a zstd-compressed
parquet file with lz4 data.

Set output_format_parquet_compression_method to zstd instead, and sort
by timestamp before assembling the parquet file.

The existing files were updated to the same format with the following query:

```
SELECT * FROM file('bucket_logs_2023-11-11*.pq', 'Parquet', 'auto') ORDER BY timestamp ASC INTO OUTFILE 'bucket_logs_2023-11-11.parquet' SETTINGS output_format_parquet_compression_method = 'zstd'
```

Change-Id: Id63b14c82e7bf4b9907a500528b569a51e277751
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10008
Reviewed-by: raitobezarius <tvl@lahfa.xyz>
Tested-by: BuildkiteCI
2023-11-11 19:49:13 +00:00
Florian Klink
7075aacc9c feat(users/flokli/archeology): show clickhouse-local progress
This behaviour might change (or not), see https://github.com/ClickHouse/
ClickHouse/pull/42003, but as of now, a `--progress` will provide some
progress.

Change-Id: I4891b6e2f96f2656858e71f88a226d24f0d45dc3
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10005
Reviewed-by: raitobezarius <tvl@lahfa.xyz>
Tested-by: BuildkiteCI
2023-11-11 12:24:23 +00:00
Florian Klink
5ae49c5ccf feat(users/flokli/archeology): init parse-bucket-logs
Change-Id: I096b6fed8c73ddd5a417f5183cc113356ffd98c9
Reviewed-on: https://cl.tvl.fyi/c/depot/+/9983
Tested-by: BuildkiteCI
Reviewed-by: raitobezarius <tvl@lahfa.xyz>
2023-11-11 12:24:23 +00:00