Commit graph

7 commits

Author SHA1 Message Date
Florian Klink
abe099b6ba fix(users/flokli/archeology/parse_bucket_logs): fix regex and skip
It seems the regex is not perfect, it choked on a single log line:

```
Nov 13 03:10:19 archeology-ec2 59nkrwmih3ywaxrgxqj79pn395fs6m17-parse-bucket-logs-continuously[11105]: Code: 117. DB::Exception: Line "d57bd890fbd1ae16625bdb8168064125e013198099b7e1b3c24878a4d03c3ab8 nix-cache [12/Nov/2023:09:13:02 +0000] xxx.xx.xxx.xxx - VB7SJVZ108DSSN67 REST.POST.OBJECT index.html "POST /index.html HTTP/1.1" 405 MethodNotAllowed 348 - 4 - "-" "Mozilla/5.0 (Macintosh;                 Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML,                 like Gecko) Chrome/39.0.2171.95 Safari/537.36" - 0bFdGKbi0n9JHXU1a2hijcJwmYdc6lG2xgbdozc3wS6mlUkBE7ssrQCHIDdOLebo78o2cGbhivY= - ECDHE-RSA-AES128-GCM-SHA256 - nix-cache.s3.amazonaws.com TLSv1.2 - -" doesn't match the regexp.: (in file/uri log/2023-11-12-10-19-50-80805A702ECF65EB): (at row 5)
```

This was due to the user-agent field. The regex is now fixed.

The request itself is fun (someone trying to POST an index.html to the
bucket), and we should probably filter this on the Fastly side already,
not via IAM,

In any case, there's no point failing to parse if a single line doesn't
match the regex - we can just skip them.

For the sake of completeness, logs for that day have been reprocessed
and reuploaded.

Change-Id: Id98a7167a381cda06d150ad5118ee9e70ead277e
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10034
Tested-by: BuildkiteCI
Reviewed-by: flokli <flokli@flokli.de>
2023-11-14 21:46:09 +00:00
Florian Klink
c93086848f feat(users/flokli/archeology): add AWS config to shell
This allows using awscli inside a shell.

Clickhouse AWS SSO integration still seems broken unfortunately, even
with https://github.com/ClickHouse/ClickHouse/pull/54347 included in
our bump - it seems it's coming up with another token file path than the
AWS SDK:

> SSOCredentialsProvider: Unable to open token file on path: /home/flokli/.aws/sso/cache/da39a3ee5e6b4b0d3255bfef95601890afd80709.json

This is the sha1sum of the sso_start_url, not the sha1sum of the
session-name (nixos / f2f059b8b7298f1ad52636d67cef8b719aa83bf5).

Change-Id: Ia1bdec03c4f269a7415c42c90c1f4fd3d928f770
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10012
Reviewed-by: edef <edef@edef.eu>
Tested-by: BuildkiteCI
2023-11-12 22:43:26 +00:00
Florian Klink
46964f6d8f fix(users/flokli/archaeology): don't use file but column compression
Clickhouse also has column compression, configurable with the
output_format_parquet_compression_method setting.

It defaults to lz4, and the previous setting got a a zstd-compressed
parquet file with lz4 data.

Set output_format_parquet_compression_method to zstd instead, and sort
by timestamp before assembling the parquet file.

The existing files were updated to the same format with the following query:

```
SELECT * FROM file('bucket_logs_2023-11-11*.pq', 'Parquet', 'auto') ORDER BY timestamp ASC INTO OUTFILE 'bucket_logs_2023-11-11.parquet' SETTINGS output_format_parquet_compression_method = 'zstd'
```

Change-Id: Id63b14c82e7bf4b9907a500528b569a51e277751
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10008
Reviewed-by: raitobezarius <tvl@lahfa.xyz>
Tested-by: BuildkiteCI
2023-11-11 19:49:13 +00:00
Florian Klink
aaf53614b3 fix(users/flokli/archeology): make clickhouse use ambient AWS creds
Rather than picking up from clickhouse-specific config files, this gets
it to pick up from the ambient environment, which is closer to (but not
the same as) the AWS default credentials chain.

Change-Id: I9c498c231974ed345c3e3d354ec230052b4d0ff2
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10006
Tested-by: BuildkiteCI
Reviewed-by: raitobezarius <tvl@lahfa.xyz>
2023-11-11 12:24:23 +00:00
Florian Klink
7075aacc9c feat(users/flokli/archeology): show clickhouse-local progress
This behaviour might change (or not), see https://github.com/ClickHouse/
ClickHouse/pull/42003, but as of now, a `--progress` will provide some
progress.

Change-Id: I4891b6e2f96f2656858e71f88a226d24f0d45dc3
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10005
Reviewed-by: raitobezarius <tvl@lahfa.xyz>
Tested-by: BuildkiteCI
2023-11-11 12:24:23 +00:00
Florian Klink
314d1facd4 feat(users/flokli/archeology): add shell
Change-Id: Ic34fefdaac82fd1e23d248f2e5fec282384b8fc0
Reviewed-on: https://cl.tvl.fyi/c/depot/+/9984
Tested-by: BuildkiteCI
Reviewed-by: raitobezarius <tvl@lahfa.xyz>
2023-11-11 12:24:23 +00:00
Florian Klink
5ae49c5ccf feat(users/flokli/archeology): init parse-bucket-logs
Change-Id: I096b6fed8c73ddd5a417f5183cc113356ffd98c9
Reviewed-on: https://cl.tvl.fyi/c/depot/+/9983
Tested-by: BuildkiteCI
Reviewed-by: raitobezarius <tvl@lahfa.xyz>
2023-11-11 12:24:23 +00:00