Commit graph

4 commits

Author SHA1 Message Date
Florian Klink
e4adca0880 feat(users/flokli/nixos/archeology-ec2): automate bucket log parsing
This adds a `parse-bucket-logs.{service,timer}`, running once every
night at 3AM UTC, figuring out the last time it was run and parsing
bucket logs for all previous days.

It invokes the `archeology-parse-bucket-logs` script to produce
a .parquet file with the bucket logs in `s3://nix-cache-log/log/` for
that day (inside a temporary directory), then on success uploads the
produced parquet file to
`s3://nix-archeologist/nix-cache-bucket-logs/yyyy-mm-dd.parquet`.

Change-Id: Ia75ca8c43f8074fbaa34537ffdba68350c504e52
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10011
Reviewed-by: edef <edef@edef.eu>
Tested-by: BuildkiteCI
2023-11-12 16:46:06 +00:00
Florian Klink
281cb93ba8 feat(users/flokli/nixos/archeology-ec2): add parse-bucket-logs
This adds a `archeology-parse-bucket-logs` CLI tool to `$PATH`.

It can be invoked like this:

```
archeology-parse-bucket-logs http://nix-cache-log.s3.amazonaws.com/log/2023-11-10-00-* bucket_logs_2023-11-10-00.pq.zstd
````

… and will produce a zstd-compressed Parquet file for (roughly) that
time range.

As the EC2 instance credentials don't give access to the logs bucket
(yet), other AWS credentials need to be provided.

This can be accomplished by using "AWS_ACCESS_KEY_ID",
"AWS_SECRET_ACCESS_KEY", "AWS_SESSION_TOKEN" from
"Option 2: Manually add a profile to your AWS credentials file (Short-
term credentials)" in AWS IAM Identity Center.

Processing logs for a one-hour range takes a minute or two, the
resulting zstd-compressed Parquet file is around 40-80M in size.

Processing logs for a whole day takes some 25mins, due to the sheer
amount of data (12 GB of raw log data, distributed among 450k individual
files, 20Mio log lines), but at least clickhouse isn't able to parse the
resulting parquet file back in:

> Code: 36. DB::Exception: IOError: Couldn't deserialize thrift: MaxMessageSize reached

For future automation tasks, it's probably better to run this once an
hour, and further join the data later on.

Change-Id: I6c8108c0ec17dc8d4e2dbe923175553325210a5c
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10007
Tested-by: BuildkiteCI
Reviewed-by: raitobezarius <tvl@lahfa.xyz>
2023-11-11 12:24:23 +00:00
Florian Klink
9e2f1f4583 refactor(users/flokli): move common stuff to archeology profile
Change-Id: I8470c0a2416c0c397e009affb44f8c7a852cd526
Reviewed-on: https://cl.tvl.fyi/c/depot/+/9837
Reviewed-by: flokli <flokli@flokli.de>
Tested-by: BuildkiteCI
Autosubmit: flokli <flokli@flokli.de>
2023-10-30 12:15:35 +00:00
Florian Klink
71fa4110fa feat(users/flokli): add archeology-ec2
This add the EC2 box config to the repo.

Change-Id: Id7a888a2cfbf1454cd9f9465018df377e14b4e9f
Reviewed-on: https://cl.tvl.fyi/c/depot/+/9836
Tested-by: BuildkiteCI
Reviewed-by: flokli <flokli@flokli.de>
2023-10-30 09:31:49 +00:00