All we do is constructing some strings, and checking if from_addr
succeeds or not.
This can be written in a much more concise way using test_case.
Use lazy_static to provide temporary directories.
Also add some more grpc-related test cases.
Change-Id: Ia310dd01f617f7628f1e7e21304ac70da2ab3534
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10027
Reviewed-by: Connor Brewster <cbrewster@hey.com>
Tested-by: BuildkiteCI
These gRPC PathInfoService tests were actually not too useful in here,
what we're mostly testing is the channel construction, so move it to
there.
Change-Id: Ic8c07558a1b28b46f863d5c39bcaa3a79cea007a
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10024
Reviewed-by: Connor Brewster <cbrewster@hey.com>
Tested-by: BuildkiteCI
We don't gain much from making this part of the trait, it's still up to
`tvix_store::pathinfoservice::from_addr` to do most of the construction.
Move it out of the trait and into the specific *Service impls directly.
This allows further refactorings in followup CLs.
Change-Id: I99b93ef4acd83637a2f4888a1e586f1ca96390dc
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10022
Tested-by: BuildkiteCI
Reviewed-by: Connor Brewster <cbrewster@hey.com>
It seems the regex is not perfect, it choked on a single log line:
```
Nov 13 03:10:19 archeology-ec2 59nkrwmih3ywaxrgxqj79pn395fs6m17-parse-bucket-logs-continuously[11105]: Code: 117. DB::Exception: Line "d57bd890fbd1ae16625bdb8168064125e013198099b7e1b3c24878a4d03c3ab8 nix-cache [12/Nov/2023:09:13:02 +0000] xxx.xx.xxx.xxx - VB7SJVZ108DSSN67 REST.POST.OBJECT index.html "POST /index.html HTTP/1.1" 405 MethodNotAllowed 348 - 4 - "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36" - 0bFdGKbi0n9JHXU1a2hijcJwmYdc6lG2xgbdozc3wS6mlUkBE7ssrQCHIDdOLebo78o2cGbhivY= - ECDHE-RSA-AES128-GCM-SHA256 - nix-cache.s3.amazonaws.com TLSv1.2 - -" doesn't match the regexp.: (in file/uri log/2023-11-12-10-19-50-80805A702ECF65EB): (at row 5)
```
This was due to the user-agent field. The regex is now fixed.
The request itself is fun (someone trying to POST an index.html to the
bucket), and we should probably filter this on the Fastly side already,
not via IAM,
In any case, there's no point failing to parse if a single line doesn't
match the regex - we can just skip them.
For the sake of completeness, logs for that day have been reprocessed
and reuploaded.
Change-Id: Id98a7167a381cda06d150ad5118ee9e70ead277e
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10034
Tested-by: BuildkiteCI
Reviewed-by: flokli <flokli@flokli.de>
I no longer work at ReadySet.
Change-Id: Idc19e2d68846551b6cd94f84594712692ebe35a9
Reviewed-on: https://cl.tvl.fyi/c/depot/+/9976
Tested-by: BuildkiteCI
Autosubmit: grfn <grfn@gws.fyi>
Reviewed-by: grfn <grfn@gws.fyi>
This packages up my keyboard firmware used for the Keychron K6 Pro.
We add a custom keymap to the `keyboards/keychron/k6_pro/ansi/rgb/
keymaps` directory, a copy from the `default` one (with a modified
`keymap.c`), and then build that as a makefile target.
`via` is *disabled*, as their keybindings take priority over keymap.c.
Luckily, only `qmk` seems to be sufficient to build it.
A simple `:flash` target/script is provided as well, it relies on some
udev rules set in the global system
(`hardware.keyboard.qmk.enable = true`).
Change-Id: I9f7a7a992e13516c32033127f94e37aec62d6b67
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10020
Reviewed-by: flokli <flokli@flokli.de>
Tested-by: BuildkiteCI
This allows using awscli inside a shell.
Clickhouse AWS SSO integration still seems broken unfortunately, even
with https://github.com/ClickHouse/ClickHouse/pull/54347 included in
our bump - it seems it's coming up with another token file path than the
AWS SDK:
> SSOCredentialsProvider: Unable to open token file on path: /home/flokli/.aws/sso/cache/da39a3ee5e6b4b0d3255bfef95601890afd80709.json
This is the sha1sum of the sso_start_url, not the sha1sum of the
session-name (nixos / f2f059b8b7298f1ad52636d67cef8b719aa83bf5).
Change-Id: Ia1bdec03c4f269a7415c42c90c1f4fd3d928f770
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10012
Reviewed-by: edef <edef@edef.eu>
Tested-by: BuildkiteCI
* update wasm-bindgen in all Rust-wasm projects
* remove stable overlays that work again in unstable
* add texlive to stable overlays (see linked nixpkgs PR)
* bump tdlib to 1.8.18, new minimum for telega.el
Change-Id: Ib8e202de7dfbc35115fda31d0a98b6314b2adf17
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10010
Tested-by: BuildkiteCI
Autosubmit: tazjin <tazjin@tvl.su>
Reviewed-by: flokli <flokli@flokli.de>
This was previously only in my Telegram channel, but it might as well
be on the blog itself.
Change-Id: I301ebeaa4dd1875f3858cee5259a5c689b950790
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10009
Reviewed-by: tazjin <tazjin@tvl.su>
Autosubmit: tazjin <tazjin@tvl.su>
Tested-by: BuildkiteCI
This adds a `parse-bucket-logs.{service,timer}`, running once every
night at 3AM UTC, figuring out the last time it was run and parsing
bucket logs for all previous days.
It invokes the `archeology-parse-bucket-logs` script to produce
a .parquet file with the bucket logs in `s3://nix-cache-log/log/` for
that day (inside a temporary directory), then on success uploads the
produced parquet file to
`s3://nix-archeologist/nix-cache-bucket-logs/yyyy-mm-dd.parquet`.
Change-Id: Ia75ca8c43f8074fbaa34537ffdba68350c504e52
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10011
Reviewed-by: edef <edef@edef.eu>
Tested-by: BuildkiteCI
Clickhouse also has column compression, configurable with the
output_format_parquet_compression_method setting.
It defaults to lz4, and the previous setting got a a zstd-compressed
parquet file with lz4 data.
Set output_format_parquet_compression_method to zstd instead, and sort
by timestamp before assembling the parquet file.
The existing files were updated to the same format with the following query:
```
SELECT * FROM file('bucket_logs_2023-11-11*.pq', 'Parquet', 'auto') ORDER BY timestamp ASC INTO OUTFILE 'bucket_logs_2023-11-11.parquet' SETTINGS output_format_parquet_compression_method = 'zstd'
```
Change-Id: Id63b14c82e7bf4b9907a500528b569a51e277751
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10008
Reviewed-by: raitobezarius <tvl@lahfa.xyz>
Tested-by: BuildkiteCI
This adds a `archeology-parse-bucket-logs` CLI tool to `$PATH`.
It can be invoked like this:
```
archeology-parse-bucket-logs http://nix-cache-log.s3.amazonaws.com/log/2023-11-10-00-* bucket_logs_2023-11-10-00.pq.zstd
````
… and will produce a zstd-compressed Parquet file for (roughly) that
time range.
As the EC2 instance credentials don't give access to the logs bucket
(yet), other AWS credentials need to be provided.
This can be accomplished by using "AWS_ACCESS_KEY_ID",
"AWS_SECRET_ACCESS_KEY", "AWS_SESSION_TOKEN" from
"Option 2: Manually add a profile to your AWS credentials file (Short-
term credentials)" in AWS IAM Identity Center.
Processing logs for a one-hour range takes a minute or two, the
resulting zstd-compressed Parquet file is around 40-80M in size.
Processing logs for a whole day takes some 25mins, due to the sheer
amount of data (12 GB of raw log data, distributed among 450k individual
files, 20Mio log lines), but at least clickhouse isn't able to parse the
resulting parquet file back in:
> Code: 36. DB::Exception: IOError: Couldn't deserialize thrift: MaxMessageSize reached
For future automation tasks, it's probably better to run this once an
hour, and further join the data later on.
Change-Id: I6c8108c0ec17dc8d4e2dbe923175553325210a5c
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10007
Tested-by: BuildkiteCI
Reviewed-by: raitobezarius <tvl@lahfa.xyz>
Rather than picking up from clickhouse-specific config files, this gets
it to pick up from the ambient environment, which is closer to (but not
the same as) the AWS default credentials chain.
Change-Id: I9c498c231974ed345c3e3d354ec230052b4d0ff2
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10006
Tested-by: BuildkiteCI
Reviewed-by: raitobezarius <tvl@lahfa.xyz>
This behaviour might change (or not), see https://github.com/ClickHouse/
ClickHouse/pull/42003, but as of now, a `--progress` will provide some
progress.
Change-Id: I4891b6e2f96f2656858e71f88a226d24f0d45dc3
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10005
Reviewed-by: raitobezarius <tvl@lahfa.xyz>
Tested-by: BuildkiteCI
This appears in the cache.nixos.org dataset.
Change-Id: I2eadafe8441e0132a448828026553da2dc7c12aa
Reviewed-on: https://cl.tvl.fyi/c/depot/+/9994
Tested-by: BuildkiteCI
Reviewed-by: flokli <flokli@flokli.de>
This appears in the cache.nixos.org dataset.
Change-Id: I35921f7ef148f6681081a4e371abb8c9cc98854d
Reviewed-on: https://cl.tvl.fyi/c/depot/+/9993
Reviewed-by: flokli <flokli@flokli.de>
Tested-by: BuildkiteCI
Rather than having our own error type, just make decoding errors use
the same common error type.
Change-Id: Ie2c86972f3745c695253adc3214444ac0ab8db6e
Reviewed-on: https://cl.tvl.fyi/c/depot/+/9995
Reviewed-by: flokli <flokli@flokli.de>
Tested-by: BuildkiteCI
This appears in the cache.nixos.org dataset.
Change-Id: I055b60b9950a1a6a36c1b0576b957e11e1d4264b
Reviewed-on: https://cl.tvl.fyi/c/depot/+/9990
Reviewed-by: flokli <flokli@flokli.de>
Tested-by: BuildkiteCI
Do this upgrade whenever someone is actually interested in the children
of a directory, but that directory doesn't contain a more detailed
listing. This is much more predictable, and removes a bunch of confusing
code from the inode tracker itself.
Change-Id: Ib3a13694d6d5d22887d2d04ae429592137f39cb4
Reviewed-on: https://cl.tvl.fyi/c/depot/+/9982
Autosubmit: flokli <flokli@flokli.de>
Tested-by: BuildkiteCI
Reviewed-by: Connor Brewster <cbrewster@hey.com>
Have it return libc::ENOENT errors rather than an Option<…>.
Also avoid having to traverse inode_data multiple times, by synthesizing
the Arc<…> on our own in the insert case. In that case, the data is
quite small, so cloning it is faster than traversing a second time.
Change-Id: I7ab14bac8bb23859ed8d166a12070d4f4749b6d4
Reviewed-on: https://cl.tvl.fyi/c/depot/+/9981
Tested-by: BuildkiteCI
Reviewed-by: Connor Brewster <cbrewster@hey.com>
As already established in the two previous CLs, these two pieces of code
where doing the same.
Move to a get_directory_children helper.
Change-Id: Id6876f0c34f3f40a31a22d59a2cdbfef39e2d8de
Reviewed-on: https://cl.tvl.fyi/c/depot/+/9980
Reviewed-by: Connor Brewster <cbrewster@hey.com>
Tested-by: BuildkiteCI
Very similar to the previous CL
Change-Id: I0df07ddca742b7b9485d48771c8d295dc3aa7136
Reviewed-on: https://cl.tvl.fyi/c/depot/+/9979
Tested-by: BuildkiteCI
Reviewed-by: Connor Brewster <cbrewster@hey.com>
Code after this big match block only cares about parent_digest and
children, so there's no need to do another inode_tracker.get in there.
This also allows removing another if let block, right after, as we don't
need to destructure parent_data anymore.
Change-Id: I68fbbe3304194670caee5a453722369afa4e77ea
Reviewed-on: https://cl.tvl.fyi/c/depot/+/9978
Tested-by: BuildkiteCI
Reviewed-by: Connor Brewster <cbrewster@hey.com>
This makes it much harder to keep the read lock around for too long, and
the code a bit easier to understand.
Change-Id: I7d99c85cadd433cad444b8edd34e2c43d7eaf5a8
Reviewed-on: https://cl.tvl.fyi/c/depot/+/9977
Tested-by: BuildkiteCI
Reviewed-by: Connor Brewster <cbrewster@hey.com>
Not a single call site actually makes use of the Vec.
Change-Id: I6cf31073c9f443d1702a21937a0c3938c2c643b8
Reviewed-on: https://cl.tvl.fyi/c/depot/+/9988
Tested-by: BuildkiteCI
Reviewed-by: flokli <flokli@flokli.de>
This small tool formats A-Term in a more readable format. It's a lossy
conversion for non-valid UTF-8 environment values.
Change-Id: I65a51054d7faf528321bc2d9fc4425180a7813f5
Reviewed-on: https://cl.tvl.fyi/c/depot/+/9970
Tested-by: BuildkiteCI
Autosubmit: flokli <flokli@flokli.de>
Reviewed-by: tazjin <tazjin@tvl.su>