Commit graph

36 commits

Author SHA1 Message Date
Florian Klink
b12ea8d786 fix(users/flokli/nixos-tvix-cache): use escapeSystemdExecArgs
escapeSystemdExecArgs is the function that should be used to escape
Exec* service lines.

See a72b1b3c65/nixos/lib/utils.nix (L122-L128)
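
For readers unfamiliar with the helper, here is a rough Python sketch of the kind of escaping it performs. It assumes the nixpkgs function JSON-quotes each argument and doubles `%` and `$` so systemd does not expand them. The Python names are hypothetical, and the referenced utils.nix remains the authoritative definition.

```python
import json


def escape_systemd_exec_arg(arg: str) -> str:
    """Quote one argument for an Exec* line (sketch, not the nixpkgs code)."""
    # JSON quoting yields a double-quoted string with backslash escapes.
    quoted = json.dumps(arg, ensure_ascii=False)
    # Double the characters systemd would otherwise expand:
    # '%' starts a specifier, '$' starts an environment variable reference.
    return quoted.replace("%", "%%").replace("$", "$$")


def escape_systemd_exec_args(args: list[str]) -> str:
    """Join the individually escaped arguments with spaces."""
    return " ".join(escape_systemd_exec_arg(a) for a in args)
```

Calling `escape_systemd_exec_args(["echo", "100% $HOME"])` in this sketch yields a line in which neither the percent sign nor the variable reference is interpreted by systemd.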

Reported-By: matrix:u/lukas:luflosi.de
Change-Id: Ia3a628db221a30310154c060a6e29ccb2c94c352
Reviewed-on: https://cl.tvl.fyi/c/depot/+/12930
Autosubmit: flokli <flokli@flokli.de>
Tested-by: BuildkiteCI
Reviewed-by: Ilan Joselevich <personal@ilanjoselevich.com>
2024-12-30 10:30:04 +00:00
Florian Klink
4dce88e997 feat(users/flokli/nixos-tvix-cache): increase scraping interval
This provides more resolution in the dashboards.

Change-Id: I06e7260250e58fe62bbda41b67d84e0c5cacfbd2
Reviewed-on: https://cl.tvl.fyi/c/depot/+/12927
Reviewed-by: raitobezarius <tvl@lahfa.xyz>
Tested-by: BuildkiteCI
Reviewed-by: Jörg Thalheim <joerg@thalheim.io>
Autosubmit: flokli <flokli@flokli.de>
2024-12-27 18:14:40 +00:00
Florian Klink
7dfe147c4d fix(users/flokli/nixos-tvix-cache): bump trace size limit
We produce traces bigger than what tempo accepts by default, causing
traces to be rejected with TRACE_TOO_LARGE and to then be incomplete.

Bump the max size.

Change-Id: I8caa245d14db683853485ee5625c9662ea51ce29
Reviewed-on: https://cl.tvl.fyi/c/depot/+/12926
Autosubmit: flokli <flokli@flokli.de>
Reviewed-by: raitobezarius <tvl@lahfa.xyz>
Tested-by: BuildkiteCI
2024-12-27 16:00:58 +00:00
Florian Klink
9fdf6b3cd1 docs(users/flokli/nixos-tvix-cache): don't use mkForce
There's no need to mkForce anything in that list.
Nix reads nix-cache-info to determine priority.
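
As an illustration of why the list order does not need forcing, the following Python sketch parses the `Priority` field of a `nix-cache-info` document and orders caches by it (lower value is tried first). The URLs and priority values are made up for the example, and the fallback of 50 for a missing field is an assumption.

```python
def parse_nix_cache_info(text: str) -> dict[str, str]:
    """Parse the 'Key: value' lines of a nix-cache-info document."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a colon
            info[key.strip()] = value.strip()
    return info


def order_substituters(cache_infos: dict[str, str]) -> list[str]:
    """Return cache URLs sorted by Priority; lower numbers come first."""
    def priority(url: str) -> int:
        # Assumption: a cache without a Priority field is treated as 50.
        return int(parse_nix_cache_info(cache_infos[url]).get("Priority", "50"))

    return sorted(cache_infos, key=priority)
```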

Change-Id: I08797ed25348f52f5696f80558d206b73d20dead
Reviewed-on: https://cl.tvl.fyi/c/depot/+/12925
Autosubmit: flokli <flokli@flokli.de>
Reviewed-by: raitobezarius <tvl@lahfa.xyz>
Tested-by: BuildkiteCI
2024-12-27 15:50:23 +00:00
Florian Klink
b65d40261b fix(users/flokli/nixos-tvix-cache): drop private bind mounts
The mount didn't get applied for some reason, so explicitly configure the
path instead.

Change-Id: Ie41eb3c1d5f6416493211fb77709aaeecf61edf0
Reviewed-on: https://cl.tvl.fyi/c/depot/+/12924
Autosubmit: flokli <flokli@flokli.de>
Reviewed-by: raitobezarius <tvl@lahfa.xyz>
Tested-by: BuildkiteCI
2024-12-27 15:50:23 +00:00
Florian Klink
95e8a0a801 fix(users/flokli/nixos-tvix-cache): set timeInterval for metrics DS
The data source defaults to a time interval of 15s. As alloy only scrapes
every 60s, dashboards viewed with a smaller time range show no data at
all. For example, the CPU graph is empty for any time range shorter than
the last 12h.

Fix by setting time interval to 60s.
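
One plausible reading of the failure, sketched in Python: Grafana documents `$__rate_interval` as `max($__interval + scrape interval, 4 * scrape interval)`, where the scrape interval is the one configured on the data source. That the affected panels use `$__rate_interval` is an assumption here, and the function names are hypothetical.

```python
def rate_interval(interval_s: int, configured_scrape_s: int) -> int:
    """Grafana's documented formula for $__rate_interval, in seconds."""
    return max(interval_s + configured_scrape_s, 4 * configured_scrape_s)


def guaranteed_samples(window_s: int, actual_scrape_s: int) -> int:
    """Minimum number of samples falling into a window of this length."""
    return window_s // actual_scrape_s


# Data source left at 15s while samples only arrive every 60s:
narrow = rate_interval(15, 15)   # 60s window
# Data source set to 60s, matching the real scrape interval:
wide = rate_interval(60, 60)     # 240s window
```

A `rate()` needs at least two samples in its window. The 60s window only guarantees one, while the 240s window guarantees four.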

Co-Authored-By: WilliButz <willibutz@posteo.de>
Change-Id: Ife306b2fda968654cad818a82f99e0011819be3c
Reviewed-on: https://cl.tvl.fyi/c/depot/+/12923
Autosubmit: flokli <flokli@flokli.de>
Tested-by: BuildkiteCI
Reviewed-by: raitobezarius <tvl@lahfa.xyz>
2024-12-27 13:37:36 +00:00
Florian Klink
b36f2e3a32 fix(users/flokli/nixos-tvix-cache): BindPaths is serviceConfig
Putting this into UnitConfig won't work, so the bind mount didn't
happen, causing the blobs to be created on the SSD too.

This was already deployed and the data migrated over.

Change-Id: Ie30c8f458cdad8b764817a48a048ec3ca3c18e64
Reviewed-on: https://cl.tvl.fyi/c/depot/+/12922
Tested-by: BuildkiteCI
Autosubmit: flokli <flokli@flokli.de>
Reviewed-by: raitobezarius <tvl@lahfa.xyz>
2024-12-27 12:54:29 +00:00
Florian Klink
b2a2225b8b feat(users/flokli/nixos-tvix-cache): put metadata on SSD
Move the Directory and PathInfo storage to the SSD, and only bind-mount
the blob storage from the HDD.

This should improve IO for random access.

Change-Id: Icf9408a879dec8a52541953682ffac25b31e73d3
Reviewed-on: https://cl.tvl.fyi/c/depot/+/12921
Tested-by: BuildkiteCI
Autosubmit: flokli <flokli@flokli.de>
Reviewed-by: raitobezarius <tvl@lahfa.xyz>
2024-12-27 11:58:01 +00:00
zimbatm
88e65b5c33 fix(users/flokli/nixos-tvix-cache): bump nginx read timeout
This is a bandaid until we have a proper fix.

Change-Id: Id9f0bab5f309a7796c1efee23071013618c6dd12
Reviewed-on: https://cl.tvl.fyi/c/depot/+/12896
Autosubmit: Jonas Chevalier <zimbatm@zimbatm.com>
Tested-by: BuildkiteCI
Reviewed-by: flokli <flokli@flokli.de>
2024-12-20 17:04:40 +00:00
Florian Klink
856886f01d chore(users/flokli/nixos-tvix-cache): switch to Mimir
VictoriaMetrics doesn't seem to "normalize" timeseries and label names,
which causes breakage in Grafana Dashboards querying label values.

Reported in VictoriaMetrics/VictoriaMetrics#7744.

Change-Id: I3397c4fd5911c9a3503d058c77c26e0db9300f36
Reviewed-on: https://cl.tvl.fyi/c/depot/+/12867
Tested-by: BuildkiteCI
Reviewed-by: Jonas Chevalier <zimbatm@zimbatm.com>
Reviewed-by: flokli <flokli@flokli.de>
Autosubmit: flokli <flokli@flokli.de>
2024-12-04 22:37:23 +00:00
Florian Klink
ae76eaa761 feat(users/flokli/nixos-tvix-cache): re-enable http2
With nar-bridge supporting zstd content-encoding, we don't need the
nginx zstd module and can re-enable http2.

We also need to propagate the Accept-Encoding sent by the client to
nar-bridge, so it actually knows it can send zstd.

This reduces the time measured in the microbenchmark from ~13s to this:

```
hyperfine 'rm -rf /tmp/cache; nix copy --from https://nixos.tvix.store/ --to "file:///tmp/cache?compression=none" /nix/store/jlkypcf54nrh4n6r0l62ryx93z752hb2-firefox-132.0'
Benchmark 1: rm -rf /tmp/cache; nix copy --from https://nixos.tvix.store/ --to "file:///tmp/cache?compression=none" /nix/store/jlkypcf54nrh4n6r0l62ryx93z752hb2-firefox-132.0
  Time (mean ± σ):      4.880 s ±  0.207 s    [User: 4.661 s, System: 2.377 s]
  Range (min … max):    4.700 s …  5.274 s    10 runs
```

Change-Id: Id092307423636163ae95ef87ec8fa558b83ce0bb
Reviewed-on: https://cl.tvl.fyi/c/depot/+/12835
Reviewed-by: Jörg Thalheim <joerg@thalheim.io>
Autosubmit: flokli <flokli@flokli.de>
Tested-by: BuildkiteCI
Reviewed-by: Ilan Joselevich <personal@ilanjoselevich.com>
2024-11-24 18:34:04 +00:00
Florian Klink
cb85e87376 refactor(users/flokli/nixos-tvix-cache): absorb otlpcollector into alloy
We don't need a separate instance of opentelemetry-collector, alloy can
also do this job for us.

Change-Id: I1b671ba57d70b080f7db112e1afcfe2e0cbdd13e
Reviewed-on: https://cl.tvl.fyi/c/depot/+/12829
Reviewed-by: flokli <flokli@flokli.de>
Reviewed-by: Jonas Chevalier <zimbatm@zimbatm.com>
Tested-by: BuildkiteCI
2024-11-23 09:44:36 +00:00
Florian Klink
09b343864a fix(users/flokli/nixos-tvix-cache): bump max_traces_per_user
These are quite bursty, and I've seen messages about getting rate
limited.

Change-Id: I73058140957cb5718971fa432c003c2d1b0305e3
Reviewed-on: https://cl.tvl.fyi/c/depot/+/12828
Tested-by: BuildkiteCI
Reviewed-by: Ilan Joselevich <personal@ilanjoselevich.com>
2024-11-23 09:44:36 +00:00
zimbatm
e58e6f6e16 feat(users/flokli/nixos/nixos-tvix-cache): also collect system metrics
Use grafana-alloy to collect system metrics.

Change-Id: I592e64ca722701d4f12e69a531a434b54954955a
Reviewed-on: https://cl.tvl.fyi/c/depot/+/12827
Reviewed-by: flokli <flokli@flokli.de>
Tested-by: BuildkiteCI
2024-11-23 09:41:53 +00:00
Florian Klink
6f1d059c7d feat(users/flokli/nixos/nixos-tvix-cache): collect metrics
This enables routing of metrics to an instance of VictoriaMetrics, and
configures opentelemetry-collector to route metrics there.

Change-Id: If765191a4cc70ddcaad821d45132b96a10a12148
Reviewed-on: https://cl.tvl.fyi/c/depot/+/12812
Reviewed-by: flokli <flokli@flokli.de>
Tested-by: BuildkiteCI
Reviewed-by: Jonas Chevalier <zimbatm@zimbatm.com>
2024-11-23 09:40:21 +00:00
Florian Klink
52a8e47ac1 feat(users/flokli/nixos/nixos-tvix-cache): init
This is a fetch-through mirror of cache.nixos.org, hosted by NumTide.

The current machine is a SX65 Hetzner dedicated server with 4x22TB SATA disks,
and 2x1TB NVMe disks.

The goals of this machine:

 - Exercise tvix-store and nar-bridge code
 - Collect usage metrics (see https://nixos.tvix.store/grafana)
 - Identify bottlenecks
 - Replace cache.nixos.org?

Be aware, however, that there are zero availability guarantees. Since Tvix
doesn't support garbage collection yet, we will either delete data or order
a bigger box.

Change-Id: Id24baa18cae1629a06caaa059c0c75d4a01659d5
Reviewed-on: https://cl.tvl.fyi/c/depot/+/12811
Tested-by: BuildkiteCI
Reviewed-by: Jonas Chevalier <zimbatm@zimbatm.com>
Reviewed-by: flokli <flokli@flokli.de>
2024-11-23 09:40:21 +00:00
Florian Klink
923ed3532d fix(users/flokli/nixos): add source_up
This got lost somehow, but is necessary to keep `mg` in `$PATH`.

Change-Id: I2100d68225284bfe825bcc5ab01628891ebd09a3
Reviewed-on: https://cl.tvl.fyi/c/depot/+/12810
Tested-by: BuildkiteCI
Autosubmit: flokli <flokli@flokli.de>
Reviewed-by: flokli <flokli@flokli.de>
2024-11-22 12:33:34 +00:00
Florian Klink
fef2fdcf8e feat(users/flokli/nixos): add awscli to shell
Change-Id: I665ba215de12fad58b91604700c09a87444ac3ac
Reviewed-on: https://cl.tvl.fyi/c/depot/+/12381
Tested-by: BuildkiteCI
Autosubmit: flokli <flokli@flokli.de>
Reviewed-by: flokli <flokli@flokli.de>
2024-08-28 16:02:41 +00:00
sterni
6e2c143756 chore(3p/sources): Bump channels & overlays
- agenix has not been updated
  (https://github.com/ryantm/agenix/pull/241).

- Re-enable now fixed dependency of flokli/archeology-ec2.

Change-Id: I4e0399e5b5dbaf5e504076e029013f165dd4d191
Reviewed-on: https://cl.tvl.fyi/c/depot/+/11363
Autosubmit: sterni <sternenseemann@systemli.org>
Tested-by: BuildkiteCI
Reviewed-by: flokli <flokli@flokli.de>
2024-04-06 18:04:14 +00:00
Florian Klink
f7cdbbef45 chore(users/flokli/nixos): restore archeology-ec2 build
Temporarily remove parquet-tools, so this builds again.

Can be re-added once https://github.com/NixOS/nixpkgs/pull/301032 has landed
upstream.

Change-Id: Ie74f014eb8158d5f529a5f1c55788a4edc5c805d
Reviewed-on: https://cl.tvl.fyi/c/depot/+/11347
Autosubmit: flokli <flokli@flokli.de>
Reviewed-by: edef <edef@edef.eu>
Tested-by: BuildkiteCI
2024-04-03 11:19:34 +00:00
Florian Klink
1dad7a144d chore(users/flokli/nixos): drop archeology nixos config
The machine at Hetzner is gone, only the one in EC2 remains.

Change-Id: Ia7266d56ef1174267b95086c51e6d80015c2f905
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10711
Tested-by: BuildkiteCI
Autosubmit: flokli <flokli@flokli.de>
Reviewed-by: flokli <flokli@flokli.de>
2024-01-30 15:04:40 +00:00
edef
1091b1e623 feat(users/flokli/archeology): install parquet-tools
Change-Id: I64cd83fbce920eeabace5b49ef623c033d98a8be
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10000
Reviewed-by: flokli <flokli@flokli.de>
Tested-by: BuildkiteCI
2023-11-14 16:17:33 +00:00
edef
f467f0d801 feat(users/flokli/archeology): install DuckDB
Change-Id: I76bc20711c7e59d184659db134ba224cfcd7f6cb
Reviewed-on: https://cl.tvl.fyi/c/depot/+/9999
Tested-by: BuildkiteCI
Reviewed-by: flokli <flokli@flokli.de>
2023-11-14 16:17:33 +00:00
edef
423ab20f43 feat(users/flokli/archeology): turn on task_delayacct
More ClickHouse perf stats ^_^

Change-Id: I4f6882b1a6c1ebfed9a430e62ca634a141cd1cf1
Reviewed-on: https://cl.tvl.fyi/c/depot/+/9998
Reviewed-by: flokli <flokli@flokli.de>
Tested-by: BuildkiteCI
2023-11-13 14:53:41 +00:00
Florian Klink
e4adca0880 feat(users/flokli/nixos/archeology-ec2): automate bucket log parsing
This adds a `parse-bucket-logs.{service,timer}`, running once every
night at 3AM UTC, figuring out the last time it was run and parsing
bucket logs for all previous days.

It invokes the `archeology-parse-bucket-logs` script to produce
a .parquet file with the bucket logs in `s3://nix-cache-log/log/` for
that day (inside a temporary directory), then on success uploads the
produced parquet file to
`s3://nix-archeologist/nix-cache-bucket-logs/yyyy-mm-dd.parquet`.
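
The catch-up logic can be sketched roughly as follows in Python. How the real service records its last run is not stated here, so the `last_processed` input and both function names are assumptions. Only the destination path layout is taken from the description above.

```python
from datetime import date, timedelta

BUCKET_PREFIX = "s3://nix-archeologist/nix-cache-bucket-logs"


def pending_days(last_processed: date, today: date) -> list[date]:
    """Days after the last processed day, up to and including yesterday."""
    gap = (today - last_processed).days
    # range(1, gap) stops before `today`, whose logs are still incomplete.
    return [last_processed + timedelta(days=i) for i in range(1, gap)]


def destination_key(day: date) -> str:
    """Upload target for the parquet file covering one day."""
    return f"{BUCKET_PREFIX}/{day.isoformat()}.parquet"
```

If the timer missed a few nights, the next run would simply see a longer list of pending days.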

Change-Id: Ia75ca8c43f8074fbaa34537ffdba68350c504e52
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10011
Reviewed-by: edef <edef@edef.eu>
Tested-by: BuildkiteCI
2023-11-12 16:46:06 +00:00
Florian Klink
281cb93ba8 feat(users/flokli/nixos/archeology-ec2): add parse-bucket-logs
This adds an `archeology-parse-bucket-logs` CLI tool to `$PATH`.

It can be invoked like this:

```
archeology-parse-bucket-logs http://nix-cache-log.s3.amazonaws.com/log/2023-11-10-00-* bucket_logs_2023-11-10-00.pq.zstd
```

… and will produce a zstd-compressed Parquet file for (roughly) that
time range.

As the EC2 instance credentials don't give access to the logs bucket
(yet), other AWS credentials need to be provided.

This can be accomplished by using "AWS_ACCESS_KEY_ID",
"AWS_SECRET_ACCESS_KEY", "AWS_SESSION_TOKEN" from
"Option 2: Manually add a profile to your AWS credentials file
(Short-term credentials)" in AWS IAM Identity Center.

Processing logs for a one-hour range takes a minute or two; the
resulting zstd-compressed Parquet file is around 40-80M in size.

Processing logs for a whole day takes some 25 minutes, due to the sheer
amount of data (12 GB of raw log data, distributed among 450k individual
files, 20 million log lines), and ClickHouse, at least, isn't able to
parse the resulting Parquet file back in:

> Code: 36. DB::Exception: IOError: Couldn't deserialize thrift: MaxMessageSize reached

For future automation tasks, it's probably better to run this once an
hour, and further join the data later on.
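
A possible shape for such an hourly split, sketched in Python: it only generates the 24 input globs and output names for one day, following the naming used in the invocation above. The function name is hypothetical, and actually running the tool and joining the results is left out.

```python
from datetime import date

LOG_BASE = "http://nix-cache-log.s3.amazonaws.com/log"


def hourly_jobs(day: date) -> list[tuple[str, str]]:
    """Return (input glob, output file) pairs, one per hour of the day."""
    jobs = []
    for hour in range(24):
        stamp = f"{day.isoformat()}-{hour:02d}"  # e.g. 2023-11-10-00
        jobs.append((f"{LOG_BASE}/{stamp}-*", f"bucket_logs_{stamp}.pq.zstd"))
    return jobs
```

Each pair maps directly onto the two positional arguments of `archeology-parse-bucket-logs`.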

Change-Id: I6c8108c0ec17dc8d4e2dbe923175553325210a5c
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10007
Tested-by: BuildkiteCI
Reviewed-by: raitobezarius <tvl@lahfa.xyz>
2023-11-11 12:24:23 +00:00
Florian Klink
c37c9cc770 feat(users/flokli/nixos): add direnv support
Expose `deps` separately, add a direnv with PATH_add for it to bring
tooling into $PATH.

Change-Id: I432cd2b082cad89e08bef78dc4653e10e137cd6b
Reviewed-on: https://cl.tvl.fyi/c/depot/+/9842
Reviewed-by: flokli <flokli@flokli.de>
Autosubmit: flokli <flokli@flokli.de>
Tested-by: BuildkiteCI
2023-10-30 14:32:18 +00:00
Florian Klink
5b03bebce8 feat(users/flokli/nixos): use lazydeps for shell
Avoid having to re-enter the shell whenever the config is changed.

Change-Id: Ib9f6bb4075e29acaeb4863d64c017695ca85b60b
Reviewed-on: https://cl.tvl.fyi/c/depot/+/9841
Autosubmit: flokli <flokli@flokli.de>
Tested-by: BuildkiteCI
Reviewed-by: flokli <flokli@flokli.de>
2023-10-30 12:15:36 +00:00
Florian Klink
5d2789d1ad feat(users/flokli/archeology): add awscli, htop, kitty terminfo
Change-Id: Ib7ae1871a5d0b16a68c79b68e7e79fd302da79bd
Reviewed-on: https://cl.tvl.fyi/c/depot/+/9840
Autosubmit: flokli <flokli@flokli.de>
Tested-by: BuildkiteCI
Reviewed-by: flokli <flokli@flokli.de>
2023-10-30 12:15:36 +00:00
Florian Klink
f9323d5aa5 feat(users/flokli): use nix-copy-closure instead of nix copy
nix copy seems to stall on the EC2 box for unknown reasons.

Change-Id: I30639a52758814968d3b54d716522fb88db80cfe
Reviewed-on: https://cl.tvl.fyi/c/depot/+/9839
Reviewed-by: flokli <flokli@flokli.de>
Reviewed-by: edef <edef@edef.eu>
Tested-by: BuildkiteCI
Autosubmit: flokli <flokli@flokli.de>
2023-10-30 12:15:35 +00:00
Florian Klink
d83b574be7 feat(users/flokli): expose readTree targets to CI
Change-Id: I3ea801b47267f4c985c2ab5cb1b79b2659894307
Reviewed-on: https://cl.tvl.fyi/c/depot/+/9838
Reviewed-by: flokli <flokli@flokli.de>
Tested-by: BuildkiteCI
Autosubmit: flokli <flokli@flokli.de>
2023-10-30 12:15:35 +00:00
Florian Klink
9e2f1f4583 refactor(users/flokli): move common stuff to archeology profile
Change-Id: I8470c0a2416c0c397e009affb44f8c7a852cd526
Reviewed-on: https://cl.tvl.fyi/c/depot/+/9837
Reviewed-by: flokli <flokli@flokli.de>
Tested-by: BuildkiteCI
Autosubmit: flokli <flokli@flokli.de>
2023-10-30 12:15:35 +00:00
Florian Klink
71fa4110fa feat(users/flokli): add archeology-ec2
This adds the EC2 box config to the repo.

Change-Id: Id7a888a2cfbf1454cd9f9465018df377e14b4e9f
Reviewed-on: https://cl.tvl.fyi/c/depot/+/9836
Tested-by: BuildkiteCI
Reviewed-by: flokli <flokli@flokli.de>
2023-10-30 09:31:49 +00:00
Florian Klink
2513ddd2b7 feat(users/flokli/nixos): add deploy script
This adds a deploy-archeology script.

I tried getting morph to work first, but passing it a
depot.ops.nixos.nixosFor seems to be very hard - the NixOS module system
doesn't like the arguments it's called with.

Replace morph with a 3 line bash script, which assumes your ssh_config
contains config for an `archeology` host.

Change-Id: I2bf694c60ded39c201efbbb899f3b5512aa4d0f2
Reviewed-on: https://cl.tvl.fyi/c/depot/+/9835
Autosubmit: flokli <flokli@flokli.de>
Tested-by: BuildkiteCI
Reviewed-by: edef <edef@edef.eu>
2023-10-29 17:49:25 +00:00
Florian Klink
85cd7d64fe refactor(users/flokli): drop some optional args and complexity
Change-Id: Ifdcac829ede4ec469a7ce1b608e78bae11f2766b
Reviewed-on: https://cl.tvl.fyi/c/depot/+/9834
Autosubmit: flokli <flokli@flokli.de>
Tested-by: BuildkiteCI
Reviewed-by: edef <edef@edef.eu>
2023-10-29 17:49:24 +00:00
Florian Klink
48b6242313 feat(users/flokli/nixos): init archeology
Change-Id: Ic31cb8030179ff37b1cc3d3d9241e2582cfe3e5e
Reviewed-on: https://cl.tvl.fyi/c/depot/+/9833
Tested-by: BuildkiteCI
Reviewed-by: edef <edef@edef.eu>
Autosubmit: flokli <flokli@flokli.de>
2023-10-29 16:31:35 +00:00