This fix was recommended by Buildkite and is explained in the comment.
Change-Id: I3f1c1c07cba0b417857d69c021c8af4750d645c4
Reviewed-on: https://cl.tvl.fyi/c/depot/+/4334
Tested-by: BuildkiteCI
Reviewed-by: sterni <sternenseemann@systemli.org>
The number of jobs in the depot pipeline is reaching the limits of the
Buildkite backend's ability for a single pipeline upload. Based on a
conversation with their support my understanding is that this has to
do with internal locking mechanisms at Buildkite.
To work around this, we can instead chunk the pipeline into several
smaller chunks that are uploaded serially.
This commit introduces logic to chunk the pipeline accordingly. The
chunk size chosen is 256 for now (a multiple of our number of agents,
which is useful if we can get builds from the first chunk to start
before the next ones are uploaded).
Note that this chunk size is significantly below even the current
number of targets (~460 as of this commit), but choosing a lower chunk
size might alleviate problems we've been seeing with timeouts during
pipeline uploads.
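
As an illustration only (not the actual depot pipeline code), the chunking
described above can be sketched like this. The `chunk_steps` helper and the
shape of the step dictionaries are assumptions; only the chunk size of 256
and the approximate target count come from this message.

```python
from typing import Iterator, List

CHUNK_SIZE = 256  # chunk size named in the commit message


def chunk_steps(steps: List[dict], size: int = CHUNK_SIZE) -> Iterator[List[dict]]:
    """Yield consecutive slices of at most `size` steps, preserving order."""
    for start in range(0, len(steps), size):
        yield steps[start:start + size]


# Hypothetical stand-in for the generated build steps (~460 targets).
steps = [{"label": f"target-{i}"} for i in range(460)]
print([len(chunk) for chunk in chunk_steps(steps)])  # prints [256, 204]
```

Each chunk would then be uploaded in turn, so the first builds can start
while later chunks are still pending.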
Change-Id: I77030aaf8b874c330218b78c77d15216e13b9af7
Reviewed-on: https://cl.tvl.fyi/c/depot/+/4332
Tested-by: BuildkiteCI
Reviewed-by: wpcarro <wpcarro@gmail.com>
Autosubmit: tazjin <mail@tazj.in>
https://cl.tvl.fyi/c/depot/+/4264 moved the merging of config with
secrets into ExecStart=. The underlying limitation is tracked in an
upstream RFE:
https://github.com/systemd/systemd/issues/19604#issuecomment-989279884
We didn't link to this so far, neither in the commit message, nor in a
comment.
Let's add a comment, so people know when we can undo this.
Change-Id: I7bed370b671093bb876592b4dccd562f1c256cd2
Reviewed-on: https://cl.tvl.fyi/c/depot/+/4326
Tested-by: BuildkiteCI
Autosubmit: flokli <flokli@flokli.de>
Reviewed-by: tazjin <mail@tazj.in>
Reviewed-by: grfn <grfn@gws.fyi>
We can gcroot the derivation files and drop this step, but have
elected not to do so for the moment, see cl/3436.
Change-Id: I993a1f3921e9f21e18fa260e76d3dd15ffa556bd
Reviewed-on: https://cl.tvl.fyi/c/depot/+/4327
Tested-by: BuildkiteCI
Autosubmit: sterni <sternenseemann@systemli.org>
Reviewed-by: tazjin <mail@tazj.in>
By default besadii will set the `Verified` label in Gerrit. This adds
a config option to set a different label instead if desired.
Co-authored-by: Vincent Ambo <mail@tazj.in>
Change-Id: I254159e46994e01182987ed5e5e26e27c57f46ce
Currently in NixOS configuration using agenix secrets there is no
build time validation of secret paths - things fail at runtime (system
activation).
To prevent that, this CL makes the secrets part of the tree based on
the same configuration file used by agenix itself.
This guards against:
* agenix secrets.nix definition for a non-existent file
* age.secrets value in a NixOS config for a non-existent secret
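
A rough sketch of the two checks listed above, purely for illustration. The
real validation happens at Nix evaluation time; the `validate_secrets`
function and its set-based inputs are hypothetical.

```python
from typing import List, Set


def validate_secrets(defined: Set[str], files: Set[str], referenced: Set[str]) -> List[str]:
    """Return one error per inconsistency between definitions, files and usage.

    defined:    secret names listed in the agenix secrets.nix
    files:      secret files that actually exist in the tree
    referenced: secret names used via age.secrets in a NixOS config
    """
    errors = []
    for name in sorted(defined - files):
        errors.append(f"secrets.nix defines {name}, but the file does not exist")
    for name in sorted(referenced - defined):
        errors.append(f"age.secrets references {name}, which is not defined")
    return errors
```

An empty result means both guards pass; anything else should fail the build
rather than system activation.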
Change-Id: I5b191dcbd5b2522566ff7c38f8a988bbf7679364
... okay, this is like the 5th error related to something with this
and file paths. Need to write some validation logic.
Change-Id: I4314818aa1bc25b8cf7bd3593850d3836ccb867c
Git only allows binary names prefixed with `git-credential-` if the
path to the helper is not absolute.
Why? Who knows.
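
For reference, a small sketch of how git resolves a `credential.helper`
value, following the rules in gitcredentials(7). The `resolve_helper`
function is a hypothetical model of that behaviour, not git's own code.

```python
import posixpath


def resolve_helper(helper: str) -> str:
    """Model which program git would run for a credential.helper setting."""
    if helper.startswith("!"):
        # A leading "!" marks a shell snippet that is run as-is.
        return helper[1:]
    if posixpath.isabs(helper):
        # Absolute paths are used verbatim, whatever the binary is called.
        return helper
    # Anything else is looked up as a `git-credential-<name>` program.
    return "git-credential-" + helper
```

So a relative helper named `example` must be installed as
`git-credential-example`, while an absolute path can point at any binary.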
Change-Id: I216b2a621f62a73f05e21def7ec8016b29ede892
Currently this functionality is provided by a shell script stored in
/etc/secrets (which has the password value hardcoded).
This needs to happen in a separate commit from the one that changes
the pipeline to avoid breaking it (it needs to be deployed first).
Change-Id: I680754c828ccefbacfcf0d5c813a4bc19493ba4c
We already checked this in, but this commit adds the configuration for
making use of it.
There are two copies of besadii's JSON configuration with different
permissions.
Note that the buildkite-graphql-token path needs to be updated in
static-pipeline.yml, but this needs to happen in a separate commit
after deploy because the pipeline will break otherwise.
Change-Id: I6fab4bf1a2e679df7cf76521e2b53bd9dadbac62
... this option really is a pitfall! The list of programs is now the
same as in the upstream module, plus curl and jq.
Change-Id: I29edae4b2400a2724f62df9efa1dc184a8b0af5f
The DynamicUser + Group configuration does not work as planned, thus
the systemd LoadCredential feature is used instead, which makes the
file (which itself is only readable by root) available in a
memory-backed location only readable by the service.
The secret is only available to `ExecStart` commands, so units using
this feature cannot be used with pre/post units and the like if those
commands need secrets.
To accommodate this, the merge of configuration files has been moved
into the service launch script, which is now the ExecStart= process.
For details take a look at https://www.freedesktop.org/software/systemd/man/systemd.exec.html#LoadCredential=ID:PATH
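
A minimal sketch of what such a launch script has to do, assuming JSON
configuration and a shallow merge (both are assumptions; the real script
may differ). systemd exposes loaded credentials as files below
`$CREDENTIALS_DIRECTORY`, named after the credential ID.

```python
import json
import os
from pathlib import Path


def load_merged_config(public_config: dict, credential_id: str) -> dict:
    """Merge secret settings from a systemd credential into a public config."""
    # Set by systemd for units that use LoadCredential=.
    credentials_dir = Path(os.environ["CREDENTIALS_DIRECTORY"])
    secret = json.loads((credentials_dir / credential_id).read_text())
    # Shallow merge: secret keys override public keys of the same name.
    return {**public_config, **secret}
```

Because the credentials directory only exists for the main service
process, this merge has to run as part of ExecStart= itself.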
Change-Id: I693fe5677cc0d63c7aa485c2c7472457c5262166
It turns out the lib.mkAfter call doesn't behave as expected -
only *some* of the packages that are defaulted end up in the $PATH.
I suspect this is actually something else, e.g. these packages are
always added for some reason or another, and the option is completely
overridden every time.
Change-Id: I854c7198520d82b00e6338ed0fe653836226dc6d
Turns out that the type of this option is not concatenative and it
replaces the packages needed to run Buildkite if set.
Change-Id: I9f52572bc165bccdd8c6518cfdf7b8967f7a50d0
The irccat module uses DynamicUser, so to grant permission to it a new
group has been added for irccat.
I have some vague memory of DynamicUser + Group not behaving as one
would expect, but we'll see what happens.
Change-Id: Iab9f6a3f1a53c4133b635458ce173250cc9a3fac
Otherwise this step would be inserted at the wrong point in the build
pipeline, causing a dependency cycle that makes the pipeline fail.
Change-Id: I534568eec77f74ae6c47276820f8a9e99493a3ea
This simplifies the fallback logic used in case of Nix evaluation
failure and makes it so that the evaluation step itself is the one
that is marked as failed in Buildkite.
This is possible because the pipeline upload command will insert new
steps at the point where it runs in the pipeline, and not later.
Change-Id: I870534c004ebc457a1602623c4e5f9c0c68e28fc
Adds a systemd EnvironmentFile secret that contains the Gerrit
username & password for gerrit-queue.
Change-Id: I25acf87764c26774045138402b8a417b6813ee8f
This does not yet include the secret configuration for gerrit-queue,
and just expects the secret (gerrit username & password) to be
available in /etc/secrets.
Change-Id: Ia465ef7f3f521c70d606d7fdeba9aa83c7e1b98b
This is required for a simplification of the build pipeline (following
CL) and needs to be in a separate commit as it cannot be done
atomically (merging the other commit to deploy it would immediately
break pipelines otherwise).
Change-Id: I5d8ec8f3238f79b5518d799486bf98d1d9516c43
Sets up the key set and adds an initial secret (besadii config with
tokens) to be deployed to whitby.
Change-Id: Ic07fd5e66b9e7a533013e04c35e052c2aa11f77d
Gerrit wraps RFC5322 emails in another layer of quotes when passing
them as flags, and this needs to be unquoted.
Otherwise hook invocations fail with cryptic errors.
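
To illustrate the unquoting, here is a sketch using the Python standard
library. The exact flag format Gerrit passes is an assumption, as is the
`parse_gerrit_identity` name.

```python
from email.utils import parseaddr
from typing import Tuple


def parse_gerrit_identity(flag_value: str) -> Tuple[str, str]:
    """Strip one outer layer of double quotes, then parse `Name <email>`."""
    value = flag_value.strip()
    if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
        value = value[1:-1]
    return parseaddr(value)


# Hypothetical flag value with the extra layer of quotes.
print(parse_gerrit_identity('"Example User <user@example.com>"'))
```

Only a quote pair that wraps the whole value is removed, so a value like
`"Example User" <user@example.com>` is passed through unchanged.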
Change-Id: Ieeb74c662873d99a4154f8cbc92da77b039cb88e
Ensure that besadii sees $0 as the correct command name, since that is
the sole mechanism by which its functionality is switched around.
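
The `$0`-based switching can be pictured roughly as follows. This is a
hypothetical sketch: only the two Gerrit hook names are taken from the
besadii commits, and the `select_mode` helper is made up.

```python
import os

# Gerrit hook names that besadii supports according to its commit history.
KNOWN_MODES = {"change-merged", "patchset-created"}


def select_mode(argv0: str) -> str:
    """Pick the behaviour from the name the program was invoked under."""
    name = os.path.basename(argv0)
    if name not in KNOWN_MODES:
        raise ValueError(f"unknown invocation name: {name}")
    return name
```

A wrapper that executes the binary under a different name therefore
changes (or breaks) its behaviour, which is the bug fixed here.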
There was a lingering commit that introduced this bug and hadn't been
deployed in a couple of days. Maybe time to tighten deploy cycles soon
...
Change-Id: Ie4284c0f6e5e06d71a71a3702ec7e092260e0ce5
Extracts author information from the flags passed by Gerrit and moves
them along to Buildkite. This should display the owners of builds
correctly in the UI, rather than marking everything as coming from me.
Change-Id: If9efe5553a13f0dbdb8bf3936c1d341ae5922318
This makes it possible to use besadii for any TVL-ish setup using
Gerrit and Buildkite, with the same hook functionality as for TVL.
Change-Id: I1144b68d7ec01c4c8e34f7bee4da590f2ff8c53c
Adds configuration keys and rudimentary validation for all other
besadii settings that are currently hardcoded.
This adds the config options:
* repository: Name of the repository in Gerrit.
* branch: Name of the HEAD branch in the repository.
* gerritUrl: Base URL of the Gerrit instance
* gerritUser: Username of the Gerrit user
* gerritPassword: Password of the Gerrit user
* buildkiteOrg: Name of the Buildkite organisation
* buildkiteProject: Name of the pipeline inside the Buildkite
organisation
* buildkiteToken: Auth token for Buildkite access
All of these configuration options are required.
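
A sketch of the "rudimentary validation" in Python, for illustration. The
key names are the ones listed above; the `validate_config` function and
the JSON input format shown here are assumptions about the shape of the
check, not besadii's actual code.

```python
import json

REQUIRED_KEYS = (
    "repository", "branch", "gerritUrl", "gerritUser", "gerritPassword",
    "buildkiteOrg", "buildkiteProject", "buildkiteToken",
)


def validate_config(raw: str) -> dict:
    """Parse a JSON config and reject it if any required key is unset."""
    config = json.loads(raw)
    missing = [key for key in REQUIRED_KEYS if not config.get(key)]
    if missing:
        raise ValueError("missing required config keys: " + ", ".join(missing))
    return config
```

Empty strings are treated the same as absent keys, since every option is
required.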
Change-Id: Ie6b109de9cd8484a3773c6351d7fd140f39a49ed
On whitby, the besadii config will live in
/etc/secrets/besadii.json. This CL updates the call sites to pass this
config path to besadii so that it can load Sourcegraph configuration.
Change-Id: Ia139b9fa3b827e7a5f2386214390acc6fe19a75a
Initial step towards moving besadii away from hardcoded values and
onto config files. This is required because I want to reuse besadii
outside of the TVL context.
Change-Id: Id4fa7a49c5d4f876a02b202f04a421ab5ba0dcc4
Change the Nixery configuration to use the plain nixpkgs package path
instead of the depot path. AFAIK, nobody uses this to fetch depot
packages at the moment - but plenty of people fetch non-depot
packages.
This means that Nixery is cache-busted less often (previously on every
commit => every deploy).
We'll figure out another way to have a depot Nixery later.
Change-Id: Iba632333346181c3d2ce992fbab396ed0d9f86aa
Removes besadii support for the previously used 'ref-updated' hook and
instead introduces support for the 'change-merged' and
'patchset-created' hooks.
These hooks more accurately capture the semantics of when besadii
should trigger CI builds and using them will avoid problems such as
skipping 'canon' builds if chains of CLs are submitted together.
Change-Id: Ib90356c069780bf0c0250e56b927e46a5b31ce7f
Instead of manually tracking the build status through Buildkite
metadata, use the Buildkite GraphQL API in the `🦆` build
step (i.e. the one that determines the status of the entire pipeline
to be reported back to Gerrit) to fetch the number of failed jobs.
This way we have less manual state accounting in the pipeline.
The downside is that the GraphQL query embedded here is a little hard
to read.
Notes:
* This needs an access token for Buildkite. We already have one for
besadii which is also run by the agents, so I've given it GraphQL
permissions and reused it.
* I almost introduced a very rare bug here: My initial intuition was
to simply `exit $FAILED_JOBS` - in the extremely rare case where
`$FAILED_JOBS % 256 = 0` this would mean we would ... fail to fail
the build :)
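
The near-miss in the second note comes from exit statuses being truncated
to 8 bits. A small sketch (function names are made up for illustration):

```python
def naive_exit_status(failed_jobs: int) -> int:
    """What the shell would report for `exit $FAILED_JOBS`: the low 8 bits."""
    return failed_jobs & 0xFF


def safe_exit_status(failed_jobs: int) -> int:
    """Collapse any non-zero failure count into a plain failure status."""
    return 1 if failed_jobs > 0 else 0


print(naive_exit_status(256), safe_exit_status(256))  # prints 0 1
```

With exactly 256 (or 512, ...) failed jobs the naive variant reports
success, while the clamped variant still fails the build.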
Change-Id: I61976b11b591d722494d3010a362b544efe2cb25
We are changing the Gerrit hooks which invoke besadii, but this
structure will be used for both kinds.
Change-Id: Idb1cb0c640d2c42db8e7af39f3ab372a97bfef91
This is causing failures when trying to update Sourcegraph at least;
for good measure I've trimmed both.
Change-Id: I40266ee83b4e266ffe50f16bb365eb2e51952513
This function is also generally useful for readTree consumers that
have the concept of subtargets.
Change-Id: Ic7fc03380dec6953fb288763a28e50ab3624d233
Since GCP nuked us, the backups are now moving to GleSYS'
S3-compatible object storage.
This refactors the restic module to support S3-compatible storage
instead of GCP, and switches to the appropriate new secret paths.
The secrets were placed on whitby manually and I verified that the
backups work.
This fixes b/157
Change-Id: I6a9d2b0581967605ce736605a3befb44cdeae7e1
Reviewed-on: https://cl.tvl.fyi/c/depot/+/3883
Tested-by: BuildkiteCI
Reviewed-by: grfn <grfn@gws.fyi>
It seems that shell variables don't work as expected inside the
Buildkite pipeline, so usage of variables has been removed.
We also don't echo the revision anymore because of that, but it does
still appear in the log of `git push`.
Change-Id: I124e3b09af896da898f2a78715ed371651a1c5f8
Reviewed-on: https://cl.tvl.fyi/c/depot/+/3780
Tested-by: BuildkiteCI
Reviewed-by: grfn <grfn@gws.fyi>
This makes the revision number available much earlier (before the rest
of the pipeline runs, while Nix eval is happening) which should only
be a few seconds after a commit to canon.
It is also more readable in this shape.
Change-Id: Iccbb17dfef6afe68f54fda41e8d10c4dc52b08c2
Reviewed-on: https://cl.tvl.fyi/c/depot/+/3775
Tested-by: BuildkiteCI
Reviewed-by: grfn <grfn@gws.fyi>
This automatically pushes a new ref at refs/r/$revision to Gerrit
whenever a CI run completes on canon.
Revision numbers can be fetched from Gerrit with this command:
git fetch gerrit "refs/r/*:refs/r/*"
Note that this build step requires credentials to be provisioned on
the CI runner machine.
Change-Id: I37bb14346832f891240aa47bb55affaace3d5f21
The previous hash had a weird salt length and a trailing newline.
This fixes it.
Change-Id: I1f03238181d0caad38e1f1dbc477356bc20fc32d
Reviewed-on: https://cl.tvl.fyi/c/depot/+/3689
Reviewed-by: tazjin <mail@tazj.in>
Tested-by: BuildkiteCI
The setup is explained in the comment, but TL;DR: Use the derivation
hash of static files to create permanent URLs.
Relates to b/151.
Change-Id: Ib1ca3a1a00c90a47f4bf39c29a8b4bbf5b215e7d
Reviewed-on: https://cl.tvl.fyi/c/depot/+/3664
Tested-by: BuildkiteCI
Reviewed-by: grfn <grfn@gws.fyi>
This hostname can be used for hosting static assets with aggressive
caching for everything, or potentially CDNing stuff if we ever have
large things here.
Change-Id: I10afdad5eb08125d8d09108e9e099f5573362fe5
Reviewed-on: https://cl.tvl.fyi/c/depot/+/3663
Reviewed-by: sterni <sternenseemann@systemli.org>
Tested-by: BuildkiteCI
As cschilling explained on cl/3563, there isn't actually anything in
this state that we *need* to persist. We're still keeping it in a
persistent directory on disk as this serves as an optimisation after
restarts of josh.
Change-Id: Ia88886792a5acac34508b5b8a669bd519ca033de
Reviewed-on: https://cl.tvl.fyi/c/depot/+/3631
Tested-by: BuildkiteCI
Reviewed-by: sterni <sternenseemann@systemli.org>
This lets each service declare their backup paths together with the
configuration for the service, which is a lot more sensible than what
we had before.
Fixes b/147
Change-Id: If76fe62639f4cc0e6fbb63a2959d584479d8f0fb
Reviewed-on: https://cl.tvl.fyi/c/depot/+/3583
Tested-by: BuildkiteCI
Reviewed-by: sterni <sternenseemann@systemli.org>
I can never remember which is which.
Change-Id: I69b8235862b8c5b49030a74bfca25aaa113273b7
Reviewed-on: https://cl.tvl.fyi/c/depot/+/3582
Tested-by: BuildkiteCI
Reviewed-by: sterni <sternenseemann@systemli.org>
This makes it easier to click through to a build from Gerrit after
submitting a CL.
Change-Id: Ic5c6eeb81c87bc4ea23c5c5ca25704434b081fd0
Reviewed-on: https://cl.tvl.fyi/c/depot/+/3572
Tested-by: BuildkiteCI
Reviewed-by: lukegb <lukegb@tvl.fyi>
Currently besadii only posts comments when builds succeed, but it
might be very useful to also have a link to a build when the build is
started.
This just shuffles code around. The only functional change is that the
`labels` field in the review input is marked as `omitempty`, as this
will not be needed when posting the build start comment.
Change-Id: Id4a43fad8817c9a15da02f01ab2b781d48b46978
Reviewed-on: https://cl.tvl.fyi/c/depot/+/3571
Tested-by: BuildkiteCI
Reviewed-by: lukegb <lukegb@tvl.fyi>
Relates to b/147.
First step towards giving depot modules the ability to declare their
own backup directories by moving all restic configuration into a new
module and adding a NixOS option for inclusion/exclusion paths for
backups.
This still keeps all backup paths within the whitby config.
Change-Id: Ia96833668f1a3d02da892261153d8b02156b8ac0
Reviewed-on: https://cl.tvl.fyi/c/depot/+/3565
Tested-by: BuildkiteCI
Reviewed-by: flokli <flokli@flokli.de>
Previously we served the dumb git HTTP protocol from code.tvl.fyi via
cgit. This CL disables this feature and instead runs josh in the same
location (by redirecting appropriately), while also enabling
partial cloning of all subtrees of the depot.
For example, after this CL the following would result in an
independent clone of //nix/readTree:
git clone https://code.tvl.fyi/depot.git:/nix/readTree.git
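
The URL scheme in the example above generalises to other static subpaths.
A sketch, with a hypothetical `josh_subtree_url` helper:

```python
def josh_subtree_url(base: str, depot_path: str) -> str:
    """Build the clone URL for a depot subtree, e.g. //nix/readTree."""
    return f"{base}:/{depot_path.strip('/')}.git"


print(josh_subtree_url("https://code.tvl.fyi/depot.git", "//nix/readTree"))
# prints https://code.tvl.fyi/depot.git:/nix/readTree.git
```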
Note that there are no josh workspaces configured at all for now,
these references are only for static depot subpaths.
Please refer to the documentation for josh for more information on
available kinds of josh filters.
Josh state is kept in a systemd state directory in /var/lib/josh and
backed up to Restic. Backing this up is necessary, as josh uses
stateful information to do things like tracking merges and rewriting
history per subtree appropriately to avoid cloned repositories ending
up in peculiar states.
Change-Id: I156f0298c2aa42e3bdbf5a0e86109070d640c56e
Reviewed-on: https://cl.tvl.fyi/c/depot/+/3563
Tested-by: BuildkiteCI
Reviewed-by: flokli <flokli@flokli.de>
This one seems a little more involved:
https://docs.sourcegraph.com/admin/migration/3_31
I believe we skip that corruption issue in the previous CL though, by
simply never deploying a version with that weird broken image.
See b/144
Change-Id: I3bbf1b719d00905e08a92011ace5485467f504ef
Reviewed-on: https://cl.tvl.fyi/c/depot/+/3525
Tested-by: BuildkiteCI
Reviewed-by: lukegb <lukegb@tvl.fyi>
We changed away from the default sourcegraph one because it didn't
support Nix, but it seems that there's been a change in the
interaction protocol.
Change-Id: I3a2691df6a87672cf83b819143f25d93d9cd6d13
Reviewed-on: https://cl.tvl.fyi/c/depot/+/3531
Tested-by: BuildkiteCI
Reviewed-by: eta <tvl@eta.st>
Reviewed-by: sterni <sternenseemann@systemli.org>
Add the beginnings of an auto-deploy script for whitby, intended to
be (eventually) suitable for running automatically in a systemd timer.
The current iteration of the script doesn't actually do any deploying,
but instead takes as an argument a revision, creates a new git worktree
in /tmp with that revision checked out, runs a nix-diff of whitby's
system derivation in the running system and at that closure, puts an
html-rendered version of that diff in the public directory used by
deploys.tvl.fyi, and finally sends a message to IRC via irccat with a
link to that HTML page.
Refs: b/110
Change-Id: Id40525567f8845590c909568befd8d00c07a481c
Reviewed-on: https://cl.tvl.fyi/c/depot/+/3145
Tested-by: BuildkiteCI
Reviewed-by: tazjin <mail@tazj.in>
Reviewed-by: kn <klemens@posteo.de>
Add a new domain and nginx virtual host at deploys.tvl.fyi, serving out
of a static directory on whitby which is created by systemd-tmpfiles.
This will be used to serve diffs rendered by nix-diff for
pending deploys for whitby.
Since this contains stateful data, it is added to the restic backups
on whitby.
Refs: b/110
Change-Id: I5869d40800bbf5fb8fb39878a857f66ff5787830
Reviewed-on: https://cl.tvl.fyi/c/depot/+/3144
Tested-by: BuildkiteCI
Reviewed-by: tazjin <mail@tazj.in>
We changed the configured pipeline in Buildkite to upload
`static-pipeline.yaml` instead of containing the steps of that
pipeline itself.
This makes it easier to test changes to builds and such, but adds
another build step with scheduling overhead etc.
However - we can work around this by killing one of the existing build
steps. There's no reason the failure status zeroing (required for
status reporting) shouldn't be part of the pipeline setup, so I've
moved it there instead and nuked that step.
This should mean that the pipeline is configurable from within the
repo, but without slowing anything down.
Change-Id: I206ecc02647de42a461e33c02879ab84daf5ed2b
Reviewed-on: https://cl.tvl.fyi/c/depot/+/3461
Tested-by: BuildkiteCI
Reviewed-by: sterni <sternenseemann@systemli.org>
Skip build steps if they have already been built, reducing pipelines
to the things that actually changed between builds. On canon all
targets are always built (we require this for anchoring).
Note that this is not perfect: garbage collection and competing
pipelines may affect each other.
Also note that we have some impure targets that change on every
commit.
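
The skipping logic amounts to something like the following sketch. How the
real pipeline detects already-built targets is not stated here, so the
store-path existence check, the `outPath` key and the `steps_to_run` name
are all assumptions.

```python
import os
from typing import Callable, List


def steps_to_run(
    targets: List[dict],
    branch: str,
    exists: Callable[[str], bool] = os.path.exists,
) -> List[dict]:
    """Drop targets whose output already exists, except on canon."""
    if branch == "canon":
        # canon always builds everything (required for anchoring).
        return list(targets)
    return [target for target in targets if not exists(target["outPath"])]
```

The check is inherently racy: a path that exists when the pipeline is
generated may be garbage collected before a dependent step runs.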
Change-Id: Ic6bae3b6c8e1e7fd2116ec252f5089f471854ab6
Reviewed-on: https://cl.tvl.fyi/c/depot/+/3427
Tested-by: BuildkiteCI
Reviewed-by: sterni <sternenseemann@systemli.org>
Reviewed-by: grfn <grfn@gws.fyi>
We currently evaluate every target twice -- once when the depot pipeline
is built and once when actually running the build step in question. Nix
evaluation is quite slow especially given heavy use of import from
derivation in depot, so avoiding the second evaluation is desirable.
Evaluating a derivation yields a `drv` file in the nix store which can
be passed to `nix-store --realise` in order to build it, eliminating the
need to wait for evaluation. We can obtain the path to the `drv` file
while building the pipeline via `target.drvPath` and remember it for the
build later.
However we need to work around a flaw (or oversight) in Nix's dependency
tracking via string context: This is based on derivations, not output
paths (because this is what evaluation deals with, likely). This is no
problem per se, but an issue is that Nix can't express a dependency on
a `drv` file without any of its output paths. This means for us that we
either have to build all output paths at evaluation time (which we don't
want, obviously) or to deal with the fact that the `drv` file we need
may be garbage collected at any moment after discarding the string
context -- then nix is unable to track the reference from the pipeline
to the `drv` file in the store.
So to prevent a race condition between the pipeline and the garbage
collector we fall back to the normal nix-build invocation as we did
before.
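
The fallback can be sketched as a choice between two commands. The
`build_command` helper and the exact `nix-build` arguments are
assumptions; only `nix-store --realise` and the use of `drvPath` are taken
from this message.

```python
import os
from typing import Callable, List


def build_command(
    drv_path: str,
    attr_path: str,
    exists: Callable[[str], bool] = os.path.exists,
) -> List[str]:
    """Prefer realising the remembered drv file, else evaluate again."""
    if exists(drv_path):
        # Fast path: no Nix evaluation needed.
        return ["nix-store", "--realise", drv_path]
    # The drv file was garbage collected, so fall back to a full nix-build.
    return ["nix-build", "-A", attr_path]
```

The existence check only narrows the race window; the drv file could still
disappear between the check and the build.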
Change-Id: I9ef8bd233085dc6e30eba54f403ea03ac2d35748
Reviewed-on: https://cl.tvl.fyi/c/depot/+/3426
Tested-by: BuildkiteCI
Reviewed-by: tazjin <mail@tazj.in>
This is because I'm bored of CAS gradually consuming all the RAM on Whitby.
Change-Id: Idcc14c19d99a6d3553739c5765be3faf2bdf9d84
Reviewed-on: https://cl.tvl.fyi/c/depot/+/3233
Tested-by: BuildkiteCI
Reviewed-by: grfn <grfn@gws.fyi>
Reviewed-by: tazjin <mail@tazj.in>
This is a bit of an under-documented feature, but if the "tag" field for
a gerrit review starts with the string
"autogenerated:<something>~<something-else>", only the last comment per
instance of <something> will be shown by default on the CL page (with
the rest viewable by toggling the "Show all entries" switch). The idea
behind the "<something-else>" tag is to be used for the "type" of
comment within a particular system - gerrit's documentation gives the
example of one tag for "the build is running" and another for "the build
has finished, here's the result".
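
As a sketch, a review payload using such a tag might be built like this.
`message` and `tag` are fields of Gerrit's ReviewInput; the helper name and
the `buildkite`/`result` values are made up for illustration.

```python
def review_input(message: str, system: str, kind: str) -> dict:
    """Build a Gerrit review payload whose older comments get collapsed."""
    return {
        "message": message,
        # autogenerated:<something>~<something-else>
        "tag": f"autogenerated:{system}~{kind}",
    }


print(review_input("Build passed", "buildkite", "result")["tag"])
# prints autogenerated:buildkite~result
```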
Change-Id: I9199a6ed97beca1b3a51ec5d6230c6c8358ba2b3
Reviewed-on: https://cl.tvl.fyi/c/depot/+/3374
Tested-by: BuildkiteCI
Reviewed-by: tazjin <mail@tazj.in>
The dropping of `www.` is intentional, that was unused.
Change-Id: I300f82bb6e5626e2658be8fc5b5e3cf872ab7099
Reviewed-on: https://cl.tvl.fyi/c/depot/+/3384
Tested-by: BuildkiteCI
Reviewed-by: sterni <sternenseemann@systemli.org>