Commit graph

17 commits

Author SHA1 Message Date
Vincent Ambo
c2e6c0719c refactor(ops/pipelines): Generalise fetch-parent-targets script
Removes all TVL-specific values in favour of environment variables
supplied by Buildkite.

This makes it possible to reuse this script outside of TVL.

Change-Id: Ic543bc41e4c81e65ee349ad241c515231e97ab30
Reviewed-on: https://cl.tvl.fyi/c/depot/+/5005
Tested-by: BuildkiteCI
Autosubmit: tazjin <tazjin@tvl.su>
Reviewed-by: ezemtsov <eugene.zemtsov@gmail.com>
2022-01-19 17:22:36 +00:00
Vincent Ambo
9596c642d5 feat(ops/pipelines): Fetch parent target map for pipeline generation
Change-Id: I1c7d48fc0974549d67146a15f79ddb0b6ddfe805
Reviewed-on: https://cl.tvl.fyi/c/depot/+/4947
Tested-by: BuildkiteCI
Reviewed-by: sterni <sternenseemann@systemli.org>
2022-01-17 11:49:01 +00:00
Vincent Ambo
0a21da2bb4 feat(ops/pipelines): Create drvmap structure for each commit
Always create a structure that maps all targets to derivations, and
persist it as a JSON file.

This relates to some of the ideas expressed in:

https://docs.google.com/document/d/16A0a5oUxH1VoiSM8hyFyLW0WiUYpNo2e2D6FTW4BlH8/edit

The file is always uploaded to Buildkite as an artifact. This allows
for retrieving it based on the commit ID in a Buildkite GraphQL query.

By default, Buildkite stores artefacts for 6 months. Storage location
can be overridden (with custom retention) through some environment
variables, but for now at TVL the Buildkite-managed storage is fine.
See also: https://buildkite.com/docs/pipelines/artifacts

In the subsequent filtering implementation, when diffing commits
across a time-range that exceeds artefact retention time, we should
simply default to building everything.

Change-Id: I6d808461cd1c1fdd6983ba8c8ef075736d42caa7
Reviewed-on: https://cl.tvl.fyi/c/depot/+/3662
Tested-by: BuildkiteCI
Reviewed-by: sterni <sternenseemann@systemli.org>
2022-01-17 10:26:08 +00:00
tazjin
1bd6c2a85b revert: "fix(ops/pipelines): Remove duplicated wait step"
This reverts commit 5e036ed9fc.

Reason for revert: This introduced a logic error since the remaining
step runs at the wrong point in the pipeline. Temporarily reverting to
having duplicated waits in order to clean up later.

Change-Id: Ifa6ece50dd22924f02efd7b790a5863ca1189af7
Reviewed-on: https://cl.tvl.fyi/c/depot/+/4841
Tested-by: BuildkiteCI
Reviewed-by: tazjin <tazjin@tvl.su>
Autosubmit: tazjin <tazjin@tvl.su>
2022-01-07 01:06:56 +00:00
Vincent Ambo
4ab061ed98 fix(ops/pipelines): Realise anchor derivation for rooting
Turns the anchor derivation into something that can actually be
built (a call creating a propagated build inputs file), and builds it.

This should fix the anchoring logic we have on canon.

Change-Id: If6a7662b82e2e396388980f65e332cf67a45b46e
Reviewed-on: https://cl.tvl.fyi/c/depot/+/4763
Tested-by: BuildkiteCI
Autosubmit: tazjin <mail@tazj.in>
Reviewed-by: sterni <sternenseemann@systemli.org>
2022-01-02 22:25:42 +00:00
Vincent Ambo
5e036ed9fc fix(ops/pipelines): Remove duplicated wait step
This now happens in //nix/buildkite instead

Change-Id: Ie9e239ee4f28ac34aa4d3279dac55d70a2cb9d86
Reviewed-on: https://cl.tvl.fyi/c/depot/+/4764
Tested-by: BuildkiteCI
Reviewed-by: sterni <sternenseemann@systemli.org>
2022-01-02 19:09:19 +00:00
Vincent Ambo
4ad94b9cf8 feat(ops/pipelines): annotate patchset builds with Gerrit URLs
If available, provide a link back to Gerrit on the overview page of a
build.

Uses the default style (i.e. style unset), which makes it
non-intrusive visually.

Change-Id: I4271d589d548015b75762fd0584f3958bfcc53e5
Reviewed-on: https://cl.tvl.fyi/c/depot/+/4442
Tested-by: BuildkiteCI
Reviewed-by: grfn <grfn@gws.fyi>
2021-12-19 21:23:51 +00:00
Vincent Ambo
38ec27e834 fix(ops/pipelines): Chunk build pipeline into multiple uploads
The number of jobs in the depot pipeline is reaching the limits of the
Buildkite backend's ability for a single pipeline upload. Based on a
conversation with their support my understanding is that this has to
do with internal locking mechanisms at Buildkite.

To work around this, we can instead chunk the pipeline into several
smaller chunks that are uploaded serially.

This commit introduces logic to chunk the pipeline accordingly. The
chunk size chosen is 256 for now (a multiple of our number of agents,
which is useful if we can get builds from the first chunk to start
before the next ones are uploaded).

Note that this chunk size is significantly below even the current
number of targets (~460 as of this commit), but choosing a lower chunk
size might alleviate problems we've been seeing with timeouts during
pipeline uploads.

Change-Id: I77030aaf8b874c330218b78c77d15216e13b9af7
Reviewed-on: https://cl.tvl.fyi/c/depot/+/4332
Tested-by: BuildkiteCI
Reviewed-by: wpcarro <wpcarro@gmail.com>
Autosubmit: tazjin <mail@tazj.in>
2021-12-15 15:49:40 +00:00
Vincent Ambo
2b9be81ea0 refactor(ops/pipelines): Use agenix-deployed besadii secrets
I *think* this is the final step for b/161

Change-Id: Ie7a2198a045f2f1866a245884ab0f5414e205327
2021-12-10 23:14:41 +03:00
Vincent Ambo
fc14c21bb9 fix(ops/pipelines): Move to static pipeline
This step would get inserted at the wrong point in the build pipeline
otherwise, causing a dependency cycle and causing the pipeline to fail.

Change-Id: I534568eec77f74ae6c47276820f8a9e99493a3ea
2021-12-10 11:01:21 +03:00
Vincent Ambo
e4231c9816 refactor(ops/pipelines): Move 🦆 logic into static pipeline
This simplifies the fallback logic used in case of Nix evaluation
failure and makes it so that the evaluation step itself is the one
that is marked as failed in Buildkite.

This is possible because the pipeline upload command will insert new
steps at the point where it runs in the pipeline, and not later.

Change-Id: I870534c004ebc457a1602623c4e5f9c0c68e28fc
2021-12-10 07:55:34 +00:00
Vincent Ambo
6edfdd0773 refactor(ops/pipelines): Query build status from Buildkite API
Instead of manually tracking the build status through Buildkite
metadata, use the Buildkite GraphQL API in the `🦆` build
step (i.e. the one that determines the status of the entire pipeline
to be reported back to Gerrit) to fetch the number of failed jobs.

This way we have less manual state accounting in the pipeline.

The downside is that the GraphQL query embedded here is a little hard
to read.

Notes:

  * This needs an access token for Buildkite. We already have one for
    besadii which is also run by the agents, so I've given it GraphQL
    permissions and reused it.

  * I almost introduced a very rare bug here: My initial intuition was
    to simply `exit $FAILED_JOBS` - in the extremely rare case where
    `$FAILED_JOBS % 256 = 0` this would mean we would ... fail to fail
    the build :)

Change-Id: I61976b11b591d722494d3010a362b544efe2cb25
2021-11-29 23:38:24 +03:00
Vincent Ambo
099f36e5ee fix(ops/pipelines): Fix tagging of commit revisions
It seems that shell variables don't work as expected inside the
Buildkite pipeline, so usage of variables has been removed.

We also don't echo the revision anymore because of that, but it does
still appear in the log of `git push`.

Change-Id: I124e3b09af896da898f2a78715ed371651a1c5f8
Reviewed-on: https://cl.tvl.fyi/c/depot/+/3780
Tested-by: BuildkiteCI
Reviewed-by: grfn <grfn@gws.fyi>
2021-11-06 00:33:23 +00:00
Vincent Ambo
4b33401a36 refactor(ops/pipelines): Move revision tagging into static pipeline
This makes the revision number available much earlier (before the rest
of the pipeline runs, while Nix eval is happening) which should only
be a few seconds after a commit to canon.

It is also more readable in this shape.

Change-Id: Iccbb17dfef6afe68f54fda41e8d10c4dc52b08c2
Reviewed-on: https://cl.tvl.fyi/c/depot/+/3775
Tested-by: BuildkiteCI
Reviewed-by: grfn <grfn@gws.fyi>
2021-11-05 14:24:53 +00:00
Vincent Ambo
43269730e6 refactor(ops/pipelines): Move failure status zeroing to setup
We changed the configured pipeline in Buildkite to upload
`static-pipeline.yaml` instead of containing the steps of that
pipeline itself.

This makes it easier to test changes to builds and such, but adds
another build step with scheduling overhead etc.

However - we can work around this by killing one of the existing build
steps. There's no reason the failure status zeroing (required for
status reporting) shouldn't be part of the pipeline setup, so I've
moved it there instead and nuked that step.

This should mean that the pipeline is configurable from within the
repo, but without slowing anything down.

Change-Id: I206ecc02647de42a461e33c02879ab84daf5ed2b
Reviewed-on: https://cl.tvl.fyi/c/depot/+/3461
Tested-by: BuildkiteCI
Reviewed-by: sterni <sternenseemann@systemli.org>
2021-08-29 12:37:04 +00:00
Profpatsch
e7ad8143e4 fix(ops/piplines/static-pipeline): add --show-trace to nix-build
Change-Id: Ib0473f916b1436934844e620ce981f52d11e8512
Reviewed-on: https://cl.tvl.fyi/c/depot/+/2467
Tested-by: BuildkiteCI
Reviewed-by: tazjin <mail@tazj.in>
2021-01-30 09:02:09 +00:00
Vincent Ambo
73c862279a feat(ops/pipelines): Check in the static pipeline
This file represents the static pipeline which is configured in the
Buildkite web UI. Updates to this file should be applied in the admin
interface.

These steps are responsible for launching the dynamic pipeline
evaluation, or falling back to the fallback pipeline if evaluation fails.

Change-Id: I6d7dd623cde65e8c69faea729f737c9bba00c2fb
Reviewed-on: https://cl.tvl.fyi/c/depot/+/2103
Tested-by: BuildkiteCI
Reviewed-by: glittershark <grfn@gws.fyi>
2020-11-17 22:33:11 +00:00