feat(ops/pipelines): support buildkite retries
cl/12228 enabled automatic retries for some flaky tests, which generally
worked, as can be seen in https://buildkite.com/tvl/depot/builds/35893

However, "🦆" still reports as failing, because it checks whether the
number of failed steps is nonzero, and that count stays nonzero once a
step has failed, even if its retry succeeded.

We cannot check the overall status of the build, as it's still
"RUNNING", but instead of counting all failed steps so far, we can query
all failed jobs and then filter out the ones that were already retried.

Change-Id: Ib9d27587c8a8ba7970850812c4302fecdc4482e7
Reviewed-on: https://cl.tvl.fyi/c/depot/+/12233
Tested-by: BuildkiteCI
Reviewed-by: tazjin <tazjin@tvl.su>
parent 98863e7312
commit bb5d7c9678
1 changed file with 6 additions and 4 deletions
@@ -88,10 +88,12 @@ steps:
  continue_on_failure: true

  # Exit with success or failure depending on whether any other steps
- # failed.
+ # failed (but not retried).
  #
  # This information is checked by querying the Buildkite GraphQL API
- # and fetching the count of failed steps.
+ # and fetching all failed steps, then filtering out the ones that were
+ # retried (retried jobs create new jobs, which would also show up in the
+ # query).
  #
  # This step must be :duck: (yes, really!) because the post-command
  # hook will inspect this name.
@@ -109,8 +111,8 @@ steps:
  readonly FAILED_JOBS=$(curl 'https://graphql.buildkite.com/v1' \
  --silent \
  -H "Authorization: Bearer $(cat ${BUILDKITE_TOKEN_PATH})" \
- -d "{\"query\": \"query BuildStatusQuery { build(uuid: \\\"$BUILDKITE_BUILD_ID\\\") { jobs(passed: false) { count } } }\"}" | \
- jq -r '.data.build.jobs.count')
+ -d "{\"query\": \"query BuildStatusQuery { build(uuid: \\\"$BUILDKITE_BUILD_ID\\\") { jobs(passed: false, first: 500 ) { edges { node { ... on JobTypeCommand { retried } } } } } }\"}" | \
+ jq -r '.data.build.jobs.edges | map(select(.node.retried == false)) | length')

  echo "$$FAILED_JOBS build jobs failed."

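The filtering that the new jq expression performs can be tried out without a Buildkite token. The following is a minimal sketch in Python (chosen only so it runs with the standard library; the pipeline itself uses curl and jq). The helper names and the sample response are hypothetical; only the query text and the `retried` field are taken from the diff above, and the response shape is assumed from the jq path `.data.build.jobs.edges`.

```python
import json


def build_request_body(build_id):
    """Serialise the GraphQL query from the diff as a JSON request body.

    json.dumps takes care of the quoting that the shell version
    spells out by hand with backslashes.
    """
    query = (
        'query BuildStatusQuery { build(uuid: "' + build_id + '") { '
        "jobs(passed: false, first: 500) { edges { node { "
        "... on JobTypeCommand { retried } } } } } }"
    )
    return json.dumps({"query": query})


def count_unretried_failures(response):
    """Count failed jobs that were not retried.

    Mirrors `map(select(.node.retried == false)) | length`: a node
    without a `retried` field (assumed here to be a non-command job,
    for which the fragment does not match) is not counted.
    """
    edges = response["data"]["build"]["jobs"]["edges"]
    return sum(
        1 for edge in edges
        if (edge.get("node") or {}).get("retried") is False
    )


# Hypothetical response: one failure that was retried, one that was
# not, and one node for which the JobTypeCommand fragment is empty.
sample = {"data": {"build": {"jobs": {"edges": [
    {"node": {"retried": True}},
    {"node": {"retried": False}},
    {"node": {}},
]}}}}

print(count_unretried_failures(sample))  # prints 1
```

Only the job with `retried` equal to `false` counts as a failure; the retried job is ignored because its replacement job shows up in the query on its own if it fails too. Note that `first: 500` caps the number of failed jobs that are inspected.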