This adds a Directory service using
https://cloud.google.com/bigtable/docs/ as a K/V store.
Directory (closures) are put in individual keys.
We don't do any bucketed upload of directory closures (yet), as
castore/fs queries directories individually, rather than requesting them
recursively (and buffering).
This will be addressed by store composition at some point.
Change-Id: I7fada45bf386a78b7ec93be38c5f03879a2a6e22
Reviewed-on: https://cl.tvl.fyi/c/depot/+/11212
Tested-by: BuildkiteCI
Reviewed-by: Connor Brewster <cbrewster@hey.com>
Autosubmit: flokli <flokli@flokli.de>
This (and more) should now be covered by the generic testsuite
(in crate::blobservice::tests).
Change-Id: Ib3afc4f19f7e37a561b7398d43663dc941971f5c
Reviewed-on: https://cl.tvl.fyi/c/depot/+/11365
Tested-by: BuildkiteCI
Reviewed-by: picnoir picnoir <picnoir@alternativebit.fr>
We need to define behaviours and add tests for these.
Change-Id: Id5825fafbf47897d8de42503ea6006eb131b1082
Reviewed-on: https://cl.tvl.fyi/c/depot/+/11281
Tested-by: BuildkiteCI
Reviewed-by: Connor Brewster <cbrewster@hey.com>
This drops pretty much all of castore/utils.rs.
There were only two things left in there, both a bit messy and only used
for tests:
Some `gen_*_service()` helper functions. These can be expressed by
`from_addr("memory://")`.
The other thing was some plumbing code to test the gRPC layer, by
exposing an in-memory implementation via gRPC, and then connecting to
that channel via a gRPC client again.
Previous CLs moved the connection setup code to
{directory,blob}service::tests::utils, close to where we exercise them,
the new rstest-based tests.
The tests interacting directly with the gRPC types are removed; all
scenarios that were in there should now be covered through the rstest ones
on the trait level.
Change-Id: I450ccccf983b4c62145a25d81c36a40846664814
Reviewed-on: https://cl.tvl.fyi/c/depot/+/11223
Reviewed-by: Connor Brewster <cbrewster@hey.com>
Tested-by: BuildkiteCI
This introduces rstest-based tests. We also add fixtures for creating
some BlobService / DirectoryService out of thin air.
To test a PathInfoService, we don't really care too much about its
internal storage - ensuring they work is up to the castore tests.
Change-Id: Ia62af076ef9c9fbfcf8b020a781454ad299d972e
Reviewed-on: https://cl.tvl.fyi/c/depot/+/11272
Tested-by: BuildkiteCI
Reviewed-by: Connor Brewster <cbrewster@hey.com>
This allows us to use containers around BlobServices as BlobServices too.
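A minimal sketch of how such a blanket implementation could look. The
trait below is a simplified, synchronous stand-in (the real BlobService
trait is async and has more methods); `MemoryBlobService`, `is_known` and
`example` are hypothetical names used only for illustration.

```rust
use std::collections::HashSet;
use std::sync::Arc;

/// Simplified, synchronous stand-in for the real (async) trait.
pub trait BlobService {
    /// Returns whether a blob with the given digest is known.
    fn has(&self, digest: &[u8; 32]) -> bool;
}

/// Hypothetical in-memory implementation, only tracking known digests.
#[derive(Default)]
pub struct MemoryBlobService {
    known: HashSet<[u8; 32]>,
}

impl MemoryBlobService {
    pub fn insert(&mut self, digest: [u8; 32]) {
        self.known.insert(digest);
    }
}

impl BlobService for MemoryBlobService {
    fn has(&self, digest: &[u8; 32]) -> bool {
        self.known.contains(digest)
    }
}

/// Blanket impl: any container that can hand out a `&dyn BlobService`
/// (for example `Arc<dyn BlobService>` or `Box<dyn BlobService>`) is a
/// BlobService itself, delegating to the inner one.
impl<A> BlobService for A
where
    A: AsRef<dyn BlobService>,
{
    fn has(&self, digest: &[u8; 32]) -> bool {
        self.as_ref().has(digest)
    }
}

/// Generic consumer that only knows about the trait bound.
pub fn is_known<S: BlobService>(svc: &S, digest: &[u8; 32]) -> bool {
    svc.has(digest)
}

/// Passes an `Arc<dyn BlobService>` where a `BlobService` is expected.
pub fn example() -> bool {
    let mut inner = MemoryBlobService::default();
    inner.insert([1u8; 32]);
    let svc: Arc<dyn BlobService> = Arc::new(inner);
    is_known(&svc, &[1u8; 32])
}
```

With this in place, callers no longer need to care whether they hold the
service itself or a smart pointer around it.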
Change-Id: I3c7feb074f42b4e07c550fb8dfa63cf81d448ab5
Reviewed-on: https://cl.tvl.fyi/c/depot/+/11249
Autosubmit: flokli <flokli@flokli.de>
Tested-by: BuildkiteCI
Reviewed-by: raitobezarius <tvl@lahfa.xyz>
This creates test scenarios (using the DirectoryService trait) that we
want all DirectoryService implementations to pass.
Some of these tests are ported from proto::tests::grpc_directoryservice,
which tested this on the gRPC interface (rather than the trait),
some others ensure certain behaviour for which we only recently
introduced general checking logic (through ClosureValidator).
We also borrow some code related to setting up a gRPC DirectoryService
client (connecting to a server exposing an in-memory DirectoryService)
from castore::utils; this will be deleted once it's all ported over.
Change-Id: I6810215a76101f908e2aaecafa803c70d85bc552
Reviewed-on: https://cl.tvl.fyi/c/depot/+/11247
Reviewed-by: raitobezarius <tvl@lahfa.xyz>
Autosubmit: flokli <flokli@flokli.de>
Reviewed-by: Connor Brewster <cbrewster@hey.com>
Tested-by: BuildkiteCI
This allows us to use containers around DirectoryServices as DirectoryServices too.
Change-Id: I56cca27b3212858db8b12b874df0e567dd868711
Reviewed-on: https://cl.tvl.fyi/c/depot/+/11248
Reviewed-by: raitobezarius <tvl@lahfa.xyz>
Autosubmit: flokli <flokli@flokli.de>
Tested-by: BuildkiteCI
This uses DirectoryClosureValidator for validation and the sled batch
API to insert multiple directories at once.
Change-Id: I2d6dc513ccbc02e638f8d22173da5463e73182ee
Reviewed-on: https://cl.tvl.fyi/c/depot/+/11222
Autosubmit: flokli <flokli@flokli.de>
Reviewed-by: Connor Brewster <cbrewster@hey.com>
Tested-by: BuildkiteCI
Reviewed-by: raitobezarius <tvl@lahfa.xyz>
This greatly simplifies the code in this function, replacing it with a
much better tested (and more capable!) version of the validation logic.
It also enables the gRPC server frontend to make use of the
DirectoryPutter interface. While this might not be too visible in terms
of latency thanks to gRPC streams bursting, it also enables further
optimizations later (such as bucketing of directory closures).
Change-Id: I21f805aa72377dd5266de3b525905d9f445337d6
Reviewed-on: https://cl.tvl.fyi/c/depot/+/11221
Autosubmit: flokli <flokli@flokli.de>
Tested-by: BuildkiteCI
Reviewed-by: raitobezarius <tvl@lahfa.xyz>
This simplifies a bunch of code, and gets rid of some TODOs.
Also, move it out of castore/utils, and into its own file.
Change-Id: Ie63e05a6cdfb2a73e878cf7107f9172aed1cdf13
Reviewed-on: https://cl.tvl.fyi/c/depot/+/11224
Tested-by: BuildkiteCI
Reviewed-by: raitobezarius <tvl@lahfa.xyz>
Autosubmit: flokli <flokli@flokli.de>
This can be used to validate a Directory closure (connected DAG of
Directories), and their insertion order.
Directories need to be inserted (via `add`), in an order from the leaves
to the root. During insertion, we validate as much as we can at that
time:
- individual validation of Directory messages
- validation of insertion order (no upload of not-yet-known Directories)
- validation of size fields of referred Directories
Internally it keeps all received Directories (and their sizes) in a HashMap,
keyed by digest.
Once all Directories have been inserted, a drain() function can be
called to get a (deduplicated and) validated list of directories, in
from-leaves-to-root order (to be stored somewhere).
While assembling that list, a check for graph connectivity is performed
too, to ensure there are no separate components being sent (and only one
root).
It adds a test suite for these cases, which is much nicer to test than
where we previously had these checks (only in the gRPC server wrapper).
Followup CLs will move the existing putters to use this.
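The add/drain flow described above can be sketched roughly as follows.
This is a simplified illustration, not the actual implementation: digests
are plain `u64` values instead of BLAKE3 digests, per-message validation
is omitted, and the type and error names are made up for the example.

```rust
use std::collections::{HashMap, HashSet};

/// Stand-in for a real content digest.
pub type Digest = u64;

/// Heavily simplified Directory: only what the ordering checks need.
#[derive(Clone, Debug, PartialEq)]
pub struct Directory {
    pub digest: Digest,
    pub size: u64,
    /// Child directories, as (digest, expected size) pairs.
    pub children: Vec<(Digest, u64)>,
}

#[derive(Debug, PartialEq)]
pub enum Error {
    UnknownChild(Digest),
    SizeMismatch(Digest),
    Disconnected,
    Empty,
}

#[derive(Default)]
pub struct ClosureValidatorSketch {
    /// All directories seen so far, keyed by digest.
    seen: HashMap<Digest, Directory>,
    /// Insertion order, first occurrence only (this deduplicates).
    order: Vec<Digest>,
}

impl ClosureValidatorSketch {
    /// Accept a directory only if all children were added before it and
    /// the sizes it records for them match.
    pub fn add(&mut self, dir: Directory) -> Result<(), Error> {
        for (child, size) in &dir.children {
            match self.seen.get(child) {
                None => return Err(Error::UnknownChild(*child)),
                Some(c) if c.size != *size => return Err(Error::SizeMismatch(*child)),
                Some(_) => {}
            }
        }
        if !self.seen.contains_key(&dir.digest) {
            self.order.push(dir.digest);
            self.seen.insert(dir.digest, dir);
        }
        Ok(())
    }

    /// Return the directories in leaves-to-root order, after checking that
    /// everything is reachable from the last inserted directory (the root).
    pub fn drain(self) -> Result<Vec<Directory>, Error> {
        let ClosureValidatorSketch { mut seen, order } = self;
        let root = *order.last().ok_or(Error::Empty)?;
        let mut reachable = HashSet::new();
        let mut stack = vec![root];
        while let Some(d) = stack.pop() {
            if reachable.insert(d) {
                stack.extend(seen[&d].children.iter().map(|(c, _)| *c));
            }
        }
        if reachable.len() != order.len() {
            return Err(Error::Disconnected);
        }
        Ok(order.iter().map(|d| seen.remove(d).unwrap()).collect())
    }
}
```

Uploading a parent before its child fails in `add`, while a second,
unconnected root is only detected in `drain`.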
Change-Id: Ie88c832924c170a24626e9e3e91d868497b5d7a4
Reviewed-on: https://cl.tvl.fyi/c/depot/+/11220
Tested-by: BuildkiteCI
Reviewed-by: raitobezarius <tvl@lahfa.xyz>
Autosubmit: flokli <flokli@flokli.de>
We need to ensure the Directories are successfully uploaded before doing
any testing with them.
Change-Id: Iafa8deb86b3d5eb302ebfba3ced34385f67a7229
Reviewed-on: https://cl.tvl.fyi/c/depot/+/11244
Reviewed-by: Connor Brewster <cbrewster@hey.com>
Tested-by: BuildkiteCI
Autosubmit: flokli <flokli@flokli.de>
This allows these messages to be put in HashSets.
Change-Id: Ia58094cafe53eb624578821d3d8d969c5d21a1d7
Reviewed-on: https://cl.tvl.fyi/c/depot/+/11219
Tested-by: BuildkiteCI
Reviewed-by: Connor Brewster <cbrewster@hey.com>
Autosubmit: flokli <flokli@flokli.de>
Log the entire span with "trace" level, not just its `ret` level.
The level of the error value event defaults to ERROR, so we don't lose
these.
B3Digest implements Debug and Display the same way, so we can omit the
`(Display)` part in `ret(Display)` for them.
Change-Id: Id00d123a5798e5bdc9820dd97ae2b4d4eb5455f0
Reviewed-on: https://cl.tvl.fyi/c/depot/+/11218
Tested-by: BuildkiteCI
Reviewed-by: Connor Brewster <cbrewster@hey.com>
There is no public API to construct this, there's exactly one caller,
and it's perfectly fine to directly populate the struct there.
Change-Id: Idae43a0162ee9bc687d21c550e0c9df33f12d263
Reviewed-on: https://cl.tvl.fyi/c/depot/+/11217
Tested-by: BuildkiteCI
Reviewed-by: Connor Brewster <cbrewster@hey.com>
This makes it easier to see what's going wrong when uploading multiple
Directories.
Change-Id: Ieb71424b9761777c5f719b2f365962644de82baf
Reviewed-on: https://cl.tvl.fyi/c/depot/+/11209
Autosubmit: flokli <flokli@flokli.de>
Reviewed-by: raitobezarius <tvl@lahfa.xyz>
Tested-by: BuildkiteCI
This functionality is provided by the object store backend too
(using `objectstore+file://$some_path`).
This backend also supports content-defined chunking and compresses
chunks with zstd.
Change-Id: I5968c713112c400d23897c59db06b6c713c9d8cb
Reviewed-on: https://cl.tvl.fyi/c/depot/+/11205
Autosubmit: flokli <flokli@flokli.de>
Tested-by: BuildkiteCI
Reviewed-by: raitobezarius <tvl@lahfa.xyz>
This controls whether tvix-castore has support for various cloud
backends or not.
Use this to control the set of feature flags for the object_store
backend, and only enable the aws, azure and gcp ones if it's set.
In the future this can be used to enable/disable other cloud backends
too.
Without feature flags, `object_store` already supports the `InMemory`
and `LocalFilesystem` backends, and we also want to unconditionally
enable the `http` one. Make sure at least the construction of these
services is covered in the tests.
Similarly, the tvix-store crate, which provides the tvix-store CLI has a
`cloud` feature flag too (defaulting to enabled).
Change-Id: I9fb9c87b740e7dc83f8ff7a0862905d036d513f2
Reviewed-on: https://cl.tvl.fyi/c/depot/+/11204
Autosubmit: flokli <flokli@flokli.de>
Reviewed-by: raitobezarius <tvl@lahfa.xyz>
Tested-by: BuildkiteCI
The Rust trait did not document the order of the elements in the
stream. Document that, and also the reasoning behind this.
Change-Id: I27ef0b2020082783fc41c2015233175e2b8e716d
Reviewed-on: https://cl.tvl.fyi/c/depot/+/11203
Reviewed-by: raitobezarius <tvl@lahfa.xyz>
Autosubmit: flokli <flokli@flokli.de>
Tested-by: BuildkiteCI
This will allow feature-flagging some of the backends.
Change-Id: Iddbdb89d3cf9c966a2c25b06b03e6917b284cae5
Reviewed-on: https://cl.tvl.fyi/c/depot/+/11201
Tested-by: BuildkiteCI
Reviewed-by: raitobezarius <tvl@lahfa.xyz>
Autosubmit: flokli <flokli@flokli.de>
This will allow feature-flagging some of the backends.
Change-Id: Idffbf8b3fd154f5a3d938225c3871feffea8ff8c
Reviewed-on: https://cl.tvl.fyi/c/depot/+/11200
Autosubmit: flokli <flokli@flokli.de>
Tested-by: BuildkiteCI
Reviewed-by: raitobezarius <tvl@lahfa.xyz>
We don't need to use BASE64 here on our own, B3Digest has a Display
impl.
This will also make sure the `b3:` digest is present in field values.
Change-Id: I0ce6ee0f7e7e99fb9b16872953a1b742e99be291
Reviewed-on: https://cl.tvl.fyi/c/depot/+/11192
Reviewed-by: Connor Brewster <cbrewster@hey.com>
Tested-by: BuildkiteCI
Autosubmit: flokli <flokli@flokli.de>
Have derive_{blob,chunk}_path emit trace-level events for both the
values they're called with, as well as the return value.
With RUST_LOG in place, it doesn't get lost in other unrelated noise.
Change-Id: Id2451e3657324eff482841eb26a22d19e22bde30
Reviewed-on: https://cl.tvl.fyi/c/depot/+/11136
Autosubmit: flokli <flokli@flokli.de>
Reviewed-by: Connor Brewster <cbrewster@hey.com>
Tested-by: BuildkiteCI
Whenever this encounters an open_read(), it'll first check for more
granular chunking. If there's more granular chunking data available, a
ChunkedReader is constructed (which supports seeking backwards).
This currently is still a bit stupid, and doesn't compose, as
`ChunkedReader` uses `self` as the `BlobService` to ask for the
individual chunks.
In a store composition future, we might want to compose this differently,
essentially constructing `ChunkedReader` with another `BlobService`
representing the entire hierarchy, so there's a chance to locally cache
things, and do fewer requests.
Change-Id: I22e0df4d6245f666d083b4f0b7114d3ac41d1dce
Reviewed-on: https://cl.tvl.fyi/c/depot/+/11185
Tested-by: BuildkiteCI
Reviewed-by: Connor Brewster <cbrewster@hey.com>
Use the same format as Display, b3: followed by the base64
representation. This makes the debug implementation of everything
containing a b3 digest much nicer to read.
Change-Id: I3ca3154d0b6fb07781c8f9c83ece3ff1a6957902
Reviewed-on: https://cl.tvl.fyi/c/depot/+/11181
Autosubmit: flokli <flokli@flokli.de>
Reviewed-by: Connor Brewster <cbrewster@hey.com>
Tested-by: BuildkiteCI
This bumps tonic and surrounding crates to 0.11.x.
We added support for tonic 0.11.x into tokio-listener
(https://github.com/vi/tokio-listener/pull/4), so that's bumped as well.
Change-Id: Icfade5894403228299836fefb21b2f9ae59dbebb
Reviewed-on: https://cl.tvl.fyi/c/depot/+/11156
Reviewed-by: Connor Brewster <cbrewster@hey.com>
Autosubmit: flokli <flokli@flokli.de>
Tested-by: BuildkiteCI
`build.rs` emits rerun-if-changed statements for all proto files, as
well as all include paths we pass it.
Unfortunately, due to protobuf's include path rules, we need to specify
the path to the depot root itself as an include path, at least when
building impurely with `cargo`. This causes cargo to essentially always
rebuild, as it also puts its own temporary files in there.
Unfortunately, tonic-build does not chase down to individual .proto
files that are included.
Disable emitting these `rerun-if-changed` statements for now.
This could cause cargo to not rebuild protos every time, causing stale
data until the next local `cargo clean`, but considering the protos
change not that frequently, and it'll immediately surface if trying to
build via Nix (either locally or in CI), it's a good-enough compromise.
Change-Id: Ifd279a2216222ef3fc0e70c5a2fe6f87997f562e
Reviewed-on: https://cl.tvl.fyi/c/depot/+/11157
Autosubmit: flokli <flokli@flokli.de>
Reviewed-by: sterni <sternenseemann@systemli.org>
Tested-by: BuildkiteCI
The object_store crate supports a ton of different stores, with different schemes.
For now, use an `objectstore+` scheme prefix to enable these.
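As a rough illustration of the prefix idea (a sketch only; the `Backend`
enum and `classify` function are hypothetical and not part of the actual
address parsing code), stripping the prefix leaves the URL that would be
handed to the object store layer:

```rust
/// Hypothetical result of looking at a store address.
#[derive(Debug, PartialEq)]
pub enum Backend<'a> {
    /// The in-memory backend.
    Memory,
    /// An object store backend, carrying the URL without the prefix.
    ObjectStore(&'a str),
    /// Anything this sketch does not know about.
    Unsupported,
}

/// Classify an address by its scheme, peeling off `objectstore+`.
pub fn classify<'a>(addr: &'a str) -> Backend<'a> {
    if let Some(inner) = addr.strip_prefix("objectstore+") {
        Backend::ObjectStore(inner)
    } else if addr.starts_with("memory://") {
        Backend::Memory
    } else {
        Backend::Unsupported
    }
}
```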
Change-Id: I946f76e32a0fb0867ef59060217894cda5b959b9
Reviewed-on: https://cl.tvl.fyi/c/depot/+/11080
Tested-by: BuildkiteCI
Reviewed-by: Connor Brewster <cbrewster@hey.com>
Autosubmit: flokli <flokli@flokli.de>
This uses the `object_store` crate to expose a tvix-castore BlobService
backed by object storage.
It's using FastCDC to chunk blobs into smaller chunks when writing to
it.
These are exposed via the .chunks() method.
Change-Id: I2858c403d4d6490cdca73ebef03c26290b2b3c8e
Reviewed-on: https://cl.tvl.fyi/c/depot/+/11076
Reviewed-by: Connor Brewster <cbrewster@hey.com>
Tested-by: BuildkiteCI
Reviewed-by: Brian Olsen <me@griff.name>
This only contains the outer metadata wrapping, and that's not too interesting:
> Request { metadata: MetadataMap { headers: {"content-type":
> "application/grpc", "user-agent": "grpc-go/1.60.1", "te": "trailers",
> "grpc-accept-encoding": "gzip"} }, message: Streaming, extensions:
> Extensions }
Drop these fields for now, and rely on the underlying implementations to
add instrumentation for the application-specific fields.
Clean up the error logging a bit.
Change-Id: Ife1090ed411766a61e1fa60fd4c9570f38de1e98
Reviewed-on: https://cl.tvl.fyi/c/depot/+/11102
Tested-by: BuildkiteCI
Autosubmit: flokli <flokli@flokli.de>
Reviewed-by: Connor Brewster <cbrewster@hey.com>
This only contains the outer metadata wrapping, and that's not too interesting:
> Request { metadata: MetadataMap { headers: {"content-type":
> "application/grpc", "user-agent": "grpc-go/1.60.1", "te": "trailers",
> "grpc-accept-encoding": "gzip"} }, message: Streaming, extensions:
> Extensions }
Drop these fields for now, and rely on the underlying implementations to
add instrumentation for the application-specific fields.
Log errors in some places where we didn't so far.
Change-Id: Ia68d6c526987d3716be62a0809195401cf28512b
Reviewed-on: https://cl.tvl.fyi/c/depot/+/11101
Autosubmit: flokli <flokli@flokli.de>
Reviewed-by: Connor Brewster <cbrewster@hey.com>
Tested-by: BuildkiteCI
For everything using reqwest here during test cases, we also need to
set SSL_CERT_FILE.
Change-Id: If8aeda65f3d75cb9ac5c9bc64e37a0cb7dffc17c
Reviewed-on: https://cl.tvl.fyi/c/depot/+/11092
Autosubmit: flokli <flokli@flokli.de>
Reviewed-by: raitobezarius <tvl@lahfa.xyz>
Tested-by: BuildkiteCI
If there's an unexpected test failure, print it out, rather than just
saying something is false even though it should be true.
Use .expect() for this, which displays the error if it failed.
We can't use expect_err(), as our stores are not display'able, so use an
assertion with a message there.
Change-Id: I2d88861d979d107edc0717fbdb3cdac9a6bfc5e4
Reviewed-on: https://cl.tvl.fyi/c/depot/+/11091
Tested-by: BuildkiteCI
Reviewed-by: Brian Olsen <me@griff.name>
Reviewed-by: flokli <flokli@flokli.de>
HashingReader wraps an existing AsyncRead, and allows querying for the
digest of all data read "through" it.
The hash function is configurable by type parameter, and we define
B3HashingReader.
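To illustrate the wrapping idea, here is a small synchronous sketch. It
is an assumption-laden stand-in: it wraps `std::io::Read` rather than
AsyncRead, and uses a toy FNV-1a hasher via `std::hash::Hasher` in place
of BLAKE3. `Fnv1a` and `FnvHashingReader` are names made up for this
example.

```rust
use std::hash::Hasher;
use std::io::{self, Read};

/// Toy FNV-1a hasher, standing in for a real digest function.
pub struct Fnv1a(u64);

impl Default for Fnv1a {
    fn default() -> Self {
        Fnv1a(0xcbf2_9ce4_8422_2325) // FNV-1a 64-bit offset basis
    }
}

impl Hasher for Fnv1a {
    fn write(&mut self, bytes: &[u8]) {
        for b in bytes {
            self.0 ^= u64::from(*b);
            self.0 = self.0.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
        }
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

/// Wraps a reader and feeds every byte read through it into a hasher.
pub struct HashingReader<R, H> {
    inner: R,
    hasher: H,
}

impl<R: Read, H: Hasher + Default> HashingReader<R, H> {
    pub fn new(inner: R) -> Self {
        HashingReader { inner, hasher: H::default() }
    }

    /// Digest of all data read through this reader so far.
    pub fn digest(&self) -> u64 {
        self.hasher.finish()
    }
}

impl<R: Read, H: Hasher> Read for HashingReader<R, H> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let n = self.inner.read(buf)?;
        // Only hash the bytes that were actually read.
        self.hasher.write(&buf[..n]);
        Ok(n)
    }
}

/// The hash function is picked via the type parameter.
pub type FnvHashingReader<R> = HashingReader<R, Fnv1a>;
```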
Change-Id: Ic08142077566fc08836662218f5ec8c3aff80be5
Reviewed-on: https://cl.tvl.fyi/c/depot/+/11087
Autosubmit: flokli <flokli@flokli.de>
Reviewed-by: raitobezarius <tvl@lahfa.xyz>
Tested-by: BuildkiteCI
This allows calling .into() to get a B3Digest.
Change-Id: I6e63b496413cd00d84acfcd15c7de0f64c79721f
Reviewed-on: https://cl.tvl.fyi/c/depot/+/11086
Autosubmit: flokli <flokli@flokli.de>
Reviewed-by: raitobezarius <tvl@lahfa.xyz>
Tested-by: BuildkiteCI
The public-consumable thing here is ChunkedReader, not ChunkedBlob.
ChunkedBlob is a helper that can be used to get a new AsyncRead, but
not AsyncSeek. It is used internally by ChunkedReader whenever the
client seeks.
Make this more obvious, by extending the documentation, and putting
ChunkedReader at the top of this file.
Also make ChunkedBlob and its methods private, and give ChunkedReader a
more useful constructor (from_chunks, instead of from_chunked_blob).
Change-Id: I2399867591df923faa73927b924e7c116ad98dc0
Reviewed-on: https://cl.tvl.fyi/c/depot/+/11079
Tested-by: BuildkiteCI
Reviewed-by: Brian Olsen <me@griff.name>
Reviewed-by: Connor Brewster <cbrewster@hey.com>
Some implementations of DirectoryService might not allow retrieval of
intermediate Directory nodes, that are not at the "root".
Think about an object store implementation. The client is doing a
get_recursive anyways to reduce the number of roundtrips.
By documenting the fact we don't need to support looking up intermediate
Directory messages, we can just batch all directories into the same
object, keyed by the root.
Change-Id: I019d720186d03c4125cec9191e93d20586a20963
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10988
Autosubmit: flokli <flokli@flokli.de>
Reviewed-by: tazjin <tazjin@tvl.su>
Tested-by: BuildkiteCI
This otherwise gets a bit spammy.
Change-Id: I288350a600d79a394c239f253424ad55bc3cefc5
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10954
Tested-by: BuildkiteCI
Reviewed-by: tazjin <tazjin@tvl.su>
These provide seekable access into a Blob for which we have more
granular chunking information.
There's no support for verified streaming in here yet, this simply
produces a stream of readers for each chunk, skipping irrelevant chunks
and data from the first chunk at the beginning.
A seek simply produces a new reader using the same process.
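The "skip irrelevant chunks and data from the first chunk" step boils
down to some offset arithmetic, which could be sketched like this
(`locate_chunk` is a hypothetical helper, not a function from the actual
code):

```rust
/// Given the sizes of all chunks (in order) and an absolute offset into
/// the blob, return the index of the chunk containing that offset and the
/// number of bytes to skip inside that chunk. Returns `None` if the
/// offset is at or past the end of the blob.
pub fn locate_chunk(chunk_sizes: &[u64], offset: u64) -> Option<(usize, u64)> {
    let mut start = 0u64;
    for (idx, size) in chunk_sizes.iter().enumerate() {
        let end = start + size;
        if offset < end {
            // All chunks before `idx` are skipped entirely.
            return Some((idx, offset - start));
        }
        start = end;
    }
    None
}
```

A seek would then recompute this for the new position and start reading
again from the resulting chunk.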
Change-Id: I37f76b752adce027586770475435f3990a6dee0b
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10731
Reviewed-by: Connor Brewster <cbrewster@hey.com>
Autosubmit: flokli <flokli@flokli.de>
Tested-by: BuildkiteCI
docs/verified-streaming.md explained how CDC and verified streaming can
work together, but didn't really highlight enough how chunking in
general also helps with seeking.
In addition, a lot of the thoughts w.r.t. the BlobStore protocol (both
gRPC and Rust traits), why there's no support for seeking directly in
gRPC, and how clients should behave w.r.t. chunked fetching were
missing, or mixed together with the verified streaming bits.
While there is no verified streaming version yet, a chunked one is
coming soon, and documenting this a bit better is gonna make it easier
to understand, as well as provide some outlook on where this is heading.
Change-Id: Ib11b8ccf2ef82f9f3a43b36103df0ad64a9b68ce
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10733
Autosubmit: flokli <flokli@flokli.de>
Tested-by: BuildkiteCI
Reviewed-by: Connor Brewster <cbrewster@hey.com>
The Stat() method was just always signalling no granular chunks are
available. However, as we now have a .chunks() method, we can expose it
over gRPC.
Change-Id: I74f0890ae083f301bb0cec62f1ea4a95463ac590
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10736
Tested-by: BuildkiteCI
Autosubmit: flokli <flokli@flokli.de>
Reviewed-by: Connor Brewster <cbrewster@hey.com>
All chunks must have valid blake3 digests. It is allowed to send an
empty list, if no more granular chunking is available.
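A minimal sketch of such a check, assuming a chunk list where each entry
carries a raw digest and a size. `ChunkMeta` here is a simplified
stand-in and `validate_chunks` a hypothetical name; only the digest
length is checked.

```rust
/// Length of a BLAKE3 digest in bytes.
pub const B3_LEN: usize = 32;

/// Simplified stand-in for a chunk entry.
pub struct ChunkMeta {
    pub digest: Vec<u8>,
    pub size: u64,
}

/// Every chunk needs a digest of the right length. An empty list is
/// valid and means no more granular chunking is available.
pub fn validate_chunks(chunks: &[ChunkMeta]) -> Result<(), String> {
    for (i, chunk) in chunks.iter().enumerate() {
        if chunk.digest.len() != B3_LEN {
            return Err(format!(
                "chunk {} has invalid digest length {}",
                i,
                chunk.digest.len()
            ));
        }
    }
    Ok(())
}
```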
Change-Id: I7ecb53579cdf40fd938bb68a85685751b4d3626f
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10726
Tested-by: BuildkiteCI
Reviewed-by: Connor Brewster <cbrewster@hey.com>
Autosubmit: flokli <flokli@flokli.de>
This can be written without the additional function.
Change-Id: Ib11c5d5254d3e44c8fa9661414835b0622eb1ac4
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10735
Reviewed-by: Connor Brewster <cbrewster@hey.com>
Autosubmit: flokli <flokli@flokli.de>
Tested-by: BuildkiteCI
"given chunksize" is misleading here. It's up to the backend to decide
if it does chunking at all, and how it chunks.
Change-Id: I4f130ca9ac34db79f18ef1d6475295806ac7f9a4
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10728
Autosubmit: flokli <flokli@flokli.de>
Reviewed-by: Connor Brewster <cbrewster@hey.com>
Tested-by: BuildkiteCI
BlobService already implies Send and Sync, we don't need to explicitly
list it here.
Change-Id: I58a4c5912be61a60acd961565979aa01d94ee0f7
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10727
Reviewed-by: Connor Brewster <cbrewster@hey.com>
Tested-by: BuildkiteCI
Autosubmit: flokli <flokli@flokli.de>
In the past, we had a `todo!` on unsupported node types, this returns a proper error
that can be caught by the caller.
Change-Id: Icba4c1dab33c0d670a97f162c9b358d1ed5855cb
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10675
Tested-by: BuildkiteCI
Reviewed-by: flokli <flokli@flokli.de>
The BoxStream type alias is more concise and easier to read than
the full `Pin<Box<dyn Stream<Item = ...> + Send + ...>>` type.
Change-Id: I5b7bccfd066ded5557e01f7895f4cf5c4a33bd44
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10677
Reviewed-by: flokli <flokli@flokli.de>
Tested-by: BuildkiteCI
Autosubmit: Connor Brewster <cbrewster@hey.com>
In one function that does the heavy lifting: `ingest_entries`, and three additional helpers:
- `walk_path_for_ingestion` which performs the tree walking in a very
naive way and can be replaced by the user
- `leveled_entries_to_stream` which transforms a list of a list of
entries ordered by their depth in the tree to a stream of entries in
the bottom to top order (Merkle-compatible order I will say in the
future).
- `ingest_path` which calls the previous functions.
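The bottom-to-top ordering can be illustrated with a plain iterator
instead of a stream. This is only a sketch of the ordering, assuming
index 0 holds the root level; `leveled_entries_bottom_up` is a made-up
name and not the helper listed above.

```rust
/// Flatten entries grouped by depth (index 0 = root level) into a single
/// sequence ordered from the deepest level up to the root, so children
/// always come before their parents.
pub fn leveled_entries_bottom_up<T>(levels: Vec<Vec<T>>) -> impl Iterator<Item = T> {
    levels.into_iter().rev().flatten()
}
```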
Change-Id: I724b972d3c5bffc033f03363255eae448f017cef
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10573
Tested-by: BuildkiteCI
Reviewed-by: flokli <flokli@flokli.de>
Autosubmit: raitobezarius <tvl@lahfa.xyz>
To make use of the filtering feature, we need to revert the internal walker to a real DFS.
We will therefore just invert the whole tree by storing all of its
contents in a level-keyed vector.
This is horribly expensive in memory; it is a compromise between CPU
and memory. Here is the fundamental reason why:
When you encounter a directory, it's either a leaf or not, i.e. it
contains subdirectories or not.
To know this fact, you can:
- wait until you notice subdirectories under it, i.e. you need to store
any intermediate nodes you see in the meantime -> memory penalty.
- getdents or readdir on it to determine *NOW* its subdirectories -> CPU
penalty and I/O penalty.
This is an implementation of the first proposal: we pay memory.
In practice, we are paying O(#nb of nodes) in memory.
There's a smarter, albeit much more complicated, algorithm that pays only
O(\sum_i #siblings(p_i)) nodes, where (p_1, ..., p_n) is the path to a leaf,
which means for:
        A
       / \
      B   C
     /   / \
    D   E   F
We would never store D, E and F all at once, but only E and F at a given time.
But we would still store B, C no matter what.
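A small sketch of the level-keyed storage, walking an in-memory tree
instead of the filesystem. `Node` and `collect_by_level` are hypothetical
and only illustrate that every node ends up stored, i.e. the
O(#nb of nodes) memory cost.

```rust
/// Hypothetical in-memory tree node, standing in for a directory entry.
pub struct Node {
    pub name: &'static str,
    pub children: Vec<Node>,
}

/// Depth-first walk that stores every entry in a vector indexed by its
/// depth in the tree (index 0 = root).
pub fn collect_by_level(root: &Node) -> Vec<Vec<&'static str>> {
    let mut levels: Vec<Vec<&'static str>> = Vec::new();
    let mut stack = vec![(root, 0usize)];
    while let Some((node, depth)) = stack.pop() {
        if levels.len() <= depth {
            levels.resize_with(depth + 1, Vec::new);
        }
        levels[depth].push(node.name);
        // Push in reverse so children are visited in their listed order.
        for child in node.children.iter().rev() {
            stack.push((child, depth + 1));
        }
    }
    levels
}
```

Iterating the resulting levels from last to first then yields entries
from the leaves up to the root.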
Change-Id: I456ed1c3f0db493e018ba1182665d84bebe29c11
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10567
Tested-by: BuildkiteCI
Autosubmit: raitobezarius <tvl@lahfa.xyz>
Reviewed-by: flokli <flokli@flokli.de>
- Adjust to ecl 23.9.9 release
- Regenerate go protos after protoc-gen-go update
- Drop dhall fork which hasn't kept up with 1.42.*
- Address new clippy warnings:
- Variant naming of Error::ValidationError
- Simplify .try_into().unwrap()
- Drop unnecessary identity function
- Test module must be last in file
- Drop unused `pub use`
- Update agenix to 0.15.0. Current master has an installCheckPhase that
doesn't work with C++ Nix 2.3.*:
a23aa271be (commitcomment-137185861)
Change-Id: Ic29eef20d6fd1362ce1031364a5ca6b4edf195bd
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10615
Reviewed-by: aspen <root@gws.fyi>
Tested-by: BuildkiteCI
Autosubmit: sterni <sternenseemann@systemli.org>
So that we can just `map_err` easily in functions returning `std::io::Error` but calling functions
returning `castore::import::Error`.
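A sketch of what such a conversion enables. `ImportError`, its variant,
and the `ingest` functions are hypothetical stand-ins for
`castore::import::Error` and its callers.

```rust
use std::fmt;
use std::io;

/// Hypothetical stand-in for the import error type.
#[derive(Debug)]
pub enum ImportError {
    UnsupportedFileType(String),
}

impl fmt::Display for ImportError {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        match self {
            ImportError::UnsupportedFileType(p) => {
                write!(f, "unsupported file type at {}", p)
            }
        }
    }
}

impl std::error::Error for ImportError {}

/// The conversion itself: wrap the import error in an io::Error.
impl From<ImportError> for io::Error {
    fn from(e: ImportError) -> Self {
        io::Error::new(io::ErrorKind::Other, e)
    }
}

/// Hypothetical function returning the import error type.
fn ingest(path: &str) -> Result<usize, ImportError> {
    if path.ends_with(".sock") {
        Err(ImportError::UnsupportedFileType(path.to_string()))
    } else {
        Ok(path.len())
    }
}

/// A caller returning io::Error can now simply map_err.
pub fn ingest_io(path: &str) -> io::Result<usize> {
    ingest(path).map_err(io::Error::from)
}
```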
Change-Id: Id181b95e8431c69e95f3a8cd569ca10306656e1d
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10572
Autosubmit: raitobezarius <tvl@lahfa.xyz>
Reviewed-by: flokli <flokli@flokli.de>
Tested-by: BuildkiteCI
This adds support to retrieve a list of chunks for a given blob to the
BlobService interface.
While theoretically all chunk-awareness could be kept private inside
each BlobService reader, we'd not be able to resolve individual chunks
from different BlobServices, and due to this, not be able to substitute
chunks we already have in a more local store.
This function allows asking a BlobService for the list of chunks,
leaving any actual fetching up to the caller (be it through individual
calls to open_read, or by asking another store for it).
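To make the substitution idea more concrete, here is a simplified,
synchronous sketch. The trait, the `u64` digests, `MemoryStore` and
`read_via_chunks` are all assumptions made for this example; the real
interface is async and works on BLAKE3 digests.

```rust
use std::collections::HashMap;

/// Stand-in for a real content digest.
pub type Digest = u64;

/// Simplified stand-in for the BlobService interface.
pub trait BlobService {
    /// List of chunk digests for a blob, if the blob is known.
    fn chunks(&self, blob: &Digest) -> Option<Vec<Digest>>;
    /// Contents of a blob (or chunk), if present in this store.
    fn open_read(&self, digest: &Digest) -> Option<Vec<u8>>;
}

#[derive(Default)]
pub struct MemoryStore {
    pub blobs: HashMap<Digest, Vec<u8>>,
    pub chunk_lists: HashMap<Digest, Vec<Digest>>,
}

impl BlobService for MemoryStore {
    fn chunks(&self, blob: &Digest) -> Option<Vec<Digest>> {
        self.chunk_lists.get(blob).cloned()
    }

    fn open_read(&self, digest: &Digest) -> Option<Vec<u8>> {
        self.blobs.get(digest).cloned()
    }
}

/// Ask `remote` for the chunk list, then fetch each chunk from `local`
/// if it is there, falling back to `remote` otherwise. Returns the data
/// and the number of chunks that were served locally.
pub fn read_via_chunks(
    remote: &dyn BlobService,
    local: &dyn BlobService,
    blob: &Digest,
) -> Option<(Vec<u8>, usize)> {
    let chunk_list = remote.chunks(blob)?;
    let mut data = Vec::new();
    let mut from_local = 0;
    for chunk in &chunk_list {
        let bytes = match local.open_read(chunk) {
            Some(b) => {
                from_local += 1;
                b
            }
            None => remote.open_read(chunk)?,
        };
        data.extend_from_slice(&bytes);
    }
    Some((data, from_local))
}
```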
Change-Id: I1d33c591195ed494be3aec71a8c804743cbe0dca
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10586
Autosubmit: flokli <flokli@flokli.de>
Reviewed-by: raitobezarius <tvl@lahfa.xyz>
Tested-by: BuildkiteCI
Make it clear this is only used inside the scope.
Change-Id: Ie94f88d7f0fb58cd4bf9c2f1176000b272e6f2e6
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10585
Autosubmit: flokli <flokli@flokli.de>
Tested-by: BuildkiteCI
Reviewed-by: raitobezarius <tvl@lahfa.xyz>
The docstrings were not updated once we made the BlobService trait async.
There's no more need to turn things into a sync reader.
Also, rearrange the stream manipulation a bit, and remove the need to
create a new VecDeque for each element in the stream. bytes::Bytes
implements the Buf trait.
Fixes b/289.
Change-Id: Id2bbedca5876b462e630c144b74cc289c3916c4d
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10582
Autosubmit: flokli <flokli@flokli.de>
Tested-by: BuildkiteCI
Reviewed-by: raitobezarius <tvl@lahfa.xyz>
It was a `//` not a `///`.
Change-Id: Iee3e8c116d73b5dd8a41c027153714415a66695f
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10566
Tested-by: BuildkiteCI
Reviewed-by: flokli <flokli@flokli.de>
Extend our validation function to also check for the None case.
Change-Id: Ib75f880646d7fb3d66588f1988e61ec18be816a2
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10534
Autosubmit: flokli <flokli@flokli.de>
Reviewed-by: raitobezarius <tvl@lahfa.xyz>
Tested-by: BuildkiteCI
Make this an `AsRef<dyn DirectoryService>`.
This helps drop some Clone requirements.
Unfortunately, we can't thread this through to TvixStoreIO just yet.
Change-Id: I3f07eb28d6c793d3313fe21506ada84d5a8aa3ac
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10533
Autosubmit: flokli <flokli@flokli.de>
Tested-by: BuildkiteCI
Reviewed-by: raitobezarius <tvl@lahfa.xyz>
Only convert to and reuse an Arc<…> where needed.
Change-Id: I2c1bc69cca5a4a3ebd3bdb33d6e28e1f5fb86cb9
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10514
Tested-by: BuildkiteCI
Autosubmit: flokli <flokli@flokli.de>
Reviewed-by: raitobezarius <tvl@lahfa.xyz>
We can also drop the Clone requirement. Because the trait has been async
for some time, there's no need to clone before moving into an async closure,
allowing us to simplify the code a bit.
Change-Id: I9b0a0e10077d8c548d218207b908bfd92c5b8de0
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10515
Tested-by: BuildkiteCI
Reviewed-by: raitobezarius <tvl@lahfa.xyz>
Autosubmit: flokli <flokli@flokli.de>
We don't actually care if it's an Arc<dyn BlobService>, or something
else, as long as we can Deref to a BlobService and clone.
Change-Id: I0852aaf723f51c5e6b820be8db1199d17309ab08
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10510
Reviewed-by: raitobezarius <tvl@lahfa.xyz>
Autosubmit: flokli <flokli@flokli.de>
Tested-by: BuildkiteCI
While we currently mostly use it in an Arc, as we need to clone it
inside PathInfoService, there might be other use cases not requiring it
to be Clone.
Change-Id: Ia05bb370340792a048e2036be30e285ef1e63870
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10483
Autosubmit: flokli <flokli@flokli.de>
Reviewed-by: raitobezarius <tvl@lahfa.xyz>
Tested-by: BuildkiteCI
While we currently mostly use it in an Arc, as we need to clone it
inside PathInfoService, there might be other use cases not requiring it
to be Clone.
Change-Id: I7bd337cd2e4c2d4154b385461eefa62c9b78345d
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10482
Autosubmit: flokli <flokli@flokli.de>
Reviewed-by: raitobezarius <tvl@lahfa.xyz>
Tested-by: BuildkiteCI
We only do things with the reference, so we don't need to locally borrow
it.
Change-Id: I6073f7ec7aff717ae3069e28a00b1cb408a50ceb
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10455
Tested-by: BuildkiteCI
Autosubmit: flokli <flokli@flokli.de>
Reviewed-by: raitobezarius <tvl@lahfa.xyz>
There's no need to pass in an unused directory service into the
populate_blob_* method, and considering we have one or two invocations of
each of these, we don't really gain much from having all these functions
follow the same structure, at least for now.
Also, update some function names to better describe what they're doing.
Change-Id: I92f680745c157fb0a602b07342f8838bfad23ecd
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10411
Autosubmit: flokli <flokli@flokli.de>
Reviewed-by: raitobezarius <tvl@lahfa.xyz>
Tested-by: BuildkiteCI
cl/10378 already moved store/fs to castore/fs, but we kept the tests
in tvix-store, as they were populating a PathInfoService to make nodes
appear in the mount root.
Update these tests to now just insert root nodes into a BTreeMap, and
ensure we can use that as a RootNodes too.
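Roughly, implementing such a trait for a BTreeMap could look like the
sketch below. Both the `Node` enum and the trait methods are simplified
assumptions for illustration (synchronous, no error handling); they are
not the actual RootNodes definition.

```rust
use std::collections::BTreeMap;

/// Simplified stand-in for a castore root node.
#[derive(Clone, Debug, PartialEq)]
pub enum Node {
    Directory,
    File { size: u64 },
    Symlink { target: String },
}

/// Simplified, synchronous stand-in for the RootNodes trait.
pub trait RootNodes {
    /// Look up a root node by its basename.
    fn get_by_basename(&self, name: &[u8]) -> Option<Node>;
    /// List all root nodes.
    fn list(&self) -> Vec<Node>;
}

/// A plain map from basename to node is enough to serve root nodes.
impl RootNodes for BTreeMap<Vec<u8>, Node> {
    fn get_by_basename(&self, name: &[u8]) -> Option<Node> {
        self.get(name).cloned()
    }

    fn list(&self) -> Vec<Node> {
        self.values().cloned().collect()
    }
}
```

Tests can then populate the map directly, without going through a
PathInfoService.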
Change-Id: Iad7d1ee4f9423eb6e3a1da33f433842c9ae0de1f
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10410
Reviewed-by: raitobezarius <tvl@lahfa.xyz>
Tested-by: BuildkiteCI
Autosubmit: flokli <flokli@flokli.de>
With the recent introduction of the RootNodes trait, there's nothing in
the fs module pulling in tvix-store dependencies, so it can live in
tvix-castore.
This allows other crates to make use of TvixStoreFS, without having to
pull in tvix-store.
For example, a tvix-build using a fuse mountpoint at /nix/store doesn't
need a PathInfoService to hold the root nodes that should be present,
but just a list.
tvix-store now has a pathinfoservice/fs module, which contains the
necessary glue logic to implement the RootNodes trait for a
PathInfoService.
To satisfy Rust orphan rules for trait implementations, we had to add a
small wrapper struct. It's mostly hidden away by the make_fs helper
function returning a TvixStoreFs.
It can't be entirely private, as it's still leaking into the concrete
type of TvixStoreFS.
tvix-store still has `fuse` and `virtiofs` features, but they now simply
enable these features in the `tvix-castore` crate they depend on.
The tests for the fuse functionality stay in tvix-store for now, as
they populate the root nodes through a PathInfoService.
Once the above-mentioned "list of root nodes" implementation exists, we
might want to shuffle this around one more time.
Fixes b/341.
Change-Id: I989f664827a5a361b23b34368d242d10c157c756
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10378
Autosubmit: flokli <flokli@flokli.de>
Tested-by: BuildkiteCI
Reviewed-by: sterni <sternenseemann@systemli.org>
This is not gonna end up as an interlinked docstring.
Change-Id: I2b0ca106aa75bae0156c0b411da5931da60c725d
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10406
Autosubmit: flokli <flokli@flokli.de>
Reviewed-by: edef <edef@edef.eu>
Tested-by: BuildkiteCI
The simple filesystem `BlobService` enables a user to keep a blob store
on an existing filesystem, using a prefix-style layout in the provided
root directory, e.g. the first two bytes of the blake3 hashes are used
as directory prefixes.
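One possible reading of that layout, as a sketch: the path encoding
(lowercase hex, one directory level per prefix byte) and the function
name `derive_blob_path` are assumptions made for this example, not
necessarily what the implementation does.

```rust
use std::path::{Path, PathBuf};

/// Derive the on-disk path for a blob: the first two digest bytes each
/// become a directory level, followed by the full hex-encoded digest.
pub fn derive_blob_path(root: &Path, digest: &[u8; 32]) -> PathBuf {
    let hex: String = digest.iter().map(|b| format!("{:02x}", b)).collect();
    root.join(&hex[0..2]).join(&hex[2..4]).join(&hex)
}
```

Spreading blobs over prefix directories like this keeps any single
directory from growing too large.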
Change-Id: I3451a688a6f39027b9c6517d853b95a87adb3a52
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10071
Autosubmit: raitobezarius <tvl@lahfa.xyz>
Tested-by: BuildkiteCI
Reviewed-by: flokli <flokli@flokli.de>
This is only used in the gRPC version (GRPCPutter), during the test
automation.
So define it as a method there, behind #[cfg(test)], and remove from
the trait.
Change-Id: Idf170884e3a10be0e96c75d946d9c431171e5e88
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10340
Tested-by: BuildkiteCI
Autosubmit: flokli <flokli@flokli.de>
Reviewed-by: raitobezarius <tvl@lahfa.xyz>