docs/verified-streaming.md explained how CDC and verified streaming can
work together, but didn't really highlight enough how chunking in
general also helps with seeking.
In addition, a lot of the thoughts w.r.t. the BlobStore protocol, both
gRPC and Rust traits, as well as why there's no support for seeking
directly in gRPC, as well as how clients should behave w.r.t. chunked
fetching was missing, or mixed together with the verified streaming
bits.
While there is no verified streaming version yet, a chunked one is
coming soon, and documenting this a bit better is gonna make it easier
to understand, as well as provide some lookout on where this is heading.
Change-Id: Ib11b8ccf2ef82f9f3a43b36103df0ad64a9b68ce
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10733
Autosubmit: flokli <flokli@flokli.de>
Tested-by: BuildkiteCI
Reviewed-by: Connor Brewster <cbrewster@hey.com>
The Stat() method was just always signalling no granular chunks are
available. However, as we now have a .chunks() method, we can expose it
over gRPC.
Change-Id: I74f0890ae083f301bb0cec62f1ea4a95463ac590
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10736
Tested-by: BuildkiteCI
Autosubmit: flokli <flokli@flokli.de>
Reviewed-by: Connor Brewster <cbrewster@hey.com>
All chunks must have valid blake3 digests. It is allowed to send an
empty list, if no more granular chunking is available.
Change-Id: I7ecb53579cdf40fd938bb68a85685751b4d3626f
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10726
Tested-by: BuildkiteCI
Reviewed-by: Connor Brewster <cbrewster@hey.com>
Autosubmit: flokli <flokli@flokli.de>
This can be written without the additional function.
Change-Id: Ib11c5d5254d3e44c8fa9661414835b0622eb1ac4
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10735
Reviewed-by: Connor Brewster <cbrewster@hey.com>
Autosubmit: flokli <flokli@flokli.de>
Tested-by: BuildkiteCI
"given chunksize" is misleading here. It's up to the backend to decide
if it does chunking at all, and how it chunks.
Change-Id: I4f130ca9ac34db79f18ef1d6475295806ac7f9a4
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10728
Autosubmit: flokli <flokli@flokli.de>
Reviewed-by: Connor Brewster <cbrewster@hey.com>
Tested-by: BuildkiteCI
BlobService already implies Send and Sync, we don't need to explicitly
list it here.
Change-Id: I58a4c5912be61a60acd961565979aa01d94ee0f7
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10727
Reviewed-by: Connor Brewster <cbrewster@hey.com>
Tested-by: BuildkiteCI
Autosubmit: flokli <flokli@flokli.de>
In the past, we had a `todo!` on unsupported node types, this returns a proper error
that can be caught by the caller.
Change-Id: Icba4c1dab33c0d670a97f162c9b358d1ed5855cb
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10675
Tested-by: BuildkiteCI
Reviewed-by: flokli <flokli@flokli.de>
The BoxStream type alias is a more concise and easier to read than
the full `Pin<Box<dyn Stream<Item = ...> + Send + ...>>` type.
Change-Id: I5b7bccfd066ded5557e01f7895f4cf5c4a33bd44
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10677
Reviewed-by: flokli <flokli@flokli.de>
Tested-by: BuildkiteCI
Autosubmit: Connor Brewster <cbrewster@hey.com>
In one function that does the heavy lifting: `ingest_entries`, and three additional helpers:
- `walk_path_for_ingestion` which perform the tree walking in a very naive way and can be replaced by the user
- `leveled_entries_to_stream` which transforms a list of a list of
entries ordered by their depth in the tree to a stream of entries in
the bottom to top order (Merkle-compatible order I will say in the
future).
- `ingest_path` which calls the previous functions.
Change-Id: I724b972d3c5bffc033f03363255eae448f017cef
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10573
Tested-by: BuildkiteCI
Reviewed-by: flokli <flokli@flokli.de>
Autosubmit: raitobezarius <tvl@lahfa.xyz>
To make use of the filtering feature, we need to revert the internal walker to a real DFS.
We will therefore just invert the whole tree by storing all of its
contents in a level-keyed vector.
This is horribly expensive in memory, this is a compromise between CPU
and memory, here is the fundamental reason for why:
When you encounter a directory, it's either a leaf or not, i.e. it
contains subdirectories or not.
To know this fact, you can:
- wait until you notice subdirectories under it, i.e. you need to store
any intermediate nodes you see in the meantime -> memory penalty.
- getdents or readdir on it to determine *NOW* its subdirectories -> CPU
penalty and I/O penalty.
This is an implementation of the first proposal, we pay memory.
In practice, we are paying O(#nb of nodes) in memory.
There's a smarter albeit much more complicated algorithm that pays only
O(\sum_i #siblings(p_i)) nodes where (p_1, ..., p_n) is the path to a leaf.
which means for:
A
/ \
B C
/ / \
D E F
We would never store D, E, F but only E, F at a given time.
But we would still store B, C no matter what.
Change-Id: I456ed1c3f0db493e018ba1182665d84bebe29c11
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10567
Tested-by: BuildkiteCI
Autosubmit: raitobezarius <tvl@lahfa.xyz>
Reviewed-by: flokli <flokli@flokli.de>
- Adjust to ecl 23.9.9 release
- Regenerate go protos after protoc-gen-go update
- Drop dhall fork which hasn't kept up with 1.42.*
- Address new clippy warnings:
- Variant naming of Error::ValidationError
- Simplify .try_into().unwrap()
- Drop unnecessary identity function
- Test module must be last in file
- Drop unused `pub use`
- Update agenix to 0.15.0. Current master has a installCheckPhase that
doesn't work with C++ Nix 2.3.*:
a23aa271be (commitcomment-137185861)
Change-Id: Ic29eef20d6fd1362ce1031364a5ca6b4edf195bd
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10615
Reviewed-by: aspen <root@gws.fyi>
Tested-by: BuildkiteCI
Autosubmit: sterni <sternenseemann@systemli.org>
So that we can just `map_err` easily in functions returning `std::io::Error` but calling functions
returning `castore::import::Error`.
Change-Id: Id181b95e8431c69e95f3a8cd569ca10306656e1d
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10572
Autosubmit: raitobezarius <tvl@lahfa.xyz>
Reviewed-by: flokli <flokli@flokli.de>
Tested-by: BuildkiteCI
This adds support to retrieve a list of chunks for a given blob to the
BlobService interface.
While theoretically all chunk-awareness could be kept private inside
each BlobService reader, we'd not be able to resolve individual chunks
from different Blobservices - and due to this, not able to substitute
chunks we already have in a more local store.
This function allows asking a BlobService for the list of chunks,
leaving any actual fetching up to the caller (be it through individual
calls to open_read), or asking another store for it.
Change-Id: I1d33c591195ed494be3aec71a8c804743cbe0dca
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10586
Autosubmit: flokli <flokli@flokli.de>
Reviewed-by: raitobezarius <tvl@lahfa.xyz>
Tested-by: BuildkiteCI
Make it clear this is only used inside the scope.
Change-Id: Ie94f88d7f0fb58cd4bf9c2f1176000b272e6f2e6
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10585
Autosubmit: flokli <flokli@flokli.de>
Tested-by: BuildkiteCI
Reviewed-by: raitobezarius <tvl@lahfa.xyz>
The docstrings were not updated once we made the BlobService trait async.
There's no more need to turn things into a sync reader.
Also, rearrange the stream manipulation a bit, and remove the need to
create a new VecDeque for each element in the stream. bytes::Bytes
implements the Buf trait.
Fixes b/289.
Change-Id: Id2bbedca5876b462e630c144b74cc289c3916c4d
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10582
Autosubmit: flokli <flokli@flokli.de>
Tested-by: BuildkiteCI
Reviewed-by: raitobezarius <tvl@lahfa.xyz>
It was a `//` not a `///`.
Change-Id: Iee3e8c116d73b5dd8a41c027153714415a66695f
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10566
Tested-by: BuildkiteCI
Reviewed-by: flokli <flokli@flokli.de>
Extend our validation function to also check for the None case.
Change-Id: Ib75f880646d7fb3d66588f1988e61ec18be816a2
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10534
Autosubmit: flokli <flokli@flokli.de>
Reviewed-by: raitobezarius <tvl@lahfa.xyz>
Tested-by: BuildkiteCI
Make this an `AsRef<dyn DirectoryService>`.
This helps dropping some Clone requirements.
Unfortunately, we can't thread this through to TvixStoreIO just yet.
Change-Id: I3f07eb28d6c793d3313fe21506ada84d5a8aa3ac
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10533
Autosubmit: flokli <flokli@flokli.de>
Tested-by: BuildkiteCI
Reviewed-by: raitobezarius <tvl@lahfa.xyz>
Only convert to and reuse an Arc<…> where needed.
Change-Id: I2c1bc69cca5a4a3ebd3bdb33d6e28e1f5fb86cb9
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10514
Tested-by: BuildkiteCI
Autosubmit: flokli <flokli@flokli.de>
Reviewed-by: raitobezarius <tvl@lahfa.xyz>
We can also drop the Clone requirement. Because the trait is async since
some time, there's no need to clone before moving into an async closure,
allowing us to simplify the code a bit.
Change-Id: I9b0a0e10077d8c548d218207b908bfd92c5b8de0
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10515
Tested-by: BuildkiteCI
Reviewed-by: raitobezarius <tvl@lahfa.xyz>
Autosubmit: flokli <flokli@flokli.de>
We don't actually care if it's an Arc<dyn BlobService>, or something
else, as long as we can Deref to a BlobService and clone.
Change-Id: I0852aaf723f51c5e6b820be8db1199d17309ab08
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10510
Reviewed-by: raitobezarius <tvl@lahfa.xyz>
Autosubmit: flokli <flokli@flokli.de>
Tested-by: BuildkiteCI
While we currently mostly use it in an Arc, as we need to clone it
inside PathInfoService, there might be other usecases not requiring it
to be Clone.
Change-Id: Ia05bb370340792a048e2036be30e285ef1e63870
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10483
Autosubmit: flokli <flokli@flokli.de>
Reviewed-by: raitobezarius <tvl@lahfa.xyz>
Tested-by: BuildkiteCI
While we currently mostly use it in an Arc, as we need to clone it
inside PathInfoService, there might be other usecases not requiring it
to be Clone.
Change-Id: I7bd337cd2e4c2d4154b385461eefa62c9b78345d
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10482
Autosubmit: flokli <flokli@flokli.de>
Reviewed-by: raitobezarius <tvl@lahfa.xyz>
Tested-by: BuildkiteCI
We only do things with the reference, so we don't need to locally borrow
it.
Change-Id: I6073f7ec7aff717ae3069e28a00b1cb408a50ceb
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10455
Tested-by: BuildkiteCI
Autosubmit: flokli <flokli@flokli.de>
Reviewed-by: raitobezarius <tvl@lahfa.xyz>
There's no need to pass in an unused directory service into the
populate_blob_* method, and considering we have one or two invocation of
each of these, we don't really gain much from having all these functions
follow the same structure, at least for now.
Also, update some function names to better describe what they're doing.
Change-Id: I92f680745c157fb0a602b07342f8838bfad23ecd
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10411
Autosubmit: flokli <flokli@flokli.de>
Reviewed-by: raitobezarius <tvl@lahfa.xyz>
Tested-by: BuildkiteCI
cl/10378 did already move store/fs to castore/fs, but we kept the tests
in tvix-store, as they were populating a PathInfoService to make nodes
appear in the mount root.
Update these tests to now just insert root nodes into a BTreeMap, and
ensure we can use that as a RootNodes too.
Change-Id: Iad7d1ee4f9423eb6e3a1da33f433842c9ae0de1f
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10410
Reviewed-by: raitobezarius <tvl@lahfa.xyz>
Tested-by: BuildkiteCI
Autosubmit: flokli <flokli@flokli.de>
With the recent introduction of the RootNodes trait, there's nothing in
the fs module pulling in tvix-store dependencies, so it can live in
tvix-castore.
This allows other crates to make use of TvixStoreFS, without having to
pull in tvix-store.
For example, a tvix-build using a fuse mountpoint at /nix/store doesn't
need a PathInfoService to hold the root nodes that should be present,
but just a list.
tvix-store now has a pathinfoservice/fs module, which contains the
necessary glue logic to implement the RootNodes trait for a
PathInfoService.
To satisfy Rust orphan rules for trait implementations, we had to add a
small wrapper struct. It's mostly hidden away by the make_fs helper
function returning a TvixStoreFs.
It can't be entirely private, as its still leaking into the concrete
type of TvixStoreFS.
tvix-store still has `fuse` and `virtiofs` features, but they now simply
enable these features in the `tvix-castore` crate they depend on.
The tests for the fuse functionality stay in tvix-store for now, as
they populate the root nodes through a PathInfoService.
Once above mentioned "list of root nodes" implementation exists, we
might want to shuffle this around one more time.
Fixes b/341.
Change-Id: I989f664827a5a361b23b34368d242d10c157c756
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10378
Autosubmit: flokli <flokli@flokli.de>
Tested-by: BuildkiteCI
Reviewed-by: sterni <sternenseemann@systemli.org>
This is not gonna end up as a interlinked docstring.
Change-Id: I2b0ca106aa75bae0156c0b411da5931da60c725d
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10406
Autosubmit: flokli <flokli@flokli.de>
Reviewed-by: edef <edef@edef.eu>
Tested-by: BuildkiteCI
The simple filesystem `BlobService` enable a user to write blob store
on an existing filesystem using a prefix-style layout in the provided root directory,
e.g. the two first bytes of the blake3 hashes are used as directories prefixes.
Change-Id: I3451a688a6f39027b9c6517d853b95a87adb3a52
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10071
Autosubmit: raitobezarius <tvl@lahfa.xyz>
Tested-by: BuildkiteCI
Reviewed-by: flokli <flokli@flokli.de>
This is only used in the gRPC version (GRPCPutter), during the test
automation.
So define it as a method there, behind #[cfg(test)], and remove from
the trait.
Change-Id: Idf170884e3a10be0e96c75d946d9c431171e5e88
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10340
Tested-by: BuildkiteCI
Autosubmit: flokli <flokli@flokli.de>
Reviewed-by: raitobezarius <tvl@lahfa.xyz>
This comment didn't make a lot of sense before.
Change-Id: Ie057a133ca4b1a099ed3c885e32316b0d87c5eb0
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10339
Tested-by: BuildkiteCI
Autosubmit: flokli <flokli@flokli.de>
Reviewed-by: Connor Brewster <cbrewster@hey.com>
Namely, all trait implementations should reject invalid data being fed,
and detect invalid data being returned.
b/355 tracks writing some more tests for this, to ensure we're compliant
with this.
Change-Id: I3b05752932837ce208785efb21ffc21508b4b33a
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10338
Tested-by: BuildkiteCI
Reviewed-by: grfn <grfn@gws.fyi>
Autosubmit: flokli <flokli@flokli.de>
If the path specified doesn't exist, construct a proper error instead
of panicking.
Part of b/344.
Change-Id: Id5c6a91248b0a387f3e8f138f8e686e402009e8f
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10330
Autosubmit: flokli <flokli@flokli.de>
Reviewed-by: raitobezarius <tvl@lahfa.xyz>
Tested-by: BuildkiteCI
This will emit a log event / trace in case this function returns an
error-y type.
Change-Id: I48db6807f3e42304357c422a2b6e177cb8b95228
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10329
Autosubmit: flokli <flokli@flokli.de>
Reviewed-by: raitobezarius <tvl@lahfa.xyz>
Tested-by: BuildkiteCI
For all these calls, the caller has enough context about what it did, so
it should be fine to use io::Result here.
We pretty much only constructed crate::Error::StorageError before
anyways, so this conveys *more* information.
Change-Id: I5cabb3769c9c2314bab926d34dda748fda9d3ccc
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10328
Reviewed-by: raitobezarius <tvl@lahfa.xyz>
Tested-by: BuildkiteCI
Autosubmit: flokli <flokli@flokli.de>
We don't really require the Path to be a PathBuf, we don't even require
it to be a Path, we only need it to be AsRef<Path>>.
This removes some conversion in the from_addr cases, which can just
reuse `url.path()` (a `&str`).
Change-Id: I38d536dbaf0b44421e41f211a9ad2b13605179e9
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10258
Autosubmit: flokli <flokli@flokli.de>
Reviewed-by: raitobezarius <tvl@lahfa.xyz>
Tested-by: BuildkiteCI