tvl-depot/tvix/castore/docs/blobstore-protocol.md
Florian Klink 40d81d0c74 docs(tvix/castore/blobstore): reorganize docs
docs/verified-streaming.md explained how CDC and verified streaming can
work together, but didn't really highlight enough how chunking in
general also helps with seeking.

In addition, a lot of the thoughts w.r.t. the BlobStore protocol, both
gRPC and Rust traits, as well as why there's no support for seeking
directly in gRPC, as well as how clients should behave w.r.t. chunked
fetching was missing, or mixed together with the verified streaming
bits.

While there is no verified streaming version yet, a chunked one is
coming soon, and documenting this a bit better is gonna make it easier
to understand, as well as provide some lookout on where this is heading.

Change-Id: Ib11b8ccf2ef82f9f3a43b36103df0ad64a9b68ce
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10733
Autosubmit: flokli <flokli@flokli.de>
Tested-by: BuildkiteCI
Reviewed-by: Connor Brewster <cbrewster@hey.com>
2024-02-06 18:28:00 +00:00


# BlobStore: Protocol / Composition
This document describes the protocol that BlobStore uses to substitute blobs
from other ("remote") BlobStores.
How to come up with the BLAKE3 digest of the blob to fetch is left to another
layer in the stack.
To put this into the context of Tvix as a Nix alternative, a blob represents an
individual file inside a StorePath.
In the Tvix Data Model, this is accomplished by having a `FileNode` (either the
`root_node` in a `PathInfo` message, or an individual file inside a `Directory`
message) encode a BLAKE3 digest.
However, the whole infrastructure can also be applied to other use cases that
require exchanging, storing, or accessing data whose BLAKE3 digest is known.
## Protocol and Interfaces
As an RPC protocol, BlobStore currently uses gRPC.
On the Rust side of things, every blob service implements the
[`BlobService`](../src/blobservice/mod.rs) async trait, which isn't
gRPC-specific.
This `BlobService` trait provides functionality to check for the existence of
blobs, read from blobs, and write new blobs.
It also provides a method to ask for more granular chunks if they are available.
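
To make the shape of that interface more concrete, here is a minimal sketch. It is not the actual trait: the real `BlobService` is async, and the names used here (`SimpleBlobService`, `ChunkMeta`, `MemoryBlobService`, `put`) are assumptions made up for illustration. To stay free of external crates, the caller passes the digest when writing, whereas a real store would compute the BLAKE3 digest itself.

```rust
use std::collections::HashMap;
use std::io::{self, Cursor, Read};

/// A 32-byte BLAKE3 digest identifying a blob.
pub type Digest = [u8; 32];

/// Metadata for one chunk of a blob: its digest and its size in bytes.
#[derive(Clone, Debug, PartialEq)]
pub struct ChunkMeta {
    pub digest: Digest,
    pub size: u64,
}

/// Hypothetical, synchronous stand-in for the `BlobService` trait.
pub trait SimpleBlobService {
    /// Check whether a blob with this digest exists.
    fn has(&self, digest: &Digest) -> io::Result<bool>;
    /// Open a blob for reading; `None` if it does not exist.
    fn open_read(&self, digest: &Digest) -> io::Result<Option<Box<dyn Read>>>;
    /// Store a blob. Assumption: the caller supplies the digest, so this
    /// sketch does not need a BLAKE3 implementation.
    fn put(&mut self, digest: Digest, data: Vec<u8>) -> io::Result<()>;
    /// Ask for more granular chunks. `None` if the blob is unknown, an empty
    /// list if this store has no finer chunking to offer.
    fn chunks(&self, digest: &Digest) -> io::Result<Option<Vec<ChunkMeta>>>;
}

/// In-memory implementation that never chunks.
#[derive(Default)]
pub struct MemoryBlobService {
    blobs: HashMap<Digest, Vec<u8>>,
}

impl SimpleBlobService for MemoryBlobService {
    fn has(&self, digest: &Digest) -> io::Result<bool> {
        Ok(self.blobs.contains_key(digest))
    }

    fn open_read(&self, digest: &Digest) -> io::Result<Option<Box<dyn Read>>> {
        Ok(self
            .blobs
            .get(digest)
            .map(|data| Box::new(Cursor::new(data.clone())) as Box<dyn Read>))
    }

    fn put(&mut self, digest: Digest, data: Vec<u8>) -> io::Result<()> {
        self.blobs.insert(digest, data);
        Ok(())
    }

    fn chunks(&self, digest: &Digest) -> io::Result<Option<Vec<ChunkMeta>>> {
        // Known blobs report an empty chunk list, unknown blobs report `None`.
        Ok(self.blobs.get(digest).map(|_| Vec::new()))
    }
}
```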
In addition to some in-memory, on-disk and (soon) object-storage-based
implementations, we also have a `BlobService` implementation that talks to a
gRPC server, as well as a gRPC server wrapper component, which provides a gRPC
service for anything implementing the `BlobService` trait.
This makes it very easy to talk to a remote `BlobService`, which does not even
need to be written in the same language, as long as it speaks the same gRPC
protocol.
It also places very few requirements on someone implementing a new
`BlobService`, or on what its internal storage or chunking algorithm looks like.
The gRPC protocol is documented in `../protos/rpc_blobstore.proto`.
Unlike the `BlobService` trait, it does not have any options for seeking or
range requests, as it's more desirable to provide this through chunking (see
also `./blobstore-chunking.md`).
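
To illustrate why a chunk list can stand in for seeking, the sketch below maps a byte range onto the chunks that cover it. It assumes the client already knows the chunk sizes of a blob, in order; the function name `chunks_for_range` and its return shape are hypothetical and not part of the protocol.

```rust
/// Map the byte range `[offset, offset + len)` of a blob onto its chunks.
///
/// `chunk_sizes` holds the size in bytes of each chunk, in blob order.
/// Returns one `(chunk index, start within chunk, bytes to take)` tuple for
/// every chunk that overlaps the requested range.
pub fn chunks_for_range(chunk_sizes: &[u64], offset: u64, len: u64) -> Vec<(usize, u64, u64)> {
    let end = offset.saturating_add(len);
    let mut out = Vec::new();
    let mut chunk_start = 0u64;

    for (idx, &size) in chunk_sizes.iter().enumerate() {
        let chunk_end = chunk_start + size;
        // Intersect the requested range with this chunk's range.
        let lo = offset.max(chunk_start);
        let hi = end.min(chunk_end);
        if lo < hi {
            out.push((idx, lo - chunk_start, hi - lo));
        }
        // Stop once the requested range is fully covered.
        if chunk_end >= end {
            break;
        }
        chunk_start = chunk_end;
    }
    out
}
```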
## Composition
Different `BlobStore`s are supposed to be "composed"/"layered" to express
caching and multiple local and remote sources.
The fronting interface can be the same, it'd just be multiple "tiers" that can
respond to requests, depending on where the data resides. [^1]
This makes it very simple for consumers, as they don't need to be aware of the
entire substituter config.
The flexibility of this doesn't need to be exposed to the user in the default
case; in most cases we should be fine with some form of on-disk storage and a
bunch of substituters with different priorities.
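
A rough sketch of such a layering follows, assuming a deliberately tiny read-only interface. `BlobSource`, `MemoryStore` and `Tiered` are hypothetical names, and the in-memory store stands in for both on-disk storage and remote substituters. The composed store implements the same trait as its parts, and populates the local tier on a remote hit.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

/// A 32-byte BLAKE3 digest identifying a blob.
pub type Digest = [u8; 32];

/// Hypothetical minimal read interface shared by every tier.
pub trait BlobSource {
    fn get(&self, digest: &Digest) -> Option<Vec<u8>>;
}

/// In-memory store, standing in for on-disk storage or a remote.
#[derive(Default)]
pub struct MemoryStore {
    blobs: RefCell<HashMap<Digest, Vec<u8>>>,
}

impl MemoryStore {
    pub fn insert(&self, digest: Digest, data: Vec<u8>) {
        self.blobs.borrow_mut().insert(digest, data);
    }
}

impl BlobSource for MemoryStore {
    fn get(&self, digest: &Digest) -> Option<Vec<u8>> {
        self.blobs.borrow().get(digest).cloned()
    }
}

/// A local store fronting a list of substituters, highest priority first.
pub struct Tiered {
    pub local: MemoryStore,
    pub substituters: Vec<Box<dyn BlobSource>>,
}

impl BlobSource for Tiered {
    fn get(&self, digest: &Digest) -> Option<Vec<u8>> {
        // Serve from the local tier if possible.
        if let Some(data) = self.local.get(digest) {
            return Some(data);
        }
        // Otherwise ask each substituter in priority order, and keep a
        // local copy of the first hit so later requests stay local.
        for remote in &self.substituters {
            if let Some(data) = remote.get(digest) {
                self.local.insert(*digest, data.clone());
                return Some(data);
            }
        }
        None
    }
}
```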
### gRPC Clients
Clients are encouraged to always read blobs in a chunked fashion (asking for a
list of chunks for a blob via `BlobService.Stat()`, then fetching chunks via
`BlobService.Read()` as needed), instead of directly reading the entire blob via
`BlobService.Read()`.
In a composition setting, this provides opportunity for caching, and avoids
downloading some chunks if they're already present locally (for example, because
they were already downloaded by reading from a similar blob earlier).
It also removes the need for seeking to be a part of the gRPC protocol
altogether, as chunks are supposed to be "reasonably small" [^2].
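
The client behaviour described above could look roughly like the following sketch. The `Remote` trait is a hypothetical, synchronous stand-in for the `Stat()` and `Read()` calls, and `FakeRemote` only exists to exercise it. Two further assumptions of this sketch: an empty chunk list means the blob has to be read as a whole, and chunk contents are trusted, whereas a real client would check each chunk against its BLAKE3 digest.

```rust
use std::collections::HashMap;

/// A 32-byte BLAKE3 digest identifying a blob or a chunk.
pub type Digest = [u8; 32];

/// Hypothetical stand-in for the remote side of the protocol.
pub trait Remote {
    /// Like `BlobService.Stat()`: the chunk digests of a blob, in order.
    /// `None` if the blob is unknown.
    fn stat(&self, blob: &Digest) -> Option<Vec<Digest>>;
    /// Like `BlobService.Read()`: the contents of a blob or chunk.
    fn read(&self, digest: &Digest) -> Option<Vec<u8>>;
}

/// Fetch a blob chunk by chunk, skipping chunks already in `cache`.
/// Returns the blob contents and the number of `read` calls made.
pub fn fetch_chunked(
    remote: &dyn Remote,
    cache: &mut HashMap<Digest, Vec<u8>>,
    blob: &Digest,
) -> Option<(Vec<u8>, usize)> {
    let chunks = remote.stat(blob)?;
    // Assumption: no chunk list means the blob must be read in one go.
    if chunks.is_empty() {
        return remote.read(blob).map(|data| (data, 1));
    }

    let mut out = Vec::new();
    let mut fetched = 0;
    for chunk in &chunks {
        if !cache.contains_key(chunk) {
            // A real client would verify the digest of `data` here.
            let data = remote.read(chunk)?;
            cache.insert(*chunk, data);
            fetched += 1;
        }
        out.extend_from_slice(&cache[chunk]);
    }
    Some((out, fetched))
}

/// In-memory remote used to try out `fetch_chunked`.
#[derive(Default)]
pub struct FakeRemote {
    /// Blob digest to ordered chunk digests.
    pub blobs: HashMap<Digest, Vec<Digest>>,
    /// Chunk (or whole blob) digest to contents.
    pub data: HashMap<Digest, Vec<u8>>,
}

impl Remote for FakeRemote {
    fn stat(&self, blob: &Digest) -> Option<Vec<Digest>> {
        self.blobs.get(blob).cloned()
    }

    fn read(&self, digest: &Digest) -> Option<Vec<u8>> {
        self.data.get(digest).cloned()
    }
}
```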
There's some further optimization potential: a `BlobService.Stat()` request
could tell the server it's happy with very small blobs just being inlined in
an additional field in the response, which would allow clients to
populate their local chunk store in a single roundtrip.
## Verified Streaming
As already described in `./blobstore-chunking.md`, the physical chunk
information sent in a `BlobService.Stat()` response is still sufficient to fetch
in an authenticated fashion.
The exact protocol and formats are still a bit in flux, but here are some notes:
- The `BlobService.Stat()` request gets a `send_bao` field (bool), signalling
  that a [BAO][bao-spec] should be sent. This could also be a `bao_shift`
  integer, signalling how detailed (down to the leaf chunks) it should go.
  The exact format (and request fields) still need to be defined; edef has some
  ideas around omitting some intermediate hash nodes over the wire and
  recomputing them, reducing size by another ~50% over [bao-tree].
- The `BlobService.Stat()` response gets some bao-related fields (a `bao_shift`
  field, signalling the actual format/shift level the server replies with, the
  actual bao, and maybe some format specifier).
  It would be nice to also be compatible with the baos used by [iroh], so we
  can provide an implementation using it too.
---
[^1]: We might want to have some backchannel, so it becomes possible to provide
feedback to the user that something is downloaded.
[^2]: Something between 512K and 4M, TBD.
[bao-spec]: https://github.com/oconnor663/bao/blob/master/docs/spec.md
[bao-tree]: https://github.com/n0-computer/bao-tree
[iroh]: https://github.com/n0-computer/iroh