docs(tvix/docs/TODO): document ChunkService split idea

Change-Id: Ie9c88b0d14902c642e2d3d6603265688eef0e10d
Reviewed-on: https://cl.tvl.fyi/c/depot/+/11755
Reviewed-by: yuka <yuka@yuka.dev>
Tested-by: BuildkiteCI
Autosubmit: flokli <flokli@flokli.de>
Florian Klink 2024-06-05 23:38:16 +03:00 committed by clbot
parent c4d4cce657
commit 154e0d71e0


@ -178,7 +178,22 @@ logs etc, but this is something requiring a lot of designing.
### BlobService
- On the trait side, there's currently no way to distinguish reading a known
  chunk from reading a blob, so we might be calling `.chunks()` more often
  than necessary.
  At least for the `object_store` backend, this might be a problem, causing a
  lot of round-trips. It also doesn't compose well: every implementation of
  `BlobService` needs to solve both the "holding metadata about chunking info"
  and the "storing chunks" questions.
  Design idea (@flokli): split these two concerns into two separate traits:
  - a `ChunkService` dealing with retrieving individual chunks by their
    content digests. Chunks are small enough to keep around in contiguous
    memory.
  - a `BlobService` storing metadata about blobs, i.e. their chunking info.
  Individual stores would no longer need to implement `BlobReader`; instead,
  it could be a single global implementation with access to the whole store
  composition layer, which should make it easier to reuse chunks from other
  backends. It's unclear whether the write path should be structured the same
  way. At least for some backends, we want the remote end to be able to decide
  on chunking.
- While `object_store` recently gained support for `Content-Type`
  (https://github.com/apache/arrow-rs/pull/5650), its local filesystem backend
  doesn't support it yet. We'd need to add support for this (through xattrs).
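The read-path split from the design idea above might look roughly like the following minimal sketch. Everything in it is an assumption for illustration, not tvix's actual API: only the trait names `ChunkService`, `BlobService` and `BlobReader` come from the note, while the method names and signatures, the in-memory `Memory*` stores, the `Fallback` combinator and the `digest` helper are hypothetical. `digest` uses std's `DefaultHasher` purely as a stand-in and is not a collision-resistant content digest. The write path is left out, since the note leaves its structure open.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::io::{self, Read};

/// Hypothetical stand-in for a content digest (NOT cryptographic).
pub type Digest = u64;

pub fn digest(data: &[u8]) -> Digest {
    let mut h = DefaultHasher::new();
    data.hash(&mut h);
    h.finish()
}

/// Hypothetical: retrieves individual chunks by content digest. Chunks are
/// small, so they are returned as contiguous in-memory buffers.
pub trait ChunkService {
    fn get_chunk(&self, digest: &Digest) -> Option<Vec<u8>>;
}

/// Hypothetical: only stores blob metadata, i.e. the ordered chunk list.
pub trait BlobService {
    fn chunks(&self, blob_digest: &Digest) -> Option<Vec<Digest>>;
}

#[derive(Default)]
pub struct MemoryChunkService {
    chunks: HashMap<Digest, Vec<u8>>,
}

impl MemoryChunkService {
    pub fn put(&mut self, data: &[u8]) -> Digest {
        let d = digest(data);
        self.chunks.insert(d, data.to_vec());
        d
    }
}

impl ChunkService for MemoryChunkService {
    fn get_chunk(&self, d: &Digest) -> Option<Vec<u8>> {
        self.chunks.get(d).cloned()
    }
}

#[derive(Default)]
pub struct MemoryBlobService {
    blobs: HashMap<Digest, Vec<Digest>>,
}

impl MemoryBlobService {
    pub fn put(&mut self, blob_digest: Digest, chunks: Vec<Digest>) {
        self.blobs.insert(blob_digest, chunks);
    }
}

impl BlobService for MemoryBlobService {
    fn chunks(&self, b: &Digest) -> Option<Vec<Digest>> {
        self.blobs.get(b).cloned()
    }
}

/// Hypothetical composition: try the first chunk store, then the second,
/// so chunks can be reused across backends.
pub struct Fallback<A, B>(pub A, pub B);

impl<A: ChunkService, B: ChunkService> ChunkService for Fallback<A, B> {
    fn get_chunk(&self, d: &Digest) -> Option<Vec<u8>> {
        self.0.get_chunk(d).or_else(|| self.1.get_chunk(d))
    }
}

/// A single, backend-agnostic reader: resolves the chunk list once via the
/// BlobService, then fetches chunks lazily from any ChunkService.
pub struct BlobReader<'a, C: ChunkService> {
    chunk_service: &'a C,
    remaining: std::vec::IntoIter<Digest>,
    current: io::Cursor<Vec<u8>>,
}

impl<'a, C: ChunkService> BlobReader<'a, C> {
    pub fn open<B: BlobService>(blobs: &B, chunk_service: &'a C, blob: &Digest) -> Option<Self> {
        let list = blobs.chunks(blob)?;
        Some(Self { chunk_service, remaining: list.into_iter(), current: io::Cursor::new(Vec::new()) })
    }
}

impl<C: ChunkService> Read for BlobReader<'_, C> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        loop {
            let n = self.current.read(buf)?;
            if n > 0 || buf.is_empty() {
                return Ok(n);
            }
            // Current chunk exhausted: fetch the next one, or signal EOF.
            match self.remaining.next() {
                None => return Ok(0),
                Some(d) => {
                    let data = self.chunk_service.get_chunk(&d).ok_or_else(|| {
                        io::Error::new(io::ErrorKind::NotFound, "chunk not found")
                    })?;
                    self.current = io::Cursor::new(data);
                }
            }
        }
    }
}
```

In this sketch the chunk list is looked up once per blob, and a chunk is only requested from the (possibly remote) `ChunkService` when the reader actually reaches it. Individual backends implement only the small `get_chunk`/`chunks` lookups, not a reader of their own.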