tvl-depot/tvix/castore
Ryan Lahfa 1f1a42b4da feat(tvix/castore): ingestion does DFS and invert it
To make use of the filtering feature, we need to revert the internal walker to a real DFS.

We will therefore just invert the whole tree by storing all of its
contents in a level-keyed vector.

This is horribly expensive in memory. It is a trade-off between CPU
and memory, and here is the fundamental reason why:

When you encounter a directory, it's either a leaf or not, i.e. it
contains subdirectories or not.

To know this fact, you can:

- wait until you notice subdirectories under it, i.e. you need to store
  any intermediate nodes you see in the meantime -> memory penalty.
- call getdents or readdir on it to determine its subdirectories *NOW*
  -> CPU penalty and I/O penalty.

This is an implementation of the first option: we pay in memory.

In practice, we are paying O(number of nodes) in memory.
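
The level-keyed approach described above can be sketched roughly as
follows. This is only an illustration, not the code from this change:
`Node`, `collect_by_level`, `upload_order` and `example_tree` are
hypothetical names, and an in-memory tree stands in for the filesystem
walker.

```rust
/// Hypothetical in-memory stand-in for a filesystem tree.
struct Node {
    name: &'static str,
    children: Vec<Node>,
}

/// Walk the tree depth-first (pre-order) and bucket every node by its depth.
/// Every node ends up in `levels`, hence the O(number of nodes) memory cost.
fn collect_by_level(root: &Node) -> Vec<Vec<&'static str>> {
    let mut levels: Vec<Vec<&'static str>> = Vec::new();
    let mut stack = vec![(root, 0usize)];
    while let Some((node, depth)) = stack.pop() {
        if levels.len() <= depth {
            levels.resize_with(depth + 1, Vec::new);
        }
        levels[depth].push(node.name);
        // Push children in reverse so they are popped in their original order.
        for child in node.children.iter().rev() {
            stack.push((child, depth + 1));
        }
    }
    levels
}

/// Invert the tree: emit the deepest level first, so that every child is
/// seen before its parent.
fn upload_order(root: &Node) -> Vec<&'static str> {
    collect_by_level(root).into_iter().rev().flatten().collect()
}

/// A small tree: A has children B and C, B has D, C has E and F.
fn example_tree() -> Node {
    let leaf = |name| Node { name, children: vec![] };
    Node {
        name: "A",
        children: vec![
            Node { name: "B", children: vec![leaf("D")] },
            Node { name: "C", children: vec![leaf("E"), leaf("F")] },
        ],
    }
}
```

Calling `upload_order(&example_tree())` visits all leaves first and the
root last, which is the order a bottom-up ingestion needs.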

There's a smarter, albeit much more complicated, algorithm that stores
only O(\sum_i #siblings(p_i)) nodes, where (p_1, ..., p_n) is the path
to a leaf.

Which means that for:

             A
            / \
           B   C
          /   / \
         D   E   F

We would never store D, E, F all at once, but only E, F at a given time.
But we would still store B, C no matter what.
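
To make the bound concrete, here is a small sketch that evaluates the
sum for the tree above. It assumes that #siblings(p_i) counts p_i
itself and that the root counts as a group of one. `Node`,
`max_sibling_sum`, `total_nodes` and `example_tree` are hypothetical
names used only for this illustration.

```rust
/// Hypothetical in-memory stand-in for a filesystem tree.
struct Node {
    children: Vec<Node>,
}

/// Largest value of sum_i #siblings(p_i) over all root-to-leaf paths.
/// `acc` is the sum of sibling-group sizes down to and including `node`.
fn max_sibling_sum(node: &Node, acc: usize) -> usize {
    if node.children.is_empty() {
        return acc;
    }
    // Every child of `node` belongs to a sibling group of this size.
    let group = node.children.len();
    node.children
        .iter()
        .map(|child| max_sibling_sum(child, acc + group))
        .max()
        .unwrap_or(acc)
}

/// Total number of nodes, i.e. what the level-keyed vector stores.
fn total_nodes(node: &Node) -> usize {
    1 + node.children.iter().map(total_nodes).sum::<usize>()
}

/// The tree from the diagram: A -> (B -> D, C -> (E, F)).
fn example_tree() -> Node {
    let leaf = || Node { children: vec![] };
    Node {
        children: vec![
            Node { children: vec![leaf()] },
            Node { children: vec![leaf(), leaf()] },
        ],
    }
}
```

Under these assumptions, `max_sibling_sum(&example_tree(), 1)` is 5
(A, then B and C, then E and F), against `total_nodes` of 6 for the
approach implemented here.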

Change-Id: I456ed1c3f0db493e018ba1182665d84bebe29c11
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10567
Tested-by: BuildkiteCI
Autosubmit: raitobezarius <tvl@lahfa.xyz>
Reviewed-by: flokli <flokli@flokli.de>
2024-01-20 17:16:01 +00:00
..
docs docs(tvix/castore): add notes on verified streaming 2023-11-02 09:08:20 +00:00
protos docs(tvix/castore/protos): remove reference 2023-12-21 16:44:48 +00:00
src feat(tvix/castore): ingestion does DFS and invert it 2024-01-20 17:16:01 +00:00
build.rs chore(tvix/[ca]store): allow building without tonic-reflection 2023-09-26 10:07:40 +00:00
Cargo.toml chore(tvix): bump test-case dep to 3.3.1 2024-01-05 16:43:34 +00:00
default.nix refactor(tvix): move castore into tvix-castore crate 2023-09-22 12:51:21 +00:00