tvl-depot/tvix/castore/src/directoryservice/utils.rs
Florian Klink 49b173786c refactor(tvix/castore): remove name from Nodes
Nodes only have names if they're contained inside a Directory, or if
they're a root node and have something else possibly giving them a name
externally.

This removes all `name` fields in the three different Nodes, and instead
maintains it inside a BTreeMap inside the Directory.

It also removes the NamedNode trait (they don't have a get_name()), as
well as Node::rename(self, name), and all [Partial]Ord implementations
for Node (as they don't have names to use for sorting).

The `nodes()`, `directories()`, `files()` iterators inside a `Directory`
now return a tuple of Name and Node, as does the RootNodesProvider.

The different {Directory,File,Symlink}Node struct constructors got
simpler, and the {Directory,File}Node ones became infallible - as
there's no more possibility to represent invalid state.

The proto structs stayed the same - there's now from_name_and_node and
into_name_and_node to convert back and forth between the two `Node`
structs.

Some further cleanups:

The error types for Node validation were renamed. Everything related to
names is now in the DirectoryError (not yet happy about the naming)

There's some leftover cleanups to do:
 - There should be a from_(sorted_)iter and into_iter in Directory, so
   we can construct and deconstruct in one go.
   That should also enable us to implement conversions from and to the
   proto representation that moves, rather than clones.

 - The BuildRequest and PathInfo structs are still proto-based, so we
   still do a bunch of conversions back and forth there (and have some
   ugly expect there). There's not much point for error handling here,
   this will be moved to stricter types in a followup CL.

Change-Id: I7369a8e3a426f44419c349077cb4fcab2044ebb6
Reviewed-on: https://cl.tvl.fyi/c/depot/+/12205
Tested-by: BuildkiteCI
Reviewed-by: yuka <yuka@yuka.dev>
Autosubmit: flokli <flokli@flokli.de>
Reviewed-by: benjaminedwardwebb <benjaminedwardwebb@gmail.com>
Reviewed-by: Connor Brewster <cbrewster@hey.com>
2024-08-17 09:45:58 +00:00

74 lines
3.1 KiB
Rust

use super::Directory;
use super::DirectoryService;
use crate::B3Digest;
use crate::Error;
use async_stream::try_stream;
use futures::stream::BoxStream;
use std::collections::{HashSet, VecDeque};
use tracing::instrument;
use tracing::warn;
/// Traverses a [Directory] from the root to the children.
///
/// This is mostly BFS, but directories are only returned once.
#[instrument(skip(directory_service))]
pub fn traverse_directory<'a, DS: DirectoryService + 'static>(
directory_service: DS,
root_directory_digest: &B3Digest,
) -> BoxStream<'a, Result<Directory, Error>> {
// The list of all directories that still need to be traversed. The next
// element is picked from the front, new elements are enqueued at the
// back.
let mut worklist_directory_digests: VecDeque<B3Digest> =
VecDeque::from([root_directory_digest.clone()]);
// The list of directory digests already sent to the consumer.
// We omit sending the same directories multiple times.
let mut sent_directory_digests: HashSet<B3Digest> = HashSet::new();
let root_directory_digest = root_directory_digest.clone();
Box::pin(try_stream! {
while let Some(current_directory_digest) = worklist_directory_digests.pop_front() {
let current_directory = match directory_service.get(&current_directory_digest).await.map_err(|e| {
warn!("failed to look up directory");
Error::StorageError(format!(
"unable to look up directory {}: {}",
current_directory_digest, e
))
})? {
// the root node of the requested closure was not found, return an empty list
None if current_directory_digest == root_directory_digest => break,
// if a child directory of the closure is not there, we have an inconsistent store!
None => {
warn!("directory {} does not exist", current_directory_digest);
Err(Error::StorageError(format!(
"directory {} does not exist",
current_directory_digest
)))?;
break;
}
Some(dir) => dir,
};
// We're about to send this directory, so let's avoid sending it again if a
// descendant has it.
sent_directory_digests.insert(current_directory_digest);
// enqueue all child directory digests to the work queue, as
// long as they're not part of the worklist or already sent.
// This panics if the digest looks invalid, it's supposed to be checked first.
for (_, child_directory_node) in current_directory.directories() {
let child_digest = child_directory_node.digest();
if worklist_directory_digests.contains(child_digest)
|| sent_directory_digests.contains(child_digest)
{
continue;
}
worklist_directory_digests.push_back(child_digest.clone());
}
yield current_directory;
}
})
}