feat(tvix/cli): implement initial refscan module

This module implements a ReferenceScanner struct which uses the
aho_corasick crate to scan string inputs for known, non-overlapping
candidates (store paths, in our case).

I experimented with several different APIs, and landed on this version
with an initial accumulator in the scanner. The scanner is
instantiated from the candidates and "fed" all the strings, then
consumed by the caller to retrieve the result.

Right now only things that look vaguely like bytestrings can be fed to
the scanner; there is no streaming support in the API yet.
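
The instantiate / feed / consume flow described above can be sketched
roughly as follows. This is a minimal illustration, not the code added
in this change: the names `Scanner`, `scan` and `finalise` are
hypothetical, and a naive substring search stands in for the
aho_corasick automaton so that the sketch has no dependencies.

```rust
use std::collections::BTreeSet;

/// Hypothetical scanner: built from candidates, fed inputs, then consumed.
pub struct Scanner {
    candidates: Vec<String>,
    // Accumulator: matches[i] is set once candidates[i] was seen in any input.
    matches: Vec<bool>,
}

impl Scanner {
    /// Instantiate the scanner from the known candidates.
    pub fn new(candidates: Vec<String>) -> Self {
        let matches = vec![false; candidates.len()];
        Scanner { candidates, matches }
    }

    /// Feed one bytestring-like input to the scanner.
    /// Assumption: naive substring search replaces Aho-Corasick here.
    pub fn scan<S: AsRef<[u8]>>(&mut self, haystack: S) {
        let hay = haystack.as_ref();
        for (i, cand) in self.candidates.iter().enumerate() {
            let needle = cand.as_bytes();
            if !self.matches[i]
                && !needle.is_empty()
                && hay.windows(needle.len()).any(|w| w == needle)
            {
                self.matches[i] = true;
            }
        }
    }

    /// Consume the scanner and return every candidate that was seen.
    pub fn finalise(self) -> BTreeSet<String> {
        self.candidates
            .into_iter()
            .zip(self.matches)
            .filter_map(|(cand, seen)| seen.then_some(cand))
            .collect()
    }
}
```

Taking `self` by value in `finalise` means the scanner cannot be fed
again once the result has been retrieved, which mirrors the "consumed
by the caller" wording above.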

Change-Id: I7782f0f0df5fc64bccd813aa14712f5525b0168c
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7808
Autosubmit: tazjin <tazjin@tvl.su>
Reviewed-by: flokli <flokli@flokli.de>
Tested-by: BuildkiteCI
Vincent Ambo 2023-01-11 18:14:08 +03:00 committed by clbot
parent 9382afdb0d
commit 3045645df0
5 changed files with 104 additions and 0 deletions


@@ -6477,6 +6477,10 @@ rec {
then lib.cleanSourceWith { filter = sourceFilter; src = ./cli; }
else ./cli;
dependencies = [
{
name = "aho-corasick";
packageId = "aho-corasick";
}
{
name = "clap";
packageId = "clap 4.0.32";