tvl-depot/tvix
Vincent Ambo 9d6f29a72b refactor(tvix/cli): use Wu-Manber string scanning for drv references
Switch out the string-scanning algorithm used in the reference scanner.

The construction of aho-corasick automata made up the vast majority of
runtime when evaluating nixpkgs previously. While the actual scanning
with a constructed automaton is relatively fast, we almost never scan
for the same set of strings twice and the cost is not worth it.

An algorithm that better matches our needs is the Wu-Manber multiple
string match algorithm, which works efficiently on *long* and *random*
strings of the *same length*, which describes store paths (up to their
hash component).

This switches the refscanner crate to a Rust implementation[0][1] of
this algorithm.

This has several implications:

1. This crate does not provide a way to scan streams. I'm not sure if
   this is an inherent problem with the algorithm (probably not, but
   it would need buffering). Either way, related functions and
   tests (which were actually unused) have been removed.

2. All strings need to be of the same length. For this reason, we
   truncate the known paths after their hash part (they are still
   unique, of course).

3. Passing an empty set of matches, or a match that is shorter than
   the length of a store path, causes the crate to panic. We safeguard
   against this by completely skipping the refscanning if there are no
   known paths (i.e. when evaluating the first derivation of an eval),
   and by bailing out of scanning a string that is shorter than a
   store path.
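
The two safeguards in point 3, together with the truncation in point 2,
can be sketched roughly as follows. This is an illustration only, not the
refscanner code: `KnownPaths`, `insert` and `scan` are hypothetical names,
and a naive sliding-window lookup in a `HashSet` stands in for the
Wu-Manber search so that the sketch needs nothing beyond the standard
library. It assumes store paths of the form `/nix/store/<32-character
hash>-<name>`.

```rust
use std::collections::HashSet;

/// Length of a store path up to and including its hash part:
/// the `/nix/store/` prefix plus a 32-character hash (assumed layout).
const STORE_PATH_LEN: usize = "/nix/store/".len() + 32;

/// Hypothetical set of known paths (not the actual refscanner type).
struct KnownPaths {
    /// Known store paths, truncated after their hash part so that all
    /// candidates have the same length.
    truncated: HashSet<Vec<u8>>,
}

impl KnownPaths {
    fn new() -> Self {
        KnownPaths { truncated: HashSet::new() }
    }

    /// Registers a store path. Returns false (and stores nothing) if the
    /// input is too short to contain a full hash part.
    fn insert(&mut self, path: &str) -> bool {
        let bytes = path.as_bytes();
        if bytes.len() < STORE_PATH_LEN {
            return false;
        }
        self.truncated.insert(bytes[..STORE_PATH_LEN].to_vec());
        true
    }

    /// Returns the truncated known paths that occur in `haystack`,
    /// sorted and deduplicated.
    fn scan(&self, haystack: &str) -> Vec<String> {
        let bytes = haystack.as_bytes();

        // Safeguards: skip scanning entirely if nothing is known yet, and
        // bail out if the input is shorter than a truncated store path.
        if self.truncated.is_empty() || bytes.len() < STORE_PATH_LEN {
            return Vec::new();
        }

        // Naive stand-in for the multi-pattern search: check every window
        // of the fixed candidate length against the known set.
        let mut found: Vec<String> = bytes
            .windows(STORE_PATH_LEN)
            .filter(|w| self.truncated.contains(*w))
            .map(|w| String::from_utf8_lossy(w).into_owned())
            .collect();
        found.sort();
        found.dedup();
        found
    }
}
```

Because every candidate is cut to the same length, a match identifies a
path by its hash part alone; the name after the hash is never inspected.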

On the upside, this reduces overall runtime to less than 1/5 of what it
was before when evaluating `pkgs.stdenv.drvPath`.

[0]: Frankly, it's a random, research-grade MIT-licensed
     crate that I found on GitHub:

     https://github.com/jneem/wu-manber

[1]: We probably want to rewrite or at least fork the above crate, and
     add things like a three-byte wide scanner. Evaluating large
     portions of nixpkgs can easily lead to more than 65k derivations
     being scanned for.

Change-Id: I08926778e1e5d5a87fc9ac26e0437aed8bbd9eb0
Reviewed-on: https://cl.tvl.fyi/c/depot/+/8017
Tested-by: BuildkiteCI
Reviewed-by: flokli <flokli@flokli.de>
2023-02-02 17:50:44 +00:00
.vscode chore(tvix): fix vscode rust-analyzer recommendation 2022-10-15 16:54:28 +00:00
cli refactor(tvix/cli): use Wu-Manber string scanning for drv references 2023-02-02 17:50:44 +00:00
docs docs(tvix): fix minor spelling problems in pointer equality document 2023-01-25 14:30:50 +00:00
eval fix(tvix/eval): unsafeDiscardStringContext is a no-op 2023-02-02 14:30:32 +00:00
nix-compat feat(tvix/nix-compat/derivation): Display -> to_aterm_string() 2023-02-01 16:31:56 +00:00
nix_cli chore(tvix): upgrade to clap 4.0 2022-12-21 13:23:38 +00:00
proto chore(tvix/store): move castore.proto 2022-12-04 10:41:39 +00:00
serde refactor(tvix/serde): allow dead_code in struct 2023-01-31 15:35:46 +00:00
store feat(tvix/store): add write_nar function 2023-01-31 15:28:22 +00:00
verify-lang-tests test(tvix/eval): add test for builtins parity 2023-01-06 12:00:38 +00:00
.gitignore feat(tvix/): .gitignore target folders 2022-11-11 19:55:12 +00:00
Cargo.lock refactor(tvix/cli): use Wu-Manber string scanning for drv references 2023-02-02 17:50:44 +00:00
Cargo.nix refactor(tvix/cli): use Wu-Manber string scanning for drv references 2023-02-02 17:50:44 +00:00
Cargo.toml refactor(tvix/nix-compat): absorb nar writer 2023-01-31 15:18:39 +00:00
crate-hashes.json refactor(tvix/cli): use Wu-Manber string scanning for drv references 2023-02-02 17:50:44 +00:00
default.nix fix(tvix): add dummy target to attach extra-step to 2023-02-01 17:34:08 +00:00
LICENSE chore(tvix): Bootstrap Tvix folder 2021-03-27 00:09:49 +00:00
OWNERS chore(gerrit): migrate OWNERS files to code-owners style 2022-09-19 11:13:28 +00:00
README.md docs(tvix): add more information to README 2023-02-02 16:25:52 +00:00

Tvix

Tvix is a new implementation of the Nix language and package manager. See the announcement post for information about the background of this project.

Tvix is developed by TVL in our monorepo, the depot, at //tvix. Code reviews take place on Gerrit, bugs are filed in our issue tracker.

For more information about Tvix, feel free to reach out. We are interested in people who would like to help us review designs, brainstorm and describe requirements that we may not yet have considered.

Most of the discussion around development happens on our IRC channel, which you can join in several ways documented on tvl.fyi, or on our mailing list.

Contributions to Tvix follow the TVL review flow and contribution guidelines.

WARNING: Tvix is not ready for use in production. None of our current APIs should be considered stable in any way.

WARNING: Any other instances of this project or repository are josh-mirrors. We do not accept code contributions or issues outside of the tooling and communication methods outlined above.

Components

This folder contains the following components:

  • //tvix/eval - an implementation of the Nix programming language
  • //tvix/nix-compat - library functions for compatibility with C++ Nix
  • //tvix/cli - preliminary REPL & CLI implementation for Tvix
  • //tvix/serde - Rust library for using the Nix language for app configuration
  • //tvix/store - implementation of a file store for Tvix

Some additional folders with auxiliary things exist and can be explored at your leisure.

Building the CLI

The CLI can also be built with standard Rust tooling (i.e. cargo build), as long as you are in a shell with the right dependencies.

  • If you cloned the full monorepo, it can be provided by mg shell //tvix:shell.
  • If you cloned the tvix workspace only (git clone https://code.tvl.fyi/depot.git:workspace=views/tvix.git), nix-shell provides it.

If you're in the TVL monorepo, you can also run mg build //tvix/cli (or mg build from inside that folder) for a more incremental build.

Please follow the depot-wide instructions on how to get mg and use the depot tooling.

Compatibility

Important note: We only use and test Nix builds of our software against Nix 2.3. There are a variety of bugs and subtle problems in newer Nix versions which we do not have the bandwidth to address, so builds in newer Nix versions may or may not work.

Rust projects, crate2nix

Some parts of Tvix are written in Rust. To simplify the dependency management on the Nix side of these builds, we use crate2nix in a single Rust workspace in //tvix to maintain the Nix build configuration.

When making changes to Cargo dependency configuration in any of the Rust projects under //tvix, be sure to run mg run //tvix:crate2nixGenerate -- in //tvix itself and commit the changes to the generated Cargo.nix file. This only applies to the full TVL checkout.

License structure

All code implemented for Tvix is licensed under the GPL-3.0, with the exception of the protocol buffer definitions used for communication between services which are available under a more permissive license (MIT).

The idea behind this structure is that any direct usage of our code (e.g. linking to it, embedding the evaluator, etc.) will fall under the terms of the GPL3, but users are free to implement their own components speaking these protocols under the terms of the MIT license.