Our fork fixes a small bug (https://github.com/jneem/wu-manber/pull/1)
but it's not clear whether upstream will accept patches, so for now
lets point this directly at our fork.
Change-Id: Iccdcedae3e9a8b783241431787c952561d032694
Reviewed-on: https://cl.tvl.fyi/c/depot/+/8031
Reviewed-by: tazjin <tazjin@tvl.su>
Autosubmit: tazjin <tazjin@tvl.su>
Tested-by: BuildkiteCI
Switch out the string-scanning algorithm used in the reference scanner.
The construction of aho-corasick automata made up the vast majority of
runtime when evaluating nixpkgs previously. While the actual scanning
with a constructed automaton is relatively fast, we almost never scan
for the same set of strings twice and the cost is not worth it.
An algorithm that better matches our needs is the Wu-Manber multiple
string match algorithm, which works efficiently on *long* and *random*
strings of the *same length*, which describes store paths (up to their
hash component).
This switches the refscanner crate to a Rust implementation[0][1] of
this algorithm.
This has several implications:
1. This crate does not provide a way to scan streams. I'm not sure if
this is an inherent problem with the algorithm (probably not, but
it would need buffering). Either way, related functions and
tests (which were actually unused) have been removed.
2. All strings need to be of the same length. For this reason, we
truncate the known paths after their hash part (they are still
unique, of course).
3. Passing an empty set of matches, or a match that is shorter than
the length of a store path, causes the crate to panic. We safeguard
against this by completely skipping the refscanning if there are no
known paths (i.e. when evaluating the first derivation of an eval),
and by bailing out of scanning a string that is shorter than a
store path.
On the upside, this reduces overall runtime to less 1/5 of what it was
before when evaluating `pkgs.stdenv.drvPath`.
[0]: Frankly, it's a random, research-grade MIT-licensed
crate that I found on Github:
https://github.com/jneem/wu-manber
[1]: We probably want to rewrite or at least fork the above crate, and
add things like a three-byte wide scanner. Evaluating large
portions of nixpkgs can easily lead to more than 65k derivations
being scanned for.
Change-Id: I08926778e1e5d5a87fc9ac26e0437aed8bbd9eb0
Reviewed-on: https://cl.tvl.fyi/c/depot/+/8017
Tested-by: BuildkiteCI
Reviewed-by: flokli <flokli@flokli.de>
This adds a function that consumes a [proto::node::Node] pointing to the
root of a (store) path, and writes the contents in NAR serialization to
the passed [std::io::Write].
We need this in various places:
- tvix-store's calculate_nar() RPC method needs to render a NAR stream
to get the nar hash, which is necessary to give things imported in
the store a "NAR-based" store path.
- communication with (remote) Nix (via daemon protocol) needs a NAR
representation.
- Things like nar-bridge, exposing a NAR/NARInfo HTTP interface need a
NAR representation.
Change-Id: I7fb2e0bf01814a1c09094c0e35394d9d6b3e43b6
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7956
Reviewed-by: tazjin <tazjin@tvl.su>
Tested-by: BuildkiteCI
Put this in its src/derivation.
Change-Id: Ic047ab1c2da555a833ee454e10ef60c77537b617
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7967
Reviewed-by: tazjin <tazjin@tvl.su>
Tested-by: BuildkiteCI
Autosubmit: flokli <flokli@flokli.de>
Move nixbase32 and store_path into this.
This allows //tvix/cli to not pull in //tvix/store for now.
Change-Id: Id3a32867205d95794bc0d33b21d4cb3d9bafd02a
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7964
Tested-by: BuildkiteCI
Reviewed-by: tazjin <tazjin@tvl.su>
This helper function only was created because
populate_output_configuration was hard to test before cl/7939.
With that out of the way, we can pull it in.
Change-Id: I64b36c0eb34343290a8d84a03b0d29392a821fc7
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7961
Autosubmit: flokli <flokli@flokli.de>
Tested-by: BuildkiteCI
Reviewed-by: tazjin <tazjin@tvl.su>
This is much less code, and makes it much easier to read.
Change-Id: I9028f226105f905c2cc2cabd33907ff493e26225
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7938
Reviewed-by: tazjin <tazjin@tvl.su>
Tested-by: BuildkiteCI
Autosubmit: flokli <flokli@flokli.de>
Instead of being called with `md5`, `sha1`, `sha256` or `sha512`,
`fetchurl.nix` (from corepkgs / `<nix`) can also be called with a `hash`
attribute, being an SRI hash.
In that case, `builtin.derivation` is called with `outputHashAlgo` being
an empty string, and `outputHash` being an SRI hash string.
In other cases, an SRI hash is passed as outputHash, but outputHashAlgo
is set too.
Nix does modify these values in (single, fixed) output specification it
serializes to ATerm, but keeps it unharmed in `env`.
Move this into a construct_output_hash helper function, that can be
tested better in isolation.
Change-Id: Id9d716a119664c44ea7747540399966752e20187
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7933
Reviewed-by: tazjin <tazjin@tvl.su>
Autosubmit: flokli <flokli@flokli.de>
Tested-by: BuildkiteCI
This allows parsing TOML from Tvix. We can enable the eval-okay-fromTOML
testcase from nix_tests. It uses the `toml` crate, and the serde
integration it brings with it.
Change-Id: Ic6f95aacf2aeb890116629b409752deac49dd655
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7920
Autosubmit: flokli <flokli@flokli.de>
Reviewed-by: tazjin <tazjin@tvl.su>
Tested-by: BuildkiteCI
Apparently our naive implementation of float formatting, which simply
used {:.5}, and trimmed trailing "0" strings not sufficient.
It wrongly trimmed numbers with zeroes but no decimal point, like
`10000` got trimmed to `1`.
Nix uses `std::to_string` on the double, which according to
https://en.cppreference.com/w/cpp/string/basic_string/to_string
is equivalent to `std::sprintf(buf, "%f", value)`.
https://en.cppreference.com/w/cpp/io/c/fprintf mentions this is treated
like this:
> Precision specifies the exact number of digits to appear after
> the decimal point character. The default precision is 6. In the
> alternative implementation decimal point character is written even if
> no digits follow it. For infinity and not-a-number conversion style
> see notes.
This doesn't seem to be the case though, and Nix uses scientific
notation in some cases.
There's a whole bunch of strategies to determine which is a more compact
notation, and which notation should be used for a given number.
https://github.com/rust-lang/rust/issues/24556 provides some pointers
into various rabbit holes for those interested.
This gist seems to be that currently a different formatting is not
exposed in rust directly, at least not for public consumption.
There is the
[lexical-core](https://github.com/Alexhuszagh/rust-lexical) crate
though, which provides a way to format floats with various strategies
and formats.
Change our implementation of `TotalDisplay` for the `Value::Float` case
to use that. We still need to do some post-processing, because Nix
always adds the sign in scientific notation (and there's no way to
configure lexical-core to do that), and lexical-core in some cases keeps
the trailing zeros.
Even with all that in place, there as a difference in `eval-okay-
fromjson.nix` (from tvix-tests), which I couldn't get to work. I updated
the fixture to a less problematic number.
With this, the testsuite passes again, and does for the upcoming CL
introducing builtins.fromTOML, and enabling the nix testsuite bits for
it, too.
Change-Id: Ie6fba5619e1d9fd7ce669a51594658b029057acc
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7922
Tested-by: BuildkiteCI
Autosubmit: flokli <flokli@flokli.de>
Reviewed-by: tazjin <tazjin@tvl.su>
This is used for content-defined chunking.
Change-Id: I10345372cecb9a643cc51ca45aa5b77d2a05198a
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7889
Tested-by: BuildkiteCI
Reviewed-by: tazjin <tazjin@tvl.su>
These will be threaded through to eval through the new `TvixError`
variant.
Change-Id: Ia0d3f8710dcf26bb95015cd2a6a2b2911f06343f
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7842
Reviewed-by: flokli <flokli@flokli.de>
Tested-by: BuildkiteCI
Instead of going through Vec/BTreeMap for generating our internal
types, use the proptest strategies from imbl.
The one thing I couldn't figure out in the previous implementation is
where the ranges/sizes of generated collections came from. The
strategies in proptest use different types (Range, with an unknown
default value, and SizeRange with 0..100). I've opted to specify
0..100 directly, but we can probably make it configurable.
Change-Id: I749bc4c703fe424099240cab822b1642e5216361
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7791
Autosubmit: tazjin <tazjin@tvl.su>
Tested-by: BuildkiteCI
Reviewed-by: flokli <flokli@flokli.de>
This module implements a ReferenceScanner struct which uses the
aho_corasick crate to scan string inputs for known, non-overlapping
candidates (store paths, in our case).
I experimented with several different APIs, and landed on this version
with an initial accumulator in the scanner. The scanner is
instantiated from the candidates and "fed" all the strings, then
consumed by the caller to retrieve the result.
Right now only things that look vaguely like bytestrings can be fed to
the scanner, there is no streaming support in the API yet.
Change-Id: I7782f0f0df5fc64bccd813aa14712f5525b0168c
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7808
Autosubmit: tazjin <tazjin@tvl.su>
Reviewed-by: flokli <flokli@flokli.de>
Tested-by: BuildkiteCI
This project was not previously covered by CI (fixed in this commit),
so we didn't catch breakage due to a renamed module.
This was noticed while rebasing a CL that has a dependency on this
crate in its Nix build.
Change-Id: Ic48570b9313e5f73e14daab50cf7ea70918c94d1
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7778
Reviewed-by: flokli <flokli@flokli.de>
Autosubmit: tazjin <tazjin@tvl.su>
Tested-by: BuildkiteCI
This allows other crates to import tvix_store.
Rename the bin crate to tvix-store-bin, to avoid having multiple crates
with the same name (https://github.com/rust-lang/cargo/issues/6313)
Change-Id: I857768d6115640dbf102e79ed03e8474090df2fe
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7728
Tested-by: BuildkiteCI
Reviewed-by: tazjin <tazjin@tvl.su>
This will make it possible fairly easily use Nix to represent
arbitrary data structures, e.g. for using Nix as a config language.
Only pure Nix (i.e. no `import` etc.) is supported for now.
Not all types, specifically no struct traversal, are implemented in
this commit.
Change-Id: I9ac91a229a0d12bf818e6e3249f3e5a691599a2c
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7712
Tested-by: BuildkiteCI
Reviewed-by: flokli <flokli@flokli.de>
This uses [tracing](https://github.com/tokio-rs/tracing) for logs/
tracing.
Annotate all method handlers with an instrument macro, and warn! a
message for them being unimplemented.
Co-Authored-By: Márton Boros <martonboros@gmail.com>
Change-Id: Id42a41db33782d82abfb8dc0e49a8915000e5d89
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7665
Reviewed-by: tazjin <tazjin@tvl.su>
Tested-by: BuildkiteCI
This implements grpc.reflection.v1alpha.ServerReflection, and will make tools
like evans automatically discover available services, without having to
specify the path to the .proto files client-side.
It's behind a reflection feature flag, which is enabled by default.
Change-Id: Icbcb5eb05ceede5b9952e38a2ba72eaa6fa8a437
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7435
Reviewed-by: tazjin <tazjin@tvl.su>
Tested-by: BuildkiteCI
This replaces the hello world example from tvix-store with an actual
gRPC endpoint, implementing all of BlobService, DirectoryService and
PathInfoService.
All RPC methods currently respond with the unimplemented gRPC status.
Co-Authored-By: Márton Boros <martonboros@gmail.com>
Change-Id: Ieba333cca44dc1e3f2ffbe676ba7a99e672b9bfb
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7664
Reviewed-by: tazjin <tazjin@tvl.su>
Tested-by: BuildkiteCI
This implements the nix-specific base32 encoding and decoding, exposing
a subset of the API that the data-encoding crate provides.
Nix uses a custom alphabet, no padding, and encodes bytes in reverse
order. The latter one is the reason we can't just use the data-encoding
crate directly.
Three odd corner case tests ported over from go-nix failed. We opened
b/235 to further investigate.
Change-Id: I73fab6ddd67177d882e4c3f2b48761c95853d558
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7683
Reviewed-by: tazjin <tazjin@tvl.su>
Autosubmit: flokli <flokli@flokli.de>
Tested-by: BuildkiteCI
This uses the `im::OrdMap` for `NixAttrs` to enable sharing of memory
between different iterations of a map.
This slightly speeds up eval, but not significantly. Future work might
include benchmarking whether using a `HashMap` and only ordering in
cases where order is actually required would help.
This switches to a fork of `im` that fixes some bugs with its OrdMap
implementation.
Change-Id: I2f6a5ff471b6d508c1e8a98b13f889f49c0d9537
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7676
Reviewed-by: sterni <sternenseemann@systemli.org>
Tested-by: BuildkiteCI
This adds a Derivation structure and allows to write it to a structure that implements std::fmt:Write.
The implementation is based on the go-nix version.
Change-Id: Ib54e1202b5c67f5d206b21bc109a751e971064cf
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7659
Reviewed-by: flokli <flokli@flokli.de>
Reviewed-by: tazjin <tazjin@tvl.su>
Tested-by: BuildkiteCI
This is a persistent, structurally sharing data structure which is
more efficient in some of our use-cases. I have verified the
efficiency improvement using `hyperfine` repeatedly over expressions
on nixpkgs.
Lists are not the most performance-critical structure in Nix (that
would be attribute sets), but we can already see a small (~5-10%)
improvement.
Note that there are a handful of cases where we still go via `Vec`
that need to be fixed, most notable for `builtins.sort` which can not
currently be implemented directly using `im::Vector` because of a
restrictive type bound.
Change-Id: I237cc50cbd7629a046e5a5e4601fbb40355e551d
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7670
Tested-by: BuildkiteCI
Reviewed-by: sterni <sternenseemann@systemli.org>
lazy_static is only used in tests, and anyhow isn't used at all (yet).
This can be dropped.
Change-Id: Ic41ff3f9bb93cfa600c3485e85464f78a3976504
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7668
Reviewed-by: tazjin <tazjin@tvl.su>
Tested-by: BuildkiteCI
While there's currently nothing in here checking the size of the digest,
we should use something that passes the to-be-introduced validate()
function.
Change-Id: I0c515d9e3afc79292dedebce659a32485aa3d936
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7649
Reviewed-by: tazjin <tazjin@tvl.su>
Tested-by: BuildkiteCI
While prost-build already exposes protobuf message types as structs, we
actually need tonic-build too, to be able to get traits for all the RPC
services defined in the proto files.
Change-Id: I7f4c08454bf0d280d577975c7cdae13ccc2d933b
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7320
Tested-by: BuildkiteCI
Reviewed-by: tazjin <tazjin@tvl.su>
This type allows for temporarily compatibility with the C++ Nix store,
specifically (for now) it gives us the store directory used by Nix and
imports files the same way.
Change-Id: I4767794ef2863eba49661315c63c4e17de946d60
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7587
Reviewed-by: grfn <grfn@gws.fyi>
Tested-by: BuildkiteCI
This should make no difference in Nix builds, but allows running tests
locally again with `cargo test` for //tvix/eval.
Change-Id: I97d61840143d5c14db61d5862781bf635f9a28e7
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7590
Reviewed-by: grfn <grfn@gws.fyi>
Tested-by: BuildkiteCI
Reviewed-by: flokli <flokli@flokli.de>
In //tvix/eval:
* criterion bumped to 4.0, which at least depends on clap 3.x instead
of 2.x, which is less incompatible
In //tvix/cli:
* no changes required
In //tvix/nix_cli:
* some minor changes for compatibility with clap 4.0, no functionality
changes
Change-Id: If793f64b59fcaa2402d3d483ddbab4092f32df03
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7588
Reviewed-by: grfn <grfn@gws.fyi>
Tested-by: BuildkiteCI
The tvix-eval project is independent from any *uses* of the evaluator,
such as the tvix-repl.
This functionality has been split out into a separate "tvix-cli"
crate. Note that this doesn't have to mean that this CLI crate is the
"final" CLI crate for tvix, the point of this is not "getting the CLI
structure right" but rather "getting the evaluator structure right".
This reshuffling is part of restructuring the way that functionality
like store communication is injected into language evaluation.
Note that at this commit the new CLI crate is not at feature-parity.
Change-Id: Id0af03dc8e07ef09a9f882a89612ad555eca8f93
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7541
Autosubmit: tazjin <tazjin@tvl.su>
Reviewed-by: grfn <grfn@gws.fyi>
Tested-by: BuildkiteCI
Introduces granular dependency builds using crate2nix, bootstrapped
off the generated configuration from the newly introduced
workspace (see cl/7533).
This commit checks in the generated Cargo.nix file which can be
regenerated with a parameterless invocation of `crate2nix generate` in
`//tvix`. I tried generating this in IFD, but it turned out to be
harder than what seemed worthwhile for now.
In this setup, the various build targets for Rust projects end up
being attributes of the imported `Cargo.nix` file at the `tvix.crates`
attribute. These still lack configuration, however, which has been
fixed in the various `default.nix` files of individual projects.
Note that we (temporarily) lose the ability to build tvix-eval's
benchmarks in CI. I haven't figured out what magic incantation summons
them from the void again ...
The `eval-okay-readDir` tests from both test suites have been disabled
because they fail for unknown reasons when run in this new derivation.
Somebody will have to debug it!
Change-Id: I2014614ccb9c8951aedbd71df7966ca191a13695
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7538
Autosubmit: tazjin <tazjin@tvl.su>
Reviewed-by: flokli <flokli@flokli.de>
Tested-by: BuildkiteCI