tvl-depot/tvix/eval/src/value/mod.rs

1075 lines
39 KiB
Rust
Raw Normal View History

//! This module implements the backing representation of runtime
//! values in the Nix language.
use std::cmp::Ordering;
refactor(tvix/eval): flatten call stack of VM using generators Warning: This is probably the biggest refactor in tvix-eval history, so far. This replaces all instances of trampolines and recursion during evaluation of the VM loop with generators. A generator is an asynchronous function that can be suspended to yield a message (in our case, vm::generators::GeneratorRequest) and receive a response (vm::generators::GeneratorResponsee). The `genawaiter` crate provides an interpreter for generators that can drive their execution and lets us move control flow between the VM and suspended generators. To do this, massive changes have occured basically everywhere in the code. On a high-level: 1. The VM is now organised around a frame stack. A frame is either a call frame (execution of Tvix bytecode) or a generator frame (a running or suspended generator). The VM has an outer loop that pops a frame off the frame stack, and then enters an inner loop either driving the execution of the bytecode or the execution of a generator. Both types of frames have several branches that can result in the frame re-enqueuing itself, and enqueuing some other work (in the form of a different frame) on top of itself. The VM will eventually resume the frame when everything "above" it has been suspended. In this way, the VM's new frame stack takes over much of the work that was previously achieved by recursion. 2. All methods previously taking a VM have been refactored into async functions that instead emit/receive generator messages for communication with the VM. Notably, this includes *all* builtins. This has had some other effects: - Some test have been removed or commented out, either because they tested code that was mostly already dead (nix_eq) or because they now require generator scaffolding which we do not have in place for tests (yet). - Because generator functions are technically async (though no async IO is involved), we lose the ability to use much of the Rust standard library e.g. in builtins. This has led to many algorithms being unrolled into iterative versions instead of iterator combinations, and things like sorting had to be implemented from scratch. - Many call sites that previously saw a `Result<..., ErrorKind>` bubble up now only see the result value, as the error handling is encapsulated within the generator loop. This reduces number of places inside of builtin implementations where error context can be attached to calls that can fail. Currently what we gain in this tradeoff is significantly more detailed span information (which we still need to bubble up, this commit does not change the error display). We'll need to do some analysis later of how useful the errors turn out to be and potentially introduce some methods for attaching context to a generator frame again. This change is very difficult to do in stages, as it is very much an "all or nothing" change that affects huge parts of the codebase. I've tried to isolate changes that can be isolated into the parent CLs of this one, but this change is still quite difficult to wrap one's mind and I'm available to discuss it and explain things to any reviewer. Fixes: b/238, b/237, b/251 and potentially others. Change-Id: I39244163ff5bbecd169fe7b274df19262b515699 Reviewed-on: https://cl.tvl.fyi/c/depot/+/8104 Reviewed-by: raitobezarius <tvl@lahfa.xyz> Reviewed-by: Adam Joseph <adam@westernsemico.com> Tested-by: BuildkiteCI
2023-02-14 15:02:39 +03:00
use std::fmt::Display;
feat(tvix/eval): use lexical-core to format float Apparently our naive implementation of float formatting, which simply used {:.5}, and trimmed trailing "0" strings not sufficient. It wrongly trimmed numbers with zeroes but no decimal point, like `10000` got trimmed to `1`. Nix uses `std::to_string` on the double, which according to https://en.cppreference.com/w/cpp/string/basic_string/to_string is equivalent to `std::sprintf(buf, "%f", value)`. https://en.cppreference.com/w/cpp/io/c/fprintf mentions this is treated like this: > Precision specifies the exact number of digits to appear after > the decimal point character. The default precision is 6. In the > alternative implementation decimal point character is written even if > no digits follow it. For infinity and not-a-number conversion style > see notes. This doesn't seem to be the case though, and Nix uses scientific notation in some cases. There's a whole bunch of strategies to determine which is a more compact notation, and which notation should be used for a given number. https://github.com/rust-lang/rust/issues/24556 provides some pointers into various rabbit holes for those interested. This gist seems to be that currently a different formatting is not exposed in rust directly, at least not for public consumption. There is the [lexical-core](https://github.com/Alexhuszagh/rust-lexical) crate though, which provides a way to format floats with various strategies and formats. Change our implementation of `TotalDisplay` for the `Value::Float` case to use that. We still need to do some post-processing, because Nix always adds the sign in scientific notation (and there's no way to configure lexical-core to do that), and lexical-core in some cases keeps the trailing zeros. Even with all that in place, there as a difference in `eval-okay- fromjson.nix` (from tvix-tests), which I couldn't get to work. I updated the fixture to a less problematic number. With this, the testsuite passes again, and does for the upcoming CL introducing builtins.fromTOML, and enabling the nix testsuite bits for it, too. Change-Id: Ie6fba5619e1d9fd7ce669a51594658b029057acc Reviewed-on: https://cl.tvl.fyi/c/depot/+/7922 Tested-by: BuildkiteCI Autosubmit: flokli <flokli@flokli.de> Reviewed-by: tazjin <tazjin@tvl.su>
2023-01-24 19:27:20 +01:00
use std::num::{NonZeroI32, NonZeroUsize};
use std::path::PathBuf;
use std::rc::Rc;
fix(tvix): Represent strings as byte arrays C++ nix uses C-style zero-terminated char pointers to represent strings internally - however, up to this point, tvix has used Rust `String` and `str` for string values. Since those are required to be valid utf-8, we haven't been able to properly represent all the string values that Nix supports. To fix that, this change converts the internal representation of the NixString struct from `Box<str>` to `BString`, from the `bstr` crate - this is a wrapper around a `Vec<u8>` with extra functions for treating that byte vector as a "morally string-like" value, which is basically exactly what we need. Since this changes a pretty fundamental assumption about a pretty core type, there are a *lot* of changes in a lot of places to make this work, but I've tried to keep the general philosophy and intent of most of the code in most places intact. Most notably, there's nothing that's been done to make the derivation stuff in //tvix/glue work with non-utf8 strings everywhere, instead opting to just convert to String/str when passing things into that - there *might* be something to be done there, but I don't know what the rules should be and I don't want to figure them out in this change. To deal with OS-native paths in a way that also works in WASM for tvixbolt, this also adds a dependency on the "os_str_bytes" crate. Fixes: b/189 Fixes: b/337 Change-Id: I5e6eb29c62f47dd91af954f5e12bfc3d186f5526 Reviewed-on: https://cl.tvl.fyi/c/depot/+/10200 Reviewed-by: tazjin <tazjin@tvl.su> Reviewed-by: flokli <flokli@flokli.de> Reviewed-by: sterni <sternenseemann@systemli.org> Autosubmit: aspen <root@gws.fyi> Tested-by: BuildkiteCI
2023-12-05 17:25:52 -05:00
use bstr::{BString, ByteVec};
feat(tvix/eval): use lexical-core to format float Apparently our naive implementation of float formatting, which simply used {:.5}, and trimmed trailing "0" strings not sufficient. It wrongly trimmed numbers with zeroes but no decimal point, like `10000` got trimmed to `1`. Nix uses `std::to_string` on the double, which according to https://en.cppreference.com/w/cpp/string/basic_string/to_string is equivalent to `std::sprintf(buf, "%f", value)`. https://en.cppreference.com/w/cpp/io/c/fprintf mentions this is treated like this: > Precision specifies the exact number of digits to appear after > the decimal point character. The default precision is 6. In the > alternative implementation decimal point character is written even if > no digits follow it. For infinity and not-a-number conversion style > see notes. This doesn't seem to be the case though, and Nix uses scientific notation in some cases. There's a whole bunch of strategies to determine which is a more compact notation, and which notation should be used for a given number. https://github.com/rust-lang/rust/issues/24556 provides some pointers into various rabbit holes for those interested. This gist seems to be that currently a different formatting is not exposed in rust directly, at least not for public consumption. There is the [lexical-core](https://github.com/Alexhuszagh/rust-lexical) crate though, which provides a way to format floats with various strategies and formats. Change our implementation of `TotalDisplay` for the `Value::Float` case to use that. We still need to do some post-processing, because Nix always adds the sign in scientific notation (and there's no way to configure lexical-core to do that), and lexical-core in some cases keeps the trailing zeros. Even with all that in place, there as a difference in `eval-okay- fromjson.nix` (from tvix-tests), which I couldn't get to work. I updated the fixture to a less problematic number. With this, the testsuite passes again, and does for the upcoming CL introducing builtins.fromTOML, and enabling the nix testsuite bits for it, too. Change-Id: Ie6fba5619e1d9fd7ce669a51594658b029057acc Reviewed-on: https://cl.tvl.fyi/c/depot/+/7922 Tested-by: BuildkiteCI Autosubmit: flokli <flokli@flokli.de> Reviewed-by: tazjin <tazjin@tvl.su>
2023-01-24 19:27:20 +01:00
use lexical_core::format::CXX_LITERAL;
use serde::Deserialize;
#[cfg(feature = "arbitrary")]
mod arbitrary;
mod attrs;
mod builtin;
mod function;
mod json;
mod list;
mod path;
mod string;
mod thunk;
use crate::errors::{CatchableErrorKind, ErrorKind};
use crate::opcode::StackIdx;
use crate::spans::LightSpan;
use crate::vm::generators::{self, GenCo};
refactor(tvix/eval): flatten call stack of VM using generators Warning: This is probably the biggest refactor in tvix-eval history, so far. This replaces all instances of trampolines and recursion during evaluation of the VM loop with generators. A generator is an asynchronous function that can be suspended to yield a message (in our case, vm::generators::GeneratorRequest) and receive a response (vm::generators::GeneratorResponsee). The `genawaiter` crate provides an interpreter for generators that can drive their execution and lets us move control flow between the VM and suspended generators. To do this, massive changes have occured basically everywhere in the code. On a high-level: 1. The VM is now organised around a frame stack. A frame is either a call frame (execution of Tvix bytecode) or a generator frame (a running or suspended generator). The VM has an outer loop that pops a frame off the frame stack, and then enters an inner loop either driving the execution of the bytecode or the execution of a generator. Both types of frames have several branches that can result in the frame re-enqueuing itself, and enqueuing some other work (in the form of a different frame) on top of itself. The VM will eventually resume the frame when everything "above" it has been suspended. In this way, the VM's new frame stack takes over much of the work that was previously achieved by recursion. 2. All methods previously taking a VM have been refactored into async functions that instead emit/receive generator messages for communication with the VM. Notably, this includes *all* builtins. This has had some other effects: - Some test have been removed or commented out, either because they tested code that was mostly already dead (nix_eq) or because they now require generator scaffolding which we do not have in place for tests (yet). - Because generator functions are technically async (though no async IO is involved), we lose the ability to use much of the Rust standard library e.g. in builtins. This has led to many algorithms being unrolled into iterative versions instead of iterator combinations, and things like sorting had to be implemented from scratch. - Many call sites that previously saw a `Result<..., ErrorKind>` bubble up now only see the result value, as the error handling is encapsulated within the generator loop. This reduces number of places inside of builtin implementations where error context can be attached to calls that can fail. Currently what we gain in this tradeoff is significantly more detailed span information (which we still need to bubble up, this commit does not change the error display). We'll need to do some analysis later of how useful the errors turn out to be and potentially introduce some methods for attaching context to a generator frame again. This change is very difficult to do in stages, as it is very much an "all or nothing" change that affects huge parts of the codebase. I've tried to isolate changes that can be isolated into the parent CLs of this one, but this change is still quite difficult to wrap one's mind and I'm available to discuss it and explain things to any reviewer. Fixes: b/238, b/237, b/251 and potentially others. Change-Id: I39244163ff5bbecd169fe7b274df19262b515699 Reviewed-on: https://cl.tvl.fyi/c/depot/+/8104 Reviewed-by: raitobezarius <tvl@lahfa.xyz> Reviewed-by: Adam Joseph <adam@westernsemico.com> Tested-by: BuildkiteCI
2023-02-14 15:02:39 +03:00
use crate::AddContext;
pub use attrs::NixAttrs;
refactor(tvix/eval): flatten call stack of VM using generators Warning: This is probably the biggest refactor in tvix-eval history, so far. This replaces all instances of trampolines and recursion during evaluation of the VM loop with generators. A generator is an asynchronous function that can be suspended to yield a message (in our case, vm::generators::GeneratorRequest) and receive a response (vm::generators::GeneratorResponsee). The `genawaiter` crate provides an interpreter for generators that can drive their execution and lets us move control flow between the VM and suspended generators. To do this, massive changes have occured basically everywhere in the code. On a high-level: 1. The VM is now organised around a frame stack. A frame is either a call frame (execution of Tvix bytecode) or a generator frame (a running or suspended generator). The VM has an outer loop that pops a frame off the frame stack, and then enters an inner loop either driving the execution of the bytecode or the execution of a generator. Both types of frames have several branches that can result in the frame re-enqueuing itself, and enqueuing some other work (in the form of a different frame) on top of itself. The VM will eventually resume the frame when everything "above" it has been suspended. In this way, the VM's new frame stack takes over much of the work that was previously achieved by recursion. 2. All methods previously taking a VM have been refactored into async functions that instead emit/receive generator messages for communication with the VM. Notably, this includes *all* builtins. This has had some other effects: - Some test have been removed or commented out, either because they tested code that was mostly already dead (nix_eq) or because they now require generator scaffolding which we do not have in place for tests (yet). - Because generator functions are technically async (though no async IO is involved), we lose the ability to use much of the Rust standard library e.g. in builtins. This has led to many algorithms being unrolled into iterative versions instead of iterator combinations, and things like sorting had to be implemented from scratch. - Many call sites that previously saw a `Result<..., ErrorKind>` bubble up now only see the result value, as the error handling is encapsulated within the generator loop. This reduces number of places inside of builtin implementations where error context can be attached to calls that can fail. Currently what we gain in this tradeoff is significantly more detailed span information (which we still need to bubble up, this commit does not change the error display). We'll need to do some analysis later of how useful the errors turn out to be and potentially introduce some methods for attaching context to a generator frame again. This change is very difficult to do in stages, as it is very much an "all or nothing" change that affects huge parts of the codebase. I've tried to isolate changes that can be isolated into the parent CLs of this one, but this change is still quite difficult to wrap one's mind and I'm available to discuss it and explain things to any reviewer. Fixes: b/238, b/237, b/251 and potentially others. Change-Id: I39244163ff5bbecd169fe7b274df19262b515699 Reviewed-on: https://cl.tvl.fyi/c/depot/+/8104 Reviewed-by: raitobezarius <tvl@lahfa.xyz> Reviewed-by: Adam Joseph <adam@westernsemico.com> Tested-by: BuildkiteCI
2023-02-14 15:02:39 +03:00
pub use builtin::{Builtin, BuiltinResult};
pub(crate) use function::Formals;
pub use function::{Closure, Lambda};
pub use list::NixList;
pub use path::canon_path;
pub use string::{NixContext, NixContextElement, NixString};
pub use thunk::Thunk;
pub use self::thunk::ThunkSet;
feat(tvix/eval): use lexical-core to format float Apparently our naive implementation of float formatting, which simply used {:.5}, and trimmed trailing "0" strings not sufficient. It wrongly trimmed numbers with zeroes but no decimal point, like `10000` got trimmed to `1`. Nix uses `std::to_string` on the double, which according to https://en.cppreference.com/w/cpp/string/basic_string/to_string is equivalent to `std::sprintf(buf, "%f", value)`. https://en.cppreference.com/w/cpp/io/c/fprintf mentions this is treated like this: > Precision specifies the exact number of digits to appear after > the decimal point character. The default precision is 6. In the > alternative implementation decimal point character is written even if > no digits follow it. For infinity and not-a-number conversion style > see notes. This doesn't seem to be the case though, and Nix uses scientific notation in some cases. There's a whole bunch of strategies to determine which is a more compact notation, and which notation should be used for a given number. https://github.com/rust-lang/rust/issues/24556 provides some pointers into various rabbit holes for those interested. This gist seems to be that currently a different formatting is not exposed in rust directly, at least not for public consumption. There is the [lexical-core](https://github.com/Alexhuszagh/rust-lexical) crate though, which provides a way to format floats with various strategies and formats. Change our implementation of `TotalDisplay` for the `Value::Float` case to use that. We still need to do some post-processing, because Nix always adds the sign in scientific notation (and there's no way to configure lexical-core to do that), and lexical-core in some cases keeps the trailing zeros. Even with all that in place, there as a difference in `eval-okay- fromjson.nix` (from tvix-tests), which I couldn't get to work. I updated the fixture to a less problematic number. With this, the testsuite passes again, and does for the upcoming CL introducing builtins.fromTOML, and enabling the nix testsuite bits for it, too. Change-Id: Ie6fba5619e1d9fd7ce669a51594658b029057acc Reviewed-on: https://cl.tvl.fyi/c/depot/+/7922 Tested-by: BuildkiteCI Autosubmit: flokli <flokli@flokli.de> Reviewed-by: tazjin <tazjin@tvl.su>
2023-01-24 19:27:20 +01:00
use lazy_static::lazy_static;
#[warn(variant_size_differences)]
#[derive(Clone, Debug, Deserialize)]
#[serde(untagged)]
pub enum Value {
Null,
Bool(bool),
Integer(i64),
Float(f64),
String(Box<NixString>),
#[serde(skip)]
Path(Box<PathBuf>),
Attrs(Box<NixAttrs>),
List(NixList),
#[serde(skip)]
Closure(Rc<Closure>), // must use Rc<Closure> here in order to get proper pointer equality
#[serde(skip)]
Builtin(Builtin),
// Internal values that, while they technically exist at runtime,
// are never returned to or created directly by users.
#[serde(skip_deserializing)]
Thunk(Thunk),
// See [`compiler::compile_select_or()`] for explanation
#[serde(skip)]
AttrNotFound,
// this can only occur in Chunk::Constants and nowhere else
#[serde(skip)]
Blueprint(Rc<Lambda>),
#[serde(skip)]
DeferredUpvalue(StackIdx),
#[serde(skip)]
UnresolvedPath(Box<PathBuf>),
#[serde(skip)]
Json(Box<serde_json::Value>),
fix(tvix/eval): only finalise formal arguments if defaulting When dealing with a formal argument in a function argument pattern that has a default expression, there are two different things that can happen at runtime: Either we select its value from the passed attribute successfully or we need to use the default expression. Both of these may be thunks and both of these may need finalisers. However, in the former case this is taken care of elsewhere, the value will always be finalised already if necessary. In the latter case we may need to finalise the thunk resulting from the default expression. However, the thunk corresponding to the expression may never end up in the local's stack slot. Since finalisation goes by stack slot (and not constants), we need to prevent a case where we don't fall back to the default expression, but finalise anyways. Previously, we worked around this by making `OpFinalise` ignore non-thunks. Since finalisation of already evaluated thunks still crashed, the faulty compilation of function pattern arguments could still cause a crash. As a new approach, we reinstate the old behavior of `OpFinalise` to crash whenever encountering something that is either not a thunk or doesn't need finalisation. This can also help catching (similar) miscompilations in the future. To then prevent the crash, we need to track whether we have fallen back or not at runtime. This is done using an additional phantom on the stack that holds a new `FinaliseRequest` value. When it comes to finalisation we check this value and conditionally execute `OpFinalise` based on its value. Resolves b/261 and b/265 (partially). Change-Id: Ic04fb80ec671a2ba11fa645090769c335fb7f58b Reviewed-on: https://cl.tvl.fyi/c/depot/+/8705 Reviewed-by: tazjin <tazjin@tvl.su> Tested-by: BuildkiteCI Autosubmit: sterni <sternenseemann@systemli.org>
2023-06-03 02:10:31 +02:00
#[serde(skip)]
FinaliseRequest(bool),
#[serde(skip)]
Catchable(Box<CatchableErrorKind>),
}
impl From<CatchableErrorKind> for Value {
#[inline]
fn from(c: CatchableErrorKind) -> Value {
Value::Catchable(Box::new(c))
}
}
impl<V> From<Result<V, CatchableErrorKind>> for Value
where
Value: From<V>,
{
#[inline]
fn from(v: Result<V, CatchableErrorKind>) -> Value {
match v {
Ok(v) => v.into(),
Err(e) => Value::Catchable(Box::new(e)),
}
}
}
feat(tvix/eval): use lexical-core to format float Apparently our naive implementation of float formatting, which simply used {:.5}, and trimmed trailing "0" strings not sufficient. It wrongly trimmed numbers with zeroes but no decimal point, like `10000` got trimmed to `1`. Nix uses `std::to_string` on the double, which according to https://en.cppreference.com/w/cpp/string/basic_string/to_string is equivalent to `std::sprintf(buf, "%f", value)`. https://en.cppreference.com/w/cpp/io/c/fprintf mentions this is treated like this: > Precision specifies the exact number of digits to appear after > the decimal point character. The default precision is 6. In the > alternative implementation decimal point character is written even if > no digits follow it. For infinity and not-a-number conversion style > see notes. This doesn't seem to be the case though, and Nix uses scientific notation in some cases. There's a whole bunch of strategies to determine which is a more compact notation, and which notation should be used for a given number. https://github.com/rust-lang/rust/issues/24556 provides some pointers into various rabbit holes for those interested. This gist seems to be that currently a different formatting is not exposed in rust directly, at least not for public consumption. There is the [lexical-core](https://github.com/Alexhuszagh/rust-lexical) crate though, which provides a way to format floats with various strategies and formats. Change our implementation of `TotalDisplay` for the `Value::Float` case to use that. We still need to do some post-processing, because Nix always adds the sign in scientific notation (and there's no way to configure lexical-core to do that), and lexical-core in some cases keeps the trailing zeros. Even with all that in place, there as a difference in `eval-okay- fromjson.nix` (from tvix-tests), which I couldn't get to work. I updated the fixture to a less problematic number. With this, the testsuite passes again, and does for the upcoming CL introducing builtins.fromTOML, and enabling the nix testsuite bits for it, too. Change-Id: Ie6fba5619e1d9fd7ce669a51594658b029057acc Reviewed-on: https://cl.tvl.fyi/c/depot/+/7922 Tested-by: BuildkiteCI Autosubmit: flokli <flokli@flokli.de> Reviewed-by: tazjin <tazjin@tvl.su>
2023-01-24 19:27:20 +01:00
lazy_static! {
static ref WRITE_FLOAT_OPTIONS: lexical_core::WriteFloatOptions =
lexical_core::WriteFloatOptionsBuilder::new()
.trim_floats(true)
.round_mode(lexical_core::write_float_options::RoundMode::Round)
.positive_exponent_break(Some(NonZeroI32::new(5).unwrap()))
.max_significant_digits(Some(NonZeroUsize::new(6).unwrap()))
.build()
.unwrap();
}
// Helper macros to generate the to_*/as_* macros while accounting for
// thunks.
/// Generate an `as_*` method returning a reference to the expected
/// type, or a type error. This only works for types that implement
/// `Copy`, as returning a reference to an inner thunk value is not
/// possible.
/// Generate an `as_*/to_*` accessor method that returns either the
/// expected type, or a type error.
macro_rules! gen_cast {
( $name:ident, $type:ty, $expected:expr, $variant:pat, $result:expr ) => {
pub fn $name(&self) -> Result<$type, ErrorKind> {
match self {
$variant => Ok($result),
Value::Thunk(thunk) => Self::$name(&thunk.value()),
other => Err(type_error($expected, &other)),
}
}
};
}
/// Generate an `as_*_mut/to_*_mut` accessor method that returns either the
/// expected type, or a type error.
macro_rules! gen_cast_mut {
( $name:ident, $type:ty, $expected:expr, $variant:ident) => {
pub fn $name(&mut self) -> Result<&mut $type, ErrorKind> {
match self {
Value::$variant(x) => Ok(x),
other => Err(type_error($expected, &other)),
}
}
};
}
/// Generate an `is_*` type-checking method.
macro_rules! gen_is {
( $name:ident, $variant:pat ) => {
pub fn $name(&self) -> bool {
match self {
$variant => true,
Value::Thunk(thunk) => Self::$name(&thunk.value()),
_ => false,
}
}
};
}
/// Describes what input types are allowed when coercing a `Value` to a string
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct CoercionKind {
/// If false only coerce already "stringly" types like strings and paths, but
/// also coerce sets that have a `__toString` attribute. In Tvix, this is
/// usually called a weak coercion. Equivalent to passing `false` as the
/// `coerceMore` argument of `EvalState::coerceToString` in C++ Nix.
///
/// If true coerce all value types included by a weak coercion, but also
/// coerce `null`, booleans, integers, floats and lists of coercible types.
/// Consequently, we call this a strong coercion. Equivalent to passing
/// `true` as `coerceMore` in C++ Nix.
pub strong: bool,
/// If `import_paths` is `true`, paths are imported into the store and their
/// store path is the result of the coercion (equivalent to the
/// `copyToStore` argument of `EvalState::coerceToString` in C++ Nix).
pub import_paths: bool,
}
impl<T> From<T> for Value
where
T: Into<NixString>,
{
fn from(t: T) -> Self {
Self::String(Box::new(t.into()))
}
}
/// Constructors
impl Value {
/// Construct a [`Value::Attrs`] from a [`NixAttrs`].
pub fn attrs(attrs: NixAttrs) -> Self {
Self::Attrs(Box::new(attrs))
}
}
/// Controls what kind of by-pointer equality comparison is allowed.
///
/// See `//tvix/docs/value-pointer-equality.md` for details.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub enum PointerEquality {
/// Pointer equality not allowed at all.
ForbidAll,
/// Pointer equality comparisons only allowed for nested values.
AllowNested,
/// Pointer equality comparisons are allowed in all contexts.
AllowAll,
}
impl Value {
refactor(tvix/eval): flatten call stack of VM using generators Warning: This is probably the biggest refactor in tvix-eval history, so far. This replaces all instances of trampolines and recursion during evaluation of the VM loop with generators. A generator is an asynchronous function that can be suspended to yield a message (in our case, vm::generators::GeneratorRequest) and receive a response (vm::generators::GeneratorResponsee). The `genawaiter` crate provides an interpreter for generators that can drive their execution and lets us move control flow between the VM and suspended generators. To do this, massive changes have occured basically everywhere in the code. On a high-level: 1. The VM is now organised around a frame stack. A frame is either a call frame (execution of Tvix bytecode) or a generator frame (a running or suspended generator). The VM has an outer loop that pops a frame off the frame stack, and then enters an inner loop either driving the execution of the bytecode or the execution of a generator. Both types of frames have several branches that can result in the frame re-enqueuing itself, and enqueuing some other work (in the form of a different frame) on top of itself. The VM will eventually resume the frame when everything "above" it has been suspended. In this way, the VM's new frame stack takes over much of the work that was previously achieved by recursion. 2. All methods previously taking a VM have been refactored into async functions that instead emit/receive generator messages for communication with the VM. Notably, this includes *all* builtins. This has had some other effects: - Some test have been removed or commented out, either because they tested code that was mostly already dead (nix_eq) or because they now require generator scaffolding which we do not have in place for tests (yet). - Because generator functions are technically async (though no async IO is involved), we lose the ability to use much of the Rust standard library e.g. in builtins. This has led to many algorithms being unrolled into iterative versions instead of iterator combinations, and things like sorting had to be implemented from scratch. - Many call sites that previously saw a `Result<..., ErrorKind>` bubble up now only see the result value, as the error handling is encapsulated within the generator loop. This reduces number of places inside of builtin implementations where error context can be attached to calls that can fail. Currently what we gain in this tradeoff is significantly more detailed span information (which we still need to bubble up, this commit does not change the error display). We'll need to do some analysis later of how useful the errors turn out to be and potentially introduce some methods for attaching context to a generator frame again. This change is very difficult to do in stages, as it is very much an "all or nothing" change that affects huge parts of the codebase. I've tried to isolate changes that can be isolated into the parent CLs of this one, but this change is still quite difficult to wrap one's mind and I'm available to discuss it and explain things to any reviewer. Fixes: b/238, b/237, b/251 and potentially others. Change-Id: I39244163ff5bbecd169fe7b274df19262b515699 Reviewed-on: https://cl.tvl.fyi/c/depot/+/8104 Reviewed-by: raitobezarius <tvl@lahfa.xyz> Reviewed-by: Adam Joseph <adam@westernsemico.com> Tested-by: BuildkiteCI
2023-02-14 15:02:39 +03:00
/// Deeply forces a value, traversing e.g. lists and attribute sets and forcing
/// their contents, too.
///
/// This is a generator function.
pub(super) async fn deep_force(self, co: GenCo, span: LightSpan) -> Result<Value, ErrorKind> {
if let Some(v) = Self::deep_force_(self.clone(), co, span).await? {
Ok(v)
refactor(tvix/eval): flatten call stack of VM using generators Warning: This is probably the biggest refactor in tvix-eval history, so far. This replaces all instances of trampolines and recursion during evaluation of the VM loop with generators. A generator is an asynchronous function that can be suspended to yield a message (in our case, vm::generators::GeneratorRequest) and receive a response (vm::generators::GeneratorResponsee). The `genawaiter` crate provides an interpreter for generators that can drive their execution and lets us move control flow between the VM and suspended generators. To do this, massive changes have occured basically everywhere in the code. On a high-level: 1. The VM is now organised around a frame stack. A frame is either a call frame (execution of Tvix bytecode) or a generator frame (a running or suspended generator). The VM has an outer loop that pops a frame off the frame stack, and then enters an inner loop either driving the execution of the bytecode or the execution of a generator. Both types of frames have several branches that can result in the frame re-enqueuing itself, and enqueuing some other work (in the form of a different frame) on top of itself. The VM will eventually resume the frame when everything "above" it has been suspended. In this way, the VM's new frame stack takes over much of the work that was previously achieved by recursion. 2. All methods previously taking a VM have been refactored into async functions that instead emit/receive generator messages for communication with the VM. Notably, this includes *all* builtins. This has had some other effects: - Some test have been removed or commented out, either because they tested code that was mostly already dead (nix_eq) or because they now require generator scaffolding which we do not have in place for tests (yet). - Because generator functions are technically async (though no async IO is involved), we lose the ability to use much of the Rust standard library e.g. in builtins. This has led to many algorithms being unrolled into iterative versions instead of iterator combinations, and things like sorting had to be implemented from scratch. - Many call sites that previously saw a `Result<..., ErrorKind>` bubble up now only see the result value, as the error handling is encapsulated within the generator loop. This reduces number of places inside of builtin implementations where error context can be attached to calls that can fail. Currently what we gain in this tradeoff is significantly more detailed span information (which we still need to bubble up, this commit does not change the error display). We'll need to do some analysis later of how useful the errors turn out to be and potentially introduce some methods for attaching context to a generator frame again. This change is very difficult to do in stages, as it is very much an "all or nothing" change that affects huge parts of the codebase. I've tried to isolate changes that can be isolated into the parent CLs of this one, but this change is still quite difficult to wrap one's mind and I'm available to discuss it and explain things to any reviewer. Fixes: b/238, b/237, b/251 and potentially others. Change-Id: I39244163ff5bbecd169fe7b274df19262b515699 Reviewed-on: https://cl.tvl.fyi/c/depot/+/8104 Reviewed-by: raitobezarius <tvl@lahfa.xyz> Reviewed-by: Adam Joseph <adam@westernsemico.com> Tested-by: BuildkiteCI
2023-02-14 15:02:39 +03:00
} else {
Ok(self)
}
}
/// Returns Some(v) or None to indicate the returned value is myself
async fn deep_force_(
myself: Value,
co: GenCo,
span: LightSpan,
) -> Result<Option<Value>, ErrorKind> {
// This is a stack of values which still remain to be forced.
let mut vals = vec![myself];
let mut thunk_set: ThunkSet = Default::default();
loop {
let v = if let Some(v) = vals.pop() {
v
} else {
return Ok(None);
};
// Get rid of any top-level thunks, and bail out of self-recursive
// thunks.
let value = if let Value::Thunk(t) = &v {
if !thunk_set.insert(t) {
continue;
}
Thunk::force_(t.clone(), &co, span.clone()).await?
} else {
v
};
match value {
// Short-circuit on already evaluated values, or fail on internal values.
Value::Null
| Value::Bool(_)
| Value::Integer(_)
| Value::Float(_)
| Value::String(_)
| Value::Path(_)
| Value::Closure(_)
| Value::Builtin(_) => continue,
Value::List(list) => {
for val in list.into_iter().rev() {
vals.push(val);
}
continue;
refactor(tvix/eval): flatten call stack of VM using generators Warning: This is probably the biggest refactor in tvix-eval history, so far. This replaces all instances of trampolines and recursion during evaluation of the VM loop with generators. A generator is an asynchronous function that can be suspended to yield a message (in our case, vm::generators::GeneratorRequest) and receive a response (vm::generators::GeneratorResponsee). The `genawaiter` crate provides an interpreter for generators that can drive their execution and lets us move control flow between the VM and suspended generators. To do this, massive changes have occured basically everywhere in the code. On a high-level: 1. The VM is now organised around a frame stack. A frame is either a call frame (execution of Tvix bytecode) or a generator frame (a running or suspended generator). The VM has an outer loop that pops a frame off the frame stack, and then enters an inner loop either driving the execution of the bytecode or the execution of a generator. Both types of frames have several branches that can result in the frame re-enqueuing itself, and enqueuing some other work (in the form of a different frame) on top of itself. The VM will eventually resume the frame when everything "above" it has been suspended. In this way, the VM's new frame stack takes over much of the work that was previously achieved by recursion. 2. All methods previously taking a VM have been refactored into async functions that instead emit/receive generator messages for communication with the VM. Notably, this includes *all* builtins. This has had some other effects: - Some test have been removed or commented out, either because they tested code that was mostly already dead (nix_eq) or because they now require generator scaffolding which we do not have in place for tests (yet). - Because generator functions are technically async (though no async IO is involved), we lose the ability to use much of the Rust standard library e.g. in builtins. This has led to many algorithms being unrolled into iterative versions instead of iterator combinations, and things like sorting had to be implemented from scratch. - Many call sites that previously saw a `Result<..., ErrorKind>` bubble up now only see the result value, as the error handling is encapsulated within the generator loop. This reduces number of places inside of builtin implementations where error context can be attached to calls that can fail. Currently what we gain in this tradeoff is significantly more detailed span information (which we still need to bubble up, this commit does not change the error display). We'll need to do some analysis later of how useful the errors turn out to be and potentially introduce some methods for attaching context to a generator frame again. This change is very difficult to do in stages, as it is very much an "all or nothing" change that affects huge parts of the codebase. I've tried to isolate changes that can be isolated into the parent CLs of this one, but this change is still quite difficult to wrap one's mind and I'm available to discuss it and explain things to any reviewer. Fixes: b/238, b/237, b/251 and potentially others. Change-Id: I39244163ff5bbecd169fe7b274df19262b515699 Reviewed-on: https://cl.tvl.fyi/c/depot/+/8104 Reviewed-by: raitobezarius <tvl@lahfa.xyz> Reviewed-by: Adam Joseph <adam@westernsemico.com> Tested-by: BuildkiteCI
2023-02-14 15:02:39 +03:00
}
Value::Attrs(attrs) => {
for (_, val) in attrs.into_iter().rev() {
vals.push(val);
}
continue;
refactor(tvix/eval): flatten call stack of VM using generators Warning: This is probably the biggest refactor in tvix-eval history, so far. This replaces all instances of trampolines and recursion during evaluation of the VM loop with generators. A generator is an asynchronous function that can be suspended to yield a message (in our case, vm::generators::GeneratorRequest) and receive a response (vm::generators::GeneratorResponsee). The `genawaiter` crate provides an interpreter for generators that can drive their execution and lets us move control flow between the VM and suspended generators. To do this, massive changes have occured basically everywhere in the code. On a high-level: 1. The VM is now organised around a frame stack. A frame is either a call frame (execution of Tvix bytecode) or a generator frame (a running or suspended generator). The VM has an outer loop that pops a frame off the frame stack, and then enters an inner loop either driving the execution of the bytecode or the execution of a generator. Both types of frames have several branches that can result in the frame re-enqueuing itself, and enqueuing some other work (in the form of a different frame) on top of itself. The VM will eventually resume the frame when everything "above" it has been suspended. In this way, the VM's new frame stack takes over much of the work that was previously achieved by recursion. 2. All methods previously taking a VM have been refactored into async functions that instead emit/receive generator messages for communication with the VM. Notably, this includes *all* builtins. This has had some other effects: - Some test have been removed or commented out, either because they tested code that was mostly already dead (nix_eq) or because they now require generator scaffolding which we do not have in place for tests (yet). - Because generator functions are technically async (though no async IO is involved), we lose the ability to use much of the Rust standard library e.g. in builtins. This has led to many algorithms being unrolled into iterative versions instead of iterator combinations, and things like sorting had to be implemented from scratch. - Many call sites that previously saw a `Result<..., ErrorKind>` bubble up now only see the result value, as the error handling is encapsulated within the generator loop. This reduces number of places inside of builtin implementations where error context can be attached to calls that can fail. Currently what we gain in this tradeoff is significantly more detailed span information (which we still need to bubble up, this commit does not change the error display). We'll need to do some analysis later of how useful the errors turn out to be and potentially introduce some methods for attaching context to a generator frame again. This change is very difficult to do in stages, as it is very much an "all or nothing" change that affects huge parts of the codebase. I've tried to isolate changes that can be isolated into the parent CLs of this one, but this change is still quite difficult to wrap one's mind and I'm available to discuss it and explain things to any reviewer. Fixes: b/238, b/237, b/251 and potentially others. Change-Id: I39244163ff5bbecd169fe7b274df19262b515699 Reviewed-on: https://cl.tvl.fyi/c/depot/+/8104 Reviewed-by: raitobezarius <tvl@lahfa.xyz> Reviewed-by: Adam Joseph <adam@westernsemico.com> Tested-by: BuildkiteCI
2023-02-14 15:02:39 +03:00
}
Value::Thunk(_) => panic!("Tvix bug: force_value() returned a thunk"),
refactor(tvix/eval): flatten call stack of VM using generators Warning: This is probably the biggest refactor in tvix-eval history, so far. This replaces all instances of trampolines and recursion during evaluation of the VM loop with generators. A generator is an asynchronous function that can be suspended to yield a message (in our case, vm::generators::GeneratorRequest) and receive a response (vm::generators::GeneratorResponsee). The `genawaiter` crate provides an interpreter for generators that can drive their execution and lets us move control flow between the VM and suspended generators. To do this, massive changes have occured basically everywhere in the code. On a high-level: 1. The VM is now organised around a frame stack. A frame is either a call frame (execution of Tvix bytecode) or a generator frame (a running or suspended generator). The VM has an outer loop that pops a frame off the frame stack, and then enters an inner loop either driving the execution of the bytecode or the execution of a generator. Both types of frames have several branches that can result in the frame re-enqueuing itself, and enqueuing some other work (in the form of a different frame) on top of itself. The VM will eventually resume the frame when everything "above" it has been suspended. In this way, the VM's new frame stack takes over much of the work that was previously achieved by recursion. 2. All methods previously taking a VM have been refactored into async functions that instead emit/receive generator messages for communication with the VM. Notably, this includes *all* builtins. This has had some other effects: - Some test have been removed or commented out, either because they tested code that was mostly already dead (nix_eq) or because they now require generator scaffolding which we do not have in place for tests (yet). - Because generator functions are technically async (though no async IO is involved), we lose the ability to use much of the Rust standard library e.g. in builtins. This has led to many algorithms being unrolled into iterative versions instead of iterator combinations, and things like sorting had to be implemented from scratch. - Many call sites that previously saw a `Result<..., ErrorKind>` bubble up now only see the result value, as the error handling is encapsulated within the generator loop. This reduces number of places inside of builtin implementations where error context can be attached to calls that can fail. Currently what we gain in this tradeoff is significantly more detailed span information (which we still need to bubble up, this commit does not change the error display). We'll need to do some analysis later of how useful the errors turn out to be and potentially introduce some methods for attaching context to a generator frame again. This change is very difficult to do in stages, as it is very much an "all or nothing" change that affects huge parts of the codebase. I've tried to isolate changes that can be isolated into the parent CLs of this one, but this change is still quite difficult to wrap one's mind and I'm available to discuss it and explain things to any reviewer. Fixes: b/238, b/237, b/251 and potentially others. Change-Id: I39244163ff5bbecd169fe7b274df19262b515699 Reviewed-on: https://cl.tvl.fyi/c/depot/+/8104 Reviewed-by: raitobezarius <tvl@lahfa.xyz> Reviewed-by: Adam Joseph <adam@westernsemico.com> Tested-by: BuildkiteCI
2023-02-14 15:02:39 +03:00
Value::Catchable(_) => return Ok(Some(value)),
Value::AttrNotFound
| Value::Blueprint(_)
| Value::DeferredUpvalue(_)
| Value::UnresolvedPath(_)
| Value::Json(_)
| Value::FinaliseRequest(_) => panic!(
"Tvix bug: internal value left on stack: {}",
value.type_of()
),
}
}
refactor(tvix/eval): flatten call stack of VM using generators Warning: This is probably the biggest refactor in tvix-eval history, so far. This replaces all instances of trampolines and recursion during evaluation of the VM loop with generators. A generator is an asynchronous function that can be suspended to yield a message (in our case, vm::generators::GeneratorRequest) and receive a response (vm::generators::GeneratorResponsee). The `genawaiter` crate provides an interpreter for generators that can drive their execution and lets us move control flow between the VM and suspended generators. To do this, massive changes have occured basically everywhere in the code. On a high-level: 1. The VM is now organised around a frame stack. A frame is either a call frame (execution of Tvix bytecode) or a generator frame (a running or suspended generator). The VM has an outer loop that pops a frame off the frame stack, and then enters an inner loop either driving the execution of the bytecode or the execution of a generator. Both types of frames have several branches that can result in the frame re-enqueuing itself, and enqueuing some other work (in the form of a different frame) on top of itself. The VM will eventually resume the frame when everything "above" it has been suspended. In this way, the VM's new frame stack takes over much of the work that was previously achieved by recursion. 2. All methods previously taking a VM have been refactored into async functions that instead emit/receive generator messages for communication with the VM. Notably, this includes *all* builtins. This has had some other effects: - Some test have been removed or commented out, either because they tested code that was mostly already dead (nix_eq) or because they now require generator scaffolding which we do not have in place for tests (yet). - Because generator functions are technically async (though no async IO is involved), we lose the ability to use much of the Rust standard library e.g. in builtins. This has led to many algorithms being unrolled into iterative versions instead of iterator combinations, and things like sorting had to be implemented from scratch. - Many call sites that previously saw a `Result<..., ErrorKind>` bubble up now only see the result value, as the error handling is encapsulated within the generator loop. This reduces number of places inside of builtin implementations where error context can be attached to calls that can fail. Currently what we gain in this tradeoff is significantly more detailed span information (which we still need to bubble up, this commit does not change the error display). We'll need to do some analysis later of how useful the errors turn out to be and potentially introduce some methods for attaching context to a generator frame again. This change is very difficult to do in stages, as it is very much an "all or nothing" change that affects huge parts of the codebase. I've tried to isolate changes that can be isolated into the parent CLs of this one, but this change is still quite difficult to wrap one's mind and I'm available to discuss it and explain things to any reviewer. Fixes: b/238, b/237, b/251 and potentially others. Change-Id: I39244163ff5bbecd169fe7b274df19262b515699 Reviewed-on: https://cl.tvl.fyi/c/depot/+/8104 Reviewed-by: raitobezarius <tvl@lahfa.xyz> Reviewed-by: Adam Joseph <adam@westernsemico.com> Tested-by: BuildkiteCI
2023-02-14 15:02:39 +03:00
}
pub async fn coerce_to_string(
self,
co: GenCo,
kind: CoercionKind,
span: LightSpan,
) -> Result<Value, ErrorKind> {
self.coerce_to_string_(&co, kind, span).await
}
/// Coerce a `Value` to a string. See `CoercionKind` for a rundown of what
/// input types are accepted under what circumstances.
pub async fn coerce_to_string_(
self,
co: &GenCo,
kind: CoercionKind,
span: LightSpan,
) -> Result<Value, ErrorKind> {
fix(tvix): Represent strings as byte arrays C++ nix uses C-style zero-terminated char pointers to represent strings internally - however, up to this point, tvix has used Rust `String` and `str` for string values. Since those are required to be valid utf-8, we haven't been able to properly represent all the string values that Nix supports. To fix that, this change converts the internal representation of the NixString struct from `Box<str>` to `BString`, from the `bstr` crate - this is a wrapper around a `Vec<u8>` with extra functions for treating that byte vector as a "morally string-like" value, which is basically exactly what we need. Since this changes a pretty fundamental assumption about a pretty core type, there are a *lot* of changes in a lot of places to make this work, but I've tried to keep the general philosophy and intent of most of the code in most places intact. Most notably, there's nothing that's been done to make the derivation stuff in //tvix/glue work with non-utf8 strings everywhere, instead opting to just convert to String/str when passing things into that - there *might* be something to be done there, but I don't know what the rules should be and I don't want to figure them out in this change. To deal with OS-native paths in a way that also works in WASM for tvixbolt, this also adds a dependency on the "os_str_bytes" crate. Fixes: b/189 Fixes: b/337 Change-Id: I5e6eb29c62f47dd91af954f5e12bfc3d186f5526 Reviewed-on: https://cl.tvl.fyi/c/depot/+/10200 Reviewed-by: tazjin <tazjin@tvl.su> Reviewed-by: flokli <flokli@flokli.de> Reviewed-by: sterni <sternenseemann@systemli.org> Autosubmit: aspen <root@gws.fyi> Tested-by: BuildkiteCI
2023-12-05 17:25:52 -05:00
let mut result = BString::default();
let mut vals = vec![self];
// Track if we are coercing the first value of a list to correctly emit
// separating white spaces.
let mut is_list_head = None;
// FIXME(raitobezarius): as per https://b.tvl.fyi/issues/364
// we might be interested into more powerful context-related coercion kinds.
let mut context: NixContext = NixContext::new();
loop {
let value = if let Some(v) = vals.pop() {
v.force(co, span.clone()).await?
} else {
return Ok(Value::String(Box::new(NixString::new_context_from(
context, result,
))));
};
fix(tvix): Represent strings as byte arrays C++ nix uses C-style zero-terminated char pointers to represent strings internally - however, up to this point, tvix has used Rust `String` and `str` for string values. Since those are required to be valid utf-8, we haven't been able to properly represent all the string values that Nix supports. To fix that, this change converts the internal representation of the NixString struct from `Box<str>` to `BString`, from the `bstr` crate - this is a wrapper around a `Vec<u8>` with extra functions for treating that byte vector as a "morally string-like" value, which is basically exactly what we need. Since this changes a pretty fundamental assumption about a pretty core type, there are a *lot* of changes in a lot of places to make this work, but I've tried to keep the general philosophy and intent of most of the code in most places intact. Most notably, there's nothing that's been done to make the derivation stuff in //tvix/glue work with non-utf8 strings everywhere, instead opting to just convert to String/str when passing things into that - there *might* be something to be done there, but I don't know what the rules should be and I don't want to figure them out in this change. To deal with OS-native paths in a way that also works in WASM for tvixbolt, this also adds a dependency on the "os_str_bytes" crate. Fixes: b/189 Fixes: b/337 Change-Id: I5e6eb29c62f47dd91af954f5e12bfc3d186f5526 Reviewed-on: https://cl.tvl.fyi/c/depot/+/10200 Reviewed-by: tazjin <tazjin@tvl.su> Reviewed-by: flokli <flokli@flokli.de> Reviewed-by: sterni <sternenseemann@systemli.org> Autosubmit: aspen <root@gws.fyi> Tested-by: BuildkiteCI
2023-12-05 17:25:52 -05:00
let coerced: Result<BString, _> = match (value, kind) {
// coercions that are always done
(Value::String(mut s), _) => {
if let Some(ctx) = s.context_mut() {
context = context.join(ctx);
}
Ok((*s).into())
}
// TODO(sterni): Think about proper encoding handling here. This needs
// general consideration anyways, since one current discrepancy between
// C++ Nix and Tvix is that the former's strings are arbitrary byte
// sequences without NUL bytes, whereas Tvix only allows valid
// Unicode. See also b/189.
(
Value::Path(p),
CoercionKind {
import_paths: true, ..
},
) => {
let imported = generators::request_path_import(co, *p).await;
// When we import a path from the evaluator, we must attach
// its original path as its context.
context = context.append(NixContextElement::Plain(
imported.to_string_lossy().to_string(),
));
fix(tvix): Represent strings as byte arrays C++ nix uses C-style zero-terminated char pointers to represent strings internally - however, up to this point, tvix has used Rust `String` and `str` for string values. Since those are required to be valid utf-8, we haven't been able to properly represent all the string values that Nix supports. To fix that, this change converts the internal representation of the NixString struct from `Box<str>` to `BString`, from the `bstr` crate - this is a wrapper around a `Vec<u8>` with extra functions for treating that byte vector as a "morally string-like" value, which is basically exactly what we need. Since this changes a pretty fundamental assumption about a pretty core type, there are a *lot* of changes in a lot of places to make this work, but I've tried to keep the general philosophy and intent of most of the code in most places intact. Most notably, there's nothing that's been done to make the derivation stuff in //tvix/glue work with non-utf8 strings everywhere, instead opting to just convert to String/str when passing things into that - there *might* be something to be done there, but I don't know what the rules should be and I don't want to figure them out in this change. To deal with OS-native paths in a way that also works in WASM for tvixbolt, this also adds a dependency on the "os_str_bytes" crate. Fixes: b/189 Fixes: b/337 Change-Id: I5e6eb29c62f47dd91af954f5e12bfc3d186f5526 Reviewed-on: https://cl.tvl.fyi/c/depot/+/10200 Reviewed-by: tazjin <tazjin@tvl.su> Reviewed-by: flokli <flokli@flokli.de> Reviewed-by: sterni <sternenseemann@systemli.org> Autosubmit: aspen <root@gws.fyi> Tested-by: BuildkiteCI
2023-12-05 17:25:52 -05:00
Ok(imported.into_os_string().into_encoded_bytes().into())
}
(
Value::Path(p),
CoercionKind {
import_paths: false,
..
},
) => Ok(p.into_os_string().into_encoded_bytes().into()),
// Attribute sets can be converted to strings if they either have an
// `__toString` attribute which holds a function that receives the
// set itself or an `outPath` attribute which should be a string.
// `__toString` is preferred.
(Value::Attrs(attrs), kind) => {
if let Some(to_string) = attrs.select("__toString") {
let callable = to_string.clone().force(co, span.clone()).await?;
// Leave the attribute set on the stack as an argument
// to the function call.
generators::request_stack_push(co, Value::Attrs(attrs.clone())).await;
// Call the callable ...
let result = generators::request_call(co, callable).await;
// Recurse on the result, as attribute set coercion
// actually works recursively, e.g. you can even return
// /another/ set with a __toString attr.
vals.push(result);
continue;
} else if let Some(out_path) = attrs.select("outPath") {
vals.push(out_path.clone());
continue;
} else {
return Err(ErrorKind::NotCoercibleToString { from: "set", kind });
}
}
// strong coercions
(Value::Null, CoercionKind { strong: true, .. })
fix(tvix): Represent strings as byte arrays C++ nix uses C-style zero-terminated char pointers to represent strings internally - however, up to this point, tvix has used Rust `String` and `str` for string values. Since those are required to be valid utf-8, we haven't been able to properly represent all the string values that Nix supports. To fix that, this change converts the internal representation of the NixString struct from `Box<str>` to `BString`, from the `bstr` crate - this is a wrapper around a `Vec<u8>` with extra functions for treating that byte vector as a "morally string-like" value, which is basically exactly what we need. Since this changes a pretty fundamental assumption about a pretty core type, there are a *lot* of changes in a lot of places to make this work, but I've tried to keep the general philosophy and intent of most of the code in most places intact. Most notably, there's nothing that's been done to make the derivation stuff in //tvix/glue work with non-utf8 strings everywhere, instead opting to just convert to String/str when passing things into that - there *might* be something to be done there, but I don't know what the rules should be and I don't want to figure them out in this change. To deal with OS-native paths in a way that also works in WASM for tvixbolt, this also adds a dependency on the "os_str_bytes" crate. Fixes: b/189 Fixes: b/337 Change-Id: I5e6eb29c62f47dd91af954f5e12bfc3d186f5526 Reviewed-on: https://cl.tvl.fyi/c/depot/+/10200 Reviewed-by: tazjin <tazjin@tvl.su> Reviewed-by: flokli <flokli@flokli.de> Reviewed-by: sterni <sternenseemann@systemli.org> Autosubmit: aspen <root@gws.fyi> Tested-by: BuildkiteCI
2023-12-05 17:25:52 -05:00
| (Value::Bool(false), CoercionKind { strong: true, .. }) => Ok("".into()),
(Value::Bool(true), CoercionKind { strong: true, .. }) => Ok("1".into()),
refactor(tvix/eval): flatten call stack of VM using generators Warning: This is probably the biggest refactor in tvix-eval history, so far. This replaces all instances of trampolines and recursion during evaluation of the VM loop with generators. A generator is an asynchronous function that can be suspended to yield a message (in our case, vm::generators::GeneratorRequest) and receive a response (vm::generators::GeneratorResponsee). The `genawaiter` crate provides an interpreter for generators that can drive their execution and lets us move control flow between the VM and suspended generators. To do this, massive changes have occured basically everywhere in the code. On a high-level: 1. The VM is now organised around a frame stack. A frame is either a call frame (execution of Tvix bytecode) or a generator frame (a running or suspended generator). The VM has an outer loop that pops a frame off the frame stack, and then enters an inner loop either driving the execution of the bytecode or the execution of a generator. Both types of frames have several branches that can result in the frame re-enqueuing itself, and enqueuing some other work (in the form of a different frame) on top of itself. The VM will eventually resume the frame when everything "above" it has been suspended. In this way, the VM's new frame stack takes over much of the work that was previously achieved by recursion. 2. All methods previously taking a VM have been refactored into async functions that instead emit/receive generator messages for communication with the VM. Notably, this includes *all* builtins. This has had some other effects: - Some test have been removed or commented out, either because they tested code that was mostly already dead (nix_eq) or because they now require generator scaffolding which we do not have in place for tests (yet). - Because generator functions are technically async (though no async IO is involved), we lose the ability to use much of the Rust standard library e.g. in builtins. This has led to many algorithms being unrolled into iterative versions instead of iterator combinations, and things like sorting had to be implemented from scratch. - Many call sites that previously saw a `Result<..., ErrorKind>` bubble up now only see the result value, as the error handling is encapsulated within the generator loop. This reduces number of places inside of builtin implementations where error context can be attached to calls that can fail. Currently what we gain in this tradeoff is significantly more detailed span information (which we still need to bubble up, this commit does not change the error display). We'll need to do some analysis later of how useful the errors turn out to be and potentially introduce some methods for attaching context to a generator frame again. This change is very difficult to do in stages, as it is very much an "all or nothing" change that affects huge parts of the codebase. I've tried to isolate changes that can be isolated into the parent CLs of this one, but this change is still quite difficult to wrap one's mind and I'm available to discuss it and explain things to any reviewer. Fixes: b/238, b/237, b/251 and potentially others. Change-Id: I39244163ff5bbecd169fe7b274df19262b515699 Reviewed-on: https://cl.tvl.fyi/c/depot/+/8104 Reviewed-by: raitobezarius <tvl@lahfa.xyz> Reviewed-by: Adam Joseph <adam@westernsemico.com> Tested-by: BuildkiteCI
2023-02-14 15:02:39 +03:00
fix(tvix): Represent strings as byte arrays C++ nix uses C-style zero-terminated char pointers to represent strings internally - however, up to this point, tvix has used Rust `String` and `str` for string values. Since those are required to be valid utf-8, we haven't been able to properly represent all the string values that Nix supports. To fix that, this change converts the internal representation of the NixString struct from `Box<str>` to `BString`, from the `bstr` crate - this is a wrapper around a `Vec<u8>` with extra functions for treating that byte vector as a "morally string-like" value, which is basically exactly what we need. Since this changes a pretty fundamental assumption about a pretty core type, there are a *lot* of changes in a lot of places to make this work, but I've tried to keep the general philosophy and intent of most of the code in most places intact. Most notably, there's nothing that's been done to make the derivation stuff in //tvix/glue work with non-utf8 strings everywhere, instead opting to just convert to String/str when passing things into that - there *might* be something to be done there, but I don't know what the rules should be and I don't want to figure them out in this change. To deal with OS-native paths in a way that also works in WASM for tvixbolt, this also adds a dependency on the "os_str_bytes" crate. Fixes: b/189 Fixes: b/337 Change-Id: I5e6eb29c62f47dd91af954f5e12bfc3d186f5526 Reviewed-on: https://cl.tvl.fyi/c/depot/+/10200 Reviewed-by: tazjin <tazjin@tvl.su> Reviewed-by: flokli <flokli@flokli.de> Reviewed-by: sterni <sternenseemann@systemli.org> Autosubmit: aspen <root@gws.fyi> Tested-by: BuildkiteCI
2023-12-05 17:25:52 -05:00
(Value::Integer(i), CoercionKind { strong: true, .. }) => Ok(format!("{i}").into()),
(Value::Float(f), CoercionKind { strong: true, .. }) => {
// contrary to normal Display, coercing a float to a string will
// result in unconditional 6 decimal places
fix(tvix): Represent strings as byte arrays C++ nix uses C-style zero-terminated char pointers to represent strings internally - however, up to this point, tvix has used Rust `String` and `str` for string values. Since those are required to be valid utf-8, we haven't been able to properly represent all the string values that Nix supports. To fix that, this change converts the internal representation of the NixString struct from `Box<str>` to `BString`, from the `bstr` crate - this is a wrapper around a `Vec<u8>` with extra functions for treating that byte vector as a "morally string-like" value, which is basically exactly what we need. Since this changes a pretty fundamental assumption about a pretty core type, there are a *lot* of changes in a lot of places to make this work, but I've tried to keep the general philosophy and intent of most of the code in most places intact. Most notably, there's nothing that's been done to make the derivation stuff in //tvix/glue work with non-utf8 strings everywhere, instead opting to just convert to String/str when passing things into that - there *might* be something to be done there, but I don't know what the rules should be and I don't want to figure them out in this change. To deal with OS-native paths in a way that also works in WASM for tvixbolt, this also adds a dependency on the "os_str_bytes" crate. Fixes: b/189 Fixes: b/337 Change-Id: I5e6eb29c62f47dd91af954f5e12bfc3d186f5526 Reviewed-on: https://cl.tvl.fyi/c/depot/+/10200 Reviewed-by: tazjin <tazjin@tvl.su> Reviewed-by: flokli <flokli@flokli.de> Reviewed-by: sterni <sternenseemann@systemli.org> Autosubmit: aspen <root@gws.fyi> Tested-by: BuildkiteCI
2023-12-05 17:25:52 -05:00
Ok(format!("{:.6}", f).into())
}
refactor(tvix/eval): flatten call stack of VM using generators Warning: This is probably the biggest refactor in tvix-eval history, so far. This replaces all instances of trampolines and recursion during evaluation of the VM loop with generators. A generator is an asynchronous function that can be suspended to yield a message (in our case, vm::generators::GeneratorRequest) and receive a response (vm::generators::GeneratorResponsee). The `genawaiter` crate provides an interpreter for generators that can drive their execution and lets us move control flow between the VM and suspended generators. To do this, massive changes have occured basically everywhere in the code. On a high-level: 1. The VM is now organised around a frame stack. A frame is either a call frame (execution of Tvix bytecode) or a generator frame (a running or suspended generator). The VM has an outer loop that pops a frame off the frame stack, and then enters an inner loop either driving the execution of the bytecode or the execution of a generator. Both types of frames have several branches that can result in the frame re-enqueuing itself, and enqueuing some other work (in the form of a different frame) on top of itself. The VM will eventually resume the frame when everything "above" it has been suspended. In this way, the VM's new frame stack takes over much of the work that was previously achieved by recursion. 2. All methods previously taking a VM have been refactored into async functions that instead emit/receive generator messages for communication with the VM. Notably, this includes *all* builtins. This has had some other effects: - Some test have been removed or commented out, either because they tested code that was mostly already dead (nix_eq) or because they now require generator scaffolding which we do not have in place for tests (yet). - Because generator functions are technically async (though no async IO is involved), we lose the ability to use much of the Rust standard library e.g. in builtins. This has led to many algorithms being unrolled into iterative versions instead of iterator combinations, and things like sorting had to be implemented from scratch. - Many call sites that previously saw a `Result<..., ErrorKind>` bubble up now only see the result value, as the error handling is encapsulated within the generator loop. This reduces number of places inside of builtin implementations where error context can be attached to calls that can fail. Currently what we gain in this tradeoff is significantly more detailed span information (which we still need to bubble up, this commit does not change the error display). We'll need to do some analysis later of how useful the errors turn out to be and potentially introduce some methods for attaching context to a generator frame again. This change is very difficult to do in stages, as it is very much an "all or nothing" change that affects huge parts of the codebase. I've tried to isolate changes that can be isolated into the parent CLs of this one, but this change is still quite difficult to wrap one's mind and I'm available to discuss it and explain things to any reviewer. Fixes: b/238, b/237, b/251 and potentially others. Change-Id: I39244163ff5bbecd169fe7b274df19262b515699 Reviewed-on: https://cl.tvl.fyi/c/depot/+/8104 Reviewed-by: raitobezarius <tvl@lahfa.xyz> Reviewed-by: Adam Joseph <adam@westernsemico.com> Tested-by: BuildkiteCI
2023-02-14 15:02:39 +03:00
// Lists are coerced by coercing their elements and interspersing spaces
(Value::List(list), CoercionKind { strong: true, .. }) => {
for elem in list.into_iter().rev() {
vals.push(elem);
}
// In case we are coercing a list within a list we don't want
// to touch this. Since the algorithm is nonrecursive, the
// space would not have been created yet (due to continue).
if is_list_head.is_none() {
is_list_head = Some(true);
}
continue;
refactor(tvix/eval): flatten call stack of VM using generators Warning: This is probably the biggest refactor in tvix-eval history, so far. This replaces all instances of trampolines and recursion during evaluation of the VM loop with generators. A generator is an asynchronous function that can be suspended to yield a message (in our case, vm::generators::GeneratorRequest) and receive a response (vm::generators::GeneratorResponsee). The `genawaiter` crate provides an interpreter for generators that can drive their execution and lets us move control flow between the VM and suspended generators. To do this, massive changes have occured basically everywhere in the code. On a high-level: 1. The VM is now organised around a frame stack. A frame is either a call frame (execution of Tvix bytecode) or a generator frame (a running or suspended generator). The VM has an outer loop that pops a frame off the frame stack, and then enters an inner loop either driving the execution of the bytecode or the execution of a generator. Both types of frames have several branches that can result in the frame re-enqueuing itself, and enqueuing some other work (in the form of a different frame) on top of itself. The VM will eventually resume the frame when everything "above" it has been suspended. In this way, the VM's new frame stack takes over much of the work that was previously achieved by recursion. 2. All methods previously taking a VM have been refactored into async functions that instead emit/receive generator messages for communication with the VM. Notably, this includes *all* builtins. This has had some other effects: - Some test have been removed or commented out, either because they tested code that was mostly already dead (nix_eq) or because they now require generator scaffolding which we do not have in place for tests (yet). - Because generator functions are technically async (though no async IO is involved), we lose the ability to use much of the Rust standard library e.g. in builtins. This has led to many algorithms being unrolled into iterative versions instead of iterator combinations, and things like sorting had to be implemented from scratch. - Many call sites that previously saw a `Result<..., ErrorKind>` bubble up now only see the result value, as the error handling is encapsulated within the generator loop. This reduces number of places inside of builtin implementations where error context can be attached to calls that can fail. Currently what we gain in this tradeoff is significantly more detailed span information (which we still need to bubble up, this commit does not change the error display). We'll need to do some analysis later of how useful the errors turn out to be and potentially introduce some methods for attaching context to a generator frame again. This change is very difficult to do in stages, as it is very much an "all or nothing" change that affects huge parts of the codebase. I've tried to isolate changes that can be isolated into the parent CLs of this one, but this change is still quite difficult to wrap one's mind and I'm available to discuss it and explain things to any reviewer. Fixes: b/238, b/237, b/251 and potentially others. Change-Id: I39244163ff5bbecd169fe7b274df19262b515699 Reviewed-on: https://cl.tvl.fyi/c/depot/+/8104 Reviewed-by: raitobezarius <tvl@lahfa.xyz> Reviewed-by: Adam Joseph <adam@westernsemico.com> Tested-by: BuildkiteCI
2023-02-14 15:02:39 +03:00
}
(Value::Thunk(_), _) => panic!("Tvix bug: force returned unforced thunk"),
val @ (Value::Closure(_), _)
| val @ (Value::Builtin(_), _)
| val @ (Value::Null, _)
| val @ (Value::Bool(_), _)
| val @ (Value::Integer(_), _)
| val @ (Value::Float(_), _)
| val @ (Value::List(_), _) => Err(ErrorKind::NotCoercibleToString {
from: val.0.type_of(),
kind,
}),
(c @ Value::Catchable(_), _) => return Ok(c),
(Value::AttrNotFound, _)
| (Value::Blueprint(_), _)
| (Value::DeferredUpvalue(_), _)
| (Value::UnresolvedPath(_), _)
| (Value::Json(_), _)
| (Value::FinaliseRequest(_), _) => {
panic!("tvix bug: .coerce_to_string() called on internal value")
}
};
if let Some(head) = is_list_head {
if !head {
fix(tvix): Represent strings as byte arrays C++ nix uses C-style zero-terminated char pointers to represent strings internally - however, up to this point, tvix has used Rust `String` and `str` for string values. Since those are required to be valid utf-8, we haven't been able to properly represent all the string values that Nix supports. To fix that, this change converts the internal representation of the NixString struct from `Box<str>` to `BString`, from the `bstr` crate - this is a wrapper around a `Vec<u8>` with extra functions for treating that byte vector as a "morally string-like" value, which is basically exactly what we need. Since this changes a pretty fundamental assumption about a pretty core type, there are a *lot* of changes in a lot of places to make this work, but I've tried to keep the general philosophy and intent of most of the code in most places intact. Most notably, there's nothing that's been done to make the derivation stuff in //tvix/glue work with non-utf8 strings everywhere, instead opting to just convert to String/str when passing things into that - there *might* be something to be done there, but I don't know what the rules should be and I don't want to figure them out in this change. To deal with OS-native paths in a way that also works in WASM for tvixbolt, this also adds a dependency on the "os_str_bytes" crate. Fixes: b/189 Fixes: b/337 Change-Id: I5e6eb29c62f47dd91af954f5e12bfc3d186f5526 Reviewed-on: https://cl.tvl.fyi/c/depot/+/10200 Reviewed-by: tazjin <tazjin@tvl.su> Reviewed-by: flokli <flokli@flokli.de> Reviewed-by: sterni <sternenseemann@systemli.org> Autosubmit: aspen <root@gws.fyi> Tested-by: BuildkiteCI
2023-12-05 17:25:52 -05:00
result.push(b' ');
} else {
is_list_head = Some(false);
}
}
result.push_str(&coerced?);
}
}
pub(crate) async fn nix_eq_owned_genco(
self,
other: Value,
co: GenCo,
ptr_eq: PointerEquality,
span: LightSpan,
) -> Result<Value, ErrorKind> {
self.nix_eq(other, &co, ptr_eq, span).await
}
/// Compare two Nix values for equality, forcing nested parts of the structure
/// as needed.
///
/// This comparison needs to be invoked for nested values (e.g. in lists and
/// attribute sets) as well, which is done by suspending and asking the VM to
/// perform the nested comparison.
///
/// The `top_level` parameter controls whether this invocation is the top-level
/// comparison, or a nested value comparison. See
/// `//tvix/docs/value-pointer-equality.md`
refactor(tvix/eval): flatten call stack of VM using generators Warning: This is probably the biggest refactor in tvix-eval history, so far. This replaces all instances of trampolines and recursion during evaluation of the VM loop with generators. A generator is an asynchronous function that can be suspended to yield a message (in our case, vm::generators::GeneratorRequest) and receive a response (vm::generators::GeneratorResponsee). The `genawaiter` crate provides an interpreter for generators that can drive their execution and lets us move control flow between the VM and suspended generators. To do this, massive changes have occured basically everywhere in the code. On a high-level: 1. The VM is now organised around a frame stack. A frame is either a call frame (execution of Tvix bytecode) or a generator frame (a running or suspended generator). The VM has an outer loop that pops a frame off the frame stack, and then enters an inner loop either driving the execution of the bytecode or the execution of a generator. Both types of frames have several branches that can result in the frame re-enqueuing itself, and enqueuing some other work (in the form of a different frame) on top of itself. The VM will eventually resume the frame when everything "above" it has been suspended. In this way, the VM's new frame stack takes over much of the work that was previously achieved by recursion. 2. All methods previously taking a VM have been refactored into async functions that instead emit/receive generator messages for communication with the VM. Notably, this includes *all* builtins. This has had some other effects: - Some test have been removed or commented out, either because they tested code that was mostly already dead (nix_eq) or because they now require generator scaffolding which we do not have in place for tests (yet). - Because generator functions are technically async (though no async IO is involved), we lose the ability to use much of the Rust standard library e.g. in builtins. This has led to many algorithms being unrolled into iterative versions instead of iterator combinations, and things like sorting had to be implemented from scratch. - Many call sites that previously saw a `Result<..., ErrorKind>` bubble up now only see the result value, as the error handling is encapsulated within the generator loop. This reduces number of places inside of builtin implementations where error context can be attached to calls that can fail. Currently what we gain in this tradeoff is significantly more detailed span information (which we still need to bubble up, this commit does not change the error display). We'll need to do some analysis later of how useful the errors turn out to be and potentially introduce some methods for attaching context to a generator frame again. This change is very difficult to do in stages, as it is very much an "all or nothing" change that affects huge parts of the codebase. I've tried to isolate changes that can be isolated into the parent CLs of this one, but this change is still quite difficult to wrap one's mind and I'm available to discuss it and explain things to any reviewer. Fixes: b/238, b/237, b/251 and potentially others. Change-Id: I39244163ff5bbecd169fe7b274df19262b515699 Reviewed-on: https://cl.tvl.fyi/c/depot/+/8104 Reviewed-by: raitobezarius <tvl@lahfa.xyz> Reviewed-by: Adam Joseph <adam@westernsemico.com> Tested-by: BuildkiteCI
2023-02-14 15:02:39 +03:00
pub(crate) async fn nix_eq(
self,
other: Value,
co: &GenCo,
ptr_eq: PointerEquality,
span: LightSpan,
) -> Result<Value, ErrorKind> {
// this is a stack of ((v1,v2),peq) triples to be compared;
// after each triple is popped off of the stack, v1 is
// compared to v2 using peq-mode PointerEquality
let mut vals = vec![((self, other), ptr_eq)];
loop {
let ((a, b), ptr_eq) = if let Some(abp) = vals.pop() {
abp
} else {
// stack is empty, so comparison has succeeded
return Ok(Value::Bool(true));
};
let a = match a {
Value::Thunk(thunk) => {
// If both values are thunks, and thunk comparisons are allowed by
// pointer, do that and move on.
if ptr_eq == PointerEquality::AllowAll {
if let Value::Thunk(t1) = &b {
if t1.ptr_eq(&thunk) {
continue;
}
}
};
Thunk::force_(thunk, co, span.clone()).await?
}
_ => a,
};
let b = b.force(co, span.clone()).await?;
debug_assert!(!matches!(a, Value::Thunk(_)));
debug_assert!(!matches!(b, Value::Thunk(_)));
let result = match (a, b) {
// Trivial comparisons
(c @ Value::Catchable(_), _) => return Ok(c),
(_, c @ Value::Catchable(_)) => return Ok(c),
(Value::Null, Value::Null) => true,
(Value::Bool(b1), Value::Bool(b2)) => b1 == b2,
(Value::String(s1), Value::String(s2)) => s1 == s2,
(Value::Path(p1), Value::Path(p2)) => p1 == p2,
// Numerical comparisons (they work between float & int)
(Value::Integer(i1), Value::Integer(i2)) => i1 == i2,
(Value::Integer(i), Value::Float(f)) => i as f64 == f,
(Value::Float(f1), Value::Float(f2)) => f1 == f2,
(Value::Float(f), Value::Integer(i)) => i as f64 == f,
// List comparisons
(Value::List(l1), Value::List(l2)) => {
if ptr_eq >= PointerEquality::AllowNested && l1.ptr_eq(&l2) {
continue;
}
if l1.len() != l2.len() {
return Ok(Value::Bool(false));
}
vals.extend(l1.into_iter().rev().zip(l2.into_iter().rev()).zip(
std::iter::repeat(std::cmp::max(ptr_eq, PointerEquality::AllowNested)),
));
continue;
}
(_, Value::List(_)) | (Value::List(_), _) => return Ok(Value::Bool(false)),
// Attribute set comparisons
(Value::Attrs(a1), Value::Attrs(a2)) => {
if ptr_eq >= PointerEquality::AllowNested && a1.ptr_eq(&a2) {
continue;
}
// Special-case for derivation comparisons: If both attribute sets
// have `type = derivation`, compare them by `outPath`.
#[allow(clippy::single_match)] // might need more match arms later
match (a1.select("type"), a2.select("type")) {
(Some(v1), Some(v2)) => {
let s1 = v1.clone().force(co, span.clone()).await?;
if s1.is_catchable() {
return Ok(s1);
}
let s2 = v2.clone().force(co, span.clone()).await?;
if s2.is_catchable() {
return Ok(s2);
}
let s1 = s1.to_str();
let s2 = s2.to_str();
if let (Ok(s1), Ok(s2)) = (s1, s2) {
fix(tvix): Represent strings as byte arrays C++ nix uses C-style zero-terminated char pointers to represent strings internally - however, up to this point, tvix has used Rust `String` and `str` for string values. Since those are required to be valid utf-8, we haven't been able to properly represent all the string values that Nix supports. To fix that, this change converts the internal representation of the NixString struct from `Box<str>` to `BString`, from the `bstr` crate - this is a wrapper around a `Vec<u8>` with extra functions for treating that byte vector as a "morally string-like" value, which is basically exactly what we need. Since this changes a pretty fundamental assumption about a pretty core type, there are a *lot* of changes in a lot of places to make this work, but I've tried to keep the general philosophy and intent of most of the code in most places intact. Most notably, there's nothing that's been done to make the derivation stuff in //tvix/glue work with non-utf8 strings everywhere, instead opting to just convert to String/str when passing things into that - there *might* be something to be done there, but I don't know what the rules should be and I don't want to figure them out in this change. To deal with OS-native paths in a way that also works in WASM for tvixbolt, this also adds a dependency on the "os_str_bytes" crate. Fixes: b/189 Fixes: b/337 Change-Id: I5e6eb29c62f47dd91af954f5e12bfc3d186f5526 Reviewed-on: https://cl.tvl.fyi/c/depot/+/10200 Reviewed-by: tazjin <tazjin@tvl.su> Reviewed-by: flokli <flokli@flokli.de> Reviewed-by: sterni <sternenseemann@systemli.org> Autosubmit: aspen <root@gws.fyi> Tested-by: BuildkiteCI
2023-12-05 17:25:52 -05:00
if s1 == "derivation" && s2 == "derivation" {
// TODO(tazjin): are the outPaths really required,
// or should it fall through?
let out1 = a1
.select_required("outPath")
.context("comparing derivations")?
.clone();
let out2 = a2
.select_required("outPath")
.context("comparing derivations")?
.clone();
let out1 = out1.clone().force(co, span.clone()).await?;
let out2 = out2.clone().force(co, span.clone()).await?;
if out1.is_catchable() {
return Ok(out1);
}
if out2.is_catchable() {
return Ok(out2);
}
let result =
out1.to_contextful_str()? == out2.to_contextful_str()?;
if !result {
return Ok(Value::Bool(false));
} else {
continue;
}
}
}
}
_ => {}
};
if a1.len() != a2.len() {
return Ok(Value::Bool(false));
}
// note that it is important to be careful here with the
// order we push the keys and values in order to properly
// compare attrsets containing `throw` elements.
let iter1 = a1.into_iter_sorted().rev();
let iter2 = a2.into_iter_sorted().rev();
for ((k1, v1), (k2, v2)) in iter1.zip(iter2) {
vals.push((
(v1, v2),
std::cmp::max(ptr_eq, PointerEquality::AllowNested),
));
vals.push((
(k1.into(), k2.into()),
std::cmp::max(ptr_eq, PointerEquality::AllowNested),
));
}
continue;
}
(Value::Attrs(_), _) | (_, Value::Attrs(_)) => return Ok(Value::Bool(false)),
(Value::Closure(c1), Value::Closure(c2))
if ptr_eq >= PointerEquality::AllowNested =>
{
if Rc::ptr_eq(&c1, &c2) {
continue;
} else {
return Ok(Value::Bool(false));
}
}
// Everything else is either incomparable (e.g. internal types) or
// false.
_ => return Ok(Value::Bool(false)),
};
if !result {
return Ok(Value::Bool(false));
}
}
}
pub fn type_of(&self) -> &'static str {
match self {
Value::Null => "null",
Value::Bool(_) => "bool",
Value::Integer(_) => "int",
Value::Float(_) => "float",
Value::String(_) => "string",
Value::Path(_) => "path",
Value::Attrs(_) => "set",
Value::List(_) => "list",
Value::Closure(_) | Value::Builtin(_) => "lambda",
// Internal types. Note: These are only elaborated here
// because it makes debugging easier. If a user ever sees
// any of these strings, it's a bug.
Value::Thunk(_) => "internal[thunk]",
Value::AttrNotFound => "internal[attr_not_found]",
Value::Blueprint(_) => "internal[blueprint]",
Value::DeferredUpvalue(_) => "internal[deferred_upvalue]",
Value::UnresolvedPath(_) => "internal[unresolved_path]",
Value::Json(_) => "internal[json]",
fix(tvix/eval): only finalise formal arguments if defaulting When dealing with a formal argument in a function argument pattern that has a default expression, there are two different things that can happen at runtime: Either we select its value from the passed attribute successfully or we need to use the default expression. Both of these may be thunks and both of these may need finalisers. However, in the former case this is taken care of elsewhere, the value will always be finalised already if necessary. In the latter case we may need to finalise the thunk resulting from the default expression. However, the thunk corresponding to the expression may never end up in the local's stack slot. Since finalisation goes by stack slot (and not constants), we need to prevent a case where we don't fall back to the default expression, but finalise anyways. Previously, we worked around this by making `OpFinalise` ignore non-thunks. Since finalisation of already evaluated thunks still crashed, the faulty compilation of function pattern arguments could still cause a crash. As a new approach, we reinstate the old behavior of `OpFinalise` to crash whenever encountering something that is either not a thunk or doesn't need finalisation. This can also help catching (similar) miscompilations in the future. To then prevent the crash, we need to track whether we have fallen back or not at runtime. This is done using an additional phantom on the stack that holds a new `FinaliseRequest` value. When it comes to finalisation we check this value and conditionally execute `OpFinalise` based on its value. Resolves b/261 and b/265 (partially). Change-Id: Ic04fb80ec671a2ba11fa645090769c335fb7f58b Reviewed-on: https://cl.tvl.fyi/c/depot/+/8705 Reviewed-by: tazjin <tazjin@tvl.su> Tested-by: BuildkiteCI Autosubmit: sterni <sternenseemann@systemli.org>
2023-06-03 02:10:31 +02:00
Value::FinaliseRequest(_) => "internal[finaliser_sentinel]",
Value::Catchable(_) => "internal[catchable]",
}
}
gen_cast!(as_bool, bool, "bool", Value::Bool(b), *b);
gen_cast!(as_int, i64, "int", Value::Integer(x), *x);
gen_cast!(as_float, f64, "float", Value::Float(x), *x);
/// Cast the current value into a **context-less** string.
/// If you wanted to cast it into a potentially contextful string,
/// you have to explicitly use `to_contextful_str`.
/// Contextful strings are special, they should not be obtained
/// everytime you want a string.
pub fn to_str(&self) -> Result<NixString, ErrorKind> {
match self {
Value::String(s) if !s.has_context() => Ok((**s).clone()),
Value::Thunk(thunk) => Self::to_str(&thunk.value()),
other => Err(type_error("contextless strings", other)),
}
}
gen_cast!(
to_contextful_str,
NixString,
"contextful string",
Value::String(s),
(**s).clone()
);
gen_cast!(to_path, Box<PathBuf>, "path", Value::Path(p), p.clone());
gen_cast!(to_attrs, Box<NixAttrs>, "set", Value::Attrs(a), a.clone());
gen_cast!(to_list, NixList, "list", Value::List(l), l.clone());
gen_cast!(
as_closure,
Rc<Closure>,
"lambda",
Value::Closure(c),
c.clone()
);
gen_cast_mut!(as_list_mut, NixList, "list", List);
gen_is!(is_path, Value::Path(_));
gen_is!(is_number, Value::Integer(_) | Value::Float(_));
gen_is!(is_bool, Value::Bool(_));
gen_is!(is_attrs, Value::Attrs(_));
gen_is!(is_catchable, Value::Catchable(_));
/// Returns `true` if the value is a [`Thunk`].
///
/// [`Thunk`]: Value::Thunk
pub fn is_thunk(&self) -> bool {
matches!(self, Self::Thunk(..))
}
/// Compare `self` against other using (fallible) Nix ordering semantics.
refactor(tvix/eval): flatten call stack of VM using generators Warning: This is probably the biggest refactor in tvix-eval history, so far. This replaces all instances of trampolines and recursion during evaluation of the VM loop with generators. A generator is an asynchronous function that can be suspended to yield a message (in our case, vm::generators::GeneratorRequest) and receive a response (vm::generators::GeneratorResponsee). The `genawaiter` crate provides an interpreter for generators that can drive their execution and lets us move control flow between the VM and suspended generators. To do this, massive changes have occured basically everywhere in the code. On a high-level: 1. The VM is now organised around a frame stack. A frame is either a call frame (execution of Tvix bytecode) or a generator frame (a running or suspended generator). The VM has an outer loop that pops a frame off the frame stack, and then enters an inner loop either driving the execution of the bytecode or the execution of a generator. Both types of frames have several branches that can result in the frame re-enqueuing itself, and enqueuing some other work (in the form of a different frame) on top of itself. The VM will eventually resume the frame when everything "above" it has been suspended. In this way, the VM's new frame stack takes over much of the work that was previously achieved by recursion. 2. All methods previously taking a VM have been refactored into async functions that instead emit/receive generator messages for communication with the VM. Notably, this includes *all* builtins. This has had some other effects: - Some test have been removed or commented out, either because they tested code that was mostly already dead (nix_eq) or because they now require generator scaffolding which we do not have in place for tests (yet). - Because generator functions are technically async (though no async IO is involved), we lose the ability to use much of the Rust standard library e.g. in builtins. This has led to many algorithms being unrolled into iterative versions instead of iterator combinations, and things like sorting had to be implemented from scratch. - Many call sites that previously saw a `Result<..., ErrorKind>` bubble up now only see the result value, as the error handling is encapsulated within the generator loop. This reduces number of places inside of builtin implementations where error context can be attached to calls that can fail. Currently what we gain in this tradeoff is significantly more detailed span information (which we still need to bubble up, this commit does not change the error display). We'll need to do some analysis later of how useful the errors turn out to be and potentially introduce some methods for attaching context to a generator frame again. This change is very difficult to do in stages, as it is very much an "all or nothing" change that affects huge parts of the codebase. I've tried to isolate changes that can be isolated into the parent CLs of this one, but this change is still quite difficult to wrap one's mind and I'm available to discuss it and explain things to any reviewer. Fixes: b/238, b/237, b/251 and potentially others. Change-Id: I39244163ff5bbecd169fe7b274df19262b515699 Reviewed-on: https://cl.tvl.fyi/c/depot/+/8104 Reviewed-by: raitobezarius <tvl@lahfa.xyz> Reviewed-by: Adam Joseph <adam@westernsemico.com> Tested-by: BuildkiteCI
2023-02-14 15:02:39 +03:00
///
/// The function is intended to be used from within other generator
/// functions or `gen!` blocks.
pub async fn nix_cmp_ordering(
self,
other: Self,
co: GenCo,
span: LightSpan,
) -> Result<Result<Ordering, CatchableErrorKind>, ErrorKind> {
Self::nix_cmp_ordering_(self, other, co, span).await
}
async fn nix_cmp_ordering_(
myself: Self,
other: Self,
co: GenCo,
span: LightSpan,
) -> Result<Result<Ordering, CatchableErrorKind>, ErrorKind> {
// this is a stack of ((v1,v2),peq) triples to be compared;
// after each triple is popped off of the stack, v1 is
// compared to v2 using peq-mode PointerEquality
let mut vals = vec![((myself, other), PointerEquality::ForbidAll)];
loop {
let ((mut a, mut b), ptr_eq) = if let Some(abp) = vals.pop() {
abp
} else {
// stack is empty, so they are equal
return Ok(Ok(Ordering::Equal));
};
if ptr_eq == PointerEquality::AllowAll {
if a.clone()
.nix_eq(b.clone(), &co, PointerEquality::AllowAll, span.clone())
.await?
.as_bool()?
{
continue;
}
a = a.force(&co, span.clone()).await?;
b = b.force(&co, span.clone()).await?;
}
let result = match (a, b) {
(Value::Catchable(c), _) => return Ok(Err(*c)),
(_, Value::Catchable(c)) => return Ok(Err(*c)),
// same types
(Value::Integer(i1), Value::Integer(i2)) => i1.cmp(&i2),
(Value::Float(f1), Value::Float(f2)) => f1.total_cmp(&f2),
(Value::String(s1), Value::String(s2)) => s1.cmp(&s2),
(Value::List(l1), Value::List(l2)) => {
let max = l1.len().max(l2.len());
for j in 0..max {
let i = max - 1 - j;
if i >= l2.len() {
vals.push(((1.into(), 0.into()), PointerEquality::ForbidAll));
} else if i >= l1.len() {
vals.push(((0.into(), 1.into()), PointerEquality::ForbidAll));
} else {
vals.push(((l1[i].clone(), l2[i].clone()), PointerEquality::AllowAll));
}
}
continue;
}
// different types
(Value::Integer(i1), Value::Float(f2)) => (i1 as f64).total_cmp(&f2),
(Value::Float(f1), Value::Integer(i2)) => f1.total_cmp(&(i2 as f64)),
// unsupported types
(lhs, rhs) => {
return Err(ErrorKind::Incomparable {
lhs: lhs.type_of(),
rhs: rhs.type_of(),
})
}
};
if result != Ordering::Equal {
return Ok(Ok(result));
}
}
}
chore(tvix/eval): mark async functions which are called by the VM Given Rust's current lack of support for tail calls, we cannot avoid using `async` for builtins. This is the only way to avoid overflowing the cpu stack when we have arbitrarily deep builtin/interpreted/builtin/interpreted/... "sandwiches" There are only five `async fn` functions which are not builtins (some come in multiple "flavors"): - add_values - resolve_with - force, final_deep_force - nix_eq, nix_cmp_eq - coerce_to_string These can be written iteratively rather than recursively (and in fact nix_eq used to be written that way!). I volunteer to rewrite them. If written iteratively they would no longer need to be `async`. There are two motivations for limiting our reliance on `async` to only the situation (builtins) where we have no other choice: 1. Performance. We don't really have any good measurement of the performance hit that the Box<dyn Future>s impose on us. Right now all of our large (nixpkgs-eval) tests are swamped by the cost of other things (e.g. fork()ing `nix-store`) so we can't really measure it. Builtins tend to be expensive operations anyways (regexp-matching, sorting, etc) that are likely to already cost more than the `async` overhead. 2. Preserving the ability to switch to `musttail` calls. Clang/LLVM recently got `musttail` (mandatory-elimination tail calls). Rust has refused to add this mainly because WASM doesn't support, but WASM `tail_call` has been implemented and was recently moved to phase 4 (standardization). It is very likely that Rust will get tail calls sometime in the next year; if it does, we won't need async anymore. In the meantime, I'd like to avoid adding any further reliance on `async` in places where it wouldn't be straightforward to replace it with a tail call. https://reviews.llvm.org/D99517 https://github.com/WebAssembly/proposals/pull/157 https: //github.com/rust-lang/rfcs/issues/2691#issuecomment-1462152908 Change-Id: Id15945d5a92bf52c16d93456e3437f91d93bdc57 Reviewed-on: https://cl.tvl.fyi/c/depot/+/8290 Reviewed-by: tazjin <tazjin@tvl.su> Tested-by: BuildkiteCI Autosubmit: Adam Joseph <adam@westernsemico.com>
2023-03-13 05:58:33 -07:00
// TODO(amjoseph): de-asyncify this (when called directly by the VM)
pub async fn force(self, co: &GenCo, span: LightSpan) -> Result<Value, ErrorKind> {
if let Value::Thunk(thunk) = self {
// TODO(amjoseph): use #[tailcall::mutual]
return Thunk::force_(thunk, co, span).await;
}
Ok(self)
}
// need two flavors, because async
pub async fn force_owned_genco(self, co: GenCo, span: LightSpan) -> Result<Value, ErrorKind> {
refactor(tvix/eval): flatten call stack of VM using generators Warning: This is probably the biggest refactor in tvix-eval history, so far. This replaces all instances of trampolines and recursion during evaluation of the VM loop with generators. A generator is an asynchronous function that can be suspended to yield a message (in our case, vm::generators::GeneratorRequest) and receive a response (vm::generators::GeneratorResponsee). The `genawaiter` crate provides an interpreter for generators that can drive their execution and lets us move control flow between the VM and suspended generators. To do this, massive changes have occured basically everywhere in the code. On a high-level: 1. The VM is now organised around a frame stack. A frame is either a call frame (execution of Tvix bytecode) or a generator frame (a running or suspended generator). The VM has an outer loop that pops a frame off the frame stack, and then enters an inner loop either driving the execution of the bytecode or the execution of a generator. Both types of frames have several branches that can result in the frame re-enqueuing itself, and enqueuing some other work (in the form of a different frame) on top of itself. The VM will eventually resume the frame when everything "above" it has been suspended. In this way, the VM's new frame stack takes over much of the work that was previously achieved by recursion. 2. All methods previously taking a VM have been refactored into async functions that instead emit/receive generator messages for communication with the VM. Notably, this includes *all* builtins. This has had some other effects: - Some test have been removed or commented out, either because they tested code that was mostly already dead (nix_eq) or because they now require generator scaffolding which we do not have in place for tests (yet). - Because generator functions are technically async (though no async IO is involved), we lose the ability to use much of the Rust standard library e.g. in builtins. This has led to many algorithms being unrolled into iterative versions instead of iterator combinations, and things like sorting had to be implemented from scratch. - Many call sites that previously saw a `Result<..., ErrorKind>` bubble up now only see the result value, as the error handling is encapsulated within the generator loop. This reduces number of places inside of builtin implementations where error context can be attached to calls that can fail. Currently what we gain in this tradeoff is significantly more detailed span information (which we still need to bubble up, this commit does not change the error display). We'll need to do some analysis later of how useful the errors turn out to be and potentially introduce some methods for attaching context to a generator frame again. This change is very difficult to do in stages, as it is very much an "all or nothing" change that affects huge parts of the codebase. I've tried to isolate changes that can be isolated into the parent CLs of this one, but this change is still quite difficult to wrap one's mind and I'm available to discuss it and explain things to any reviewer. Fixes: b/238, b/237, b/251 and potentially others. Change-Id: I39244163ff5bbecd169fe7b274df19262b515699 Reviewed-on: https://cl.tvl.fyi/c/depot/+/8104 Reviewed-by: raitobezarius <tvl@lahfa.xyz> Reviewed-by: Adam Joseph <adam@westernsemico.com> Tested-by: BuildkiteCI
2023-02-14 15:02:39 +03:00
if let Value::Thunk(thunk) = self {
feat(tvix/eval): rewrite Thunk::force() in nonrecursive form This commit rewrites Thunk::force() so that it is not (directly) self-recursive. It maintains a Vec of all the previously-encountered thunks which point to the one it is currently forcing, rather than recursively calling itself. Benefits: - Short term: This commit saves the cost of a round-trip through the generator machinery for the generators::request_force() which is removed by this commit. - Medium term: Once a similar transformation has been applied to nix_cmp(), nix_add(), nix_eq(), and coerce_to_string(), those four functions, along with Thunk::force(), will make non-tail calls only to each other. They can then be merged into a single tail-recursive function which does not use the generator machinery at all: enum Task { Cmp, Add, Eq, CoerceToString, Force}; fn Value::walk(task:Task, v1:Value, v2:Value) { // ... - Long term: The long-term goal here is to use generators **only for builtins** and [Marionette]-style remote control of the VM. In other words: use `async` for things that actually involve concurrency. Calls from the VM to builtins can then be blocking calls, because even cppnix will overflow the stack if you make a MAX_STACK_DEPTH-deep recursive call which passes through a builtin at every stack frame (e.g. `{ func = builtins.sort (a: b: ... func ...) ...}`). This way the inner "tight loop" of the interpreter doesn't pay the costs of `async` and generators. These costs manifest in terms of: performance, complex nonlocal control flow, and language impediments (async Rust is a restricted subset of real Rust, and is missing things like traits). [Marionette]: https://firefox-source-docs.mozilla.org/testing/marionette/Intro.html Change-Id: I6179b8abb2ea0492180fcb347f37595a14665777 Reviewed-on: https://cl.tvl.fyi/c/depot/+/10039 Reviewed-by: tazjin <tazjin@tvl.su> Tested-by: BuildkiteCI
2023-11-14 10:23:42 -08:00
// TODO(amjoseph): use #[tailcall::mutual]
return Thunk::force_(thunk, &co, span).await;
}
refactor(tvix/eval): flatten call stack of VM using generators Warning: This is probably the biggest refactor in tvix-eval history, so far. This replaces all instances of trampolines and recursion during evaluation of the VM loop with generators. A generator is an asynchronous function that can be suspended to yield a message (in our case, vm::generators::GeneratorRequest) and receive a response (vm::generators::GeneratorResponsee). The `genawaiter` crate provides an interpreter for generators that can drive their execution and lets us move control flow between the VM and suspended generators. To do this, massive changes have occured basically everywhere in the code. On a high-level: 1. The VM is now organised around a frame stack. A frame is either a call frame (execution of Tvix bytecode) or a generator frame (a running or suspended generator). The VM has an outer loop that pops a frame off the frame stack, and then enters an inner loop either driving the execution of the bytecode or the execution of a generator. Both types of frames have several branches that can result in the frame re-enqueuing itself, and enqueuing some other work (in the form of a different frame) on top of itself. The VM will eventually resume the frame when everything "above" it has been suspended. In this way, the VM's new frame stack takes over much of the work that was previously achieved by recursion. 2. All methods previously taking a VM have been refactored into async functions that instead emit/receive generator messages for communication with the VM. Notably, this includes *all* builtins. This has had some other effects: - Some test have been removed or commented out, either because they tested code that was mostly already dead (nix_eq) or because they now require generator scaffolding which we do not have in place for tests (yet). - Because generator functions are technically async (though no async IO is involved), we lose the ability to use much of the Rust standard library e.g. in builtins. This has led to many algorithms being unrolled into iterative versions instead of iterator combinations, and things like sorting had to be implemented from scratch. - Many call sites that previously saw a `Result<..., ErrorKind>` bubble up now only see the result value, as the error handling is encapsulated within the generator loop. This reduces number of places inside of builtin implementations where error context can be attached to calls that can fail. Currently what we gain in this tradeoff is significantly more detailed span information (which we still need to bubble up, this commit does not change the error display). We'll need to do some analysis later of how useful the errors turn out to be and potentially introduce some methods for attaching context to a generator frame again. This change is very difficult to do in stages, as it is very much an "all or nothing" change that affects huge parts of the codebase. I've tried to isolate changes that can be isolated into the parent CLs of this one, but this change is still quite difficult to wrap one's mind and I'm available to discuss it and explain things to any reviewer. Fixes: b/238, b/237, b/251 and potentially others. Change-Id: I39244163ff5bbecd169fe7b274df19262b515699 Reviewed-on: https://cl.tvl.fyi/c/depot/+/8104 Reviewed-by: raitobezarius <tvl@lahfa.xyz> Reviewed-by: Adam Joseph <adam@westernsemico.com> Tested-by: BuildkiteCI
2023-02-14 15:02:39 +03:00
Ok(self)
}
/// Explain a value in a human-readable way, e.g. by presenting
/// the docstrings of functions if present.
pub fn explain(&self) -> String {
match self {
Value::Null => "the 'null' value".into(),
Value::Bool(b) => format!("the boolean value '{}'", b),
Value::Integer(i) => format!("the integer '{}'", i),
Value::Float(f) => format!("the float '{}'", f),
Value::String(s) if s.has_context() => format!("the contextful string '{}'", s),
Value::String(s) => format!("the contextless string '{}'", s),
Value::Path(p) => format!("the path '{}'", p.to_string_lossy()),
Value::Attrs(attrs) => format!("a {}-item attribute set", attrs.len()),
Value::List(list) => format!("a {}-item list", list.len()),
Value::Closure(f) => {
if let Some(name) = &f.lambda.name {
format!("the user-defined Nix function '{}'", name)
} else {
"a user-defined Nix function".to_string()
}
}
Value::Builtin(b) => {
let mut out = format!("the builtin function '{}'", b.name());
if let Some(docs) = b.documentation() {
out.push_str("\n\n");
out.push_str(docs);
}
out
}
// TODO: handle suspended thunks with a different explanation instead of panicking
Value::Thunk(t) => t.value().explain(),
Value::Catchable(_) => "a catchable failure".into(),
Value::AttrNotFound
| Value::Blueprint(_)
| Value::DeferredUpvalue(_)
| Value::UnresolvedPath(_)
fix(tvix/eval): only finalise formal arguments if defaulting When dealing with a formal argument in a function argument pattern that has a default expression, there are two different things that can happen at runtime: Either we select its value from the passed attribute successfully or we need to use the default expression. Both of these may be thunks and both of these may need finalisers. However, in the former case this is taken care of elsewhere, the value will always be finalised already if necessary. In the latter case we may need to finalise the thunk resulting from the default expression. However, the thunk corresponding to the expression may never end up in the local's stack slot. Since finalisation goes by stack slot (and not constants), we need to prevent a case where we don't fall back to the default expression, but finalise anyways. Previously, we worked around this by making `OpFinalise` ignore non-thunks. Since finalisation of already evaluated thunks still crashed, the faulty compilation of function pattern arguments could still cause a crash. As a new approach, we reinstate the old behavior of `OpFinalise` to crash whenever encountering something that is either not a thunk or doesn't need finalisation. This can also help catching (similar) miscompilations in the future. To then prevent the crash, we need to track whether we have fallen back or not at runtime. This is done using an additional phantom on the stack that holds a new `FinaliseRequest` value. When it comes to finalisation we check this value and conditionally execute `OpFinalise` based on its value. Resolves b/261 and b/265 (partially). Change-Id: Ic04fb80ec671a2ba11fa645090769c335fb7f58b Reviewed-on: https://cl.tvl.fyi/c/depot/+/8705 Reviewed-by: tazjin <tazjin@tvl.su> Tested-by: BuildkiteCI Autosubmit: sterni <sternenseemann@systemli.org>
2023-06-03 02:10:31 +02:00
| Value::Json(_)
| Value::FinaliseRequest(_) => "an internal Tvix evaluator value".into(),
}
}
}
trait TotalDisplay {
fn total_fmt(&self, f: &mut std::fmt::Formatter<'_>, set: &mut ThunkSet) -> std::fmt::Result;
}
impl Display for Value {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
self.total_fmt(f, &mut Default::default())
}
}
/// Emulates the C++-Nix style formatting of floats, which diverges
/// significantly from Rust's native float formatting.
fn total_fmt_float<F: std::fmt::Write>(num: f64, mut f: F) -> std::fmt::Result {
let mut buf = [b'0'; lexical_core::BUFFER_SIZE];
let mut s = lexical_core::write_with_options::<f64, { CXX_LITERAL }>(
num,
&mut buf,
&WRITE_FLOAT_OPTIONS,
);
// apply some postprocessing on the buffer. If scientific
// notation is used (we see an `e`), and the next character is
// a digit, add the missing `+` sign.)
let mut new_s = Vec::with_capacity(s.len());
if s.contains(&b'e') {
for (i, c) in s.iter().enumerate() {
// encountered `e`
if c == &b'e' {
// next character is a digit (so no negative exponent)
if s.len() > i && s[i + 1].is_ascii_digit() {
// copy everything from the start up to (including) the e
new_s.extend_from_slice(&s[0..=i]);
// add the missing '+'
new_s.push(b'+');
// check for the remaining characters.
// If it's only one, we need to prepend a trailing zero
if s.len() == i + 2 {
new_s.push(b'0');
}
new_s.extend_from_slice(&s[i + 1..]);
break;
}
}
}
// if we modified the scientific notation, flip the reference
if !new_s.is_empty() {
s = &mut new_s
}
} else if s.contains(&b'.') {
// else, if this is not scientific notation, and there's a
// decimal point, make sure we really drop trailing zeroes.
// In some cases, lexical_core doesn't.
for (i, c) in s.iter().enumerate() {
// at `.``
if c == &b'.' {
// trim zeroes from the right side.
let frac = String::from_utf8_lossy(&s[i + 1..]);
let frac_no_trailing_zeroes = frac.trim_end_matches('0');
if frac.len() != frac_no_trailing_zeroes.len() {
// we managed to strip something, construct new_s
if frac_no_trailing_zeroes.is_empty() {
// if frac_no_trailing_zeroes is empty, the fractional part was all zeroes, so we can drop the decimal point as well
new_s.extend_from_slice(&s[0..=i - 1]);
} else {
// else, assemble the rest of the string
new_s.extend_from_slice(&s[0..=i]);
new_s.extend_from_slice(frac_no_trailing_zeroes.as_bytes());
}
// flip the reference
s = &mut new_s;
break;
}
}
}
}
write!(f, "{}", String::from_utf8_lossy(s))
}
impl TotalDisplay for Value {
fn total_fmt(&self, f: &mut std::fmt::Formatter<'_>, set: &mut ThunkSet) -> std::fmt::Result {
match self {
Value::Null => f.write_str("null"),
Value::Bool(true) => f.write_str("true"),
Value::Bool(false) => f.write_str("false"),
Value::Integer(num) => write!(f, "{}", num),
Value::String(s) => s.fmt(f),
Value::Path(p) => p.display().fmt(f),
Value::Attrs(attrs) => attrs.total_fmt(f, set),
Value::List(list) => list.total_fmt(f, set),
// TODO: fancy REPL display with position
Value::Closure(_) => f.write_str("<LAMBDA>"),
Value::Builtin(builtin) => builtin.fmt(f),
// Nix prints floats with a maximum precision of 5 digits
feat(tvix/eval): use lexical-core to format float Apparently our naive implementation of float formatting, which simply used {:.5}, and trimmed trailing "0" strings not sufficient. It wrongly trimmed numbers with zeroes but no decimal point, like `10000` got trimmed to `1`. Nix uses `std::to_string` on the double, which according to https://en.cppreference.com/w/cpp/string/basic_string/to_string is equivalent to `std::sprintf(buf, "%f", value)`. https://en.cppreference.com/w/cpp/io/c/fprintf mentions this is treated like this: > Precision specifies the exact number of digits to appear after > the decimal point character. The default precision is 6. In the > alternative implementation decimal point character is written even if > no digits follow it. For infinity and not-a-number conversion style > see notes. This doesn't seem to be the case though, and Nix uses scientific notation in some cases. There's a whole bunch of strategies to determine which is a more compact notation, and which notation should be used for a given number. https://github.com/rust-lang/rust/issues/24556 provides some pointers into various rabbit holes for those interested. This gist seems to be that currently a different formatting is not exposed in rust directly, at least not for public consumption. There is the [lexical-core](https://github.com/Alexhuszagh/rust-lexical) crate though, which provides a way to format floats with various strategies and formats. Change our implementation of `TotalDisplay` for the `Value::Float` case to use that. We still need to do some post-processing, because Nix always adds the sign in scientific notation (and there's no way to configure lexical-core to do that), and lexical-core in some cases keeps the trailing zeros. Even with all that in place, there as a difference in `eval-okay- fromjson.nix` (from tvix-tests), which I couldn't get to work. I updated the fixture to a less problematic number. With this, the testsuite passes again, and does for the upcoming CL introducing builtins.fromTOML, and enabling the nix testsuite bits for it, too. Change-Id: Ie6fba5619e1d9fd7ce669a51594658b029057acc Reviewed-on: https://cl.tvl.fyi/c/depot/+/7922 Tested-by: BuildkiteCI Autosubmit: flokli <flokli@flokli.de> Reviewed-by: tazjin <tazjin@tvl.su>
2023-01-24 19:27:20 +01:00
// only. Except when it decides to use scientific notation
// (with a + after the `e`, and zero-padded to 0 digits)
Value::Float(num) => total_fmt_float(*num, f),
// internal types
Value::AttrNotFound => f.write_str("internal[not found]"),
Value::Blueprint(_) => f.write_str("internal[blueprint]"),
Value::DeferredUpvalue(_) => f.write_str("internal[deferred_upvalue]"),
Value::UnresolvedPath(_) => f.write_str("internal[unresolved_path]"),
Value::Json(_) => f.write_str("internal[json]"),
fix(tvix/eval): only finalise formal arguments if defaulting When dealing with a formal argument in a function argument pattern that has a default expression, there are two different things that can happen at runtime: Either we select its value from the passed attribute successfully or we need to use the default expression. Both of these may be thunks and both of these may need finalisers. However, in the former case this is taken care of elsewhere, the value will always be finalised already if necessary. In the latter case we may need to finalise the thunk resulting from the default expression. However, the thunk corresponding to the expression may never end up in the local's stack slot. Since finalisation goes by stack slot (and not constants), we need to prevent a case where we don't fall back to the default expression, but finalise anyways. Previously, we worked around this by making `OpFinalise` ignore non-thunks. Since finalisation of already evaluated thunks still crashed, the faulty compilation of function pattern arguments could still cause a crash. As a new approach, we reinstate the old behavior of `OpFinalise` to crash whenever encountering something that is either not a thunk or doesn't need finalisation. This can also help catching (similar) miscompilations in the future. To then prevent the crash, we need to track whether we have fallen back or not at runtime. This is done using an additional phantom on the stack that holds a new `FinaliseRequest` value. When it comes to finalisation we check this value and conditionally execute `OpFinalise` based on its value. Resolves b/261 and b/265 (partially). Change-Id: Ic04fb80ec671a2ba11fa645090769c335fb7f58b Reviewed-on: https://cl.tvl.fyi/c/depot/+/8705 Reviewed-by: tazjin <tazjin@tvl.su> Tested-by: BuildkiteCI Autosubmit: sterni <sternenseemann@systemli.org>
2023-06-03 02:10:31 +02:00
Value::FinaliseRequest(_) => f.write_str("internal[finaliser_sentinel]"),
// Delegate thunk display to the type, as it must handle
// the case of already evaluated or cyclic thunks.
Value::Thunk(t) => t.total_fmt(f, set),
Value::Catchable(_) => panic!("total_fmt() called on a CatchableErrorKind"),
}
}
}
impl From<bool> for Value {
fn from(b: bool) -> Self {
Value::Bool(b)
}
}
impl From<i64> for Value {
fn from(i: i64) -> Self {
Self::Integer(i)
}
}
impl From<f64> for Value {
fn from(i: f64) -> Self {
Self::Float(i)
}
}
impl From<PathBuf> for Value {
fn from(path: PathBuf) -> Self {
Self::Path(Box::new(path))
}
}
fn type_error(expected: &'static str, actual: &Value) -> ErrorKind {
ErrorKind::TypeError {
expected,
actual: actual.type_of(),
}
}
#[cfg(test)]
mod tests {
use super::*;
use std::mem::size_of;
#[test]
fn size() {
assert_eq!(size_of::<Value>(), 16);
}
mod floats {
use crate::value::total_fmt_float;
#[test]
fn format_float() {
let ff = [
(0f64, "0"),
(1.0f64, "1"),
(-0.01, "-0.01"),
(5e+22, "5e+22"),
(1e6, "1e+06"),
(-2E-2, "-0.02"),
(6.626e-34, "6.626e-34"),
(9_224_617.445_991_227, "9.22462e+06"),
];
for (n, expected) in ff.iter() {
let mut buf = String::new();
let res = total_fmt_float(*n, &mut buf);
assert!(res.is_ok());
assert_eq!(
expected, &buf,
"{} should be formatted as {}, but got {}",
n, expected, &buf
);
}
}
}
}