tvl-depot/tvix/eval/src/chunk.rs

Ignoring revisions in .git-blame-ignore-revs. Click here to bypass and see the normal blame view.

301 lines
9.5 KiB
Rust
Raw Normal View History

refactor(tvix/eval): ensure VM operations fit in a single byte This replaces the OpCode enum with a new Op enum which is guaranteed to fit in a single byte. Instead of carrying enum variants with data, every variant that has runtime data encodes it into the `Vec<u8>` that a `Chunk` now carries. This has several advantages: * Less stack space is required at runtime, and fewer allocations are required while compiling. * The OpCode doesn't need to carry "weird" special-cased data variants anymore. * It is faster (albeit, not by much). On my laptop, results consistently look approximately like this: Benchmark 1: ./before -E '(import <nixpkgs> {}).firefox.outPath' --log-level ERROR --no-warnings Time (mean ± σ): 8.224 s ± 0.272 s [User: 7.149 s, System: 0.688 s] Range (min … max): 7.759 s … 8.583 s 10 runs Benchmark 2: ./after -E '(import <nixpkgs> {}).firefox.outPath' --log-level ERROR --no-warnings Time (mean ± σ): 8.000 s ± 0.198 s [User: 7.036 s, System: 0.633 s] Range (min … max): 7.718 s … 8.334 s 10 runs See notes below for why the performance impact might be less than expected. * It is faster while at the same time dropping some optimisations we previously performed. This has several disadvantages: * The code is closer to how one would write it in C or Go. * Bit shifting! * There is (for now) slightly more code than before. On performance I have the following thoughts at the moment: In order to prepare for adding GC, there's a couple of places in Tvix where I'd like to fence off certain kinds of complexity (such as mutating bytecode, which, for various reaons, also has to be part of data that is subject to GC). With this change, we can drop optimisations like retroactively modifying existing bytecode and *still* achieve better performance than before. I believe that this is currently worth it to pave the way for changes that are more significant for performance. In general this also opens other avenues of optimisation: For example, we can profile which argument sizes actually exist and remove the copy overhead of varint decoding (which does show up in profiles) by using more adequately sized types for, e.g., constant indices. Known regressions: * Op::Constant is no longer printing its values in disassembly (this can be fixed, I just didn't get around to it, will do separately). Change-Id: Id9b3a4254623a45de03069dbdb70b8349e976743 Reviewed-on: https://cl.tvl.fyi/c/depot/+/12191 Tested-by: BuildkiteCI Reviewed-by: flokli <flokli@flokli.de>
2024-08-10 22:59:38 +02:00
use crate::opcode::{CodeIdx, ConstantIdx, Op, OpArg};
use crate::value::Value;
use crate::{CoercionKind, SourceCode};
use std::io::Write;
refactor(tvix/eval): ensure VM operations fit in a single byte This replaces the OpCode enum with a new Op enum which is guaranteed to fit in a single byte. Instead of carrying enum variants with data, every variant that has runtime data encodes it into the `Vec<u8>` that a `Chunk` now carries. This has several advantages: * Less stack space is required at runtime, and fewer allocations are required while compiling. * The OpCode doesn't need to carry "weird" special-cased data variants anymore. * It is faster (albeit, not by much). On my laptop, results consistently look approximately like this: Benchmark 1: ./before -E '(import <nixpkgs> {}).firefox.outPath' --log-level ERROR --no-warnings Time (mean ± σ): 8.224 s ± 0.272 s [User: 7.149 s, System: 0.688 s] Range (min … max): 7.759 s … 8.583 s 10 runs Benchmark 2: ./after -E '(import <nixpkgs> {}).firefox.outPath' --log-level ERROR --no-warnings Time (mean ± σ): 8.000 s ± 0.198 s [User: 7.036 s, System: 0.633 s] Range (min … max): 7.718 s … 8.334 s 10 runs See notes below for why the performance impact might be less than expected. * It is faster while at the same time dropping some optimisations we previously performed. This has several disadvantages: * The code is closer to how one would write it in C or Go. * Bit shifting! * There is (for now) slightly more code than before. On performance I have the following thoughts at the moment: In order to prepare for adding GC, there's a couple of places in Tvix where I'd like to fence off certain kinds of complexity (such as mutating bytecode, which, for various reaons, also has to be part of data that is subject to GC). With this change, we can drop optimisations like retroactively modifying existing bytecode and *still* achieve better performance than before. I believe that this is currently worth it to pave the way for changes that are more significant for performance. In general this also opens other avenues of optimisation: For example, we can profile which argument sizes actually exist and remove the copy overhead of varint decoding (which does show up in profiles) by using more adequately sized types for, e.g., constant indices. Known regressions: * Op::Constant is no longer printing its values in disassembly (this can be fixed, I just didn't get around to it, will do separately). Change-Id: Id9b3a4254623a45de03069dbdb70b8349e976743 Reviewed-on: https://cl.tvl.fyi/c/depot/+/12191 Tested-by: BuildkiteCI Reviewed-by: flokli <flokli@flokli.de>
2024-08-10 22:59:38 +02:00
/// Maximum size of a u64 encoded in the vu128 varint encoding.
const U64_VARINT_SIZE: usize = 9;
/// Represents a source location from which one or more operations
/// were compiled.
///
/// The span itself is an index into a [codemap::CodeMap], and the
/// structure tracks the number of operations that were yielded from
/// the same span.
///
/// At error reporting time, it becomes possible to either just fetch
/// the textual representation of that span from the codemap, or to
/// even re-parse the AST using rnix to create more semantically
/// interesting errors.
#[derive(Clone, Debug, PartialEq)]
struct SourceSpan {
/// Span into the [codemap::CodeMap].
span: codemap::Span,
/// Index of the first operation covered by this span.
start: usize,
}
/// A chunk is a representation of a sequence of bytecode
/// instructions, associated constants and additional metadata as
/// emitted by the compiler.
#[derive(Debug, Default)]
pub struct Chunk {
refactor(tvix/eval): ensure VM operations fit in a single byte This replaces the OpCode enum with a new Op enum which is guaranteed to fit in a single byte. Instead of carrying enum variants with data, every variant that has runtime data encodes it into the `Vec<u8>` that a `Chunk` now carries. This has several advantages: * Less stack space is required at runtime, and fewer allocations are required while compiling. * The OpCode doesn't need to carry "weird" special-cased data variants anymore. * It is faster (albeit, not by much). On my laptop, results consistently look approximately like this: Benchmark 1: ./before -E '(import <nixpkgs> {}).firefox.outPath' --log-level ERROR --no-warnings Time (mean ± σ): 8.224 s ± 0.272 s [User: 7.149 s, System: 0.688 s] Range (min … max): 7.759 s … 8.583 s 10 runs Benchmark 2: ./after -E '(import <nixpkgs> {}).firefox.outPath' --log-level ERROR --no-warnings Time (mean ± σ): 8.000 s ± 0.198 s [User: 7.036 s, System: 0.633 s] Range (min … max): 7.718 s … 8.334 s 10 runs See notes below for why the performance impact might be less than expected. * It is faster while at the same time dropping some optimisations we previously performed. This has several disadvantages: * The code is closer to how one would write it in C or Go. * Bit shifting! * There is (for now) slightly more code than before. On performance I have the following thoughts at the moment: In order to prepare for adding GC, there's a couple of places in Tvix where I'd like to fence off certain kinds of complexity (such as mutating bytecode, which, for various reaons, also has to be part of data that is subject to GC). With this change, we can drop optimisations like retroactively modifying existing bytecode and *still* achieve better performance than before. I believe that this is currently worth it to pave the way for changes that are more significant for performance. In general this also opens other avenues of optimisation: For example, we can profile which argument sizes actually exist and remove the copy overhead of varint decoding (which does show up in profiles) by using more adequately sized types for, e.g., constant indices. Known regressions: * Op::Constant is no longer printing its values in disassembly (this can be fixed, I just didn't get around to it, will do separately). Change-Id: Id9b3a4254623a45de03069dbdb70b8349e976743 Reviewed-on: https://cl.tvl.fyi/c/depot/+/12191 Tested-by: BuildkiteCI Reviewed-by: flokli <flokli@flokli.de>
2024-08-10 22:59:38 +02:00
pub code: Vec<u8>,
pub constants: Vec<Value>,
spans: Vec<SourceSpan>,
refactor(tvix/eval): ensure VM operations fit in a single byte This replaces the OpCode enum with a new Op enum which is guaranteed to fit in a single byte. Instead of carrying enum variants with data, every variant that has runtime data encodes it into the `Vec<u8>` that a `Chunk` now carries. This has several advantages: * Less stack space is required at runtime, and fewer allocations are required while compiling. * The OpCode doesn't need to carry "weird" special-cased data variants anymore. * It is faster (albeit, not by much). On my laptop, results consistently look approximately like this: Benchmark 1: ./before -E '(import <nixpkgs> {}).firefox.outPath' --log-level ERROR --no-warnings Time (mean ± σ): 8.224 s ± 0.272 s [User: 7.149 s, System: 0.688 s] Range (min … max): 7.759 s … 8.583 s 10 runs Benchmark 2: ./after -E '(import <nixpkgs> {}).firefox.outPath' --log-level ERROR --no-warnings Time (mean ± σ): 8.000 s ± 0.198 s [User: 7.036 s, System: 0.633 s] Range (min … max): 7.718 s … 8.334 s 10 runs See notes below for why the performance impact might be less than expected. * It is faster while at the same time dropping some optimisations we previously performed. This has several disadvantages: * The code is closer to how one would write it in C or Go. * Bit shifting! * There is (for now) slightly more code than before. On performance I have the following thoughts at the moment: In order to prepare for adding GC, there's a couple of places in Tvix where I'd like to fence off certain kinds of complexity (such as mutating bytecode, which, for various reaons, also has to be part of data that is subject to GC). With this change, we can drop optimisations like retroactively modifying existing bytecode and *still* achieve better performance than before. I believe that this is currently worth it to pave the way for changes that are more significant for performance. In general this also opens other avenues of optimisation: For example, we can profile which argument sizes actually exist and remove the copy overhead of varint decoding (which does show up in profiles) by using more adequately sized types for, e.g., constant indices. Known regressions: * Op::Constant is no longer printing its values in disassembly (this can be fixed, I just didn't get around to it, will do separately). Change-Id: Id9b3a4254623a45de03069dbdb70b8349e976743 Reviewed-on: https://cl.tvl.fyi/c/depot/+/12191 Tested-by: BuildkiteCI Reviewed-by: flokli <flokli@flokli.de>
2024-08-10 22:59:38 +02:00
/// Index of the last operation (i.e. not data) written to the code vector.
/// Some operations (e.g. jump patching) need to know this.
last_op: usize,
}
refactor(tvix/eval): ensure VM operations fit in a single byte This replaces the OpCode enum with a new Op enum which is guaranteed to fit in a single byte. Instead of carrying enum variants with data, every variant that has runtime data encodes it into the `Vec<u8>` that a `Chunk` now carries. This has several advantages: * Less stack space is required at runtime, and fewer allocations are required while compiling. * The OpCode doesn't need to carry "weird" special-cased data variants anymore. * It is faster (albeit, not by much). On my laptop, results consistently look approximately like this: Benchmark 1: ./before -E '(import <nixpkgs> {}).firefox.outPath' --log-level ERROR --no-warnings Time (mean ± σ): 8.224 s ± 0.272 s [User: 7.149 s, System: 0.688 s] Range (min … max): 7.759 s … 8.583 s 10 runs Benchmark 2: ./after -E '(import <nixpkgs> {}).firefox.outPath' --log-level ERROR --no-warnings Time (mean ± σ): 8.000 s ± 0.198 s [User: 7.036 s, System: 0.633 s] Range (min … max): 7.718 s … 8.334 s 10 runs See notes below for why the performance impact might be less than expected. * It is faster while at the same time dropping some optimisations we previously performed. This has several disadvantages: * The code is closer to how one would write it in C or Go. * Bit shifting! * There is (for now) slightly more code than before. On performance I have the following thoughts at the moment: In order to prepare for adding GC, there's a couple of places in Tvix where I'd like to fence off certain kinds of complexity (such as mutating bytecode, which, for various reaons, also has to be part of data that is subject to GC). With this change, we can drop optimisations like retroactively modifying existing bytecode and *still* achieve better performance than before. I believe that this is currently worth it to pave the way for changes that are more significant for performance. In general this also opens other avenues of optimisation: For example, we can profile which argument sizes actually exist and remove the copy overhead of varint decoding (which does show up in profiles) by using more adequately sized types for, e.g., constant indices. Known regressions: * Op::Constant is no longer printing its values in disassembly (this can be fixed, I just didn't get around to it, will do separately). Change-Id: Id9b3a4254623a45de03069dbdb70b8349e976743 Reviewed-on: https://cl.tvl.fyi/c/depot/+/12191 Tested-by: BuildkiteCI Reviewed-by: flokli <flokli@flokli.de>
2024-08-10 22:59:38 +02:00
impl Chunk {
pub fn push_op(&mut self, op: Op, span: codemap::Span) -> usize {
self.last_op = self.code.len();
self.code.push(op as u8);
self.push_span(span, self.last_op);
self.last_op
}
refactor(tvix/eval): ensure VM operations fit in a single byte This replaces the OpCode enum with a new Op enum which is guaranteed to fit in a single byte. Instead of carrying enum variants with data, every variant that has runtime data encodes it into the `Vec<u8>` that a `Chunk` now carries. This has several advantages: * Less stack space is required at runtime, and fewer allocations are required while compiling. * The OpCode doesn't need to carry "weird" special-cased data variants anymore. * It is faster (albeit, not by much). On my laptop, results consistently look approximately like this: Benchmark 1: ./before -E '(import <nixpkgs> {}).firefox.outPath' --log-level ERROR --no-warnings Time (mean ± σ): 8.224 s ± 0.272 s [User: 7.149 s, System: 0.688 s] Range (min … max): 7.759 s … 8.583 s 10 runs Benchmark 2: ./after -E '(import <nixpkgs> {}).firefox.outPath' --log-level ERROR --no-warnings Time (mean ± σ): 8.000 s ± 0.198 s [User: 7.036 s, System: 0.633 s] Range (min … max): 7.718 s … 8.334 s 10 runs See notes below for why the performance impact might be less than expected. * It is faster while at the same time dropping some optimisations we previously performed. This has several disadvantages: * The code is closer to how one would write it in C or Go. * Bit shifting! * There is (for now) slightly more code than before. On performance I have the following thoughts at the moment: In order to prepare for adding GC, there's a couple of places in Tvix where I'd like to fence off certain kinds of complexity (such as mutating bytecode, which, for various reaons, also has to be part of data that is subject to GC). With this change, we can drop optimisations like retroactively modifying existing bytecode and *still* achieve better performance than before. I believe that this is currently worth it to pave the way for changes that are more significant for performance. In general this also opens other avenues of optimisation: For example, we can profile which argument sizes actually exist and remove the copy overhead of varint decoding (which does show up in profiles) by using more adequately sized types for, e.g., constant indices. Known regressions: * Op::Constant is no longer printing its values in disassembly (this can be fixed, I just didn't get around to it, will do separately). Change-Id: Id9b3a4254623a45de03069dbdb70b8349e976743 Reviewed-on: https://cl.tvl.fyi/c/depot/+/12191 Tested-by: BuildkiteCI Reviewed-by: flokli <flokli@flokli.de>
2024-08-10 22:59:38 +02:00
pub fn push_uvarint(&mut self, data: u64) {
let mut encoded = [0u8; U64_VARINT_SIZE];
let bytes_written = vu128::encode_u64(&mut encoded, data);
self.code.extend_from_slice(&encoded[..bytes_written]);
}
refactor(tvix/eval): ensure VM operations fit in a single byte This replaces the OpCode enum with a new Op enum which is guaranteed to fit in a single byte. Instead of carrying enum variants with data, every variant that has runtime data encodes it into the `Vec<u8>` that a `Chunk` now carries. This has several advantages: * Less stack space is required at runtime, and fewer allocations are required while compiling. * The OpCode doesn't need to carry "weird" special-cased data variants anymore. * It is faster (albeit, not by much). On my laptop, results consistently look approximately like this: Benchmark 1: ./before -E '(import <nixpkgs> {}).firefox.outPath' --log-level ERROR --no-warnings Time (mean ± σ): 8.224 s ± 0.272 s [User: 7.149 s, System: 0.688 s] Range (min … max): 7.759 s … 8.583 s 10 runs Benchmark 2: ./after -E '(import <nixpkgs> {}).firefox.outPath' --log-level ERROR --no-warnings Time (mean ± σ): 8.000 s ± 0.198 s [User: 7.036 s, System: 0.633 s] Range (min … max): 7.718 s … 8.334 s 10 runs See notes below for why the performance impact might be less than expected. * It is faster while at the same time dropping some optimisations we previously performed. This has several disadvantages: * The code is closer to how one would write it in C or Go. * Bit shifting! * There is (for now) slightly more code than before. On performance I have the following thoughts at the moment: In order to prepare for adding GC, there's a couple of places in Tvix where I'd like to fence off certain kinds of complexity (such as mutating bytecode, which, for various reaons, also has to be part of data that is subject to GC). With this change, we can drop optimisations like retroactively modifying existing bytecode and *still* achieve better performance than before. I believe that this is currently worth it to pave the way for changes that are more significant for performance. In general this also opens other avenues of optimisation: For example, we can profile which argument sizes actually exist and remove the copy overhead of varint decoding (which does show up in profiles) by using more adequately sized types for, e.g., constant indices. Known regressions: * Op::Constant is no longer printing its values in disassembly (this can be fixed, I just didn't get around to it, will do separately). Change-Id: Id9b3a4254623a45de03069dbdb70b8349e976743 Reviewed-on: https://cl.tvl.fyi/c/depot/+/12191 Tested-by: BuildkiteCI Reviewed-by: flokli <flokli@flokli.de>
2024-08-10 22:59:38 +02:00
pub fn read_uvarint(&self, idx: usize) -> (u64, usize) {
debug_assert!(
idx < self.code.len(),
"invalid bytecode (missing varint operand)",
);
refactor(tvix/eval): ensure VM operations fit in a single byte This replaces the OpCode enum with a new Op enum which is guaranteed to fit in a single byte. Instead of carrying enum variants with data, every variant that has runtime data encodes it into the `Vec<u8>` that a `Chunk` now carries. This has several advantages: * Less stack space is required at runtime, and fewer allocations are required while compiling. * The OpCode doesn't need to carry "weird" special-cased data variants anymore. * It is faster (albeit, not by much). On my laptop, results consistently look approximately like this: Benchmark 1: ./before -E '(import <nixpkgs> {}).firefox.outPath' --log-level ERROR --no-warnings Time (mean ± σ): 8.224 s ± 0.272 s [User: 7.149 s, System: 0.688 s] Range (min … max): 7.759 s … 8.583 s 10 runs Benchmark 2: ./after -E '(import <nixpkgs> {}).firefox.outPath' --log-level ERROR --no-warnings Time (mean ± σ): 8.000 s ± 0.198 s [User: 7.036 s, System: 0.633 s] Range (min … max): 7.718 s … 8.334 s 10 runs See notes below for why the performance impact might be less than expected. * It is faster while at the same time dropping some optimisations we previously performed. This has several disadvantages: * The code is closer to how one would write it in C or Go. * Bit shifting! * There is (for now) slightly more code than before. On performance I have the following thoughts at the moment: In order to prepare for adding GC, there's a couple of places in Tvix where I'd like to fence off certain kinds of complexity (such as mutating bytecode, which, for various reaons, also has to be part of data that is subject to GC). With this change, we can drop optimisations like retroactively modifying existing bytecode and *still* achieve better performance than before. I believe that this is currently worth it to pave the way for changes that are more significant for performance. In general this also opens other avenues of optimisation: For example, we can profile which argument sizes actually exist and remove the copy overhead of varint decoding (which does show up in profiles) by using more adequately sized types for, e.g., constant indices. Known regressions: * Op::Constant is no longer printing its values in disassembly (this can be fixed, I just didn't get around to it, will do separately). Change-Id: Id9b3a4254623a45de03069dbdb70b8349e976743 Reviewed-on: https://cl.tvl.fyi/c/depot/+/12191 Tested-by: BuildkiteCI Reviewed-by: flokli <flokli@flokli.de>
2024-08-10 22:59:38 +02:00
if self.code.len() - idx >= U64_VARINT_SIZE {
vu128::decode_u64(
&self.code[idx..idx + U64_VARINT_SIZE]
.try_into()
.expect("size statically checked"),
)
} else {
let mut tmp = [0u8; U64_VARINT_SIZE];
tmp[..self.code.len() - idx].copy_from_slice(&self.code[idx..]);
vu128::decode_u64(&tmp)
}
}
refactor(tvix/eval): ensure VM operations fit in a single byte This replaces the OpCode enum with a new Op enum which is guaranteed to fit in a single byte. Instead of carrying enum variants with data, every variant that has runtime data encodes it into the `Vec<u8>` that a `Chunk` now carries. This has several advantages: * Less stack space is required at runtime, and fewer allocations are required while compiling. * The OpCode doesn't need to carry "weird" special-cased data variants anymore. * It is faster (albeit, not by much). On my laptop, results consistently look approximately like this: Benchmark 1: ./before -E '(import <nixpkgs> {}).firefox.outPath' --log-level ERROR --no-warnings Time (mean ± σ): 8.224 s ± 0.272 s [User: 7.149 s, System: 0.688 s] Range (min … max): 7.759 s … 8.583 s 10 runs Benchmark 2: ./after -E '(import <nixpkgs> {}).firefox.outPath' --log-level ERROR --no-warnings Time (mean ± σ): 8.000 s ± 0.198 s [User: 7.036 s, System: 0.633 s] Range (min … max): 7.718 s … 8.334 s 10 runs See notes below for why the performance impact might be less than expected. * It is faster while at the same time dropping some optimisations we previously performed. This has several disadvantages: * The code is closer to how one would write it in C or Go. * Bit shifting! * There is (for now) slightly more code than before. On performance I have the following thoughts at the moment: In order to prepare for adding GC, there's a couple of places in Tvix where I'd like to fence off certain kinds of complexity (such as mutating bytecode, which, for various reaons, also has to be part of data that is subject to GC). With this change, we can drop optimisations like retroactively modifying existing bytecode and *still* achieve better performance than before. I believe that this is currently worth it to pave the way for changes that are more significant for performance. In general this also opens other avenues of optimisation: For example, we can profile which argument sizes actually exist and remove the copy overhead of varint decoding (which does show up in profiles) by using more adequately sized types for, e.g., constant indices. Known regressions: * Op::Constant is no longer printing its values in disassembly (this can be fixed, I just didn't get around to it, will do separately). Change-Id: Id9b3a4254623a45de03069dbdb70b8349e976743 Reviewed-on: https://cl.tvl.fyi/c/depot/+/12191 Tested-by: BuildkiteCI Reviewed-by: flokli <flokli@flokli.de>
2024-08-10 22:59:38 +02:00
pub fn push_u16(&mut self, data: u16) {
self.code.extend_from_slice(&data.to_le_bytes())
feat(tvix/eval): deduplicate overlap between Closure and Thunk This commit deduplicates the Thunk-like functionality from Closure and unifies it with Thunk. Specifically, we now have one and only one way of breaking reference cycles in the Value-graph: Thunk. No other variant contains a RefCell. This should make it easier to reason about the behavior of the VM. InnerClosure and UpvaluesCarrier are no longer necessary. This refactoring allowed an improvement in code generation: `Rc<RefCell<>>`s are now created only for closures which do not have self-references or deferred upvalues, instead of for all closures. OpClosure has been split into two separate opcodes: - OpClosure creates non-recursive closures with no deferred upvalues. The VM will not create an `Rc<RefCell<>>` when executing this instruction. - OpThunkClosure is used for closures with self-references or deferred upvalues. The VM will create a Thunk when executing this opcode, but the Thunk will start out already in the `ThunkRepr::Evaluated` state, rather than in the `ThunkRepr::Suspeneded` state. To avoid confusion, OpThunk has been renamed OpThunkSuspended. Thanks to @sterni for suggesting that all this could be done without adding an additional variant to ThunkRepr. This does however mean that there will be mutating accesses to `ThunkRepr::Evaluated`, which was not previously the case. The field `is_finalised:bool` has been added to `Closure` to ensure that these mutating accesses are performed only on finalised Closures. Both the check and the field are present only if `#[cfg(debug_assertions)]`. Change-Id: I04131501029772f30e28da8281d864427685097f Signed-off-by: Adam Joseph <adam@westernsemico.com> Reviewed-on: https://cl.tvl.fyi/c/depot/+/7019 Tested-by: BuildkiteCI Reviewed-by: tazjin <tazjin@tvl.su>
2022-10-16 01:10:10 +02:00
}
refactor(tvix/eval): ensure VM operations fit in a single byte This replaces the OpCode enum with a new Op enum which is guaranteed to fit in a single byte. Instead of carrying enum variants with data, every variant that has runtime data encodes it into the `Vec<u8>` that a `Chunk` now carries. This has several advantages: * Less stack space is required at runtime, and fewer allocations are required while compiling. * The OpCode doesn't need to carry "weird" special-cased data variants anymore. * It is faster (albeit, not by much). On my laptop, results consistently look approximately like this: Benchmark 1: ./before -E '(import <nixpkgs> {}).firefox.outPath' --log-level ERROR --no-warnings Time (mean ± σ): 8.224 s ± 0.272 s [User: 7.149 s, System: 0.688 s] Range (min … max): 7.759 s … 8.583 s 10 runs Benchmark 2: ./after -E '(import <nixpkgs> {}).firefox.outPath' --log-level ERROR --no-warnings Time (mean ± σ): 8.000 s ± 0.198 s [User: 7.036 s, System: 0.633 s] Range (min … max): 7.718 s … 8.334 s 10 runs See notes below for why the performance impact might be less than expected. * It is faster while at the same time dropping some optimisations we previously performed. This has several disadvantages: * The code is closer to how one would write it in C or Go. * Bit shifting! * There is (for now) slightly more code than before. On performance I have the following thoughts at the moment: In order to prepare for adding GC, there's a couple of places in Tvix where I'd like to fence off certain kinds of complexity (such as mutating bytecode, which, for various reaons, also has to be part of data that is subject to GC). With this change, we can drop optimisations like retroactively modifying existing bytecode and *still* achieve better performance than before. I believe that this is currently worth it to pave the way for changes that are more significant for performance. In general this also opens other avenues of optimisation: For example, we can profile which argument sizes actually exist and remove the copy overhead of varint decoding (which does show up in profiles) by using more adequately sized types for, e.g., constant indices. Known regressions: * Op::Constant is no longer printing its values in disassembly (this can be fixed, I just didn't get around to it, will do separately). Change-Id: Id9b3a4254623a45de03069dbdb70b8349e976743 Reviewed-on: https://cl.tvl.fyi/c/depot/+/12191 Tested-by: BuildkiteCI Reviewed-by: flokli <flokli@flokli.de>
2024-08-10 22:59:38 +02:00
/// Patches the argument to the jump operand of the jump at the given index
/// to point to the *next* instruction that will be emitted.
pub fn patch_jump(&mut self, idx: usize) {
let offset = (self.code.len() - idx - /* arg idx = */ 1 - /* jump arg size = */ 2) as u16;
self.code[idx + 1..idx + 3].copy_from_slice(&offset.to_le_bytes())
}
pub fn read_u16(&self, idx: usize) -> u16 {
if idx + 2 > self.code.len() {
panic!("Tvix bug: invalid bytecode (expected u16 operand not found)")
}
let byte_array: &[u8; 2] = &self.code[idx..idx + 2]
.try_into()
.expect("fixed-size slice can not fail to convert to array");
u16::from_le_bytes(*byte_array)
}
/// Get the first span of a chunk, no questions asked.
pub fn first_span(&self) -> codemap::Span {
self.spans[0].span
}
refactor(tvix/eval): ensure VM operations fit in a single byte This replaces the OpCode enum with a new Op enum which is guaranteed to fit in a single byte. Instead of carrying enum variants with data, every variant that has runtime data encodes it into the `Vec<u8>` that a `Chunk` now carries. This has several advantages: * Less stack space is required at runtime, and fewer allocations are required while compiling. * The OpCode doesn't need to carry "weird" special-cased data variants anymore. * It is faster (albeit, not by much). On my laptop, results consistently look approximately like this: Benchmark 1: ./before -E '(import <nixpkgs> {}).firefox.outPath' --log-level ERROR --no-warnings Time (mean ± σ): 8.224 s ± 0.272 s [User: 7.149 s, System: 0.688 s] Range (min … max): 7.759 s … 8.583 s 10 runs Benchmark 2: ./after -E '(import <nixpkgs> {}).firefox.outPath' --log-level ERROR --no-warnings Time (mean ± σ): 8.000 s ± 0.198 s [User: 7.036 s, System: 0.633 s] Range (min … max): 7.718 s … 8.334 s 10 runs See notes below for why the performance impact might be less than expected. * It is faster while at the same time dropping some optimisations we previously performed. This has several disadvantages: * The code is closer to how one would write it in C or Go. * Bit shifting! * There is (for now) slightly more code than before. On performance I have the following thoughts at the moment: In order to prepare for adding GC, there's a couple of places in Tvix where I'd like to fence off certain kinds of complexity (such as mutating bytecode, which, for various reaons, also has to be part of data that is subject to GC). With this change, we can drop optimisations like retroactively modifying existing bytecode and *still* achieve better performance than before. I believe that this is currently worth it to pave the way for changes that are more significant for performance. In general this also opens other avenues of optimisation: For example, we can profile which argument sizes actually exist and remove the copy overhead of varint decoding (which does show up in profiles) by using more adequately sized types for, e.g., constant indices. Known regressions: * Op::Constant is no longer printing its values in disassembly (this can be fixed, I just didn't get around to it, will do separately). Change-Id: Id9b3a4254623a45de03069dbdb70b8349e976743 Reviewed-on: https://cl.tvl.fyi/c/depot/+/12191 Tested-by: BuildkiteCI Reviewed-by: flokli <flokli@flokli.de>
2024-08-10 22:59:38 +02:00
/// Return the last op in the chunk together with its index, if any.
pub fn last_op(&self) -> Option<(Op, usize)> {
if self.code.is_empty() {
return None;
}
refactor(tvix/eval): ensure VM operations fit in a single byte This replaces the OpCode enum with a new Op enum which is guaranteed to fit in a single byte. Instead of carrying enum variants with data, every variant that has runtime data encodes it into the `Vec<u8>` that a `Chunk` now carries. This has several advantages: * Less stack space is required at runtime, and fewer allocations are required while compiling. * The OpCode doesn't need to carry "weird" special-cased data variants anymore. * It is faster (albeit, not by much). On my laptop, results consistently look approximately like this: Benchmark 1: ./before -E '(import <nixpkgs> {}).firefox.outPath' --log-level ERROR --no-warnings Time (mean ± σ): 8.224 s ± 0.272 s [User: 7.149 s, System: 0.688 s] Range (min … max): 7.759 s … 8.583 s 10 runs Benchmark 2: ./after -E '(import <nixpkgs> {}).firefox.outPath' --log-level ERROR --no-warnings Time (mean ± σ): 8.000 s ± 0.198 s [User: 7.036 s, System: 0.633 s] Range (min … max): 7.718 s … 8.334 s 10 runs See notes below for why the performance impact might be less than expected. * It is faster while at the same time dropping some optimisations we previously performed. This has several disadvantages: * The code is closer to how one would write it in C or Go. * Bit shifting! * There is (for now) slightly more code than before. On performance I have the following thoughts at the moment: In order to prepare for adding GC, there's a couple of places in Tvix where I'd like to fence off certain kinds of complexity (such as mutating bytecode, which, for various reaons, also has to be part of data that is subject to GC). With this change, we can drop optimisations like retroactively modifying existing bytecode and *still* achieve better performance than before. I believe that this is currently worth it to pave the way for changes that are more significant for performance. In general this also opens other avenues of optimisation: For example, we can profile which argument sizes actually exist and remove the copy overhead of varint decoding (which does show up in profiles) by using more adequately sized types for, e.g., constant indices. Known regressions: * Op::Constant is no longer printing its values in disassembly (this can be fixed, I just didn't get around to it, will do separately). Change-Id: Id9b3a4254623a45de03069dbdb70b8349e976743 Reviewed-on: https://cl.tvl.fyi/c/depot/+/12191 Tested-by: BuildkiteCI Reviewed-by: flokli <flokli@flokli.de>
2024-08-10 22:59:38 +02:00
Some((self.code[self.last_op].into(), self.last_op))
}
pub fn push_constant(&mut self, data: Value) -> ConstantIdx {
let idx = self.constants.len();
self.constants.push(data);
ConstantIdx(idx)
}
/// Return a reference to the constant at the given [`ConstantIdx`]
pub fn get_constant(&self, constant: ConstantIdx) -> Option<&Value> {
self.constants.get(constant.0)
}
fn push_span(&mut self, span: codemap::Span, start: usize) {
match self.spans.last_mut() {
// We do not need to insert the same span again, as this
// instruction was compiled from the same span as the last
// one.
Some(last) if last.span == span => {}
// In all other cases, this is a new source span.
_ => self.spans.push(SourceSpan { span, start }),
}
}
/// Retrieve the [codemap::Span] from which the instruction at
/// `offset` was compiled.
pub fn get_span(&self, offset: CodeIdx) -> codemap::Span {
let position = self
.spans
.binary_search_by(|span| span.start.cmp(&offset.0));
let span = match position {
Ok(index) => &self.spans[index],
Err(index) => {
if index == 0 {
&self.spans[0]
} else {
&self.spans[index - 1]
}
}
};
span.span
}
/// Write the disassembler representation of the operation at
refactor(tvix/eval): ensure VM operations fit in a single byte This replaces the OpCode enum with a new Op enum which is guaranteed to fit in a single byte. Instead of carrying enum variants with data, every variant that has runtime data encodes it into the `Vec<u8>` that a `Chunk` now carries. This has several advantages: * Less stack space is required at runtime, and fewer allocations are required while compiling. * The OpCode doesn't need to carry "weird" special-cased data variants anymore. * It is faster (albeit, not by much). On my laptop, results consistently look approximately like this: Benchmark 1: ./before -E '(import <nixpkgs> {}).firefox.outPath' --log-level ERROR --no-warnings Time (mean ± σ): 8.224 s ± 0.272 s [User: 7.149 s, System: 0.688 s] Range (min … max): 7.759 s … 8.583 s 10 runs Benchmark 2: ./after -E '(import <nixpkgs> {}).firefox.outPath' --log-level ERROR --no-warnings Time (mean ± σ): 8.000 s ± 0.198 s [User: 7.036 s, System: 0.633 s] Range (min … max): 7.718 s … 8.334 s 10 runs See notes below for why the performance impact might be less than expected. * It is faster while at the same time dropping some optimisations we previously performed. This has several disadvantages: * The code is closer to how one would write it in C or Go. * Bit shifting! * There is (for now) slightly more code than before. On performance I have the following thoughts at the moment: In order to prepare for adding GC, there's a couple of places in Tvix where I'd like to fence off certain kinds of complexity (such as mutating bytecode, which, for various reaons, also has to be part of data that is subject to GC). With this change, we can drop optimisations like retroactively modifying existing bytecode and *still* achieve better performance than before. I believe that this is currently worth it to pave the way for changes that are more significant for performance. In general this also opens other avenues of optimisation: For example, we can profile which argument sizes actually exist and remove the copy overhead of varint decoding (which does show up in profiles) by using more adequately sized types for, e.g., constant indices. Known regressions: * Op::Constant is no longer printing its values in disassembly (this can be fixed, I just didn't get around to it, will do separately). Change-Id: Id9b3a4254623a45de03069dbdb70b8349e976743 Reviewed-on: https://cl.tvl.fyi/c/depot/+/12191 Tested-by: BuildkiteCI Reviewed-by: flokli <flokli@flokli.de>
2024-08-10 22:59:38 +02:00
/// `idx` to the specified writer, and return how many bytes in the code to
/// skip for the next op.
pub fn disassemble_op<W: Write>(
&self,
writer: &mut W,
source: &SourceCode,
width: usize,
idx: CodeIdx,
refactor(tvix/eval): ensure VM operations fit in a single byte This replaces the OpCode enum with a new Op enum which is guaranteed to fit in a single byte. Instead of carrying enum variants with data, every variant that has runtime data encodes it into the `Vec<u8>` that a `Chunk` now carries. This has several advantages: * Less stack space is required at runtime, and fewer allocations are required while compiling. * The OpCode doesn't need to carry "weird" special-cased data variants anymore. * It is faster (albeit, not by much). On my laptop, results consistently look approximately like this: Benchmark 1: ./before -E '(import <nixpkgs> {}).firefox.outPath' --log-level ERROR --no-warnings Time (mean ± σ): 8.224 s ± 0.272 s [User: 7.149 s, System: 0.688 s] Range (min … max): 7.759 s … 8.583 s 10 runs Benchmark 2: ./after -E '(import <nixpkgs> {}).firefox.outPath' --log-level ERROR --no-warnings Time (mean ± σ): 8.000 s ± 0.198 s [User: 7.036 s, System: 0.633 s] Range (min … max): 7.718 s … 8.334 s 10 runs See notes below for why the performance impact might be less than expected. * It is faster while at the same time dropping some optimisations we previously performed. This has several disadvantages: * The code is closer to how one would write it in C or Go. * Bit shifting! * There is (for now) slightly more code than before. On performance I have the following thoughts at the moment: In order to prepare for adding GC, there's a couple of places in Tvix where I'd like to fence off certain kinds of complexity (such as mutating bytecode, which, for various reaons, also has to be part of data that is subject to GC). With this change, we can drop optimisations like retroactively modifying existing bytecode and *still* achieve better performance than before. I believe that this is currently worth it to pave the way for changes that are more significant for performance. In general this also opens other avenues of optimisation: For example, we can profile which argument sizes actually exist and remove the copy overhead of varint decoding (which does show up in profiles) by using more adequately sized types for, e.g., constant indices. Known regressions: * Op::Constant is no longer printing its values in disassembly (this can be fixed, I just didn't get around to it, will do separately). Change-Id: Id9b3a4254623a45de03069dbdb70b8349e976743 Reviewed-on: https://cl.tvl.fyi/c/depot/+/12191 Tested-by: BuildkiteCI Reviewed-by: flokli <flokli@flokli.de>
2024-08-10 22:59:38 +02:00
) -> Result<usize, std::io::Error> {
write!(writer, "{:#width$x}\t ", idx.0, width = width)?;
// Print continuation character if the previous operation was at
// the same line, otherwise print the line.
let line = source.get_line(self.get_span(idx));
refactor(tvix/eval): ensure VM operations fit in a single byte This replaces the OpCode enum with a new Op enum which is guaranteed to fit in a single byte. Instead of carrying enum variants with data, every variant that has runtime data encodes it into the `Vec<u8>` that a `Chunk` now carries. This has several advantages: * Less stack space is required at runtime, and fewer allocations are required while compiling. * The OpCode doesn't need to carry "weird" special-cased data variants anymore. * It is faster (albeit, not by much). On my laptop, results consistently look approximately like this: Benchmark 1: ./before -E '(import <nixpkgs> {}).firefox.outPath' --log-level ERROR --no-warnings Time (mean ± σ): 8.224 s ± 0.272 s [User: 7.149 s, System: 0.688 s] Range (min … max): 7.759 s … 8.583 s 10 runs Benchmark 2: ./after -E '(import <nixpkgs> {}).firefox.outPath' --log-level ERROR --no-warnings Time (mean ± σ): 8.000 s ± 0.198 s [User: 7.036 s, System: 0.633 s] Range (min … max): 7.718 s … 8.334 s 10 runs See notes below for why the performance impact might be less than expected. * It is faster while at the same time dropping some optimisations we previously performed. This has several disadvantages: * The code is closer to how one would write it in C or Go. * Bit shifting! * There is (for now) slightly more code than before. On performance I have the following thoughts at the moment: In order to prepare for adding GC, there's a couple of places in Tvix where I'd like to fence off certain kinds of complexity (such as mutating bytecode, which, for various reaons, also has to be part of data that is subject to GC). With this change, we can drop optimisations like retroactively modifying existing bytecode and *still* achieve better performance than before. I believe that this is currently worth it to pave the way for changes that are more significant for performance. In general this also opens other avenues of optimisation: For example, we can profile which argument sizes actually exist and remove the copy overhead of varint decoding (which does show up in profiles) by using more adequately sized types for, e.g., constant indices. Known regressions: * Op::Constant is no longer printing its values in disassembly (this can be fixed, I just didn't get around to it, will do separately). Change-Id: Id9b3a4254623a45de03069dbdb70b8349e976743 Reviewed-on: https://cl.tvl.fyi/c/depot/+/12191 Tested-by: BuildkiteCI Reviewed-by: flokli <flokli@flokli.de>
2024-08-10 22:59:38 +02:00
if idx.0 > 0 && source.get_line(self.get_span(idx - 1)) == line {
write!(writer, " |\t")?;
} else {
write!(writer, "{:4}\t", line)?;
}
refactor(tvix/eval): ensure VM operations fit in a single byte This replaces the OpCode enum with a new Op enum which is guaranteed to fit in a single byte. Instead of carrying enum variants with data, every variant that has runtime data encodes it into the `Vec<u8>` that a `Chunk` now carries. This has several advantages: * Less stack space is required at runtime, and fewer allocations are required while compiling. * The OpCode doesn't need to carry "weird" special-cased data variants anymore. * It is faster (albeit, not by much). On my laptop, results consistently look approximately like this: Benchmark 1: ./before -E '(import <nixpkgs> {}).firefox.outPath' --log-level ERROR --no-warnings Time (mean ± σ): 8.224 s ± 0.272 s [User: 7.149 s, System: 0.688 s] Range (min … max): 7.759 s … 8.583 s 10 runs Benchmark 2: ./after -E '(import <nixpkgs> {}).firefox.outPath' --log-level ERROR --no-warnings Time (mean ± σ): 8.000 s ± 0.198 s [User: 7.036 s, System: 0.633 s] Range (min … max): 7.718 s … 8.334 s 10 runs See notes below for why the performance impact might be less than expected. * It is faster while at the same time dropping some optimisations we previously performed. This has several disadvantages: * The code is closer to how one would write it in C or Go. * Bit shifting! * There is (for now) slightly more code than before. On performance I have the following thoughts at the moment: In order to prepare for adding GC, there's a couple of places in Tvix where I'd like to fence off certain kinds of complexity (such as mutating bytecode, which, for various reaons, also has to be part of data that is subject to GC). With this change, we can drop optimisations like retroactively modifying existing bytecode and *still* achieve better performance than before. I believe that this is currently worth it to pave the way for changes that are more significant for performance. In general this also opens other avenues of optimisation: For example, we can profile which argument sizes actually exist and remove the copy overhead of varint decoding (which does show up in profiles) by using more adequately sized types for, e.g., constant indices. Known regressions: * Op::Constant is no longer printing its values in disassembly (this can be fixed, I just didn't get around to it, will do separately). Change-Id: Id9b3a4254623a45de03069dbdb70b8349e976743 Reviewed-on: https://cl.tvl.fyi/c/depot/+/12191 Tested-by: BuildkiteCI Reviewed-by: flokli <flokli@flokli.de>
2024-08-10 22:59:38 +02:00
let _fmt_constant = |idx: ConstantIdx| match &self.constants[idx.0] {
Value::Thunk(t) => t.debug_repr(),
Value::Closure(c) => format!("closure({:p})", c.lambda),
Value::Blueprint(b) => format!("blueprint({:p})", b),
val => format!("{}", val),
};
refactor(tvix/eval): ensure VM operations fit in a single byte This replaces the OpCode enum with a new Op enum which is guaranteed to fit in a single byte. Instead of carrying enum variants with data, every variant that has runtime data encodes it into the `Vec<u8>` that a `Chunk` now carries. This has several advantages: * Less stack space is required at runtime, and fewer allocations are required while compiling. * The OpCode doesn't need to carry "weird" special-cased data variants anymore. * It is faster (albeit, not by much). On my laptop, results consistently look approximately like this: Benchmark 1: ./before -E '(import <nixpkgs> {}).firefox.outPath' --log-level ERROR --no-warnings Time (mean ± σ): 8.224 s ± 0.272 s [User: 7.149 s, System: 0.688 s] Range (min … max): 7.759 s … 8.583 s 10 runs Benchmark 2: ./after -E '(import <nixpkgs> {}).firefox.outPath' --log-level ERROR --no-warnings Time (mean ± σ): 8.000 s ± 0.198 s [User: 7.036 s, System: 0.633 s] Range (min … max): 7.718 s … 8.334 s 10 runs See notes below for why the performance impact might be less than expected. * It is faster while at the same time dropping some optimisations we previously performed. This has several disadvantages: * The code is closer to how one would write it in C or Go. * Bit shifting! * There is (for now) slightly more code than before. On performance I have the following thoughts at the moment: In order to prepare for adding GC, there's a couple of places in Tvix where I'd like to fence off certain kinds of complexity (such as mutating bytecode, which, for various reaons, also has to be part of data that is subject to GC). With this change, we can drop optimisations like retroactively modifying existing bytecode and *still* achieve better performance than before. I believe that this is currently worth it to pave the way for changes that are more significant for performance. In general this also opens other avenues of optimisation: For example, we can profile which argument sizes actually exist and remove the copy overhead of varint decoding (which does show up in profiles) by using more adequately sized types for, e.g., constant indices. Known regressions: * Op::Constant is no longer printing its values in disassembly (this can be fixed, I just didn't get around to it, will do separately). Change-Id: Id9b3a4254623a45de03069dbdb70b8349e976743 Reviewed-on: https://cl.tvl.fyi/c/depot/+/12191 Tested-by: BuildkiteCI Reviewed-by: flokli <flokli@flokli.de>
2024-08-10 22:59:38 +02:00
let op: Op = self.code[idx.0].into();
match op.arg_type() {
OpArg::None => {
writeln!(writer, "Op{:?}", op)?;
Ok(1)
}
refactor(tvix/eval): ensure VM operations fit in a single byte This replaces the OpCode enum with a new Op enum which is guaranteed to fit in a single byte. Instead of carrying enum variants with data, every variant that has runtime data encodes it into the `Vec<u8>` that a `Chunk` now carries. This has several advantages: * Less stack space is required at runtime, and fewer allocations are required while compiling. * The OpCode doesn't need to carry "weird" special-cased data variants anymore. * It is faster (albeit, not by much). On my laptop, results consistently look approximately like this: Benchmark 1: ./before -E '(import <nixpkgs> {}).firefox.outPath' --log-level ERROR --no-warnings Time (mean ± σ): 8.224 s ± 0.272 s [User: 7.149 s, System: 0.688 s] Range (min … max): 7.759 s … 8.583 s 10 runs Benchmark 2: ./after -E '(import <nixpkgs> {}).firefox.outPath' --log-level ERROR --no-warnings Time (mean ± σ): 8.000 s ± 0.198 s [User: 7.036 s, System: 0.633 s] Range (min … max): 7.718 s … 8.334 s 10 runs See notes below for why the performance impact might be less than expected. * It is faster while at the same time dropping some optimisations we previously performed. This has several disadvantages: * The code is closer to how one would write it in C or Go. * Bit shifting! * There is (for now) slightly more code than before. On performance I have the following thoughts at the moment: In order to prepare for adding GC, there's a couple of places in Tvix where I'd like to fence off certain kinds of complexity (such as mutating bytecode, which, for various reaons, also has to be part of data that is subject to GC). With this change, we can drop optimisations like retroactively modifying existing bytecode and *still* achieve better performance than before. I believe that this is currently worth it to pave the way for changes that are more significant for performance. In general this also opens other avenues of optimisation: For example, we can profile which argument sizes actually exist and remove the copy overhead of varint decoding (which does show up in profiles) by using more adequately sized types for, e.g., constant indices. Known regressions: * Op::Constant is no longer printing its values in disassembly (this can be fixed, I just didn't get around to it, will do separately). Change-Id: Id9b3a4254623a45de03069dbdb70b8349e976743 Reviewed-on: https://cl.tvl.fyi/c/depot/+/12191 Tested-by: BuildkiteCI Reviewed-by: flokli <flokli@flokli.de>
2024-08-10 22:59:38 +02:00
OpArg::Fixed => {
let arg = self.read_u16(idx.0 + 1);
writeln!(writer, "Op{:?}({})", op, arg)?;
Ok(3)
}
refactor(tvix/eval): ensure VM operations fit in a single byte This replaces the OpCode enum with a new Op enum which is guaranteed to fit in a single byte. Instead of carrying enum variants with data, every variant that has runtime data encodes it into the `Vec<u8>` that a `Chunk` now carries. This has several advantages: * Less stack space is required at runtime, and fewer allocations are required while compiling. * The OpCode doesn't need to carry "weird" special-cased data variants anymore. * It is faster (albeit, not by much). On my laptop, results consistently look approximately like this: Benchmark 1: ./before -E '(import <nixpkgs> {}).firefox.outPath' --log-level ERROR --no-warnings Time (mean ± σ): 8.224 s ± 0.272 s [User: 7.149 s, System: 0.688 s] Range (min … max): 7.759 s … 8.583 s 10 runs Benchmark 2: ./after -E '(import <nixpkgs> {}).firefox.outPath' --log-level ERROR --no-warnings Time (mean ± σ): 8.000 s ± 0.198 s [User: 7.036 s, System: 0.633 s] Range (min … max): 7.718 s … 8.334 s 10 runs See notes below for why the performance impact might be less than expected. * It is faster while at the same time dropping some optimisations we previously performed. This has several disadvantages: * The code is closer to how one would write it in C or Go. * Bit shifting! * There is (for now) slightly more code than before. On performance I have the following thoughts at the moment: In order to prepare for adding GC, there's a couple of places in Tvix where I'd like to fence off certain kinds of complexity (such as mutating bytecode, which, for various reaons, also has to be part of data that is subject to GC). With this change, we can drop optimisations like retroactively modifying existing bytecode and *still* achieve better performance than before. I believe that this is currently worth it to pave the way for changes that are more significant for performance. In general this also opens other avenues of optimisation: For example, we can profile which argument sizes actually exist and remove the copy overhead of varint decoding (which does show up in profiles) by using more adequately sized types for, e.g., constant indices. Known regressions: * Op::Constant is no longer printing its values in disassembly (this can be fixed, I just didn't get around to it, will do separately). Change-Id: Id9b3a4254623a45de03069dbdb70b8349e976743 Reviewed-on: https://cl.tvl.fyi/c/depot/+/12191 Tested-by: BuildkiteCI Reviewed-by: flokli <flokli@flokli.de>
2024-08-10 22:59:38 +02:00
OpArg::Uvarint => {
let (arg, size) = self.read_uvarint(idx.0 + 1);
writeln!(writer, "Op{:?}({})", op, arg)?;
Ok(1 + size)
}
refactor(tvix/eval): ensure VM operations fit in a single byte This replaces the OpCode enum with a new Op enum which is guaranteed to fit in a single byte. Instead of carrying enum variants with data, every variant that has runtime data encodes it into the `Vec<u8>` that a `Chunk` now carries. This has several advantages: * Less stack space is required at runtime, and fewer allocations are required while compiling. * The OpCode doesn't need to carry "weird" special-cased data variants anymore. * It is faster (albeit, not by much). On my laptop, results consistently look approximately like this: Benchmark 1: ./before -E '(import <nixpkgs> {}).firefox.outPath' --log-level ERROR --no-warnings Time (mean ± σ): 8.224 s ± 0.272 s [User: 7.149 s, System: 0.688 s] Range (min … max): 7.759 s … 8.583 s 10 runs Benchmark 2: ./after -E '(import <nixpkgs> {}).firefox.outPath' --log-level ERROR --no-warnings Time (mean ± σ): 8.000 s ± 0.198 s [User: 7.036 s, System: 0.633 s] Range (min … max): 7.718 s … 8.334 s 10 runs See notes below for why the performance impact might be less than expected. * It is faster while at the same time dropping some optimisations we previously performed. This has several disadvantages: * The code is closer to how one would write it in C or Go. * Bit shifting! * There is (for now) slightly more code than before. On performance I have the following thoughts at the moment: In order to prepare for adding GC, there's a couple of places in Tvix where I'd like to fence off certain kinds of complexity (such as mutating bytecode, which, for various reaons, also has to be part of data that is subject to GC). With this change, we can drop optimisations like retroactively modifying existing bytecode and *still* achieve better performance than before. I believe that this is currently worth it to pave the way for changes that are more significant for performance. In general this also opens other avenues of optimisation: For example, we can profile which argument sizes actually exist and remove the copy overhead of varint decoding (which does show up in profiles) by using more adequately sized types for, e.g., constant indices. Known regressions: * Op::Constant is no longer printing its values in disassembly (this can be fixed, I just didn't get around to it, will do separately). Change-Id: Id9b3a4254623a45de03069dbdb70b8349e976743 Reviewed-on: https://cl.tvl.fyi/c/depot/+/12191 Tested-by: BuildkiteCI Reviewed-by: flokli <flokli@flokli.de>
2024-08-10 22:59:38 +02:00
_ => match op {
Op::CoerceToString => {
let kind: CoercionKind = self.code[idx.0 + 1].into();
writeln!(writer, "Op{:?}({:?})", op, kind)?;
Ok(2)
}
refactor(tvix/eval): ensure VM operations fit in a single byte This replaces the OpCode enum with a new Op enum which is guaranteed to fit in a single byte. Instead of carrying enum variants with data, every variant that has runtime data encodes it into the `Vec<u8>` that a `Chunk` now carries. This has several advantages: * Less stack space is required at runtime, and fewer allocations are required while compiling. * The OpCode doesn't need to carry "weird" special-cased data variants anymore. * It is faster (albeit, not by much). On my laptop, results consistently look approximately like this: Benchmark 1: ./before -E '(import <nixpkgs> {}).firefox.outPath' --log-level ERROR --no-warnings Time (mean ± σ): 8.224 s ± 0.272 s [User: 7.149 s, System: 0.688 s] Range (min … max): 7.759 s … 8.583 s 10 runs Benchmark 2: ./after -E '(import <nixpkgs> {}).firefox.outPath' --log-level ERROR --no-warnings Time (mean ± σ): 8.000 s ± 0.198 s [User: 7.036 s, System: 0.633 s] Range (min … max): 7.718 s … 8.334 s 10 runs See notes below for why the performance impact might be less than expected. * It is faster while at the same time dropping some optimisations we previously performed. This has several disadvantages: * The code is closer to how one would write it in C or Go. * Bit shifting! * There is (for now) slightly more code than before. On performance I have the following thoughts at the moment: In order to prepare for adding GC, there's a couple of places in Tvix where I'd like to fence off certain kinds of complexity (such as mutating bytecode, which, for various reaons, also has to be part of data that is subject to GC). With this change, we can drop optimisations like retroactively modifying existing bytecode and *still* achieve better performance than before. I believe that this is currently worth it to pave the way for changes that are more significant for performance. In general this also opens other avenues of optimisation: For example, we can profile which argument sizes actually exist and remove the copy overhead of varint decoding (which does show up in profiles) by using more adequately sized types for, e.g., constant indices. Known regressions: * Op::Constant is no longer printing its values in disassembly (this can be fixed, I just didn't get around to it, will do separately). Change-Id: Id9b3a4254623a45de03069dbdb70b8349e976743 Reviewed-on: https://cl.tvl.fyi/c/depot/+/12191 Tested-by: BuildkiteCI Reviewed-by: flokli <flokli@flokli.de>
2024-08-10 22:59:38 +02:00
Op::Closure | Op::ThunkClosure | Op::ThunkSuspended => {
let mut cidx = idx.0 + 1;
refactor(tvix/eval): ensure VM operations fit in a single byte This replaces the OpCode enum with a new Op enum which is guaranteed to fit in a single byte. Instead of carrying enum variants with data, every variant that has runtime data encodes it into the `Vec<u8>` that a `Chunk` now carries. This has several advantages: * Less stack space is required at runtime, and fewer allocations are required while compiling. * The OpCode doesn't need to carry "weird" special-cased data variants anymore. * It is faster (albeit, not by much). On my laptop, results consistently look approximately like this: Benchmark 1: ./before -E '(import <nixpkgs> {}).firefox.outPath' --log-level ERROR --no-warnings Time (mean ± σ): 8.224 s ± 0.272 s [User: 7.149 s, System: 0.688 s] Range (min … max): 7.759 s … 8.583 s 10 runs Benchmark 2: ./after -E '(import <nixpkgs> {}).firefox.outPath' --log-level ERROR --no-warnings Time (mean ± σ): 8.000 s ± 0.198 s [User: 7.036 s, System: 0.633 s] Range (min … max): 7.718 s … 8.334 s 10 runs See notes below for why the performance impact might be less than expected. * It is faster while at the same time dropping some optimisations we previously performed. This has several disadvantages: * The code is closer to how one would write it in C or Go. * Bit shifting! * There is (for now) slightly more code than before. On performance I have the following thoughts at the moment: In order to prepare for adding GC, there's a couple of places in Tvix where I'd like to fence off certain kinds of complexity (such as mutating bytecode, which, for various reaons, also has to be part of data that is subject to GC). With this change, we can drop optimisations like retroactively modifying existing bytecode and *still* achieve better performance than before. I believe that this is currently worth it to pave the way for changes that are more significant for performance. In general this also opens other avenues of optimisation: For example, we can profile which argument sizes actually exist and remove the copy overhead of varint decoding (which does show up in profiles) by using more adequately sized types for, e.g., constant indices. Known regressions: * Op::Constant is no longer printing its values in disassembly (this can be fixed, I just didn't get around to it, will do separately). Change-Id: Id9b3a4254623a45de03069dbdb70b8349e976743 Reviewed-on: https://cl.tvl.fyi/c/depot/+/12191 Tested-by: BuildkiteCI Reviewed-by: flokli <flokli@flokli.de>
2024-08-10 22:59:38 +02:00
let (bp_idx, size) = self.read_uvarint(cidx);
cidx += size;
let (packed_count, size) = self.read_uvarint(cidx);
cidx += size;
let captures_with = packed_count & 0b1 == 1;
let count = packed_count >> 1;
write!(writer, "Op{:?}(BP @ {}, ", op, bp_idx)?;
if captures_with {
write!(writer, "captures with, ")?;
}
writeln!(writer, "{} upvalues)", count)?;
for _ in 0..count {
let (_, size) = self.read_uvarint(cidx);
cidx += size;
}
Ok(cidx - idx.0)
}
_ => panic!("Tvix bug: don't know how to format argument for Op{:?}", op),
},
}
}
}
#[cfg(test)]
mod tests {
use super::*;
use crate::test_utils::dummy_span;
// Note: These tests are about the functionality of the `Chunk` type, the
// opcodes used below do *not* represent valid, executable Tvix code (and
// don't need to).
#[test]
fn push_op() {
let mut chunk = Chunk::default();
refactor(tvix/eval): ensure VM operations fit in a single byte This replaces the OpCode enum with a new Op enum which is guaranteed to fit in a single byte. Instead of carrying enum variants with data, every variant that has runtime data encodes it into the `Vec<u8>` that a `Chunk` now carries. This has several advantages: * Less stack space is required at runtime, and fewer allocations are required while compiling. * The OpCode doesn't need to carry "weird" special-cased data variants anymore. * It is faster (albeit, not by much). On my laptop, results consistently look approximately like this: Benchmark 1: ./before -E '(import <nixpkgs> {}).firefox.outPath' --log-level ERROR --no-warnings Time (mean ± σ): 8.224 s ± 0.272 s [User: 7.149 s, System: 0.688 s] Range (min … max): 7.759 s … 8.583 s 10 runs Benchmark 2: ./after -E '(import <nixpkgs> {}).firefox.outPath' --log-level ERROR --no-warnings Time (mean ± σ): 8.000 s ± 0.198 s [User: 7.036 s, System: 0.633 s] Range (min … max): 7.718 s … 8.334 s 10 runs See notes below for why the performance impact might be less than expected. * It is faster while at the same time dropping some optimisations we previously performed. This has several disadvantages: * The code is closer to how one would write it in C or Go. * Bit shifting! * There is (for now) slightly more code than before. On performance I have the following thoughts at the moment: In order to prepare for adding GC, there's a couple of places in Tvix where I'd like to fence off certain kinds of complexity (such as mutating bytecode, which, for various reaons, also has to be part of data that is subject to GC). With this change, we can drop optimisations like retroactively modifying existing bytecode and *still* achieve better performance than before. I believe that this is currently worth it to pave the way for changes that are more significant for performance. In general this also opens other avenues of optimisation: For example, we can profile which argument sizes actually exist and remove the copy overhead of varint decoding (which does show up in profiles) by using more adequately sized types for, e.g., constant indices. Known regressions: * Op::Constant is no longer printing its values in disassembly (this can be fixed, I just didn't get around to it, will do separately). Change-Id: Id9b3a4254623a45de03069dbdb70b8349e976743 Reviewed-on: https://cl.tvl.fyi/c/depot/+/12191 Tested-by: BuildkiteCI Reviewed-by: flokli <flokli@flokli.de>
2024-08-10 22:59:38 +02:00
let idx = chunk.push_op(Op::Add, dummy_span());
assert_eq!(*chunk.code.last().unwrap(), Op::Add as u8);
assert_eq!(chunk.code[idx], Op::Add as u8);
}
#[test]
refactor(tvix/eval): ensure VM operations fit in a single byte This replaces the OpCode enum with a new Op enum which is guaranteed to fit in a single byte. Instead of carrying enum variants with data, every variant that has runtime data encodes it into the `Vec<u8>` that a `Chunk` now carries. This has several advantages: * Less stack space is required at runtime, and fewer allocations are required while compiling. * The OpCode doesn't need to carry "weird" special-cased data variants anymore. * It is faster (albeit, not by much). On my laptop, results consistently look approximately like this: Benchmark 1: ./before -E '(import <nixpkgs> {}).firefox.outPath' --log-level ERROR --no-warnings Time (mean ± σ): 8.224 s ± 0.272 s [User: 7.149 s, System: 0.688 s] Range (min … max): 7.759 s … 8.583 s 10 runs Benchmark 2: ./after -E '(import <nixpkgs> {}).firefox.outPath' --log-level ERROR --no-warnings Time (mean ± σ): 8.000 s ± 0.198 s [User: 7.036 s, System: 0.633 s] Range (min … max): 7.718 s … 8.334 s 10 runs See notes below for why the performance impact might be less than expected. * It is faster while at the same time dropping some optimisations we previously performed. This has several disadvantages: * The code is closer to how one would write it in C or Go. * Bit shifting! * There is (for now) slightly more code than before. On performance I have the following thoughts at the moment: In order to prepare for adding GC, there's a couple of places in Tvix where I'd like to fence off certain kinds of complexity (such as mutating bytecode, which, for various reaons, also has to be part of data that is subject to GC). With this change, we can drop optimisations like retroactively modifying existing bytecode and *still* achieve better performance than before. I believe that this is currently worth it to pave the way for changes that are more significant for performance. In general this also opens other avenues of optimisation: For example, we can profile which argument sizes actually exist and remove the copy overhead of varint decoding (which does show up in profiles) by using more adequately sized types for, e.g., constant indices. Known regressions: * Op::Constant is no longer printing its values in disassembly (this can be fixed, I just didn't get around to it, will do separately). Change-Id: Id9b3a4254623a45de03069dbdb70b8349e976743 Reviewed-on: https://cl.tvl.fyi/c/depot/+/12191 Tested-by: BuildkiteCI Reviewed-by: flokli <flokli@flokli.de>
2024-08-10 22:59:38 +02:00
fn push_op_with_arg() {
let mut chunk = Chunk::default();
refactor(tvix/eval): ensure VM operations fit in a single byte This replaces the OpCode enum with a new Op enum which is guaranteed to fit in a single byte. Instead of carrying enum variants with data, every variant that has runtime data encodes it into the `Vec<u8>` that a `Chunk` now carries. This has several advantages: * Less stack space is required at runtime, and fewer allocations are required while compiling. * The OpCode doesn't need to carry "weird" special-cased data variants anymore. * It is faster (albeit, not by much). On my laptop, results consistently look approximately like this: Benchmark 1: ./before -E '(import <nixpkgs> {}).firefox.outPath' --log-level ERROR --no-warnings Time (mean ± σ): 8.224 s ± 0.272 s [User: 7.149 s, System: 0.688 s] Range (min … max): 7.759 s … 8.583 s 10 runs Benchmark 2: ./after -E '(import <nixpkgs> {}).firefox.outPath' --log-level ERROR --no-warnings Time (mean ± σ): 8.000 s ± 0.198 s [User: 7.036 s, System: 0.633 s] Range (min … max): 7.718 s … 8.334 s 10 runs See notes below for why the performance impact might be less than expected. * It is faster while at the same time dropping some optimisations we previously performed. This has several disadvantages: * The code is closer to how one would write it in C or Go. * Bit shifting! * There is (for now) slightly more code than before. On performance I have the following thoughts at the moment: In order to prepare for adding GC, there's a couple of places in Tvix where I'd like to fence off certain kinds of complexity (such as mutating bytecode, which, for various reaons, also has to be part of data that is subject to GC). With this change, we can drop optimisations like retroactively modifying existing bytecode and *still* achieve better performance than before. I believe that this is currently worth it to pave the way for changes that are more significant for performance. In general this also opens other avenues of optimisation: For example, we can profile which argument sizes actually exist and remove the copy overhead of varint decoding (which does show up in profiles) by using more adequately sized types for, e.g., constant indices. Known regressions: * Op::Constant is no longer printing its values in disassembly (this can be fixed, I just didn't get around to it, will do separately). Change-Id: Id9b3a4254623a45de03069dbdb70b8349e976743 Reviewed-on: https://cl.tvl.fyi/c/depot/+/12191 Tested-by: BuildkiteCI Reviewed-by: flokli <flokli@flokli.de>
2024-08-10 22:59:38 +02:00
let mut idx = chunk.push_op(Op::Constant, dummy_span());
chunk.push_uvarint(42);
refactor(tvix/eval): ensure VM operations fit in a single byte This replaces the OpCode enum with a new Op enum which is guaranteed to fit in a single byte. Instead of carrying enum variants with data, every variant that has runtime data encodes it into the `Vec<u8>` that a `Chunk` now carries. This has several advantages: * Less stack space is required at runtime, and fewer allocations are required while compiling. * The OpCode doesn't need to carry "weird" special-cased data variants anymore. * It is faster (albeit, not by much). On my laptop, results consistently look approximately like this: Benchmark 1: ./before -E '(import <nixpkgs> {}).firefox.outPath' --log-level ERROR --no-warnings Time (mean ± σ): 8.224 s ± 0.272 s [User: 7.149 s, System: 0.688 s] Range (min … max): 7.759 s … 8.583 s 10 runs Benchmark 2: ./after -E '(import <nixpkgs> {}).firefox.outPath' --log-level ERROR --no-warnings Time (mean ± σ): 8.000 s ± 0.198 s [User: 7.036 s, System: 0.633 s] Range (min … max): 7.718 s … 8.334 s 10 runs See notes below for why the performance impact might be less than expected. * It is faster while at the same time dropping some optimisations we previously performed. This has several disadvantages: * The code is closer to how one would write it in C or Go. * Bit shifting! * There is (for now) slightly more code than before. On performance I have the following thoughts at the moment: In order to prepare for adding GC, there's a couple of places in Tvix where I'd like to fence off certain kinds of complexity (such as mutating bytecode, which, for various reaons, also has to be part of data that is subject to GC). With this change, we can drop optimisations like retroactively modifying existing bytecode and *still* achieve better performance than before. I believe that this is currently worth it to pave the way for changes that are more significant for performance. In general this also opens other avenues of optimisation: For example, we can profile which argument sizes actually exist and remove the copy overhead of varint decoding (which does show up in profiles) by using more adequately sized types for, e.g., constant indices. Known regressions: * Op::Constant is no longer printing its values in disassembly (this can be fixed, I just didn't get around to it, will do separately). Change-Id: Id9b3a4254623a45de03069dbdb70b8349e976743 Reviewed-on: https://cl.tvl.fyi/c/depot/+/12191 Tested-by: BuildkiteCI Reviewed-by: flokli <flokli@flokli.de>
2024-08-10 22:59:38 +02:00
assert_eq!(chunk.code[idx], Op::Constant as u8);
refactor(tvix/eval): ensure VM operations fit in a single byte This replaces the OpCode enum with a new Op enum which is guaranteed to fit in a single byte. Instead of carrying enum variants with data, every variant that has runtime data encodes it into the `Vec<u8>` that a `Chunk` now carries. This has several advantages: * Less stack space is required at runtime, and fewer allocations are required while compiling. * The OpCode doesn't need to carry "weird" special-cased data variants anymore. * It is faster (albeit, not by much). On my laptop, results consistently look approximately like this: Benchmark 1: ./before -E '(import <nixpkgs> {}).firefox.outPath' --log-level ERROR --no-warnings Time (mean ± σ): 8.224 s ± 0.272 s [User: 7.149 s, System: 0.688 s] Range (min … max): 7.759 s … 8.583 s 10 runs Benchmark 2: ./after -E '(import <nixpkgs> {}).firefox.outPath' --log-level ERROR --no-warnings Time (mean ± σ): 8.000 s ± 0.198 s [User: 7.036 s, System: 0.633 s] Range (min … max): 7.718 s … 8.334 s 10 runs See notes below for why the performance impact might be less than expected. * It is faster while at the same time dropping some optimisations we previously performed. This has several disadvantages: * The code is closer to how one would write it in C or Go. * Bit shifting! * There is (for now) slightly more code than before. On performance I have the following thoughts at the moment: In order to prepare for adding GC, there's a couple of places in Tvix where I'd like to fence off certain kinds of complexity (such as mutating bytecode, which, for various reaons, also has to be part of data that is subject to GC). With this change, we can drop optimisations like retroactively modifying existing bytecode and *still* achieve better performance than before. I believe that this is currently worth it to pave the way for changes that are more significant for performance. In general this also opens other avenues of optimisation: For example, we can profile which argument sizes actually exist and remove the copy overhead of varint decoding (which does show up in profiles) by using more adequately sized types for, e.g., constant indices. Known regressions: * Op::Constant is no longer printing its values in disassembly (this can be fixed, I just didn't get around to it, will do separately). Change-Id: Id9b3a4254623a45de03069dbdb70b8349e976743 Reviewed-on: https://cl.tvl.fyi/c/depot/+/12191 Tested-by: BuildkiteCI Reviewed-by: flokli <flokli@flokli.de>
2024-08-10 22:59:38 +02:00
idx += 1;
let (arg, size) = chunk.read_uvarint(idx);
assert_eq!(idx + size, chunk.code.len());
assert_eq!(arg, 42);
}
#[test]
refactor(tvix/eval): ensure VM operations fit in a single byte This replaces the OpCode enum with a new Op enum which is guaranteed to fit in a single byte. Instead of carrying enum variants with data, every variant that has runtime data encodes it into the `Vec<u8>` that a `Chunk` now carries. This has several advantages: * Less stack space is required at runtime, and fewer allocations are required while compiling. * The OpCode doesn't need to carry "weird" special-cased data variants anymore. * It is faster (albeit, not by much). On my laptop, results consistently look approximately like this: Benchmark 1: ./before -E '(import <nixpkgs> {}).firefox.outPath' --log-level ERROR --no-warnings Time (mean ± σ): 8.224 s ± 0.272 s [User: 7.149 s, System: 0.688 s] Range (min … max): 7.759 s … 8.583 s 10 runs Benchmark 2: ./after -E '(import <nixpkgs> {}).firefox.outPath' --log-level ERROR --no-warnings Time (mean ± σ): 8.000 s ± 0.198 s [User: 7.036 s, System: 0.633 s] Range (min … max): 7.718 s … 8.334 s 10 runs See notes below for why the performance impact might be less than expected. * It is faster while at the same time dropping some optimisations we previously performed. This has several disadvantages: * The code is closer to how one would write it in C or Go. * Bit shifting! * There is (for now) slightly more code than before. On performance I have the following thoughts at the moment: In order to prepare for adding GC, there's a couple of places in Tvix where I'd like to fence off certain kinds of complexity (such as mutating bytecode, which, for various reaons, also has to be part of data that is subject to GC). With this change, we can drop optimisations like retroactively modifying existing bytecode and *still* achieve better performance than before. I believe that this is currently worth it to pave the way for changes that are more significant for performance. In general this also opens other avenues of optimisation: For example, we can profile which argument sizes actually exist and remove the copy overhead of varint decoding (which does show up in profiles) by using more adequately sized types for, e.g., constant indices. Known regressions: * Op::Constant is no longer printing its values in disassembly (this can be fixed, I just didn't get around to it, will do separately). Change-Id: Id9b3a4254623a45de03069dbdb70b8349e976743 Reviewed-on: https://cl.tvl.fyi/c/depot/+/12191 Tested-by: BuildkiteCI Reviewed-by: flokli <flokli@flokli.de>
2024-08-10 22:59:38 +02:00
fn push_jump() {
let mut chunk = Chunk::default();
refactor(tvix/eval): ensure VM operations fit in a single byte This replaces the OpCode enum with a new Op enum which is guaranteed to fit in a single byte. Instead of carrying enum variants with data, every variant that has runtime data encodes it into the `Vec<u8>` that a `Chunk` now carries. This has several advantages: * Less stack space is required at runtime, and fewer allocations are required while compiling. * The OpCode doesn't need to carry "weird" special-cased data variants anymore. * It is faster (albeit, not by much). On my laptop, results consistently look approximately like this: Benchmark 1: ./before -E '(import <nixpkgs> {}).firefox.outPath' --log-level ERROR --no-warnings Time (mean ± σ): 8.224 s ± 0.272 s [User: 7.149 s, System: 0.688 s] Range (min … max): 7.759 s … 8.583 s 10 runs Benchmark 2: ./after -E '(import <nixpkgs> {}).firefox.outPath' --log-level ERROR --no-warnings Time (mean ± σ): 8.000 s ± 0.198 s [User: 7.036 s, System: 0.633 s] Range (min … max): 7.718 s … 8.334 s 10 runs See notes below for why the performance impact might be less than expected. * It is faster while at the same time dropping some optimisations we previously performed. This has several disadvantages: * The code is closer to how one would write it in C or Go. * Bit shifting! * There is (for now) slightly more code than before. On performance I have the following thoughts at the moment: In order to prepare for adding GC, there's a couple of places in Tvix where I'd like to fence off certain kinds of complexity (such as mutating bytecode, which, for various reaons, also has to be part of data that is subject to GC). With this change, we can drop optimisations like retroactively modifying existing bytecode and *still* achieve better performance than before. I believe that this is currently worth it to pave the way for changes that are more significant for performance. In general this also opens other avenues of optimisation: For example, we can profile which argument sizes actually exist and remove the copy overhead of varint decoding (which does show up in profiles) by using more adequately sized types for, e.g., constant indices. Known regressions: * Op::Constant is no longer printing its values in disassembly (this can be fixed, I just didn't get around to it, will do separately). Change-Id: Id9b3a4254623a45de03069dbdb70b8349e976743 Reviewed-on: https://cl.tvl.fyi/c/depot/+/12191 Tested-by: BuildkiteCI Reviewed-by: flokli <flokli@flokli.de>
2024-08-10 22:59:38 +02:00
chunk.push_op(Op::Constant, dummy_span());
chunk.push_uvarint(0);
refactor(tvix/eval): ensure VM operations fit in a single byte This replaces the OpCode enum with a new Op enum which is guaranteed to fit in a single byte. Instead of carrying enum variants with data, every variant that has runtime data encodes it into the `Vec<u8>` that a `Chunk` now carries. This has several advantages: * Less stack space is required at runtime, and fewer allocations are required while compiling. * The OpCode doesn't need to carry "weird" special-cased data variants anymore. * It is faster (albeit, not by much). On my laptop, results consistently look approximately like this: Benchmark 1: ./before -E '(import <nixpkgs> {}).firefox.outPath' --log-level ERROR --no-warnings Time (mean ± σ): 8.224 s ± 0.272 s [User: 7.149 s, System: 0.688 s] Range (min … max): 7.759 s … 8.583 s 10 runs Benchmark 2: ./after -E '(import <nixpkgs> {}).firefox.outPath' --log-level ERROR --no-warnings Time (mean ± σ): 8.000 s ± 0.198 s [User: 7.036 s, System: 0.633 s] Range (min … max): 7.718 s … 8.334 s 10 runs See notes below for why the performance impact might be less than expected. * It is faster while at the same time dropping some optimisations we previously performed. This has several disadvantages: * The code is closer to how one would write it in C or Go. * Bit shifting! * There is (for now) slightly more code than before. On performance I have the following thoughts at the moment: In order to prepare for adding GC, there's a couple of places in Tvix where I'd like to fence off certain kinds of complexity (such as mutating bytecode, which, for various reaons, also has to be part of data that is subject to GC). With this change, we can drop optimisations like retroactively modifying existing bytecode and *still* achieve better performance than before. I believe that this is currently worth it to pave the way for changes that are more significant for performance. In general this also opens other avenues of optimisation: For example, we can profile which argument sizes actually exist and remove the copy overhead of varint decoding (which does show up in profiles) by using more adequately sized types for, e.g., constant indices. Known regressions: * Op::Constant is no longer printing its values in disassembly (this can be fixed, I just didn't get around to it, will do separately). Change-Id: Id9b3a4254623a45de03069dbdb70b8349e976743 Reviewed-on: https://cl.tvl.fyi/c/depot/+/12191 Tested-by: BuildkiteCI Reviewed-by: flokli <flokli@flokli.de>
2024-08-10 22:59:38 +02:00
let idx = chunk.push_op(Op::Jump, dummy_span());
chunk.push_u16(0);
refactor(tvix/eval): ensure VM operations fit in a single byte This replaces the OpCode enum with a new Op enum which is guaranteed to fit in a single byte. Instead of carrying enum variants with data, every variant that has runtime data encodes it into the `Vec<u8>` that a `Chunk` now carries. This has several advantages: * Less stack space is required at runtime, and fewer allocations are required while compiling. * The OpCode doesn't need to carry "weird" special-cased data variants anymore. * It is faster (albeit, not by much). On my laptop, results consistently look approximately like this: Benchmark 1: ./before -E '(import <nixpkgs> {}).firefox.outPath' --log-level ERROR --no-warnings Time (mean ± σ): 8.224 s ± 0.272 s [User: 7.149 s, System: 0.688 s] Range (min … max): 7.759 s … 8.583 s 10 runs Benchmark 2: ./after -E '(import <nixpkgs> {}).firefox.outPath' --log-level ERROR --no-warnings Time (mean ± σ): 8.000 s ± 0.198 s [User: 7.036 s, System: 0.633 s] Range (min … max): 7.718 s … 8.334 s 10 runs See notes below for why the performance impact might be less than expected. * It is faster while at the same time dropping some optimisations we previously performed. This has several disadvantages: * The code is closer to how one would write it in C or Go. * Bit shifting! * There is (for now) slightly more code than before. On performance I have the following thoughts at the moment: In order to prepare for adding GC, there's a couple of places in Tvix where I'd like to fence off certain kinds of complexity (such as mutating bytecode, which, for various reaons, also has to be part of data that is subject to GC). With this change, we can drop optimisations like retroactively modifying existing bytecode and *still* achieve better performance than before. I believe that this is currently worth it to pave the way for changes that are more significant for performance. In general this also opens other avenues of optimisation: For example, we can profile which argument sizes actually exist and remove the copy overhead of varint decoding (which does show up in profiles) by using more adequately sized types for, e.g., constant indices. Known regressions: * Op::Constant is no longer printing its values in disassembly (this can be fixed, I just didn't get around to it, will do separately). Change-Id: Id9b3a4254623a45de03069dbdb70b8349e976743 Reviewed-on: https://cl.tvl.fyi/c/depot/+/12191 Tested-by: BuildkiteCI Reviewed-by: flokli <flokli@flokli.de>
2024-08-10 22:59:38 +02:00
chunk.push_op(Op::Constant, dummy_span());
chunk.push_uvarint(1);
refactor(tvix/eval): ensure VM operations fit in a single byte This replaces the OpCode enum with a new Op enum which is guaranteed to fit in a single byte. Instead of carrying enum variants with data, every variant that has runtime data encodes it into the `Vec<u8>` that a `Chunk` now carries. This has several advantages: * Less stack space is required at runtime, and fewer allocations are required while compiling. * The OpCode doesn't need to carry "weird" special-cased data variants anymore. * It is faster (albeit, not by much). On my laptop, results consistently look approximately like this: Benchmark 1: ./before -E '(import <nixpkgs> {}).firefox.outPath' --log-level ERROR --no-warnings Time (mean ± σ): 8.224 s ± 0.272 s [User: 7.149 s, System: 0.688 s] Range (min … max): 7.759 s … 8.583 s 10 runs Benchmark 2: ./after -E '(import <nixpkgs> {}).firefox.outPath' --log-level ERROR --no-warnings Time (mean ± σ): 8.000 s ± 0.198 s [User: 7.036 s, System: 0.633 s] Range (min … max): 7.718 s … 8.334 s 10 runs See notes below for why the performance impact might be less than expected. * It is faster while at the same time dropping some optimisations we previously performed. This has several disadvantages: * The code is closer to how one would write it in C or Go. * Bit shifting! * There is (for now) slightly more code than before. On performance I have the following thoughts at the moment: In order to prepare for adding GC, there's a couple of places in Tvix where I'd like to fence off certain kinds of complexity (such as mutating bytecode, which, for various reaons, also has to be part of data that is subject to GC). With this change, we can drop optimisations like retroactively modifying existing bytecode and *still* achieve better performance than before. I believe that this is currently worth it to pave the way for changes that are more significant for performance. In general this also opens other avenues of optimisation: For example, we can profile which argument sizes actually exist and remove the copy overhead of varint decoding (which does show up in profiles) by using more adequately sized types for, e.g., constant indices. Known regressions: * Op::Constant is no longer printing its values in disassembly (this can be fixed, I just didn't get around to it, will do separately). Change-Id: Id9b3a4254623a45de03069dbdb70b8349e976743 Reviewed-on: https://cl.tvl.fyi/c/depot/+/12191 Tested-by: BuildkiteCI Reviewed-by: flokli <flokli@flokli.de>
2024-08-10 22:59:38 +02:00
chunk.patch_jump(idx);
chunk.push_op(Op::Return, dummy_span());
refactor(tvix/eval): ensure VM operations fit in a single byte This replaces the OpCode enum with a new Op enum which is guaranteed to fit in a single byte. Instead of carrying enum variants with data, every variant that has runtime data encodes it into the `Vec<u8>` that a `Chunk` now carries. This has several advantages: * Less stack space is required at runtime, and fewer allocations are required while compiling. * The OpCode doesn't need to carry "weird" special-cased data variants anymore. * It is faster (albeit, not by much). On my laptop, results consistently look approximately like this: Benchmark 1: ./before -E '(import <nixpkgs> {}).firefox.outPath' --log-level ERROR --no-warnings Time (mean ± σ): 8.224 s ± 0.272 s [User: 7.149 s, System: 0.688 s] Range (min … max): 7.759 s … 8.583 s 10 runs Benchmark 2: ./after -E '(import <nixpkgs> {}).firefox.outPath' --log-level ERROR --no-warnings Time (mean ± σ): 8.000 s ± 0.198 s [User: 7.036 s, System: 0.633 s] Range (min … max): 7.718 s … 8.334 s 10 runs See notes below for why the performance impact might be less than expected. * It is faster while at the same time dropping some optimisations we previously performed. This has several disadvantages: * The code is closer to how one would write it in C or Go. * Bit shifting! * There is (for now) slightly more code than before. On performance I have the following thoughts at the moment: In order to prepare for adding GC, there's a couple of places in Tvix where I'd like to fence off certain kinds of complexity (such as mutating bytecode, which, for various reaons, also has to be part of data that is subject to GC). With this change, we can drop optimisations like retroactively modifying existing bytecode and *still* achieve better performance than before. I believe that this is currently worth it to pave the way for changes that are more significant for performance. In general this also opens other avenues of optimisation: For example, we can profile which argument sizes actually exist and remove the copy overhead of varint decoding (which does show up in profiles) by using more adequately sized types for, e.g., constant indices. Known regressions: * Op::Constant is no longer printing its values in disassembly (this can be fixed, I just didn't get around to it, will do separately). Change-Id: Id9b3a4254623a45de03069dbdb70b8349e976743 Reviewed-on: https://cl.tvl.fyi/c/depot/+/12191 Tested-by: BuildkiteCI Reviewed-by: flokli <flokli@flokli.de>
2024-08-10 22:59:38 +02:00
#[rustfmt::skip]
let expected: Vec<u8> = vec![
Op::Constant as u8, 0,
Op::Jump as u8, 2, 0,
Op::Constant as u8, 1,
Op::Return as u8,
];
refactor(tvix/eval): ensure VM operations fit in a single byte This replaces the OpCode enum with a new Op enum which is guaranteed to fit in a single byte. Instead of carrying enum variants with data, every variant that has runtime data encodes it into the `Vec<u8>` that a `Chunk` now carries. This has several advantages: * Less stack space is required at runtime, and fewer allocations are required while compiling. * The OpCode doesn't need to carry "weird" special-cased data variants anymore. * It is faster (albeit, not by much). On my laptop, results consistently look approximately like this: Benchmark 1: ./before -E '(import <nixpkgs> {}).firefox.outPath' --log-level ERROR --no-warnings Time (mean ± σ): 8.224 s ± 0.272 s [User: 7.149 s, System: 0.688 s] Range (min … max): 7.759 s … 8.583 s 10 runs Benchmark 2: ./after -E '(import <nixpkgs> {}).firefox.outPath' --log-level ERROR --no-warnings Time (mean ± σ): 8.000 s ± 0.198 s [User: 7.036 s, System: 0.633 s] Range (min … max): 7.718 s … 8.334 s 10 runs See notes below for why the performance impact might be less than expected. * It is faster while at the same time dropping some optimisations we previously performed. This has several disadvantages: * The code is closer to how one would write it in C or Go. * Bit shifting! * There is (for now) slightly more code than before. On performance I have the following thoughts at the moment: In order to prepare for adding GC, there's a couple of places in Tvix where I'd like to fence off certain kinds of complexity (such as mutating bytecode, which, for various reaons, also has to be part of data that is subject to GC). With this change, we can drop optimisations like retroactively modifying existing bytecode and *still* achieve better performance than before. I believe that this is currently worth it to pave the way for changes that are more significant for performance. In general this also opens other avenues of optimisation: For example, we can profile which argument sizes actually exist and remove the copy overhead of varint decoding (which does show up in profiles) by using more adequately sized types for, e.g., constant indices. Known regressions: * Op::Constant is no longer printing its values in disassembly (this can be fixed, I just didn't get around to it, will do separately). Change-Id: Id9b3a4254623a45de03069dbdb70b8349e976743 Reviewed-on: https://cl.tvl.fyi/c/depot/+/12191 Tested-by: BuildkiteCI Reviewed-by: flokli <flokli@flokli.de>
2024-08-10 22:59:38 +02:00
assert_eq!(chunk.code, expected);
}
}