tvl-depot/tvix/eval/docs/abandoned/thread-local-vm.md

# We can't have nice things because IFD

The thread-local VM work below was ultimately not merged because it
was decided that it would be harmful for `tvix::eval::Value` to
implement `Eq`, `Hash`, or any of the other `std` traits.

Implementing `std` traits on `Value` was deemed harmful because IFD
can cause arbitrary amounts of compilation to occur, including
network transactions with builders.  Obviously it would be
unexpected and error-prone to have a `PartialEq::eq()` which does
something like this.  This problem does not manifest within the
"nixpkgs compatibility only" scope, or in any undeprecated language
feature other than IFD.  Although IFD is outside the "nixpkgs
compatibility scope", it [has been added to the TVL compatibility
scope](https://cl.tvl.fyi/c/depot/+/7193/comment/3418997b_0dbd0b65/).

This was the sole reason for not merging.

The explanation below may be useful in case future circumstances
affect the relevance of the reasoning above.

The implementation can be found in these CLs:

- [refactor(tvix/eval): remove lifetime parameter from VM<'o>](https://cl.tvl.fyi/c/depot/+/7194)
- [feat(tvix/eval): [FOUNDLING] thread-local VM](https://cl.tvl.fyi/c/depot/+/7195)
- [feat(tvix/eval): [FOUNDLING] VM::vm_xxx convenience methods](https://cl.tvl.fyi/c/depot/+/7196)
- [refactor(tvix/eval): [FOUNDLING]: drop explicit `&mut vm` parameter](https://cl.tvl.fyi/c/depot/+/7197)

# Thread-local storage for tvix::eval::vm::VM

## The problem

`Value::force()` takes a `&mut VM` argument, since forcing a value
requires executing opcodes.  This means that `Value::nix_eq()` too
must take a `&mut VM`, since any sensible definition of equality
will have to force thunks.

Unfortunately Rust's `PartialEq::eq()` function does not accept any
additional arguments like this, so `Value` cannot implement
`PartialEq`.  Worse, structs which *contain* `Value`s can't
implement `PartialEq` either.  This means `Value`, and anything
containing it, cannot be the key for a `BTreeMap` or `HashMap`.  We
can't even insert `Value`s into a `HashSet`!

There are other situations like this that don't involve `PartialEq`,
but it's the most glaring one.  The main problem is that you need a
`VM` in order to force thunks, and thunks can be anywhere in a
`Value`.

## Solving the problem with thread-locals

We could avoid threading the `&mut VM` through the entire codebase
by making it a thread-local.

To do this without a performance hit, we need to use LLVM
thread-locals, which are the same cost as references to `static`s
but load relative to
[`llvm.threadlocal.address`][threadlocal-intrinsic] instead of
relative to the data segment.  Unfortunately `#[thread_local]` [is
unstable][thread-local-unstable] and [unsafe in
general][thread-local-unsafe] for most of the cases where we would
want to use it.  There is one [exception][tls-const-init], however:
if a `!thread_local()` has a `const` initializer, the compiler will
insert a `#[thread_local]`; this special case is both safe and
stable.

The difficult decision is what the type of the thread-local should
be.  Since you can't get a mutable reference to a `thread_local!()`
it will have to be some interior-mutability-bestowing wrapper around
our current `struct VM`.  Here are the choices:

### `RefCell<VM>`

This is the obvious first choice, since it lets you borrow a
`RefMut<Target=VM>`.  The problem here is that we want to keep the
codebase written such that all the functions in `impl VM` still take
a `&mut self`.  This means that there will be an active mutable
borrow for the duration of `VM::call_builtin()`.  So if we implement
`PartialEq` by having `eq()` attempt a second mutable borrow from
the thread-local storage, it will fail since there is already an
active borrow.

The problem here is that you can't "unborrow" a `RefMut` except by
dropping it.  There's no way around this.

#### Problem: Uglification

The only solution here is to rewrite all the functions in `impl VM`
so they don't take any kind of `self` argument, and then have them
do a short-lived `.borrow_mut()` from the thread-local `RefCell`
*separately, each time* they want to modify one of the fields of
`VM` (currently `frames`, `stack`, `with_stack`, `warnings`).  This
means that if you had a code sequence like this:

```
impl VM {
  fn foo(&mut self, ...) {
    ...
    self.frame().ip += 1;
    self.some_other_method();
    self.frame().ip += 1;
```

You would need to add *two separate `borrow_mut()`s*, one for each
of the `self.frame().ip+=1` statements.  You can't just do one big
`borrow_mut()` because `some_other_method()` will call
`borrow_mut()` and panic.

#### Problem: Performance

The `RefCell<VM>` approach also has a fairly huge performance hit,
because every single modification to any part of `VM` will require a
reference count increment/decrement, and a conditional branch based
on the check (which will never fail) that the `RefCell` isn't
already mutably borrowed.  It will also impede a lot of rustc's
optimizations.

### `Cell<VM>`

This is a non-starter because it means that in order to mutate any
field of `VM`, you have to move the entire `struct VM` out of the
`Cell`, mutate it, and move it back in.

### `Cell<Box<VM>>`

Now we're getting warmer.  Here, we can move the `Box<VM>` out of
the cell with a single pointer-sized memory access.

We don't want to do the "uglification" described in the previous
section.  We are very fortunate that, sometime in mid-2019, the Rust
dieties [decreed by fiat][fiat-decree] that `&Cell<T>` and `&mut T`
are bit-for-bit identical, and even gave us mortals safe wrappers
[`from_mut()`][from_mut] and [`get_mut()`][get_mut] around
`mem::transmute()`.

So now, when a `VM` method (which takes `&mut self`) calls out to
some external code (like a builtin), instead of passing the `&mut
self` to the external code it can call `Cell::from_mut(&mut self)`,
and then `Cell::swap()` that into the thread-local storage cell for
the duration of the external code.  After the external code returns,
it can `Cell::swap()` it back.  This whole dance gets wrapped in a
lexical block, and the borrow checker sees that the `&Cell<Box<VM>>`
returned by `Cell::from_mut()` lives only until the end of the
lexical block, *so we get the `&mut self` back after the close-brace
for that block*.  NLL FTW.  This sounds like a lot of work, but it
should compile down to two pointer-sized loads and two pointer-sized
stores, and it is incurred basically only for `OpBuiltin`.

This all works, with only two issues:

1. `vm.rs` needs to be very careful to do the thread-local cell swap
   dance before calling anything that might call `PartialEq::eq()`
   (or any other method that expects to be able to pull the `VM` out
   of thread-local storage).  There is no compile-time check that we
   did the dance in all the right places.  If we forget to do the
   dance somewhere we'll get a runtime panic from `Option::expect()`
   (see next section).

2. Since we need to call `Cell::from_mut()` on a `Box<VM>` rather
   than a bare `VM`, we still need to rewrite all of `vm.rs` so that
   every function takes a `&mut Box<VM>` instead of a `&mut self`.
   This creates a huge amount of "noise" in the code.

Fortunately, it turns out that nearly all the "noise" that arises
from the second point can be eliminated by taking advantage of
[deref coercions][deref-coercions]!  This was the last "shoe to
drop".

There is still the issue of having to be careful about calls from
`vm.rs` to things outside that file, but it's manageable.

### `Cell<Option<Box<VM>>>`

In order to get the "safe and stable `#[thread_local]`"
[exception][tls-const-init] we need a `const` initializer, which
means we need to be able to put something into the `Cell` that isn't
a `VM`.  So the type needs to be `Cell<Option<Box<VM>>>`.

Recall that you can't turn an `Option<&T>` into an `&Option<T>`.
The latter type has the "is this a `Some` or `None`" bit immediately
adjacent to the bits representing `T`.  So if I hand you a `t:&T`
and you wrap it as `Some(t)`, those bits aren't adjacent in memory.
This means that all the VM methods need to operate on an
`Option<Box<VM>>` -- we can't just wrap a `Some()` around `&mut
self` "at the last minute" before inserting it into the thread-local
storage cell.  Fortunately deref coercions save the day here too --
the coercion is inferred through both layers (`Box` and `Option`) of
wrapper, so there is no additional noise in the code.

Note that Rust is clever and can find some sequence of bits that
aren't a valid `T`, so `sizeof(Option<T>)==sizeof(T)`.  And in fact,
`Box<T>` is one of these cases (and this is guaranteed).  So the
`Option` has no overhead.

# Closing thoughts, language-level support

This would have been easier with language-level support.

## What wouldn't help

Although it [it was decreed][fiat-decree] that `Cell<T>` and `&mut
T` are interchangeable, a `LocalKey<Cell<T>>` isn't quite the same
thing as a `Cell<T>`, so it wouldn't be safe for the standard
library to contain something like this:

```
impl<T> LocalKey<Cell<T>> {
  fn get_mut(&self) -> &mut T {
    unsafe {
      // ... mem::transmute() voodoo goes here ...
```

The problem here is that you can call `LocalKey<Cell<T>>::get_mut()` twice and
end up with two `&mut T`s that point to the same thing (mutable aliasing) which
results in undefined behavior.

## What would help

The ideal solution is for Rust to let you call arbitrary methods
`T::foo(&mut self...)` on a `LocalKey<Cell<T>>`.  This way you can
have one (and only one) `&mut T` at any syntactical point in your
program -- the `&mut self`.


[tls-const-init]: https://github.com/rust-lang/rust/pull/90774
[thread-local-unstable]: https://github.com/rust-lang/rust/issues/29594
[thread-local-unsafe-generally]: https://github.com/rust-lang/rust/issues/54366
[fiat-decree]: https://github.com/rust-lang/rust/issues/43038
[from_mut]: https://doc.rust-lang.org/stable/std/cell/struct.Cell.html#method.from_mut
[get_mut]: https://doc.rust-lang.org/stable/std/cell/struct.Cell.html#method.get_mut
[thread-local-unsafe]: [https://github.com/rust-lang/rust/issues/54366]
[deref-coercions]: https://doc.rust-lang.org/book/ch15-02-deref.html#implicit-deref-coercions-with-functions-and-methods
[threadlocal-intrinsic]: https://llvm.org/docs/LangRef.html#llvm-threadlocal-address-intrinsic
docs(tvix/eval): document abandoned thread-local vm This commit adds a markdown document which explains how the thread-local VM infrastructure works, in case it is useful in the future. Change-Id: Id10e32a9e3c5fa38a15d4bec9800f7234c59234a Reviewed-on: https://cl.tvl.fyi/c/depot/+/7193 Tested-by: BuildkiteCI Autosubmit: Adam Joseph <adam@westernsemico.com> Reviewed-by: tazjin <tazjin@tvl.su> 2022-11-05 20:10:47 +01:00			`# We can't have nice things because IFD`

			`The thread-local VM work below was ultimately not merged because it`
			was decided that it would be harmful for `tvix::eval::Value` to
			implement `Eq`, `Hash`, or any of the other `std` traits.

			Implementing `std` traits on `Value` was deemed harmful because IFD
			`can cause arbitrary amounts of compilation to occur, including`
			`network transactions with builders. Obviously it would be`
			unexpected and error-prone to have a `PartialEq::eq()` which does
			`something like this. This problem does not manifest within the`
			`"nixpkgs compatibility only" scope, or in any undeprecated language`
			`feature other than IFD. Although IFD is outside the "nixpkgs`
			`compatibility scope", it [has been added to the TVL compatibility`
			`scope](https://cl.tvl.fyi/c/depot/+/7193/comment/3418997b_0dbd0b65/).`

			`This was the sole reason for not merging.`

			`The explanation below may be useful in case future circumstances`
			`affect the relevance of the reasoning above.`

			`The implementation can be found in these CLs:`

			`- [refactor(tvix/eval): remove lifetime parameter from VM<'o>](https://cl.tvl.fyi/c/depot/+/7194)`
			`- [feat(tvix/eval): [FOUNDLING] thread-local VM](https://cl.tvl.fyi/c/depot/+/7195)`
			`- [feat(tvix/eval): [FOUNDLING] VM::vm_xxx convenience methods](https://cl.tvl.fyi/c/depot/+/7196)`
			- [refactor(tvix/eval): [FOUNDLING]: drop explicit `&mut vm` parameter](https://cl.tvl.fyi/c/depot/+/7197)

			`# Thread-local storage for tvix::eval::vm::VM`

			`## The problem`

			`Value::force()` takes a `&mut VM` argument, since forcing a value
			requires executing opcodes. This means that `Value::nix_eq()` too
			must take a `&mut VM`, since any sensible definition of equality
			`will have to force thunks.`

			Unfortunately Rust's `PartialEq::eq()` function does not accept any
			additional arguments like this, so `Value` cannot implement
			`PartialEq`. Worse, structs which contain `Value`s can't
			implement `PartialEq` either. This means `Value`, and anything
			containing it, cannot be the key for a `BTreeMap` or `HashMap`. We
			can't even insert `Value`s into a `HashSet`!

			There are other situations like this that don't involve `PartialEq`,
			`but it's the most glaring one. The main problem is that you need a`
			`VM` in order to force thunks, and thunks can be anywhere in a
			`Value`.

			`## Solving the problem with thread-locals`

			We could avoid threading the `&mut VM` through the entire codebase
			`by making it a thread-local.`

			`To do this without a performance hit, we need to use LLVM`
			thread-locals, which are the same cost as references to `static`s
			`but load relative to`
			[`llvm.threadlocal.address`][threadlocal-intrinsic] instead of
			relative to the data segment. Unfortunately `#[thread_local]` [is
			`unstable][thread-local-unstable] and [unsafe in`
			`general][thread-local-unsafe] for most of the cases where we would`
			`want to use it. There is one [exception][tls-const-init], however:`
			if a `!thread_local()` has a `const` initializer, the compiler will
			insert a `#[thread_local]`; this special case is both safe and
			`stable.`

			`The difficult decision is what the type of the thread-local should`
			be. Since you can't get a mutable reference to a `thread_local!()`
			`it will have to be some interior-mutability-bestowing wrapper around`
			our current `struct VM`. Here are the choices:

			### `RefCell<VM>`

			`This is the obvious first choice, since it lets you borrow a`
			`RefMut<Target=VM>`. The problem here is that we want to keep the
			codebase written such that all the functions in `impl VM` still take
			a `&mut self`. This means that there will be an active mutable
			borrow for the duration of `VM::call_builtin()`. So if we implement
			`PartialEq` by having `eq()` attempt a second mutable borrow from
			`the thread-local storage, it will fail since there is already an`
			`active borrow.`

			The problem here is that you can't "unborrow" a `RefMut` except by
			`dropping it. There's no way around this.`

			`#### Problem: Uglification`

			The only solution here is to rewrite all the functions in `impl VM`
			so they don't take any kind of `self` argument, and then have them
			do a short-lived `.borrow_mut()` from the thread-local `RefCell`
			`separately, each time they want to modify one of the fields of`
			`VM` (currently `frames`, `stack`, `with_stack`, `warnings`). This
			`means that if you had a code sequence like this:`

			```
			`impl VM {`
			`fn foo(&mut self, ...) {`
			`...`
			`self.frame().ip += 1;`
			`self.some_other_method();`
			`self.frame().ip += 1;`
			```

			You would need to add two separate `borrow_mut()`s, one for each
			of the `self.frame().ip+=1` statements. You can't just do one big
			`borrow_mut()` because `some_other_method()` will call
			`borrow_mut()` and panic.

			`#### Problem: Performance`

			The `RefCell<VM>` approach also has a fairly huge performance hit,
			because every single modification to any part of `VM` will require a
			`reference count increment/decrement, and a conditional branch based`
			on the check (which will never fail) that the `RefCell` isn't
			`already mutably borrowed. It will also impede a lot of rustc's`
			`optimizations.`

			### `Cell<VM>`

			`This is a non-starter because it means that in order to mutate any`
			field of `VM`, you have to move the entire `struct VM` out of the
			`Cell`, mutate it, and move it back in.

			### `Cell<Box<VM>>`

			Now we're getting warmer. Here, we can move the `Box<VM>` out of
			`the cell with a single pointer-sized memory access.`

			`We don't want to do the "uglification" described in the previous`
			`section. We are very fortunate that, sometime in mid-2019, the Rust`
			dieties [decreed by fiat][fiat-decree] that `&Cell<T>` and `&mut T`
			`are bit-for-bit identical, and even gave us mortals safe wrappers`
			[`from_mut()`][from_mut] and [`get_mut()`][get_mut] around
			`mem::transmute()`.

			So now, when a `VM` method (which takes `&mut self`) calls out to
			some external code (like a builtin), instead of passing the `&mut
			self` to the external code it can call `Cell::from_mut(&mut self)`,
			and then `Cell::swap()` that into the thread-local storage cell for
			`the duration of the external code. After the external code returns,`
			it can `Cell::swap()` it back. This whole dance gets wrapped in a
			lexical block, and the borrow checker sees that the `&Cell<Box<VM>>`
			returned by `Cell::from_mut()` lives only until the end of the
			lexical block, *so we get the `&mut self` back after the close-brace
			`for that block*. NLL FTW. This sounds like a lot of work, but it`
			`should compile down to two pointer-sized loads and two pointer-sized`
			stores, and it is incurred basically only for `OpBuiltin`.

			`This all works, with only two issues:`

			1. `vm.rs` needs to be very careful to do the thread-local cell swap
			dance before calling anything that might call `PartialEq::eq()`
			(or any other method that expects to be able to pull the `VM` out
			`of thread-local storage). There is no compile-time check that we`
			`did the dance in all the right places. If we forget to do the`
			dance somewhere we'll get a runtime panic from `Option::expect()`
			`(see next section).`

			2. Since we need to call `Cell::from_mut()` on a `Box<VM>` rather
			than a bare `VM`, we still need to rewrite all of `vm.rs` so that
			every function takes a `&mut Box<VM>` instead of a `&mut self`.
			`This creates a huge amount of "noise" in the code.`

			`Fortunately, it turns out that nearly all the "noise" that arises`
			`from the second point can be eliminated by taking advantage of`
			`[deref coercions][deref-coercions]! This was the last "shoe to`
			`drop".`

			`There is still the issue of having to be careful about calls from`
			`vm.rs` to things outside that file, but it's manageable.

			### `Cell<Option<Box<VM>>>`

			In order to get the "safe and stable `#[thread_local]`"
			[exception][tls-const-init] we need a `const` initializer, which
			means we need to be able to put something into the `Cell` that isn't
			a `VM`. So the type needs to be `Cell<Option<Box<VM>>>`.

			Recall that you can't turn an `Option<&T>` into an `&Option<T>`.
			The latter type has the "is this a `Some` or `None`" bit immediately
			adjacent to the bits representing `T`. So if I hand you a `t:&T`
			and you wrap it as `Some(t)`, those bits aren't adjacent in memory.
			`This means that all the VM methods need to operate on an`
			`Option<Box<VM>>` -- we can't just wrap a `Some()` around `&mut
			self` "at the last minute" before inserting it into the thread-local
			`storage cell. Fortunately deref coercions save the day here too --`
			the coercion is inferred through both layers (`Box` and `Option`) of
			`wrapper, so there is no additional noise in the code.`

			`Note that Rust is clever and can find some sequence of bits that`
			aren't a valid `T`, so `sizeof(Option<T>)==sizeof(T)`. And in fact,
			`Box<T>` is one of these cases (and this is guaranteed). So the
			`Option` has no overhead.

			`# Closing thoughts, language-level support`

			`This would have been easier with language-level support.`

			`## What wouldn't help`

			Although it [it was decreed][fiat-decree] that `Cell<T>` and `&mut
			T` are interchangeable, a `LocalKey<Cell<T>>` isn't quite the same
			thing as a `Cell<T>`, so it wouldn't be safe for the standard
			`library to contain something like this:`

			```
			`impl<T> LocalKey<Cell<T>> {`
			`fn get_mut(&self) -> &mut T {`
			`unsafe {`
			`// ... mem::transmute() voodoo goes here ...`
			```

			The problem here is that you can call `LocalKey<Cell<T>>::get_mut()` twice and
			end up with two `&mut T`s that point to the same thing (mutable aliasing) which
			`results in undefined behavior.`

			`## What would help`

			`The ideal solution is for Rust to let you call arbitrary methods`
			`T::foo(&mut self...)` on a `LocalKey<Cell<T>>`. This way you can
			have one (and only one) `&mut T` at any syntactical point in your
			program -- the `&mut self`.


			`[tls-const-init]: https://github.com/rust-lang/rust/pull/90774`
			`[thread-local-unstable]: https://github.com/rust-lang/rust/issues/29594`
			`[thread-local-unsafe-generally]: https://github.com/rust-lang/rust/issues/54366`
			`[fiat-decree]: https://github.com/rust-lang/rust/issues/43038`
			`[from_mut]: https://doc.rust-lang.org/stable/std/cell/struct.Cell.html#method.from_mut`
			`[get_mut]: https://doc.rust-lang.org/stable/std/cell/struct.Cell.html#method.get_mut`
			`[thread-local-unsafe]: [https://github.com/rust-lang/rust/issues/54366]`
			`[deref-coercions]: https://doc.rust-lang.org/book/ch15-02-deref.html#implicit-deref-coercions-with-functions-and-methods`
			`[threadlocal-intrinsic]: https://llvm.org/docs/LangRef.html#llvm-threadlocal-address-intrinsic`