docs(tvix/eval): document abandoned thread-local vm
This commit adds a markdown document which explains how the thread-local VM infrastructure works, in case it is useful in the future. Change-Id: Id10e32a9e3c5fa38a15d4bec9800f7234c59234a Reviewed-on: https://cl.tvl.fyi/c/depot/+/7193 Tested-by: BuildkiteCI Autosubmit: Adam Joseph <adam@westernsemico.com> Reviewed-by: tazjin <tazjin@tvl.su>
This commit is contained in:
parent
83dabf8955
commit
e816d3a9dc
1 changed files with 233 additions and 0 deletions
233
tvix/eval/docs/abandoned/thread-local-vm.md
Normal file
233
tvix/eval/docs/abandoned/thread-local-vm.md
Normal file
|
@ -0,0 +1,233 @@
|
|||
# We can't have nice things because IFD
|
||||
|
||||
The thread-local VM work below was ultimately not merged because it
|
||||
was decided that it would be harmful for `tvix::eval::Value` to
|
||||
implement `Eq`, `Hash`, or any of the other `std` traits.
|
||||
|
||||
Implementing `std` traits on `Value` was deemed harmful because IFD
|
||||
can cause arbitrary amounts of compilation to occur, including
|
||||
network transactions with builders. Obviously it would be
|
||||
unexpected and error-prone to have a `PartialEq::eq()` which does
|
||||
something like this. This problem does not manifest within the
|
||||
"nixpkgs compatibility only" scope, or in any undeprecated language
|
||||
feature other than IFD. Although IFD is outside the "nixpkgs
|
||||
compatibility scope", it [has been added to the TVL compatibility
|
||||
scope](https://cl.tvl.fyi/c/depot/+/7193/comment/3418997b_0dbd0b65/).
|
||||
|
||||
This was the sole reason for not merging.
|
||||
|
||||
The explanation below may be useful in case future circumstances
|
||||
affect the relevance of the reasoning above.
|
||||
|
||||
The implementation can be found in these CLs:
|
||||
|
||||
- [refactor(tvix/eval): remove lifetime parameter from VM<'o>](https://cl.tvl.fyi/c/depot/+/7194)
|
||||
- [feat(tvix/eval): [FOUNDLING] thread-local VM](https://cl.tvl.fyi/c/depot/+/7195)
|
||||
- [feat(tvix/eval): [FOUNDLING] VM::vm_xxx convenience methods](https://cl.tvl.fyi/c/depot/+/7196)
|
||||
- [refactor(tvix/eval): [FOUNDLING]: drop explicit `&mut vm` parameter](https://cl.tvl.fyi/c/depot/+/7197)
|
||||
|
||||
# Thread-local storage for tvix::eval::vm::VM
|
||||
|
||||
## The problem
|
||||
|
||||
`Value::force()` takes a `&mut VM` argument, since forcing a value
|
||||
requires executing opcodes. This means that `Value::nix_eq()` too
|
||||
must take a `&mut VM`, since any sensible definition of equality
|
||||
will have to force thunks.
|
||||
|
||||
Unfortunately Rust's `PartialEq::eq()` function does not accept any
|
||||
additional arguments like this, so `Value` cannot implement
|
||||
`PartialEq`. Worse, structs which *contain* `Value`s can't
|
||||
implement `PartialEq` either. This means `Value`, and anything
|
||||
containing it, cannot be the key for a `BTreeMap` or `HashMap`. We
|
||||
can't even insert `Value`s into a `HashSet`!
|
||||
|
||||
There are other situations like this that don't involve `PartialEq`,
|
||||
but it's the most glaring one. The main problem is that you need a
|
||||
`VM` in order to force thunks, and thunks can be anywhere in a
|
||||
`Value`.
|
||||
|
||||
## Solving the problem with thread-locals
|
||||
|
||||
We could avoid threading the `&mut VM` through the entire codebase
|
||||
by making it a thread-local.
|
||||
|
||||
To do this without a performance hit, we need to use LLVM
|
||||
thread-locals, which are the same cost as references to `static`s
|
||||
but load relative to
|
||||
[`llvm.threadlocal.address`][threadlocal-intrinsic] instead of
|
||||
relative to the data segment. Unfortunately `#[thread_local]` [is
|
||||
unstable][thread-local-unstable] and [unsafe in
|
||||
general][thread-local-unsafe] for most of the cases where we would
|
||||
want to use it. There is one [exception][tls-const-init], however:
|
||||
if a `!thread_local()` has a `const` initializer, the compiler will
|
||||
insert a `#[thread_local]`; this special case is both safe and
|
||||
stable.
|
||||
|
||||
The difficult decision is what the type of the thread-local should
|
||||
be. Since you can't get a mutable reference to a `thread_local!()`
|
||||
it will have to be some interior-mutability-bestowing wrapper around
|
||||
our current `struct VM`. Here are the choices:
|
||||
|
||||
### `RefCell<VM>`
|
||||
|
||||
This is the obvious first choice, since it lets you borrow a
|
||||
`RefMut<Target=VM>`. The problem here is that we want to keep the
|
||||
codebase written such that all the functions in `impl VM` still take
|
||||
a `&mut self`. This means that there will be an active mutable
|
||||
borrow for the duration of `VM::call_builtin()`. So if we implement
|
||||
`PartialEq` by having `eq()` attempt a second mutable borrow from
|
||||
the thread-local storage, it will fail since there is already an
|
||||
active borrow.
|
||||
|
||||
The problem here is that you can't "unborrow" a `RefMut` except by
|
||||
dropping it. There's no way around this.
|
||||
|
||||
#### Problem: Uglification
|
||||
|
||||
The only solution here is to rewrite all the functions in `impl VM`
|
||||
so they don't take any kind of `self` argument, and then have them
|
||||
do a short-lived `.borrow_mut()` from the thread-local `RefCell`
|
||||
*separately, each time* they want to modify one of the fields of
|
||||
`VM` (currently `frames`, `stack`, `with_stack`, `warnings`). This
|
||||
means that if you had a code sequence like this:
|
||||
|
||||
```
|
||||
impl VM {
|
||||
fn foo(&mut self, ...) {
|
||||
...
|
||||
self.frame().ip += 1;
|
||||
self.some_other_method();
|
||||
self.frame().ip += 1;
|
||||
```
|
||||
|
||||
You would need to add *two separate `borrow_mut()`s*, one for each
|
||||
of the `self.frame().ip+=1` statements. You can't just do one big
|
||||
`borrow_mut()` because `some_other_method()` will call
|
||||
`borrow_mut()` and panic.
|
||||
|
||||
#### Problem: Performance
|
||||
|
||||
The `RefCell<VM>` approach also has a fairly huge performance hit,
|
||||
because every single modification to any part of `VM` will require a
|
||||
reference count increment/decrement, and a conditional branch based
|
||||
on the check (which will never fail) that the `RefCell` isn't
|
||||
already mutably borrowed. It will also impede a lot of rustc's
|
||||
optimizations.
|
||||
|
||||
### `Cell<VM>`
|
||||
|
||||
This is a non-starter because it means that in order to mutate any
|
||||
field of `VM`, you have to move the entire `struct VM` out of the
|
||||
`Cell`, mutate it, and move it back in.
|
||||
|
||||
### `Cell<Box<VM>>`
|
||||
|
||||
Now we're getting warmer. Here, we can move the `Box<VM>` out of
|
||||
the cell with a single pointer-sized memory access.
|
||||
|
||||
We don't want to do the "uglification" described in the previous
|
||||
section. We are very fortunate that, sometime in mid-2019, the Rust
|
||||
dieties [decreed by fiat][fiat-decree] that `&Cell<T>` and `&mut T`
|
||||
are bit-for-bit identical, and even gave us mortals safe wrappers
|
||||
[`from_mut()`][from_mut] and [`get_mut()`][get_mut] around
|
||||
`mem::transmute()`.
|
||||
|
||||
So now, when a `VM` method (which takes `&mut self`) calls out to
|
||||
some external code (like a builtin), instead of passing the `&mut
|
||||
self` to the external code it can call `Cell::from_mut(&mut self)`,
|
||||
and then `Cell::swap()` that into the thread-local storage cell for
|
||||
the duration of the external code. After the external code returns,
|
||||
it can `Cell::swap()` it back. This whole dance gets wrapped in a
|
||||
lexical block, and the borrow checker sees that the `&Cell<Box<VM>>`
|
||||
returned by `Cell::from_mut()` lives only until the end of the
|
||||
lexical block, *so we get the `&mut self` back after the close-brace
|
||||
for that block*. NLL FTW. This sounds like a lot of work, but it
|
||||
should compile down to two pointer-sized loads and two pointer-sized
|
||||
stores, and it is incurred basically only for `OpBuiltin`.
|
||||
|
||||
This all works, with only two issues:
|
||||
|
||||
1. `vm.rs` needs to be very careful to do the thread-local cell swap
|
||||
dance before calling anything that might call `PartialEq::eq()`
|
||||
(or any other method that expects to be able to pull the `VM` out
|
||||
of thread-local storage). There is no compile-time check that we
|
||||
did the dance in all the right places. If we forget to do the
|
||||
dance somewhere we'll get a runtime panic from `Option::expect()`
|
||||
(see next section).
|
||||
|
||||
2. Since we need to call `Cell::from_mut()` on a `Box<VM>` rather
|
||||
than a bare `VM`, we still need to rewrite all of `vm.rs` so that
|
||||
every function takes a `&mut Box<VM>` instead of a `&mut self`.
|
||||
This creates a huge amount of "noise" in the code.
|
||||
|
||||
Fortunately, it turns out that nearly all the "noise" that arises
|
||||
from the second point can be eliminated by taking advantage of
|
||||
[deref coercions][deref-coercions]! This was the last "shoe to
|
||||
drop".
|
||||
|
||||
There is still the issue of having to be careful about calls from
|
||||
`vm.rs` to things outside that file, but it's manageable.
|
||||
|
||||
### `Cell<Option<Box<VM>>>`
|
||||
|
||||
In order to get the "safe and stable `#[thread_local]`"
|
||||
[exception][tls-const-init] we need a `const` initializer, which
|
||||
means we need to be able to put something into the `Cell` that isn't
|
||||
a `VM`. So the type needs to be `Cell<Option<Box<VM>>>`.
|
||||
|
||||
Recall that you can't turn an `Option<&T>` into an `&Option<T>`.
|
||||
The latter type has the "is this a `Some` or `None`" bit immediately
|
||||
adjacent to the bits representing `T`. So if I hand you a `t:&T`
|
||||
and you wrap it as `Some(t)`, those bits aren't adjacent in memory.
|
||||
This means that all the VM methods need to operate on an
|
||||
`Option<Box<VM>>` -- we can't just wrap a `Some()` around `&mut
|
||||
self` "at the last minute" before inserting it into the thread-local
|
||||
storage cell. Fortunately deref coercions save the day here too --
|
||||
the coercion is inferred through both layers (`Box` and `Option`) of
|
||||
wrapper, so there is no additional noise in the code.
|
||||
|
||||
Note that Rust is clever and can find some sequence of bits that
|
||||
aren't a valid `T`, so `sizeof(Option<T>)==sizeof(T)`. And in fact,
|
||||
`Box<T>` is one of these cases (and this is guaranteed). So the
|
||||
`Option` has no overhead.
|
||||
|
||||
# Closing thoughts, language-level support
|
||||
|
||||
This would have been easier with language-level support.
|
||||
|
||||
## What wouldn't help
|
||||
|
||||
Although it [it was decreed][fiat-decree] that `Cell<T>` and `&mut
|
||||
T` are interchangeable, a `LocalKey<Cell<T>>` isn't quite the same
|
||||
thing as a `Cell<T>`, so it wouldn't be safe for the standard
|
||||
library to contain something like this:
|
||||
|
||||
```
|
||||
impl<T> LocalKey<Cell<T>> {
|
||||
fn get_mut(&self) -> &mut T {
|
||||
unsafe {
|
||||
// ... mem::transmute() voodoo goes here ...
|
||||
```
|
||||
|
||||
The problem here is that you can call `LocalKey<Cell<T>>::get_mut()` twice and
|
||||
end up with two `&mut T`s that point to the same thing (mutable aliasing) which
|
||||
results in undefined behavior.
|
||||
|
||||
## What would help
|
||||
|
||||
The ideal solution is for Rust to let you call arbitrary methods
|
||||
`T::foo(&mut self...)` on a `LocalKey<Cell<T>>`. This way you can
|
||||
have one (and only one) `&mut T` at any syntactical point in your
|
||||
program -- the `&mut self`.
|
||||
|
||||
|
||||
[tls-const-init]: https://github.com/rust-lang/rust/pull/90774
|
||||
[thread-local-unstable]: https://github.com/rust-lang/rust/issues/29594
|
||||
[thread-local-unsafe-generally]: https://github.com/rust-lang/rust/issues/54366
|
||||
[fiat-decree]: https://github.com/rust-lang/rust/issues/43038
|
||||
[from_mut]: https://doc.rust-lang.org/stable/std/cell/struct.Cell.html#method.from_mut
|
||||
[get_mut]: https://doc.rust-lang.org/stable/std/cell/struct.Cell.html#method.get_mut
|
||||
[thread-local-unsafe]: [https://github.com/rust-lang/rust/issues/54366]
|
||||
[deref-coercions]: https://doc.rust-lang.org/book/ch15-02-deref.html#implicit-deref-coercions-with-functions-and-methods
|
||||
[threadlocal-intrinsic]: https://llvm.org/docs/LangRef.html#llvm-threadlocal-address-intrinsic
|
Loading…
Reference in a new issue