This commit adds a markdown document which explains how the thread-local VM infrastructure works, in case it is useful in the future. Change-Id: Id10e32a9e3c5fa38a15d4bec9800f7234c59234a Reviewed-on: https://cl.tvl.fyi/c/depot/+/7193 Tested-by: BuildkiteCI Autosubmit: Adam Joseph <adam@westernsemico.com> Reviewed-by: tazjin <tazjin@tvl.su>
10 KiB
We can't have nice things because IFD
The thread-local VM work below was ultimately not merged because it
was decided that it would be harmful for tvix::eval::Value
to
implement Eq
, Hash
, or any of the other std
traits.
Implementing std
traits on Value
was deemed harmful because IFD
can cause arbitrary amounts of compilation to occur, including
network transactions with builders. Obviously it would be
unexpected and error-prone to have a PartialEq::eq()
which does
something like this. This problem does not manifest within the
"nixpkgs compatibility only" scope, or in any undeprecated language
feature other than IFD. Although IFD is outside the "nixpkgs
compatibility scope", it has been added to the TVL compatibility
scope.
This was the sole reason for not merging.
The explanation below may be useful in case future circumstances affect the relevance of the reasoning above.
The implementation can be found in these CLs:
- refactor(tvix/eval): remove lifetime parameter from VM<'o>
- feat(tvix/eval): [FOUNDLING] thread-local VM
- feat(tvix/eval): [FOUNDLING] VM::vm_xxx convenience methods
- refactor(tvix/eval): [FOUNDLING]: drop explicit
&mut vm
parameter
Thread-local storage for tvix::eval::vm::VM
The problem
Value::force()
takes a &mut VM
argument, since forcing a value
requires executing opcodes. This means that Value::nix_eq()
too
must take a &mut VM
, since any sensible definition of equality
will have to force thunks.
Unfortunately Rust's PartialEq::eq()
function does not accept any
additional arguments like this, so Value
cannot implement
PartialEq
. Worse, structs which contain Value
s can't
implement PartialEq
either. This means Value
, and anything
containing it, cannot be the key for a BTreeMap
or HashMap
. We
can't even insert Value
s into a HashSet
!
There are other situations like this that don't involve PartialEq
,
but it's the most glaring one. The main problem is that you need a
VM
in order to force thunks, and thunks can be anywhere in a
Value
.
Solving the problem with thread-locals
We could avoid threading the &mut VM
through the entire codebase
by making it a thread-local.
To do this without a performance hit, we need to use LLVM
thread-locals, which are the same cost as references to static
s
but load relative to
llvm.threadlocal.address
instead of
relative to the data segment. Unfortunately #[thread_local]
is
unstable and unsafe in
general for most of the cases where we would
want to use it. There is one exception, however:
if a !thread_local()
has a const
initializer, the compiler will
insert a #[thread_local]
; this special case is both safe and
stable.
The difficult decision is what the type of the thread-local should
be. Since you can't get a mutable reference to a thread_local!()
it will have to be some interior-mutability-bestowing wrapper around
our current struct VM
. Here are the choices:
RefCell<VM>
This is the obvious first choice, since it lets you borrow a
RefMut<Target=VM>
. The problem here is that we want to keep the
codebase written such that all the functions in impl VM
still take
a &mut self
. This means that there will be an active mutable
borrow for the duration of VM::call_builtin()
. So if we implement
PartialEq
by having eq()
attempt a second mutable borrow from
the thread-local storage, it will fail since there is already an
active borrow.
The problem here is that you can't "unborrow" a RefMut
except by
dropping it. There's no way around this.
Problem: Uglification
The only solution here is to rewrite all the functions in impl VM
so they don't take any kind of self
argument, and then have them
do a short-lived .borrow_mut()
from the thread-local RefCell
separately, each time they want to modify one of the fields of
VM
(currently frames
, stack
, with_stack
, warnings
). This
means that if you had a code sequence like this:
impl VM {
fn foo(&mut self, ...) {
...
self.frame().ip += 1;
self.some_other_method();
self.frame().ip += 1;
You would need to add two separate borrow_mut()
s, one for each
of the self.frame().ip+=1
statements. You can't just do one big
borrow_mut()
because some_other_method()
will call
borrow_mut()
and panic.
Problem: Performance
The RefCell<VM>
approach also has a fairly huge performance hit,
because every single modification to any part of VM
will require a
reference count increment/decrement, and a conditional branch based
on the check (which will never fail) that the RefCell
isn't
already mutably borrowed. It will also impede a lot of rustc's
optimizations.
Cell<VM>
This is a non-starter because it means that in order to mutate any
field of VM
, you have to move the entire struct VM
out of the
Cell
, mutate it, and move it back in.
Cell<Box<VM>>
Now we're getting warmer. Here, we can move the Box<VM>
out of
the cell with a single pointer-sized memory access.
We don't want to do the "uglification" described in the previous
section. We are very fortunate that, sometime in mid-2019, the Rust
dieties decreed by fiat that &Cell<T>
and &mut T
are bit-for-bit identical, and even gave us mortals safe wrappers
from_mut()
and get_mut()
around
mem::transmute()
.
So now, when a VM
method (which takes &mut self
) calls out to
some external code (like a builtin), instead of passing the &mut self
to the external code it can call Cell::from_mut(&mut self)
,
and then Cell::swap()
that into the thread-local storage cell for
the duration of the external code. After the external code returns,
it can Cell::swap()
it back. This whole dance gets wrapped in a
lexical block, and the borrow checker sees that the &Cell<Box<VM>>
returned by Cell::from_mut()
lives only until the end of the
lexical block, so we get the &mut self
back after the close-brace
for that block. NLL FTW. This sounds like a lot of work, but it
should compile down to two pointer-sized loads and two pointer-sized
stores, and it is incurred basically only for OpBuiltin
.
This all works, with only two issues:
-
vm.rs
needs to be very careful to do the thread-local cell swap dance before calling anything that might callPartialEq::eq()
(or any other method that expects to be able to pull theVM
out of thread-local storage). There is no compile-time check that we did the dance in all the right places. If we forget to do the dance somewhere we'll get a runtime panic fromOption::expect()
(see next section). -
Since we need to call
Cell::from_mut()
on aBox<VM>
rather than a bareVM
, we still need to rewrite all ofvm.rs
so that every function takes a&mut Box<VM>
instead of a&mut self
. This creates a huge amount of "noise" in the code.
Fortunately, it turns out that nearly all the "noise" that arises from the second point can be eliminated by taking advantage of deref coercions! This was the last "shoe to drop".
There is still the issue of having to be careful about calls from
vm.rs
to things outside that file, but it's manageable.
Cell<Option<Box<VM>>>
In order to get the "safe and stable #[thread_local]
"
exception we need a const
initializer, which
means we need to be able to put something into the Cell
that isn't
a VM
. So the type needs to be Cell<Option<Box<VM>>>
.
Recall that you can't turn an Option<&T>
into an &Option<T>
.
The latter type has the "is this a Some
or None
" bit immediately
adjacent to the bits representing T
. So if I hand you a t:&T
and you wrap it as Some(t)
, those bits aren't adjacent in memory.
This means that all the VM methods need to operate on an
Option<Box<VM>>
-- we can't just wrap a Some()
around &mut self
"at the last minute" before inserting it into the thread-local
storage cell. Fortunately deref coercions save the day here too --
the coercion is inferred through both layers (Box
and Option
) of
wrapper, so there is no additional noise in the code.
Note that Rust is clever and can find some sequence of bits that
aren't a valid T
, so sizeof(Option<T>)==sizeof(T)
. And in fact,
Box<T>
is one of these cases (and this is guaranteed). So the
Option
has no overhead.
Closing thoughts, language-level support
This would have been easier with language-level support.
What wouldn't help
Although it it was decreed that Cell<T>
and &mut T
are interchangeable, a LocalKey<Cell<T>>
isn't quite the same
thing as a Cell<T>
, so it wouldn't be safe for the standard
library to contain something like this:
impl<T> LocalKey<Cell<T>> {
fn get_mut(&self) -> &mut T {
unsafe {
// ... mem::transmute() voodoo goes here ...
The problem here is that you can call LocalKey<Cell<T>>::get_mut()
twice and
end up with two &mut T
s that point to the same thing (mutable aliasing) which
results in undefined behavior.
What would help
The ideal solution is for Rust to let you call arbitrary methods
T::foo(&mut self...)
on a LocalKey<Cell<T>>
. This way you can
have one (and only one) &mut T
at any syntactical point in your
program -- the &mut self
.