tvl-depot/tvix/eval/docs/abandoned/thread-local-vm.md
Adam Joseph e816d3a9dc docs(tvix/eval): document abandoned thread-local vm
This commit adds a markdown document which explains how the
thread-local VM infrastructure works, in case it is useful in the
future.

Change-Id: Id10e32a9e3c5fa38a15d4bec9800f7234c59234a
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7193
Tested-by: BuildkiteCI
Autosubmit: Adam Joseph <adam@westernsemico.com>
Reviewed-by: tazjin <tazjin@tvl.su>
2022-11-08 08:41:04 +00:00

10 KiB

We can't have nice things because IFD

The thread-local VM work below was ultimately not merged because it was decided that it would be harmful for tvix::eval::Value to implement Eq, Hash, or any of the other std traits.

Implementing std traits on Value was deemed harmful because IFD can cause arbitrary amounts of compilation to occur, including network transactions with builders. Obviously it would be unexpected and error-prone to have a PartialEq::eq() which does something like this. This problem does not manifest within the "nixpkgs compatibility only" scope, or in any undeprecated language feature other than IFD. Although IFD is outside the "nixpkgs compatibility scope", it has been added to the TVL compatibility scope.

This was the sole reason for not merging.

The explanation below may be useful in case future circumstances affect the relevance of the reasoning above.

The implementation can be found in these CLs:

Thread-local storage for tvix::eval::vm::VM

The problem

Value::force() takes a &mut VM argument, since forcing a value requires executing opcodes. This means that Value::nix_eq() too must take a &mut VM, since any sensible definition of equality will have to force thunks.

Unfortunately Rust's PartialEq::eq() function does not accept any additional arguments like this, so Value cannot implement PartialEq. Worse, structs which contain Values can't implement PartialEq either. This means Value, and anything containing it, cannot be the key for a BTreeMap or HashMap. We can't even insert Values into a HashSet!

There are other situations like this that don't involve PartialEq, but it's the most glaring one. The main problem is that you need a VM in order to force thunks, and thunks can be anywhere in a Value.

Solving the problem with thread-locals

We could avoid threading the &mut VM through the entire codebase by making it a thread-local.

To do this without a performance hit, we need to use LLVM thread-locals, which are the same cost as references to statics but load relative to llvm.threadlocal.address instead of relative to the data segment. Unfortunately #[thread_local] is unstable and unsafe in general for most of the cases where we would want to use it. There is one exception, however: if a !thread_local() has a const initializer, the compiler will insert a #[thread_local]; this special case is both safe and stable.

The difficult decision is what the type of the thread-local should be. Since you can't get a mutable reference to a thread_local!() it will have to be some interior-mutability-bestowing wrapper around our current struct VM. Here are the choices:

RefCell<VM>

This is the obvious first choice, since it lets you borrow a RefMut<Target=VM>. The problem here is that we want to keep the codebase written such that all the functions in impl VM still take a &mut self. This means that there will be an active mutable borrow for the duration of VM::call_builtin(). So if we implement PartialEq by having eq() attempt a second mutable borrow from the thread-local storage, it will fail since there is already an active borrow.

The problem here is that you can't "unborrow" a RefMut except by dropping it. There's no way around this.

Problem: Uglification

The only solution here is to rewrite all the functions in impl VM so they don't take any kind of self argument, and then have them do a short-lived .borrow_mut() from the thread-local RefCell separately, each time they want to modify one of the fields of VM (currently frames, stack, with_stack, warnings). This means that if you had a code sequence like this:

impl VM {
  fn foo(&mut self, ...) {
    ...
    self.frame().ip += 1;
    self.some_other_method();
    self.frame().ip += 1;

You would need to add two separate borrow_mut()s, one for each of the self.frame().ip+=1 statements. You can't just do one big borrow_mut() because some_other_method() will call borrow_mut() and panic.

Problem: Performance

The RefCell<VM> approach also has a fairly huge performance hit, because every single modification to any part of VM will require a reference count increment/decrement, and a conditional branch based on the check (which will never fail) that the RefCell isn't already mutably borrowed. It will also impede a lot of rustc's optimizations.

Cell<VM>

This is a non-starter because it means that in order to mutate any field of VM, you have to move the entire struct VM out of the Cell, mutate it, and move it back in.

Cell<Box<VM>>

Now we're getting warmer. Here, we can move the Box<VM> out of the cell with a single pointer-sized memory access.

We don't want to do the "uglification" described in the previous section. We are very fortunate that, sometime in mid-2019, the Rust dieties decreed by fiat that &Cell<T> and &mut T are bit-for-bit identical, and even gave us mortals safe wrappers from_mut() and get_mut() around mem::transmute().

So now, when a VM method (which takes &mut self) calls out to some external code (like a builtin), instead of passing the &mut self to the external code it can call Cell::from_mut(&mut self), and then Cell::swap() that into the thread-local storage cell for the duration of the external code. After the external code returns, it can Cell::swap() it back. This whole dance gets wrapped in a lexical block, and the borrow checker sees that the &Cell<Box<VM>> returned by Cell::from_mut() lives only until the end of the lexical block, so we get the &mut self back after the close-brace for that block. NLL FTW. This sounds like a lot of work, but it should compile down to two pointer-sized loads and two pointer-sized stores, and it is incurred basically only for OpBuiltin.

This all works, with only two issues:

  1. vm.rs needs to be very careful to do the thread-local cell swap dance before calling anything that might call PartialEq::eq() (or any other method that expects to be able to pull the VM out of thread-local storage). There is no compile-time check that we did the dance in all the right places. If we forget to do the dance somewhere we'll get a runtime panic from Option::expect() (see next section).

  2. Since we need to call Cell::from_mut() on a Box<VM> rather than a bare VM, we still need to rewrite all of vm.rs so that every function takes a &mut Box<VM> instead of a &mut self. This creates a huge amount of "noise" in the code.

Fortunately, it turns out that nearly all the "noise" that arises from the second point can be eliminated by taking advantage of deref coercions! This was the last "shoe to drop".

There is still the issue of having to be careful about calls from vm.rs to things outside that file, but it's manageable.

Cell<Option<Box<VM>>>

In order to get the "safe and stable #[thread_local]" exception we need a const initializer, which means we need to be able to put something into the Cell that isn't a VM. So the type needs to be Cell<Option<Box<VM>>>.

Recall that you can't turn an Option<&T> into an &Option<T>. The latter type has the "is this a Some or None" bit immediately adjacent to the bits representing T. So if I hand you a t:&T and you wrap it as Some(t), those bits aren't adjacent in memory. This means that all the VM methods need to operate on an Option<Box<VM>> -- we can't just wrap a Some() around &mut self "at the last minute" before inserting it into the thread-local storage cell. Fortunately deref coercions save the day here too -- the coercion is inferred through both layers (Box and Option) of wrapper, so there is no additional noise in the code.

Note that Rust is clever and can find some sequence of bits that aren't a valid T, so sizeof(Option<T>)==sizeof(T). And in fact, Box<T> is one of these cases (and this is guaranteed). So the Option has no overhead.

Closing thoughts, language-level support

This would have been easier with language-level support.

What wouldn't help

Although it it was decreed that Cell<T> and &mut T are interchangeable, a LocalKey<Cell<T>> isn't quite the same thing as a Cell<T>, so it wouldn't be safe for the standard library to contain something like this:

impl<T> LocalKey<Cell<T>> {
  fn get_mut(&self) -> &mut T {
    unsafe {
      // ... mem::transmute() voodoo goes here ...

The problem here is that you can call LocalKey<Cell<T>>::get_mut() twice and end up with two &mut Ts that point to the same thing (mutable aliasing) which results in undefined behavior.

What would help

The ideal solution is for Rust to let you call arbitrary methods T::foo(&mut self...) on a LocalKey<Cell<T>>. This way you can have one (and only one) &mut T at any syntactical point in your program -- the &mut self.