tvl-depot/tvix/eval/docs/known-optimisation-potential.md
Vincent Ambo 2246a31e72 refactor(tvix/eval): return call frame result from VM::call
Previously, "calling" (setting up the VM run loop for executing a call
frame) and "running" (running this loop to completion) were separate
operations.

This was basically an attempt to avoid nesting `VM::run` invocations.
However, doing things this way introduced some tricky bugs for exiting
out of the call frames of thunks vs. builtins & closures.

For now, we unify the two operations and always return the value to
the caller directly. For now this makes calls a little less effective,
but it gives us a chance to nail down some other strange behaviours
and then re-optimise this afterwards.

To make sure we tackle this again further down I've added it to the
list of known possible optimisations.

Change-Id: I96828ab6a628136e0bac1bf03555faa4e6b74ece
Reviewed-on: https://cl.tvl.fyi/c/depot/+/6415
Reviewed-by: sterni <sternenseemann@systemli.org>
Tested-by: BuildkiteCI
2022-09-08 12:53:20 +00:00


Known Optimisation Potential
============================

There are several areas of the Tvix evaluator code base where
potentially large performance gains can be achieved through
optimisations that we are already aware of.

The shape of most optimisations is that of moving more work into the
compiler to simplify the runtime execution of Nix code. This leads, in
some cases, to drastically higher complexity in both the compiler
itself and in invariants that need to be guaranteed between the
runtime and the compiler.

For this reason, and because we lack the infrastructure to adequately
track their impact (WIP), we have not yet implemented these
optimisations, but note the most important ones here.

* Use "open upvalues" [hard]

  Right now, Tvix will immediately close over all upvalues that are
  created and clone them into the `Closure::upvalues` array.

  Instead of doing this, we can statically determine most locals that
  are closed over *and escape their scope* (similar to how the
  `compiler::scope::Scope` struct currently tracks whether locals are
  used at all).

  If we implement the machinery to track this, we can represent some
  upvalues at runtime by simply storing their stack indices in the
  upvalue array, and only copy the values of locals that we know
  escape.

* Avoid `with` value duplication [easy]

  If a `with` makes use of a local identifier in a scope that cannot
  close before the `with` (e.g. not across `LambdaCtx` boundaries), we
  can avoid the allocation of the phantom value and the duplication of
  the `NixAttrs` value on the stack. In this case we simply push the
  stack index of the known local.

* Multiple attribute selection [medium]

  An instruction could be introduced that avoids repeatedly pushing an
  attribute set to/from the stack if multiple keys are being selected
  from it. This occurs, for example, when inheriting from an attribute
  set or when binding function formals.

* Split closure/function representation [easy]

  Functions (lambdas that capture no upvalues) have fewer fields that
  need to be populated at runtime than closures, and can directly use
  the `value::function::Lambda` representation where possible.

* Tail-call optimisation [hard]

  We can statically detect the conditions for tail-call optimisation.
  The compiler should do this, and it should then emit a new operation
  for doing the tail-calls.

* Optimise inner builtin access [medium]

  When accessing identifiers like `builtins.foo`, the compiler should
  not go through the trouble of setting up the attribute set on the
  stack and accessing `foo` from it if it knows that the scope for
  `builtins` is unpoisoned.

  The same thing goes for resolving `with builtins;`, which should
  definitely resolve statically.

* Avoid nested `VM::run` calls [hard]

  Currently, when encountering Nix-native callables (thunks, closures),
  the VM's run loop will nest and return the value of the nested call
  frame one level up. This makes the Rust call stack almost mirror the
  Nix call stack, which is usually undesirable, for example because
  deeply recursive Nix code can then exhaust the Rust stack.

  It is possible to detect situations where this is avoidable and
  instead set up the VM in such a way that it continues and produces
  the desired result in the same run loop, but this is tricky to get
  right, especially while other parts are still in flux.

  For details, consult the commit with Gerrit change ID
  `I96828ab6a628136e0bac1bf03555faa4e6b74ece`, in which the initial
  attempt at doing this was reverted.
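
The "open upvalues" idea can be illustrated with a minimal Rust sketch. It assumes a hypothetical, heavily simplified `Value` type and an `escapes` flag standing in for the result of the compiler's escape analysis. None of these names are taken from the Tvix code base.

```rust
/// Hypothetical, simplified stand-in for the evaluator's value type.
#[derive(Clone, Debug, PartialEq)]
pub enum Value {
    Integer(i64),
    String(String),
}

/// An upvalue is either still "open" (it points at a live stack slot)
/// or "closed" (it owns a copy because the local escapes its scope).
#[derive(Clone, Debug, PartialEq)]
pub enum Upvalue {
    Open(usize),
    Closed(Value),
}

/// Capture the local at `stack_idx`. `escapes` is assumed to come from
/// a (hypothetical) static escape analysis in the compiler.
pub fn capture(stack: &[Value], stack_idx: usize, escapes: bool) -> Upvalue {
    if escapes {
        // The local outlives its scope, so the value is copied now.
        Upvalue::Closed(stack[stack_idx].clone())
    } else {
        // The local is still on the stack whenever the closure runs,
        // so remembering its index is enough and nothing is cloned.
        Upvalue::Open(stack_idx)
    }
}

/// Read an upvalue, looking through to the stack if it is still open.
pub fn read_upvalue<'a>(stack: &'a [Value], upvalue: &'a Upvalue) -> &'a Value {
    match upvalue {
        Upvalue::Open(idx) => &stack[*idx],
        Upvalue::Closed(value) => value,
    }
}
```

A real implementation would also have to guarantee that an open upvalue is never read after its stack slot has been popped. That is exactly the kind of compiler/runtime invariant that makes this item hard.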
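
For the `with` item, the key point is that the `with` stack can hold stack indices rather than copies of the attribute set. The following is a minimal sketch that assumes a hypothetical `Vm` whose stack only holds attribute sets, modelled as `BTreeMap<String, i64>` instead of `NixAttrs`:

```rust
use std::collections::BTreeMap;

/// Hypothetical simplified attribute set (stand-in for `NixAttrs`).
pub type Attrs = BTreeMap<String, i64>;

/// Toy VM state: a value stack of attribute sets plus a stack of active
/// `with` scopes. Each `with` entry is only an index into `stack`, so
/// entering `with someLocal;` copies no attribute set.
#[derive(Default)]
pub struct Vm {
    pub stack: Vec<Attrs>,
    pub with_stack: Vec<usize>,
}

impl Vm {
    /// Enter a `with` whose namespace is the known local at `stack_idx`.
    pub fn push_with_local(&mut self, stack_idx: usize) {
        self.with_stack.push(stack_idx);
    }

    /// Leave the innermost `with`.
    pub fn pop_with(&mut self) {
        self.with_stack.pop();
    }

    /// Resolve a dynamically scoped identifier, innermost `with` first.
    pub fn resolve_with(&self, name: &str) -> Option<i64> {
        self.with_stack
            .iter()
            .rev()
            .find_map(|&idx| self.stack[idx].get(name).copied())
    }
}
```

Lookups walk the `with` stack from the innermost scope outwards. This matches Nix's rule that the innermost `with` wins.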
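
The multiple attribute selection instruction could take roughly the following shape. This sketch assumes a toy `Value` enum and an eager stack machine. Real Nix selection is lazy and also has to deal with thunks and with default values for formals.

```rust
use std::collections::BTreeMap;

/// Hypothetical toy value type.
#[derive(Clone, Debug, PartialEq)]
pub enum Value {
    Int(i64),
    Attrs(BTreeMap<String, Value>),
}

/// Hypothetical "select many" instruction: pop the attribute set once,
/// then push the value of every key in `keys`, in order. The naive
/// alternative re-pushes the attribute set before each single select.
pub fn op_select_many(stack: &mut Vec<Value>, keys: &[&str]) -> Result<(), String> {
    let attrs = match stack.pop() {
        Some(Value::Attrs(attrs)) => attrs,
        other => return Err(format!("expected attribute set, got {:?}", other)),
    };
    for key in keys {
        let value = attrs
            .get(*key)
            .cloned()
            .ok_or_else(|| format!("attribute '{}' missing", key))?;
        stack.push(value);
    }
    Ok(())
}
```

For `inherit (x) a b;` the compiler would emit one such instruction with the keys `a` and `b` instead of two push/select pairs.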
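
The split between closures and plain functions could look roughly like the following sketch. `Lambda`, `Callable` and `make_callable` are hypothetical stand-ins and not the actual `value::function` types.

```rust
use std::rc::Rc;

/// Hypothetical compiled function body (stand-in for a real `Lambda`).
#[derive(Debug)]
pub struct Lambda {
    pub name: String,
    pub upvalue_count: usize,
}

#[derive(Debug)]
pub enum Callable {
    /// No captured state: the shared `Lambda` is used directly.
    Function(Rc<Lambda>),
    /// Captured state that has to be populated at runtime.
    Closure { lambda: Rc<Lambda>, upvalues: Vec<i64> },
}

/// Build a callable at runtime. Only lambdas that the compiler reports
/// as capturing something get the heavier closure representation.
pub fn make_callable(lambda: Rc<Lambda>, captured: Vec<i64>) -> Callable {
    assert_eq!(lambda.upvalue_count, captured.len());
    if lambda.upvalue_count == 0 {
        Callable::Function(lambda)
    } else {
        Callable::Closure { lambda, upvalues: captured }
    }
}
```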
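
Statically detecting tail calls amounts to threading a "tail position" flag through the compiler. The sketch below assumes a hypothetical four-variant expression tree and opcode set, and it ignores laziness entirely. A VM would then execute `TailCall` by replacing the current frame instead of pushing a new one.

```rust
/// Hypothetical, heavily simplified expression tree.
pub enum Expr {
    Lit(i64),
    Var(String),
    If(Box<Expr>, Box<Expr>, Box<Expr>),
    Apply(Box<Expr>, Box<Expr>),
}

/// Hypothetical opcodes; jump targets are absolute instruction indices.
#[derive(Debug, PartialEq)]
pub enum Op {
    Constant(i64),
    GetLocal(String),
    JumpIfFalse(usize),
    Jump(usize),
    Call,
    TailCall,
}

/// Compile `expr`. `tail` is true when the value of `expr` is returned
/// directly from the enclosing function body.
pub fn compile(expr: &Expr, tail: bool, code: &mut Vec<Op>) {
    match expr {
        Expr::Lit(n) => code.push(Op::Constant(*n)),
        Expr::Var(name) => code.push(Op::GetLocal(name.clone())),
        Expr::If(cond, then, otherwise) => {
            // The condition is never in tail position; both branches
            // inherit the tail position of the whole `if`.
            compile(cond, false, code);
            let jump_if_false = code.len();
            code.push(Op::JumpIfFalse(0)); // patched below
            compile(then, tail, code);
            let jump_end = code.len();
            code.push(Op::Jump(0)); // patched below
            let else_start = code.len();
            code[jump_if_false] = Op::JumpIfFalse(else_start);
            compile(otherwise, tail, code);
            let end = code.len();
            code[jump_end] = Op::Jump(end);
        }
        Expr::Apply(func, arg) => {
            // Function and argument are evaluated before the call, so
            // neither of them is in tail position.
            compile(func, false, code);
            compile(arg, false, code);
            code.push(if tail { Op::TailCall } else { Op::Call });
        }
    }
}
```

Compiling the body `f (g x)` with `tail = true` yields a plain `Call` for the inner `g x` and a `TailCall` for the outer application.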
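
The builtin access optimisation boils down to a compile-time check. In this sketch, `Scope::poisoned`, `Op::PushBuiltin`, `Op::ResolveIdent` and `compile_builtin_select` are hypothetical names, and the list of known builtins is passed in explicitly.

```rust
use std::collections::HashSet;

/// Hypothetical compiler scope: records which global names have been
/// shadowed ("poisoned") by bindings in the current scope chain.
#[derive(Default)]
pub struct Scope {
    pub poisoned: HashSet<String>,
}

#[derive(Debug, PartialEq)]
pub enum Op {
    /// Generic path: resolve the identifier at runtime ...
    ResolveIdent(String),
    /// ... then select an attribute from the resulting set.
    Select(String),
    /// Fast path: reference the builtin directly, resolved statically.
    PushBuiltin(String),
}

/// Compile `builtins.<attr>`, taking the fast path only when `builtins`
/// cannot have been shadowed and `attr` is a known builtin.
pub fn compile_builtin_select(
    scope: &Scope,
    known_builtins: &[&str],
    attr: &str,
    code: &mut Vec<Op>,
) {
    if !scope.poisoned.contains("builtins") && known_builtins.contains(&attr) {
        code.push(Op::PushBuiltin(attr.to_string()));
    } else {
        code.push(Op::ResolveIdent("builtins".to_string()));
        code.push(Op::Select(attr.to_string()));
    }
}
```

The same unpoisoned check would also allow `with builtins;` to be resolved statically.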
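
The general idea behind avoiding nested `VM::run` calls is to keep an explicit frame stack inside a single loop. The toy VM below has hypothetical opcodes, no thunks and no builtins, and passes arguments implicitly on a shared stack. It only shows that shape. The hard part described above, namely calls into builtins and the forcing of thunks, is deliberately left out.

```rust
/// Hypothetical opcodes for a toy VM.
#[derive(Clone, Debug)]
pub enum Op {
    Constant(i64),
    Add,
    /// Call the function with this index in `functions`.
    Call(usize),
    Return,
}

struct Frame {
    function: usize,
    ip: usize,
}

/// Run `functions[entry]` to completion in a single loop. A call pushes
/// a `Frame` instead of recursing into `run`, so the Rust call stack
/// stays flat no matter how deep the toy call chain gets.
pub fn run(functions: &[Vec<Op>], entry: usize) -> i64 {
    let mut stack: Vec<i64> = Vec::new();
    let mut frames = vec![Frame { function: entry, ip: 0 }];
    loop {
        let frame = frames.last_mut().expect("at least one frame");
        let op = functions[frame.function][frame.ip].clone();
        frame.ip += 1;
        match op {
            Op::Constant(n) => stack.push(n),
            Op::Add => {
                let b = stack.pop().unwrap();
                let a = stack.pop().unwrap();
                stack.push(a + b);
            }
            Op::Call(f) => frames.push(Frame { function: f, ip: 0 }),
            Op::Return => {
                frames.pop();
                if frames.is_empty() {
                    return stack.pop().expect("result on stack");
                }
                // Otherwise the result stays on the shared stack and the
                // caller's frame resumes on the next loop iteration.
            }
        }
    }
}
```

A call chain that is 100,000 levels deep runs here without growing the Rust stack, whereas a recursive `run` per call frame would nest just as deeply.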