Known Optimisation Potential
============================
There are several areas of the Tvix evaluator code base where
potentially large performance gains can be achieved through
optimisations that we are already aware of.
The shape of most optimisations is that of moving more work into the
compiler to simplify the runtime execution of Nix code. This leads, in
some cases, to drastically higher complexity in both the compiler
itself and the invariants that need to be guaranteed between the
runtime and the compiler.
For this reason, and because we lack the infrastructure to adequately
track their impact (WIP), we have not yet implemented these
optimisations, but note the most important ones here.
* Use "open upvalues" [hard]
Right now, Tvix will immediately close over all upvalues that are
created and clone them into the `Closure::upvalues` array.
Instead of doing this, we can statically determine most locals that
are closed over *and escape their scope* (similar to how the
`compiler::scope::Scope` struct currently tracks whether locals are
used at all).
If we implement the machinery to track this, we can represent some
upvalues at runtime as plain stack indices in the upvalue array and
only copy the values where we know that they escape.
* Avoid `with` value duplication [easy]
If a `with` makes use of a local identifier in a scope that cannot
close before the `with` (e.g. not across `LambdaCtx` boundaries), we
can avoid the allocation of the phantom value and duplication of the
`NixAttrs` value on the stack. In this case we simply push the stack
index of the known local.
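
A rough sketch of that compile-time decision; the opcode names are invented
for illustration and do not correspond to the actual instruction set:

```rust
// Invented opcodes for illustration; the real instruction set differs.
#[derive(Debug)]
enum OpCode {
    // Current behaviour: a phantom value plus a duplicated `NixAttrs` value.
    PushWith,
    // Possible shortcut: reference the already-present local by stack slot.
    PushWithLocal { stack_idx: usize },
}

// `local_outlives_with` would come from the compiler's scope tracking
// ("a local in a scope that cannot close before the `with`").
fn compile_with_target(local_outlives_with: bool, stack_idx: usize) -> OpCode {
    if local_outlives_with {
        OpCode::PushWithLocal { stack_idx }
    } else {
        OpCode::PushWith
    }
}

fn main() {
    println!("{:?}", compile_with_target(true, 3));
    println!("{:?}", compile_with_target(false, 0));
}
```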
* Multiple attribute selection [medium]
An instruction could be introduced that avoids repeatedly pushing and
popping an attribute set on the stack if multiple keys are being selected
from it. This occurs, for example, when inheriting from an attribute
set or when binding function formals.
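
A sketch of what the VM side of such a hypothetical "select many"
instruction could look like, with heavily simplified types and no error
handling:

```rust
use std::collections::HashMap;

#[derive(Clone, Debug)]
enum Value {
    Int(i64),
    Attrs(HashMap<String, Value>),
}

// Hypothetical handler: pop the attribute set once, push all selected keys.
fn op_attrs_select_many(stack: &mut Vec<Value>, keys: &[&str]) {
    let attrs = match stack.pop() {
        Some(Value::Attrs(attrs)) => attrs,
        _ => panic!("type error: expected attribute set"),
    };

    for key in keys {
        // Missing keys, defaults and error propagation are elided here.
        stack.push(attrs[*key].clone());
    }
}

fn main() {
    let mut attrs = HashMap::new();
    attrs.insert("x".into(), Value::Int(1));
    attrs.insert("y".into(), Value::Int(2));

    // Roughly corresponds to `inherit (set) x y;`.
    let mut stack = vec![Value::Attrs(attrs)];
    op_attrs_select_many(&mut stack, &["x", "y"]);
    println!("{:?}", stack);
}
```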
* Split closure/function representation [easy]
Functions have fewer fields that need to be populated at runtime and
can directly use the `value::function::Lambda` representation where
possible.
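
For illustration, a minimal sketch of such a split; the types are
placeholders and only hint at the actual `value::function::Lambda` contents:

```rust
use std::rc::Rc;

// Placeholder; the real `value::function::Lambda` carries the chunk,
// formals, and so on.
struct Lambda {
    name: Option<String>,
}

struct Closure {
    lambda: Rc<Lambda>,
    upvalues: Vec<i64>, // stand-in for captured values
}

enum Callable {
    // No captured environment: the compile-time `Lambda` can be used directly.
    Function(Rc<Lambda>),
    // Captures upvalues: needs a per-instantiation allocation at runtime.
    Closure(Box<Closure>),
}

fn describe(callable: &Callable) -> String {
    match callable {
        Callable::Function(l) => format!("function {:?}", l.name),
        Callable::Closure(c) => {
            format!("closure {:?} with {} upvalue(s)", c.lambda.name, c.upvalues.len())
        }
    }
}

fn main() {
    let lambda = Rc::new(Lambda { name: Some("add".into()) });
    let function = Callable::Function(Rc::clone(&lambda));
    let closure = Callable::Closure(Box::new(Closure { lambda, upvalues: vec![42] }));
    println!("{} / {}", describe(&function), describe(&closure));
}
```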
* Apply `compiler::optimise_select` to other set operations [medium]
In addition to selects, statically known attribute resolution could
also be used for things like `?` or `with`. The latter might be a
little more complicated but is worth investigating.
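
As a sketch of the `?` case, on a drastically simplified AST and ignoring
dynamic attribute keys:

```rust
#[derive(Debug)]
enum Expr {
    Bool(bool),
    LiteralAttrs(Vec<String>),
    HasAttr { set: Box<Expr>, key: String },
}

fn optimise_has_attr(expr: Expr) -> Expr {
    match expr {
        Expr::HasAttr { set, key } => match *set {
            // The set's keys are statically known: fold to a constant.
            Expr::LiteralAttrs(keys) => Expr::Bool(keys.contains(&key)),
            // Anything else keeps the runtime check.
            other => Expr::HasAttr { set: Box::new(other), key },
        },
        // All other expression kinds are left untouched by this pass.
        other => other,
    }
}

fn main() {
    // Roughly `{ a = 1; b = 2; } ? a` (the values don't matter for `?`).
    let expr = Expr::HasAttr {
        set: Box::new(Expr::LiteralAttrs(vec!["a".into(), "b".into()])),
        key: "a".into(),
    };
    println!("{:?}", optimise_has_attr(expr)); // prints `Bool(true)`
}
```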
* Inline fully applied builtins with equivalent operators [medium]
Some `builtins` have equivalent operators, e.g. `builtins.sub`
corresponds to the `-` operator, `builtins.hasAttr` to the `?`
operator etc. These operators additionally compile to a primitive
VM opcode, so they should be just as cheap as (if not cheaper than)
a builtin application.
If the compiler encounters a fully applied builtin (i.e.
no currying is occurring) and the `builtins` global is unshadowed,
it could compile the equivalent operator bytecode instead: For
example, `builtins.sub 20 22` would be compiled as `20 - 22`.
This would ensure that equivalent `builtins` can also benefit
from special optimisations we may implement for certain operators
(in the absence of currying). E.g. we could optimise access
to the `builtins` attribute set, which a call to
`builtins.getAttr "foo" builtins` should also profit from.
* Avoid nested `VM::run` calls [hard]
Currently, when encountering Nix-native callables (thunks, closures),
the VM's run loop will nest and return the value of the nested call
frame one level up. This makes the Rust call stack almost mirror the
Nix call stack, which is usually undesirable.
It is possible to detect situations where this is avoidable and
instead set up the VM in such a way that it continues and produces
the desired result in the same run loop, but this is kind of tricky
to get right - especially while other parts are still in flux.
For details consult the commit with Gerrit change ID
`I96828ab6a628136e0bac1bf03555faa4e6b74ece`, in which the initial
attempt at doing this was reverted.
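
For illustration only (this is not the approach from the reverted change),
the general shape of running nested calls in a single loop over an explicit
frame stack could look roughly like this:

```rust
struct Frame {
    // Stand-in for instruction pointer, chunk, stack offsets, ...
    remaining_ops: usize,
}

enum Step {
    Continue,
    Call(Frame),
    Return(i64),
}

// Stand-in for real opcode dispatch.
fn execute_one(frame: &mut Frame) -> Step {
    match frame.remaining_ops {
        0 => Step::Return(42),
        // Pretend this op calls a Nix closure: push a callee frame instead of
        // recursing into a nested run loop.
        2 => {
            frame.remaining_ops -= 1;
            Step::Call(Frame { remaining_ops: 1 })
        }
        _ => {
            frame.remaining_ops -= 1;
            Step::Continue
        }
    }
}

fn run(entry: Frame) -> i64 {
    let mut frames = vec![entry];
    loop {
        let step = execute_one(frames.last_mut().expect("frame stack never empty"));
        match step {
            Step::Continue => {}
            Step::Call(callee) => frames.push(callee),
            Step::Return(value) => {
                frames.pop();
                if frames.is_empty() {
                    return value;
                }
                // A real VM would push `value` onto the caller's value stack here.
            }
        }
    }
}

fn main() {
    println!("{}", run(Frame { remaining_ops: 3 }));
}
```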
* Avoid thunks if only identifier closing is required [medium]
Some constructs, like `with`, mostly do not change runtime behaviour
if thunked. However, they are wrapped in thunks to ensure that
deferred identifiers are resolved correctly.
This can be avoided, as we statically analyse the scope and should
be able to tell whether any such logic was required.
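
A sketch of that compile-time decision; the analysis fields are invented,
while the real information would come from the scope tracking in
`compiler::scope`:

```rust
// Invented analysis result for illustration.
struct ScopeAnalysis {
    // Does the expression reference identifiers that can only be resolved
    // later (deferred upvalues, dynamic `with` lookups, ...)?
    needs_deferred_resolution: bool,
}

enum Emit {
    // Wrap the expression in a thunk, as today.
    Thunk,
    // Emit the expression's bytecode directly.
    Direct,
}

fn thunk_or_direct(analysis: &ScopeAnalysis) -> Emit {
    if analysis.needs_deferred_resolution {
        Emit::Thunk
    } else {
        Emit::Direct
    }
}

fn main() {
    let trivial = ScopeAnalysis { needs_deferred_resolution: false };
    match thunk_or_direct(&trivial) {
        Emit::Direct => println!("no thunk needed"),
        Emit::Thunk => println!("thunk required"),
    }
}
```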
* Intern literals [easy]
Currently, the compiler emits a separate entry in the constant
table for each literal. So the program `1 + 1 + 1` will have
three entries in its `Chunk::constants` instead of only one.
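
A minimal sketch of interning in the chunk builder. The constant type and
lookup table are simplified; float constants would need extra care since
they are not hashable as-is:

```rust
use std::collections::HashMap;

#[derive(Clone, PartialEq, Eq, Hash, Debug)]
enum Constant {
    Int(i64),
    // Strings, paths etc. would be interned the same way.
    String(String),
}

#[derive(Default)]
struct Chunk {
    constants: Vec<Constant>,
    interned: HashMap<Constant, usize>,
}

impl Chunk {
    fn push_constant(&mut self, constant: Constant) -> usize {
        // Reuse an existing slot if the same literal was emitted before.
        if let Some(&idx) = self.interned.get(&constant) {
            return idx;
        }
        let idx = self.constants.len();
        self.interned.insert(constant.clone(), idx);
        self.constants.push(constant);
        idx
    }
}

fn main() {
    let mut chunk = Chunk::default();
    // `1 + 1 + 1`: three literal occurrences, a single constant table entry.
    for _ in 0..3 {
        chunk.push_constant(Constant::Int(1));
    }
    assert_eq!(chunk.constants.len(), 1);
    println!("{:?}", chunk.constants);
}
```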
* Do some list and attribute set operations in place [hard]
Algorithms that cannot do most of their work inside `builtins` like `map`,
`filter` or `foldl'` usually perform terribly when they operate on data
structures like lists and attribute sets.
`builtins` can do work in place on a copy of a `Value`, but naïvely expressed
recursive algorithms will usually use `//` and `++` to do a single change to a
`Value` at a time, requiring a full copy of the data structure each time.
It would be a big improvement if we could do some of these operations in place
without requiring a new copy.
There are probably two approaches: We could determine statically if a value is
reachable from elsewhere and emit a special in-place instruction if not. An
easier alternative is probably to rely on reference counting at runtime: If no
other reference to a value exists, we can extend the list or update the
attribute set in place (see the sketch at the end of this item).
An **alternative** to this is using [persistent data
structures](https://en.wikipedia.org/wiki/Persistent_data_structure) or at the
very least [immutable data structures](https://docs.rs/im/latest/im/) that can
be copied more efficiently than the stock structures we are using at the
moment.
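
A minimal sketch of the reference-counting variant described above, deciding
between in-place mutation and a copy based on the `Rc` reference count
(std's `Rc::make_mut` implements the same copy-on-write pattern):

```rust
use std::rc::Rc;

// If we are the sole owner of the list we can push in place; otherwise we
// fall back to the copying behaviour of `++`.
fn concat_one(list: &mut Rc<Vec<i64>>, element: i64) {
    match Rc::get_mut(list) {
        // Reference count is 1: nobody else can observe the mutation.
        Some(vec) => vec.push(element),
        // Shared: copy first, so other holders still see the old list.
        None => {
            let mut copy = (**list).clone();
            copy.push(element);
            *list = Rc::new(copy);
        }
    }
}

fn main() {
    let mut unique = Rc::new(vec![1, 2]);
    concat_one(&mut unique, 3); // mutated in place, no copy

    let mut shared = Rc::new(vec![1, 2]);
    let other = Rc::clone(&shared);
    concat_one(&mut shared, 3); // copies; `other` still sees [1, 2]

    println!("{:?} {:?} {:?}", unique, shared, other);
}
```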