tvl-depot/third_party/immer/doc/design.rst
Vincent Ambo 1213b086a1 merge(3p/immer): Subtree merge at 'ad3e3556d' as 'third_party/immer'
Change-Id: I9636a41ad44b4218293833fd3e9456d9b07c731b
2020-07-15 08:23:32 +01:00

268 lines
11 KiB
ReStructuredText

.. _design:
Design
======
This is a library of **immutable containers**.
These containers have all their methods marked ``const``. Instead of
mutating them *in place*, they provide manipulation functions that
*return a new transformed value*, leaving the original value
unaltered. In the context of data-structures, this property of
preserving old values is called **persistence**.
Most of these containers use data-structures in which these operations
can be done *efficiently*. In particular, not all data is copied when
a new value is produced. Instead, the new values may share,
internally, common data with other objects. We sometimes refer to
this property as **structural sharing**. This behaviour is
transparent to the user.
Assigment
---------
We are sorry, we lied. These containers provide *one mutating
operation*: **assignment** --- i.e. ``operator=``.
There is a good reason: without ``operator=`` everything becomes
complicated in C++. For example, one may not contain non-assignable
types in many standard containers, assignment would also be disabled
from your custom types holding immutable containers, and so on and so
forth.
C++ is a multi-paradigm language with an imperative core. Thus, it is
built on the foundation that *variables* can be mutated ---
i.e. assigned to. We don't want to ride against this tide. What we
want to prevent is in-place *object* manipulation. Because of C++
semantics, *variable* assignment is defined in terms of *object
mutation*, so we have to provide *this very particular mutating
operation*, but nothing else. Of course, you are free to mark your
variables ``const`` to completely forbid assignment.
.. warning::
**Assignment is not thread safe**. When a *mutable* variable is
shared across multiple threads, protect access using some other
mechanism.
For obvious reasons, all other methods, which are ``const``, are
thread-safe. It is safe to share *immutable* state across multiple
threads.
To ``const`` or not to ``const``
--------------------------------
Many C++ programmers, influenced by functional programming, are trying
to escape the evils of mutability by using ``const`` whenever
possible. We also do it ourselves in many of our examples to
reinforce the property of immutability.
While in general this is a good practice backed up with very good
intentions, it has one caveat: *it disables moveability*. It does so
even when ``std::move()`` is used. This makes sense, since moving from
an object may mutate it, and ``const``, my friends, prevents *all*
mutations. For example:
.. literalinclude:: ../example/vector/move.cpp
:language: c++
:start-after: move-bad/start
:end-before: move-bad/end
One may think that the variable ``v`` is moved into the
``push_back()`` call. This is not the case, because the variable
``v`` is marked ``const``. Of course, one may enable the move by
removing it, as in:
.. literalinclude:: ../example/vector/move.cpp
:language: c++
:start-after: move-good/start
:end-before: move-good/end
So, is it bad style then to use ``const`` as much as possible? I
wouldn't say so and it is advisable when ``std::move()`` is not used.
An alternative style is to not use ``const`` but adopt an `AAA-style
<aaa>`_ (*Almost Always use Auto*). This way, it is easy to look for
mutations by looking for lines that contain ``=`` but no ``auto``.
Remember that when using our immutable containers ``operator=`` is the
only way to mutate a variable.
.. _aaa: https://herbsutter.com/2013/08/12/gotw-94-solution-aaa-style-almost-always-auto/
.. admonition:: Why does ``const`` prevent move semantics?
For those adventurous into the grainy details C++, here is why.
``std::move()`` does not move anything, it is just *a cast* from
normal *l-value* references (``T&``) to *r-value* reference
(``T&&``). This is, you pass it a variable, and it returns a
reference to its object disguised as an intermediate result. In
exchange, you promise not to do anything with this variable later
[#f1]_. It is the role of the thing that *receives the moved-from
value* (in the previous example, ``push_back``) to actually do
anything interesting with it --- for example, steal its contents
😈.
So if you pass a ``T&`` to ``std::move()`` you get a ``T&&`` and,
unsurprisingly, if you pass a ``const T&`` you get a ``const T&&``.
But the receivers of the moved-from value (like constructors or our
``push_back()``) maybe be moved-into because they provide an
overload that expects ``T&&`` --- without the ``const``! Since a
``const T&&`` can not be converted into a ``T&&``, the compiler
looks up for you another viable overload, and most often finds a
copy constructor or something alike that expects a ``const T&`` or
just ``T``, to which a ``const T&&`` can be converted. The code
compiles and works correctly, but it is less efficient than we
expected. Our call to ``std::move()`` was fruitless.
.. [#f1] For the sake of completeness: it is actually allowed to do stuff
with the variable *after another value is assigned to it*.
.. _move-semantics:
Leveraging move semantics
-------------------------
When using :ref:`reference counting<rc>` (which is the default)
mutating operations can often be faster when operating on *r-value
references* (temporaries and moved-from values). Note that this
removes *persistence*, since one can not access the moved-from value
anymore! However, this may be a good idea when doing a chain of
operations where the intermediate values are not important to us.
For example, let's say we want to write a function that inserts all
integers in the range :math:`[first, last)` into an immutable vector.
From the point of view of the caller of the function, this function is
a *transaction*. Whatever intermediate vectors are generated inside
of it can be discarded since the caller can only see the initial
vector (the one passed in as argument) and the vector with *all* the
elements. We may write such function like this:
.. literalinclude:: ../example/vector/iota-move.cpp
:language: c++
:start-after: myiota/start
:end-before: myiota/end
The intermediate values are *moved* into the next ``push_back()``
call. They are going to be discarded anyways, this little
``std::move`` just makes the whole thing faster, letting ``push_back``
mutate part of the internal data structure in place when possible.
If you don't like this syntax, :doc:`transients<transients>` may be
used to obtain similar performance benefits.
.. admonition:: Assigment guarantees
From the language point of view, the only requirement on moved from
values is that they should still be destructible. We provide the
following two additional guarantees:
- **It is valid to assign to a moved-from variable**. The variable
gets the assigned value and becomes usable again. This is the
behaviour of standard types.
- **It is valid to assign a moved-from variable to itself**. For
most standard types this is *undefined behaviour*. However, for our
immutable containers types, expressions of the form ``v =
std::move(v)`` are well-defined.
Recursive types
---------------
Most containers will fail to be instantiated with a type of unknown
size, this is, an *incomplete type*. This prevents using them for
building recursive types. The following code fails to compile:
.. code-block:: c++
struct my_type
{
int data;
immer::vector<my_type> children;
};
However, we can easily workaround this by using an ``immer::box`` to wrap
the elements in the vector, as in:
.. code-block:: c++
struct my_type
{
int data;
immer::vector<immer::box<my_type>> children;
};
.. admonition:: Standard containers and incomplete types
While the first example might seem to compile when using some
implementations of ``std::vector`` instead of ``immer::vector``, such
use is actually forbidden by the standard:
**17.6.4.8** *Other functions (...)* 2. the effects are undefined in
the following cases: (...) In particular---if an incomplete type (3.9)
is used as a template argument when instantiating a template
component, unless specifically allowed for that component.
.. _batch-update:
Efficient batch manipulations
-----------------------------
Sometimes you may write a function that needs to do multiple changes
to a container. Like most code you write with this library, this
function is *pure*: it takes one container value in, and produces a
new container value out, no side-effects.
Let's say we want to write a function that inserts all integers in the
range :math:`[first, last)` into an immutable vector:
.. literalinclude:: ../example/vector/iota-slow.cpp
:language: c++
:start-after: include:myiota/start
:end-before: include:myiota/end
This function works as expected, but it is slower than necessary.
On every loop iteration, a new value is produced, just to be
forgotten in the next iteration.
Instead, we can grab a mutable view on the value, a :ref:`transient`.
Then, we manipulate it *in-place*. When we are done with it, we
extract back an immutable value from it. The code now looks like
this:
.. _iota-transient:
.. literalinclude:: ../example/vector/iota-transient.cpp
:language: c++
:start-after: include:myiota/start
:end-before: include:myiota/end
Both conversions are :math:`O(1)`. Note that calling ``transient()``
does not break the immutability of the variable it is called on. The
new mutable object will adopt its contents, but when a mutation is
performed, it will copy the data necessary using *copy on write*.
Subsequent manipulations may hit parts that have already been copied,
and these changes are done in-place. Because of this, it does not
make sense to use transients to do only one change.
.. tip::
Note that :ref:`move semantics<move-semantics>` can be used instead to
support a similar use-case. However, transients optimise updates
even when reference counting is disabled.
.. _std-compat:
Standard library compatibility
------------------------------
While the immutable containers provide an interface that follows a
functional style, this is incompatible with what the standard library
algorithms sometimes expect. :ref:`transients` try to provide an
interface as similar as possible to similar standard library
containers. Thus, can also be used to interoperate with standard
library components.
For example the :ref:`myiota() function above<iota-transient>` may as
well be written using standard library tools:
.. literalinclude:: ../example/vector/iota-transient-std.cpp
:language: c++
:start-after: include:myiota/start
:end-before: include:myiota/end