1213b086a1
Change-Id: I9636a41ad44b4218293833fd3e9456d9b07c731b
268 lines
11 KiB
ReStructuredText
268 lines
11 KiB
ReStructuredText
.. _design:
|
|
|
|
Design
|
|
======
|
|
|
|
This is a library of **immutable containers**.
|
|
|
|
These containers have all their methods marked ``const``. Instead of
|
|
mutating them *in place*, they provide manipulation functions that
|
|
*return a new transformed value*, leaving the original value
|
|
unaltered. In the context of data-structures, this property of
|
|
preserving old values is called **persistence**.
|
|
|
|
Most of these containers use data-structures in which these operations
|
|
can be done *efficiently*. In particular, not all data is copied when
|
|
a new value is produced. Instead, the new values may share,
|
|
internally, common data with other objects. We sometimes refer to
|
|
this property as **structural sharing**. This behaviour is
|
|
transparent to the user.
|
|
|
|
Assigment
|
|
---------
|
|
|
|
We are sorry, we lied. These containers provide *one mutating
|
|
operation*: **assignment** --- i.e. ``operator=``.
|
|
|
|
There is a good reason: without ``operator=`` everything becomes
|
|
complicated in C++. For example, one may not contain non-assignable
|
|
types in many standard containers, assignment would also be disabled
|
|
from your custom types holding immutable containers, and so on and so
|
|
forth.
|
|
|
|
C++ is a multi-paradigm language with an imperative core. Thus, it is
|
|
built on the foundation that *variables* can be mutated ---
|
|
i.e. assigned to. We don't want to ride against this tide. What we
|
|
want to prevent is in-place *object* manipulation. Because of C++
|
|
semantics, *variable* assignment is defined in terms of *object
|
|
mutation*, so we have to provide *this very particular mutating
|
|
operation*, but nothing else. Of course, you are free to mark your
|
|
variables ``const`` to completely forbid assignment.
|
|
|
|
.. warning::
|
|
|
|
**Assignment is not thread safe**. When a *mutable* variable is
|
|
shared across multiple threads, protect access using some other
|
|
mechanism.
|
|
|
|
For obvious reasons, all other methods, which are ``const``, are
|
|
thread-safe. It is safe to share *immutable* state across multiple
|
|
threads.
|
|
|
|
To ``const`` or not to ``const``
|
|
--------------------------------
|
|
|
|
Many C++ programmers, influenced by functional programming, are trying
|
|
to escape the evils of mutability by using ``const`` whenever
|
|
possible. We also do it ourselves in many of our examples to
|
|
reinforce the property of immutability.
|
|
|
|
While in general this is a good practice backed up with very good
|
|
intentions, it has one caveat: *it disables moveability*. It does so
|
|
even when ``std::move()`` is used. This makes sense, since moving from
|
|
an object may mutate it, and ``const``, my friends, prevents *all*
|
|
mutations. For example:
|
|
|
|
.. literalinclude:: ../example/vector/move.cpp
|
|
:language: c++
|
|
:start-after: move-bad/start
|
|
:end-before: move-bad/end
|
|
|
|
One may think that the variable ``v`` is moved into the
|
|
``push_back()`` call. This is not the case, because the variable
|
|
``v`` is marked ``const``. Of course, one may enable the move by
|
|
removing it, as in:
|
|
|
|
.. literalinclude:: ../example/vector/move.cpp
|
|
:language: c++
|
|
:start-after: move-good/start
|
|
:end-before: move-good/end
|
|
|
|
So, is it bad style then to use ``const`` as much as possible? I
|
|
wouldn't say so and it is advisable when ``std::move()`` is not used.
|
|
An alternative style is to not use ``const`` but adopt an `AAA-style
|
|
<aaa>`_ (*Almost Always use Auto*). This way, it is easy to look for
|
|
mutations by looking for lines that contain ``=`` but no ``auto``.
|
|
Remember that when using our immutable containers ``operator=`` is the
|
|
only way to mutate a variable.
|
|
|
|
.. _aaa: https://herbsutter.com/2013/08/12/gotw-94-solution-aaa-style-almost-always-auto/
|
|
|
|
.. admonition:: Why does ``const`` prevent move semantics?
|
|
|
|
For those adventurous into the grainy details C++, here is why.
|
|
``std::move()`` does not move anything, it is just *a cast* from
|
|
normal *l-value* references (``T&``) to *r-value* reference
|
|
(``T&&``). This is, you pass it a variable, and it returns a
|
|
reference to its object disguised as an intermediate result. In
|
|
exchange, you promise not to do anything with this variable later
|
|
[#f1]_. It is the role of the thing that *receives the moved-from
|
|
value* (in the previous example, ``push_back``) to actually do
|
|
anything interesting with it --- for example, steal its contents
|
|
😈.
|
|
|
|
So if you pass a ``T&`` to ``std::move()`` you get a ``T&&`` and,
|
|
unsurprisingly, if you pass a ``const T&`` you get a ``const T&&``.
|
|
But the receivers of the moved-from value (like constructors or our
|
|
``push_back()``) maybe be moved-into because they provide an
|
|
overload that expects ``T&&`` --- without the ``const``! Since a
|
|
``const T&&`` can not be converted into a ``T&&``, the compiler
|
|
looks up for you another viable overload, and most often finds a
|
|
copy constructor or something alike that expects a ``const T&`` or
|
|
just ``T``, to which a ``const T&&`` can be converted. The code
|
|
compiles and works correctly, but it is less efficient than we
|
|
expected. Our call to ``std::move()`` was fruitless.
|
|
|
|
.. [#f1] For the sake of completeness: it is actually allowed to do stuff
|
|
with the variable *after another value is assigned to it*.
|
|
|
|
.. _move-semantics:
|
|
|
|
Leveraging move semantics
|
|
-------------------------
|
|
|
|
When using :ref:`reference counting<rc>` (which is the default)
|
|
mutating operations can often be faster when operating on *r-value
|
|
references* (temporaries and moved-from values). Note that this
|
|
removes *persistence*, since one can not access the moved-from value
|
|
anymore! However, this may be a good idea when doing a chain of
|
|
operations where the intermediate values are not important to us.
|
|
|
|
For example, let's say we want to write a function that inserts all
|
|
integers in the range :math:`[first, last)` into an immutable vector.
|
|
From the point of view of the caller of the function, this function is
|
|
a *transaction*. Whatever intermediate vectors are generated inside
|
|
of it can be discarded since the caller can only see the initial
|
|
vector (the one passed in as argument) and the vector with *all* the
|
|
elements. We may write such function like this:
|
|
|
|
.. literalinclude:: ../example/vector/iota-move.cpp
|
|
:language: c++
|
|
:start-after: myiota/start
|
|
:end-before: myiota/end
|
|
|
|
The intermediate values are *moved* into the next ``push_back()``
|
|
call. They are going to be discarded anyways, this little
|
|
``std::move`` just makes the whole thing faster, letting ``push_back``
|
|
mutate part of the internal data structure in place when possible.
|
|
|
|
If you don't like this syntax, :doc:`transients<transients>` may be
|
|
used to obtain similar performance benefits.
|
|
|
|
.. admonition:: Assigment guarantees
|
|
|
|
From the language point of view, the only requirement on moved from
|
|
values is that they should still be destructible. We provide the
|
|
following two additional guarantees:
|
|
|
|
- **It is valid to assign to a moved-from variable**. The variable
|
|
gets the assigned value and becomes usable again. This is the
|
|
behaviour of standard types.
|
|
|
|
- **It is valid to assign a moved-from variable to itself**. For
|
|
most standard types this is *undefined behaviour*. However, for our
|
|
immutable containers types, expressions of the form ``v =
|
|
std::move(v)`` are well-defined.
|
|
|
|
Recursive types
|
|
---------------
|
|
|
|
Most containers will fail to be instantiated with a type of unknown
|
|
size, this is, an *incomplete type*. This prevents using them for
|
|
building recursive types. The following code fails to compile:
|
|
|
|
.. code-block:: c++
|
|
|
|
struct my_type
|
|
{
|
|
int data;
|
|
immer::vector<my_type> children;
|
|
};
|
|
|
|
However, we can easily workaround this by using an ``immer::box`` to wrap
|
|
the elements in the vector, as in:
|
|
|
|
.. code-block:: c++
|
|
|
|
struct my_type
|
|
{
|
|
int data;
|
|
immer::vector<immer::box<my_type>> children;
|
|
};
|
|
|
|
.. admonition:: Standard containers and incomplete types
|
|
|
|
While the first example might seem to compile when using some
|
|
implementations of ``std::vector`` instead of ``immer::vector``, such
|
|
use is actually forbidden by the standard:
|
|
|
|
**17.6.4.8** *Other functions (...)* 2. the effects are undefined in
|
|
the following cases: (...) In particular---if an incomplete type (3.9)
|
|
is used as a template argument when instantiating a template
|
|
component, unless specifically allowed for that component.
|
|
|
|
.. _batch-update:
|
|
Efficient batch manipulations
|
|
-----------------------------
|
|
|
|
Sometimes you may write a function that needs to do multiple changes
|
|
to a container. Like most code you write with this library, this
|
|
function is *pure*: it takes one container value in, and produces a
|
|
new container value out, no side-effects.
|
|
|
|
Let's say we want to write a function that inserts all integers in the
|
|
range :math:`[first, last)` into an immutable vector:
|
|
|
|
.. literalinclude:: ../example/vector/iota-slow.cpp
|
|
:language: c++
|
|
:start-after: include:myiota/start
|
|
:end-before: include:myiota/end
|
|
|
|
This function works as expected, but it is slower than necessary.
|
|
On every loop iteration, a new value is produced, just to be
|
|
forgotten in the next iteration.
|
|
|
|
Instead, we can grab a mutable view on the value, a :ref:`transient`.
|
|
Then, we manipulate it *in-place*. When we are done with it, we
|
|
extract back an immutable value from it. The code now looks like
|
|
this:
|
|
|
|
.. _iota-transient:
|
|
|
|
.. literalinclude:: ../example/vector/iota-transient.cpp
|
|
:language: c++
|
|
:start-after: include:myiota/start
|
|
:end-before: include:myiota/end
|
|
|
|
Both conversions are :math:`O(1)`. Note that calling ``transient()``
|
|
does not break the immutability of the variable it is called on. The
|
|
new mutable object will adopt its contents, but when a mutation is
|
|
performed, it will copy the data necessary using *copy on write*.
|
|
Subsequent manipulations may hit parts that have already been copied,
|
|
and these changes are done in-place. Because of this, it does not
|
|
make sense to use transients to do only one change.
|
|
|
|
.. tip::
|
|
|
|
Note that :ref:`move semantics<move-semantics>` can be used instead to
|
|
support a similar use-case. However, transients optimise updates
|
|
even when reference counting is disabled.
|
|
|
|
.. _std-compat:
|
|
Standard library compatibility
|
|
------------------------------
|
|
|
|
While the immutable containers provide an interface that follows a
|
|
functional style, this is incompatible with what the standard library
|
|
algorithms sometimes expect. :ref:`transients` try to provide an
|
|
interface as similar as possible to similar standard library
|
|
containers. Thus, can also be used to interoperate with standard
|
|
library components.
|
|
|
|
For example the :ref:`myiota() function above<iota-transient>` may as
|
|
well be written using standard library tools:
|
|
|
|
.. literalinclude:: ../example/vector/iota-transient-std.cpp
|
|
:language: c++
|
|
:start-after: include:myiota/start
|
|
:end-before: include:myiota/end
|