Qt and Trivial Relocation (Part 3) Trivial relocability for vector erasure, and types with write-through reference semantics
In the last post of this series we started exploring how to erase an element from the middle of a vector.
- We discussed that in principle there are several different possible ways to implement erase().For instance, a vector could move-assign over the elements to be erased:
Alternatively, a vector could use rotations or some other algorithms to move the elements to be destroyed to the end:
In both cases we’re then left with an element in the rightmost position that needs to be destroyed. We can destroy it, update the vector’s size, and that’s it!
- Perhaps with a bit of arrogance, we have therefore claimed that the specific strategy for erasure used does not really matter. A vector implementation will just use a “good one”, and we can always change it if we find out it’s slower than an alternative.Keep this in mind, because it is going to be important really soon.
- We have also discussed that QVector optimizes erasure of trivially relocatable types by using a memmove. If the type is trivially relocatable, QVector destroys the element to be erased, and moves bytes from the tail over:
We concluded the post wondering: is this optimization legitimate? Why can QVector replace move assignments with a byte copy?
This doubt stems from the fact that our definition of relocation did not cover move assignments; it only covered move construction. Is Qt wrong? Is our definition wrong?
The reference semantics backstab
Let’s start by analyzing erase()‘s behavior once more.
Do you remember our claim that the specific strategy used does not really matter; that is, that they are all equivalent? Well, not so fast! It is actually quite imprecise to say that they are all equivalent.
They may be, as long as we deal with types which have value semantics. If we instead use a type that has reference semantics, the choices are absolutely not equivalent, and will yield different outcomes. This is because the semantics of assignment for (certain) reference types are write-through: they assign through the reference (instead of rebinding the reference).
Since we are implementing erasure in terms of assignments (or swaps, which boil down to assignments), this means that the precise sequence of operations done by erase will be visible due to its side-effects; and it also means that changing the strategy will produce different outcomes!
An example of a Standard Library type with these write-through reference semantics is std::tuple<int &>.
When one assigns such a tuple over another one, the reference inside the tuple will assign-through. This semantic is absolutely deliberate; it is used, for instance, by std::tie to build a tuple of references that can be used to unpack a given tuple:
int a; double b; std::tie(a, b) = getTuple(); // std::tie produces std::tuple<int &, double &> // assignment will write through, over a and b
For the sake of the example, instead of dealing with std::tuple, let’s strip it down to the bare minimum into a struct IntRef, so that we can focus on the behavior that we care about:
struct IntRef { int &m_ref; IntRef &operator=(const IntRef &other) { // write-through semantics for assignment: m_ref = other.m_ref; return *this; } };
Here’s an example of this in action:
int a = 1; int b = 2; IntRef ra{a}; IntRef rb{b}; ra = rb; std::println("{}, {}", a, b); // prints 2, 2; we've "assigned through" ra.
The assignment on line 6 wrote 2 into the variable a.
Note that I had to explicitly write an assignment operator for IntRef, and implement the wanted semantics (write-through) myself. This is necessary because of the presence of a reference as a non-static data member (m_ref): the implicitly-generated assignment operator(s) are actually deleted in this case. This default by C++ is actually good, as reference members are a bad idea, and if you have them in a class you should not support assignment.
Let’s now generalize this to a std::vector<IntRef>:
int a = 1, b = 2, c = 3, d = 4, e = 5; IntRef ra{a}, rb{b}, rc{c}, rd{d}, re{e}; std::vector<IntRef> v{ra, rb, rc, rd, re}; v.erase(v.begin() + 1); std::println("{}, {}, {}, {}, {}", a, b, c, d, e);
What does this code print? Which is another way of asking: “what happens when we erase the second element from v?”.
Now this is a very tricky question to answer; depending on the implementation strategy chosen, we may end up with a completely different outcome:
- If we move assign the “tail” to the left one element at a time, and erase the last element, then the output is going to be 1, 3, 4, 5, 5.
- If we rotate the second element to the end, and destroy the end, then the output is… well, we don’t really know, do we? It depends on the implementation of std::rotate.It could be 1, 3, 4, 5, 2; but it could be something else as well.
Please take a moment to convince yourselves of this 🙂 Pencil and paper helps, otherwise, here’s a convenient Godbolt.
Vector erasure must be specified
The conclusion is that the implementation strategy does matter for types with write-through reference semantics! This is a very different outcome than what we might have expected (or desired, even).
For this reason, the Standard Library clearly documents the behavior of std::vector::erase in [vector.modifiers]:
Complexity: The destructor of T is called the number of times equal to the number of the elements erased, but the assignment operator of T is called the number of times equal to the number of elements in the vector after the erased elements.
This pretty much imposes a specific implementation strategy (move assign the tail to the left), so you can rely on the outcome, even in the presence of types with write-through reference semantics.
I’ll also add an extra consideration: even if this behavior wasn’t fully documented, due to Hyrum’s law, we still couldn’t “just change” the implementation of erase, because existing code will be affected — it will be able to tell the difference.
What about QVector? Does it document its erasure behavior? Well… let’s talk about this later, shall we?
Is IntRef trivially relocatable?
Now, here’s a million dollar question: would you say that IntRef is trivially relocatable or that it’s not?
Maybe you’re tempted to say “it is not” based on what you’ve just seen — IntRef has a “strange” behavior, different from most types that we “normally” use. That should be enough to disqualify it, shouldn’t it?
Let’s not rush to the answer just yet! For the benefit of the discussion, I am going to repeat our current definition of relocability: to move construct an object, and destroy the source object. Relocability is trivial when these two operations can be realized by simply copying bytes.
For example:
- int is trivially relocatable. std::unique_ptr<T> is trivially relocatable. QString is trivially relocatable.
- std::string is usually not trivially relocatable (self-referential due to SSO). std::list may also not be (the first node and/or a sentinel node of the list may contain a pointer to the std::list object itself).
- QObject isn’t relocatable (it is not movable at all).
So, what about our IntRef?
If we stick to this very definition the correct answer is that IntRef is absolutely, unequivocally trivially relocatable. We can replace a move construction of an IntRef object, followed by the destruction of the source, with a mere memcpy; the result is completely equivalent.
If you think about it, IntRef simply contains a reference — that is, a pointer. It’s not very different from the memory layout of a std::unique_ptr from this point of view, and we know we can relocate that.
IntRef does show its reference semantics on assignment, but remember, our current definition of (trivial) relocability does not talk about assignments at all.
As a corollary, this fact means that it would be perfectly safe to grow a vector<IntRef> using memcpy when it needs a reallocation!
What about erasure?
If IntRef is trivially relocatable, can we erase an IntRef object from a vector, using memmove to do so?
Again, based on what we have seen, the answer is no, we cannot. Erasure is fully specified to use assignments, which yield side effects for a type like IntRef. If we just moved bytes around, those side effects would not be realized, breaking our programs.
The problem
We have reached a contradiction in our design:
- IntRef is trivially relocatable as it satisfies our definition.
- Qt optimizes the erasure of trivial relocatable types by using memmove, but
- IntRef cannot be erased by copying bytes! It needs to be erased by using move assignments.
To solve this contradiction, something has to give in:
- maybe vector should use a different strategy when erasing elements;
- maybe types like IntRef should not be allowed to be stored in a vector;
- maybe Qt should not be using memmove when erasing objects of trivially relocatable type (but it can still optimize the reallocation of a vector);
- maybe Qt’s definition of trivial relocability does not match ours, and we need to fix our definitions.
We will explore these possibilities in the next blog post. Stay tuned!
Overview about all installments:
If you like this article and want to read similar material, consider subscribing via our RSS feed.
Subscribe to KDAB TV for similar informative short video content.
KDAB provides market leading software consulting and development services and training in Qt, C++ and 3D/OpenGL. Contact us.