The Rules of the 5

25 Jun 2018    

In this TotW, we’ll look at how the special member functions can affect code gen. But before we do that, let’s look at the semantics of how declaring one special member affects the others. The following picture, taken from Howard Hinnant’s talk, summarizes this perfectly:

Slide

Take a minute or two to go through the above image, or even check out the talk if you’re interested.
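If you’d rather poke at a couple of rows of that table than take them on faith, here’s a minimal sketch. Probe, Plain, WithDtor, WithMove and check_table_rows are hypothetical names made up for this example; they are not from the talk. It shows two effects: a user-declared destructor stops the compiler from generating move operations (so a move quietly becomes a copy), and a user-declared move constructor deletes the copy operations.

```cpp
#include <cassert>      // for callers that want to assert on the result
#include <type_traits>
#include <utility>

// Hypothetical helper: counts how its enclosing object was transferred.
struct Probe
{
    static int copies;
    static int moves;
    Probe() = default;
    Probe(const Probe&) { ++copies; }
    Probe(Probe&&) noexcept { ++moves; }
};
int Probe::copies = 0;
int Probe::moves = 0;

// No special members declared: the compiler generates copy AND move.
struct Plain
{
    Probe p;
};

// A user-declared destructor suppresses the implicit move operations,
// so "moving" falls back to the (still generated) copy constructor.
struct WithDtor
{
    Probe p;
    ~WithDtor() {}
};

// A user-declared move constructor deletes the implicit copy operations.
struct WithMove
{
    WithMove() = default;
    WithMove(WithMove&&) = default;
};

static_assert(!std::is_copy_constructible<WithMove>::value, "copy constructor is deleted");
static_assert(!std::is_copy_assignable<WithMove>::value, "copy assignment is deleted");

// Returns true if the counters match what the table predicts.
bool check_table_rows()
{
    Probe::copies = Probe::moves = 0;
    Plain a;
    Plain b(std::move(a));      // uses the implicit move constructor
    (void)b;
    const bool plain_moved = (Probe::moves == 1 && Probe::copies == 0);

    Probe::copies = Probe::moves = 0;
    WithDtor c;
    WithDtor d(std::move(c));   // no move constructor available: copies instead
    (void)d;
    const bool dtor_copied = (Probe::moves == 0 && Probe::copies == 1);

    return plain_moved && dtor_copied;
}
```

Calling check_table_rows() from main should return true on a conforming compiler; the two static_asserts are checked at compile time.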

Now, to start off, I’d like to clarify a few things. Be prepared, this is a long one!

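Firstly, adding = default to a special member on its first declaration is pretty much the same, as far as triviality and code gen go, as not declaring it and letting the compiler generate one for you. (It still counts as user-declared for the purposes of the table above.) The best thing you can do is to not declare OR default any special member unless you absolutely need to.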

Secondly, adding = default anywhere but the (first) declaration is very different from the above. A member defaulted after its first declaration counts as user-provided, and when the definition lives in another translation unit the compiler can’t see that the member function isn’t anything fancy. Semantically, it means that member is no longer trivial, so the class/struct will NOT be trivially constructible/copyable/movable (depending on which member it is). This can cause problems with value semantics and can be a major deciding factor for qualifying a class/struct as a POD type. Do NOT do this unless you actually need it (e.g. when anchoring a virtual destructor).

// data.h
class Data
{
    int m_data;
    float m_bytes_loaded;
public:
    Data(); // Adding = default here makes it trivially constructible
};

// data.cpp

// NOT trivially constructible
// Even if this definition is moved into data.h, inline or not
Data::Data() = default;
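A quick way to check this claim without reading any assembly is to ask the compiler directly through <type_traits>. The sketch below uses two hypothetical copies of the class above, Data_inline and Data_out_of_line, so that both variants can live in one file.

```cpp
#include <type_traits>

// Defaulted on its first declaration: not user-provided, so it stays trivial.
class Data_inline
{
    int m_data;
    float m_bytes_loaded;
public:
    Data_inline() = default;
};

// Declared in the class and defaulted later: user-provided, so never trivial.
class Data_out_of_line
{
    int m_data;
    float m_bytes_loaded;
public:
    Data_out_of_line();
};

Data_out_of_line::Data_out_of_line() = default;

static_assert(std::is_trivially_default_constructible<Data_inline>::value,
              "defaulted in class: trivial");
static_assert(!std::is_trivially_default_constructible<Data_out_of_line>::value,
              "defaulted out of class: user-provided");

// Only the default constructor was touched, so copying is still trivial for both.
static_assert(std::is_trivially_copyable<Data_inline>::value, "");
static_assert(std::is_trivially_copyable<Data_out_of_line>::value, "");
```

Note that which kind of triviality is lost depends on which special member you default out of line; here it is only trivial default construction.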

Lastly, constructors don’t really affect code gen/copying/moving that much. Even user defined constructors!

To begin, we’re going to take a look at a value type and see how each special member affects the code gen. Let’s take a simple String_view class as an example. This class is trivially copyable (and therefore trivially destructible) and trivially constructible as well. Since it’s trivially copyable, moving it won’t really make a difference.

class String_view
{
    const char* m_data;
    int m_size;
public:
    // Nothing here right now
};
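To make those claims checkable, here’s a small sketch. Plain_view is a hypothetical stand-in for String_view with public members (an assumption, purely so the values can be inspected), and bytewise_copy is a made-up helper. For a trivially copyable type, copying the raw bytes is a valid way to copy the object, which is why a move can’t be any cheaper than a copy here.

```cpp
#include <cassert>      // for callers that want to assert on the result
#include <cstring>
#include <type_traits>

// Same members as String_view, but public (assumption for illustration only).
struct Plain_view
{
    const char* m_data;
    int m_size;
};

static_assert(std::is_trivially_copyable<Plain_view>::value, "");
static_assert(std::is_trivially_destructible<Plain_view>::value, "");
static_assert(std::is_trivially_default_constructible<Plain_view>::value, "");

// For a trivially copyable type, copying the bytes is a valid copy.
inline Plain_view bytewise_copy(const Plain_view& from)
{
    Plain_view to;
    std::memcpy(&to, &from, sizeof(Plain_view));
    return to;
}
```

For example, bytewise_copy(Plain_view{"hello", 5}) should give back an object with the same pointer and a size of 5.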

And for our test, we’re going to use the following code:

// Tell compiler to not optimise these out
extern void print_sv(String_view v); // by value
extern String_view get_sv();

void test_local_scope()
{
    String_view view;
    
    // code here that populates view

    print_sv(view);
}

void test_extern_scope()
{
    auto view = get_sv();

    // some other code that uses view
    
    print_sv(view);
}

void test_get_copy_assign(String_view& view)
{
    auto sv = get_sv();

    // branches that may conditionally assign to view
    view = sv;
}

void test_extern_assign(String_view& view)
{
    view = get_sv();
    
    print_sv(view);
}

To continue, load the code above on godbolt with a diff viewer and follow along as you read. It’ll be easier to explain and you can see the difference yourself. (For testing this on all platforms you can use this link: https://godbolt.org/g/pJ5MiT )

Tip: You can disable the minimap sidebar in Godbolt if it’s intrusive: More -> Settings -> Show editor minimap

When you load up the link above you should see something like this:

Screenshot

Make sure you can see all 5 tabs; if not, resize your window. When loading a link you might often get a Remote Compilation failed error. To get around this, add a whitespace or change the compiler (and back) for that source to force a recompile, and you should see the assembly diff with no changes. Alternatively, refresh the page once the link has loaded; the code and layout should remain in the browser cache. Once that’s done, it’s a good time to walk through the code and disassembly to familiarise yourself with it. Any changes you make while following along should be made in the source editor (#2) on the right, so we can compare it against the unchanged, compiler generated version of the class.

A quick rundown of what each function is doing:

  • local_scope
    • Simply calls print_sv
  • extern_scope
    • Save rax then call get_sv which writes to rax & edx
    • Setup m_data as rdi and m_size as esi
    • Call print_sv
    • Note: The only copy being performed is to setup the function call as per the calling convention. Not really a copy since it’s necessary.
  • get_copy_assign
    • Save view reference into rbx
    • Call get_sv which writes to rax & edx
    • Copy the m_data into the reference/pointer location of view, previously saved into rbx
    • Copy the m_size into the reference/pointer location of view + 8, after the pointer m_data
    • Note: The only copy being performed is to copy the data into the reference view
  • extern_assign
    • Save view reference into rbx
    • Call get_sv which writes to rax & edx
    • Copy the m_data into the reference/pointer location of view, previously saved into rbx
    • Copy the m_size into the reference/pointer location of view + 8, after the pointer m_data
    • Setup m_data as rdi and m_size as esi
    • Call print_sv
    • Note: The only copy being performed is to copy the data into the reference view

For the time being, let’s ignore the copies being performed to setup the function call. So only one copy is being made in get_copy_assign and extern_assign. Also, GCC for the above case produces a similar result and additionally does some cleanup + buffering.

Destructive Destructors

Let’s begin by adding an explicitly defaulted ~String_view() = default; destructor to the String_view class (to the right hand #2 editor) to see if it changes anything. Go ahead and make that change now. You’ll notice that it doesn’t change anything. It’s still the same as the compiler generated one.

Next, default the destructor outside the class scope, like so:

class String_view
{
    // ...
public:
    ~String_view();
};

String_view::~String_view() = default;

You’ll now notice a ton of changes. We’ll come back to these in a moment, but do take note that defining it non-inline has led to more/worse code gen. This is because the destructor is now user-provided, so the class is NOT trivially copyable anymore! So defaulting a destructor outside the class can actually hurt performance! There are cases where non-inline destructors are actually required, but that’s a discussion for another day.

Keeping the above changes on the right, add an empty inline destructor to the class on the left:

class String_view
{
    // ...
public:
    ~String_view() //empty inline destructor
    {
    }
};

You should now notice that there is no difference between these two cases. Defaulting the destructor outside the class is equivalent to writing an empty destructor yourself (regardless of inline or not).
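The destructor variants tried so far can also be compared without looking at any assembly. Here’s a minimal sketch, with hypothetical class names (one per variant) so they can sit side by side in one file:

```cpp
#include <type_traits>

// (1) Defaulted on its first declaration: still trivial.
class Sv_default_in_class
{
    const char* m_data;
    int m_size;
public:
    ~Sv_default_in_class() = default;
};

// (2) Declared in the class, defaulted outside it: user-provided.
class Sv_default_out_of_class
{
    const char* m_data;
    int m_size;
public:
    ~Sv_default_out_of_class();
};

Sv_default_out_of_class::~Sv_default_out_of_class() = default;

// (3) Empty body written by hand: also user-provided.
class Sv_empty_body
{
    const char* m_data;
    int m_size;
public:
    ~Sv_empty_body() {}
};

static_assert(std::is_trivially_destructible<Sv_default_in_class>::value, "");
static_assert(std::is_trivially_copyable<Sv_default_in_class>::value, "");

// A non-trivial destructor also costs the class its trivial copyability.
static_assert(!std::is_trivially_destructible<Sv_default_out_of_class>::value, "");
static_assert(!std::is_trivially_copyable<Sv_default_out_of_class>::value, "");

static_assert(!std::is_trivially_destructible<Sv_empty_body>::value, "");
static_assert(!std::is_trivially_copyable<Sv_empty_body>::value, "");
```

Variants (2) and (3) answer every trait the same way, which matches the identical assembly in the diff view.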

Revert the changes to the editor on the left and keep the one on the right as is. Your left editor window should now be the same as what you started with, and the one on the right should have the destructor defined as default but non-inline.

Let’s go through what each function is doing at this point:

  • local_scope
    • Save stack space
    • Setup rdi with a pointer address somewhere on the stack
    • Call print_sv
    • Release stack space
    • Note: Extra instructions for stack management
  • extern_scope
    • Save stack space
    • Setup rdi with a pointer address somewhere on the stack
    • Call get_sv
    • Using the previously used stack pointer for view load the returned data into xmm0
    • Copy that data into start of the saved stack (rsp)
    • Setup rdi as the first argument, the pointer to the start of the saved stack (step above)
    • Call print_sv
    • Release stack space
    • Note: As soon as the class stopped being trivially copyable, the compiler has to resort to passing everything around using the stack. 2 copies being made here.
  • get_copy_assign
    • Save stack space
    • Save view reference/pointer into rbx
    • Call get_sv, returned value is now at the base of pointer rsp
    • Copy the m_size from returned value rsp into parameter view (rbx + 8 bytes) using eax as temp
    • Copy the m_data from returned value rsp into parameter view (rbx) using rax as temp
    • Release stack space
    • Note: The value returned by get_sv is copied into and from the stack. 2 copies.
  • extern_assign
    • This one is a mix of the previous two cases
    • Note: The 3 copies here are
      • get_sv into the stack return value
      • Stack return value to reference view
      • New stack space / value for setting up print_sv

From the above, you can see that there are extra copies and a whole lot more instructions being used to address the non-triviality of the class. This is because of the Sys-V ABI specification; look at the last paragraph on Page 18. It specifies that if a class has a non-trivial copy constructor or destructor then the argument must be passed by invisible reference, which is why the caller must create a copy for the callee to consume. This is, however, only relevant to classes that are small enough to be passed in registers in the first place (on x86-64 Sys-V that means at most 16 bytes for a class made of integers and pointers); anything bigger than that is passed in memory on the stack anyway. ARM (32 & 64) follows a similar pattern. This also shows that making even such a tiny change can cause ABI breakages in C++!
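As a rough rule of thumb, the ABI rule above can be approximated with a type trait. This is only a sketch under stated assumptions: it targets x86-64 Sys-V, it only makes sense for classes made of integers and pointers, the 16-byte limit is the two-register limit for such classes, and likely_passed_in_registers is a hypothetical helper, not something the ABI or the standard library provides. The real classification algorithm has more cases than this.

```cpp
#include <type_traits>

// Rough heuristic, NOT the ABI's full classification algorithm.
// Assumptions: x86-64 Sys-V, members are integers/pointers only.
template <class T>
struct likely_passed_in_registers
    : std::integral_constant<bool,
          std::is_trivially_copy_constructible<T>::value &&
          std::is_trivially_destructible<T>::value &&
          sizeof(T) <= 16>
{};

// Small and trivial: a candidate for register passing.
struct Small_trivial
{
    const char* data;
    int size;
};

// Same members, but the user-provided destructor forces the invisible reference.
struct Small_with_dtor
{
    const char* data;
    int size;
    ~Small_with_dtor() {}
};

// Trivial but 24 bytes: too big for two registers, goes through memory anyway.
struct Large_trivial
{
    long long a, b, c;
};

static_assert(likely_passed_in_registers<Small_trivial>::value, "");
static_assert(!likely_passed_in_registers<Small_with_dtor>::value, "");
static_assert(!likely_passed_in_registers<Large_trivial>::value, "");
```

A static_assert like this next to a hot value type is a cheap way to notice when someone adds a destructor and silently changes how the type is passed; treat it as a tripwire, and check the actual disassembly when it matters.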

But can we improve on this? What if we tell the compiler what & how to create a copy? Add the following code under the public label in the String_view class:

    String_view() = default;

    String_view(const String_view& other)
    : m_data(other.m_data), m_size(other.m_size)
    {}

    String_view& operator=(const String_view& other)
    {
        m_data = other.m_data;
        m_size = other.m_size;
        return *this;
    }

You’ll notice that it doesn’t really change much. If anything, it possibly makes it worse by not using packed SSE moves for copying. You can compare it against the compiler generated version by making the change to the left hand side. Make sure you revert the LHS before moving on.
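Why doesn’t spelling out the copy help? Because user-written copy members are never trivial, no matter how simple their bodies are. The sketch below isolates that effect using a hypothetical class Sv_user_copy, which has the copy members from above but a compiler generated destructor:

```cpp
#include <type_traits>

class Sv_user_copy
{
    const char* m_data;
    int m_size;
public:
    Sv_user_copy() = default;

    // Does exactly what the compiler generated copy would do...
    Sv_user_copy(const Sv_user_copy& other)
    : m_data(other.m_data), m_size(other.m_size)
    {}

    Sv_user_copy& operator=(const Sv_user_copy& other)
    {
        m_data = other.m_data;
        m_size = other.m_size;
        return *this;
    }
};

// ...but because it is user-provided, it is not trivial.
static_assert(std::is_copy_constructible<Sv_user_copy>::value, "");
static_assert(!std::is_trivially_copy_constructible<Sv_user_copy>::value, "");
static_assert(!std::is_trivially_copy_assignable<Sv_user_copy>::value, "");
static_assert(!std::is_trivially_copyable<Sv_user_copy>::value, "");

// The destructor was left alone, so it is still trivial.
static_assert(std::is_trivially_destructible<Sv_user_copy>::value, "");
```

So even without touching the destructor, hand-written copy members are enough to take the class off the fast path.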

What if we delete the copy members, define the move operations and explicitly std::move everything? Change the String_view class to:

#include <utility>

class String_view
{
    const char* m_data;
    int m_size;
public:
    String_view() = default;
    
    String_view(String_view&& other)
    : m_data(other.m_data), m_size(other.m_size)
    {}

    String_view& operator=(String_view&& other)
    {
        m_data = other.m_data;
        m_size = other.m_size;
        return *this;
    }

    String_view(const String_view& other) = delete;
    String_view& operator=(const String_view& other) = delete;

    ~String_view();
};

And change the test_ functions to:

void test_local_scope()
{
    String_view view;
    
    print_sv(std::move(view));
}

void test_extern_scope()
{
    auto view = get_sv();
    
    print_sv(std::move(view));
}

void test_get_copy_assign(String_view& view)
{
    auto sv = get_sv();

    view = std::move(sv);
}

void test_extern_assign(String_view& view)
{
    view = get_sv();
    
    print_sv(std::move(view)); // Not valid but just go with it for now
}

You’ll notice it has the exact same effect as declaring the copy members; after all it is a trivially copyable class.
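One way to convince yourself that these moves are really just copies is to look at the moved-from object. In the sketch below, Move_only_view is a hypothetical copy of the class above; the two-argument constructor, the data()/size() accessors and source_unchanged_after_move are additions made up for this example so that the state can be observed.

```cpp
#include <cassert>      // for callers that want to assert on the result
#include <type_traits>
#include <utility>

class Move_only_view
{
    const char* m_data;
    int m_size;
public:
    // Hypothetical constructor and accessors, added only to observe the state.
    Move_only_view(const char* data, int size) : m_data(data), m_size(size) {}
    const char* data() const { return m_data; }
    int size() const { return m_size; }

    Move_only_view(Move_only_view&& other)
    : m_data(other.m_data), m_size(other.m_size)
    {}

    Move_only_view& operator=(Move_only_view&& other)
    {
        m_data = other.m_data;
        m_size = other.m_size;
        return *this;
    }

    Move_only_view(const Move_only_view&) = delete;
    Move_only_view& operator=(const Move_only_view&) = delete;

    ~Move_only_view();
};

Move_only_view::~Move_only_view() = default;

static_assert(!std::is_copy_constructible<Move_only_view>::value, "");
static_assert(std::is_move_constructible<Move_only_view>::value, "");
static_assert(!std::is_trivially_move_constructible<Move_only_view>::value, "");

// Returns true if the source still holds its values after being moved from.
bool source_unchanged_after_move()
{
    Move_only_view a("hello", 5);
    Move_only_view b(std::move(a));   // copies the pointer and the size
    return a.data() == b.data() && a.size() == 5 && b.size() == 5;
}
```

Since the move constructor only copies a pointer and an int, the source is left untouched, and the user-provided move members are just as non-trivial as the user-provided copy members were.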

This is a brief guide on how code gen is affected by language semantics, but it should hopefully give you enough information to allow you to experiment further.