Cost of RTTI

Cost of RTTI

17 Sep 2018    

Polymorphism is a widely used technique that allows for abstracting an interface with different behaviours underneath. The most common way of doing this is using virtual functions and dynamic_cast. While this is a language supported feature, it’s not the only possible implementation of polymorphism. Let’s look at how dynamic_cast works, a few downsides of using it and some alternative implementations.

How dynamic_cast works

To begin let’s take a brief look at how dynamic cast works. Given the following code sample:

class Shape
{
public:
virtual int get_area() {return 0;}
};

class Circle : public Shape
{
public:
   int get_area() override {return 1;}
};

class Square : public Shape
{
public:
   int get_area() override {return 2;}
};

Which can be loaded up on godbolt here. Load up the Godbolt link and scroll all the way down in the assembly window. You’ll see a bunch of labels typeinfo ** and vtable for **. These are of course the typeinfo struct generated for the classes as well as the vtables respectively.

Let’s first look at the create function there. If you look at the corresponding generated assembly for that function, you’ll notice that it brahces based on the input parameter (x) but when it returns a pointer it is based on the vtable of the object that was created, also showing that the pointer to the vtable goes at the start of the class layout, followed by other relevant members. Now, looking at the codegen for the compute_area function, which essential loads up the passed in pointer into a register and calls the function at that address, which will always correspond to the correct vtable.

Now, moving on to the get_pi function and it’s associated codegen, let’s look at the actual implementation of __dynamic_cast, the function that performs the runtime checking and returns the appropriate vtable or null of it fails. To do this, load up Clang’s dynamic cast implementation. Firstly, from the codegen of the get_pi function you’ll see that it populates extra registers before calling __dynamic_cast, those registers correspond to the parameters of the actual dynamic cast function. The __dynamic_cast essentially walks a graph (or list) of pointers corresponding to the hierarchy of the class. It starts off by getting the appropriate virtual tables and depending on the destination type either searches “up” the graph or “down” the graph depending on downcast, upcast or cross-cast. It has added checks to ensure that the type information that it is trying to traverse has been “marked visible” by the compiler. When it finally finds the correct type it assigns it to the local variable and returns.

Cons

Dynamic cast is slow. Which is no surprise considering the amount of checks that need to happen along with the jumps within the instruction cache. It also increases the binary size and memory used since type information will be generated for every vtable. Given the above issues, there are a lot of codebases that completely disable RTTI or roll out their own internal implementation.

Additionally, using dynamic_cast is generally a sign that your class interaction and design may be flawed. Querying the type of an object at runtime means the code probably doesn’t completely abide by the Liskov Substitution Principle which is important in keeping behaviour consistent across a class hierarchy.

Alternatives

There are quite a few alternative implementations that achieve similar results to dynamic_cast.

Checked Cast

This is a quick & easy but a dirty win. Given the following code:

const auto pointer = checked_cast<Derived*>(ptr);

The checked_cast function essentially performs a dynamic_cast in debug mode and during testing so you can catch errors but uses static_cast in release mode for full performance benefits. It is of course risky since an incorrect pointer will result in UB.

Enum Checking

This is another way of implementing a rather cheap and accurate polymorphic implementation. It involves assigning an enum value to all the derived classes and storing it in the base class. This is generally the preferable alternative.

Note: The following implementation is taken from LLVM

Given the following example:

// Assuming Shape is abstract
class Shape
{
public:
  enum class Kind
  {
    Shape,
    Circle,
    Square
  };
  
  Kind get_kind() const;
};

class Circle : public Shape
{
public:
  bool is_a(const Shape* ptr) {
    return ptr->get_kind() == Kind::Circle;
    }
};

class Square : public Shape
{
public:
  bool is_a(const Shape* ptr) {
    return ptr->get_kind() == Kind::Square;
    }
};

Now once the base class holds the value of the current Type you can query it using a custom dyn_cast function as such:

const auto circle = dyn_cast<Circle>(shape_ptr);

and a rough implementation of dyn_cast would be:

if (Circle::is_a(ptr))
{
  return static_cast<Circle*>(ptr);
}

return nullptr;

This allows for a very cheap and fast abstraction for dynamically checking the type.

Custom Template Design

There are a few other implementations out there which use templates with some custom packing and dispatching to achieve something similar. The main two examples are explained in the following talks:

Reduced Polymorphism

After all polymorphism tries to produce a form of type erasure and checking the type of the pointer often defeats that purpose. And also violates the Liskov Substituion Principle which is essential in ensuring coherent design across a codebase. In this talk by Sean Parent, he mentions how to try and get rid of inheritance, which might be a useful starting point if and when class hierarchies start to lose their meaning.