C++ class size

Date: 2023-01-14
Last modified: 2023-03-16

Introduction

Data alignment is a key feature in computing on modern computer hardware. The CPU reads and writes to memory most efficiently when the data is naturally aligned, which generally means that the data’s memory address is a multiple of the data size. For instance, in a 32-bit architecture, the data may be aligned if the data is stored in four consecutive bytes and the first byte lies on a 4-byte boundary.

Beyond performance, many programming languages simply assume that data is aligned. Most languages take care of data alignment for us as much as possible, but some low-level languages allow misaligned data access, and the resulting behavior is undefined.

Data alignment

A memory address a is said to be n-byte aligned when a is a multiple of n (where n is a power of 2).

Suppose we have a piece of m-byte data stored at an n-byte aligned address. If m is not divisible by n, the m-byte data will be padded to

$$\left\lfloor \frac{m + n - 1}{n} \right\rfloor n$$

bytes.

Accessing bytes kn+1, kn+2, ..., (k+1)n all has the same latency, because the CPU reads memory n bytes at a time and that data is usually cached in the CPU. That is to say, if data whose storage size m is not a multiple of n is stored at an n-byte aligned address, some of the memory access bandwidth is wasted.

A memory access is said to be aligned when the data being accessed is n bytes long and the datum address is n-byte aligned. When a memory access is not aligned, it is said to be misaligned. Note that by definition one-byte memory accesses are always aligned. Theoretically, it is possible to access n-byte data at a memory address that is not a multiple of n, at the cost of even more wasted memory access bandwidth.

However, because the C and C++ standards assume aligned memory access, accessing an object through a misaligned address results in undefined behavior.

Helpers

The helper macro CHECK_SIZE_AND_ALIGNMENT is used to display the size of a type or object and check its size and alignment.

#define CHECK_SIZE_AND_ALIGNMENT(type, size, alignment)                       \
  cout << setw(70) << "sizeof(" #type ") = " << sizeof(type) << " | "         \
       << alignment << endl;                                                  \
  static_assert(sizeof(type) == size,                                         \
                "The size of " #type " should be equal to " #size);           \
  static_assert(alignof(type) == alignment, "The align requirement of " #type \
                                            " should be equal to " #alignment)

The global operator new is replaced to print the amount of memory requested on each call.

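bool is_new_print_enabled = false;
// Replaces the global operator new; prints the requested size when enabled.
void* operator new(size_t size) {
  if (is_new_print_enabled) {
    cout << "Allocating " << size << " bytes\n";
  }
  // operator new must never return nullptr, so report failure by throwing.
  if (void* memory = malloc(size == 0 ? 1 : size)) {
    return memory;
  }
  throw bad_alloc();
}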

The global operator delete can be replaced in the same way to show when it is called; the replacement is commented out here.

// void operator delete(void* memory) {
//   if (is_new_print_enabled) {
//     cout << "Deallocating memory from " << memory << "\n";
//   }
//   free(memory);
// }

Empty class/struct — size: 1, align: 1

An empty class or struct must occupy at least one byte of memory so that distinct objects have distinguishable addresses.

  struct EmptyStruct {};
  CHECK_SIZE_AND_ALIGNMENT(EmptyStruct, 1, 1);
  static_assert(sizeof(EmptyStruct) == 1,
                "The size of an empty struct should be equal to 1");
  class EmptyClass {};
  CHECK_SIZE_AND_ALIGNMENT(EmptyClass, 1, 1);
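  static_assert(sizeof(EmptyClass) == 1,
                "The size of an empty class should be equal to 1");
  // Static storage duration makes the addresses usable in a constant
  // expression, which static_assert requires.
  static EmptyClass ea, eb;
  static_assert(&ea != &eb,
                "The addresses of two empty class objects should be distinguishable");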
  class EmptyClassWithDefaultConstructor {
   public:
    EmptyClassWithDefaultConstructor() = default;
  };
  CHECK_SIZE_AND_ALIGNMENT(EmptyClassWithDefaultConstructor, 1, 1);
  class EmptyClassWithDefaultConstructorAndDestructor {
   public:
    EmptyClassWithDefaultConstructorAndDestructor() = default;
    ~EmptyClassWithDefaultConstructorAndDestructor() = default;
  };
  CHECK_SIZE_AND_ALIGNMENT(EmptyClassWithDefaultConstructorAndDestructor, 1, 1);
  class EmptyClassWithDefaultConstructorAndVirtualDestructor {
   public:
    EmptyClassWithDefaultConstructorAndVirtualDestructor() = default;
    virtual ~EmptyClassWithDefaultConstructorAndVirtualDestructor() = default;
  };
  CHECK_SIZE_AND_ALIGNMENT(EmptyClassWithDefaultConstructorAndVirtualDestructor,
                           8, 8);
Fig. 1 - Memory usage by EmptyClassWithDefaultConstructorAndVirtualDestructor.

Union — many attributes on same memory location

  typedef union {
    char a;
    int b;
  } a_union_t;
  CHECK_SIZE_AND_ALIGNMENT(a_union_t, 4, 4);
  cout << "Offset of a_union_t::a " << offsetof(a_union_t, a) << endl;  // 0
  cout << "Offset of a_union_t::b " << offsetof(a_union_t, b) << endl;  // 0
Fig. 2 - Memory usage by a_union_t.

Default member size alignment

  typedef struct {
    char a;
    int b;
  } b_struct_t;
  CHECK_SIZE_AND_ALIGNMENT(b_struct_t, 8, 4);

  cout << "Offset of b_struct_t::a " << offsetof(b_struct_t, a) << endl;  // 0
  cout << "Offset of b_struct_t::b " << offsetof(b_struct_t, b) << endl;  // 4
Fig. 3 - Memory usage by b_struct_t.
  class c_class {
   public:
    char a;
    int b;
    int f1() { return 1; }
    int f2() { return 2; }
    int f3() { return 3; }
    int f4() { return 4; }
  };
  CHECK_SIZE_AND_ALIGNMENT(c_class, 8, 4);
  cout << "Offset of c_class::a " << offsetof(c_class, a) << endl;  // 0
  cout << "Offset of c_class::b " << offsetof(c_class, b) << endl;  // 4
Fig. 4 - Memory usage by c_class.
  class d_class {
   public:
    char a;
    int b;
    virtual ~d_class(){};
  };
  CHECK_SIZE_AND_ALIGNMENT(d_class, 16, 8);
  // warning: offset of on non-standard-layout type 'd_class'
  // [-Winvalid-offsetof]
  // cout << "Offset of d_class::a " << offsetof(d_class, a) << endl;  // ?
  // cout << "Offset of d_class::b " << offsetof(d_class, b) << endl;  // ?
  //
  // *** Dumping AST Record Layout
  //          0 | class d_class
  //          0 |   (d_class vtable pointer)
  //          8 |   char a
  //         12 |   int b
  //            | [sizeof=16, dsize=16, align=8,
  //            |  nvsize=16, nvalign=8]
Fig. 5 - Memory usage by d_class.
  class e_class {
   public:
    char a;
    int b;
    ~e_class(){};
  };
  CHECK_SIZE_AND_ALIGNMENT(e_class, 8, 4);
  cout << "Offset of e_class::a " << offsetof(e_class, a) << endl;  // 0
  cout << "Offset of e_class::b " << offsetof(e_class, b) << endl;  // 4
  //
  // *** Dumping AST Record Layout
  //          0 | class e_class
  //          0 |   char a
  //          4 |   int b
  //            | [sizeof=8, dsize=8, align=4,
  //            |  nvsize=8, nvalign=4]
Fig. 6 - Memory usage by e_class.
  class f_class {
   public:
    char a;   // 1 byte + 3 bytes of padding
    int b;    // 4 bytes
    float c;  // 4 bytes
    ~f_class(){};
  };
  CHECK_SIZE_AND_ALIGNMENT(f_class, 12, 4);

  is_new_print_enabled = true;
  {
    auto f_class_obj_1 = new f_class;             // Allocating 12 bytes
    auto f_class_obj_2 = make_unique<f_class>();  // Allocating 12 bytes
    // make_shared allocates the object and its control block together.
    auto f_class_obj_3 = make_shared<f_class>();  // Allocating 32 bytes

    cout << "Offset of f_class::a " << offsetof(f_class, a) << endl;  // 0
    cout << "Offset of f_class::b " << offsetof(f_class, b) << endl;  // 4
    cout << "Offset of f_class::c " << offsetof(f_class, c) << endl;  // 8

    delete f_class_obj_1;
  }
  is_new_print_enabled = false;
Fig. 7 - Memory usage by f_class.
  class g_class {
   public:
    char a;
    int b;
    float c;
    double d;
    char e;
    char f;
    ~g_class(){};
  };
  CHECK_SIZE_AND_ALIGNMENT(g_class, 32, 8);

  cout << "Offset of g_class::a " << offsetof(g_class, a) << endl;  // 0
  cout << "Offset of g_class::b " << offsetof(g_class, b) << endl;  // 4
  cout << "Offset of g_class::c " << offsetof(g_class, c) << endl;  // 8
  cout << "Offset of g_class::d " << offsetof(g_class, d) << endl;  // 16
  cout << "Offset of g_class::e " << offsetof(g_class, e) << endl;  // 24
  cout << "Offset of g_class::f " << offsetof(g_class, f) << endl;  // 25
Fig. 8 - Memory usage by g_class.
  class g2_class {
   public:
    char a;
    char e;
    char f;
    int b;
    float c;
    double d;
    ~g2_class(){};
  };
  CHECK_SIZE_AND_ALIGNMENT(g2_class, 24, 8);
  cout << "Offset of g2_class::a " << offsetof(g2_class, a) << endl;  // 0
  cout << "Offset of g2_class::b " << offsetof(g2_class, b) << endl;  // 4
  cout << "Offset of g2_class::c " << offsetof(g2_class, c) << endl;  // 8
  cout << "Offset of g2_class::d " << offsetof(g2_class, d) << endl;  // 16
  cout << "Offset of g2_class::e " << offsetof(g2_class, e) << endl;  // 1
  cout << "Offset of g2_class::f " << offsetof(g2_class, f) << endl;  // 2
Fig. 9 - Memory usage by g2_class.

Force alignment

  struct float4_4_t {
    float data[4];
  };
  CHECK_SIZE_AND_ALIGNMENT(float4_4_t, 16, 4);
  float4_4_t f4;
  for (int i = 0; i < 4; ++i) {
    cout << "Offset float4_4_t[" << i
         << "] = " << long(&f4.data[i]) - long(&f4) << endl;
  }
  // Offset float4_4_t[0] = 0
  // Offset float4_4_t[1] = 4
  // Offset float4_4_t[2] = 8
  // Offset float4_4_t[3] = 12
Fig. 10 - Memory usage by float4_4_t.
  // Every object of type float4_32_t will be aligned to 32-byte boundary.
  // Might be useful for SIMD instructions.
  struct alignas(32) float4_32_t {
    float data[4];
  };
  CHECK_SIZE_AND_ALIGNMENT(float4_32_t, 32, 32);
  float4_32_t f32;
  for (int i = 0; i < 4; ++i) {
    cout << "Offset float4_32_t[" << i
         << "] = " << long(&f32.data[i]) - long(&f32) << endl;
  }
  // Offset float4_32_t[0] = 0
  // Offset float4_32_t[1] = 4
  // Offset float4_32_t[2] = 8
  // Offset float4_32_t[3] = 12
Fig. 11 - Memory usage by float4_32_t.

Memory Allocation

According to the GNU documentation, the address of a block returned by malloc or realloc in GNU systems is always a multiple of eight (or sixteen on 64-bit systems). The default memory address alignment of an array is determined by the alignment requirement of its element type.

It is possible to use custom data alignment for both static and dynamically allocated memory. alignas(T) can be used to specify the byte alignment of a static array, and aligned_alloc can be used to specify the byte alignment of a dynamically allocated buffer.

  struct S1 {
    unsigned char buf1[sizeof(int) / sizeof(char)];
  };
  CHECK_SIZE_AND_ALIGNMENT(S1, 4, 1);
  struct S2 {
    alignas(int) unsigned char buf2[sizeof(int) / sizeof(char)];
  };
  CHECK_SIZE_AND_ALIGNMENT(S2, 4, 4);

Virtual table

For every class that contains virtual functions, the compiler constructs a virtual table, a.k.a. the vtable. The vtable contains an entry for each virtual function accessible by the class and stores a pointer to its definition. Only the most specific function definition callable by the class is stored in the vtable. Entries in the vtable can point either to functions declared in the class itself or to virtual functions inherited from a base class.

  class B {
   public:
    B() { puts("This is B's constructor"); }
    virtual ~B() { puts("This is B's destructor"); }
    virtual void bar() { puts("This is B's implementation of bar"); }
    virtual void qux() { puts("This is B's implementation of qux"); }
  };

  class C : public B {
   public:
    C() { puts("This is C's constructor"); }
    virtual ~C() { puts("This is C's destructor"); }
    void bar() override { puts("This is C's implementation of bar"); }
  };

  B* b = new C();
  b->bar();
  delete b;

  // This is B's constructor
  // This is C's constructor
  // This is C's implementation of bar
  // This is C's destructor
  // This is B's destructor
Fig. 12 - Virtual table example.

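Each object of a class with virtual functions stores a hidden pointer to its class's vtable, the vpointer. Note that the vpointer is just another class member added by the compiler and increases the size of every object that has a vtable by sizeof(vpointer).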

Hopefully you have grasped how dynamic function dispatch can be implemented by using vtables: when a virtual function is called on an object, the vpointer of the object is used to find the vtable of the object's class. Next, the slot index that the compiler assigned to the function is used to look up the correct (most specific) routine in the vtable.

Virtual destructor

Possible output

                                                sizeof(EmptyStruct) = 1 | 1
                                                 sizeof(EmptyClass) = 1 | 1
                           sizeof(EmptyClassWithDefaultConstructor) = 1 | 1
              sizeof(EmptyClassWithDefaultConstructorAndDestructor) = 1 | 1
       sizeof(EmptyClassWithDefaultConstructorAndVirtualDestructor) = 8 | 8
                                                  sizeof(a_union_t) = 4 | 4
Offset of a_union_t::a 0
Offset of a_union_t::b 0
                                                 sizeof(b_struct_t) = 8 | 4
Offset of b_struct_t::a 0
Offset of b_struct_t::b 4
                                                    sizeof(c_class) = 8 | 4
Offset of c_class::a 0
Offset of c_class::b 4
                                                    sizeof(d_class) = 16 | 8
                                                    sizeof(e_class) = 8 | 4
Offset of e_class::a 0
Offset of e_class::b 4
                                                    sizeof(f_class) = 12 | 4
Allocating 12 bytes
Allocating 12 bytes
Allocating 32 bytes
Offset of f_class::a 0
Offset of f_class::b 4
Offset of f_class::c 8
                                                    sizeof(g_class) = 32 | 8
Offset of g_class::a 0
Offset of g_class::b 4
Offset of g_class::c 8
Offset of g_class::d 16
Offset of g_class::e 24
Offset of g_class::f 25
                                                   sizeof(g2_class) = 24 | 8
Offset of g2_class::a 0
Offset of g2_class::b 4
Offset of g2_class::c 8
Offset of g2_class::d 16
Offset of g2_class::e 1
Offset of g2_class::f 2
                                                 sizeof(float4_4_t) = 16 | 4
Offset float4_4_t[0] = 0
Offset float4_4_t[1] = 4
Offset float4_4_t[2] = 8
Offset float4_4_t[3] = 12
                                                sizeof(float4_32_t) = 32 | 32
Offset float4_32_t[0] = 0
Offset float4_32_t[1] = 4
Offset float4_32_t[2] = 8
Offset float4_32_t[3] = 12
                                                         sizeof(S1) = 4 | 1
                                                         sizeof(S2) = 4 | 4
This is B's constructor
This is C's constructor
This is C's implementation of bar
This is C's destructor
This is B's destructor
