Software development: Optimization with allocators in C++17

Polymorphic allocators in C++17 help optimize memory allocation for both performance and memory reuse.

Rainer Grimm has worked for many years as a software architect, team lead, and training manager. He enjoys writing articles on the programming languages C++, Python, and Haskell, and he frequently speaks at specialist conferences. On his blog Modern C++, he writes in depth about his passion, C++.

Performance

The following program comes from cppreference.com/monotonic_buffer_resource. I extend its performance test to the Clang and MSVC compilers and explain it.

// pmrPerformance.cpp

#include <array>
#include <chrono>
#include <cstddef>
#include <iomanip>
#include <iostream>
#include <list>
#include <memory_resource>

// Runs test_func `iterations` times and returns the elapsed time in seconds.
template <typename Func>
auto benchmark(Func test_func, int iterations)            // (1)
{
    const auto start = std::chrono::system_clock::now();
    while (iterations-- > 0)
        test_func();
    const auto stop = std::chrono::system_clock::now();
    const auto secs = std::chrono::duration<double>(stop - start);
    return secs.count();
}

int main()
{
    constexpr int iterations{100};
    constexpr int total_nodes{2'00'000};

    auto default_std_alloc = [total_nodes]                // (2)
    {
        std::list<int> list;
        for (int i{}; i != total_nodes; ++i)
            list.push_back(i);
    };

    auto default_pmr_alloc = [total_nodes]                // (3)
    {
        std::pmr::list<int> list;
        for (int i{}; i != total_nodes; ++i)
            list.push_back(i);
    };

    auto pmr_alloc_no_buf = [total_nodes]                 // (4)
    {
        std::pmr::monotonic_buffer_resource mbr;
        std::pmr::polymorphic_allocator<int> pa{&mbr};
        std::pmr::list<int> list{pa};
        for (int i{}; i != total_nodes; ++i)
            list.push_back(i);
    };

    auto pmr_alloc_and_buf = [total_nodes]                // (5)
    {
        // enough to fit in all nodes:
        std::array<std::byte, total_nodes * 32> buffer;
        std::pmr::monotonic_buffer_resource mbr{buffer.data(),
                                                buffer.size()};
        std::pmr::polymorphic_allocator<int> pa{&mbr};
        std::pmr::list<int> list{pa};
        for (int i{}; i != total_nodes; ++i)
            list.push_back(i);
    };

    const double t1 = benchmark(default_std_alloc, iterations);
    const double t2 = benchmark(default_pmr_alloc, iterations);
    const double t3 = benchmark(pmr_alloc_no_buf , iterations);
    const double t4 = benchmark(pmr_alloc_and_buf, iterations);

    // Print each time together with its ratio to the std::list baseline.
    std::cout << std::fixed << std::setprecision(3)
              << "t1 (default std alloc): " << t1 << " sec; t1/t1: " << t1/t1 << '\n'
              << "t2 (default pmr alloc): " << t2 << " sec; t1/t2: " << t1/t2 << '\n'
              << "t3 (pmr alloc  no buf): " << t3 << " sec; t1/t3: " << t1/t3 << '\n'
              << "t4 (pmr alloc and buf): " << t4 << " sec; t1/t4: " << t1/t4 << '\n';
}

The benchmark function in (1) executes each of the functions in (2) to (5) a hundred times (constexpr int iterations{100}). Each call creates a list (std::list or std::pmr::list) with two hundred thousand nodes (constexpr int total_nodes{2'00'000}). The nodes of the individual lists are allocated in different ways:

(2): std::list uses the global operator new
(3): std::pmr::list uses the default memory resource std::pmr::new_delete_resource
(4): std::pmr::list uses std::pmr::monotonic_buffer_resource without a pre-allocated buffer on the stack
(5): std::pmr::list uses std::pmr::monotonic_buffer_resource with a pre-allocated buffer on the stack
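The difference between variants (3) and (4) can be checked directly by asking each list which memory resource its allocator points to. The following is a minimal sketch, not part of the original benchmark; the function names are hypothetical, and it assumes that nobody has called std::pmr::set_default_resource() beforehand.

```cpp
#include <cassert>
#include <list>
#include <memory_resource>

// True when a default-constructed std::pmr::list draws its nodes from
// std::pmr::new_delete_resource(), as in variant (3). Assumes the
// default resource has not been replaced.
bool uses_new_delete_by_default()
{
    std::pmr::list<int> list;
    return list.get_allocator().resource() == std::pmr::new_delete_resource();
}

// True when a list constructed as in variant (4) uses the
// monotonic_buffer_resource that was passed to it.
bool uses_given_resource()
{
    std::pmr::monotonic_buffer_resource mbr;
    std::pmr::polymorphic_allocator<int> pa{&mbr};
    std::pmr::list<int> list{pa};
    return list.get_allocator().resource() == &mbr;
}

// Bundles both observations into one call.
void check_resources()
{
    assert(uses_new_delete_by_default());
    assert(uses_given_resource());
}
```

Calling check_resources() from a main function is enough to run both checks.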

The comment in the last function (5) claims that there is enough space on the stack to accommodate all nodes: "enough to fit in all nodes". That was correct on my Linux PC, but not on my Windows PC. Under Linux the default stack size is 8 MB, but under Windows it is only 1 MB. As a result, the executables built with the MSVC compiler and the Clang compiler failed silently on Windows. I fixed the problem by changing the stack size of my MSVC and Clang executables using editbin.exe and its /STACK option.
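A different way around the stack limit, which is not what was done for the measurements in this article, is to place the initial buffer on the heap. The sketch below assumes a budget of 32 bytes per node; the function name fill_list_with_heap_buffer is hypothetical. Because the buffer itself now costs one heap allocation, timings from such a variant are not directly comparable with variant (5).

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <memory_resource>
#include <vector>

// Fills a std::pmr::list with `total_nodes` ints. The initial buffer of the
// monotonic_buffer_resource lives on the heap instead of the stack, so the
// platform's default stack size does not matter.
// Returns the number of nodes stored in the list.
std::size_t fill_list_with_heap_buffer(int total_nodes)
{
    assert(total_nodes >= 0);
    // Assumed budget: 32 bytes per node.
    std::vector<std::byte> buffer(static_cast<std::size_t>(total_nodes) * 32);
    std::pmr::monotonic_buffer_resource mbr{buffer.data(), buffer.size()};
    std::pmr::list<int> list{&mbr};
    for (int i{}; i != total_nodes; ++i)
        list.push_back(i);
    return list.size();
}
```

If the budget turns out to be too small, the monotonic_buffer_resource falls back to its upstream resource, which is the default resource in this sketch.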

Finally, here are the numbers. The reference value is the allocation with std::list (2). Compare the relative numbers rather than the absolute ones, because I used a virtualized Linux PC and a non-virtualized Windows PC. Of course, I enabled maximum optimization. This means /Ox for the MSVC compiler and -O3 for the GCC and Clang compilers.

Interestingly, memory allocation with the std::pmr::new_delete_resource memory resource was always the slowest. In contrast, std::pmr::monotonic_buffer_resource provides the fastest memory allocation. This is especially true when using a pre-allocated buffer on the stack. On Windows, this makes memory allocation about ten times faster.
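One plausible reason for the gap is that a monotonic buffer requests memory from its upstream resource in a few large chunks, while a list that talks to the upstream resource directly issues one request per node. The following sketch makes this countable. The class counting_resource and both functions are hypothetical helpers written for this illustration; they are not part of the benchmark.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <memory_resource>

// Hypothetical helper: forwards every request to new_delete_resource()
// and counts how often allocate() is called.
class counting_resource : public std::pmr::memory_resource
{
public:
    std::size_t allocations() const { return count_; }

private:
    void* do_allocate(std::size_t bytes, std::size_t alignment) override
    {
        ++count_;
        return std::pmr::new_delete_resource()->allocate(bytes, alignment);
    }
    void do_deallocate(void* p, std::size_t bytes, std::size_t alignment) override
    {
        std::pmr::new_delete_resource()->deallocate(p, bytes, alignment);
    }
    bool do_is_equal(const std::pmr::memory_resource& other) const noexcept override
    {
        return this == &other;
    }
    std::size_t count_{0};
};

// Appends `nodes` ints to the list.
void fill(std::pmr::list<int>& list, int nodes)
{
    assert(nodes >= 0);
    for (int i{}; i != nodes; ++i)
        list.push_back(i);
}

// Upstream requests when the list uses the counting resource directly.
std::size_t upstream_calls_direct(int nodes)
{
    counting_resource counter;
    {
        std::pmr::list<int> list{&counter};
        fill(list, nodes);
    }
    return counter.allocations();
}

// Upstream requests when a monotonic buffer sits between list and counter.
std::size_t upstream_calls_monotonic(int nodes)
{
    counting_resource counter;
    {
        std::pmr::monotonic_buffer_resource mbr{&counter};
        std::pmr::list<int> list{&mbr};
        fill(list, nodes);
    }
    return counter.allocations();
}
```

For a given node count, upstream_calls_direct reports at least one request per node, whereas upstream_calls_monotonic reports far fewer. The exact numbers depend on the standard library implementation.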


The memory resource std::pmr::monotonic_buffer_resource offers even more optimization potential.

Memory reuse

std::pmr::monotonic_buffer_resource allows memory to be reused, so you can avoid freeing memory.

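// reuseMemory.cpp

#include <array>
#include <cstddef>
#include <memory_resource>
#include <string>
#include <vector>

int main() {

    std::array<std::byte, 2000> buf;

    for (int i = 0; i < 100; ++i) {                                        // (1)
        // The pool is rebuilt on top of buf in every iteration.
        std::pmr::monotonic_buffer_resource pool{buf.data(), buf.size(),   // (2)
                                                 std::pmr::null_memory_resource()};
        std::pmr::vector<std::pmr::string> myVec{&pool};
        for (int j = 0; j < 16; ++j) {                                     // (3)
            myVec.emplace_back("A short string");   // placeholder text
        }
    }

}

This program allocates a std::array with 2000 bytes: std::array<std::byte, 2000>. This memory, which lives on the stack, is reused a hundred times (1). The std::pmr::vector<std::pmr::string> uses the std::pmr::monotonic_buffer_resource with the upstream memory resource std::pmr::null_memory_resource (2). Finally, 16 strings are pushed onto the vector (3).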
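Two properties of this setup can be observed in isolation: a new pool built on the same buffer hands out the same memory again, and a request that does not fit ends in std::bad_alloc because std::pmr::null_memory_resource refuses every allocation. The sketch below is only an illustration under these assumptions; the function names and the request sizes of 64 and 4000 bytes are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <new>

// True when two successive pools built on the same buffer return the same
// address for their first allocation, i.e. the memory is reused.
bool buffer_is_reused()
{
    std::array<std::byte, 2000> buf;
    void* first = nullptr;
    void* second = nullptr;
    {
        std::pmr::monotonic_buffer_resource pool{buf.data(), buf.size(),
                                                 std::pmr::null_memory_resource()};
        first = pool.allocate(64);
    }   // the pool ends here; buf itself stays alive
    {
        std::pmr::monotonic_buffer_resource pool{buf.data(), buf.size(),
                                                 std::pmr::null_memory_resource()};
        second = pool.allocate(64);
    }
    return first == second;
}

// True when a request larger than the buffer throws std::bad_alloc,
// because the upstream null_memory_resource never provides memory.
bool overflow_throws()
{
    std::array<std::byte, 2000> buf;
    std::pmr::monotonic_buffer_resource pool{buf.data(), buf.size(),
                                             std::pmr::null_memory_resource()};
    try {
        static_cast<void>(pool.allocate(4000));
    } catch (const std::bad_alloc&) {
        return true;
    }
    return false;
}

// Bundles both observations into one call.
void check_reuse()
{
    assert(buffer_is_reused());
    assert(overflow_throws());
}
```

Using std::pmr::null_memory_resource as the upstream resource turns a silent fallback to the heap into a visible exception, which helps when the buffer size has to be tuned.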

What’s next?

This article concludes my mini-series on polymorphic memory resources in C++17. In my next article I will jump forward three years and continue my journey through C++20. (rme)
