
Software development: Optimization with allocators in C++17​

by admin

Polymorphic allocators in C++17 help optimize memory allocation for both performance and memory reuse.


Rainer Grimm has worked as a software architect, team lead and training manager for many years. He enjoys writing articles about the programming languages C++, Python and Haskell, and he speaks frequently at specialist conferences. On his blog Modern C++, he deals intensively with his passion, C++.

The following program comes from cppreference.com/monotonic_buffer_resource. I will explain its performance test and extend it to Clang and the MSVC compiler.

// pmrPerformance.cpp

#include <array>
#include <chrono>
#include <cstddef>
#include <iomanip>
#include <iostream>
#include <list>
#include <memory_resource>

// Runs test_func `iterations` times and returns the elapsed time in seconds.
template <typename Func>
auto benchmark(Func test_func, int iterations) // (1)
{
    const auto start = std::chrono::system_clock::now();
    while (iterations-- > 0)
        test_func();
    const auto stop = std::chrono::system_clock::now();
    const auto secs = std::chrono::duration<double>(stop - start);
    return secs.count();
}

int main()
{
    constexpr int iterations{100};
    constexpr int total_nodes{2'00'000};

    auto default_std_alloc = [total_nodes] // (2)
    {
        std::list<int> list;
        for (int i{}; i != total_nodes; ++i)
            list.push_back(i);
    };

    auto default_pmr_alloc = [total_nodes] // (3)
    {
        std::pmr::list<int> list;
        for (int i{}; i != total_nodes; ++i)
            list.push_back(i);
    };

    auto pmr_alloc_no_buf = [total_nodes] // (4)
    {
        std::pmr::monotonic_buffer_resource mbr;
        std::pmr::polymorphic_allocator<int> pa{&mbr};
        std::pmr::list<int> list{pa};
        for (int i{}; i != total_nodes; ++i)
            list.push_back(i);
    };

    auto pmr_alloc_and_buf = [total_nodes] // (5)
    {
        // enough to fit in all nodes:
        std::array<std::byte, total_nodes * 32> buffer;
        std::pmr::monotonic_buffer_resource mbr{buffer.data(),
                                                buffer.size()};
        std::pmr::polymorphic_allocator<int> pa{&mbr};
        std::pmr::list<int> list{pa};
        for (int i{}; i != total_nodes; ++i)
            list.push_back(i);
    };

    const double t1 = benchmark(default_std_alloc, iterations);
    const double t2 = benchmark(default_pmr_alloc, iterations);
    const double t3 = benchmark(pmr_alloc_no_buf , iterations);
    const double t4 = benchmark(pmr_alloc_and_buf, iterations);

    // Print each time and its ratio to the std::list reference (t1).
    std::cout << std::fixed << std::setprecision(3)
              << "t1 (default std alloc): " << t1 << " sec; t1/t1: " << t1 / t1 << '\n'
              << "t2 (default pmr alloc): " << t2 << " sec; t1/t2: " << t1 / t2 << '\n'
              << "t3 (pmr alloc  no buf): " << t3 << " sec; t1/t3: " << t1 / t3 << '\n'
              << "t4 (pmr alloc and buf): " << t4 << " sec; t1/t4: " << t1 / t4 << '\n';
}

The performance test in (1) executes each of the functions (2) to (5) a hundred times (constexpr int iterations{100}). Each call creates a list with two hundred thousand nodes (constexpr int total_nodes{2'00'000}). The nodes of the individual lists are allocated in different ways:

(2): std::list<int> uses the global operator new
(3): std::pmr::list<int> uses the special memory resource std::pmr::new_delete_resource
(4): std::pmr::list<int> uses std::pmr::monotonic_buffer_resource without a pre-allocated buffer on the stack
(5): std::pmr::list<int> uses std::pmr::monotonic_buffer_resource with a pre-allocated buffer on the stack


The comment in the last function (5) claims that there is enough space on the stack to accommodate all nodes: "enough to fit in all nodes". That was correct on my Linux PC, but not on my Windows PC. Under Linux, the default stack size is 8 MB, but under Windows it is only 1 MB. As a result, my program failed silently on Windows, whether it was built with the MSVC compiler or with the Clang compiler. I fixed the problem by changing the stack size of my MSVC and Clang executables using editbin.exe.

Finally, here are the numbers. The reference value is the allocation with std::list (2). Compare the relative numbers rather than the absolute ones, because I used a virtualized Linux PC and a non-virtualized Windows PC. Of course, I enabled maximum optimization. This means (/Ox) for the MSVC compiler and (-O3) for the GCC and Clang compilers.

Interestingly, memory allocation with the std::pmr::new_delete_resource memory resource was always the slowest. In contrast, std::pmr::monotonic_buffer_resource provides the fastest memory allocation. This is especially true when using a pre-allocated buffer on the stack. On Windows, this makes memory allocation about ten times faster.


The memory resource std::pmr::monotonic_buffer_resource offers even more potential for optimization.


Memory reuse

std::pmr::monotonic_buffer_resource lets you reuse memory, so you can avoid freeing and reallocating it.

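// reuseMemory.cpp

#include <array>
#include <cstddef>
#include <iostream>
#include <memory_resource>
#include <string>
#include <vector>

int main() {

    std::array<std::byte, 2000> buf;

    for (int i = 0; i < 100; ++i) {                                       // (1)
        std::pmr::monotonic_buffer_resource pool{buf.data(), buf.size(),  // (2)
                                                 std::pmr::null_memory_resource()};
        std::pmr::vector<std::pmr::string> myVec{&pool};
        for (int j = 0; j < 16; ++j) {                                    // (3)
            myVec.emplace_back("A short string");  // string literal assumed
        }
    }
}

This program allocates a std::array with 2000 bytes: std::array<std::byte, 2000>. This memory on the stack is reused a hundred times (1). The std::pmr::vector<std::pmr::string> uses the std::pmr::monotonic_buffer_resource with the upstream memory resource std::pmr::null_memory_resource (2). Finally, 16 strings are pushed onto the vector (3).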

This article concludes my mini-series on polymorphic memory resources in C++17. In my next article, I will jump forward three years and continue my journey through C++20.
