Software development: Optimization with allocators in C++17
Polymorphic allocators in C++17 help optimize memory allocation for both performance and memory reuse.
Rainer Grimm has been working as a software architect, team lead and training manager for many years. He enjoys writing articles on the programming languages C++, Python and Haskell, and he also speaks frequently at specialist conferences. On his blog Modern C++ he writes in depth about his passion, C++.
Performance
The following program comes from cppreference.com/monotonic_buffer_resource. I will extend it and explain its performance test for Clang and the MSVC compiler.
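// pmrPerformance.cpp

#include <array>
#include <chrono>
#include <cstddef>
#include <iomanip>
#include <iostream>
#include <list>
#include <memory_resource>

// Runs test_func `iterations` times and returns the elapsed time in seconds.
template <typename Func>
auto benchmark(Func test_func, int iterations)            // (1)
{
    const auto start = std::chrono::system_clock::now();
    while (iterations-- > 0)
        test_func();
    const auto stop = std::chrono::system_clock::now();
    const auto secs = std::chrono::duration<double>(stop - start);
    return secs.count();
}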
int main()
{
    constexpr int iterations{100};
    constexpr int total_nodes{2'00'000};

    auto default_std_alloc = [total_nodes]                // (2)
    {
        std::list<int> list;
        for (int i{}; i != total_nodes; ++i)
            list.push_back(i);
    };

    auto default_pmr_alloc = [total_nodes]                // (3)
    {
        std::pmr::list<int> list;
        for (int i{}; i != total_nodes; ++i)
            list.push_back(i);
    };

    auto pmr_alloc_no_buf = [total_nodes]                 // (4)
    {
        std::pmr::monotonic_buffer_resource mbr;
        std::pmr::polymorphic_allocator<int> pa{&mbr};
        std::pmr::list<int> list{pa};
        for (int i{}; i != total_nodes; ++i)
            list.push_back(i);
    };

    auto pmr_alloc_and_buf = [total_nodes]                // (5)
    {
        // enough to fit in all nodes:
        std::array<std::byte, total_nodes * 32> buffer;
        std::pmr::monotonic_buffer_resource mbr{buffer.data(),
                                                buffer.size()};
        std::pmr::polymorphic_allocator<int> pa{&mbr};
        std::pmr::list<int> list{pa};
        for (int i{}; i != total_nodes; ++i)
            list.push_back(i);
    };

    const double t1 = benchmark(default_std_alloc, iterations);
    const double t2 = benchmark(default_pmr_alloc, iterations);
    const double t3 = benchmark(pmr_alloc_no_buf , iterations);
    const double t4 = benchmark(pmr_alloc_and_buf, iterations);

    // Print each time together with its ratio to the reference value t1.
    std::cout << std::fixed << std::setprecision(3)
              << "t1 (default std alloc): " << t1 << " sec; t1/t1: " << t1/t1 << '\n'
              << "t2 (default pmr alloc): " << t2 << " sec; t1/t2: " << t1/t2 << '\n'
              << "t3 (pmr alloc no buf): " << t3 << " sec; t1/t3: " << t1/t3 << '\n'
              << "t4 (pmr alloc and buf): " << t4 << " sec; t1/t4: " << t1/t4 << '\n';
}
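The performance test (1) executes each of the functions (2) to (5) a hundred times (constexpr int iterations{100}). Each call creates a list with 200,000 nodes (constexpr int total_nodes{2'00'000}). The nodes of each list are allocated in a different way:
(2): std::list<int> uses the default allocator std::allocator, which obtains its memory with the global operator new
(3): std::pmr::list<int> uses the default memory resource, which is std::pmr::new_delete_resource unless it has been changed
(4): std::pmr::list<int> uses a std::pmr::monotonic_buffer_resource without a pre-allocated buffer
(5): std::pmr::list<int> uses a std::pmr::monotonic_buffer_resource with a pre-allocated buffer on the stack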
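To see which requests a list actually sends to its memory resource, a small wrapper resource can count them. The following is only a sketch: the class counting_resource and the function count_list_allocations are hypothetical names that do not appear in the original program. The wrapper forwards every request to an upstream resource, so it can sit in front of std::pmr::new_delete_resource or a std::pmr::monotonic_buffer_resource.

```cpp
#include <cstddef>
#include <list>
#include <memory_resource>

// Hypothetical helper, not part of the original program: forwards every
// request to an upstream resource and counts the calls.
class counting_resource : public std::pmr::memory_resource {
public:
    explicit counting_resource(
        std::pmr::memory_resource* upstream = std::pmr::get_default_resource())
        : upstream_{upstream} {}

    std::size_t allocations() const { return allocations_; }
    std::size_t deallocations() const { return deallocations_; }

private:
    void* do_allocate(std::size_t bytes, std::size_t alignment) override {
        ++allocations_;
        return upstream_->allocate(bytes, alignment);
    }

    void do_deallocate(void* p, std::size_t bytes, std::size_t alignment) override {
        ++deallocations_;
        upstream_->deallocate(p, bytes, alignment);
    }

    bool do_is_equal(const std::pmr::memory_resource& other) const noexcept override {
        return this == &other;
    }

    std::pmr::memory_resource* upstream_;
    std::size_t allocations_{0};
    std::size_t deallocations_{0};
};

// Builds a std::pmr::list<int> with `nodes` elements on top of `upstream`
// and returns how many allocation requests reached the resource.
std::size_t count_list_allocations(std::pmr::memory_resource* upstream, int nodes) {
    counting_resource counter{upstream};
    {
        std::pmr::list<int> list{&counter};
        for (int i{}; i != nodes; ++i)
            list.push_back(i);
    }
    return counter.allocations();
}
```

Calling count_list_allocations(std::pmr::new_delete_resource(), 1000) should report at least one request per node. The list sends the same requests whichever resource is plugged in; the four variants differ only in how expensive each request is.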
The comment in the last function (5) claims that the buffer on the stack is large enough to hold all nodes: "enough to fit in all nodes". That was correct on my Linux PC, but not on my Windows PC. On Linux the default stack size is 8 MB, but on Windows it is only 1 MB. As a result, the executables built with the MSVC compiler and with the Clang compiler failed silently on Windows. I fixed the problem by increasing the stack size of my MSVC and Clang executables with editbin.exe.
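If changing the stack size of the executable is not an option, one possible alternative is to place the initial buffer on the heap. The sketch below is an assumption and not what I measured: the function name fill_list_from_heap_buffer is hypothetical, and the factor of 32 bytes per node is an assumed upper bound for the size of a list node.

```cpp
#include <cstddef>
#include <list>
#include <memory_resource>
#include <vector>

// Hypothetical variant of function (5): the initial buffer lives on the heap,
// so the default stack size of the platform no longer matters.
std::size_t fill_list_from_heap_buffer(int total_nodes) {
    // 32 bytes per node is an assumed upper bound for a std::pmr::list<int> node.
    std::vector<std::byte> buffer(static_cast<std::size_t>(total_nodes) * 32);

    // The resource is declared after the buffer and the list after the
    // resource, so they are destroyed in the safe order: list, resource, buffer.
    std::pmr::monotonic_buffer_resource mbr{buffer.data(), buffer.size()};
    std::pmr::list<int> list{&mbr};
    for (int i{}; i != total_nodes; ++i)
        list.push_back(i);
    return list.size();
}
```

This variant pays for one large heap allocation per call and the std::vector also zero-initializes the buffer, so its timings are not directly comparable to the numbers for the stack buffer.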
Finally, here are the numbers. The reference value is the allocation with std::list<int> (2).
Interestingly, memory allocation with the memory resource std::pmr::new_delete_resource was always the slowest. In contrast, std::pmr::monotonic_buffer_resource provides the fastest memory allocation, especially when it uses a pre-allocated buffer on the stack. On Windows, this makes memory allocation about ten times faster.
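One reason for this speed is that a std::pmr::monotonic_buffer_resource only hands out the next free piece of its buffer and does nothing when a block is deallocated. The following sketch makes this visible. The names inside, monotonic_probe and probe_monotonic_buffer are hypothetical helpers for illustration.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <memory_resource>

// True if p points into the given buffer.
template <std::size_t N>
bool inside(const std::array<std::byte, N>& buf, const void* p) {
    const auto addr  = reinterpret_cast<std::uintptr_t>(p);
    const auto begin = reinterpret_cast<std::uintptr_t>(buf.data());
    return addr >= begin && addr < begin + N;
}

struct monotonic_probe {
    bool first_inside;    // first block was taken from the stack buffer
    bool second_inside;   // second block was taken from the stack buffer
    bool block_recycled;  // second block reused the address of the first
};

monotonic_probe probe_monotonic_buffer() {
    std::array<std::byte, 256> buffer;
    std::pmr::monotonic_buffer_resource mbr{buffer.data(), buffer.size(),
                                            std::pmr::null_memory_resource()};

    void* first = mbr.allocate(32);
    mbr.deallocate(first, 32);        // has no effect for this resource
    void* second = mbr.allocate(32);  // served from the remaining space

    return {inside(buffer, first), inside(buffer, second), first == second};
}
```

Both blocks come from the stack buffer, and the second one does not reuse the address of the first, although the first block had already been deallocated. The memory only becomes available again when the resource is destroyed or its member function release() is called.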
The memory resource std::pmr::monotonic_buffer_resource offers even more optimization potential.
Memory reuse
std::pmr::monotonic_buffer_resource allows memory to be reused, which saves the cost of freeing it.
// reuseMemory.cpp

#include <array>
#include <cstddef>
#include <iostream>
#include <memory_resource>
#include <string>
#include <vector>

int main() {

    std::array<std::byte, 2000> buf;

    for (int i = 0; i < 100; ++i) {                                   // (1)
        // A fresh pool over the same buffer; no fallback if it runs out.
        std::pmr::monotonic_buffer_resource pool{buf.data(),
                                                 buf.size(),          // (2)
                                                 std::pmr::null_memory_resource()};
        std::pmr::vector<std::pmr::string> myVec{&pool};
        for (int j = 0; j < 16; ++j) {                                // (3)
            myVec.emplace_back("A short string");
        }
    }
}
This program creates a std::array with 2000 bytes: std::array<std::byte, 2000>. This memory on the stack is reused a hundred times (1). The std::pmr::vector<std::pmr::string> uses the std::pmr::monotonic_buffer_resource with the upstream memory resource std::pmr::null_memory_resource (2). Finally, 16 strings are pushed onto the vector (3).
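The upstream resource std::pmr::null_memory_resource decides what happens when the buffer is exhausted: every allocation request to it throws std::bad_alloc, so the program can never silently fall back to the heap. The following sketch illustrates this. The function strings_fit and the buffer size of 8192 bytes are assumptions chosen for the example.

```cpp
#include <array>
#include <cstddef>
#include <memory_resource>
#include <new>
#include <string>
#include <vector>

// Hypothetical helper: tries to store `count` short strings using only the
// first `bytes` bytes of a stack buffer. Returns false if the buffer runs out.
bool strings_fit(std::size_t bytes, int count) {
    std::array<std::byte, 8192> buf;
    if (bytes > buf.size())
        bytes = buf.size();

    std::pmr::monotonic_buffer_resource pool{buf.data(), bytes,
                                             std::pmr::null_memory_resource()};
    try {
        std::pmr::vector<std::pmr::string> myVec{&pool};
        for (int j = 0; j < count; ++j)
            myVec.emplace_back("A short string");
        return true;
    } catch (const std::bad_alloc&) {
        return false;  // the upstream null_memory_resource refused the request
    }
}
```

With a generous budget such as strings_fit(8192, 16) the strings fit. With a budget of 64 bytes the vector cannot even hold 16 string objects, so the call returns false. How many bytes are really needed depends on the growth strategy of the vector and on the size of std::pmr::string in the standard library used.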
What’s next?
This article concludes my mini-series on polymorphic memory resources in C++17. In my next article I will jump forward three years and continue my journey through C++20.