I designed this allocator as a tweak to the system described by Misha Shalem in his CppCon 2020 talk Practical Memory Pool Based Allocators for Modern C++: most of the code you will see in this post is a modification of the code he presented there.
If you are taking your first steps into the world of C++ custom memory allocators, I strongly recommend that talk: I found it really informative and interesting, and it gave me the core idea for writing my own allocator.
Context
I am building an Entity Component System in C++20 for fun and learning purposes. After implementing everything needed for a complete run of the system (entity creation and removal, component getters, archetype querying, and systems handling), I am now trying to optimize the entire thing.
std::set is one of the most commonly used STL containers: it implements an ordered set as a red-black tree. Under the hood, each node of that tree is allocated individually with a separate call to malloc(), which can become quite expensive when it happens many times every frame. The immediate solution is to reserve a pool of memory blocks that the std::set can query and occupy, getting a block back right away without calling malloc() every single time.
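To see the cost concretely, here is a tiny throwaway allocator (not part of the project; all names here are mine) that simply prints every allocation the container performs: each inserted element triggers one node allocation.

#include <cstddef>
#include <cstdio>
#include <new>
#include <set>

// Illustration only: forwards to operator new, but logs every allocation
// the container performs.
template<typename T>
struct counting_allocator
{
    using value_type = T;

    counting_allocator() = default;

    template<typename U>
    counting_allocator(const counting_allocator<U>&) noexcept {}

    T* allocate(std::size_t n)
    {
        std::printf("allocating %zu bytes\n", n * sizeof(T));
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }

    void deallocate(T* p, std::size_t) noexcept
    {
        ::operator delete(p);
    }
};

template<typename T, typename U>
bool operator==(const counting_allocator<T>&, const counting_allocator<U>&) noexcept
{
    return true;
}

int main()
{
    std::set<int, std::less<int>, counting_allocator<int>> s;
    for (int i = 0; i < 5; ++i)
    {
        s.insert(i); // one node allocation per insertion
    }
}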
This leads us into the realm of C++ custom memory allocators. In his talk, Misha Shalem defines a bucket-based pool allocator: we can define multiple memory pools, i.e., static, pre-allocated collections of buckets with fixed block sizes, from which we retrieve pointers for our containers. Each memory pool is identified by a unique numerical ID and is built from a set of buckets whose block sizes and block counts are defined in a struct named bucket_descriptor. Each bucket holds a pointer to its allocated data and a bitmask ledger that records which blocks are free.
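For reference, a static bucket in the original design looks roughly like the sketch below. This is a paraphrase of the idea from the talk, not Shalem's exact code, and the names are mine.

// Paraphrased sketch (not verbatim from the talk): a bucket descriptor carries
// compile-time sizes, and a bucket owns a fixed data region plus a ledger
// bitmask with one bit per block.
struct bucket_descriptor_example
{
    static constexpr std::size_t BlockSize  = 16;
    static constexpr std::size_t BlockCount = 10000;
};

struct static_bucket
{
    const std::size_t blockSize;
    const std::size_t blockCount;

    std::byte* data{nullptr};   // blockSize * blockCount bytes, allocated once
    std::byte* ledger{nullptr}; // one bit per block: set = in use, clear = free
};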
Starting from this implementation, I modified the memory pool system to adapt it to my needs:
- I didn’t want the memory pools to be identified by inline IDs, since I am planning to use many of them and I don’t want to keep track of all the IDs.
- I didn’t want to dive into the LLVM analyzer implementation mentioned in the talk, so I made my buckets dynamic, i.e., able to grow when needed.
- I wanted to reduce wasted memory as much as possible.
Implementation
From now on, I will assume that you have watched the original presentation, since I am going to describe my implementation as a set of variations on Shalem’s system. The allocator is part of my ECS project on GitHub.
Dynamic buckets
First, I needed to modify the implementation of the bucket struct to allow dynamic resizing of the allocated data. I cannot simply use realloc() here: a reallocation may move the entire memory region to a new address, invalidating every pointer previously handed out to the containers.
So, starting from the bucket implementation shown in the presentation, I created a bucket_container_t type that holds a growing vector of dynamic buckets. Each dynamic bucket behaves exactly like a regular static bucket, except that, once capacity is reached, a new one is appended to the vector. The bucket vector therefore grows linearly, one bucket at a time, though I may consider different growth strategies in the future.
Note: in the code below, for the sake of clarity, I am skipping some of the function bodies that are not directly relevant to the tweak I am describing. If you are interested in code that is not shown here, the complete implementation is available here for consultation.
// DynamicBucketAllocators.h
struct bucket_container_t
{
public:
    const size_t blockSize;
    const size_t blockCount;

    bucket_container_t(size_t inBlockSize, size_t inBlockCount);
    ~bucket_container_t();

    bool belongs_to_this(void* ptr) const noexcept;

    [[nodiscard]] void* allocate(size_t bytes) noexcept;
    void deallocate(void* ptr, size_t bytes) noexcept;

private:
    size_t find_contiguous_blocks(size_t n) const noexcept;
    void set_blocks_in_use(size_t index, size_t n) noexcept;
    void set_blocks_free(size_t index, size_t n) noexcept;
    bool is_block_in_use(size_t index) const noexcept;

    struct dynamic_bucket_t
    {
        dynamic_bucket_t() {}
        dynamic_bucket_t(std::byte* inData, std::byte* inLedger)
            : data{inData}, ledger{inLedger} {}

        std::byte* data{nullptr};
        std::byte* ledger{nullptr};
    };

    void allocate_bucket_instance();
    size_t calculate_bucket_instance_index(const size_t blockIndex) const;
    size_t find_bucket_instance_index(void* ptr) const;

    /** Contains allocated data. */
    std::vector<dynamic_bucket_t> m_data;
};
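Before looking at the implementation, this is roughly how I expect a bucket_container_t to be used on its own (a hypothetical standalone example, not code from the repository):

// Hypothetical standalone usage of bucket_container_t.
bucket_container_t container(/*inBlockSize*/ 32, /*inBlockCount*/ 1024);

void* ptr = container.allocate(24);       // occupies one 32-byte block
assert(container.belongs_to_this(ptr));   // the pointer falls inside one of the bucket instances

container.deallocate(ptr, 24);            // marks that block as free again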
// DynamicBucketAllocators.cpp
void bucket_container_t::allocate_bucket_instance()
{
    dynamic_bucket_t instance;

    const size_t dataSize = blockSize * blockCount;
    instance.data = static_cast<std::byte*>(Malloc(dataSize));
    assert(instance.data != nullptr);

    // One ledger bit per block, rounded up to a whole number of bytes.
    const size_t ledgerSize = (blockCount + 7) >> 3;
    instance.ledger = static_cast<std::byte*>(Malloc(ledgerSize));
    assert(instance.ledger != nullptr);

    std::memset(instance.data, 0, dataSize);
    std::memset(instance.ledger, 0, ledgerSize);

    m_data.push_back(instance);
}
void* bucket_container_t::allocate(size_t bytes) noexcept
{
    // Calculate the required number of blocks
    const size_t n = 1 + ((bytes - 1) / blockSize);

    // find_contiguous_blocks() returns the total number of blocks across all
    // bucket instances when no suitable run is found.
    size_t index = find_contiguous_blocks(n);
    if (index == m_data.size() * blockCount)
    {
        // No room left: grow the container by one bucket instance and retry.
        allocate_bucket_instance();
        index = find_contiguous_blocks(n);
        assert(index < m_data.size() * blockCount);
    }

    // Update the ledger
    set_blocks_in_use(index, n);

    const size_t bucketInstanceIndex = calculate_bucket_instance_index(index);
    dynamic_bucket_t& bucketInstance = m_data[bucketInstanceIndex];
    const size_t localIndex = index - (bucketInstanceIndex * blockCount);
    return bucketInstance.data + (localIndex * blockSize);
}
void bucket_container_t::deallocate(void* ptr, size_t bytes) noexcept
{
    const size_t bucketInstanceIndex = find_bucket_instance_index(ptr);
    dynamic_bucket_t& bucketInstance = m_data[bucketInstanceIndex];

    const std::byte* p = static_cast<std::byte*>(ptr);
    const size_t dist = static_cast<size_t>(p - bucketInstance.data);

    // Calculate block index
    const size_t index = dist / blockSize + (bucketInstanceIndex * blockCount);

    // Calculate the required number of blocks
    const size_t n = 1 + ((bytes - 1) / blockSize);

    // Update the ledger
    set_blocks_free(index, n);
}
bool bucket_container_t::belongs_to_this(void* ptr) const noexcept
{
    const std::byte* p = static_cast<std::byte*>(ptr);
    for (const dynamic_bucket_t& bucketInstance : m_data)
    {
        if (p >= bucketInstance.data && p < (bucketInstance.data + (blockSize * blockCount)))
        {
            return true;
        }
    }
    return false;
}
size_t bucket_container_t::find_contiguous_blocks(size_t n) const noexcept
{
    for (size_t instanceIndex = 0; instanceIndex < m_data.size(); ++instanceIndex)
    {
        const size_t startingIndex = instanceIndex * blockCount;
        const size_t finalIndex = startingIndex + blockCount;

        for (size_t blockIdx = startingIndex; blockIdx < finalIndex; ++blockIdx)
        {
            if (!is_block_in_use(blockIdx))
            {
                size_t contiguousCount = 1;
                size_t firstFreeBlockIdx = blockIdx;
                if (contiguousCount >= n)
                {
                    return firstFreeBlockIdx;
                }

                while (blockIdx + 1 < finalIndex && !is_block_in_use(blockIdx + 1))
                {
                    contiguousCount++;
                    if (contiguousCount >= n)
                    {
                        return firstFreeBlockIdx;
                    }
                    blockIdx++;
                }
            }
        }
    }

    // Not found: return the total number of blocks across all bucket
    // instances, which can never be a valid block index.
    return m_data.size() * blockCount;
}
size_t bucket_container_t::calculate_bucket_instance_index(const size_t blockIndex) const
{
    return blockIndex / blockCount;
}

size_t bucket_container_t::find_bucket_instance_index(void* ptr) const
{
    const std::byte* p = static_cast<std::byte*>(ptr);
    for (size_t index = 0; index < m_data.size(); ++index)
    {
        const dynamic_bucket_t& instance = m_data[index];
        const std::byte* instanceEnd = instance.data + (blockCount * blockSize);
        if (p >= instance.data && p < instanceEnd)
        {
            return index;
        }
    }
    return m_data.size();
}
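For completeness, the omitted ledger helpers could look roughly like the sketch below. This assumes one ledger bit per block, least significant bit first; the actual bodies in the repository may differ.

// One possible shape for the omitted ledger helpers (sketch, not the
// repository code): block indices are global across bucket instances.
bool bucket_container_t::is_block_in_use(size_t index) const noexcept
{
    const size_t instanceIndex = calculate_bucket_instance_index(index);
    const size_t localIndex = index - (instanceIndex * blockCount);
    const std::byte ledgerByte = m_data[instanceIndex].ledger[localIndex / 8];
    return (static_cast<unsigned char>(ledgerByte) >> (localIndex % 8)) & 1u;
}

void bucket_container_t::set_blocks_in_use(size_t index, size_t n) noexcept
{
    for (size_t i = index; i < index + n; ++i)
    {
        const size_t instanceIndex = calculate_bucket_instance_index(i);
        const size_t localIndex = i - (instanceIndex * blockCount);
        std::byte& ledgerByte = m_data[instanceIndex].ledger[localIndex / 8];
        ledgerByte |= static_cast<std::byte>(1u << (localIndex % 8));
    }
}

void bucket_container_t::set_blocks_free(size_t index, size_t n) noexcept
{
    for (size_t i = index; i < index + n; ++i)
    {
        const size_t instanceIndex = calculate_bucket_instance_index(i);
        const size_t localIndex = i - (instanceIndex * blockCount);
        std::byte& ledgerByte = m_data[instanceIndex].ledger[localIndex / 8];
        ledgerByte &= static_cast<std::byte>(~(1u << (localIndex % 8)));
    }
}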
Defining dynamic memory pools
Now that we have our bucket containers, we can implement the actual memory pools. This is done with some template programming: first, we define a dynamic memory pool as an array of bucket containers, whose length is derived from a tuple of bucket descriptors.
struct bucket_cfg16
{
    static constexpr size_t s_blockSize = 16;
    static constexpr size_t s_blockCount = 10000;
};

struct bucket_cfg32
{
    static constexpr size_t s_blockSize = 32;
    static constexpr size_t s_blockCount = 10000;
};

struct bucket_cfg1024
{
    static constexpr size_t s_blockSize = 1024;
    static constexpr size_t s_blockCount = 50000;
};

template<typename... TBuckets>
struct dynamic_bucket_descriptors
{
    using type = std::tuple<TBuckets...>;
};

// Quick access to the dynamic_bucket_descriptors::type member
template<typename... TBuckets>
using dynamic_bucket_descriptors_t = typename dynamic_bucket_descriptors<TBuckets...>::type;

template<typename... TBuckets>
static constexpr size_t DynamicBucketCount =
    std::tuple_size<dynamic_bucket_descriptors_t<TBuckets...>>::value;

template<typename... TBuckets>
using DynamicMemoryPool = std::array<bucket_container_t, DynamicBucketCount<TBuckets...>>;
Now we define the structs GetDynamicBucketBlockSize and GetDynamicBucketBlockCount, which we use in the GetDynamicMemoryPool() function to get the unique instance of each memory pool. This function is the core of the system: whenever we call it with a new sequence of bucket descriptors as template arguments, we create a static instance of a memory pool with those buckets, and every subsequent call with the same sequence returns that same instance.
template<typename TBucket>
struct GetDynamicBucketBlockSize : std::integral_constant<size_t, TBucket::s_blockSize>
{};

template<typename TBucket>
struct GetDynamicBucketBlockCount : std::integral_constant<size_t, TBucket::s_blockCount>
{};

template<typename... TBuckets>
DynamicMemoryPool<TBuckets...>& GetDynamicMemoryPool() noexcept
{
    static DynamicMemoryPool<TBuckets...> instance
    {
        {
            {
                GetDynamicBucketBlockSize<TBuckets>::value,
                GetDynamicBucketBlockCount<TBuckets>::value
            } ...
        }
    };
    return instance;
}

template<typename... TBuckets>
constexpr bool IsDynamicMemoryPoolDefined() noexcept
{
    return DynamicBucketCount<TBuckets...> != 0;
}

template<typename... TBuckets>
bool InitializeDynamicMemoryPool() noexcept
{
    (void) GetDynamicMemoryPool<TBuckets...>();
    return IsDynamicMemoryPoolDefined<TBuckets...>();
}
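As an example, a call site could look like this (hypothetical usage, reusing the descriptors defined above):

// The first call with this pack of descriptors creates the static pool;
// every later call with the same pack returns the same instance.
InitializeDynamicMemoryPool<bucket_cfg16, bucket_cfg32, bucket_cfg1024>();

DynamicMemoryPool<bucket_cfg16, bucket_cfg32, bucket_cfg1024>& pool =
    GetDynamicMemoryPool<bucket_cfg16, bucket_cfg32, bucket_cfg1024>();

// pool[0] serves 16-byte blocks, pool[1] 32-byte blocks, pool[2] 1024-byte blocks.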
Finally, the AllocateFromDynamicBucket() and DeallocateFromDynamicBucket() functions. Unlike those in the presentation, we don’t use any allocation strategy to identify the best bucket: the allocation function requires manually specifying which bucket we want to use. This is possible because we know the exact type of the objects that we will allocate.
template<typename... TBuckets>
[[nodiscard]] void* AllocateFromDynamicBucket(size_t bytes, size_t index)
{
    DynamicMemoryPool<TBuckets...>& pool = GetDynamicMemoryPool<TBuckets...>();
    if (index >= DynamicBucketCount<TBuckets...>)
    {
        throw std::bad_alloc();
    }

    bucket_container_t& bucket = pool[index];
    if (void* ptr = bucket.allocate(bytes); ptr != nullptr)
    {
        return ptr;
    }
    throw std::bad_alloc();
}

template<typename... TBuckets>
void DeallocateFromDynamicBucket(void* ptr, size_t bytes) noexcept
{
    DynamicMemoryPool<TBuckets...>& pool = GetDynamicMemoryPool<TBuckets...>();
    for (bucket_container_t& bucket : pool)
    {
        if (bucket.belongs_to_this(ptr))
        {
            bucket.deallocate(ptr, bytes);
            break;
        }
    }
}
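Used together with the descriptors above, an allocation request could look like this (again, a hypothetical example):

// Ask the second bucket (32-byte blocks) for 24 bytes, then give them back.
void* ptr = AllocateFromDynamicBucket<bucket_cfg16, bucket_cfg32, bucket_cfg1024>(24, 1);
DeallocateFromDynamicBucket<bucket_cfg16, bucket_cfg32, bucket_cfg1024>(ptr, 24);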
Implementing the allocator
The first step in implementing a custom allocator that makes effective use of this dynamic memory pool system is identifying the type of the objects the system will allocate. In the case of std::set, when compiling with GCC (libstdc++), each new element of type T added to the set is allocated as an instance of std::_Rb_tree_node<T>, so we can define the bucket descriptor for the set allocator as follows:
template<typename T, typename TElement, size_t MaxElements>
class set_pool_memory_allocator
{
    struct rbNodeBucket_t
    {
        static constexpr size_t s_blockSize = sizeof(std::_Rb_tree_node<TElement>);
        static constexpr size_t s_blockCount = MaxElements;
    };

    ...
};
Note the template parameters required by this set_pool_memory_allocator class. When declaring the allocator for a set, T and TElement must both be the type of the objects contained in the set, e.g., for a set of integers:
std::set<int, std::less<int>, set_pool_memory_allocator<int, int, 128>>;
This is necessary because T is subject to change due to allocator rebinding: through std::allocator_traits, the same allocator can be used to allocate objects of different types within the same container, and the conversion is performed with the rebind struct. For this reason, we need a template parameter that is guaranteed to always be the element type, so that we can use it to compute the size of the bucket blocks reliably.
template<typename U>
struct rebind
{
    using other = set_pool_memory_allocator<U, TElement, MaxElements>;
};
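To make the rebinding concrete, here is what happens for a set of integers. This is an illustration only: it relies on the usual allocator boilerplate (value_type and friends) that is skipped in this post, and on the libstdc++ node type.

// Illustration: the set rebinds the allocator from the element type to its
// internal node type, but TElement (and thus the bucket block size) is unchanged.
using element_alloc = set_pool_memory_allocator<int, int, 128>;
using node_alloc =
    std::allocator_traits<element_alloc>::rebind_alloc<std::_Rb_tree_node<int>>;

static_assert(std::is_same_v<
    node_alloc,
    set_pool_memory_allocator<std::_Rb_tree_node<int>, int, 128>>);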
I will skip all the boilerplate code required by standard allocators and go directly to the allocate() and deallocate() functions. The complete implementation of the allocator can be found here.
The allocate() function uses a dynamic memory pool with just one bucket, so the choice of which bucket to use is straightforward.
pointer allocate(size_type n, const void* hint = nullptr)
{
    if constexpr (IsDynamicMemoryPoolDefined<rbNodeBucket_t>())
    {
        return static_cast<pointer>(
            AllocateFromDynamicBucket<rbNodeBucket_t>(n * sizeof(T), 0));
    }
    else if (m_upstreamResource != nullptr)
    {
        return static_cast<pointer>(
            m_upstreamResource->allocate(n * sizeof(T), alignof(T)));
    }
    else
    {
        throw std::bad_alloc();
    }
}

void deallocate(pointer p, size_type n)
{
    if constexpr (IsDynamicMemoryPoolDefined<rbNodeBucket_t>())
    {
        DeallocateFromDynamicBucket<rbNodeBucket_t>(p, n * sizeof(T));
    }
    else if (m_upstreamResource != nullptr)
    {
        m_upstreamResource->deallocate(p, n * sizeof(T), alignof(T));
    }
    else
    {
        assert(false);
    }
}
And that’s it. To make the declaration of sets using this allocator less verbose, we can define this template type alias:
template <typename TElement, size_t BlockCountPerChunk>
using pm_set = std::set<
    TElement,
    std::less<TElement>,
    set_pool_memory_allocator<TElement, TElement, BlockCountPerChunk>
>;

// An example: a set of integers, with memory pre-allocated for 128 elements.
pm_set<int, 128> m_set;
From now on, every variable of type pm_set will be a set that uses this allocator, drastically reducing the number of calls to malloc().