Type-safe Pimpl implementation without overhead
by Malte Skarupke
I like the pimpl idiom because I like to keep my headers as clean as possible, and other people’s headers are dirty. Unfortunately the pimpl idiom never feels like a good solution because it has runtime overhead that wouldn’t be needed if I didn’t care about clean headers so much.
If you’re not familiar with the Pimpl idiom: it stands for “pointer to implementation”, and in C/C++ headers it lets you use a class without having to include that class’s header in your own header. You can also use it to hide your implementation from your users, so that you can change the internals of your class and nobody has to know. It’s used all over the place, but it has one disadvantage: you always need an extra heap allocation, and every method call goes through an extra pointer dereference.
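For comparison, here is a minimal sketch of the conventional pimpl described above, using std::unique_ptr (C++14 or later for std::make_unique). The names Widget, Impl, SetName, Name and WidgetUsage are made up for illustration, and the “header” and “cpp” halves are put in one file so that it compiles on its own. The extra heap allocation happens in the constructor, and every member function reaches its data through impl.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// ---- widget.h: what users include ----
class Widget
{
public:
    Widget();
    ~Widget();                          // defined below, where Impl is complete
    Widget(Widget &&) noexcept;
    Widget & operator=(Widget &&) noexcept;

    void SetName(const std::string & name);
    const std::string & Name() const;

private:
    struct Impl;                        // forward declaration only
    std::unique_ptr<Impl> impl;         // one heap allocation per Widget
};

// ---- widget.cpp: private implementation ----
struct Widget::Impl
{
    std::string name;
};

Widget::Widget() : impl(std::make_unique<Impl>()) {}
Widget::~Widget() = default;
Widget::Widget(Widget &&) noexcept = default;
Widget & Widget::operator=(Widget &&) noexcept = default;

// Every member function goes through impl: one extra pointer dereference.
void Widget::SetName(const std::string & name) { impl->name = name; }
const std::string & Widget::Name() const { return impl->name; }

// ---- usage ----
inline void WidgetUsage()
{
    Widget w;
    w.SetName("bullet");
    Widget moved(std::move(w));         // moves the pointer, no new allocation
    assert(moved.Name() == "bullet");
}
```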
This code fixes that, so that there can be zero runtime overhead. Here’s how to use it:
class btRigidBody;
class MyRigidBody
{
    // ...
    ForwardDeclaredStorage<btRigidBody, 768> bulletBody;
};
And the code is below:
#pragma once
#include <cstddef>
#include <new>
#include <utility>
namespace detail
{
    template<std::size_t ExpectedSize, std::size_t ActualSize, std::size_t ExpectedAlignment, std::size_t ActualAlignment>
    inline void compare_size()
    {
        static_assert(ExpectedSize == ActualSize, "The size for the ForwardDeclaredStorage is wrong");
        static_assert(ExpectedAlignment == ActualAlignment, "The alignment for the ForwardDeclaredStorage is wrong");
    }
    template<std::size_t ExpectedSize, std::size_t ActualSize, std::size_t ExpectedAlignment, std::size_t ActualAlignment>
    struct size_comparer
    {
        inline size_comparer()
        {
            // going through one additional layer to get good error messages
            // if I put the assert down one more template layer, gcc will show the
            // sizes in the error message
            compare_size<ExpectedSize, ActualSize, ExpectedAlignment, ActualAlignment>();
        }
    };
}
struct forwarding_constructor {};
template<typename T, std::size_t Size, std::size_t Alignment = 16>
struct ForwardDeclaredStorage
{
    ForwardDeclaredStorage()
    {
        // construct the T in place inside storage; no heap allocation
        new (static_cast<void *>(storage)) T();
    }
    template<typename... Args>
    ForwardDeclaredStorage(forwarding_constructor, Args &&... args)
    {
        new (static_cast<void *>(storage)) T(std::forward<Args>(args)...);
    }
    ForwardDeclaredStorage(const ForwardDeclaredStorage & other)
    {
        new (static_cast<void *>(storage)) T(other.Get());
    }
    ForwardDeclaredStorage(const T & other)
    {
        new (static_cast<void *>(storage)) T(other);
    }
    ForwardDeclaredStorage(ForwardDeclaredStorage && other)
    {
        new (static_cast<void *>(storage)) T(std::move(other.Get()));
    }
    ForwardDeclaredStorage(T && other)
    {
        new (static_cast<void *>(storage)) T(std::move(other));
    }
    ForwardDeclaredStorage & operator=(const ForwardDeclaredStorage & other)
    {
        Get() = other.Get();
        return *this;
    }
    ForwardDeclaredStorage & operator=(const T & other)
    {
        Get() = other;
        return *this;
    }
    ForwardDeclaredStorage & operator=(ForwardDeclaredStorage && other)
    {
        Get() = std::move(other.Get());
        return *this;
    }
    ForwardDeclaredStorage & operator=(T && other)
    {
        Get() = std::move(other);
        return *this;
    }
    ~ForwardDeclaredStorage()
    {
        // sizeof(T) and alignof(T) need the complete type, so this destructor
        // must be instantiated somewhere that T is defined
        detail::size_comparer<Size, sizeof(T), Alignment, alignof(T)> compare_size{};
        Get().~T();
    }
    T & Get()
    {
        // the T was placement-new'ed at the start of storage
        return *reinterpret_cast<T *>(storage);
    }
    const T & Get() const
    {
        return *reinterpret_cast<const T *>(storage);
    }
private:
    // alignas is the standard C++11 spelling of gcc's __attribute__((aligned))
    alignas(Alignment) unsigned char storage[Size];
};
This uses a well-known hack: you put the necessary storage into your class, and then placement-new the forward-declared object into that storage. The benefit of this template is that it’s all type-safe, and the compiler-generated copy/move constructors, destructor and assignment operators of the class that contains it all do the right thing.
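I use the forwarding_constructor struct as a required argument for the forwarding constructor, because constructors with perfect forwarding can otherwise mess up overload resolution: for a non-const lvalue, an unconstrained forwarding constructor is a better match than the copy constructor. You use it like this: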
// 64 and 8 stand in for sizeof(widget) and alignof(widget)
ForwardDeclaredStorage<widget, 64, 8> a(forwarding_constructor{}, arg1, arg2);
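To illustrate the overload-resolution problem, here is a small sketch. Greedy, Tagged and OverloadDemo are hypothetical names, and the constructors only record which overload ran. For a non-const lvalue, the unconstrained template deduces Args as Greedy&, which beats the copy constructor’s const Greedy&; requiring forwarding_constructor as the first parameter takes the template out of the running for copies.

```cpp
#include <cassert>

struct forwarding_constructor {};

// Unconstrained forwarding constructor: for a non-const lvalue argument,
// Args deduces to Greedy&, a better match than the copy constructor.
struct Greedy
{
    bool from_forwarding = false;
    Greedy() = default;
    Greedy(const Greedy &) = default;
    template<typename... Args>
    Greedy(Args &&...) : from_forwarding(true) {}
};

// Tagged forwarding constructor: only viable when the first argument is a
// forwarding_constructor, so copies go to the copy constructor.
struct Tagged
{
    bool from_forwarding = false;
    Tagged() = default;
    Tagged(const Tagged &) = default;
    template<typename... Args>
    Tagged(forwarding_constructor, Args &&...) : from_forwarding(true) {}
};

inline void OverloadDemo()
{
    Greedy g1;
    Greedy g2(g1);                 // the template wins over the copy constructor
    assert(g2.from_forwarding);

    Tagged t1;
    Tagged t2(t1);                 // copy constructor, as intended
    assert(!t2.from_forwarding);

    Tagged t3(forwarding_constructor{}, 1, 2.0);
    assert(t3.from_forwarding);
}
```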
The downside of ForwardDeclaredStorage compared to regular pimpl is that you have to keep the size given in the template in sync with the struct that you’re forward-declaring. So a change in the implementation can still cause a recompilation of all users. In my case that doesn’t matter, because I use this to hide the libraries that I’m using, and the size of their structs only changes when I update the library version. And there’s a static_assert to prevent me from getting the size or the alignment wrong. (Funny thing: I only added the assert for alignment because I felt bad about publishing this when I was only checking the size. Turns out that the code example from the beginning of the post is actually incorrect, because the btRigidBody class is 64-byte aligned for reasons unknown and unenforced.)
The license for this code is this:
This is free and unencumbered software released into the public domain. Anyone is free to copy, modify, publish, use, compile, sell, or distribute this software, either in source code form or as a compiled binary, for any purpose, commercial or non-commercial, and by any means. In jurisdictions that recognize copyright laws, the author or authors of this software dedicate any and all copyright interest in the software to the public domain. We make this dedication for the benefit of the public at large and to the detriment of our heirs and successors. We intend this dedication to be an overt act of relinquishment in perpetuity of all present and future rights to this software under copyright law. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. For more information, please refer to <http://unlicense.org/>
Are you certain that there’s no undefined behavior here, in implicitly instantiating a template with an incomplete type as its type template argument? I think this is safe, but I’m uncertain.
It is totally legal to instantiate a template with an incomplete type. The compiler will complain if you actually try to use the type while it’s incomplete. So the constructor, destructor and other members have to be defined in files that include the type’s definition. That’s also why you can use std::unique_ptr for pimpl, and even std::vector and other complex templates will work. But std::optional would not work, because it stores the object inline and therefore needs the complete type to know its own size.
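A minimal illustration of the vector/optional difference, assuming C++17, which is when the standard started guaranteeing that std::vector (with the default allocator) can be instantiated with an incomplete element type. Node, CountNodes and NodeDemo are hypothetical; the commented-out std::optional member is the case that would fail to compile.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Node
{
    int value = 0;
    // Node is still incomplete here, but naming std::vector<Node> is allowed.
    std::vector<Node> children;
    // std::optional<Node> next;  // would not compile: optional stores the Node
    //                            // inline, so it needs sizeof(Node) right here
};

// Node is complete from here on, so vector member functions can be used.
inline std::size_t CountNodes(const Node & n)
{
    std::size_t total = 1;
    for (const Node & child : n.children)
        total += CountNodes(child);
    return total;
}

inline void NodeDemo()
{
    Node root;
    root.children.push_back(Node{});
    root.children.push_back(Node{});
    root.children[0].children.push_back(Node{});
    assert(CountNodes(root) == 4);
}
```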
>”So the constructor and destructor and others have to be defined in files that include the type”
I just don’t get this. How do you expect the destructor to use sizeof on an incomplete type?
~ForwardDeclaredStorage()
{
    detail::size_comparer<Size, sizeof(T), Alignment, alignof(T)> compare_size{};
    Get().~T();
}
The only way I can see this working is to include the definition of T in the “file that uses that type”. But then what is the point of all this?
It’s for the pimpl pattern: In the header you name a type but you don’t define it. In the cpp file you implement the type. Now anyone who is including your header doesn’t need to know the implementation details, because they can’t see the type that’s used. See the example with MyRigidBody and btRigidBody in the blog post: btRigidBody comes from the Bullet Physics library, which I don’t want to include in my header. So I forward declare it instead. In my cpp file I still have to include the bullet physics header, but none of the users of MyRigidBody have to include it.
Oh and of course all constructors and the destructor of MyRigidBody have to be in the cpp file as well, they can’t be in the header. If the destructor of MyRigidBody was defined in the header, then I would get a compiler error on that sizeof that you’re pointing out. But if the destructor is defined in the cpp file, where I included the bullet physics header, everything will work. The easiest way to see how it works would be to try it.
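Here is a single-file sketch of that split, under some assumptions: MiniStorage is a hypothetical cut-down stand-in for ForwardDeclaredStorage (default and move only, so the block compiles alone), FakeBody is a hypothetical stand-in for btRigidBody, and its size of 8 and alignment of 4 assume two 4-byte floats. The header part only forward declares FakeBody and declares the special members; the .cpp part defines FakeBody first and then defaults those members, which is where the size check gets instantiated.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <utility>

// Hypothetical cut-down stand-in for ForwardDeclaredStorage.
template<typename T, std::size_t Size, std::size_t Alignment>
struct MiniStorage
{
    MiniStorage() { new (static_cast<void *>(storage)) T(); }
    MiniStorage(MiniStorage && other) { new (static_cast<void *>(storage)) T(std::move(other.Get())); }
    MiniStorage & operator=(MiniStorage && other) { Get() = std::move(other.Get()); return *this; }
    ~MiniStorage()
    {
        // sizeof(T) needs the complete type: only compiles where T is defined
        static_assert(sizeof(T) == Size, "wrong size for MiniStorage");
        static_assert(alignof(T) == Alignment, "wrong alignment for MiniStorage");
        Get().~T();
    }
    T & Get() { return *reinterpret_cast<T *>(storage); }
    const T & Get() const { return *reinterpret_cast<const T *>(storage); }

private:
    alignas(Alignment) unsigned char storage[Size];
};

// ---- MyRigidBody.h: the library type is only forward declared ----
struct FakeBody;  // hypothetical stand-in for btRigidBody

class MyRigidBody
{
public:
    // Declared here, defined in the .cpp file, because each one touches FakeBody.
    MyRigidBody();
    ~MyRigidBody();
    MyRigidBody(MyRigidBody &&);
    MyRigidBody & operator=(MyRigidBody &&);
    float Mass() const;

private:
    MiniStorage<FakeBody, 8, 4> body;  // assumes two 4-byte floats
};

// ---- MyRigidBody.cpp: includes the library, so FakeBody is complete ----
struct FakeBody
{
    float mass = 1.0f;
    float damping = 0.0f;
};

MyRigidBody::MyRigidBody() = default;
MyRigidBody::~MyRigidBody() = default;
MyRigidBody::MyRigidBody(MyRigidBody &&) = default;
MyRigidBody & MyRigidBody::operator=(MyRigidBody &&) = default;
float MyRigidBody::Mass() const { return body.Get().mass; }

inline void RigidBodyDemo()
{
    MyRigidBody a;
    MyRigidBody b(std::move(a));
    MyRigidBody c;
    c = std::move(b);
    assert(c.Mass() == 1.0f);
}
```

Moving any of those four definitions into the class body in the header would put the static_assert where FakeBody is incomplete, and the build would fail. The move constructor and move assignment can stay “= default”; they just have to be defaulted in the .cpp file.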
Or just familiarize yourself with how the pimpl pattern works first, and then use this to make it slightly faster.
But this misses the other important reason to use pimpl, which is to hide the size of your implementation from your users, making it easier to perform changes without breaking binary compatibility.
Yep. Good point, don’t use this in headers that you give to people whose code you can’t recompile.
In my case my headers won’t ever leave my project, so I didn’t think about binary compatibility. I was just thinking that I don’t want windows.h to mess up my auto-complete.
But the size of the implementation can vary from system to system, so while your code may compile on your platform, it won’t necessarily compile on another one, right?
Ah, that is a good thought.
Luckily the static assert in the destructor should make sure that the size matches. So if sizes differ on different platforms, you will get a static assert and your code won’t compile. In that case you can do one of two things:
1. Either make sure that the size of your struct is the same on all platforms.
2. Or pass in a different size depending on the platform, maybe using #ifdef or using template metaprogramming.
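A sketch of option 2 without #ifdef: the header derives the numbers from something it can see, here the pointer width. PlatformImpl, kImplSize and kImplAlign are hypothetical, and the formula assumes std::size_t is as wide as a pointer, which holds on mainstream 32- and 64-bit targets but is not guaranteed by the standard. The static_asserts on the implementation side play the same role as the check in the destructor: on a platform where the guess is wrong, the build fails.

```cpp
#include <cstddef>

// ---- header side: cannot see PlatformImpl, so the numbers are derived
// from the pointer width. Assumes the implementation holds two pointers and
// a std::size_t, and that std::size_t is as wide as a pointer.
constexpr std::size_t kImplSize = 3 * sizeof(void *);
constexpr std::size_t kImplAlign = alignof(void *);
// An #if / #elif chain on platform macros would work just as well.

// ---- cpp side: the real definition, plus a check that refuses to compile
// on any platform where the header's guess is wrong.
struct PlatformImpl  // hypothetical
{
    void * handle;
    void * user_data;
    std::size_t count;
};
static_assert(sizeof(PlatformImpl) == kImplSize, "update kImplSize for this platform");
static_assert(alignof(PlatformImpl) == kImplAlign, "update kImplAlign for this platform");
```

The header would then declare its member as ForwardDeclaredStorage<PlatformImpl, kImplSize, kImplAlign>.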
This solution does not just eliminate some runtime overhead.
It is also really practical in some other scenarios:
If you want to use PIMPL for, let’s say, an OS abstraction layer in an embedded RTOS environment, you can only use the dynamic approach at startup, because there is no real memory management and you want to avoid memory fragmentation.
So if you use objects which can’t be created at startup, this approach would help you as well.
In a concrete case:
The RTOS provides a lock function that locks task scheduling, interrupts, etc. to guarantee exclusive access.
So you can put the disabling function into the constructor of the implementation class, and use the lock scope-wise by just constructing the Lock class, with its concrete lock implementation, inside the scope. When the scope is left, the destructor is called, and it calls the function that re-enables scheduling.
Of course there are other solutions and patterns for this problem, but if you want to stay with the pimpl idiom all the way through your abstraction layer, your solution makes it possible to have pimpl-based OS abstractions that simply live on the stack, without any dynamic allocation, in embedded RTOS environments.
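A minimal sketch of such a scoped lock, assuming a hypothetical RTOS API (rtos_disable_scheduling returns the previous state, rtos_enable_scheduling restores it), stubbed here with a counter so it runs on a desktop. To keep the block self-contained it uses a hand-rolled byte buffer with placement new rather than the ForwardDeclaredStorage template; OsLock, OsLockImpl and CriticalSectionDemo are made-up names. The point is that OsLock is a plain stack object, so entering a critical section never touches the heap.

```cpp
#include <cassert>
#include <new>

// Hypothetical RTOS primitives, stubbed with a counter so the sketch runs.
// A real port would call its kernel's scheduler-lock API instead.
static int g_scheduler_lock_depth = 0;
inline int rtos_disable_scheduling() { return g_scheduler_lock_depth++; }
inline void rtos_enable_scheduling(int previous) { g_scheduler_lock_depth = previous; }

// ---- OsLock.h: portable interface, no RTOS headers, no heap ----
struct OsLockImpl;  // defined by each RTOS port

class OsLock
{
public:
    OsLock();   // locks
    ~OsLock();  // unlocks when the scope ends
    OsLock(const OsLock &) = delete;
    OsLock & operator=(const OsLock &) = delete;

private:
    OsLockImpl & Impl();
    // hand-rolled buffer; a ForwardDeclaredStorage member would replace it
    alignas(int) unsigned char storage[sizeof(int)];
};

// ---- OsLock_rtos.cpp: one port of the abstraction ----
struct OsLockImpl
{
    int saved_state;
    OsLockImpl() : saved_state(rtos_disable_scheduling()) {}
    ~OsLockImpl() { rtos_enable_scheduling(saved_state); }
};
static_assert(sizeof(OsLockImpl) == sizeof(int), "update OsLock::storage");
static_assert(alignof(OsLockImpl) == alignof(int), "update OsLock::storage");

OsLockImpl & OsLock::Impl() { return *reinterpret_cast<OsLockImpl *>(storage); }
OsLock::OsLock() { new (static_cast<void *>(storage)) OsLockImpl(); }
OsLock::~OsLock() { Impl().~OsLockImpl(); }

inline void CriticalSectionDemo()
{
    {
        OsLock lock;  // lives on the stack, no allocation
        assert(g_scheduler_lock_depth == 1);
        {
            OsLock nested;
            assert(g_scheduler_lock_depth == 2);
        }
        assert(g_scheduler_lock_depth == 1);
    }
    assert(g_scheduler_lock_depth == 0);
}
```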
Thank you!
Hi Malte.
I don’t know about other containers, but since C++20 this approach doesn’t work with vectors.
Since push_back/emplace_back are now constexpr, there is a compile-time error about “use of undefined type”. I can bypass this by using vector<Widget*> instead of vector<Widget>, which defeats the purpose of using this approach.
Do you know of a way to bypass this?
Thank you!
I’m not sure when exactly you get that error. Can you create a simple example in godbolt that shows the problem?
#include <iostream>
#include <vector>
template<typename T>
struct Cat
{
    ~Cat()
    {
        //Error happens here "use of undefined type"
        std::cout << sizeof(T) << std::endl;
    }
};
struct Man
{
    //Error gets fixed if ManImpl is outside this scope
    struct ManImpl;
    Cat<ManImpl> cat;
};
struct Dog
{
    void AddCat()
    {
        //This compiles fine if I include Man.h
        std::vector<Man> mancats = std::vector<Man>();
    }
};
int main()
{
    auto dog = Dog();
    dog.AddCat();
}
struct ManImpl
{
std::vector Hand;
};
This should reproduce the problem in MSVC, Clang and GCC. I assume it’s because push_back is constexpr now and may run at compile time during template instantiation.
If ManImpl is forward declared outside the Man class, or defined inside Man, the problem goes away. But defining the ManImpl class in Man.h instead of Man.cpp goes against the pimpl idiom.
I do not understand how forward declaring ManImpl outside the Man class scope fixes the issue though. Will you help me understand this?
Is my assumption correct and how do I solve this?
The link:
https://godbolt.org/z/jnx4qrceo
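One likely explanation for why the position of the forward declaration matters, based on ordinary C++ name lookup rather than on re-running the linked example: a struct ManImpl; declared inside Man introduces a new nested type, Man::ManImpl, and the later struct ManImpl { ... }; at namespace scope defines a different type, ::ManImpl. So Cat<ManImpl> inside Man refers to a type that the file never completes. The field name hands below is made up for illustration.

```cpp
#include <type_traits>

struct ManImpl;          // declares ::ManImpl

struct Man
{
    struct ManImpl;      // declares a different type: Man::ManImpl
};

// This completes ::ManImpl only; Man::ManImpl is still incomplete here.
struct ManImpl
{
    int hands = 2;
};

static_assert(!std::is_same<Man::ManImpl, ManImpl>::value,
              "the nested declaration introduces its own type");

// The nested type has to be defined under its qualified name (e.g. in Man.cpp).
struct Man::ManImpl
{
    int hands = 2;
};

static_assert(sizeof(Man::ManImpl) == sizeof(int), "Man::ManImpl is now complete");
```

With the forward declaration at namespace scope instead, Cat<ManImpl> names ::ManImpl, which the definition further down does complete, consistent with the error going away.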
After a little digging, I found out constexpr may not be the issue, but the below definitely is.
[vector.overview]/3
An incomplete type T may be used when instantiating vector if the allocator satisfies the allocator completeness requirements 17.6.3.5.1. T shall be complete before any member of the resulting specialization of vector is referenced.
While declaring a vector<T> with an incomplete T is fine, using any of its member functions requires T to be complete. So vector<T*>::push_back() works, but vector<T>::push_back() produces the “use of undefined type” error.
Assuming we use vector<T*> from now on, this approach is still better than the modern unique_ptr approach, because there is one less indirection. But the Ts are no longer adjacent inside the vector, so there is no cache locality between them. I don’t know how much that affects performance, but at least, when a T is created on the stack, its pimpl will be on the stack too, and vice versa. Still, I’ll probably go back to the old way: a byte buffer inside the class plus a hand-written Rule of Five.
I’m still having trouble understanding the example. Maybe because there is no header/implementation separation in godbolt. Which of these are supposed to be in the header and which are in the implementation file?
AddCat would have to be in the implementation because it has to see ManImpl. But that’s not a C++20 thing or a vector thing.
Most functions on the vector will need to actually see ManImpl. The main thing you get from this is that you can declare a vector in a header without needing to see ManImpl. But when you call push_back or the destructor of std::vector, you need to see the actual object, so that has to happen in an implementation file.
Here is something that compiles:
https://godbolt.org/z/axoTTEfsh
The implementation of the destructors and of AddCat has to see ManImpl.
I understand your point and how it compiles without any issues. But, I’m having the error despite the push_back being in an implementation file with the Man.h header file included.
I think you’ll need to put up a real example somewhere that is split across multiple files. Like on github.
My first thought is that including Man.h shouldn’t matter because we’re talking about a compiler error caused by not seeing ManImpl.h. But maybe you meant ManImpl.h. So my second thought is that the destructor of your “struct Man” can’t be in the header. But now I’m just guessing. This needs a real example.
I made a github project to demonstrate the issue. The link is as follows:
https://github.com/guts117/FowardDeclaredStorageUndefinedTest
I think the move constructor and move assignment of Man have to be in the cpp file instead of the header. They can still be “= default”, just not in the header, because they need to see Impl.
I’m not able to test it now (on my phone) but if it doesn’t work, I can check later.
Moving the move constructor and move assignment definition did solve the issue.
Funnily enough, it did occur to me before that this might be the issue. My mistake was thinking that only the constructor and destructor needed to be in the implementation file, because until now I was using a vector of pointers and everything was working fine.
Thank you very much. You sir are a life saver.