Type-safe Pimpl implementation without overhead
by Malte Skarupke
I like the pimpl idiom because I like to keep my headers as clean as possible, and other people’s headers are dirty. Unfortunately the pimpl idiom never feels like a good solution because it has runtime overhead that wouldn’t be needed if I didn’t care about clean headers so much.
If you’re not familiar with the Pimpl idiom: it stands for “pointer to implementation”, and you use it in C/C++ headers so that your class can use another class without your header having to include that class’s header. You can also use it to hide your implementation from your users, so that you can change the internals of your class without anybody having to know. It’s used all over the place, but it has one disadvantage: every object needs an extra heap allocation, and every method call performs an extra pointer dereference.
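For comparison, a conventional pimpl looks roughly like the sketch below. The names Widget and Impl are hypothetical, and the header and .cpp parts are shown in one file, separated by comments. The std::make_unique call is the extra heap allocation, and impl-> is the extra dereference on every call.

```cpp
#include <cassert>
#include <memory>

// ---- widget.h: users only see a forward declaration of Impl ----
class Widget
{
public:
    Widget();
    ~Widget();                  // must be defined where Impl is complete
    int value() const;
private:
    struct Impl;                // forward declaration only
    std::unique_ptr<Impl> impl; // one heap allocation per Widget
};

// ---- widget.cpp: the implementation details live here ----
struct Widget::Impl
{
    int value = 42;
};

Widget::Widget() : impl(std::make_unique<Impl>()) {}
Widget::~Widget() = default;
int Widget::value() const { return impl->value; } // extra pointer dereference
```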
This code fixes that, so that there can be zero runtime overhead. Here’s how to use it:
class btRigidBody;

class MyRigidBody
{
    // ...
    ForwardDeclaredStorage<btRigidBody, 768> bulletBody;
};
And the code is below:
#pragma once

#include <cstddef> // std::size_t
#include <new>     // placement new
#include <utility> // std::forward, std::move

namespace detail
{
template<std::size_t ExpectedSize, std::size_t ActualSize, std::size_t ExpectedAlignment, std::size_t ActualAlignment>
inline void compare_size()
{
    static_assert(ExpectedSize == ActualSize, "The size for the ForwardDeclaredStorage is wrong");
    static_assert(ExpectedAlignment == ActualAlignment, "The alignment for the ForwardDeclaredStorage is wrong");
}
template<std::size_t ExpectedSize, std::size_t ActualSize, std::size_t ExpectedAlignment, std::size_t ActualAlignment>
struct size_comparer
{
    inline size_comparer()
    {
        // going through one additional layer to get good error messages
        // if I put the assert down one more template layer, gcc will show the
        // sizes in the error message
        compare_size<ExpectedSize, ActualSize, ExpectedAlignment, ActualAlignment>();
    }
};
}

struct forwarding_constructor {};

template<typename T, std::size_t Size, std::size_t Alignment = 16>
struct ForwardDeclaredStorage
{
    ForwardDeclaredStorage() { new (&Get()) T(); }
    template<typename... Args>
    ForwardDeclaredStorage(forwarding_constructor, Args &&... args)
    {
        new (&Get()) T(std::forward<Args>(args)...);
    }
    ForwardDeclaredStorage(const ForwardDeclaredStorage & other) { new (&Get()) T(other.Get()); }
    ForwardDeclaredStorage(const T & other) { new (&Get()) T(other); }
    ForwardDeclaredStorage(ForwardDeclaredStorage && other) { new (&Get()) T(std::move(other.Get())); }
    ForwardDeclaredStorage(T && other) { new (&Get()) T(std::move(other)); }
    ForwardDeclaredStorage & operator=(const ForwardDeclaredStorage & other)
    {
        Get() = other.Get();
        return *this;
    }
    ForwardDeclaredStorage & operator=(const T & other)
    {
        Get() = other;
        return *this;
    }
    ForwardDeclaredStorage & operator=(ForwardDeclaredStorage && other)
    {
        Get() = std::move(other.Get());
        return *this;
    }
    ForwardDeclaredStorage & operator=(T && other)
    {
        Get() = std::move(other);
        return *this;
    }
    ~ForwardDeclaredStorage()
    {
        // sizeof(T) and alignof(T) need the complete type, so the destructor
        // may only be instantiated where T is defined
        detail::size_comparer<Size, sizeof(T), Alignment, alignof(T)> compare_size{};
        Get().~T();
    }
    T & Get() { return reinterpret_cast<T &>(storage); }
    const T & Get() const { return reinterpret_cast<const T &>(storage); }
private:
    // alignas is the standard spelling of gcc's __attribute__((aligned(Alignment)))
    alignas(Alignment) unsigned char storage[Size];
};
This uses a well-known hack: you put the necessary storage into your class and then placement-new the forward-declared object into that storage. The benefit of this template is that it’s all type-safe, and the compiler-generated copy/move constructors, destructor and assignment operators of the class that contains it all do the right thing.
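I use the forwarding_constructor struct as a required argument for the forwarding constructor, because a perfect-forwarding constructor can otherwise hijack overload resolution: when copying from a non-const ForwardDeclaredStorage lvalue, for example, the template deduces ForwardDeclaredStorage& (an exact match) and wins over the copy constructor, which takes a const reference. You use it like this: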
ForwardDeclaredStorage<widget, 32, 8> a(forwarding_constructor{}, arg1, arg2); // 32 and 8 are placeholders for sizeof(widget) and alignof(widget)
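The overload-resolution problem can be reproduced in a few lines. The two structs below are hypothetical and only record which constructor ran; Tagged mirrors the post's forwarding_constructor approach. Without a tag, copying a non-const lvalue picks the variadic template, because Args&&... deduces Untagged& while the copy constructor needs a const Untagged&.

```cpp
#include <cassert>
#include <string>

// Hypothetical wrapper without a tag: the variadic template competes with
// the copy constructor for every single-argument construction.
struct Untagged
{
    std::string chosen;
    Untagged() : chosen("default") {}
    Untagged(const Untagged &) : chosen("copy") {}
    template<typename... Args>
    Untagged(Args &&...) : chosen("forwarding") {}
};

// Tag type, mirroring the post's forwarding_constructor.
struct forwarding_constructor {};

// The same wrapper with a tag: the template is only viable when the tag
// is passed explicitly, so copies always use the copy constructor.
struct Tagged
{
    std::string chosen;
    Tagged() : chosen("default") {}
    Tagged(const Tagged &) : chosen("copy") {}
    template<typename... Args>
    Tagged(forwarding_constructor, Args &&...) : chosen("forwarding") {}
};
```

With Untagged, `Untagged b(a);` for a non-const `a` runs the forwarding constructor; only a const source reaches the copy constructor. With Tagged, `Tagged f(e);` always copies, and the forwarding constructor runs only for calls like `Tagged g(forwarding_constructor{}, 1, 2.0);`.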
The downside of ForwardDeclaredStorage compared to regular pimpl is that you have to keep the size given in the template in sync with the struct that you’re forward-declaring, so a change in the implementation can still cause a recompilation of all users. In my case that doesn’t matter, because I use this to hide the libraries that I’m using, and the size of their structs only changes when I update the library version. And there’s a static_assert to prevent me from getting the size or the alignment wrong. (Funny thing: I only added the assert for alignment because I felt bad about publishing this when I was only checking the size. It turns out that the code example from the beginning of the post is actually incorrect, because the btRigidBody class is 64-byte aligned for reasons unknown and unenforced.)
The license for this code is this:
This is free and unencumbered software released into the public domain. Anyone is free to copy, modify, publish, use, compile, sell, or distribute this software, either in source code form or as a compiled binary, for any purpose, commercial or non-commercial, and by any means. In jurisdictions that recognize copyright laws, the author or authors of this software dedicate any and all copyright interest in the software to the public domain. We make this dedication for the benefit of the public at large and to the detriment of our heirs and successors. We intend this dedication to be an overt act of relinquishment in perpetuity of all present and future rights to this software under copyright law. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. For more information, please refer to <http://unlicense.org/>
Are you certain that there’s no undefined behavior here, in the implicit instantiation of a template with an incomplete type as its type parameter? I think this is safe, but I’m uncertain.
It is totally legal to instantiate a template with an incomplete type. The compiler will only complain if you actually try to use the type in a way that needs its definition while it’s still incomplete. So the constructor, the destructor and the other members have to be defined in files that include the type’s definition. That way you can, for example, also use std::unique_ptr for pimpl, and even std::vector and other complex templates will work. But std::optional would not work, because it stores the object inline, so the compiler needs the complete type just to know how big the optional is.
>”So the constructor and destructor and others have to be defined in files that include the type”
I just do not get this. How do you expect the destructor to use the sizeof operator on an incomplete type?
~ForwardDeclaredStorage()
{
detail::size_comparer<Size, sizeof(T), Alignment, alignof(T)> compare_size{};
Get().~T();
}
The only way I can see this working is to include the definition of T in the “file that uses that type”. But then what is the point of all this?
It’s for the pimpl pattern: In the header you name a type but you don’t define it. In the cpp file you implement the type. Now anyone who is including your header doesn’t need to know the implementation details, because they can’t see the type that’s used. See the example with MyRigidBody and btRigidBody in the blog post: btRigidBody comes from the Bullet Physics library, which I don’t want to include in my header. So I forward declare it instead. In my cpp file I still have to include the bullet physics header, but none of the users of MyRigidBody have to include it.
Oh and of course all constructors and the destructor of MyRigidBody have to be in the cpp file as well, they can’t be in the header. If the destructor of MyRigidBody was defined in the header, then I would get a compiler error on that sizeof that you’re pointing out. But if the destructor is defined in the cpp file, where I included the bullet physics header, everything will work. The easiest way to see how it works would be to try it.
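A single-file sketch of that layout follows, with the header part and the .cpp part marked by comments. To stay self-contained it uses InlineStorage, a hypothetical trimmed-down stand-in for ForwardDeclaredStorage (default construction and destruction only), and PhysicsBody, a hypothetical replacement for btRigidBody; the size of 8 and alignment of 4 assume two 4-byte floats.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical trimmed stand-in for ForwardDeclaredStorage.
template<typename T, std::size_t Size, std::size_t Alignment>
struct InlineStorage
{
    InlineStorage() { new (&Get()) T(); }
    ~InlineStorage()
    {
        // sizeof/alignof need the complete type, so this destructor must
        // only be used where T has been defined.
        static_assert(Size == sizeof(T), "wrong size");
        static_assert(Alignment == alignof(T), "wrong alignment");
        Get().~T();
    }
    InlineStorage(const InlineStorage &) = delete;
    InlineStorage & operator=(const InlineStorage &) = delete;
    T & Get() { return reinterpret_cast<T &>(storage); }
private:
    alignas(Alignment) unsigned char storage[Size];
};

// ---- my_rigid_body.h: only a forward declaration of the library type ----
struct PhysicsBody; // hypothetical stand-in for btRigidBody

class MyRigidBody
{
public:
    MyRigidBody();  // defined in the .cpp part
    ~MyRigidBody(); // an inline definition here would break every file that
                    // includes the header without seeing PhysicsBody
    float Mass();
private:
    InlineStorage<PhysicsBody, 8, 4> body; // assumes two 4-byte floats
};

// ---- my_rigid_body.cpp: the library header would be included here ----
struct PhysicsBody
{
    float mass = 1.5f;
    float restitution = 0.5f;
};

MyRigidBody::MyRigidBody() = default;
MyRigidBody::~MyRigidBody() = default;
float MyRigidBody::Mass() { return body.Get().mass; }
```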
Or just familiarize yourself with how the pimpl pattern works first, and then use this to make it slightly faster.
But this misses the other important reason to use pimpl, which is to hide the size of your implementation from your users, making it easier to perform changes without breaking binary compatibility.
Yep. Good point, don’t use this in headers that you give to people whose code you can’t recompile.
In my case my headers won’t ever leave my project, so I didn’t think about binary compatibility. I was just thinking that I don’t want windows.h to mess up my auto-complete.
But the size of the implementation can vary from system to system, so code that compiles on your platform won’t necessarily compile on another, right?
Ah, that is a good thought.
Luckily the static assert in the destructor should make sure that the size matches. So if sizes differ on different platforms, you will get a static assert and your code won’t compile. In that case you can do one of two things:
1. Either make sure that the size of your struct is the same on all platforms.
2. Or pass in a different size depending on the platform, maybe using #ifdef or using template metaprogramming.
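Option 2 might look like the sketch below. NativeHandle is a hypothetical implementation type whose size depends on the pointer width; the values 16 and 8 assume a typical 64-bit or 32-bit target. Variant B uses a constexpr expression rather than template metaprogramming proper. In a real header only the constants and a forward declaration would be visible, and the static_asserts here stand in for the check in ForwardDeclaredStorage’s destructor.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical implementation type: a pointer plus a 32-bit id, i.e. 16 bytes
// (8-aligned) on typical 64-bit targets and 8 bytes (4-aligned) on 32-bit ones.
struct NativeHandle
{
    void * handle;
    std::uint32_t id;
};

// Variant A: select the size with the preprocessor.
#if UINTPTR_MAX == UINT64_MAX
constexpr std::size_t kHandleSizeFromIfdef = 16;
#else
constexpr std::size_t kHandleSizeFromIfdef = 8;
#endif

// Variant B: compute it at compile time from the pointer width, which a
// header can know without seeing NativeHandle.
constexpr std::size_t kHandleSizeConstexpr = sizeof(void *) == 8 ? 16 : 8;
constexpr std::size_t kHandleAlign = alignof(void *);

// A header would then declare, e.g.
//   ForwardDeclaredStorage<NativeHandle, kHandleSizeConstexpr, kHandleAlign> handle;
// These asserts mirror the check that catches a mismatch.
static_assert(kHandleSizeFromIfdef == sizeof(NativeHandle), "size mismatch");
static_assert(kHandleSizeConstexpr == sizeof(NativeHandle), "size mismatch");
static_assert(kHandleAlign == alignof(NativeHandle), "alignment mismatch");
```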
This solution does not just eliminate some runtime overhead.
It is also really practical in some other scenarios:
If you want to use PIMPL for, let’s say, an OS abstraction layer in an embedded RTOS environment, you can only use the dynamic-allocation approach at startup, because there is no real memory management and you want to avoid memory fragmentation.
So if you use objects which can’t be created at startup, this approach would help you as well.
In a concrete case:
The RTOS provides a lock function that disables task scheduling, interrupts, etc. to guarantee exclusive access.
So you can pack the disabling function into the constructor of the implementation class, and use the lock scope-wise by simply constructing the Lock class, with its concrete lock implementation, inside a scope. When the scope is left, the destructor is called, and it calls the function that re-enables scheduling.
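A minimal sketch of that scoped lock, assuming hypothetical RTOS hooks rtos_disable_scheduling() and rtos_enable_scheduling(), simulated here with a counter so the code runs on a desktop. Instead of ForwardDeclaredStorage it hand-rolls the same idea (an aligned buffer inside Lock plus placement new), so a Lock on the stack never touches the heap.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical RTOS hooks, simulated with a counter.
static int g_scheduler_lock_depth = 0;
void rtos_disable_scheduling() { ++g_scheduler_lock_depth; }
void rtos_enable_scheduling() { --g_scheduler_lock_depth; }

// ---- os_lock.h: the abstraction layer only forward-declares the implementation ----
struct LockImpl;

class Lock
{
public:
    Lock();  // defined next to LockImpl
    ~Lock();
    Lock(const Lock &) = delete;
    Lock & operator=(const Lock &) = delete;
private:
    LockImpl & Impl();
    // Assumed size/alignment of LockImpl, checked by the static_asserts below.
    alignas(alignof(int)) unsigned char storage[sizeof(int)];
};

// ---- os_lock_rtos.cpp: the platform-specific part ----
struct LockImpl
{
    int previous_depth; // hypothetical saved state, e.g. what a real RTOS call returns
    LockImpl() : previous_depth(g_scheduler_lock_depth) { rtos_disable_scheduling(); }
    ~LockImpl() { rtos_enable_scheduling(); }
};
static_assert(sizeof(LockImpl) == sizeof(int), "update Lock::storage");
static_assert(alignof(LockImpl) == alignof(int), "update Lock::storage");

LockImpl & Lock::Impl() { return reinterpret_cast<LockImpl &>(storage); }
Lock::Lock() { new (storage) LockImpl(); } // placement new, no heap allocation
Lock::~Lock() { Impl().~LockImpl(); }      // leaving the scope re-enables scheduling
```

Usage is purely scope-based: `{ Lock lock; /* critical section */ }` disables scheduling on entry and re-enables it when the block ends.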
Of course there are other solutions and patterns for this problem, but if you want to stay with the pimpl idiom throughout your abstraction layer, your solution enables PIMPL-based OS abstractions that simply live on the stack, without any dynamic allocation, in embedded RTOS environments.
Thank you!