A faster implementation of std::function
by Malte Skarupke
As I wrote in my last post, I consider std::function to be a very important class that will change how you design your code, because it means that you have to use inheritance less often. In that post I was very impressed with the performance of std::function when compiled with optimizations. Unfortunately std::function can be far slower than a virtual function call in debug.
I wanted a std::function implementation that doesn’t have too big of a performance impact on your application when debugging it, so I wrote my own. It is also faster than all other implementations that I could find in release mode.
The code is in the public domain (I want all library writers to start using it) and here is a download link.
I mainly looked at the implementations of Visual Studio and libstdc++. Both use a small functor optimization, where small functors are stored inside the std::function itself rather than allocated on the heap. libstdc++ only uses that optimization for function pointers and member function pointers, though. Visual Studio uses it for anything that is 24 bytes or smaller. That is incorrect: a functor whose move constructor can throw may then end up in the inline buffer, and swapping two such objects can throw, so they can no longer offer a noexcept swap. Since Visual Studio currently doesn’t support noexcept, my best guess is that they didn’t notice. I chose to use 16 bytes for the small functor optimization, and I only place objects in that buffer if they are no-throw move constructible.
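To make that rule concrete, here is a minimal sketch of how such a check could be written with type traits. It is not taken from the downloadable code: the names small_buffer and fits_inline, and the choice of pointer alignment for the buffer, are assumptions made for illustration.

```cpp
#include <type_traits>

// Hypothetical inline buffer: 16 bytes as in the post; pointer alignment is an assumption.
struct small_buffer {
    alignas(void*) unsigned char bytes[16];
};

// True when F may live inside the buffer: it must fit, must not need stricter
// alignment, and its move constructor must not throw (so swap can stay noexcept).
template <typename F>
struct fits_inline
    : std::integral_constant<bool,
          sizeof(F) <= sizeof(small_buffer) &&
          alignof(F) <= alignof(small_buffer) &&
          std::is_nothrow_move_constructible<F>::value> {};

// Small, but its move constructor is allowed to throw.
struct throwing_move {
    throwing_move() = default;
    throwing_move(throwing_move&&) noexcept(false) {}
};

// No-throw move, but larger than 16 bytes.
struct too_big { char data[32]; };

static_assert(fits_inline<int (*)(int)>::value, "function pointers fit inline");
static_assert(!fits_inline<throwing_move>::value, "a throwing move goes to the heap");
static_assert(!fits_inline<too_big>::value, "large functors go to the heap");
```

Anything that fails the check would be allocated on the heap, where a swap only has to exchange pointers and therefore cannot throw.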
I use the 8 bytes that I save compared to Visual Studio to store a function pointer in the object itself, which lets me call the provided functor directly. I took that optimization from libstdc++. All other operations use virtual function calls, but storing the pointer for operator() directly instead of in a vtable means one less pointer indirection and one less possible cache miss.
Except that I don’t actually use virtual functions. Instead I use a manager function that acts much like a vtable. That optimization comes from Boost, and it prevents RTTI code bloat. We never need RTTI for the internals of a std::function, but there is no way to prevent the compiler from generating the relevant information if you have a virtual function in your class. So you hand-roll your own virtual function implementation.
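The following sketch shows how a directly stored call pointer and a manager function can fit together. It is a simplified illustration, not the code from the download: the class small_function, the op enum and the helper add_n are hypothetical names, the signature is fixed to int(int), and functors that do not fit the 16-byte buffer are rejected at compile time instead of being moved to the heap.

```cpp
#include <cassert>
#include <new>
#include <type_traits>
#include <utility>

// Hypothetical, simplified wrapper for int(int) functors that fit in 16 bytes.
class small_function {
    enum class op { destroy, move_to };
    using call_type = int (*)(const void*, int);
    using manager_type = void (*)(op, void* self, void* other);

    alignas(void*) unsigned char storage_[16];
    call_type call_;        // pointer for operator(), stored in the object itself
    manager_type manager_;  // one function that stands in for a vtable

    template <typename F>
    static int invoke(const void* storage, int arg) {
        return (*static_cast<const F*>(storage))(arg);
    }

    // All operations other than the call are dispatched through this function.
    template <typename F>
    static void manage(op what, void* self, void* other) {
        F* functor = static_cast<F*>(self);
        switch (what) {
        case op::destroy: functor->~F(); break;
        case op::move_to: ::new (other) F(std::move(*functor)); break;
        }
    }

public:
    template <typename F>
    explicit small_function(F f) : call_(&invoke<F>), manager_(&manage<F>) {
        static_assert(sizeof(F) <= sizeof(storage_), "sketch handles small functors only");
        static_assert(alignof(F) <= alignof(void*), "sketch assumes pointer alignment");
        static_assert(std::is_nothrow_move_constructible<F>::value, "move must not throw");
        ::new (static_cast<void*>(storage_)) F(std::move(f));
    }

    small_function(small_function&& other) noexcept
        : call_(other.call_), manager_(other.manager_) {
        manager_(op::move_to, other.storage_, storage_);
    }

    small_function(const small_function&) = delete;
    small_function& operator=(const small_function&) = delete;

    ~small_function() { manager_(op::destroy, storage_, nullptr); }

    // One indirect call through call_, with no vtable lookup in between.
    int operator()(int arg) const { return call_(storage_, arg); }
};

// Example functor used only for this sketch.
struct add_n {
    int n;
    int operator()(int x) const { return x + n; }
};

void small_function_example() {
    small_function add_five(add_n{5});
    assert(add_five(3) == 8);
    small_function moved(std::move(add_five));  // goes through the manager
    assert(moved(10) == 15);
}
```

Because no class in the sketch has a virtual function, the compiler has no reason to emit type information for the internals. A complete implementation would add more operations to the manager, such as copying.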
One optimization that I came up with myself is to have no conditional in operator(). The first line in all other implementations looks like this: “if (!*this) throw bad_function_call();” The optimization is to put that functionality inside a functor that gets assigned to an empty std::function, so that operator() looks the same whether or not the std::function is empty. The disadvantage of this is that operator bool() needs a virtual function call in my implementation. I consider that a good trade-off, as I expect functions to be checked far less often than they are called.
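A minimal sketch of that idea, under heavy simplification: the class callback below is hypothetical, only holds plain function pointers of type int(int), and is not the code from the download. Note one deliberate difference from the description above: this sketch answers operator bool() by comparing pointers rather than by asking a manager function.

```cpp
#include <cassert>
#include <functional>  // std::bad_function_call

// Hypothetical wrapper around a plain int(int) function pointer.
class callback {
    using target_type = int (*)(int);
    using call_type = int (*)(target_type, int);

    target_type target_;
    call_type call_;

    static int call_target(target_type target, int x) { return target(x); }
    // Installed when there is no target: the "empty" behavior lives here.
    static int call_empty(target_type, int) { throw std::bad_function_call(); }

public:
    callback() : target_(nullptr), call_(&call_empty) {}
    callback(target_type target)
        : target_(target), call_(target ? &call_target : &call_empty) {}

    // No emptiness check here: the call path is the same in both states.
    int operator()(int x) const { return call_(target_, x); }

    explicit operator bool() const { return call_ != &call_empty; }
};

int increment(int x) { return x + 1; }

void callback_example() {
    callback empty;
    callback inc(&increment);
    assert(!static_cast<bool>(empty));
    assert(inc(41) == 42);
}
```

Calling the empty object still throws std::bad_function_call, but the cost of that check is only paid by the empty case.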
To get it to run slightly faster in debug I made sure that there were as few function calls as possible between calling operator() and your functor being called. The biggest performance problems in debug were in Visual Studio. Since Visual Studio seems to inline std::forward even in debug, I allowed myself the simplification of using std::forward.
Here are some measurements for performance:
| Compiler and configuration | Virtual function | std::function | My std::function |
|---|---|---|---|
| Visual Studio 2012 Debug | 1917 | 4195 | 3262 |
| Visual Studio 2012 Release | 800 | 770 | 630 |
| Clang 3.1 -O0 -g3 | 1864 | 3161 | 2513 |
| Clang 3.1 -O3 | 564 | 474 | 456 |
| GCC 4.7.2 -O0 -g3 | 1755 | 3363 | 2587 |
| GCC 4.7.2 -O3 | 555 | 466 | 431 |
(Smaller is better. Time in milliseconds, but that doesn’t really matter because I used different machines. So only compare within a row, not across rows)
The code for the tests is here. I rotated the order of the tests (to reduce the effect of the cache and of heap fragmentation) and chose the shortest time that each implementation was able to achieve.
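The measuring scheme described here can be sketched roughly as follows. This is not the linked test code: the function best_times_ms and its parameters are made up for illustration, and the real tests may rotate and time things differently.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <limits>
#include <vector>

// Runs every candidate once per round, starting with a different candidate
// each round, and keeps the shortest time seen for each candidate.
std::vector<double> best_times_ms(const std::vector<std::function<void()>>& candidates,
                                  std::size_t rounds) {
    assert(rounds > 0);
    std::vector<double> best(candidates.size(),
                             std::numeric_limits<double>::infinity());
    for (std::size_t round = 0; round < rounds; ++round) {
        for (std::size_t i = 0; i < candidates.size(); ++i) {
            // Rotate the starting point so no candidate always runs first.
            std::size_t index = (round + i) % candidates.size();
            auto start = std::chrono::steady_clock::now();
            candidates[index]();
            auto stop = std::chrono::steady_clock::now();
            double ms = std::chrono::duration<double, std::milli>(stop - start).count();
            best[index] = std::min(best[index], ms);
        }
    }
    return best;
}
```

Taking the minimum rather than the average discards runs that were disturbed by other activity on the machine.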
I only tested with lambdas that benefit from the small functor optimization because I think that those will be most common.
My implementation beats std::function in every row, and it beats a virtual function call in optimized builds. In debug builds it is roughly 50% slower than a virtual function call on average (between about 35% and 70%, depending on the compiler). That is not ideal, but it is better than the other implementations, and I think I can live with it. The benefits of std::function still far outweigh the downsides, and I think that other code will slow you down more in debug.
You can define FUNC_NO_EXCEPTIONS to prevent the function from using exceptions. In that case you will get a segmentation fault instead when calling an empty std::function. You can define FUNC_NO_RTTI to compile the std::function if you have turned off RTTI in your compiler. In that case the target_type() and target() member functions will not be available. That one came from libc++.
Here is the download link again and now go and experiment with std::function. One thing I want to try is to introduce Python’s function decorators into C++. Not in terms of syntax, but in terms of semantics. Doing that would have been very difficult with virtual functions, but should be trivial with a std::function. And they should be very fast if you apply the decorator before you store your functor in a std::function.
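As a rough illustration of what decorator semantics could look like, here is a minimal sketch that assumes C++14 (for generic lambdas and return type deduction). The decorator counting is a made-up example, not something from the download. It wraps a functor before the result is stored in a std::function, so the wrapped call itself does not go through a second type-erased layer.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical decorator: returns a new callable that counts calls to f.
template <typename F>
auto counting(F f, int& counter) {
    return [f = std::move(f), &counter](auto&&... args) -> decltype(auto) {
        ++counter;
        return f(std::forward<decltype(args)>(args)...);
    };
}

void decorator_example() {
    int calls = 0;
    // Decorate first, then store: only one type-erased call per invocation.
    std::function<int(int)> square = counting([](int x) { return x * x; }, calls);
    assert(square(4) == 16);
    assert(calls == 1);
}
```

The counter is captured by reference, so it has to outlive the decorated function.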