Variadic Coroutines in C++ and D
by Malte Skarupke
In my last post I implemented a tiny coroutine class that is fully functional, but it doesn’t support arguments and return values. This time I provide an implementation which adds those features as well as exception safety.
I also used this as an example to compare D and C++; as an excuse to learn C++’s variadic template syntax and as a reason to see how you can improve on boost’s implementation by using variadic templates.
Let’s start off with the punchline. Here is my coroutine implementation in C++:
#pragma once #include <ucontext.h> #include <cstdint> #include <exception> #include <functional> #include <memory> #include <tuple> #ifndef CORO_DEFAULT_STACK_SIZE #define CORO_DEFAULT_STACK_SIZE SIGSTKSZ #endif namespace coro { namespace detail { /** * the coroutine_context holds the stack context and the current state * of the coroutine. it also starts the coroutine */ struct coroutine_context { coroutine_context(size_t stack_size, void (*coroutine_call)()) : stack{new unsigned char[stack_size]} { // create a context for the callee getcontext(&callee); callee.uc_link = &caller; // tell it to return to me when done callee.uc_stack.ss_size = stack_size; callee.uc_stack.ss_sp = stack.get(); // pass the this pointer as two arguments, because officially // makecontext will pass the arguments as ints (even though // inofficially it would work with a single argument) makecontext(&callee, coroutine_call, 2, reinterpret_cast<size_t>(this) >> 32, this); } void operator()() { if (returned) throw "This coroutine has already finished"; swapcontext(&caller, &callee); // continue here if yielded or returned if (exception) { std::rethrow_exception(std::move(exception)); } } protected: ucontext_t caller; ucontext_t callee; std::unique_ptr<unsigned char[]> stack; std::exception_ptr exception; bool returned = false; }; // a type that can store both objects and references template<typename T> struct any_storage { any_storage() = default; any_storage(T to_store) : stored{std::move(to_store)} { } any_storage & operator=(T to_store) { stored = std::move(to_store); return *this; } operator T &&() { return std::move(stored); } private: T stored = T{}; }; // specialization for void template<> struct any_storage<void> { }; // specialization for lvalue references template<typename T> struct any_storage<T &> { any_storage() = default; any_storage(T & to_store) : stored{&to_store} { } any_storage & operator=(T & to_store) { stored = &to_store; return *this; } operator T &() { return *stored; } private: T * stored = nullptr; }; // specialization for rvalue references template<typename T> struct any_storage<T &&> { any_storage() = default; any_storage(T && to_store) : stored{&to_store} { } any_storage & operator=(T && to_store) { stored = &to_store; return *this; } operator T &&() { return *stored; } private: T * stored = nullptr; }; // implements the shared code among all specializations of coroutine_yielder template<typename Result, typename... Arguments> struct coroutine_yielder_base : protected coroutine_context { coroutine_yielder_base(size_t stack_size, void (*coroutine_call)()) : coroutine_context{stack_size, coroutine_call} { } std::tuple<Arguments...> yield() { swapcontext(&this->callee, &this->caller); return std::move(this->arguments); } protected: any_storage<Result> result; std::tuple<any_storage<Arguments>...> arguments; }; /** * The coroutine_yielder is responsible for providing a yield * function */ template<typename Result, typename... Arguments> struct coroutine_yielder : protected coroutine_yielder_base<Result, Arguments...> { coroutine_yielder(size_t stack_size, void (*coroutine_call)()) : coroutine_yielder_base<Result, Arguments...>{stack_size, coroutine_call} { } std::tuple<Arguments...> yield(Result && result) { this->result = std::forward<Result>(result); return coroutine_yielder_base<Result, Arguments...>::yield(); } // copying yield std::tuple<Arguments...> yield(const Result & result) { this->result = result; return coroutine_yielder_base<Result, Arguments...>::yield(); } }; // specialization for void template<typename... Arguments> struct coroutine_yielder<void, Arguments...> : coroutine_yielder_base<void, Arguments...> { coroutine_yielder(size_t stack_size, void (*coroutine_call)()) : coroutine_yielder_base<void, Arguments...>{stack_size, coroutine_call} { } }; // specialization for lvalue reference template<typename Result, typename... Arguments> struct coroutine_yielder<Result &, Arguments...> : protected coroutine_yielder_base<Result &, Arguments...> { coroutine_yielder(size_t stack_size, void (*coroutine_call)()) : coroutine_yielder_base<Result &, Arguments...>{stack_size, coroutine_call} { } std::tuple<Arguments...> yield(Result & result) { this->result = result; return coroutine_yielder_base<Result &, Arguments...>::yield(); } }; // specialization for rvalue reference template<typename Result, typename... Arguments> struct coroutine_yielder<Result &&, Arguments...> : protected coroutine_yielder_base<Result &&, Arguments...> { coroutine_yielder(size_t stack_size, void (*coroutine_call)()) : coroutine_yielder_base<Result &&, Arguments...>{stack_size, coroutine_call} { } std::tuple<Arguments...> yield(Result && result) { this->result = std::move(result); return coroutine_yielder_base<Result &&, Arguments...>::yield(); } }; } template<typename Result, typename... Arguments> struct coroutine; template<typename Result, typename... Arguments> struct coroutine<Result (Arguments...)> : detail::coroutine_yielder<Result, Arguments...> { private: template<int N, int... S> struct starter; typedef starter<sizeof...(Arguments)> Starter; public: typedef typename detail::coroutine_yielder<Result, Arguments...> self; coroutine(std::function<Result (self &, Arguments...)> func, size_t stack_size = CORO_DEFAULT_STACK_SIZE) : detail::coroutine_yielder<Result, Arguments...>{stack_size, reinterpret_cast<void (*)()>(&Starter::coroutine_start)} , func{std::move(func)} { } // I don't need to specify these. the default behavior would result // in the same constructors and assignment operators being generated // but you get better error messages if I'm explicit about it coroutine(const coroutine &) = delete; coroutine & operator=(const coroutine &) = delete; coroutine(coroutine &&) = default; coroutine & operator=(coroutine &&) = default; operator bool() const { return !this->returned; } Result operator()(Arguments... args) { this->arguments = std::make_tuple(detail::any_storage<Arguments>{std::forward<Arguments>(args)}...); detail::coroutine_yielder<Result, Arguments...>::operator()(); return Starter::Finisher::return_result(*this); } private: // returning a value needs to be specialized for void return // this struct handles that template<typename R, int... S> struct finisher { static R return_result(coroutine & this_) { return std::forward<Result>(this_.result); } static void start_and_store_result(coroutine & caller) { caller.result = caller.coroutine_start<S...>(); } }; template<int... S> struct finisher<void, S...> { static void return_result(coroutine &) { } static void start_and_store_result(coroutine & caller) { caller.coroutine_start<S...>(); } }; template<int ...ArgCountList> Result coroutine_start() { return func(*this, std::forward<Arguments>(std::get<ArgCountList>(this->arguments))...); } // the calling of the coroutine needs to be specialized by // the number of arguments. this struct handles that template<int N, int... S> struct starter : starter<N - 1, N - 1, S...> { }; template<int... S> struct starter<0, S...> { typedef finisher<Result, S...> Finisher; static void coroutine_start(uint32_t this_pointer_left_half, uint32_t this_pointer_right_half) { coroutine & this_ = *reinterpret_cast<coroutine *>((static_cast<size_t>(this_pointer_left_half) << 32) + this_pointer_right_half); try { Finisher::start_and_store_result(this_); } catch(...) { this_.exception = std::current_exception(); } this_.returned = true; } }; std::function<Result (self &, Arguments...)> func; }; }
And here it is in D:
module coro.coroutine; import coro.ucontext; import std.algorithm; import std.conv; import std.typecons; template VariadicForward(string before, string toforward, string after, size_t i) { static if (i == 0) { immutable string VariadicForward = before ~ after; } else static if (i == 1) { immutable string VariadicForward = VariadicForward!(before, toforward, "move(" ~ toforward ~ "[" ~ to!string(i - 1) ~ "])" ~ after, i - 1); } else { immutable string VariadicForward = VariadicForward!(before, toforward, ", move(" ~ toforward ~ "[" ~ to!string(i - 1) ~ "])" ~ after, i - 1); } } class coroutine(Result, Arguments...) { private alias Tuple!Arguments.Types Types; this(Result delegate(coroutine!(Result, Arguments), Types) func, size_t stack_size = 8192) { this.func = func; stack = new ubyte[stack_size]; getcontext(&callee); callee.uc_link = &caller; callee.uc_stack.ss_size = stack_size; callee.uc_stack.ss_sp = stack.ptr; extern(C) void function() call = cast(void function())&coroutine_start; makecontext(&callee, call, 2, cast(size_t)(cast(void *)this) >> 32, this); } Result opCall(Types arguments) { if (returned) throw new Throwable("This coroutine has already finished"); static if (arguments.length) { foreach(i, ref argument; arguments) this.arguments[i] = move(argument); } swapcontext(&caller, &callee); if (exception) { throw exception; } else static if (!is(Result == void)) { return move(result); } } @property bool callable() const { return !returned; } static if (is(Result == void)) { alias yield_common yield; } else { Tuple!Arguments yield(ref Result result) { this.result = result; return yield_common(); } Tuple!Arguments yield(Result result) { this.result = move(result); return yield_common(); } } private Tuple!Arguments yield_common() { swapcontext(&callee, &caller); mixin(VariadicForward!("return Tuple!Arguments(", "arguments", ");", Types.length)); } private: Result delegate(coroutine!(Result, Arguments), Types) func; ubyte[] stack; ucontext_t caller; ucontext_t callee; Throwable exception; bool returned = false; static if (!is(Result == void)) { Result result; } Types arguments; extern(C) static void coroutine_start(uint this_pointer_left_half, uint this_pointer_right_half) { coroutine this_ = cast(coroutine)cast(void *)(((cast(size_t)this_pointer_left_half) << 32) + this_pointer_right_half); try { static if (is(Result == void)) { mixin(VariadicForward!("this_.func(this_, ", "this_.arguments", ");", Types.length)); } else { mixin(VariadicForward!("this_.result = this_.func(this_, ", "this_.arguments", ");", Types.length)); } } catch(Throwable ex) { this_.exception = ex; } this_.returned = true; } }
The most obvious point first: D is much cleaner.
– The main reason reason why D is much cleaner is static if. If C++ had static if, I could make my C++ implementation almost as clean as the one in D. Without it I am stuck with all these struct specializations and a weird inheritance based form of code re-use. Luckily the D developers are also active in C++ development and have proposed to add static if to the next version of the C++ standard. One can only hope that this will be done. This small example has convinced me that it is an absolutely necessary feature.
– I couldn’t get forwarding to work in the D implementation, because std.typecons.Tuple doesn’t forward it’s arguments. I think the person who implemented it must have just forgotten. There might also be a good reason for it, in which case I would have to remove all forwarding. Moving objects is hacky in D anyway, and this is one area where C++ is simply better because moving is part of the type system.
The forwarding that I did do required mixins, which are an improved version of the C preprocessor. I needed to do this because D’s syntax for variadic templates is not as powerful as that of C++. In C++ you can use the … keyword to declare where you want your statement unrolled. In D you can only say for example “move(arguments)” where in C++ you would have the choice between “std::move(arguments…)” and “std::move(arguments)…” The first one means “call function std::move with all arguments” and the second one means “call std::move for each argument.” I think D should adopt this.
If you’re interested here is a tiny explanation for how the mixin works: The template at the top of the file generates a string recursively. If a template has only one member and that one member has the template’s name, the template becomes an alias for it. In this case that’s a string. The mixin then just says “place that string here.”
– The D solution also does not support references. Everything has to be passed in by pointer or by value. I’m not sure if the problems I ran into were compiler bugs or desired behavior. You are not allowed to use the “ref” keyword inside the compile time arguments of a type: “coroutine!(ref int)” gives the error “expression expected, not ‘ref’.”
– C++ required a weird hack to unroll the argument tuple and pass it to the function: I have to first build a variadic list of integers in order to then unroll std::get correctly with that list. For example for four arguments I want to be able to say “std::get<0, 1, 2, 3>(/**/)…” which will be expanded to “std::get<0>(/**/), std::get<1>(/**/), std::get<2>(/**/), std::get<3>(/**/)” This hack is not needed in D because I don’t need to store the arguments as a tuple. Instead I use mixins, which aren’t the best thing, but are less verbose.
– D’s implementation is leaking memory. I tried to write a simple destructor “~this() { delete stack; }” but that was giving me an exception. I wasn’t in the mood to debug it. In D you’re leaking memory all the time when dealing with arrays, and at some point you just give up and accept that. I considered using std.container.Array which has deterministic memory behavior, but that one doesn’t seem to expose a pointer to it’s internal storage. I think the proper solution in D is to use malloc and free with GC.addRange and GC.removeRange, which doesn’t seem clean.
– In C++ I tested my code by writing test cases in the main function. In D I tested my code by writing unit tests. Which makes it safer to continue working with the D code in the future.
– Lastly: This would have been a giant pain before C++11. I think this has most of the important features of boost.coroutines in a fraction of the amount of code. C++ is improving.
All in all though these two solutions are pretty much equivalent. I would enjoy working with both. Both languages allowed me to implement this reasonably cleanly.
Finally, here are simple examples for how to use this.
int main() { std::vector<int> result; coroutine<int (int)> testArgument{[&result](coroutine<int (int)>::self & self, int i) { result.push_back(i * 10); for (int j = 0; j < 3; ++j) { result.push_back(std::get<0>(self.yield(j)) * 10); } return 3; }}; for (int i = 0; testArgument; ++i) { result.push_back(testArgument(i)); } assert((result == std::vector<int>{ 0, 0, 10, 1, 20, 2, 30, 3 })); }
unittest { int[] result; coroutine!(int, int, "i") arguments_test = new coroutine!(int, int, "i")((coroutine!(int, int, "i") self, int i) { result ~= i * 10; foreach(j; 0 .. 3) { result ~= self.yield(j).i * 10; } return 3; }); for (int i = 0; arguments_test.callable; ++i) { result ~= arguments_test(i); } assert(result == [0, 0, 10, 1, 20, 2, 30, 3]); }