Probably Dance

I can program and like games

The Best Game Stories Are Told in the Past Tense

I just finished playing Analogue: A Hate Story, and it is a great game. While playing it I noticed a few patterns of game stories that I can’t find collected anywhere, so I’ll do that here then.

The main one is that the best game stories are told in the past tense. Meaning most or all of the story has taken place before the player starts playing. Bioshock does this, Portal does it, Gone Home, To the Moon and Analogue also do it. It’s easy to come up with counter examples that also have a good story (just look at Christine Love’s previous game: don’t take it personally babe, it just ain’t your story) but it’s interesting that this pattern should prove so successful in an interactive medium.

Read the rest of this entry »

Source to Source Optimization

In April Olaf Krzikalla Gave a talk titled “Performing Source-to-Source Transformations with Clang” (Videos available here)

In that talk he showed an auto vectorizer that doesn’t perform the optimization in the assembly, but which instead spits out new source code that performs the optimized operations. Here is a picture from his talk:

A picture showing a source-to-source transformation. There is the source code for a simple loop operating on an array of floats on the left hand side, and the same loop using SIMD instructions on the right hand side.

And I want to explain why all compiler optimizations should do this.

Read the rest of this entry »

4GB per Vector

I recently had a problem where I had a vector that was growing once, being iterated over once, and then deallocated. And it was bothering me how much time I spent on reallocating. The problem was that I also could not predict well how big the vector would be ahead of time. It could vary by orders of magnitude for input that looked similar at first.

As I was researching the minor page faults that I was hitting during the reallocations I came across this stack overflow question. There’s a person there who speaks of resizing an array by requesting a lot of memory from the OS, but then only actually committing the memory as he wants the array to grow.

Which is smart, and I wonder why nobody has taken it further. I think all vectors should operate like that all the time. Thus I wrote my own vector class which always requests 4GB of memory from the OS, and which then only actually commits that memory one page at a time. After all we have this enormous address space in 64 bit that we won’t be using for a long time.

Read the rest of this entry »

Handmade Coroutines for Windows

In a previous post I implemented coroutines using the ucontext.h header. In this post I will show a replacement that should be much faster. I will also provide a Windows implementation.

I like to start off with some code, so here is the complete code for switching the stack in Linux:

switch_to_context:
    pushq %rbp
    movq %rsp, %rbp
    // store rbx and r12 to r15 on the stack. these will be restored
    // after we switch back
    pushq %rbx
    pushq %r12
    pushq %r13
    pushq %r14
    pushq %r15
    movq %rsp, (%rdi) // store stack pointer
switch_point:
    // set up the other guy's stack pointer
    movq %rsi, %rsp
    // and we are now in the other context
    // restore registers
    popq %r15
    popq %r14
    popq %r13
    popq %r12
    popq %rbx
    popq %rbp
    retq

Read the rest of this entry »

The problems with uniform initialization

C++11 made the {} syntax for initializing aggregates more widely usable. You can now also use it to call constructors and to initialize with a std::initializer_list. It also allows you to drop the type name in some circumstances. The general recommendation seems to be that you use it as much as possible. But when I have started doing that I have found that it sometimes doesn’t do what I want, and that it may make maintenance more difficult.

Here’s what it looks like:

#include <iostream>

struct Widget
{
    Widget(int m) : m{m} {}
    operator int() const { return m; }
    int m;
};

Widget decrement_widget(Widget w)
{
    return { w - 1 };
}

int main()
{
    int a{1};
    std::cout << a << std::endl; // "1"
    Widget w{5};
    std::cout << w << std::endl; // "5"
    std::cout << decrement_widget({5}) << std::endl; // "4"
}

It can be used to initialize everything (hence the name uniform initialization) and as you can see it makes some code more convenient because I don’t even need the name any more if the compiler should know it.

It makes your code look a bit weird at first because you have squiggly braces everywhere, but after a while I found that I prefer it because it sets initialization apart from function calls.

But I have stopped using it and I recommend that you don’t use it either.

Read the rest of this entry »

A faster implementation of std::function

As I wrote in my last post, I consider std::function to be a very important class that will change how you design your code, because it means that you have to use inheritance less often. In that post I was very impressed with the performance of std::function when compiled with optimizations. Unfortunately std::function can be far slower than a virtual function call in debug.

I wanted a std::function implementation that doesn’t have too big of a performance impact on your application when debugging it, so I wrote my own. It is also faster than all other implementations that I could find in release mode.

The code is in the public domain (I want all library writers to start using it) and here is a download link.

Read the rest of this entry »

The importance of std::function

In C++11 there is a new class in the standard library called std::function. It allows you to store anything that is callable. For example

#include <functional>
#include <iostream>
void print_world()
{
    std::cout << "World!" << std::endl;
}

int main()
{
    std::function<void ()> hello_world = []{ std::cout << "Hello "; };
    hello_world();
    hello_world = &print_world;
    hello_world();
}

Prints “Hello World!” as you would expect. The assignment to and storage of the two different types is handled internally by the std::function.

The amazing thing is that calling a std::function is fairly cheap: It’s one virtual function call. That has wide implications for how you will use (or not use) virtual functions in the future. In many cases where you used to have to use a virtual function it is now better to use a std::function. For example for an update loop.

Read the rest of this entry »

scope(exit), scope(failure) and scope(success)

scope(exit) is a great idea from the D programming language. I’ll refer to the D website for motivating examples because they do a good job. There have been many implementations in C++ and Ignacio Castaño showed an elegant implementation using lambdas which is using a bit too many macros for my taste. (it’s using one) There is also an exception-safe implementation by Jon Kalb which doesn’t use the same optimization that Castaño does. I wrote an implementation that combines the benefits of the two.

Also D has scope(exit), scope(failure) and scope(success), where failure runs only if an exception caused stack unwinding, success runs otherwise, and exit runs in any case. And I think it makes sense to offer the complete set. So to keep this post short here is an implementation of all three in C++:

#pragma once

#include <exception>

namespace detail
{

template<typename F>
struct ScopeExitRunner
{
    ScopeExitRunner(const F & to_run)
    try : to_run(to_run)
    {
    }
    catch(...)
    {
        // if the copy constructor for to_run threw,
        // call the argument immediately then rethrow
        to_run();
    }
    ScopeExitRunner(ScopeExitRunner &&) = default;
    ~ScopeExitRunner()
    {
        to_run();
    }
private:
    F to_run;
};

template<typename F>
struct ScopeFailureRunner
{
    ScopeFailureRunner(const F & to_run)
    try : to_run(to_run)
    {
    }
    catch(...)
    {
        to_run();
    }
    ScopeFailureRunner(ScopeFailureRunner &&) = default;
    ~ScopeFailureRunner() noexcept
    {
        static_assert(noexcept(to_run()), "the given function must be noexcept. it will only be run if there was an exception");
        static_assert(noexcept(to_run.~F()), "the given functor must have a noexcept destructor. it is likely that it will be run when there was an exception");
        if (::std::uncaught_exception())
        {
            to_run();
        }
    }
private:
    F to_run;
};

template<typename F>
struct ScopeSuccessRunner
{
    ScopeSuccessRunner(const F & to_run)
        : to_run{to_run}
    {
    }
    ScopeSuccessRunner(ScopeSuccessRunner && other) = default;
    ~ScopeSuccessRunner()
    {
        if (!::std::uncaught_exception())
        {
            to_run();
        }
    }
private:
    F to_run;
};

} // end namespace detail

template<typename T>
detail::ScopeExitRunner<T> AtScopeExit(const T & to_run)
{
    return detail::ScopeExitRunner<T>(to_run);
}
template<typename T>
detail::ScopeFailureRunner<T> AtScopeFailure(const T & to_run)
{
    return detail::ScopeFailureRunner<T>(to_run);
}
template<typename T>
detail::ScopeSuccessRunner<T> AtScopeSuccess(const T & to_run)
{
    return detail::ScopeSuccessRunner<T>(to_run);
}

And you use it for example when dealing with windows functions:

// ...
IUnknown * important = nullptr;
::SomeWindowsFunctionA(/*...*/, &important);
auto release_important = AtScopeExit([important]
{
    important->Release();
});
// ...

This code should be as efficient as if you had placed that call manually at each exit point of your function.

Read the rest of this entry »

Learning D Part 4: I’m done

D has a lot of features which I like very much. And it has a few design decisions that completely kill it for me.

  • In D I don’t like making my objects structs, and I don’t like making my objects classes. I would like to have more control over how my type behaves.
  • The garbage collector has too much of an impact on the core language and the standard library.
  • Everything has shared ownerhsip by default. C++11 introduced features that make it easier to clearly indicate who owns what. D went in the opposite direction.

The issue with all of these is that they are so fundamental to the language that you can not ignore them. In C++ I always have the option to ignore a feature. In D I do not.

Read the rest of this entry »

Variadic Coroutines in C++ and D

In my last post I implemented a tiny coroutine class that is fully functional, but it doesn’t support arguments and return values. This time I provide an implementation which adds those features as well as exception safety.

I also used this as an example to compare D and C++; as an excuse to learn C++’s variadic template syntax and as a reason to see how you can improve on boost’s implementation by using variadic templates.
Let’s start off with the punchline. Here is my coroutine implementation in C++:

#pragma once

#include <ucontext.h>
#include <cstdint>
#include <exception>
#include <functional>
#include <memory>
#include <tuple>

#ifndef CORO_DEFAULT_STACK_SIZE
#define CORO_DEFAULT_STACK_SIZE SIGSTKSZ
#endif

namespace coro
{
namespace detail
{

/**
 * the coroutine_context holds the stack context and the current state
 * of the coroutine. it also starts the coroutine
 */
struct coroutine_context
{
    coroutine_context(size_t stack_size, void (*coroutine_call)())
        : stack{new unsigned char[stack_size]}
    {
        // create a context for the callee
        getcontext(&callee);
        callee.uc_link = &caller; // tell it to return to me when done
        callee.uc_stack.ss_size = stack_size;
        callee.uc_stack.ss_sp = stack.get();
        // pass the this pointer as two arguments, because officially
        // makecontext will pass the arguments as ints (even though
        // inofficially it would work with a single argument)
        makecontext(&callee, coroutine_call, 2, reinterpret_cast<size_t>(this) >> 32, this);
    }

    void operator()()
    {
        if (returned) throw "This coroutine has already finished";

        swapcontext(&caller, &callee); // continue here if yielded or returned

        if (exception)
        {
            std::rethrow_exception(std::move(exception));
        }
    }

protected:
    ucontext_t caller;
    ucontext_t callee;
    std::unique_ptr<unsigned char[]> stack;
    std::exception_ptr exception;
    bool returned = false;
};

// a type that can store both objects and references
template<typename T>
struct any_storage
{
    any_storage() = default;
    any_storage(T to_store)
        : stored{std::move(to_store)}
    {
    }
    any_storage & operator=(T to_store)
    {
        stored = std::move(to_store);
        return *this;
    }
    operator T &&()
    {
        return std::move(stored);
    }

private:
    T stored = T{};
};
// specialization for void
template<>
struct any_storage<void>
{
};
// specialization for lvalue references
template<typename T>
struct any_storage<T &>
{
    any_storage() = default;
    any_storage(T & to_store)
        : stored{&to_store}
    {
    }
    any_storage & operator=(T & to_store)
    {
        stored = &to_store;
        return *this;
    }
    operator T &()
    {
        return *stored;
    }

private:
    T * stored = nullptr;
};
// specialization for rvalue references
template<typename T>
struct any_storage<T &&>
{
    any_storage() = default;
    any_storage(T && to_store)
        : stored{&to_store}
    {
    }
    any_storage & operator=(T && to_store)
    {
        stored = &to_store;
        return *this;
    }
    operator T &&()
    {
        return *stored;
    }

private:
    T * stored = nullptr;
};

// implements the shared code among all specializations of coroutine_yielder
template<typename Result, typename... Arguments>
struct coroutine_yielder_base
    : protected coroutine_context
{
    coroutine_yielder_base(size_t stack_size, void (*coroutine_call)())
        : coroutine_context{stack_size, coroutine_call}
    {
    }
    std::tuple<Arguments...> yield()
    {
        swapcontext(&this->callee, &this->caller);
        return std::move(this->arguments);
    }

protected:
    any_storage<Result> result;
    std::tuple<any_storage<Arguments>...> arguments;
};

/**
 * The coroutine_yielder is responsible for providing a yield
 * function
 */
template<typename Result, typename... Arguments>
struct coroutine_yielder
    : protected coroutine_yielder_base<Result, Arguments...>
{
    coroutine_yielder(size_t stack_size, void (*coroutine_call)())
        : coroutine_yielder_base<Result, Arguments...>{stack_size, coroutine_call}
    {
    }
    std::tuple<Arguments...> yield(Result && result)
    {
        this->result = std::forward<Result>(result);
        return coroutine_yielder_base<Result, Arguments...>::yield();
    }

    // copying yield
    std::tuple<Arguments...> yield(const Result & result)
    {
        this->result = result;
        return coroutine_yielder_base<Result, Arguments...>::yield();
    }
};
// specialization for void
template<typename... Arguments>
struct coroutine_yielder<void, Arguments...>
    : coroutine_yielder_base<void, Arguments...>
{
    coroutine_yielder(size_t stack_size, void (*coroutine_call)())
        : coroutine_yielder_base<void, Arguments...>{stack_size, coroutine_call}
    {
    }
};
// specialization for lvalue reference
template<typename Result, typename... Arguments>
struct coroutine_yielder<Result &, Arguments...>
    : protected coroutine_yielder_base<Result &, Arguments...>
{
    coroutine_yielder(size_t stack_size, void (*coroutine_call)())
        : coroutine_yielder_base<Result &, Arguments...>{stack_size, coroutine_call}
    {
    }
    std::tuple<Arguments...> yield(Result & result)
    {
        this->result = result;
        return coroutine_yielder_base<Result &, Arguments...>::yield();
    }
};
// specialization for rvalue reference
template<typename Result, typename... Arguments>
struct coroutine_yielder<Result &&, Arguments...>
    : protected coroutine_yielder_base<Result &&, Arguments...>
{
    coroutine_yielder(size_t stack_size, void (*coroutine_call)())
        : coroutine_yielder_base<Result &&, Arguments...>{stack_size, coroutine_call}
    {
    }
    std::tuple<Arguments...> yield(Result && result)
    {
        this->result = std::move(result);
        return coroutine_yielder_base<Result &&, Arguments...>::yield();
    }
};
}

template<typename Result, typename... Arguments>
struct coroutine;

template<typename Result, typename... Arguments>
struct coroutine<Result (Arguments...)>
    : detail::coroutine_yielder<Result, Arguments...>
{
private:
    template<int N, int... S>
    struct starter;
    typedef starter<sizeof...(Arguments)> Starter;
public:
    typedef typename detail::coroutine_yielder<Result, Arguments...> self;
    coroutine(std::function<Result (self &, Arguments...)> func, size_t stack_size = CORO_DEFAULT_STACK_SIZE)
        : detail::coroutine_yielder<Result, Arguments...>{stack_size, reinterpret_cast<void (*)()>(&Starter::coroutine_start)}
        , func{std::move(func)}
    {
    }
    // I don't need to specify these. the default behavior would result
    // in the same constructors and assignment operators being generated
    // but you get better error messages if I'm explicit about it
    coroutine(const coroutine &) = delete;
    coroutine & operator=(const coroutine &) = delete;
    coroutine(coroutine &&) = default;
    coroutine & operator=(coroutine &&) = default;

    operator bool() const
    {
        return !this->returned;
    }

    Result operator()(Arguments... args)
    {
        this->arguments = std::make_tuple(detail::any_storage<Arguments>{std::forward<Arguments>(args)}...);
        detail::coroutine_yielder<Result, Arguments...>::operator()();
        return Starter::Finisher::return_result(*this);
    }

private:
    // returning a value needs to be specialized for void return
    // this struct handles that
    template<typename R, int... S>
    struct finisher
    {
        static R return_result(coroutine & this_)
        {
            return std::forward<Result>(this_.result);
        }
        static void start_and_store_result(coroutine & caller)
        {
            caller.result = caller.coroutine_start<S...>();
        }
    };
    template<int... S>
    struct finisher<void, S...>
    {
        static void return_result(coroutine &)
        {
        }
        static void start_and_store_result(coroutine & caller)
        {
            caller.coroutine_start<S...>();
        }
    };

    template<int ...ArgCountList>
    Result coroutine_start()
    {
        return func(*this, std::forward<Arguments>(std::get<ArgCountList>(this->arguments))...);
    }

    // the calling of the coroutine needs to be specialized by
    // the number of arguments. this struct handles that
    template<int N, int... S>
    struct starter : starter<N - 1, N - 1, S...>
    {
    };

    template<int... S>
    struct starter<0, S...>
    {
        typedef finisher<Result, S...> Finisher;
        static void coroutine_start(uint32_t this_pointer_left_half, uint32_t this_pointer_right_half)
        {
            coroutine & this_ = *reinterpret_cast<coroutine *>((static_cast<size_t>(this_pointer_left_half) << 32) + this_pointer_right_half);
            try
            {
                Finisher::start_and_store_result(this_);
            }
            catch(...)
            {
                this_.exception = std::current_exception();
            }
            this_.returned = true;
        }
    };

    std::function<Result (self &, Arguments...)> func;
};

}

And here it is in D:

module coro.coroutine;

import coro.ucontext;
import std.algorithm;
import std.conv;
import std.typecons;

template VariadicForward(string before, string toforward, string after, size_t i)
{
    static if (i == 0)
    {
        immutable string VariadicForward = before ~ after;
    }
    else static if (i == 1)
    {
        immutable string VariadicForward = VariadicForward!(before, toforward, "move(" ~ toforward ~ "[" ~ to!string(i - 1) ~ "])" ~ after, i - 1);
    }
    else
    {
        immutable string VariadicForward = VariadicForward!(before, toforward, ", move(" ~ toforward ~ "[" ~ to!string(i - 1) ~ "])" ~ after, i - 1);
    }
}

class coroutine(Result, Arguments...)
{
	private alias Tuple!Arguments.Types Types;
	this(Result delegate(coroutine!(Result, Arguments), Types) func, size_t stack_size = 8192)
	{
		this.func = func;
		stack = new ubyte[stack_size];
		getcontext(&callee);
		callee.uc_link = &caller;
		callee.uc_stack.ss_size = stack_size;
		callee.uc_stack.ss_sp = stack.ptr;
		extern(C) void function() call = cast(void function())&coroutine_start;
		makecontext(&callee, call, 2, cast(size_t)(cast(void *)this) >> 32, this);
	}

	Result opCall(Types arguments)
	{
		if (returned) throw new Throwable("This coroutine has already finished");

		static if (arguments.length)
		{
			foreach(i, ref argument; arguments)
				this.arguments[i] = move(argument);
		}
	    swapcontext(&caller, &callee);

	    if (exception)
		{
			throw exception;
		}
		else static if (!is(Result == void))
		{
			return move(result);
		}
	}

	@property bool callable() const
	{
		return !returned;
	}

	static if (is(Result == void))
	{
		alias yield_common yield;
	}
	else
	{
		Tuple!Arguments yield(ref Result result)
		{
			this.result = result;
			return yield_common();
		}
		Tuple!Arguments yield(Result result)
		{
			this.result = move(result);
			return yield_common();
		}
	}

	private Tuple!Arguments yield_common()
	{
	    swapcontext(&callee, &caller);
    	mixin(VariadicForward!("return Tuple!Arguments(", "arguments", ");", Types.length));
	}

private:
	Result delegate(coroutine!(Result, Arguments), Types) func;
	ubyte[] stack;
	ucontext_t caller;
	ucontext_t callee;
	Throwable exception;
	bool returned = false;
	static if (!is(Result == void))
	{
		Result result;
	}
	Types arguments;

	extern(C) static void coroutine_start(uint this_pointer_left_half, uint this_pointer_right_half)
	{
		coroutine this_ = cast(coroutine)cast(void *)(((cast(size_t)this_pointer_left_half) << 32) + this_pointer_right_half);
	    try
	    {
	    	static if (is(Result == void))
	    	{
	    		mixin(VariadicForward!("this_.func(this_, ", "this_.arguments", ");", Types.length));
	    	}
	    	else
	    	{
	    		mixin(VariadicForward!("this_.result = this_.func(this_, ", "this_.arguments", ");", Types.length));
	    	}
	    }
	    catch(Throwable ex)
	    {
	    	this_.exception = ex;
	    }
	    this_.returned = true;
	}
}

The most obvious point first: D is much cleaner.

Read the rest of this entry »