My “Way Too Generic Serializer”

by Malte Skarupke

I wrote a serializer for my engine that is extremely convenient and generic.

A piece of serialization code in the engine looks something like this:

class PlaceMode
{
public:

    // ...

    SERIALIZE_FUNCTION(1)
    {
        SERIALIZE_START();
        if (version < 1) SERIALIZE_BASE(State);
        else             SERIALIZE_BASE(EditorState);
        SERIALIZE(archetypeNames);
        SERIALIZE(lastPosition);
        SERIALIZE_END();
    }

private:
    std::vector<std::string> archetypeNames;
    Vector4 lastPosition;
};

First thing: Don’t be repulsed by the macros. I will explain why they are necessary.

So last semester, when we wrote our first engine, one of the requirements was text serialization for the objects in the game. I did some research and really liked the approach taken by the Boost Serialization library. To keep the serialization code for output and input in sync, it overloads operator& to act as either operator<< or operator>>, depending on whether you’re saving or loading. A serialize function in Boost looks something like this:

struct Circle
{
    float x, y, z;
    float radius;
    template<typename Archive>
    void serialize(Archive & ar, unsigned int version)
    {
        ar & x & y & z & radius;
    }
};

This will either load or save a circle. You can also serialize the contents of a std::vector or another STL container by calling operator& on it. Extremely elegant, eh? You can use the version parameter to keep track of different versions of the struct, so that if you add or remove fields, you can change how the serialize function behaves.
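To make the operator& trick concrete, here is a minimal sketch of the idea. It is not Boost’s actual archive code: TextOutArchive and TextInArchive are hypothetical names, and values are simply separated by spaces. The same serialize function writes through one archive and reads through the other:

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical saving archive: operator& means "write".
struct TextOutArchive
{
    std::ostringstream out;

    template<typename T>
    TextOutArchive & operator&(const T & value)
    {
        out << value << ' ';  // a space keeps the values apart for reading
        return *this;
    }
};

// Hypothetical loading archive: operator& means "read".
struct TextInArchive
{
    std::istringstream in;

    explicit TextInArchive(const std::string & text) : in(text) {}

    template<typename T>
    TextInArchive & operator&(T & value)
    {
        in >> value;
        return *this;
    }
};

struct Circle
{
    float x, y, z;
    float radius;

    // One function for both directions, so saving and loading can't drift apart.
    template<typename Archive>
    void serialize(Archive & ar, unsigned int /*version*/)
    {
        ar & x & y & z & radius;
    }
};
```

Because Circle::serialize is a template, the compiler generates a saving and a loading version from the same source lines, which is what keeps the two in sync.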
I copied all of this for my first-semester engine. Boost does a whole lot more than my version did, but I got these basics working. However, once you actually start using that system, you run into this:

22 serialization::archive 9 0 0 0 0 6 0 0 0 0 2 3 bob 6 24 4 1 0
0 0 0 3 0 5 1 0
1 0 0 0 0 34 135 52.560001 134 22 78.300003 11 24th Street 11 10th Avenue 5
2 35 137 23.455999 133 35 54.119999 12 State street 20 Cathedral Vista Lane 6 1 0
3 35 136 15.456 133 32 15.3 11 White House 3 bob 9 57 4 0 5 alice 11 2 4 0 3 ted 7 17 4
4 3 0 6
5 35 134 48.789001 133 32 16.23 16 Lincoln Memorial 6 3 5 2 3 ted 9 38 4 4 5 alice 11 47 4 4

That is the output of the Boost Serialization bus schedule demo. Boost can read and write this; humans can’t.

It turns out that it was very easy to write a few macros on top of the existing serialize system that make this prettier:

#define SERIALIZE(variable) ar & #variable & ':' & variable & ",\n"
#define SERIALIZE_BASE(className) ar & static_cast<className &>(*this)
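The # in SERIALIZE is the preprocessor’s stringizing operator: it turns the field name into a string literal, so the name ends up in the file for free. A minimal sketch of what the two macros produce, assuming a hypothetical PrettyOutArchive that copies string and char literals verbatim and leaves out the brackets, indentation, and version headers mentioned below:

```cpp
#include <cassert>
#include <sstream>
#include <string>

// The two macros from above; `ar` must be in scope where they are used.
#define SERIALIZE(variable) ar & #variable & ':' & variable & ",\n"
#define SERIALIZE_BASE(className) ar & static_cast<className &>(*this)

// Hypothetical saving archive: literals are written as-is, primitives go through
// operator<<, and any other object recurses into its serialize() member.
struct PrettyOutArchive
{
    std::ostringstream out;

    PrettyOutArchive & operator&(const char * text) { out << text; return *this; }
    PrettyOutArchive & operator&(char c) { out << c; return *this; }
    PrettyOutArchive & operator&(int value) { out << value; return *this; }
    PrettyOutArchive & operator&(float value) { out << value; return *this; }
    PrettyOutArchive & operator&(bool value) { out << (value ? "true" : "false"); return *this; }

    template<typename T>
    PrettyOutArchive & operator&(T & object) { object.serialize(*this, 0); return *this; }
};

struct EditorState
{
    bool enabled;

    template<typename Archive>
    void serialize(Archive & ar, unsigned int /*version*/)
    {
        SERIALIZE(enabled);
    }
};

// Hypothetical derived class, standing in for something like PlaceMode.
struct Marker : EditorState
{
    int count;
    float scale;

    template<typename Archive>
    void serialize(Archive & ar, unsigned int /*version*/)
    {
        SERIALIZE_BASE(EditorState);  // expands to: ar & static_cast<EditorState &>(*this)
        SERIALIZE(count);             // expands to: ar & "count" & ':' & count & ",\n"
        SERIALIZE(scale);
    }
};
```

For a Marker with enabled = true, count = 3 and scale = 0.5f, this writes "enabled:true,\ncount:3,\nscale:0.5,\n". The non-template overloads win over the catch-all template for primitives, so only class types fall through to serialize().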

With those, plus some changes to the archive classes so that they add brackets and indentation, the example from above outputs the following:

{
    PlaceMode : 1
    {
        EditorState : 0
        enabled : true,
    }
    archetypeNames : [
        "gnome",
    ]
    lastPosition : (4, 0, 10, 1),
}

That is much more readable.
It also enables consistency checking: on the archive that’s reading, I have overloaded operator&(const char *&) to read a string and assert that it equals the given C string. So the “ar & #variable” part of the SERIALIZE macro writes, for example, “archetypeNames” when saving and asserts that “archetypeNames” is the next word when loading.
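A sketch of the loading side of that check, assuming a hypothetical CheckedInArchive in which string and char literals are compared against the stream instead of being read into variables. Two deviations from the description above: this version throws std::runtime_error on a mismatch rather than asserting, and it ignores whitespace inside the expected literal:

```cpp
#include <cassert>
#include <cctype>
#include <sstream>
#include <stdexcept>
#include <string>

#define SERIALIZE(variable) ar & #variable & ':' & variable & ",\n"

// Hypothetical loading archive that verifies labels while it reads values.
struct CheckedInArchive
{
    std::istringstream in;

    explicit CheckedInArchive(const std::string & text) : in(text) {}

    // A literal is checked, not stored: each non-space character must come next.
    CheckedInArchive & operator&(const char * expected)
    {
        for (; *expected; ++expected)
            if (!std::isspace(static_cast<unsigned char>(*expected)))
                expect(*expected);
        return *this;
    }
    CheckedInArchive & operator&(char expected) { expect(expected); return *this; }
    CheckedInArchive & operator&(int & value) { read(value); return *this; }
    CheckedInArchive & operator&(float & value) { read(value); return *this; }

private:
    void expect(char c)
    {
        char actual = 0;
        // operator>> for char skips leading whitespace by default
        if (!(in >> actual) || actual != c)
            throw std::runtime_error(std::string("expected '") + c + "'");
    }

    template<typename T>
    void read(T & value)
    {
        if (!(in >> value))
            throw std::runtime_error("could not parse value");
    }
};

// Hypothetical example type.
struct Settings
{
    int count;
    float scale;

    template<typename Archive>
    void serialize(Archive & ar, unsigned int /*version*/)
    {
        SERIALIZE(count);
        SERIALIZE(scale);
    }
};
```

Loading "count:3,\nscale:0.5,\n" fills in both fields, while a misspelled label such as "cuont" fails at the first character that differs.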

The SERIALIZE_START and SERIALIZE_END macros wrap the whole thing in a loop. That lets the fields appear in the file in a different order than in the source without breaking anything. (However, to do that I also had to make the SERIALIZE macro more complicated, and if I went into detail here, the scope of this article would explode.)
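The macros themselves are out of scope here, but the underlying idea can be sketched with a plain function. This assumes a simplified, whitespace-separated format and a hypothetical PlaceModeData struct; it is not how the macros are actually written. The loop reads name : value pairs and dispatches on the name until it reaches the closing brace, so the order of fields in the file no longer matters:

```cpp
#include <cassert>
#include <functional>
#include <istream>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical data for the sketch.
struct PlaceModeData
{
    int version = 0;
    std::string name;
    float height = 0.0f;
};

// Reads "{ key : value key : value ... }" with the keys in any order.
PlaceModeData loadAnyOrder(std::istream & in)
{
    PlaceModeData data;
    // One reader per known field, looked up by the name found in the file.
    const std::map<std::string, std::function<void()>> readers = {
        {"version", [&] { in >> data.version; }},
        {"name",    [&] { in >> data.name; }},
        {"height",  [&] { in >> data.height; }},
    };

    std::string token;
    if (!(in >> token) || token != "{")
        throw std::runtime_error("expected '{'");
    // This loop plays the role of SERIALIZE_START / SERIALIZE_END.
    while (in >> token && token != "}")
    {
        std::string colon;
        if (!(in >> colon) || colon != ":")
            throw std::runtime_error("expected ':' after " + token);
        auto reader = readers.find(token);
        if (reader == readers.end())
            throw std::runtime_error("unknown field " + token);
        reader->second();
    }
    return data;
}
```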

So now we have a file that we can actually read, debug and modify. If something goes wrong we can look at the file and understand what is wrong.

At this point I should also talk about a few things that are going on behind the scenes:
Each class has to define a getVersion() method that returns the current version of that class. In the example above, that would be 1 for PlaceMode and 0 for EditorState. When saving, I always pass the current version to the serialize method. When loading, I read the class name and version from the file, then call a factory with that information, which creates an object and calls its serialize function.
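A minimal sketch of that factory, with hypothetical names throughout (Serializable, Factory, Cylinder). One deliberate deviation: a member template can’t be virtual, so here the factory calls a virtual load() function instead of a templated serialize function. The key point carries over: saving writes the current getVersion(), while loading passes along whatever version the file says:

```cpp
#include <cassert>
#include <functional>
#include <istream>
#include <map>
#include <memory>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical common base so the factory can return any loadable object.
struct Serializable
{
    virtual ~Serializable() = default;
    virtual void load(std::istream & in, unsigned int version) = 0;
};

// Hypothetical factory: maps a class name from the file to a creation function.
class Factory
{
public:
    template<typename T>
    void registerType(const std::string & name)
    {
        creators[name] = [] { return std::unique_ptr<Serializable>(new T()); };
    }

    // Reads "ClassName version", creates the object, then lets it load itself
    // with the version stored in the file (not the current one).
    std::unique_ptr<Serializable> create(std::istream & in) const
    {
        std::string name;
        unsigned int version = 0;
        if (!(in >> name >> version))
            throw std::runtime_error("missing class name or version");
        auto it = creators.find(name);
        if (it == creators.end())
            throw std::runtime_error("unknown class " + name);
        std::unique_ptr<Serializable> object = it->second();
        object->load(in, version);
        return object;
    }

private:
    std::map<std::string, std::function<std::unique_ptr<Serializable>()>> creators;
};

// Hypothetical type: version 0 stored only the radius, version 1 added the height.
struct Cylinder : Serializable
{
    static unsigned int getVersion() { return 1; }
    float radius = 0.0f;
    float height = 1.0f;

    // Saving always writes the current version.
    void save(std::ostream & out) const
    {
        out << "Cylinder " << getVersion() << ' ' << radius << ' ' << height;
    }

    void load(std::istream & in, unsigned int version) override
    {
        in >> radius;
        if (version >= 1)
            in >> height;  // files from version 0 keep the default height
    }
};
```

Loading an old file and saving it again upgrades it to the current version, which is the load-everything, save-everything workflow described below.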

For primitive types and STL containers, I have special overloads of operator& in the archive classes; template metaprogramming saves you a lot of duplicate code there. I don’t save type identifiers for those in the file, because I always know the type at compile time: if someone calls operator& with a vector, then a vector is what I’m reading next.
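The “no type identifiers” point can be sketched with a pair of hypothetical archives (OutArchive, InArchive). A single enable_if-constrained template covers every arithmetic type, and the std::vector overload writes the size and then recurses into the elements, so nested containers work without extra code. Strings are assumed to contain no whitespace:

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical saving archive. Nothing in the output says "this is a vector":
// the static type at the call site already tells the loader what comes next.
struct OutArchive
{
    std::ostringstream out;

    // One template for int, float, double, size_t, ...
    template<typename T>
    typename std::enable_if<std::is_arithmetic<T>::value, OutArchive &>::type
    operator&(const T & value) { out << value << ' '; return *this; }

    OutArchive & operator&(const std::string & s) { out << s << ' '; return *this; }

    // Containers write their size, then each element (recursing as needed).
    template<typename T>
    OutArchive & operator&(const std::vector<T> & values)
    {
        *this & values.size();
        for (const T & v : values)
            *this & v;
        return *this;
    }
};

// Hypothetical loading archive with the mirror-image overloads.
struct InArchive
{
    std::istringstream in;

    explicit InArchive(const std::string & text) : in(text) {}

    template<typename T>
    typename std::enable_if<std::is_arithmetic<T>::value, InArchive &>::type
    operator&(T & value) { in >> value; return *this; }

    InArchive & operator&(std::string & s) { in >> s; return *this; }

    template<typename T>
    InArchive & operator&(std::vector<T> & values)
    {
        std::size_t count = 0;
        *this & count;
        values.resize(count);
        for (T & v : values)
            *this & v;
        return *this;
    }
};
```

Saving {{1, 2}, {3}} followed by {"gnome", "tree"} produces "2 2 1 2 1 3 2 gnome tree ", and reading it back only works because the loader asks for the same types in the same order.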

And this semester, we could handle most of our changes just by updating the serialize function and the version number. If you add or remove a field, you change that in the serialize function, load the entire level, save the entire level, and everything is up to date again.

Now, there were a couple of problems with this serializer. The biggest was that it is simply far too generic a black box. The ResourceManager, for example, just loads an unordered_map<string, Resource *>. But what about resources that are shared between levels and are already loaded? Too bad: the serializer just gives you back a filled map, and there isn’t much you can do about that. So those shared resources get loaded twice, and the duplicates get deleted.

I am currently fixing that by adding something I call “SerializeDecorators”. I think I can solve most of my problems if I can inject a decorator that returns a simple instruction telling the serializer what to do next:

enum SerializeInstruction
{
    CONTINUE,
    SKIP_VALUE,
    STOP,
    STOP_AFTER_NEXT_VALUE
};

I don’t think there are that many special cases, and I expect to handle most of them with just this. The difficulty is giving the decorator some information about the item it is currently handling, because, well, it is operating inside a black box.
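Since the decorator design is still in progress, here is only a rough sketch of how a map loader could consult one. The SerializeDecorator signature, loadMap, and the text format are all hypothetical; the values are plain strings instead of Resource pointers; and STOP_AFTER_NEXT_VALUE is interpreted as “load this entry, then stop”:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <istream>
#include <sstream>
#include <string>
#include <unordered_map>

// The instruction set from above.
enum SerializeInstruction
{
    CONTINUE,
    SKIP_VALUE,
    STOP,
    STOP_AFTER_NEXT_VALUE
};

// Hypothetical decorator: it only sees the key of the entry about to be loaded,
// which illustrates how little context it gets inside the black box.
using SerializeDecorator = std::function<SerializeInstruction(const std::string & key)>;

// Hypothetical loader for "count key value key value ...".
std::unordered_map<std::string, std::string>
loadMap(std::istream & in, const SerializeDecorator & decorate)
{
    std::unordered_map<std::string, std::string> result;
    std::size_t count = 0;
    in >> count;
    for (std::size_t i = 0; i < count; ++i)
    {
        std::string key;
        in >> key;
        SerializeInstruction instruction = decorate(key);
        if (instruction == STOP)
            break;
        std::string value;
        in >> value;  // consumed even when skipped, so the next key lines up
        if (instruction == SKIP_VALUE)
            continue;
        result[key] = value;
        if (instruction == STOP_AFTER_NEXT_VALUE)
            break;
    }
    return result;
}
```

For the shared-resources case, a decorator that returns SKIP_VALUE for every name already present in another level leaves those entries out of the map instead of loading them twice.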

The second problem is template explosion. The serializer seriously affects our compile times: everything in the archive and factory classes is templated, with some template metaprogramming to handle special cases, and it is included everywhere and used by everything. I don’t currently know what I can do about that.

And the final problem is coupling. Everything now depends on the serializer, so making changes to it is scary: it gets used by everything. I now kind of like the ideas expressed in this blog post, which says that you’re sometimes better off just writing a save and load function for everything rather than having one central serializer. That would, of course, also solve my other two problems.

So would I do things differently next time? Probably not. This serializer was exactly the right solution for the two projects I used it for. We had to make two games in two semesters, and one reason the games turned out so well is that I was able to get a level editor up that produced playable levels extremely quickly compared to other teams. It’s just nice to tell the software what to load or save and have it figure out how to do that for you, and when you change something, you only have to change one line. I am certain that this not only massively reduced the time needed to accomplish certain things, but also saved us a whole lot of frustration, because we so rarely had bugs related to loading and saving. Those would have happened far more often had people had to implement serialization by hand.

So I think I am going to keep using it. It may be too inelegant for a larger-scale environment, but for students I think it is the right thing. I’ll upload the source as soon as I am more confident that my recent modifications are reasonably stable.

Probably Dance