Introducing the Asserting Mutex

by Malte Skarupke

Multithreaded code in C++ tends to become brittle over time. If you write your code well you’ll need almost no synchronization between different threads, but the price of that is that your code will be littered with undocumented conventions of when you can read or modify which state. In your average threaded C++ application there are countless potential race conditions, all of which never happen because people follow conventions about when to do what. Until someone doesn’t know about a convention that he has to follow or until you change the conventions and you forget to update one piece of code that you didn’t know about.

Enter the asserting mutex. The asserting mutex is a conditional breakpoint that triggers only if a potential race condition actually happens. I call it asserting mutex because you use it like a mutex to protect a critical section. It works very simply: If one thread locks the asserting mutex and a second thread attempts to lock the asserting mutex before the first thread unlocks it, you get an assert. And it guarantees that both threads will still be inside the critical section when you get the assert. The cost is one atomic increment and one atomic decrement, which is not free but cheap enough that you can place lots of asserting mutexes in your code before they cause problems. So you could use this to document many of your threading conventions. Used correctly this is a breakpoint that makes it very easy to find data races.

Here is the complete code:

#include <atomic>

struct AssertingMutex
{
    AssertingMutex()
        : is_in_use(0)
    {
    }

    void lock()
    {
        ASSERT(++is_in_use == 1);
    }
    void unlock()
    {
        ASSERT(--is_in_use == 0);
    }

private:
    std::atomic<signed char> is_in_use;
};

The license for the code is at the bottom of the post.

As an example of the kind of bug that you’ll run into let me explain one that I had recently which I solved using the AssertingMutex:

There was an object M and a system B. The object M adds itself to the system B. There also was an old memory optimization from past consoles where someone realized that the object M didn’t actually need all of its members, because the system B would have the same information once the object was added. So the object would just forward its getters to the getters of system B and this saved a little memory. Even at that time system B could run in a separate thread than object M, but this was fine because all modifications were synchronized and the data was immutable between synchronization points, and reading immutable data is safe without synchronization. This memory optimization happened five years ago and nobody remembered any of this, but this is still how the object behaves and none of of the conventions that made this work are enforced anywhere.

There are a lot of changes that could break this, and the specific one that caused problems for us was that the system switched to managing its internal structure in a packed object array. Which means that you can’t call any of the getters of the system while its update is running because it may be reordering its index array. So if your object was moved around in that packed object array exactly while another thread calls the getters of the object, you could possibly read the values of another object.

I don’t actually know how often this bug happened, probably once every few days. Rare enough that nobody ever noticed because the only difference in behavior you’d see would be that an object could incorrectly be culled for one frame, which means it would pop out of existence for one frame. Oh and the particular instance where I hit this bug was in the effect system, so you’d only see parts of an effect disappear for one frame.

In other words this bug would be impossible to track down, but I was lucky enough that this bug triggered an assert in system B when I was running in a debugger and the assert had a somewhat useful callstack which pointed me in the right direction. With that callstack I discovered the history of object M and system B from above and realized that there was a potential race on the packed object array. But as is so often the case in threading bugs, by the time that I hit the assert the conflicting thread was already doing something else which means I had no idea how I actually got into this state where the race had happened.

Enter the asserting mutex: Once I realized where the potential race was I added an asserting mutex to the right point and after less than ten minutes of playing I hit the assert inside the asserting mutex. One nice property of the asserting mutex is that it guarantees that both threads will still have the critical section in their callstack when the debugger pauses because of the assert. That immediately showed me under which conditions the race actually happened and armed with that information I could make a well informed decision about how to fix this. One solution would have been to add synchronization to the relevant code but that would have made it intolerably slow. So my solution ended up being to just back out the five year old memory optimization.

In this case the bug turned out to be harmless but I didn’t know that when I hit the initial assert. I don’t know how I could possibly have found this bug without the asserting mutex. Looking back on it with the full knowledge of how the pieces interact it now seems obvious where the bug came from, but at that time I think it would have been impossible to come to the correct conclusion just by reasoning about the code. Having a conditional breakpoint (which is what the asserting mutex is) that triggers only precisely when both threads are inside the critical section is incredibly valuable. It is equally valuable when you add the asserting mutex to a piece of code and it doesn’t trigger. Because then you know that this potential race does not happen and that you don’t need synchronization. I mean how good is it to be certain that you don’t need synchronization?

If you’ve read all of that here’s a reward in the form of an asserting mutex which allows multiple readers but only one writer:

edit: I updated the code thanks to help from Calsbeek in the comments. So don’t be surprised if our discussion down there doesn’t match this code.

struct AssertingReaderWriterMutex
{
    struct Writer
    {
        Writer()
            : counter(0)
        {
        }

        void lock()
        {
            DE_ASSERT(++counter == 1);
        }
        void unlock()
        {
            DE_ASSERT(--counter == 0);
        }

    protected:
        std::atomic<int> counter;
    };
    struct Reader : private Writer
    {
        friend struct AssertingReaderWriterMutex;

        void lock()
        {
            DE_ASSERT(--counter < 0);
        }
        void unlock()
        {
            DE_ASSERT(++counter <= 0);
        }
    };

    operator Writer &()
    {
        return GetWriter();
    }
    operator Reader &()
    {
        return GetReader();
    }
    Writer & GetWriter()
    {
        return reader;
    }
    Reader & GetReader()
    {
        return reader;
    }

private:
    Reader reader;
};

Since this one is a bit less obvious to use here is an example that will assert:

TEST(asserting_reader_writer_mutex, writer_first)
{
    AssertingReaderWriterMutex mutex;
    ASSERT_TRUE(DoesAssert([&]
    {
        std::lock_guard<AssertingReaderWriterMutex::Writer> lock(mutex);
        std::lock_guard<AssertingReaderWriterMutex::Reader> second_lock(mutex);
    }));
}

As you can see I am specializing std::lock_guard for either the ::Reader or ::Writer struct to indicate whether this is a reader lock or a writer lock. This choice of nested structs makes the AssertingReaderWriterMutex less readable, but it makes the client code more readable. And I think that’s a good trade-off.

The license for all code in this post is this:

This is free and unencumbered software released into the public domain.

Anyone is free to copy, modify, publish, use, compile, sell, or
distribute this software, either in source code form or as a compiled
binary, for any purpose, commercial or non-commercial, and by any
means.

In jurisdictions that recognize copyright laws, the author or authors
of this software dedicate any and all copyright interest in the
software to the public domain. We make this dedication for the benefit
of the public at large and to the detriment of our heirs and
successors. We intend this dedication to be an overt act of
relinquishment in perpetuity of all present and future rights to this
software under copyright law.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR
OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
OTHER DEALINGS IN THE SOFTWARE.

For more information, please refer to <http://unlicense.org/>

It would be nice if you gave credit to Malte Skarupke anyway.

Enjoy