Introducing the Asserting Mutex

by Malte Skarupke

Multithreaded code in C++ tends to become brittle over time. If you write your code well you’ll need almost no synchronization between different threads, but the price of that is that your code will be littered with undocumented conventions of when you can read or modify which state. In your average threaded C++ application there are countless potential race conditions, all of which never happen because people follow conventions about when to do what. Until someone doesn’t know about a convention that he has to follow or until you change the conventions and you forget to update one piece of code that you didn’t know about.

Enter the asserting mutex. The asserting mutex is a conditional breakpoint that triggers only if a potential race condition actually happens. I call it asserting mutex because you use it like a mutex to protect a critical section. It works very simply: If one thread locks the asserting mutex and a second thread attempts to lock the asserting mutex before the first thread unlocks it, you get an assert. And it guarantees that both threads will still be inside the critical section when you get the assert. The cost is one atomic increment and one atomic decrement, which is not free but cheap enough that you can place lots of asserting mutexes in your code before they cause problems. So you could use this to document many of your threading conventions. Used correctly this is a breakpoint that makes it very easy to find data races.

Here is the complete code:

#include <atomic>

struct AssertingMutex
{
    AssertingMutex()
        : is_in_use(0)
    {
    }

    void lock()
    {
        ASSERT(++is_in_use == 1);
    }
    void unlock()
    {
        ASSERT(--is_in_use == 0);
    }

private:
    std::atomic<signed char> is_in_use;
};

The license for the code is at the bottom of the post.

As an example of the kind of bug that you’ll run into let me explain one that I had recently which I solved using the AssertingMutex:

There was an object M and a system B. The object M adds itself to the system B. There also was an old memory optimization from past consoles where someone realized that the object M didn’t actually need all of its members, because the system B would have the same information once the object was added. So the object would just forward its getters to the getters of system B and this saved a little memory. Even at that time system B could run in a separate thread than object M, but this was fine because all modifications were synchronized and the data was immutable between synchronization points, and reading immutable data is safe without synchronization. This memory optimization happened five years ago and nobody remembered any of this, but this is still how the object behaves and none of of the conventions that made this work are enforced anywhere.

There are a lot of changes that could break this, and the specific one that caused problems for us was that the system switched to managing its internal structure in a packed object array. Which means that you can’t call any of the getters of the system while its update is running because it may be reordering its index array. So if your object was moved around in that packed object array exactly while another thread calls the getters of the object, you could possibly read the values of another object.

I don’t actually know how often this bug happened, probably once every few days. Rare enough that nobody ever noticed because the only difference in behavior you’d see would be that an object could incorrectly be culled for one frame, which means it would pop out of existence for one frame. Oh and the particular instance where I hit this bug was in the effect system, so you’d only see parts of an effect disappear for one frame.

In other words this bug would be impossible to track down, but I was lucky enough that this bug triggered an assert in system B when I was running in a debugger and the assert had a somewhat useful callstack which pointed me in the right direction. With that callstack I discovered the history of object M and system B from above and realized that there was a potential race on the packed object array. But as is so often the case in threading bugs, by the time that I hit the assert the conflicting thread was already doing something else which means I had no idea how I actually got into this state where the race had happened.

Enter the asserting mutex: Once I realized where the potential race was I added an asserting mutex to the right point and after less than ten minutes of playing I hit the assert inside the asserting mutex. One nice property of the asserting mutex is that it guarantees that both threads will still have the critical section in their callstack when the debugger pauses because of the assert. That immediately showed me under which conditions the race actually happened and armed with that information I could make a well informed decision about how to fix this. One solution would have been to add synchronization to the relevant code but that would have made it intolerably slow. So my solution ended up being to just back out the five year old memory optimization.

In this case the bug turned out to be harmless but I didn’t know that when I hit the initial assert. I don’t know how I could possibly have found this bug without the asserting mutex. Looking back on it with the full knowledge of how the pieces interact it now seems obvious where the bug came from, but at that time I think it would have been impossible to come to the correct conclusion just by reasoning about the code. Having a conditional breakpoint (which is what the asserting mutex is) that triggers only precisely when both threads are inside the critical section is incredibly valuable. It is equally valuable when you add the asserting mutex to a piece of code and it doesn’t trigger. Because then you know that this potential race does not happen and that you don’t need synchronization. I mean how good is it to be certain that you don’t need synchronization?

If you’ve read all of that here’s a reward in the form of an asserting mutex which allows multiple readers but only one writer:

edit: I updated the code thanks to help from Calsbeek in the comments. So don’t be surprised if our discussion down there doesn’t match this code.

struct AssertingReaderWriterMutex
{
    struct Writer
    {
        Writer()
            : counter(0)
        {
        }

        void lock()
        {
            DE_ASSERT(++counter == 1);
        }
        void unlock()
        {
            DE_ASSERT(--counter == 0);
        }

    protected:
        std::atomic<int> counter;
    };
    struct Reader : private Writer
    {
        friend struct AssertingReaderWriterMutex;

        void lock()
        {
            DE_ASSERT(--counter < 0);
        }
        void unlock()
        {
            DE_ASSERT(++counter <= 0);
        }
    };

    operator Writer &()
    {
        return GetWriter();
    }
    operator Reader &()
    {
        return GetReader();
    }
    Writer & GetWriter()
    {
        return reader;
    }
    Reader & GetReader()
    {
        return reader;
    }

private:
    Reader reader;
};

Since this one is a bit less obvious to use here is an example that will assert:

TEST(asserting_reader_writer_mutex, writer_first)
{
    AssertingReaderWriterMutex mutex;
    ASSERT_TRUE(DoesAssert([&]
    {
        std::lock_guard<AssertingReaderWriterMutex::Writer> lock(mutex);
        std::lock_guard<AssertingReaderWriterMutex::Reader> second_lock(mutex);
    }));
}

As you can see I am specializing std::lock_guard for either the ::Reader or ::Writer struct to indicate whether this is a reader lock or a writer lock. This choice of nested structs makes the AssertingReaderWriterMutex less readable, but it makes the client code more readable. And I think that’s a good trade-off.

The license for all code in this post is this:

This is free and unencumbered software released into the public domain.

Anyone is free to copy, modify, publish, use, compile, sell, or
distribute this software, either in source code form or as a compiled
binary, for any purpose, commercial or non-commercial, and by any
means.

In jurisdictions that recognize copyright laws, the author or authors
of this software dedicate any and all copyright interest in the
software to the public domain. We make this dedication for the benefit
of the public at large and to the detriment of our heirs and
successors. We intend this dedication to be an overt act of
relinquishment in perpetuity of all present and future rights to this
software under copyright law.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR
OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
OTHER DEALINGS IN THE SOFTWARE.

For more information, please refer to <http://unlicense.org/>

It would be nice if you gave credit to Malte Skarupke anyway.

Enjoy

4 Comments to “Introducing the Asserting Mutex”

John Calsbeek (@jcalsbeek) says:

February 8, 2014 at 20:34

I’ve considered something like this before. But if you end up using it all over the place I’d start to get worried about compiling it out in release builds, because you have implicit memory barriers from the atomic loads/stores which could mask problems until you have a build without all your debugging in it.

Also I’m pretty sure it’s completely possible to do the reader-writer version with only atomic read-modify-write operations: it’s exactly like a normal lockless reader-writer lock only it asserts instead of spins in case of contention.

- Malte Skarupke says:
  
  February 9, 2014 at 07:09
  
  How do you write a lockless reader writer lock? I tried to use a spinlock but it actually made the code slower than std::mutex. I didn’t find out why, but I assume that it’s because I used the wrong atomic operation, which is a very easy thing to do. And instead of worrying about that I decided to just use std::mutex which seemed plenty fast.
  
  I also tried to use no lock and to make the reader_count variable be an atomic, but in that case I’d get an assert when the last reader would decrement the counter and was just about to unlock the writer when a new reader comes in and tries to lock it again. That situation would be harmless if you used a normal mutex because then the new reader would just have to wait for a tiny amount of time, but I immediately get an assert.
  
  In regards to your comment about the presense of this hiding bugs: I actually haven’t left an AssertingMutex in our code just yet. I’ve used it for debugging twice and removed it each time after I found the bug, but I do plan on leaving a few of them around. To be honest I can’t reason about memory barriers well which is why I just use the default atomic operations, but I think if I used memory_order_relaxed instead then this would not introduce memory barriers and thus would not change semantics between debug and release code. I’m not sure though whether I could still guarantee that both threads will have the critical section in their callstack when you hit the assert. Do you know?
  
John Calsbeek (@jcalsbeek) says:

February 10, 2014 at 20:48

You should be able to do an atomic reader-writer lock with a single atomic int that has one sentinel value (meaning “locked for writing”). A writer attempts to atomically exchange the sentinel value with 0, spinning (or asserting) when the value is not 0. A reader reads the counter, spins while it is the sentinel value, otherwise increments the value it reads and attempts to atomically exchange it in (starting from the top if this fails).

I think a relaxed atomic *should* have no negative side effects in this case. But that relies both on your hardware and your compiler not silently using a stricter memory ordering. x86 doesn’t actually have relaxed operations. But I don’t know if that matters… if your compiler/hardware don’t have weaker barriers, is there any trouble you can get into? I don’t know the answer to that.

Even with the lock, you might not have the guarantee that both threads are in the critical section, right? You could detect the condition and begin the process of triggering the assert, but have the other thread leave the critical section and release the lock before the assert triggers and pauses all threads.

- Malte Skarupke says:
  
  February 17, 2014 at 11:15
  
  Alright I now implemented the reader writer lock with just one atomic int. Your sentinel value made me realize that I could just use the sign bit to indicate whether you are currently reading (negative number) or writing (positive number). This one is also easier to reason about being correct. I am pretty sure that no matter which combination of readers and writers you put onto this, you will always get at least one reader and one writer still in the critical section when you assert.

Probably Dance