Measuring Mutexes, Spinlocks and how Bad the Linux Scheduler Really is

This blog post is one of those things that just blew up. From a tiny observation at work about odd behaviors of spinlocks I spent months trying to find good benchmarks, (still not entirely successful) writing my own spinlocks, mutexes and condition variables and even contributing a patch to the Linux kernel. The main thing I’ll try to answer is to give some more informed guidance on the endless discussion of mutex vs spinlock. Besides that I found that most mutex implementations are really good, that most spinlock implementations are pretty bad, and that the Linux scheduler is OK but far from ideal. The most popular replacement, the MuQSS scheduler has other problems instead. (the Windows scheduler is pretty good though)

So this all started like this: I overheard somebody at work complaining about mysterious stalls while porting Rage 2 to Stadia. (edit disclaimer: This blog post got more attention than anticipated, so I decided to clarify that I didn’t work on the Rage 2 port to Stadia. As far as I know that port was no more or less difficult than a port to any other platform. I am only aware of this problem because I was working in the same office as the people who were working on the port. And the issue was easily resolved by using mutexes instead of spinlocks, which will become clear further down in the blog. All I did was further investigation on my own afterwards. edit end) The only thing those mysterious stalls had in common was that they were all using spinlocks. I was curious about that because I happened to be the person who wrote the spinlock we were using. The problem was that there was a thread that spent several milliseconds trying to acquire a spinlock at a time when no other thread was holding the spinlock. Let me repeat that: The spinlock was free to take yet a thread took multiple milliseconds to acquire it. In a video game, where you have to get a picture on the screen every 16 ms or 33 ms (depending on if you’re running at 60hz or 30hz) a stall that takes more than a millisecond is terrible. Especially if you’re literally stalling all threads. (as was happening here) In our case we were able to make the problem go away by replacing spinlocks with mutexes, but that leads to the question: How do you even measure whether a spinlock is better than a mutex, and what makes a good spinlock?

Read the rest of this entry »