Probably Dance

I can program and like games

Category: Programming

Transform Matrices are Great and You Should Understand Them

The title of this blog post is obvious for any game programmer, but I notice that people outside of games often write clumsy code because they don’t know how transform matrices work. Especially when people do some simple 2D rendering code, like if you just want to quickly visualize some data in a HTML canvas. People get tripped up on transform math a lot. As an example imagine drawing this simple graph:

It’s just an arbitrary graph with arbitrary numbers, the point is all the layout decisions that happened here: E.g. “Long First Label” extends to the left and pushes everything else over to the right by a bit. If you aren’t organized about how to express your transforms, your code ends up with lots of arbitrary offsets and multipliers. You can’t even calculate where to draw the labels on the y-axis without including an offset for potentially long labels on the x-axis. (and long labels on the y-axis can push over the x-axis, too. Uh-oh) Your first choice for visualization should probably be an existing tool (like I used to generate the graph above) but surprisingly often you’ll want to do something custom, and then you have to worry about transforms.

Game programmers have to do complicated transforms all the time, so they had to get organized about this and the result is the transform matrix. It’s remarkably simple and every programmer should probably know it, just to appreciate its beauty. There are two tricks to transform matrices:

Read the rest of this entry »

Beautiful Branchless Binary Search

I read a blog post by Alex Muscar, “Beautiful Binary Search in D“. It describes a binary search called “Shar’s algorithm”. I’d never heard of it and it’s impossible to google, but looking at the algorithm I couldn’t help but think “this is branchless.” And who knew that there could be a branchless binary search? So I did the work to translate it into a algorithm for C++ iterators, no longer requiring one-based indexing or fixed-size arrays.

In GCC it is more than twice as fast as std::lower_bound, which is already a very high quality binary search. The search loop is simple and the generated assembly is beautiful. I’m astonished that this exists and nobody seems to be using it…

Read the rest of this entry »

Fine-grained Locking with Two-Bit Mutexes

Lets say you want to have a mutex for every item in a list with 10k elements. It feels a bit wasteful to use a std::mutex for each of those elements. In Linux std::mutex is 40 bytes, in Windows it’s 80 bytes. But mutexes don’t need to be that big. You can fit a mutex into two bits, and it’s going to be fast. So we could fit a mutex into the low bits of a pointer. If your container already stores pointers, you might be able to store a mutex for each element with zero space overhead, not even any extra operations during initialization. You’d pay no cost until you use a mutex for the first time.

Lets start with a mutex that uses one byte. It’s easy now that C++ has added futexes to the standard:

Read the rest of this entry »

Finding the “Second Bug” in glibc’s Condition Variable

I continue to have no time for big programming projects, so here is a short blog post. Two years ago I looked into a bug in the glibc implementation of condition variables: Sometimes pthread_cond_signal() doesn’t do anything, which can easily hang your program. The bug is still not fixed, partially because a mitigation patch was available right away that seemed to make it go away. Except that people kept on showing up in the bug report saying that they still hit the bug sometimes, raising the suspicion that there might be a second bug. I finally got around to looking into this. I found that the mitigation patch only helps a little, it’s still the same bug, and the patch I submitted (unreviewed, don’t use yet) would actually fix it.

Read the rest of this entry »

Automated Game AI Testing

In 2018 I wrote an article for the book “Game AI Pro 4” called “Automated AI Testing: Simple tests will save you time.” The book has since been canceled, but the article is now available online on the Game AI Pro website.

The history of this is that in 2017 there was a round table at the Game Developer Conference about AI testing. And despite it being the year 2017, automated testing was barely even mentioned. It was a terrible round table. A coworker who sat in the audience with me said to me that I could have given a better talk because I had invested a lot of work into automated testing. So next year I submitted a talk about automated AI testing and was rejected. But they asked me to write for the book instead. Now the book is canceled, too. A copy of the article is below, with some follow ups at the end. It’s written for people who have never done automated testing, like the AI programmers at the round table. But I think the core trick of doing fiber-based control flow, that can wait for things to happen, could be widely useful:

Read the rest of this entry »

C++ Coroutines Do Not Spark Joy

C++20 added minimal support for coroutines. I think they’re done in a way that really doesn’t fit into C++, mostly because they don’t follow the zero-overhead principle. Calling a coroutine can be very expensive (requiring calls to new() and delete()) in a way that’s not entirely under your control, and they’re designed to make it extra hard for you to control how expensive they are. I think they were inspired by C# coroutines, and the design does fit into C# much better. But in C++ I don’t know who they are for, or who asked for this…

Before we get there I’ll have to explain what they are and what they’re useful for. Briefly, they’re very useful for code with concurrency. The classic example is if your code has multiple state machines that change state at different times: E.g. the state machine for reading data from the network is waiting for more bytes, and the code that provides bytes is also a state machine (maybe it’s decompressing) which in turn gets its bytes from another state machine (maybe it’s handling the TCP/IP layer). This is easier to do when all of these can pretend to operate on their own, as if in separate threads, maybe communicating through pipes. But the code looks nicer if the streamer can just call into the decompressor using a normal synchronous function call that returns bytes. Coroutines allow you to do that without blocking your entire system when more bytes aren’t immediately available, because code can pause and resume in the middle of the function.

One of the best things the C++ standard did is to define the word “coroutine” as different from related concepts like “fibers” or “green threads”. (this very much went against existing usage, so for example Lua coroutines are not the same thing as C++ coroutines. I think that’s fine, since the thing that was added to C++ could be useful, and there is a good case to be made that the existing usage was wrong) In the standard, a coroutine is simply a function that behaves differently when called multiple times: Instead of restarting from the start on each call, it will continue running after the return statement that last returned. To do this, it needs some state to store the information of where it last returned, and what the state of the local variables was at that point. In existing usage that meant that you need a program stack to store this state, but in C++ this is just stored in a struct.

To illustrate all of this, lets build a coroutine in plain C++, without using the language coroutines:

Read the rest of this entry »

Using TLA+ in the Real World to Understand a Glibc Bug

TLA+ is a formal specification language that you can use to verify programs. It’s different from other formal verification systems in that it’s very pragmatic. Instead of writing proofs, it works using the simple method of running all possible executions of a program. You can write assertions and if they’re not true for any possible execution, it tells you the shortest path through your program that breaks your assertion.

In fact it’s so pragmatic that it even allows you to write your code in a language that looks similar to C.

I recently heard of a bug in the glibc condition variable implementation and since I had used TLA+ before to verify my own mutexes and condition variables, I thought I would investigate. Can you use it to find this bug in real-world complex code? Yes you can, barely, and it wasn’t easy, but it gives me hope that program verification is getting really good and is already able to deal with big and messy code:

Read the rest of this entry »

On Modern Hardware the Min-Max Heap beats a Binary Heap

The heap is a data structure that I use all the time and that others somehow use rarely. (I once had a coworker tell me that he knew some code was mine because it used a heap) Recently I was writing code that could really benefit from using a heap (as most code can) but I needed to be able to pop items from both ends. So I read up on double-ended priority queues and how to implement them. These are rare, but the most common implementation is the “Interval Heap” that can be explained quickly, has clean code and is only slightly slower than a binary heap. But there is an alternative called the “Min-Max Heap” that doesn’t have pretty code, but it has shorter dependency chains, which is important on modern hardware. As a result it often ends up faster than a binary heap, even though it allows you to pop from both ends. Which means there might be no reason to ever use a binary heap again.

Read the rest of this entry »

Partial Scaling – How to do Half a Multiplication

Programmers have a habit of over-generalizing things, and so it happened that I found myself writing more generalized versions of rotation, translation and scaling, the three most common operations that you’d want to do to 3D objects. These were more generalized in that they had a parameter for how much you want to do a certain operation. Like “translate by (10, 0, 0), but only apply the operation 20%”. This is easy to do for translation: Just multiply by the percentage and only translate by (2, 0, 0). Rotations are also easy in many representations: If the angles are explicit, like in Euler angles, you can just multiply those by the percentage; if you’re using quaternions, you can slerp.

But scaling is more complicated. Internally scaling is just multiplication, but how do you do half a multiplication? What does it mean to say “multiply by 4, but only apply the operation 50%”? My first approach was to multiply by the power, so you’d get “multiply by 4^{0.5}=2” (or 4^{0.1} if you only want to apply 10% of the operation) and that seems to work when you’re close to 1, but the further away you are from 1, the more wrong it gets. The answer ends up being to take the median of multiplication, division and exponentiation, but let me further explain the problem first:

Read the rest of this entry »

Why Video Game AI does not Use Machine Learning

I used to be an AI programmer working on video games, and I’m currently trying to learn machine learning. As part of this I find myself having to repeatedly explain why video games don’t use machine learning. People seem to find it interesting enough because it’s not just the obvious reasons (machine learning is hard and far from solved for game playing) but it’s also about developer control and about making an understandable game for the player. Video game AI is designed to deliver a certain experience, which is more difficult to do with machine learning. So this blog post lists the main reasons why video game AI does not use machine learning.

Read the rest of this entry »