Probably Dance

I can program and like games

How to Adjust Your Kids Sleep Schedule for Daylight Savings in 15 Minute Increments

Daylight savings just ended so this post is coming exactly too late. But I was talking to a friend about how we just adjust 15 minutes per day, which is quite easy. She was saying “oh I wish I had that kind of foresight.” But we were talking Friday evening, and daylight savings didn’t end until Sunday, so there was plenty of time for adjustment. So since people don’t seem to realize that, here is the very simple plan for ending daylight savings:

Autumn

Lets say your kids bedtime is at 8pm (I wish) then

  • On Friday go to bed 15 minutes late, 8:15pm
  • On Saturday go to bed 30 minutes late, 8:30pm
  • On Sunday go to bed 15 minutes early, 7:45pm
  • On Monday go to bed at your usual time, 8pm

And with that you’re done. If you realize too late and it’s Saturday already, you can just do 20 minute increments:

Read the rest of this entry »

Avalanche Studios NYC Retrospective – An Ambitious Company Ruined by Bad Development Practices

I’ve wanted to write about this since the studio closed a year ago. Now that Contraband is also canceled I think it’s time, especially since Contraband was one of the big reasons why I left the company. The blog post turned out much bigger than expected though. There was a lot to get off my chest…

I worked at Avalanche Studios NYC from July 2012 to December 31st 2019, seven and a half years. I wasn’t there for most of Contraband’s development but it was obvious early on that it was going to be a very difficult project. If anything I’m surprised it lasted that long before being canceled.

The studio was born out of ambition. It failed because it could not deliver on that ambition. So this will necessarily be negative. But we had a good run and made two good games. I have so many memories and thoughts that I need to get written down somewhere, so lets celebrate the good and talk about the troubles.

Read the rest of this entry »

Installing a Mini-Split AC in a Brooklyn Apartment

Last year, 2024, we replaced four PTACs with a mini-split AC. I’ve been asked about it often enough (by neighbors, coworkers, friends) that I decided to write up the experience. Hopefully it’s useful for you, too.

Overall this cost us about $40k, including the cost for closing up the PTAC holes. We’ll probably never make the money back on electricity cost savings, so the main benefits are that we have more quiet and more stable temperatures now and overall I’m glad that we did it.

(I’ll use the term “heat pump” and “AC” interchangeably. Every “AC” mentioned in here can do both heating and cooling)

Read the rest of this entry »

Revisiting Knuth’s “Premature Optimization” Paper

The most famous quote from Knuth’s paper “Structured Programming with go to Statements” is this:

There is no doubt that the grail of efficiency leads to abuse. Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.

People always use this quote wrong, and to get a feeling for that we just have to look at the original paper, and the context in which it was written. The paper isn’t actually about optimization. It’s about when we have to use goto statements because structured programming couldn’t express certain things at the time. Or at least it couldn’t do so efficiently, requiring extra checks, and that’s why Knuth has to talk about performance: The topic he is addressing is “Can we get rid of goto statements without sacrificing performance?”

Read the rest of this entry »

I’m Open-Sourcing my Custom Benchmark GUI

I think one of the reasons why I was able to do good performance work over the years is that at some point I started taking benchmarking seriously enough to write my own library. I used to use Google Benchmark, which is a fine library, but at some point you realize that you need a GUI to really scale up benchmarks1. Here is a github link, and this video gives a quick intro:

The main problems it tries to address is:

  1. Getting good numbers by running benchmarks repeatedly, visualizing them in context, picking a single opinionated good visualization, handling noise and even adding a bit of well-justified noise, and being careful about what statistics to do on the numbers.
  2. Dealing with the inevitable combinatorial explosion of benchmarks when you want to try different data structures (min-max heap vs interval heap vs binary heap) with different operations (make_heap, push, pop) on different types (int vs string), different compilers, debug build vs release build, different variants of the code (e.g. trying loop unrolling), different input lengths etc. The full combinatorial explosion might be millions or billions of possible benchmarks. I want to be able to get a first impression for a subset in a few minutes. And then if I want less noisy results I can let it run overnight. And then I can try a new variation and visualize it together with the overnight results in under a minute.
  3. Various ergonomic issues. Making it easy to select which numbers are together on the screen. Having the numbers as a graph first, CSV second. Being robust to the code crashing halfway through a long run: Record the partial results and be able to resume the same run. Making it easy to attach a profiler to one specific benchmark that I’m interested in.

This sounds complicated, and I have to admit that this is very much an app written by a programmer for a programmer, but the whole point of a GUI is that I can make this both more powerful and easier to use at the same time. In fact I think the patterns might be more widely useful for people who do slow-running experiments of other kinds (like training a ML model).

Read the rest of this entry »

Why Does Integer Addition Approximate Float Multiplication?

Here is a rough approximation of float multiplication (source):

float rough_float_multiply(float a, float b) {
    constexpr uint32_t bias = 0x3f76d000;
    return bit_cast<float>(bit_cast<uint32_t>(a) + bit_cast<uint32_t>(b) - bias);
}

We’re casting the floats to ints, adding them, adjusting the exponent, and returning as float. If you think about it for a second you will realize that since the float contains the exponent, this won’t be too wrong: You can multiply two numbers by adding their exponents. So just with the exponent-addition you will be within a factor of 2 of the right result. But this will actually be much better and get within 7.5% of the right answer. Why?

Read the rest of this entry »

Initial CUDA Performance Surprises

I am somehow very late to learning CUDA. I didn’t even know until recently that CUDA is just C++ with a small amount of extra stuff. If I had known that there is so little friction to learning it, I would have checked it out much earlier. But if you come in with C++ habits, you’ll write suboptimal code, so here are some lessons I had to learn to get things to run fast.

Memory Coalescing

If you have multiple threads operating on an array in C++, you probably want to iterate like this:

std::vector<T> vec = ...;
size_t per_thread = vec.size() / num_threads;
T * my_slice = vec.data() + per_thread * my_thread_i;
for (size_t i = 0; i < per_thread; ++i) {
    do_something(my_slice[i]);
}

Meaning each thread iterates over a contiguous chunk of memory. In CUDA this is going to be slow because you want the threads to load memory together. So if thread 0 loads bytes 0 to 15, then you want thread 1 to load bytes 16 to 31 and thread 2 to load bytes 32 to 47 etc. So the loop instead has to look like this:

T * data = ...;
size_t num_elements = ...;
for (int i = my_thread_i; i < num_elements; i += num_threads) {
    do_something(data[i]);
}

This is called “memory coalescing” where adjacent threads use adjacent memory. On a loop with a small body (dot product) this is 3x faster.

Read the rest of this entry »

How I use LLMs to program

Studies have shown that LLMs help novice programmers more than experienced programmers. This matches my experience. At work I see that interns or new hires have some LLM window open almost all the time. I use them maybe once a week. But you could say the same thing about Stack Overflow. I used it all the time when I started programming. Now I use it occasionally. While it’s easy to point at their obvious issues, I think they are also clearly a net-positive on average. So how do LLMs help me?

Big plus: Languages that I don’t use as often

I don’t often write SQL statements. I can obviously write the simple ones, but SQL is a language that has all the features you could ever possibly want, and I don’t know how to use them and don’t know how to google for them. So I ask a LLM. Similarly for javascript/css/html programming. I used to hate doing web frontend work, now it’s not so bad because LLMs can help me get out of the tricky edge cases.

I have also used LLMs to translate functionality from one language to another. E.g. if I know what a function is called in C++ but I can’t find an equivalent one in the standard library of another language, an LLM will often do a decent first pass of rewriting the C++ function in the other language.

Small minus: The code is overly generic

Read the rest of this entry »

Transform Matrices are Great and You Should Understand Them

The title of this blog post is obvious for any game programmer, but I notice that people outside of games often write clumsy code because they don’t know how transform matrices work. Especially when people do some simple 2D rendering code, like if you just want to quickly visualize some data in a HTML canvas. People get tripped up on transform math a lot. As an example imagine drawing this simple graph:

It’s just an arbitrary graph with arbitrary numbers, the point is all the layout decisions that happened here: E.g. “Long First Label” extends to the left and pushes everything else over to the right by a bit. If you aren’t organized about how to express your transforms, your code ends up with lots of arbitrary offsets and multipliers. You can’t even calculate where to draw the labels on the y-axis without including an offset for potentially long labels on the x-axis. (and long labels on the y-axis can push over the x-axis, too. Uh-oh) Your first choice for visualization should probably be an existing tool (like I used to generate the graph above) but surprisingly often you’ll want to do something custom, and then you have to worry about transforms.

Game programmers have to do complicated transforms all the time, so they had to get organized about this and the result is the transform matrix. It’s remarkably simple and every programmer should probably know it, just to appreciate its beauty. There are two tricks to transform matrices:

Read the rest of this entry »

Two Kids Put Me on a Two Sleep Schedule

I’m writing this at 2am, having already slept from 9pm to 1am. I will sleep again from 4am to 7am. This is apparently called “biphasic sleep”, which I first heard about in the 2022 article The forgotten medieval habit of ‘two sleeps’. I had read that article with interest, because this has happened to me occasionally over the years: I’d fall asleep early, then wake up to spend a few hours restless before finally falling back asleep in the early morning hours. After reading that this used to be normal, I finally decided to just let it happen when it does, which is about two or three times a week.

Read the rest of this entry »