Probably Dance

I can program and like games

Fibonacci Hashing: The Optimization that the World Forgot (or: a Better Alternative to Integer Modulo)

I recently posted a blog post about a new hash table, and whenever I do something like that, I learn at least one new thing from my comments. In my last comment section Rich Geldreich talks about his hash table which uses “Fibonacci Hashing”, which I hadn’t heard of before. I have worked a lot on hash tables, so I thought I had at least heard of all the big important tricks and techniques, but I also know that there are so many small tweaks and improvements that you can’t possibly know them all. I thought this might be another neat small trick to add to the collection.

Turns out I was wrong. This is a big one. And everyone should be using it. Hash tables should not be prime number sized and they should not use an integer modulo to map hashes into slots. Fibonacci hashing is just better. Yet somehow nobody is using it and lots of big hash tables (including all the big implementations of std::unordered_map) are much slower than they should be because they don’t use Fibonacci Hashing. So let’s figure this out.


A new fast hash table in response to Google’s new fast hash table

Hi, I wrote my new favorite hash table. This came about because last year I wrote the fastest hash table (I still make that claim) and this year one of the organizers of the C++Now conference asked me to give a talk. My problem was that Google had also announced a new fast hash table last year, and I wasn’t sure if mine would compare well against theirs.

The main benefit of Google’s hash table over mine was that Google’s has less memory overhead: It has a higher max_load_factor (meaning the table can get fuller before it has to grow to a bigger array) and it has only 1 byte of overhead per entry, where the overhead of my table depends on the alignment of your data. (If your data is 8-byte aligned, you’ll have 8 bytes of overhead.)

So I spent months working on that conference talk, trying to find something that would be a good response to Google’s hash table. Surprisingly enough I ended up with a chaining hash table that is almost as fast as my hash table from last year, while having even less memory overhead than Google’s hash table, and it has the really nice property of stable performance: Every hash table has some performance pitfalls, but this one has fewer than most and will cause problems less often than others will. All of that makes it a hash table that’s really easy to recommend.


A Summary of the Important Points in Capital in the Twenty-First Century

Capital in the Twenty-First Century by Thomas Piketty was widely recognized as a very important book when it came out in 2013. Yet somehow now, in 2018, I rarely encounter people who have learned the lessons from the book. Of course I don’t expect most people to read the book, but since its lessons are so important for the development of society, I would expect them to spread by other means. To help with that, I decided to write this blog post summarizing the most important points. So here is the first point:

1. The more money you have, the more money you make

This seems to be a fundamental law of economics. It’s not something we have constructed. It’s even true in primitive societies: If there are two families, one owning two cows and the other owning ten cows, the family with ten cows doesn’t make five times as much money as the family with two cows; it makes more than that. That’s because it can more easily survive bad times (like a cow getting sick) and it can invest in better tools to take care of the cows, and those tools pay off more (like fences or a cow shed).


Finding Floating Point Numbers for Exact Math

For the second time in my career, I ran into a problem where it’s actually useful to know how floating point numbers work. (The first time was here.) The problem is that sometimes you want floating point numbers that add well, so that you have the associativity guarantee that (a + b) + c == a + (b + c).

The use case here was A* pathfinding, but this may also be applicable in other situations. I’ll explain a bit more about A* later, but as motivation you just have to know that you can speed up A* a lot if you can reason about all the nodes in your graph that have the same “cost.” But since the “cost” is a sum of a lot of different floats, they will rarely have exactly the same value. At that point you could fudge the numbers and say “if two numbers are equal to within 0.0001, treat them as equal,” but when you do that it’s easy to accidentally define an ordering that’s not a strict weak ordering. And I have literally seen a crash caused by an incorrect float ordering that was very rare and very hard to track down. So, once bitten twice shy, I wanted to see if I couldn’t just force the floating point math to be exact. And it turns out you often can.


Where do top scientists come from? And what do taxes have to do with it?

I was reading this article recently, which talks about “Where star scientists choose to locate: the impact of US state taxes.” It’s a summary of a paper about “the effect of state taxes on the geographical location of top earners.”

It’s a very interesting idea: States often lower taxes in the hope of attracting business or talent, but there is very little evidence about whether that actually works. So the authors of that paper decided to find a group of influential people who are somewhat easy to track: people who apply for lots of patents, the so-called “star scientists” from the title. The authors built a huge database tracking where the top 5% of scientists, measured by patent applications, had moved over the years.

And the authors claim that they found pretty clear evidence that people like to move from high-tax states to low-tax states, so the conclusion is that if you want to attract top scientists, you should lower taxes.

Except, when I dug through the data I found the opposite. Yes, top scientists do move to states that have lower taxes, but high-tax states have such a large lead in the number of scientists that that little bit of migration doesn’t matter. But we’ll have to get to that conclusion one step at a time.


Games Are About Personal Development

Here’s an angle on the fundamental reason why we play games: They are about personal development, about learning about ourselves and about the world. This may not be a new angle, but I haven’t heard it stated this explicitly. Instead I have heard people say stupid things like “games teach hand-eye coordination,” which is true, but also bullshit, because why would you spend this much time training your hand-eye coordination? No, I claim that games teach important life lessons, and that that is the fundamental reason why we play games.

I’m going to talk about video games, but this is also about games in general. Why do kids play with dolls? Because they want to learn about family life. (Or about conflicts, when playing with action figures.) This is not explicit learning like we get from a teacher; instead you act out situations and adjust your behavior depending on how your play partner reacts. Why do we send our kids to football practice? Not because we think they need to learn the valuable skill of kicking a ball into a net. No, it’s because we want them to learn about working in teams, about pacing themselves, about playing fair and all that.

The things we learn are obvious in those scenarios. It’s well known that it’s important for kids to play in order to figure out how to act in the world in a safe environment. But I claim that the same thing is true for video games, and my example will be Super Mario World.


Evidence For How To Make Great Games

Earlier this year I gave a talk about the Game Outcomes Project. I called that talk “Evidence For How To Make Great Games” because I think the Game Outcomes Project is the best data we have on what the teams that make great games do. I wasn’t involved in the Game Outcomes Project; I just gave a talk about it because I really like it. I also wanted to focus on different things than they focused on in their own write-ups and talks.

People who saw the talk said that they really liked it, and they keep on telling me how much they liked it. So I decided to record the talk again and upload it.

The pitch for the talk is that the results of the Game Outcomes Project are the best evidence we have for what makes great game development teams and what makes bad game development teams, and I think every game developer should know this stuff. So I talk about what you should focus on when making a game, and I give advice for how to get there. The Game Outcomes Project found “really successful teams do X,” and I present that, and then I also have a section at the end of the talk where I say “here is how you can actually get good at doing X.” Here is the talk:


Collected Advice for Doing Scientific Research

I’ve been collecting advice from various sources about how to do research. I can’t find a good place that collects this, so let’s see if I can create one.

The idea is that this should be something similar to George Polya’s “How to Solve It,” but for doing research instead of solving problems. There is a lot of overlap between those two ideas, so I will quote a lot from Polya, but I will also add ideas from other sources. I should say though that my sources are mostly from computer science, math and physics, so this list will be biased towards those fields.

My other background here is that I work in video game AI so I’ve read a lot of AI literature and have found parallels between solving AI problems and solving research problems. So I will try to generalize patterns that AI research has found about how to solve hard problems.

A lot of practical advice will be for getting you unstuck. But there will also be advice for the general approach to doing research.


Less Weird Quaternions

I’ve always been frustrated by how mysterious quaternions are. They arise from weird equations that you just have to memorize, and they are difficult to debug, because as soon as you deviate too far from the identity quaternion the numbers are really hard to interpret. Most people implement quaternions once and then treat them as a black box forever after. So I had written quaternions off as one of those weird complicated 4D mathematical constructs that mathematicians sometimes invent, which magically works as long as I don’t mess with it.

That was until recently, when I came across the paper Imaginary Numbers are not Real – the Geometric Algebra of Spacetime, which arrives at quaternions using only 3D math, using no imaginary numbers, and in a form that generalizes to 2D, 3D, 4D or any other number of dimensions. (Quaternions just happen to be the special case for 3D rotations.)

In the last couple weeks I finally took the time to work through the math enough that I am convinced that this is a much better way to think of quaternions. So in this blog post I will explain…

  • … how quaternions are 3D constructs. The 4D interpretation just adds confusion
  • … how you don’t need imaginary numbers to arrive at quaternions. The term √−1 will not come up (other than to point out the places where other people need it, and why we don’t need it)
  • … where the double cover of quaternions comes from, as well as how you can remove it if you want to (which makes quaternions a whole lot less weird)
  • … why you actually want to keep the double cover, because the double cover is what makes quaternion interpolation great

Unfortunately I will have to teach you a whole new algebra to get there: Geometric Algebra. I only know the basics though, so I’ll stick to those and keep it simple. You will see that the geometric algebra interpretation of quaternions is much simpler than the 4D interpretation, so I can promise you that it’s worth spending a little bit of time to learn the basics of Geometric Algebra to get to the good stuff.


I Wrote The Fastest Hashtable

I had to get there eventually. I had a blog post called “I Wrote a Fast Hashtable” and another blog post called “I Wrote a Faster Hashtable.” Now I have finally written the fastest hashtable. And by that I mean that it has the fastest lookups of any hashtable I could find, while its inserts and erases are also really fast (though not the fastest).

The trick is to use Robin Hood hashing with an upper limit on the number of probes. If an element would have to be more than X positions away from its ideal position, you grow the table and hope that with a bigger table every element can be close to where it wants to be. It turns out that this works really well. X can be relatively small, which allows some nice optimizations of the inner loop of a hashtable lookup.

If you just want to try it, here is a download link. Or scroll down to the bottom of the blog post to the section “Source Code and Usage.” If you want more details read on.
