Probably Dance

June 16, 2018

Fibonacci Hashing: The Optimization that the World Forgot (or: a Better Alternative to Integer Modulo)

I recently posted a blog post about a new hash table, and whenever I do something like that, I learn at least one new thing from my comments. In my last comment section Rich Geldreich talks about his hash table which uses “Fibonacci Hashing”, which I hadn’t heard of before. I have worked a lot on hash tables, so I thought I have at least heard of all the big important tricks and techniques, but I also know that there are so many small tweaks and improvements that you can’t possibly know them all. I thought this might be another neat small trick to add to the collection.

Turns out I was wrong. This is a big one. And everyone should be using it. Hash tables should not be prime number sized and they should not use an integer modulo to map hashes into slots. Fibonacci hashing is just better. Yet somehow nobody is using it and lots of big hash tables (including all the big implementations of std::unordered_map) are much slower than they should be because they don’t use Fibonacci Hashing. So let’s figure this out.

Read the rest of this entry »

103 Comments

May 28, 2018

A new fast hash table in response to Google’s new fast hash table

Hi, I wrote my new favorite hash table. This came about because last year I wrote the fastest hash table (I still make that claim) and this year one of the organizers of the C++Now conference asked me to give a talk. My problem was that Google had also announced a new fast hash table last year, and I wasn’t sure if mine would compare well against theirs.

The main benefit of Google’s hash table over mine was that Google’s has less memory overhead: It has a higher max_load_factor (meaning how full can the table get before it grows to a bigger array) and it has only 1 byte overhead per entry, where the overhead of my table depended on the alignment of your data. (if your data is 8 byte aligned, you’ll have 8 bytes overhead)

So I spent months working on that conference talk, trying to find something that would be a good response to Google’s hash table. Surprisingly enough I ended up with a chaining hash table that is almost as fast as my hash table from last year, while having even less memory overhead than Google’s hash table and which has this really nice property of having stable performance: Every hash table has some performance pitfalls, but this one has fewer than most and will cause problems less often than others will. So what that does is that it’s a hash table that’s really easy to recommend.

Read the rest of this entry »

38 Comments

April 28, 2018

A Summary of the Important Points in Capital in the Twenty-First Century

Capital in the Twenty First Century by Thomas Piketty was widely recognized as a very important book when it came out in 2013. Yet somehow now, in 2018, I rarely encounter people who have learned the lessons from the book. Of course I don’t expect most people to read the book, but since the lessons are so important for development of society, I would expect them to be spread by other means. In order to help that, I decided to write this blog post which summarizes the most important points. So here is the first point:

1. The more money you have, the more money you make

This seems to be a fundamental law of economics. It’s not something we have constructed. It’s even true in primitive societies: If there are two families, one family owns two cows, and one family owns ten cows, the family with ten cows doesn’t make five times as much money as the family with two cows, it makes more than that. That’s because it can more easily survive bad times (like if a cow gets sick) or it can invest in better tools to take care of cows, and those tools pay off more (like fences or a cow shed).

Read the rest of this entry »

6 Comments

January 25, 2018

Finding Floating Point Numbers for Exact Math

For the second time in my career, I ran into a problem where it’s actually useful to know how floating point numbers work. (first time was here) The problem is that sometimes you want floating point numbers that add well. So that you have the associativity guarantee that (a + b) + c == a + (b + c).

The use case here was for A* path finding, but this may also be applicable in other situations. I’ll explain a bit more of A* later, but as a motivation you just have to know that you can speed up A* a lot if you can reason about all the nodes in your graph that have the same “cost.” But since the “cost” is a sum of a lot of different floats, they will rarely have exactly the same value. At that point you could fudge numbers and say that “if two numbers are equal to within 0.0001, treat them as equal” but when you do that it’s easy to accidentally define an ordering that’s not a strict weak ordering. And I have literally seen a crash caused by an incorrect float ordering that was very rare and very hard to track down. So once bitten twice shy, I wanted to see if I couldn’t just force floating math to be exact. And turns out you often can.

Read the rest of this entry »

4 Comments