I can program and like games

## Tag: C++

### I Wrote a Faster Sorting Algorithm

These days it’s a pretty bold claim to say that you invented a sorting algorithm that’s 30% faster than the state of the art. Unfortunately I have to make a far bolder claim: I wrote a sorting algorithm that’s twice as fast as std::sort for many inputs. And except when I specifically construct cases that hit my worst case, it is never slower than std::sort. (And even when I hit those worst cases, I detect them and automatically fall back to std::sort.)

Why is that an unfortunate claim? Because I’ll probably have a hard time convincing you that I did speed up sorting by a factor of two. But this should turn out to be quite a lengthy blog post, and all the code is open source for you to try out on whatever your domain is. So I might either convince you with lots of arguments and measurements, or you can just try the algorithm yourself.

Following up from my last blog post, this is of course a version of radix sort. Meaning its complexity is lower than O(n log n). I made two contributions:

1. I optimized the inner loop of in-place radix sort. I started off with the Wikipedia implementation of American Flag Sort and made some non-obvious improvements. This makes radix sort much faster than std::sort, even for relatively small collections. (starting at 128 elements)
2. I generalized in-place radix sort to work on arbitrary sized ints, floats, tuples, structs, vectors, arrays, strings etc. I can sort anything that is reachable with random access operators like operator[] or std::get. If you have custom structs, you just have to provide a function that can extract the key that you want to sort on. This is a trivial function which is less complicated than the comparison operator that you would have to write for std::sort.

If you just want to try the algorithm, jump ahead to the section “Source Code and Usage.”

I recently learned how radix sort works, and in hindsight it’s weird that I never really learned about it before, and that it doesn’t seem to be widely used. In this blog post I claim that std::sort should use radix sort for large arrays, and I will provide a simple implementation that does that.

But first an explanation of what radix sort is: Radix sort is an O(n) sorting algorithm that works on integer keys. I’ll explain below how it works, but the claim that there’s an O(n) sorting algorithm was surprising to me the first time that I heard it. I always thought there were proofs that sorting had to be O(n log n). Turns out sorting has to be O(n log n) only if you use the comparison operator to sort. Radix sort does not use the comparison operator, and because of that it can be faster.
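To make the “no comparison operator” point concrete, here is a minimal least-significant-byte radix sort for 32-bit unsigned integers. This is a simplified sketch of the general technique, not the optimized in-place version this post describes:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sort 32-bit unsigned ints with four counting-sort passes, one per byte.
// No element is ever compared to another element: each pass is O(n),
// so the whole sort is O(n).
void radix_sort_u32(std::vector<std::uint32_t> & data)
{
    std::vector<std::uint32_t> buffer(data.size());
    for (int shift = 0; shift < 32; shift += 8)
    {
        // histogram of the current byte
        std::array<std::size_t, 256> count{};
        for (std::uint32_t x : data)
            ++count[(x >> shift) & 0xff];
        // exclusive prefix sum: count[b] becomes the first output
        // position for byte value b
        std::size_t total = 0;
        for (std::size_t & c : count)
        {
            std::size_t old = c;
            c = total;
            total += old;
        }
        // stable scatter into the buffer, then reuse it as the input
        for (std::uint32_t x : data)
            buffer[count[(x >> shift) & 0xff]++] = x;
        data.swap(buffer);
    }
}
```

Because each pass is stable, sorting by the low byte first and the high byte last leaves the whole array in order.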

The other reason why I never looked into radix sort is that it only works on integer keys. Which is a huge limitation. Or so I thought. Turns out all this means is that your struct has to be able to provide something that acts somewhat like an integer. Radix sort can be extended to floats, pairs, tuples and std::array. So if your struct can provide for example a std::pair<bool, float> and use that as a sort key, you can sort it using radix sort.
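For floats, the usual way to get such an integer is the standard bit trick: reinterpret the float’s bits and fix up the sign so that unsigned integer order matches float order. A sketch of that well-known technique (not necessarily the exact key extraction this post uses):

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Map a float to a uint32_t whose unsigned order matches the float order.
std::uint32_t float_to_radix_key(float f)
{
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof(bits)); // type-pun without UB
    if (bits & 0x80000000u)
        return ~bits;                  // negative: flip all bits to reverse the order
    else
        return bits | 0x80000000u;     // positive: set the sign bit to sort above negatives
}
```

With a key like this, an integer radix sort applies unchanged to floats.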

### C++11 Completed RAII, Making Composition Easier

The addition of move semantics in C++11 is not just a performance and safety improvement. It’s also the feature that completed RAII. And as of C++11 I believe that RAII is absolutely necessary to make object composition easy in the language.

To illustrate let’s look at how objects were composed before C++11, what problems we ran into, and how everything just works automatically since C++11. Let’s build an example of three objects:

```
struct Expensive
{
    std::vector<float> vec;
};
struct Group
{
    Group();
    Group(const Group &);
    Group & operator=(const Group &);
    ~Group();
    int i;
    float f;
    std::vector<Expensive *> e;
};
struct World
{
    World();
    World(const World &);
    World & operator=(const World &);
    ~World();
    std::vector<Group *> c;
};
```

Before C++11, composition looked something like this. It was OK to have a vector of floats, but you’d never have a vector of more expensive objects, because any time that vector re-allocates, you’d have a very expensive operation on your hands. So instead you’d write a vector of pointers. Let’s implement all those functions:
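For contrast, here is what the same composition can look like since C++11 (a sketch: move semantics make reallocation cheap, so the pointers and all five hand-written special member functions simply disappear):

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Since C++11: hold values directly. Vector growth moves the elements
// instead of copying them, and the compiler-generated copy, move and
// destructor do the right thing, recursively.
struct Expensive
{
    std::vector<float> vec;
};
struct Group
{
    int i;
    float f;
    std::vector<Expensive> e; // values, not Expensive *
};
struct World
{
    std::vector<Group> c; // values, not Group *
};
```

Copying a World still deep-copies everything, but moving it is just a few pointer swaps, which is exactly what a growing vector does.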

### plalloc: A simple stateful allocator for node based containers

Allocators in C++ are awkward classes. Allocators are usually zero size to make the containers small, and they have no state because that used to be difficult to do in C++. But nowadays we don’t really care about the size of our containers (we care about the size of the content of the containers) and since C++11 we have support for stateful allocators.

The problem I’m trying to solve is that node based containers have bad cache behavior because they allocate all over the place. So I wrote a small allocator which gives out memory from a contiguous block. It speeds up std::map and std::unordered_map. It’s called plalloc because I think this is a pool allocator.

Code is below:
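To illustrate the idea (this is my own minimal sketch of a stateful pool allocator, not plalloc itself): hand out nodes from large contiguous blocks and recycle freed nodes through an intrusive free list, which is all a node based container ever asks for.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <type_traits>
#include <vector>

// Minimal C++11 stateful pool allocator (a sketch, not plalloc).
// Single-object allocations come from contiguous blocks; freed nodes go
// onto an intrusive free list for reuse.
template<typename T>
class pool_allocator
{
public:
    using value_type = T;

    pool_allocator() : state(std::make_shared<pool_state>()) {}
    pool_allocator(const pool_allocator &) = default; // copies share the pool
    template<typename U>
    pool_allocator(const pool_allocator<U> &)     // a rebound copy gets its own
        : state(std::make_shared<pool_state>()) {} // pool, sized for its own T

    T * allocate(std::size_t n)
    {
        if (n != 1) // node based containers only ever ask for one node
            return static_cast<T *>(::operator new(n * sizeof(T)));
        if (!state->free_list)
            grow();
        node * result = state->free_list;
        state->free_list = result->next;
        return reinterpret_cast<T *>(result);
    }
    void deallocate(T * ptr, std::size_t n)
    {
        if (n != 1)
        {
            ::operator delete(ptr);
            return;
        }
        node * returned = reinterpret_cast<node *>(ptr);
        returned->next = state->free_list;
        state->free_list = returned;
    }

    bool operator==(const pool_allocator & other) const { return state == other.state; }
    bool operator!=(const pool_allocator & other) const { return state != other.state; }

private:
    union node
    {
        node * next;
        typename std::aligned_storage<sizeof(T), alignof(T)>::type storage;
    };
    struct pool_state
    {
        node * free_list = nullptr;
        std::vector<std::unique_ptr<node[]>> blocks;
    };
    void grow()
    {
        static const std::size_t block_size = 256;
        state->blocks.emplace_back(new node[block_size]);
        node * block = state->blocks.back().get();
        for (std::size_t i = block_size; i-- > 0;)
        {
            block[i].next = state->free_list;
            state->free_list = &block[i];
        }
    }
    std::shared_ptr<pool_state> state;
};
```

Dropping it into std::map is one extra template argument: std::map&lt;int, int, std::less&lt;int&gt;, pool_allocator&lt;std::pair&lt;const int, int&gt;&gt;&gt;. The map rebinds the allocator to its internal node type, so all tree nodes end up in the same contiguous blocks.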

### Comma Operator RAII Abuse

Here’s a neat little trick that one of my co-workers, Clint Levijoki, discovered. In C++ you often use an RAII wrapper that you place on the stack if you want to be sure that code gets run at a later point. One good example would be std::lock_guard which you use if you want to be sure that a lock gets released in the future, or scoped profiling which you use to stop a timer in the future. For example for this:

```
std::string foo();
void bar()
{
    std::string baz = foo();
    // do something with baz
}
```

If you want to profile foo() you’d write it like this:

```
std::string foo();
void bar()
{
    std::string baz;
    {
        ScopedProfiler profile_foo("foo()");
        baz = foo();
    }
    // do something with baz
}
```

Which is less pretty and slightly slower. Alternatively you can use the comma operator and do it like this:

```
std::string foo();
void bar()
{
    std::string baz = (ScopedProfiler("foo()"), foo());
    // do something with baz
}
```

And this will start a timer before calling foo(), and stop the timer after calling foo(). You could wrap it in a macro to make it more readable. And the benefit is obviously that you don’t have to destroy your function flow when you want to insert RAII objects.
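A minimal sketch of how that could look (this ScopedProfiler is a stand-in that just prints the elapsed time; the real one presumably feeds a profiler):

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>

// Stand-in profiler: start a timer on construction, report on destruction.
struct ScopedProfiler
{
    explicit ScopedProfiler(const char * name)
        : name(name), start(std::chrono::steady_clock::now())
    {
    }
    ~ScopedProfiler()
    {
        auto elapsed = std::chrono::steady_clock::now() - start;
        long long ns = std::chrono::duration_cast<std::chrono::nanoseconds>(elapsed).count();
        std::printf("%s: %lld ns\n", name, ns);
    }
    const char * name;
    std::chrono::steady_clock::time_point start;
};

// The comma operator constructs the profiler, then evaluates the expression;
// the temporary is destroyed at the end of the full expression, stopping the timer.
#define PROFILE_EXPR(name, expr) (ScopedProfiler(name), (expr))
```

So the example becomes std::string baz = PROFILE_EXPR("foo()", foo()); — one line, and the RAII object never interrupts the function flow.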

### Another Opinion on “Almost Always Auto”

Herb Sutter has been promoting his almost always auto style again, and I think it is harmful. I would agree with “almost always auto” in Scala. I disagree with it in C++. And that’s because there is a slight difference in syntax for type inference between the two languages.

Here’s type deduction in Scala:

```
val inferred = 0
val typed: Int = 1
```

And here it is in C++:

```
auto inferred = 0;
int typed = 1;
```

Seems similar, right? But the difference in syntax leads to different long term programmer behavior.
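One concrete way the C++ behavior diverges (my example, not Sutter’s): because auto makes the declared type follow the initializer, it will happily deduce a type you never meant to hold, where the typed form would have converted:

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

std::vector<bool> flags{true, false};
bool typed = flags[0];    // the proxy converts to bool immediately
auto inferred = flags[0]; // deduces std::vector<bool>::reference, a proxy object

static_assert(!std::is_same<decltype(inferred), bool>::value,
              "auto deduced the proxy, not bool");
```

In Scala, writing the type is purely additive to the inferred form, so annotating is painless; in C++, auto and the explicit type are competing spellings, and they don’t always mean the same thing.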

### I Wrote a Faster Hash Table

edit: turns out you can get an even faster hash table by using this allocator with boost::multi_index. I now recommend that solution over the hash table from the post below. But anyway here is the original post:

This is a follow-up post to “I Wrote a Fast Hash Table.”

I’ve spent some time optimizing my sherwood_map implementation and now it is faster than boost::unordered_map and boost::multi_index. Which is what I would have expected from my first implementation, but it turns out that those containers are pretty damn fast.

If you don’t know what Robin Hood Linear Probing is I would encourage you to read the previous post and the post that I linked to from that one. With that said let’s talk about details.

### I Wrote a Fast Hash Table

As others have pointed out, Robin Hood Hashing should be your default hash table implementation. Unfortunately I couldn’t find a hash table on the internet that uses Robin Hood Hashing while offering an STL-style interface. I know that some people don’t like the STL, but I’ve found that those people tend to write poorer interfaces. So I learn from that by not trying to invent my own interface. You can use this hash table as a replacement for std::unordered_map, and you will get a speedup in most cases.

In order to reduce name conflicts I call it sherwood_map, because Robin Hood Hashing.
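The core quantity in Robin Hood hashing is how far an element has landed from its ideal slot. A sketch of that distance computation (not sherwood_map’s actual code):

```cpp
#include <cassert>
#include <cstddef>

// Distance between an element's current slot and its ideal slot, with
// wrap-around for elements pushed past the end of the table.
std::size_t distance_from_ideal(std::size_t hash, std::size_t index, std::size_t capacity)
{
    return (index + capacity - hash % capacity) % capacity;
}
```

During insertion you compare this distance for the incoming element against the distance of the element already sitting in the slot; if the resident is closer to its ideal slot than the incoming element, the resident is the “rich” one and gets displaced, which keeps probe lengths uniformly short.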

### Reinventing the Wheel for Better Compile Time

I have a new laptop on which everything compiles slowly. Which means I sometimes get really annoyed with C++ and its slow compile times. The two most common causes of slow compile times are large headers and complicated templates. Large headers are somewhat easy to identify, and you can usually improve your compile time by using precompiled headers, but it has traditionally been hard to find out which templates are hurting your compile time. Luckily there is a project called Templight which can tell you how much time Clang is spending on instantiating templates. Using Templight I have found that some templates take unnecessarily long to compile.

So I re-implemented std::unique_ptr, boost::flat_map, and await.

### Introducing the Asserting Mutex

Multithreaded code in C++ tends to become brittle over time. If you write your code well you’ll need almost no synchronization between different threads, but the price of that is that your code will be littered with undocumented conventions about when you can read or modify which state. In your average threaded C++ application there are countless potential race conditions, all of which never happen because people follow conventions about when to do what. Until someone doesn’t know about a convention they have to follow, or until you change the conventions and forget to update one piece of code that you didn’t know about.

Enter the asserting mutex. The asserting mutex is a conditional breakpoint that triggers only if a potential race condition actually happens. I call it asserting mutex because you use it like a mutex to protect a critical section. It works very simply: If one thread locks the asserting mutex and a second thread attempts to lock the asserting mutex before the first thread unlocks it, you get an assert. And it guarantees that both threads will still be inside the critical section when you get the assert. The cost is one atomic increment and one atomic decrement, which is not free but cheap enough that you can place lots of asserting mutexes in your code before they cause problems. So you could use this to document many of your threading conventions. Used correctly this is a breakpoint that makes it very easy to find data races.
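A minimal version of that mechanism, one atomic counter and an assert on contention (this is my sketch, not the complete implementation):

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// A minimal asserting_mutex along the lines described: lock() never blocks,
// it just asserts that no other thread is currently inside the critical
// section. The cost is one atomic increment plus one atomic decrement.
struct asserting_mutex
{
    void lock()
    {
        // fetch_add returns the previous value; anything but 0 means
        // another thread is between its lock() and unlock()
        int previous = count.fetch_add(1, std::memory_order_acquire);
        assert(previous == 0 && "potential race condition detected");
        (void)previous;
    }
    void unlock()
    {
        count.fetch_sub(1, std::memory_order_release);
    }
    std::atomic<int> count{0};
};
```

Because it provides lock() and unlock(), std::lock_guard&lt;asserting_mutex&gt; works on it directly, so it drops into existing code wherever you would otherwise only document a threading convention.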

Here is the complete code: