Learning D Part 3: Garbage Collection

by Malte Skarupke

Garbage collection is a red flag for any C++ programmer, simply because a garbage collector makes it more difficult to use the language for the kinds of things that C++ programmers like to program.

When I started programming in D, I was going to give the garbage collector the benefit of the doubt. The language has enough smart features that I figured that the designers must have had good reasons for including a GC.

But really, it’s a mess. In fact it’s a bigger mess than I have seen in other garbage collected languages. Fortunately that mess is solvable, and where it isn’t solvable the language designers could probably make it less of a problem.

The Problem with Garbage Collectors

OK, so let’s get the obvious problems out of the way: garbage collection is unnecessary overhead, it stalls unpredictably, yada yada yada.

Those are the things people most often mention when you talk about garbage collectors, but whenever I had a problem where the garbage collector caused a program to slow down, it was usually trivial to fix: just call the GC at a point where you are comfortable that it can take a bit of time. Instead, the problem with the GC lies in what it does to the language.
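As a rough sketch of what “call the GC at a point when you are comfortable” can look like in D: the snippet below turns off automatic collections and triggers one by hand every few iterations. The names simulateFrame and runFrames and the collection interval are made up for illustration; only GC.disable, GC.enable and GC.collect come from core.memory.

```d
import core.memory : GC;

// Hypothetical per-frame work that produces short-lived garbage.
size_t simulateFrame(size_t n)
{
    int[] scratch;
    foreach (i; 0 .. n)
        scratch ~= cast(int) i; // appending allocates from the GC heap
    return scratch.length;
}

// Runs `frames` frames and collects only after every `interval` frames.
size_t runFrames(size_t frames, size_t interval)
{
    GC.disable();             // ask the runtime to skip automatic collections
    scope (exit) GC.enable(); // restore the default on the way out

    size_t total = 0;
    foreach (frame; 0 .. frames)
    {
        total += simulateFrame(1_000);
        if ((frame + 1) % interval == 0)
            GC.collect();     // explicit collection at a point we picked
    }
    return total;
}

unittest
{
    assert(runFrames(100, 10) == 100_000);
}
```

In a real program the collection point would be something like a loading screen or an idle tick rather than a fixed interval.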

1. Performance

If you profile in garbage collected languages, it looks like the garbage collector is a problem. But the problem really is that when people use garbage collected languages, they tend to generate garbage. When you search for performance advice in garbage collected languages, you will rarely find that people have problems with the GC being too slow or the GC stalling. Instead the problem is always too much garbage.

This may be a subtle distinction, but it is an important one. Another way of putting it is this: if you were to write the exact same program in C++ that you have written in Java (and you managed all your memory perfectly, never forgetting to call delete), it would have the same performance problems, even though there is no GC around. The problem is that you’re calling new and delete too often. (Of course, you’d never write the exact same program in C++ that you’d write in Java or Python, simply because in C++ you’d instantly notice all the waste that is going on.)

The GC of course causes some slowdown on top of that, but that is less of a problem.
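To make the “too much garbage” point concrete, here is a small sketch with two hypothetical functions that compute the same result. The first allocates a fresh buffer on every round, which costs allocator time with or without a GC; the second allocates once and reuses the buffer.

```d
// Allocates a fresh buffer every round: this is the "garbage" pattern.
long sumSquaresWasteful(size_t rounds, size_t n)
{
    long total = 0;
    foreach (r; 0 .. rounds)
    {
        auto tmp = new long[](n); // a new heap allocation per round
        foreach (i, ref x; tmp)
            x = cast(long) (i * i);
        foreach (x; tmp)
            total += x;
    }
    return total;
}

// Same computation, but the buffer is allocated once and reused.
long sumSquaresReusing(size_t rounds, size_t n)
{
    auto tmp = new long[](n); // one allocation for all rounds
    long total = 0;
    foreach (r; 0 .. rounds)
    {
        foreach (i, ref x; tmp)
            x = cast(long) (i * i);
        foreach (x; tmp)
            total += x;
    }
    return total;
}

unittest
{
    assert(sumSquaresWasteful(3, 4) == sumSquaresReusing(3, 4));
}
```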

2. Memory Leaks

Memory leaks happen in garbage collected languages. The reason is that if you don’t have to manage your memory, you have to manage your references instead. Admittedly managing references tends to be an easier problem than managing memory, but it is still there. And hunting down the last reference to an object can be very tedious, especially if you have closures, local classes, signals/slots or other things that unexpectedly hold references.
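A minimal sketch of such an unexpected reference, assuming a hypothetical Texture class and a long-lived callback list named listeners (neither is from a real library): the object is never stored in a visible variable outside subscribe, yet the delegate keeps it reachable.

```d
class Texture
{
    ubyte[] pixels;
    this(size_t n) { pixels = new ubyte[](n); }
}

// A long-lived list of callbacks, similar to a signal's slot list.
size_t delegate()[] listeners;

void subscribe()
{
    auto tex = new Texture(1024);
    // The delegate captures `tex`. Its closure context lives on the heap,
    // so `tex` stays reachable for as long as `listeners` holds the delegate.
    size_t delegate() cb = () => tex.pixels.length;
    listeners ~= cb;
}

unittest
{
    subscribe();
    // No variable named `tex` exists here, but the object is still in use.
    assert(listeners[0]() == 1024);
    // Only dropping the delegate makes the Texture collectable again.
    listeners = null;
}
```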

The D Garbage Collector

All of that being said, let’s talk about D specifically. The designers of D are smart people, and being smart people they solve the biggest problem that garbage collectors tend to have: performance. You can write highly performant code in D despite it having a garbage collector. I actually think that the garbage collector would in theory not be an issue for performance in D. Well done.

Unfortunately they did worse on the memory leak front. D has so many ways of capturing references that it’ll be very frustrating to debug your memory problems. You’ll have to debug memory problems less often than in C++, but when you do it’ll be one of the bad kinds of memory problems.

But I think even that is OK. You have to make trade-offs and I can see how you’d say that these are good trade-offs. My issue with the garbage collector in D is that it’s a mess:

1. It doesn’t call destructors of structs

unittest
{
    import core.memory;
    bool destroyed = false;
    class C
    {
        ~this() { destroyed = true; }
    }

    struct S
    {
        ~this() { destroyed = true; }
    }

    void allocate(T)()
    {
        new T;
    }

    assert(!destroyed);
    allocate!(C)();
    foreach (i; 0 .. 10) GC.collect();
    assert(destroyed); // succeeds

    destroyed = false;

    assert(!destroyed);
    allocate!(S)();
    foreach (i; 0 .. 10) GC.collect();
    assert(destroyed); // fails
}

This has been a known bug for more than three years. It’s only rated as “minor” importance, and until this gets fixed you pretty much need to memory manage all your structs manually. Which kind of defeats the point. (My theory for why this is a “minor” bug is that destructors not being called is less of a problem in garbage collected languages, because any resource that isn’t being freed will eventually be collected anyway. It’d only be a problem if you had a non-trivial destructor, like the one above…)
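Managing structs manually can look roughly like the sketch below. Resource, destroyedCount and the two helper functions are hypothetical names; destroy is the function from D’s runtime that runs a destructor on demand. The destructor ignores the default state so that running it a second time on an already destroyed value does no harm.

```d
int destroyedCount = 0;

struct Resource
{
    int id;
    ~this()
    {
        // Skip the default (Resource.init) state so that a second
        // destructor call on an already destroyed value is harmless.
        if (id != 0)
            ++destroyedCount;
    }
}

// Option 1: keep the struct on the stack; the destructor runs at scope exit.
void useOnStack()
{
    auto r = Resource(1);
}

// Option 2: heap-allocate, but destroy by hand instead of relying on the GC.
int useOnHeap()
{
    Resource* r = new Resource(42);
    scope (exit) destroy(*r); // runs ~this() and resets *r to Resource.init
    return r.id;
}

unittest
{
    destroyedCount = 0;
    useOnStack();
    assert(destroyedCount == 1);
    assert(useOnHeap() == 42);
    assert(destroyedCount == 2);
}
```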

2. You can not disable it

Every time that someone introduces the D language, you’ll hear them say that it has a garbage collector, but you don’t have to use it. That is mostly true, but unfortunately it is not true enough.

To start with, parts of the standard library and even parts of the core language generate garbage. For example: If you disable the GC, you can not use arrays. Which is kinda like telling someone to not use tables in Lua. The built-in arrays are the only container type you’d ever want to use in the language. Yes, you can survive without them, and you can survive without the parts of the standard library that generate garbage, but really that’s unlikely to happen.
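For illustration, surviving without GC-backed arrays might look something like this: a slice over memory from C’s malloc. The helper names allocInts and release are made up; malloc and free come from core.stdc.stdlib. Indexing and slicing work as usual, but appending is off the table, because the runtime would copy the data into GC memory.

```d
import core.stdc.stdlib : malloc, free;

// Returns a zeroed slice over C-heap memory. The caller owns the memory
// and has to hand it back through release().
int[] allocInts(size_t n)
{
    auto p = cast(int*) malloc(n * int.sizeof);
    assert(p !is null, "out of memory");
    int[] s = p[0 .. n]; // a slice is just pointer + length, no GC involved
    s[] = 0;
    return s;
}

void release(int[] s)
{
    free(s.ptr);
}

unittest
{
    auto a = allocInts(4);
    scope (exit) release(a);
    foreach (i, ref x; a)
        x = cast(int) i * 2;
    assert(a[3] == 6);
    // a ~= 1; // don't: appending would reallocate the data on the GC heap
}
```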

The next step is then to say “I’ll just manage all my memory manually, and ignore the GC for the most part.” And it is awesome that you can say that in D. Except you can’t really do that. In my last blog post I showed a short signal/slot implementation, which requires that you manage memory. It works as intended, but I would still never use it knowing that it leaks memory in a garbage collected language. In theory this isn’t a problem if you have good coding standards, but really it is just negligent to keep such an obvious point of failure around.

So you need to always think about the GC and sometimes work around the GC. For example the std.signals implementation uses realloc and free to get around the GC and then uses some compiler magic to get notified when objects get destroyed so that it can clean up. And that compiler magic only works with member functions of classes, otherwise you get undefined behavior (segmentation faults). They are fighting with their own language in the standard library and disallowing many normal use cases for signals/slots because you have to get around the GC sometimes.

So I wrote my own signal/slot implementation which uses malloc and free and is otherwise remarkably close to a signal/slot implementation I once wrote in C++. Except in D I felt like I was working against the language while writing it.
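To give an idea of the shape of such a thing, here is a heavily simplified sketch. It is not the implementation from my last post; the Signal struct and its members are written just for this example, and only realloc and free are real library functions.

```d
import core.stdc.stdlib : realloc, free;

struct Signal(Args...)
{
    alias Slot = void delegate(Args);

    private Slot* slots;  // buffer on the C heap, invisible to the GC
    private size_t count;

    @disable this(this);  // no copies, so exactly one owner frees the buffer

    ~this()
    {
        free(slots);
        slots = null;
        count = 0;
    }

    void connect(Slot s)
    {
        auto p = cast(Slot*) realloc(slots, (count + 1) * Slot.sizeof);
        assert(p !is null, "out of memory");
        slots = p;
        slots[count++] = s;
    }

    void disconnect(Slot s)
    {
        foreach (i; 0 .. count)
        {
            if (slots[i] is s)
            {
                slots[i] = slots[count - 1]; // order is not preserved
                --count;
                return;
            }
        }
    }

    void emit(Args args)
    {
        foreach (i; 0 .. count)
            slots[i](args);
    }
}

unittest
{
    int hits = 0;
    Signal!int sig;
    void delegate(int) slot = (int x) { hits += x; };
    sig.connect(slot);
    sig.emit(2);
    sig.emit(3);
    sig.disconnect(slot);
    sig.emit(100); // nobody is listening any more
    assert(hits == 5);
}
```

Because the slot buffer lives outside the GC heap, the collector does not see the delegates stored in it. Whatever a slot points to has to be kept alive some other way (or the buffer registered with GC.addRange), and every connect needs a matching disconnect. That is the manual bookkeeping I mean.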

3. D is poorly written because of the GC

As if the previous example of std.signals wasn’t bad enough, there are problems in the core language. Here’s a guy talking about his experience building a game with the garbage collector and without the garbage collector. One thing he discovered is this:

Comparison of TypeInfo objects in druntime is done by building two strings and then comparing those two strings. This will always leak memory and do a lot of unnecessary allocations which are a performance bottleneck.

In the forum discussion about that blog post he gives an example of where that creates a problem:

class A {}
class B : A{}

A a = new A();
B b = new B();

if(a == b) //this will allocate
{
}

It generates garbage because for non-trivial comparisons you need to look at the types of the classes. This is so spectacularly bad, you could only write it in a garbage collected language. This was obviously fixed immediately, but the fact that it got in in the first place shows that people stop caring about garbage in garbage collected languages. He also discovered that variadic functions always allocate their arguments on the heap (this seems to be fixed now, at least for trivial cases) and that all arrays are allocated on the heap:

int[5] a = [1, 2, 3, 4, 5]; // allocate on the heap, then memcpy to the stack
                            // and never release the heap memory

That is a 4 year old known bug. None of these things would have been acceptable without a GC. And I don’t mean because they would be memory leaks. I mean if you were to write the same code in C++ without a memory leak it would still be unacceptable. The disassembly for that fixed size array example is unbelievable, even in release.

4. You can not unit test it

Imagine writing a signal/slot implementation that tries to not create memory leaks. You’d probably want to test that. Well you can’t. Getting that unit test example up there to work already requires two workarounds: I allocate inside of a function and I call the garbage collector ten times. This doesn’t work:

unittest
{
    import core.memory;
    bool destroyed = false;
    class C
    {
        ~this() { destroyed = true; }
    }
    {
        C c = new C;
        assert(!destroyed);
    }
    foreach (i; 0 .. 10) GC.collect();
    assert(destroyed); // fails
}

Even the workaround of allocating within a function breaks once you introduce delegates:

void main()
{
    import core.memory;
    import std.stdio;
    bool destroyed = false;
    class C
    {
        ~this() { destroyed = true; }
        int i;
    }

    void allocate(T)()
    {
        auto t = new T;
        void delegate() print = { writeln("t.i ", t.i); };
        print();
    }
    assert(!destroyed);
    allocate!(C)();
    foreach (i; 0 .. 10) GC.collect();
    assert(destroyed); // fails (usually. it depends)
}

I could probably find another hack (maybe call a function inside of a function) to get that to work, but really I’m fighting against the language here. In the end I just had to say that there is no way to unit test whether my code keeps objects alive or not.

Conclusion

In D you have a garbage collector which doesn’t call all of your destructors, which you can’t really disable or ignore, and which can not be tested against. It is a complete mess. The documentation of D claims that “Garbage collected programs are often faster” and that garbage collected programs do not suffer from memory leaks. Both of which are mostly false. I have a very high opinion about the designers of the D language, but something about garbage collectors makes people’s IQ drop by 10 points when they get close to it. (I actually think that I am affected and that I must have missed something very obvious that would explain all of this)

If you’re writing something small or if you have strict coding standards, you can luckily avoid the garbage collector well enough that it probably doesn’t matter for you. Just write your code as if it was C++ code and call the GC every now and then to make sure that garbage doesn’t accumulate. But really I can’t help but think that D would be better off without a garbage collector. All the features that “require” that you keep the GC enabled can also be implemented without a GC. With the GC, I fear that D will never attract C++ programmers. Which is a shame, because D would otherwise be a great replacement for C++.

6 Comments to “Learning D Part 3: Garbage Collection”

John Calsbeek (@jcalsbeek) says:

October 14, 2012 at 08:31

So there are no allocated-on-the-stack arrays in D?

My definition for “Could this language replace every use of C++ in most arenas?” is “Could this language be used to write an OS kernel?”

Malte Skarupke says:

October 14, 2012 at 11:08

No, you can allocate on the stack. That array example is just a bug.
```
{
    int[5] a = [1, 2, 3, 4, 5]; // generates garbage
}
{
    int[5] a;
    foreach (i; 0 .. 5)
    {
        a[i] = i + 1;
    } // same thing, but no garbage. everything is on the stack
}
```
And you can definitely write an OS kernel in D. D gives you the same control that C gives you. There is no C code that you can not write with the same performance in D.

At least in theory. In practice you’d better wait a couple more years before you attempt to write a kernel in D. At least until you can create arrays with the same performance characteristics as in C.

Malte Skarupke says:

October 14, 2012 at 13:28

I updated the post a tiny amount because it turns out the array initialization thing is a four year old known bug.

John Calsbeek (@jcalsbeek) says:

October 14, 2012 at 21:33

If D is suitable for kernel code, then not only could you run a D program without linking the standard library (which is where the collector presumably lives), but you could also avoid ever kicking off a GC thread or calling a GC entry point. Or for that matter allocating anything on the heap (since malloc is in the standard library).

Ignoring bugs, how is an array literal supposed to compile if it is, by “default,” heap-allocated?

Also, the D garbage collector is conservative, right? That’s probably what you’re experiencing—your object reference is still on the stack, even though it’s out of scope. Returning from the function moves the stack pointer up, which allows the collector to reclaim it.

Malte Skarupke says:

October 16, 2012 at 16:47

As I said, it’s gonna take them a couple more years before they get there. If they ever want to get there.
But the language in itself has nothing preventing it from being used for kernel code. You can turn off the GC and you don’t need to use the standard library. People have written programs that do that, except it’s currently more work than it should be.
You can also allocate everything on the stack, there’s just that bug with the array initialization.

I think that once D gets really good, it’ll be easier to write C code in D than in C, if you want to do that.

And for the conservative GC: Yes, I’m pretty sure it was that. That’s why I tried the workaround of going into a function to allocate. Not sure if they’ll ever fix that one.

Kumar Suhas says:

April 20, 2015 at 15:37

Nearly six years (!) later, the bug has been fixed in DMD 2.067: http://d.puremagic.com/issues/show_bug.cgi?id=2834

Probably Dance