Ideas for a Programming Language Part 3: No Shadow Worlds

by Malte Skarupke

This post is inspired by this text by Gilad Bracha. I recommend that you read it before reading this blog post. The idea of Shadow Worlds is that people keep inventing these constructs that are powerful enough that you want to use them as programming languages, but they are not powerful enough to have the features of high level programming languages that make programming enjoyable.

Before reading that blog post I knew that I wanted a macro system for my language. Looking at my old notes I had written down that a macro system is necessary, and that templates are not enough. However I wanted a type safe macro system from modern languages like Nim or Rust. Gilad Bracha’s blog post made me realize that you actually never want macros because they are just another Shadow World. Instead you want to do everything in your high level language.

My syntax for this is inspired by D and Scala, where every function has two argument lists: One with compile time arguments, and one with runtime arguments. In D it looks like this: foo!(compile_time_args)(run_time_args) and in Scala it looks like this: foo[compile_time_args](run_time_args). I slightly prefer Scalas syntax, even though it does mean that operator[] is not available for map lookups or array indexing. Scala uses operator() for that instead which actually is kinda nice once you get used to it. In this post I will use Scala’s syntax. (I also tried just using the template syntax from C++, but since the template brackets <> are placed before the function declaration instead of where the normal arguments are, everything looks kinda weird)

That syntax was invented for templates, and it works great for that in those languages, but I think it should also be used for macros. A macros is simply a function that has no runtime arguments. A common example where you need to use macros is for asserts, because you can’t get the file and line number of your function arguments. But in theory that’s an easy problem:

#define ASSERT(expr) if (!(expr))\
{\
    assert_callback(#expr, __FILE__, __LINE__);\
}\
else\
    static_cast<void>(0)

becomes

void assert[expression[bool] to_check]()
{
    if (!to_check.execute())
    {
        assert_callback(to_check.text, to_check.location.filename, to_check.location.line_number);
    }
}

Which requires no special syntax. Expressions are templatized by the value that they produce, and they just have their textual representation and location as members.

Another common source of macros is when you just want to turn something into a string. Your codebase probably has at least two cases where this is done for types, and at least five cases where this is done for enums. It usually looks something like this:

#define REGISTER_CLASS(type) template<> const char * GetClassName<type> { return #type; }

Where in a non-shadow world language you could just write this:

string GetClassName[type t]()
{
    return t.name;
}

No reason to make it any more complicated than that. Types simply have names and you can get them.

A more complicated case that has to be a macro in C++ is when you want to add members to a struct. For example let’s say that you have operator<, and you want to automatically add operator>, operator<= and operator>=. You can do this using the CRTP, but let’s try to be fancy. Given this class:

class SharedString
{
    std::shared_ptr<const std::string> str;
public:
    // ...

    bool operator<(const SharedString & other) const
    {
        return *str < *other.str;
    }
    bool operator<(const std::string & other) const
    {
        return *str < other;
    }
    bool operator<(const char * other) const
    {
        return *str < other;
    }
};
bool operator<(const std::string & lhs, const SharedString & rhs)
{
    return lhs < rhs.get();
}
bool operatpr<(const char * lhs, const SharedString & rhs)
{
    return lhs < rhs.get();
}

I want to add a macro that adds the remaining operators for SharedString, std::string and const char *. Using macros or templates we have to execute the same macro three times or inherit three times using the CRTP, because we have to overload the operator for three types. We have to do this three times because both macros and templates are shadow worlds that don’t provide a looping construct or object oriented programming where you can ask questions about objects. You can kinda emulate loops, but once you’ve heard about shadow worlds you feel a bit stupid every time that you are trying to emulate a loop in templates when you have got a high level programming language called C++ available which has perfectly good support for loops. So let’s write this in a non-shadow world language:

bool generic_greater[type t, type o](const t & lhs, const o & rhs)
{
    return rhs < lhs;
}
bool generic_less_equal[type t, type o](const t & lhs, const o & rhs)
{
    return !(rhs < lhs);
}
bool generic_greater_equal[type t, type o](const t & lhs, const o & rhs)
{
    return !(lhs < rhs);
}
void complete_operators[type t]
{
    const set[function] & less_thans = t.methods("operator<");
    namespace ts_space = t.namespace
    for (const function & less_than : less_thans)
    {
        type other = less_than.arguments(1).type;
        t.methods("operator>").insert(&generic_greater[t, other]);
        t.methods("operator<=").insert(&generic_less_equal[t, other]);
        t.methods("operator>=").insert(&generic_greater_equal[t, other]);
        ts_space.functions("operator>").insert(&generic_greater[other, t]);
        ts_space.functions("operator<=").insert(&generic_less_equal[other, t]);
        ts_space.functions("operator>=").insert(&generic_greater_equal[other, t]);
    }
}
complete_operators[SharedString];

The type has a map “methods” of type map[string, set[function]] which I can use to iterate over all methods on a type. Each function in the set then can tell me about its arguments and the type of those arguments. Once I have all the types I can generate all the required functions using a pretty normal syntax. All of this happens at compile time, as if I had used a macro for doing it.

Using a macro here would have lead to code that’s harder to read and harder to debug, but it probably would also compile more slowly because you have to go through the whole string processing apparatus of the compiler. Templates have less downsides, except that they use a modified syntax (this is one reason why people use macros when they could use templates: no need to learn a special syntax) and looping is annoying. But also templates use duck typing, which makes error message worse. If you get errors in the above function, those errors will hopefully be easier to reason about than macro- or template-error messages would be because everything is statically typed, you get a normal call stack and state is represented in stack local variables that you can examine in the watch window of your debugger. (I expect that to happen often, where if you get rid of shadow worlds, your existing tools become more useful)

However if you have worked in languages that allow modifications to classes like the above, you’ll probably not be satisfied at this point. Modifying classes like this actually introduces a whole new set of problems, mainly related to maintenance. So I looked back at templates and realized that they have one really nice property: you write normal structs, and you just replace a bit in the middle. To write a template like std::vector I can mostly write a normal struct and then only the pointers to the storage get replaced at compile time. It’s a style of programming that doesn’t really have a name. Declarative Programming is the closest thing I can name, but it’s too vague. But basically it’s the opposite of pattern matching, so maybe we should call it pattern expansion.

And as you look at other shadow worlds, you find that this style is actually common: In printf you write a normal string and only one bit in the middle is replaced. Of the three shadow worlds in Gilad Brachas article, the HTML template system also has this property.

So I think a crucial step in getting rid of shadow worlds is supporting this style of programming in the language so that templates, macros, printf and regular expressions can all use the same language feature. And it would have to be a normal language feature that can be abstracted, extended and composed.

And I think we’re almost there, but I actually can’t figure out the last step. So the blog post will have to end with an unsatisfying “I’m still trying to figure it out.” My current idea is that I use the same syntax for pattern matching that I use for pattern expansion. The idea is that like in boost.serialization you use one syntax to do both. So if you can pattern-expand a template struct, you can use the same syntax to pattern-match it. Similarly if you can pattern-match a string, you can pattern-expand using the same syntax instead of using printf. To be able to use the same syntax for templates and for strings the idea is to not have quotation marks “” around strings. They are not necessary when you store the source code in a database instead of a text file. And then all you have to do is find a way to support normal loops, normal conditionals and normal abstractions like classes and functions. It doesn’t sound that difficult, but whenever I work on it I can’t quite fit the pieces together and it’s very easy to accidentally enter a shadow world. But I think the idea is solid and I’ll solve the issues once I start implementing this. But the accidental creation of shadow worlds is an interesting observation.

One example of this that I still hope to influence is Jonathan Blow’s new language in which he demonstrates many of the same features that I am thinking about. However it also shows how easily you can accidentally introduce shadow worlds. For example the feature of “notes” on members that he showed in a recent video is clearly a shadow world. And several people immediately complain about it in the Q&A after the talk. But people complain about it only seeing specific problems that you might run into using the feature, not seeing that the feature is fundamentally flawed because it works differently than everything else in the language: There will be a lot of code duplication by users of the language simply because they will have to invent similar code once for notes and once for normal parts of the language. Me personally I will try to solve the issues that notes are trying to address by having compile time properties on variables, which are normal properties that you could also have on a struct. But I will talk about that when I talk about dependent typing.

Another thing that Jonathan Blow demonstrated in an early video was a bit of special syntax for functions that are called at compile times, like #check_call or #load. If those were just normal functions, I can use all the higher order functions that are already available for me in normal functions. Have you seen how people reimplement std::transform, std::for_each or more complicated features for macros or templates? It’s not pretty. (often it’s not obvious because the reimplementations will not be put into a reusable, named piece of code. So you have to read the code to see that a pattern is reimplemented several times) The fact that functions like #check_call or #load can only be called at compile time shouldn’t change how you can manipulate them. I’m not sure if he still has that special syntax since it hasn’t shown up in recent videos, but I think that once you’re aware of shadow worlds you tend to notice faster when you’re inventing a special feature when an existing feature could be adapted, which allows you to benefit from all the supporting features for that existing feature.

This post is mostly about macros and templates because those are the biggest shadow worlds in C++. But once you recognize shadow worlds, you find them all over the place. I’m seriously considering to not have regular expressions and to have an object oriented way of building state machines instead. How hard can it be to come up with a better design than regular expressions, eh? Jonathan Blow also already realized that the build process is usually a shadow world. There are also debug visualizers and in serialization Protobuf is more of a shadow world than boost::serialization. String formatting is another shadow world that I started addressing with format_it, and we’re creating even more new ones all the time. Part 4 of this series, Dependent Typing, will also be challenging to write because I must not accidentally create a shadow world.

I think that shadow worlds get created accidentally. The problem is that they make the implementer’s life a lot easier because then she can ignore all the complexities of the language by for example not supporting all types. Also whenever you create something new there is a strong desire to keep it simple this time. But it doesn’t work. You’re programming, and you always end up emulating the features that your high level language would have given you if there hadn’t been a shadow world. And if you add up all the time that has been spent trying to emulate loops in template and macros by using recursion, or trying to print complex types using printf, the initial time savings for the original implementer have long been used up.

So I am committing to having no shadow worlds in my language. Everything will use the normal high level language features that the language itself provides. If you can’t do conditionals, loop over something or encapsulate something in a reusable object or algorithm, I will change the design until you can.