Ideas for a Programming Language Part 5: More Flexible Types

by Malte Skarupke

Type systems tend to be too rigid. And dynamic typing just throws the baby out with the bathwater. So here are a few things I want:

1. I want to add information to an object at compile time without having to give it a new type. Is an integer in seconds or kilometers or kilometers per hour? Is a string SQL escaped or URL escaped or just plain text? Is a pointer known to not be zero? I also want to write logic based on this. If I multiply seconds with kilometers per hour, I want a result that makes sense. Most importantly, I want to do this easily: There is nothing stopping me from having a “url_escaped_string” type in C++, but then that string doesn’t work with any of my existing functions. I can add an implicit conversion operator, but then I lose my compile time information as soon as I use a function that accepts only one type of string. Or everything has to be templates and everything has to be duck-typing.

So what I envision is that every object has runtime properties and compile time properties. So a string might have a runtime property called size(), and it might have a compile time property called language() where I can set the language to be “SQL” or “URL” or similar and could then make sure at compile time that everything is escaped correctly. An int would not have any runtime properties, but it would still have compile time properties like its unit of measurement. We’ll start with that and then we’ll see what other features we’d want from that. (like functions being able to write conditions for their input parameters where they check the compile time properties of those objects and similar)

2. I want multi-stage initialization. Sometimes you need to create a bunch of empty shells and then initialize them all at once later. Or sometimes you need to initialize in three, four or more stages. In C++ I tend to avoid doing this because it can lead to really confusing code and is considered an anti-pattern. It shouldn’t be an anti-pattern. There should be a way to do it hygienically. I want to say that an object is only half initialized, and that you’re not allowed to call certain functions on it yet. I want to also be able to say that a function will finish initializing it and I am allowed to call all functions on the object after that function has finished. I think this will replace constructors. Constructors need special rules, special syntax even in some languages, and have to initialize every member, even if they just put an empty default value in there so that the destructor will not be confused. Multi-stage initialization seems a better approach. Then everything is just normal functions, and the object changes types without changing data.

3. I want to have interfaces with optional functions and optional data. Where an object can implement an interface and choose to not implement certain functions. If it doesn’t implement a function, I get a compiler error if I try to call that functions on that object through the interface. This is inspired by the talk “Generic Programming Must Go” by Andrei Alexandrescu where he has a great example for this in trying to come up with a generic allocator interface. In generic programming we often check for capabilities of objects. In Alexandrescu’s example he checks for example if an allocator is able to expand a previous allocation or if you have to re-allocate. The syntax for that is always ugly. It’s better in D than in C++, which is why Alexandrescu was happy with a solution that involves checking capabilities, but I think interfaces with optional functions on them could work better. Mainly to get guaranteed consistency and documentation of what’s available.

4. I want to separate data layout from struct description. For example: If I have the same data once in SOA form and once in AOS form, I want to be able to call the same functions on it without code duplication. This is inspired by Jonathan Blow’s language, but I think his solution is a bit too specific. In his solution if I change the object, it has to change everywhere. I think a better separation might be that interfaces are the default way of programming, and I can bind data to those interfaces. For example I can easily imagine having five different implementations of storing strings. From small string optimization to copy-on-write ref-counted to immutable ref-counted to substring pointers into a immutable buffer (when parsing) to compile time constant strings. I can do all of that in C++, and people have done all of that, but you tend to quickly end up with second class objects. Where if you want the full functionality you have to use std::string, or a string_view type, and if your object is neither of those, you’ll get code duplication. And string_view is a great step forward because it’s so cheap to convert to, but I think a generic interface with optional members (see above) might be better, because if your type can provide more functionality, you don’t want to degrade to string_view. Or if my string is just a { uint32 offset, uint32 size } in a mmapped file, I would like to keep it like that and not have to create a string_view. The annoying part isn’t that I have to create a string_view, but that whoever had the pointer to that offset and size now also has to be changed. In the end you end up mirroring the entire structure of the file in memory. (or you have to do in-place pointer patching, which means you have to add enough padding to fit pointers)

5. I want to use runtime information to constrain what functions people can call at compile time. One example from the above that would benefit from this is the multi-stage initialization. Let’s say I have a container that has objects in it at multiple stages of initialization. Some are uninitialized, some are fully initialized. And now I want to iterate over that container. I also want compile time guarantees that I am not accidentally doing something to an uninitialized object that I am only allowed to do to a fully initialized object. The container might know that it contains both types of objects, or it might not keep that information around, in any case we have to check at runtime which state the current object is in. So I need to make the initialization stage available at runtime. There are several ways of doing that and I don’t think that the language should prescribe one, so let’s just pick one and say that for this case I decide to put a runtime enum onto the object that mirrors the compile time property. If we have that, we might as well get rid of the compile time property, because we don’t want to have to keep two pieces of data in sync. Now the code always has to check at runtime how initialized my object is. And I want the same guarantees from that as if I had done the check at compile time. Meaning depending on what the result of the runtime check is, I am allowed to call different functions on the object at compile time. The only thing the compiler has to do is keep track of what properties the current scope has checked about the object. If the current scope is one in which the initialization state returned “fully initialized,” then the compiler will allow me to call more functions on the object. If the current scope is one where I have never checked, or the check returned false, the compiler will not allow me to call all functions on the object.

I think some of these will give a great deal of flexibility, and will reduce some of the pains of static typing. I also think some of these can happen transparently. Remember that I plan to store the source code in a database, not in a text file. So I don’t have to always have all information on the screen. It is totally possible for me to create an interface where you right click on an integer to get to its properties to change its unit of measurement. Where that may be a bad example, because units of measurements might be a property that you want to see all the time, (it may be more important than the actual type of the object. Often you care more that something is in meters or kilometers than you care if it’s an int or a float) but I expect that people will want to add lots of compile time properties to things. Like “what are the min- and max values of this property in our editor?” It may turn out to be a giant mess, but we’ll try and if it becomes a mess we’ll see what steps have to be taken. And since the source code is stored in a database, the complexity will be easier to deal with, because it will for example be trivial for you to find all the places where a property is assigned to, and then you can ask what unit tests cover that assignment code, put a breakpoint on the line and run the unit test to see what happens.

One concern that I can imagine people have might be that I’m going to make type systems even more complicated. But I think type systems, for example in C++, are too simple right now. That may sound crazy, “C++ has a type system that’s too simple,” but I think it’s true. Look at a few headers of your favourite boost library. The code will look nearly insane and will be very difficult to understand and debug, but what it’s actually doing is usually pretty simple. It does things like “can I write this object using iostreams? If yes, compile this code, otherwise compile that code” and it’s just that in the C++ type system, checking things like that is incredibly complicated. So what people are doing is simple, but the way they are doing it is very complicated. That suggests to me that the tool they are using is not appropriate for the job. In fact the C++ people know this: They are currently adding concepts, which will make the type system even more complicated, but I think that adding concepts will make template code so much simpler to write that a lot more people will do it. (and if you are someone who tries to stay away from template code, I would be willing to bet that you have literally hundreds of workarounds to get around cases where the type system is insufficient. It’s just that those workarounds are difficult to spot, because they may not look like workarounds. But believe me, your life could be so much easier)

So sometimes you have to add complexity to make things simpler. It’s like with storing the source code in a database. That makes big parts of the language a lot more complex, but it will make writing tools much simpler. And I will be able to add information that’s not expressed in the text. So you add complexity over here to gain simplicity over there, and you just have to keep balancing your global complexity budget as well as your various local complexity budgets and make sure that you’re not going too far to one side or the other. So I’ll make types more complicated 🙂