4GB per Vector
by Malte Skarupke
I recently had a problem where I had a vector that was growing once, being iterated over once, and then deallocated. And it was bothering me how much time I spent on reallocating. The problem was that I also could not predict well how big the vector would be ahead of time. It could vary by orders of magnitude for input that looked similar at first.
As I was researching the minor page faults that I was hitting during the reallocations I came across this stack overflow question. There’s a person there who speaks of resizing an array by requesting a lot of memory from the OS, but then only actually committing the memory as he wants the array to grow.
Which is smart, and I wonder why nobody has taken it further. I think all vectors should operate like that all the time. Thus I wrote my own vector class which always requests 4GB of memory from the OS, and which then only actually commits that memory one page at a time. After all we have this enormous address space in 64 bit that we won’t be using for a long time.
I’ve called it a “settled vector” because it’s memory never moves around. (and because I’m replaying The Settlers 2 at the moment…) The source code is here. It is for Linux only, (you could port it to Windows using VirtualAlloc) 64 bit only (obviously) and it is slightly untested, but it works well enough that I am using it and that I can measure performance.
It’s fun: The program that I’m using this for right now has 844GB of memory reserved, while it is only actually using 72MB. If you’re wondering why my programs don’t blow up when I reserve several hundred gigabytes of memory on a machine that only has 8GB: I am just reserving the memory, not using it. And as long as I don’t use it, the operating system doesn’t actually map that memory to physical memory. The only thing I’m doing is telling the OS that I would like to have a 4GB range of continuous memory that I may or may not end up actually using.
For performance I measured my use case from above: A vector that does not reserve memory, grows once and is then discarded.
- When the vectors are less than 64kb in size, std::vector is faster. I assume because the settled vector talks to the OS more.
- Otherwise the settled vector can be up to 35% faster for vectors that are many megabytes in size.
- Iterators, pointers and references are never invalidated.
- Easier to determine out of range errors, use after free etc: Just lock the pages.
- It’s trivial to lock and unlock the memory at runtime for your own safety. (for example if you want to make sure that certain state is not modified while another thread reads from it)
- Easy to see where a pointer comes from: Every vector has it’s own 4GB range that will never change.
- Need to allocate at least one page (4kb)
- Bad performance for a small vector
- May blow up the operating system depending on how well it is expecting this (hasn’t happened to me yet)
That being said I haven’t quite understood the implications of this myself just yet. I think that other things will come out of this. For example I still have the whole capacity() reserve() etc concept. I probably should just get rid of that and have a zero overhead push_back() instead.
So I will continue to use std::vector for small allocations, but for anything that’s bigger than 128kb, I will from now on reserve 4GB of memory.
I think you may only get 256 of these 4GB allocations on some current chips. (Since “64-bit” is usually implemented as 40- or 48-bit.)
Are you decommitting memory as the vector shrinks?
No I’m not decommitting until the vector gets destroyed. But I was actually thinking about not even decommitting then, so that I could lock the pages and catch all access-after-free bugs immediately. I knew that there was some limit which was lower than 64 bit, but I didn’t know it was that low. After looking it up it looks like Linux puts the limit at 48 bits because that’s what current hardware handles. And I only get access to half of that, so I could have 32767 of my 4GB vectors around.
I could also choose a more reasonable number than 4GB. I just picked that because it amused me and if you’re assuming a 64 bit address space, it doesn’t really matter how much you reserve. If I used 256MB instead I would still never have to worry about re-allocating the vector in my specific use case.
I believe current AMD hardware is further limited to 40 bits.
Even if you keep the address space around and locked, you probably still want to madvise that you aren’t using the memory anymore.
Yes, madvise is exactly what I was looking for. How do you always know about stuff like this? I had been googling and reading man pages and it never came up.
And while 40 bits is a bit disappointing in that it means that you actually have to think about how much memory you are reserving, it’s still enough to allow you to be completely wasteful for quite a while.
The virtual address size for 64-bit 80×86 is 48-bit; of which an application can only use half. This gives you 47 bits of address space for all 64-bit 80×86 CPUs (any Intel, AMD and VIA).
Physical address space size is entirely different and should not be confused with virtual address space size. It can range from 32-bit (some Intel Atom CPUs) to an architectural limit of 52-bit; and most current CPUs are using 40-bit or 48-bit. The physical address space size is almost completely irrelevant – it mostly determines how much RAM the computer could have installed in theory (e.g. physical address space size – space for memory mapped PCI = theoretical RAM size limit; assuming the motherboard doesn’t have a lower limit) and doesn’t effect how much RAM is actually installed or how much “virtual RAM” an application can use (e.g. swap space).
Do you still have the source of that? I am interested in watching it, now it is unavailable.
I’ve uploaded the files to github. They’re now available here:
Thanks for letting me know that the links were dead.