SSE2 in Visual Studio

by Malte Skarupke

So for some reason I thought I should be using SIMD in my student projects because it’s something that I should learn about, and I might as well do that when I’m not bothering other people with it.

It turns out that Visual Studio will hate you for it. It will hate you for using objects that have a specific alignment in memory.

The reason I’m writing this post is that I just had to write my most ingenious (not) workaround yet for getting an SSE2 class to work with Visual Studio:

#include <xmmintrin.h> // __m128

class Vector4
{
    union {
        __m128 xyzw;
        struct { float x, y, z, w; };
    };
    /* ... */
#ifdef DEBUG
    // For some reason Visual Studio sometimes copies a Vector4 to a non-aligned
    // position in debug mode, which is illegal and crashes the game. It never
    // uses that Vector4 though, so just changing the copy constructor to not
    // use SIMD is enough.
    inline Vector4(const Vector4 & other) : x(other.x), y(other.y), z(other.z), w(other.w) {}
#else
    inline Vector4(const Vector4 & other) : xyzw(other.xyzw) {}
#endif
    /* ... */
};

That is the code for the copy constructor of my Vector4 class. The comment explains what’s going on: Using SIMD instructions to copy a Vector4 sometimes results in segmentation faults in debug mode.

I came across this problem when using a library called luabind. Actually I came across multiple problems with the SIMD object, as you always will in Visual Studio. I had to rewrite some of luabind’s template metaprogramming because Visual Studio claims that you can’t do certain things with aligned objects. (The library’s code was probably legal. I think Visual Studio was wrong.) And when everything finally compiled, the game would crash whenever I called a function in Lua that returns a Vector4.

The crash was deep inside a templated mess of code within luabind, and there was no way that I could rewrite that piece of library code without spending days on the problem. Looking at the disassembly, it was clear that the problem was a temporary which only existed in debug mode and which was not used anywhere I could see. Compiling the game in Release mode confirmed my suspicion: the temporary was never created, and the code ran just fine no matter how often I tried. (With alignment crashes you always have to run your program multiple times, because sometimes the memory just happens to be aligned by chance and the game runs.)

It took me more than a day of having this problem at the back of my mind, and a night of sleeping on it, before I finally had the answer when waking up this morning: I have to let Visual Studio create this illegal object. If the object were actually being used, the game would crash on the first SIMD instruction that follows the creation, so I would know instantly if doing this illegal stuff is wrong.

Well, doing illegal stuff is right in this instance. I am creating an __m128 that is not aligned. Crash bug fixed.

I am still using SIMD instructions for every other operation with that Vector4. So I will still know if I’m doing something illegal in the future. It’s just that the game will no longer crash on creation of an illegal Vector4, but on usage of an illegal Vector4.
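The "crash on usage, not on creation" idea can also be made explicit with an assertion, so that a misaligned object fails on every run instead of only when the memory happens to be off. What follows is only a sketch under assumptions: it needs a C++11 compiler for alignas (Visual Studio 2010 spells this __declspec(align(16))), it uses plain floats instead of __m128 so that it stays portable, and the names is_aligned and CheckedVec4 are made up for illustration and are not part of my actual Vector4.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true when `p` lies on an `alignment`-byte boundary.
// `alignment` must be a power of two.
inline bool is_aligned(const void* p, std::size_t alignment = 16)
{
    return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}

// Stand-in for the post's Vector4. alignas(16) requests the alignment that
// an __m128 member would impose; plain floats keep the sketch portable.
struct alignas(16) CheckedVec4
{
    float x, y, z, w;

    CheckedVec4(float x_, float y_, float z_, float w_)
        : x(x_), y(y_), z(z_), w(w_) {}

    // Member-wise copy, like the debug-mode copy constructor in the post:
    // it issues no aligned SIMD load, so creating a copy does not fault.
    CheckedVec4(const CheckedVec4& other)
        : x(other.x), y(other.y), z(other.z), w(other.w) {}

    // Where a real implementation would use an aligned SIMD instruction,
    // check both addresses first so that misuse is caught deterministically.
    CheckedVec4 operator+(const CheckedVec4& rhs) const
    {
        assert(is_aligned(this) && is_aligned(&rhs));
        return CheckedVec4(x + rhs.x, y + rhs.y, z + rhs.z, w + rhs.w);
    }
};
```

The assertion only fires in builds where NDEBUG is not defined, which is exactly the configuration where the stray temporary showed up for me.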

My recommendation: Wait for the next version of Visual Studio before you use SIMD. Or use D3DX Math, which does not align its Vector4s and is very slightly slower than would be possible if the objects were aligned. But avoid using any aligned objects in Visual Studio 2010. You will save yourself from many more problems than I could describe here.

 

update (10/14/2011): I solved the problem. It turns out that it was just invalid code deep inside luabind. It should never have compiled. I guess Visual Studio was confused by templates nested roughly ten layers deep. The issue was that a function was taking a Vector4 by value, which Visual Studio doesn’t allow because it doesn’t align function parameters. I spent a couple of hours on this problem, learned some boost::mpl template magic, and inserted a special case for objects that need an alignment of more than 8 bytes. The code is too much of a mess to show here, so contact me if you want it. I’m also considering cleaning it up and submitting it to luabind.
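To give an idea of what such a special case looks like, here is a minimal sketch. It is not the luabind patch: it swaps boost::mpl for the C++11 standard library's <type_traits>, and the names param_type, AlignedVec4 and first_component are hypothetical. The idea is simply to pick "by const reference" for any type whose alignment exceeds 8 bytes and "by value" for everything else.

```cpp
#include <type_traits>

// Hypothetical trait: choose how a value of type T is handed to a function.
// Types that need more than 8-byte alignment go by const reference;
// everything else stays by value.
template <typename T>
struct param_type
{
    typedef typename std::conditional<
        (std::alignment_of<T>::value > 8),
        const T&,
        T
    >::type type;
};

// Stand-in for a 16-byte-aligned SIMD type such as the post's Vector4.
struct alignas(16) AlignedVec4 { float x, y, z, w; };

// Hypothetical function written against the trait. For AlignedVec4 the
// parameter becomes a const reference, so no aligned object is ever
// declared as a by-value parameter.
template <typename T>
float first_component(typename param_type<T>::type v)
{
    return v.x;
}

static_assert(std::is_same<param_type<AlignedVec4>::type,
                           const AlignedVec4&>::value,
              "over-aligned types are passed by const reference");
static_assert(std::is_same<param_type<int>::type, int>::value,
              "ordinary types are still passed by value");
```

Because T appears only inside param_type<T>::type, it cannot be deduced, so a call has to name it: first_component<AlignedVec4>(v).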

If you’re interested: The reason why I was fixing this is that the bug was making waves. Shortly after Vector4, I encountered the same problem with Quaternion. So I had to make the Quaternion copy constructor slow as well. And that was even more of a problem because the Quaternion copy constructor renormalizes on each copy. Now the renormalization also had to be slow and could not use SIMD. And now two of my copy constructors were so slow that I had to start passing objects by reference. But then you have to have two getters for everything (one const, one non-const), and luabind and my serialization system are confused by that. So I had a whole bunch of work ahead of me. In the end, fixing the problem wasn’t as big of a deal as I thought it would be. One evening of concentrated work, and it was fixed.