Less Weird Quaternions

by Malte Skarupke

I’ve always been frustrated by how mysterious quaternions are. They arise from weird equations that you just have to memorize, and are difficult to debug because as soon as you deviate too far from the identity quaternion, the numbers are really hard to interpret. Most people implement quaternions once and then treat them as a black box forever after. So I had put quaternions off as one of those weird complicated 4D mathematical constructs that mathematicians sometimes invent and that magically work as long as I don’t mess with them.

That is until recently, when I came across the paper “Imaginary Numbers are not Real – the Geometric Algebra of Spacetime”, which arrives at quaternions using only 3D math, using no imaginary numbers, and in a form that generalizes to 2D, 3D, 4D or any other number of dimensions. (Quaternions just happen to be the special case for 3D rotations.)

In the last couple weeks I finally took the time to work through the math enough that I am convinced that this is a much better way to think of quaternions. So in this blog post I will explain…

  • … how quaternions are 3D constructs. The 4D interpretation just adds confusion
  • … how you don’t need imaginary numbers to arrive at quaternions. The term \sqrt{-1} will not come up (other than to point out the places where other people need it, and why we don’t need it)
  • … where the double cover of quaternions comes from, as well as how you can remove it if you want to (which makes quaternions a whole lot less weird)
  • … why you actually want to keep the double cover, because the double cover is what makes quaternion interpolation great

Unfortunately I will have to teach you a whole new algebra to get there: Geometric Algebra. I only know the basics though, so I’ll stick to those and keep it simple. You will see that the geometric algebra interpretation of quaternions is much simpler than the 4D interpretation, so I can promise you that it’s worth spending a little bit of time to learn the basics of Geometric Algebra to get to the good stuff.

Geometric Algebra

OK so what is this Geometric Algebra? It’s an alternative to linear algebra. Instead of matrices, there are multiple kinds of vectors, and there is a more powerful vector multiplication.

Let’s start with vector multiplication. In linear algebra we know two ways to multiply vectors: the dot product (producing a scalar) and the cross product (producing a vector). The dot product works in any number of dimensions, while the cross product only works in 3D. Geometric algebra also uses the dot product, but it adds a new product, the wedge product: x\wedge y. The result of the wedge product is not a vector or a scalar, but a plane. Specifically it’s the plane spanned by the two vectors. This plane is called a bivector because it’s the result of the wedge product of two vectors. There is also a trivector x \wedge y \wedge z which describes a volume. The general principle is that the wedge product increases the dimension of the vectors by one. Vectors (lines) turn into bivectors (planes), and bivectors turn into trivectors (volumes). When we do math in more than 3 dimensions, we can go even higher, but I’ll stick to 2D and 3D for this blog post.

Before I tell you how to actually evaluate the wedge product, I first have to tell you the properties that it has:

  1. It’s anti-commutative: a \wedge b = -b \wedge a
  2. The wedge product of a vector with itself is 0: a \wedge a = 0

The first property will make sense when we talk about rotations. The second property should already make sense if we just think of a bivector as a plane. There is no plane between a vector and itself, so it’s 0.

The other thing I have to explain is how vector multiplication works: In geometric algebra, the vector product is defined as the dot product plus the wedge product:

a*b = a \cdot b + a \wedge b

The result of the dot product a \cdot b is a scalar, and the result of the wedge product a \wedge b is a bivector. So how do we add a scalar to a bivector? We don’t, we just leave them as is. It works the same way as when adding polynomials 2x + 4x^2 or when adding apples and oranges 3 apples + 4 oranges or when working with complex numbers: 5 - 2i. We just leave both terms.

Note that usually I will leave out the star and just write a*b = ab.

In 3D space we have three basis vectors:

x = (1, 0, 0)

y = (0, 1, 0)

z = (0, 0, 1)

When multiplying these with each other we notice three properties of this new way of multiplying:

xx = x\cdot x + x \wedge x = 1 + 0 = 1

xy = x\cdot y + x \wedge y = 0 + x \wedge y = x \wedge y

yx = y \cdot x + y \wedge x =  0 + y \wedge x = y \wedge x = -x \wedge y = -xy

So when multiplying the basis vectors with each other, either the dot product or the wedge product is zero. We are left only with one of the two.

All other vectors can be expressed using the basis vectors. So the vector (10, 5, 0) can also be written as 10x + 5y and I will use the second notation more often, because it makes multiplication easier.

With that out of the way, we can finally give one real example of how vector multiplication works in geometric algebra. It’s actually pretty simple because we just multiply every component with every other component:

(10x + 5y) * (3x + y) = (10x * 3x + 10x * y + 5y * 3x + 5y*y)

= (30x^2 + 10xy + 15yx + 5y^2)

= (30 + 10xy - 15xy + 5)

= (35 - 5xy)

Let’s walk through a few of the steps I did there:

  • 10x*3x = 30x^2 = 30 because xx = x \cdot x + x \wedge x = 1 + 0 = 1.
  • 10x * y = 10xy because xy = x \cdot y + x \wedge y = 0 + x \wedge y, so the scalar part is zero, and we can write the wedge product of basis-vectors shorter as x \wedge y = xy. This short-hand notation is only valid for vectors which are orthogonal to each other.
  • 5y*3x = 15yx = -15xy because yx = y \cdot x + y\wedge x = 0 + y \wedge x = -x \wedge y = -xy

So as promised the result of multiplying two vectors is a scalar (35) and a bivector (-5xy). A sum of different components like this is called a multivector.
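If you would rather let a computer do the re-ordering, the rules so far fit in a few lines of Python. This is only a minimal sketch: the names blade_mul and gp are made up for this example, and storing a multivector as a dictionary from a word like "xy" to its coefficient is just one convenient assumption, not the way any particular library does it.

```python
def blade_mul(a, b):
    """Multiply two words of basis vectors: blade_mul('xy', 'zx') == (1, 'yz')."""
    word, sign = list(a + b), 1
    # Bubble sort into alphabetical order. Every swap of two different
    # basis vectors flips the sign, because yx = -xy.
    for _ in word:
        for i in range(len(word) - 1):
            if word[i] > word[i + 1]:
                word[i], word[i + 1] = word[i + 1], word[i]
                sign = -sign
    # Equal neighbours collapse, because xx = yy = zz = 1.
    out = []
    for c in word:
        if out and out[-1] == c:
            out.pop()
        else:
            out.append(c)
    return sign, "".join(out)


def gp(A, B):
    """Geometric product of two multivectors stored as {word: coefficient}."""
    result = {}
    for wa, ca in A.items():
        for wb, cb in B.items():
            sign, word = blade_mul(wa, wb)
            result[word] = result.get(word, 0) + sign * ca * cb
    # Drop terms that cancelled out.
    return {w: c for w, c in result.items() if abs(c) > 1e-12}


a = {"x": 10, "y": 5}
b = {"x": 3, "y": 1}
print(gp(a, b))  # {'': 35, 'xy': -5}, i.e. 35 - 5xy
```

The empty word '' stands for the scalar part. Results always come back with the letters in alphabetical order, so a zx term shows up as an xz term with the opposite sign.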

When doing these multiplications you quickly notice that just as all vectors can be represented as combinations of x, y and z, all bivectors can be represented as combinations of xy, yz and zx. So I’ll just use these as my basis-bivectors. We could make different choices here, for example we could use xz instead of zx but I like how the bivectors circle around like that. The choice of bivectors doesn’t really matter, just as the choice of basis-vectors doesn’t really matter. We could for example have also chosen x, y and -z as our basis vectors. All the math works out the same, we just get different signs in a few places.

Once we have three basis-vectors and three basis-bivectors, we notice that we can represent all 3D multivectors as combinations of 8 numbers: 1 scalar, 3 vector-coefficients, 3 bivector-coefficients and 1 trivector-coefficient. If we did the same exercise in a different number of dimensions, we would find similar sets of numbers. In 2D space for example we have 1 scalar, 2 vector-coefficients and 1 bivector-coefficient. That makes sense, because in 2D there are only 2 directions, only 1 plane and no trivector because there is no volume. If we went to 4D we would have 1 scalar, 4 vector-coefficients, 6 bivector-coefficients, 4 trivector-coefficients and 1 quadvector-coefficient. I’m sure you can spot the pattern that would allow you to go to any number of dimensions. (but really these come out naturally depending on how many orthogonal basis-vectors you start with)
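The pattern in those counts is Pascal's triangle: in n dimensions there are "n choose k" basis elements made of k different basis-vectors. A small sketch (the function name grade_counts is just picked for this example):

```python
from math import comb


def grade_counts(n):
    """Number of scalars, vectors, bivectors, ... in n dimensions."""
    return [comb(n, k) for k in range(n + 1)]


for n in (2, 3, 4):
    counts = grade_counts(n)
    print(n, counts, sum(counts))
# 2 [1, 2, 1] 4
# 3 [1, 3, 3, 1] 8
# 4 [1, 4, 6, 4, 1] 16
```

The last number in each row is the total number of coefficients in a multivector, which is 2 to the power of n.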

We’re almost finished with our introduction to geometric algebra, so I need to mention one final important property: vector multiplication is associative. Meaning (a*b)*c = a*(b*c) so we can choose which multiplication we want to do first.

OK with that we’re finished with the introduction, but I want to practice a few more multiplications so that you get the hang of it. Maybe do a few yourself. It takes a couple minutes, but then you have the rules ingrained into muscle memory. This practice section is optional though.

Vector Multiplication Practice

Let’s do some practice runs to build up an intuition for how these vectors and bivectors behave. You can skip this section entirely if you don’t care about geometric algebra and just want to get to rotations.

What happens if we multiply two similar bivectors?

2xy * 4xy = 8xyxy = 8x(yx)y = -8x(xy)y = -8(xx)yy = -8yy = -8

So what I did there is I used yx = -xy to re-order the basis-elements. Then everything collapses down because xx = yy = 1. So what we see here is that the square of a bivector is a negative number. Isn’t that interesting? In particular if we have a bivector of length 1 and multiply it with itself: 1xy*1xy = xyxy = -xxyy = -1 we see that (xy)^2 = -1. Remember how in quaternions there are these three components i, j and k which have i^2 = j^2 = k^2 = -1? We’re going to be using the bivectors for that. However it just so happens that the bivector is a mathematical construct whose square is -1. That does not mean that it is the result of \sqrt{-1}. I could build any number of mathematical constructs that square to -1 (for example trivectors also square to minus one); that doesn’t mean that they are all the square root of -1. How many square roots is -1 supposed to have?

Speaking of squaring a trivector, let’s try that to get practice at re-ordering these components:

xyz*xyz = xyzxyz = -xyxzyz = xxyzyz = yzyz = -yyzz = -zz = -1

Getting the hang of it yet? It’s all about re-ordering components until things collapse.

Let’s try multiplying two different bivectors:

xy * zx = xyzx = -xyxz = xxyz = yz

The product of two different basis-bivectors is another bivector. If we have more complicated bivectors that are made up of multiple basis-bivectors, the result is a scalar plus a bivector:

(2xy - 2yz) * (5yz + 0.5zx) = 2xy*5yz + 2xy * 0.5zx - 2yz * 5yz - 2yz * 0.5zx

= 10xyyz + xyzx - 10yzyz - yzzx

= 10xz + yz + 10 - yx

= 10 + xy + yz - 10zx

So this is a scalar (10) plus quite a complicated bivector (xy+yz -10zx).

What happens if we multiply across dimensions, like multiplying a vector with a bivector?

xy * 2x = 2xyx = -2xxy = -2y

If we multiply the plane with a vector that’s on the plane, we get another vector on the plane. In fact if we do this a few more times:

xy * -2y = -2xyy = -2x

xy * -2x = -2xyx = 2xxy = 2y

xy * 2y = 2xyy = 2x

We notice that after four multiplications we are back at the original vector 2x. So every multiplication with a bivector rotates by 90 degrees. If we multiply on the left side instead of multiplying on the right side, we would rotate in the other direction.

What if we multiply the plane with a vector that’s orthogonal to it?

xy * z = xyz

Well that’s disappointing, we just get the trivector. What if we multiply the trivector with the plane?

xyz * xy = xyzxy = -xyxzy = xxyzy = yzy = -yyz = -z

If we multiply the trivector with the plane, the plane collapses and we’re left with just the vector that’s normal to the plane. This works even for more complicated bivectors:

xyz * (0.707xy + 0.707zx) = 0.707xyzxy + 0.707xyzzx = -0.707z - 0.707y

Which is the normal of the original plane. What if we multiply a vector with the trivector?

xyz * x = xyzx = yz

If we multiply a vector with the trivector, the vector part collapses out and we’re left with the plane that the vector is normal to. This works even for more complicated vectors:

xyz * (-0.707y - 0.707z) = -0.707xyzy - 0.707xyzz = -0.707zx - 0.707xy

And with that we’re back at the original plane. Almost. The sign got flipped. If we had multiplied by -xyz we would have been back at the original plane.

So multiplying with the trivector turns planes into normals and normals into planes, because the other dimensions collapse out. This also allows us to define the cross product in geometric algebra: a\times b = -xyz*(a\wedge b). So first we build a plane by doing the wedge product, then we get the normal by multiplying with the trivector.
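To check the cross product definition numerically, here is a small sketch. The helper names (blade_mul, gp, cross_via_trivector) and the dictionary representation are assumptions made for this example. The wedge product is taken as the bivector part of the full product ab.

```python
def blade_mul(a, b):
    """Multiply two words of basis vectors: blade_mul('xy', 'zx') == (1, 'yz')."""
    word, sign = list(a + b), 1
    # Sort alphabetically; each swap of different basis vectors flips the sign.
    for _ in word:
        for i in range(len(word) - 1):
            if word[i] > word[i + 1]:
                word[i], word[i + 1] = word[i + 1], word[i]
                sign = -sign
    # Equal neighbours collapse, because xx = yy = zz = 1.
    out = []
    for c in word:
        if out and out[-1] == c:
            out.pop()
        else:
            out.append(c)
    return sign, "".join(out)


def gp(A, B):
    """Geometric product of two multivectors stored as {word: coefficient}."""
    result = {}
    for wa, ca in A.items():
        for wb, cb in B.items():
            sign, word = blade_mul(wa, wb)
            result[word] = result.get(word, 0) + sign * ca * cb
    return {w: c for w, c in result.items() if abs(c) > 1e-12}


def cross_via_trivector(a, b):
    """a x b = -xyz * (a ^ b), with a and b given as (x, y, z) tuples."""
    A = dict(zip("xyz", a))
    B = dict(zip("xyz", b))
    # Keep only the two-letter terms of ab: that is the wedge product.
    wedge = {w: c for w, c in gp(A, B).items() if len(w) == 2}
    normal = gp({"xyz": -1}, wedge)
    return tuple(normal.get(axis, 0) for axis in "xyz")


print(cross_via_trivector((1, 2, 3), (4, 5, 6)))  # (-3, 6, -3)
```

That matches the usual component formula for the cross product of (1, 2, 3) and (4, 5, 6).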

Reflections

If you went through the practice chapter you will have already seen places where geometric algebra does rotations: bivectors rotate vectors on their plane by 90 degrees. It’s not quite clear how we can build arbitrary rotations with that though.

One thing that’s a little bit easier to do is reflections, and we will see that we can get from reflections to rotations.

Let’s say we want to reflect the vector a in the picture below on the normalized vector r, to get the resulting vector b:

To do that it’s useful to break the vector a into two parts: The part that’s parallel to r, a_\| and the part that’s perpendicular to r, a_\perp:

(forgive my crappy graphing skills)

These have a few properties:

a = a_\| + a_\perp

a_\| *r = r*a_\| (the result is a scalar and we can flip the order)

a_\perp* r = -r *a_\perp (the result is a bivector and flipping the order flips the sign)

From the picture it should be clear that if we subtract a_\| instead of adding it, we should get to b. Or in other words:

b = a - 2a_\|

= a_\| + a_\perp - 2a_\|

= a_\perp - a_\|

So how do we get these a_\| and a_\perp vectors? You may already know how to do it, but we actually never need to explicitly calculate them. Because we can actually represent this reflection as

b = -rar

How do we get to that magical formula? Let’s multiply it out:

-rar = -r(a_\perp + a_\|)r

= -r(a_\perp r + a_\|r)

= -ra_\perp r - ra_\|r

= r^2a_\perp - r^2a_\|

= a_\perp - a_\|

The important step is that a_\perp r = -ra_\perp, allowing us to re-order the elements until we’re left with r^2 = r\cdot r which is just 1, as long as r is normalized.
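Here is a small numeric check of that formula, as a sketch. It computes -rar with a tiny geometric product (the helper names blade_mul and gp and the dictionary representation are assumptions for this example) and compares it with the explicit version b = a - 2a_\|, where a_\| = (a \cdot r) r.

```python
def blade_mul(a, b):
    """Multiply two words of basis vectors: blade_mul('xy', 'zx') == (1, 'yz')."""
    word, sign = list(a + b), 1
    # Sort alphabetically; each swap of different basis vectors flips the sign.
    for _ in word:
        for i in range(len(word) - 1):
            if word[i] > word[i + 1]:
                word[i], word[i + 1] = word[i + 1], word[i]
                sign = -sign
    # Equal neighbours collapse, because xx = yy = zz = 1.
    out = []
    for c in word:
        if out and out[-1] == c:
            out.pop()
        else:
            out.append(c)
    return sign, "".join(out)


def gp(A, B):
    """Geometric product of two multivectors stored as {word: coefficient}."""
    result = {}
    for wa, ca in A.items():
        for wb, cb in B.items():
            sign, word = blade_mul(wa, wb)
            result[word] = result.get(word, 0) + sign * ca * cb
    return {w: c for w, c in result.items() if abs(c) > 1e-12}


r = {"x": 0.6, "y": 0.8}              # normalized, since 0.6^2 + 0.8^2 = 1
a = {"x": 1.0, "y": 2.0, "z": 3.0}

# The reflection formula from the text: b = -r a r
b = gp(gp({"": -1}, r), gp(a, r))

# The explicit version: b = a - 2 * a_parallel, with a_parallel = (a . r) r
dot = sum(a[k] * r.get(k, 0) for k in a)
explicit = {k: a[k] - 2 * dot * r.get(k, 0) for k in a}

print(b)         # roughly -1.64x - 1.52y + 3z
print(explicit)  # the same vector
```

Note that the result of -rar is a pure vector again: the scalar, bivector and trivector parts that show up along the way all cancel.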

Rotations

The reflections above look kinda like rotations. In fact if all we want to do is rotate a single vector, we can always do that with a reflection. The problem is if we want to rotate multiple vectors, like in a 3d model, then the rotated model would be a mirror version of the original model.

The solution to that is to do a second reflection. There are many possible pairs of reflections that we could choose, but here is an easy one. First we reflect on the half-way vector between a and b, r=\frac{a+b}{|a+b|} (where writing pipes around a vector like |v| is the length of the vector, so \frac{v}{|v|} is a normalized vector):

[Figure: reflecting a on the half-way vector r, landing at -b]

So in this picture I am reflecting a on the vector r, which is half-way between a and b, landing us at -b. To get from -b to b we just have to do a second reflection with the vector b itself. (which is a bit weird, but if you follow the equations it works out) Given that -rar is one reflection, brarb is two reflections. First we reflect on r, then we reflect on b.

Earlier we chose r = \frac{a+b}{|a+b|}. If a and b are both normalized (so that bb = 1), we can multiply this out and define

R = b\frac{a+b}{|a+b|}

= \frac{ba + 1}{|a+b|}

Then the rotation is written as b=Ra\overline R (where you could work out \overline R by multiplying out the other side, or you can just flip the sign on the bivector parts of R), and the inverse is written as a=\overline RbR.
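As a sketch of how this looks in code: the snippet below builds R from two normalized vectors and applies Ra\overline R. The names blade_mul, gp, rotor and conjugate are invented for this example, and multivectors are stored as dictionaries from words like "xy" to coefficients, which is just an assumption of the sketch.

```python
import math


def blade_mul(a, b):
    """Multiply two words of basis vectors: blade_mul('xy', 'zx') == (1, 'yz')."""
    word, sign = list(a + b), 1
    # Sort alphabetically; each swap of different basis vectors flips the sign.
    for _ in word:
        for i in range(len(word) - 1):
            if word[i] > word[i + 1]:
                word[i], word[i + 1] = word[i + 1], word[i]
                sign = -sign
    # Equal neighbours collapse, because xx = yy = zz = 1.
    out = []
    for c in word:
        if out and out[-1] == c:
            out.pop()
        else:
            out.append(c)
    return sign, "".join(out)


def gp(A, B):
    """Geometric product of two multivectors stored as {word: coefficient}."""
    result = {}
    for wa, ca in A.items():
        for wb, cb in B.items():
            sign, word = blade_mul(wa, wb)
            result[word] = result.get(word, 0) + sign * ca * cb
    return {w: c for w, c in result.items() if abs(c) > 1e-12}


def rotor(a, b):
    """R = (ba + 1) / |a + b| for normalized vectors a and b."""
    length = math.sqrt(sum((a.get(k, 0) + b.get(k, 0)) ** 2 for k in "xyz"))
    R = gp(b, a)
    R[""] = R.get("", 0) + 1
    return {w: c / length for w, c in R.items()}


def conjugate(R):
    """Flip the sign of the bivector parts."""
    return {w: (-c if len(w) == 2 else c) for w, c in R.items()}


a = {"x": 1.0}
b = {"y": 1.0}
R = rotor(a, b)
print(R)                             # roughly 0.707 - 0.707xy
print(gp(gp(R, a), conjugate(R)))    # roughly {'y': 1.0}, which is b
```

The sketch does not handle a = -b, where |a+b| is zero and the half-way vector is not defined.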

Quaternions

And just like that we have quaternions. How? Where? I hear you asking. That R part in the last equation is a quaternion. If you multiply it all out, you will find that all the vector parts and trivector parts collapse to 0, and you’re just left with the scalar part and the bivector coefficients. And it just so happens that if you have a multivector which consists of only a scalar and the bivectors, multiplication behaves exactly like multiplication of quaternions.

Now isn’t that interesting? All we did was we did the math for reflections, and if we do two of those we get quaternions? No imaginary numbers, no fourth dimension, just 3d vector math. All we had to do was introduce that wedge product a \wedge b.

And you’ll notice that the way we apply R, by doing Ra\overline R looks an awful lot like how we multiply quaternions with vectors. To multiply a quaternion q with a vector a we do q*(0, a)*\overline q.

OK so let’s convince ourselves that these really are quaternions and work out the quaternion equations. They are i^2=j^2=k^2=ijk=-1. Our quaternion consists of a scalar and three bivectors, yz, zx, and xy. (I use them in this order because the yz plane rotates around the x axis, so it should come first). So let’s try this:

(yz)^2 = yzyz = -yyzz = -1

(zx)^2 = zxzx = -zzxx = -1

(xy)^2 = xyxy = -xxyy = -1.

Seems to work so far. But I actually don’t fulfill the equation ijk = -1 because for me yz*zx*xy = yzzxxy = yy = 1. I could fix that by choosing a different set of basis-bivectors. For example if I chose yz, xz and xy, then this would work out because yz*xz*xy = yzxzxy = -yzzxxy = -yy = -1. But I kinda like my choice of basis bivectors and all the rotations work out the same way. If this bothers you, just choose different basis bivectors.
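These sign checks are easy to get wrong by hand, so here is a sketch that does the re-ordering mechanically. The function name blade_mul is made up for this example. It takes two words of basis vectors and returns the sign and the collapsed word.

```python
def blade_mul(a, b):
    """Multiply two words of basis vectors, returning (sign, word)."""
    word, sign = list(a + b), 1
    # Sort alphabetically; each swap of different basis vectors flips the sign.
    for _ in word:
        for i in range(len(word) - 1):
            if word[i] > word[i + 1]:
                word[i], word[i + 1] = word[i + 1], word[i]
                sign = -sign
    # Equal neighbours collapse, because xx = yy = zz = 1.
    out = []
    for c in word:
        if out and out[-1] == c:
            out.pop()
        else:
            out.append(c)
    return sign, "".join(out)


print(blade_mul("yz", "yz"))     # (-1, ''): yz squares to -1
print(blade_mul("zx", "zx"))     # (-1, '')
print(blade_mul("xy", "xy"))     # (-1, '')
print(blade_mul("yzzx", "xy"))   # (1, ''):  yz*zx*xy = +1
print(blade_mul("yzxz", "xy"))   # (-1, ''): yz*xz*xy = -1
```

An empty word in the result means that only a scalar is left.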

One super cool thing is that when doing the derivations using reflections, I never had to specify the number of dimensions. We could use 3D vectors or 2D vectors or any number of dimensions. So if we work out the math in 2D, what do you think we get? That’s right, we get complex numbers: One scalar and one bivector. Because that’s how you do rotations in 2D. But we could go to any number of dimensions using this method. (except in 1D this kinda collapses, because you can’t really rotate things in 1D)

Also we didn’t specify what we are rotating. We assumed that it was a vector, but we never required that. So this can rotate bivectors and it can rotate other quaternions.

Interpreting Geometric Algebra Quaternions

So we found a new way to derive quaternions. This new way is neat because we don’t need 4 dimensions and we don’t need imaginary numbers. But can we learn anything new from this? Already we have two possible new interpretations:

  1. A quaternion is the result of two reflections
  2. A quaternion is a scalar plus three bivectors

Maybe one of these has some interesting conclusions.

Before that I want to kill the 4D interpretation properly: There are two reasons why people say quaternions are 4D: The fact that quaternions have four numbers, and the fact that quaternions have double cover. I’ll talk about the double cover separately later, but here I briefly want to talk about the four numbers thing. There are lots of 3D constructs that have more than three numbers. For example a plane equation has four numbers: ax+by+cz+d = 0. Or if we want to do rotations using matrices in 3D, we need a 3×3 matrix. That’s 9 numbers. But nobody would ever suggest that we should think of a rotation matrix as a 9 dimensional hyper-cube with rounded edges of radius 3. So don’t think of quaternions as a 4 dimensional hypersphere of radius 1. Yes, there are some useful conclusions to draw from that interpretation (for example it explains why we have to use slerp instead of lerp) but it’s such a weird interpretation that it should come up very rarely.

With that out of the way let’s get to these two new interpretations:

1. Interpreting quaternions as two reflections. I couldn’t get much useful out of this. The first reflection is always on the vector half-way between the start of the rotation and the end of the rotation. The second reflection is always on the end of the rotation. I’ve played around with visualizing that, but the visualizations always looked predictable and didn’t offer any insights.

2. Interpreting quaternions as a scalar plus three bivectors. This interpretation on the other hand turned out to be a goldmine. Not only can you get an intuitive feeling for how this behaves, you can also get visualizations from this. This interpretation also allowed me to get rid of the double cover of quaternions.

So even though we have derived quaternions using reflections above, I will actually spend the rest of the blog post talking about quaternions as scalars and bivectors.

Scalars and Bivectors

A quaternion is made up of a scalar and three bivectors. We all know what a scalar does: Multiplying with a scalar makes a vector longer or shorter. I said above that multiplying with a bivector rotates a vector by 90 degrees on the plane of the bivector.

So how can we build up all possible rotations if all we have is a scalar and three rotations of exactly 90 degrees? The answer is that a bivector actually does slightly more: It rotates by 90 degrees, and then scales the vector.

I said that a bivector is a plane. But because of its rotating behavior, I actually like to visualize it as a curved line. So I visualize a vector as a straight line, and a bivector as a 90 degree curve. So here is a visualization of three different bivectors:

These are the bivectors 0.5xy (bottom), xy (middle) and 2xy (top). It’s a 90 degree rotation followed by a scale. I find this visualization particularly useful when chaining a bunch of operations together.

For example let’s say we want to rotate by 45 degrees on the xy plane. To do that we can multiply a vector with the quaternion 0.707 + 0.707xy. (that 0.707 is actually \frac{1}{\sqrt{2}}, but I’ll truncate it to 0.707 here) Now let’s multiply the vector 3x with that quaternion. That gives us

3x * (0.707 + 0.707xy) = 2.121x + 2.121y

Here’s how I would visualize that:

First we rotate by the bivector to get 2.121y:

So the bivector is a rotation by 90 degrees followed by a scale of 0.707.

Next we multiply the original vector with the scalar to get the vector 2.121x, which we add to the previous result:

Which then gives us the final vector of 2.121x + 2.121y:

Which is the original vector rotated by 45 degrees.

This way of visualizing makes it very clear that multiplication with a quaternion is just multiplication with a scalar and multiplication with a bivector. And this also shows how we got a 45 degree rotation, even though all we can do is 90 degree rotations followed by scaling. It also explains why we need the single scalar value, and why the three bivectors are not enough: We sometimes want to add some of the original vector back in to get the desired rotation.

One thing to note is that in here I chose to do the bivector multiplication first, and the scalar multiplication second. But the choice is kinda arbitrary as both of these happen at the same time, and they don’t depend on each other.

Let’s rotate that same vector again to show what this looks like when we didn’t start off with one of our basis vectors:

(2.121x + 2.121y) * (0.707 + 0.707xy) = 2.121x*0.707 + 2.121x * 0.707xy + 2.121y * 0.707 + 2.121y * 0.707xy

= 1.5x + 1.5xxy + 1.5y + 1.5yxy

= 1.5x + 1.5y + 1.5y - 1.5x

= 3y

So let’s visualize that:

First we rotate with the bivector, which puts us at -1.5x + 1.5y:

So once again this does a 90 degree rotation followed by a scale of 0.707.

Next we multiply the original vector by 0.707 and add the resulting vector 1.5x + 1.5y:

Which then gives us the final vector of 3y:

Which is exactly what we would expect after rotating by 45 degrees twice.
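In 2D this whole multiplication fits in one line of code. The following is a sketch with a made-up function name. It multiplies a vector on the right by a scalar plus an xy bivector, using x*xy = y and y*xy = -x.

```python
import math


def times_scalar_plus_xy(v, s, b):
    """Multiply the 2D vector v = (vx, vy) on the right by s + b*xy.

    (vx*x + vy*y) * (s + b*xy) = (s*vx - b*vy)*x + (s*vy + b*vx)*y
    """
    vx, vy = v
    return (s * vx - b * vy, s * vy + b * vx)


c = 1 / math.sqrt(2)           # the 0.707 from the text
once = times_scalar_plus_xy((3.0, 0.0), c, c)
twice = times_scalar_plus_xy(once, c, c)
print(once)    # roughly (2.121, 2.121)
print(twice)   # roughly (0.0, 3.0)
```

The s terms are the original vector scaled, and the b terms are the vector rotated by 90 degrees and scaled, exactly as in the pictures above.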

I think these visualizations also explain how we can get arbitrary rotations: For bigger rotations we just have to make the scalar component smaller as the bivector component gets bigger.

So far we have only looked at the xy plane. To visualize this in 3D, I wrote a small program in Unity that can do the above visualization for all three bivectors. Here is what that looks like for rotating from the vector 0.707y + 0.707z to the vector 0.707x + 0.707y. That gives me the particularly nice quaternion 0.5 - 0.5yz + 0.5zx - 0.5xy.

This is going to be hard to do in pictures because it’s a 3D construct, but I’ll give it a shot. Here is what the two vectors look like:

[Figure: the two vectors 0.707y + 0.707z and 0.707x + 0.707y]

So I want to rotate from the vector on the left to the vector on the right.

Here is what the contribution of the -0.5xy bivector looks like:

[Figure: contribution of the -0.5xy bivector]

So this bivector is rotating on the xy plane. It takes the end point of the vector and rotates it 90 degrees down on the xy plane. It may be a bit hard to see, but imagine all the yellow lines lying on the xy plane.

The result of that 90 degree rotation is the vector 0.353x. (the lower edge of the plane) I used the end of that rotation to start our result vector. (see how I have a third short vector sticking out at the bottom now? That’s 0.353x)

Next I’m doing the contribution of the -0.5yz bivector:

[Figure: contributions of the -0.5xy and -0.5yz bivectors]

The original vector was already rotated 45 degrees on the yz plane, so this rotation started off at a 45 degree angle and it rotated 90 degrees on the yz plane. Then it scaled the result by 0.5, giving us the result vector 0.353y - 0.353z. (the bottom of the teal plane)

I also added the result of that rotation to the result vector. (the shorter vector that was sticking out now has a corner in it, indicating that I added the new 0.353y - 0.353z)

Next we add the contribution of the 0.5 zx bivector:

[Figure: contributions of all three bivectors]

This took the end point of the original vector, and rotated it by 90 degrees on the zx plane. Then it scaled the result by 0.5, giving us the new vector 0.353x (the end of the purple plane). The reason why the purple plane is floating above the other planes is an artifact of my visualization: I start at the end point and then I only move on the zx plane, so I end up floating above everything else. I also added this to our result vector at the bottom there.

Finally I’m going to add the 0.5 scalar component into this:

[Figure: the finished rotation, with the 0.5 scalar contribution added]

This just took the original vector and scaled it by 0.5, giving us 0.353y + 0.353z. I then added that to the results of the three bivector rotations. And as we can see, if we add up the contributions of the three bivectors and of the scalar part, we end up exactly at the end point of the vector that we were rotating into. (it may look like the last part is longer than 0.5 times the original vector, but that’s a trick of the perspective. The reason I picked this perspective is that you can see all three rotations from this angle)

So the rotation happened by doing three bivector multiplications and one scalar multiplication and adding all the results up.

Once again I want to point out that the order in which I added these up is arbitrary. All of these multiplications happen at the same time and don’t depend on each other, since they all just use the original vector as input. I chose to do this in the order xy, yz, zx, scalar, because that gave me a nice visualization.
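The four contributions can also be added up in code. This is a sketch with invented helper names (blade_mul, gp) and a dictionary representation of multivectors, both assumptions of this example. It multiplies the start vector with each component of the quaternion separately and then with the whole quaternion.

```python
import math


def blade_mul(a, b):
    """Multiply two words of basis vectors: blade_mul('xy', 'zx') == (1, 'yz')."""
    word, sign = list(a + b), 1
    # Sort alphabetically; each swap of different basis vectors flips the sign.
    for _ in word:
        for i in range(len(word) - 1):
            if word[i] > word[i + 1]:
                word[i], word[i + 1] = word[i + 1], word[i]
                sign = -sign
    # Equal neighbours collapse, because xx = yy = zz = 1.
    out = []
    for c in word:
        if out and out[-1] == c:
            out.pop()
        else:
            out.append(c)
    return sign, "".join(out)


def gp(A, B):
    """Geometric product of two multivectors stored as {word: coefficient}."""
    result = {}
    for wa, ca in A.items():
        for wb, cb in B.items():
            sign, word = blade_mul(wa, wb)
            result[word] = result.get(word, 0) + sign * ca * cb
    return {w: c for w, c in result.items() if abs(c) > 1e-12}


s = 1 / math.sqrt(2)                       # the 0.707 from the text
v = {"y": s, "z": s}                       # start vector
q = {"": 0.5, "yz": -0.5, "zx": 0.5, "xy": -0.5}

# Each component of the quaternion acts on the original vector on its own.
for word, coeff in q.items():
    print(word or "scalar", gp(v, {word: coeff}))

# The sum of all contributions is the end vector, roughly 0.707x + 0.707y.
print("sum", gp(v, q))
```

Printing the individual contributions shows one detail that the pictures leave out: the xy and zx contributions each also contain a trivector term, with opposite signs, so the two cancel in the sum.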

I wanted to make the above visualization available for you to play with. I thought I could be really cool and upload a webgl version so that you can just play with it in your browser. So I built a webgl version, but then I found out that I can’t upload that to my wordpress account. So… I just put it in a zip file which you have to download and then open locally… Here it is.

There is an alternate visualization for the above rotation: Just as we would think of the vector 10x + 5y as a single vector, we can also think of the bivector -0.5yz + 0.5zx - 0.5xy as a single bivector. It’s the plane with the normal -0.5x + 0.5y - 0.5z, which is the plane spanned between the start vector and the end vector of the rotation. Then the visualization shows a 90 degree rotation on that plane, followed by a scaling of the length of this bivector. (which is \sqrt{0.5^2 + 0.5^2 + 0.5^2} = 0.866) That visualization looks like this:

So we rotate on this shared plane, then scale by 0.866, and finally add the original vector scaled by 0.5. This visualization as a single 90 degree rotation by the sum-bivector is equally valid as the visualization of the component bivectors. Just as we can visualize vectors either by their components, or as one line, we can visualize bivectors either by their components or as a single plane.

That finishes the part about visualization. As far as I know this is the first quaternion visualization that doesn’t try to visualize them as 4D constructs, and I think that really helps. Every component now has a distinct meaning and a picture. And we can see how the behavior of the whole quaternion is a sum of the behavior of its components.

Axis Angle

One quick aside I want to make is that sometimes people say that quaternions are related to the axis/angle representation of rotations. That is a good way to get people started with quaternions, but then it breaks down relatively quickly because the equations don’t make sense and the numbers behave weirdly. The scalar & bivector interpretation is actually related to the axis/angle interpretation, and it explains what’s really going on here. Because when I say that something rotates 90 degrees on a plane, we can also say that it rotates 90 degrees around the normal of the plane. So in this interpretation quaternions first: rotate 90 degrees around the normal, followed by being scaled down, and second: multiply the original vector times a scalar and add that. It’s not quite axis/angle, but we can see how it’s related and why the axis/angle interpretation sometimes seems to work.

With the scalar & bivector interpretation of quaternions, we have a good idea of what quaternions do. With that, we’re ready to tackle the final quaternion mystery:

Quaternion Double Cover

When I was working on this, a few friends asked me how the “scalar and bivector” explanation explains the double cover of quaternions. If you’re not familiar, the double cover means that for any desired rotation, there are actually two quaternions that represent that rotation. For example the quaternions that have 1 or -1 in the scalar part, and 0 for all the bivectors both represent a rotation by 0 degrees. (or by 360 degrees depending on how you look at it)

At first I responded that I hadn’t gotten to that part yet, but as I was working on this, the double cover just never came up. So eventually I decided to go looking for it, and… I couldn’t find it. It seemed like my quaternions didn’t have double cover. So I double checked everything and noticed that I have one difference: Remember how in order to multiply a quaternion R with a vector v we did this multiplication: Rv\overline R. I accidentally didn’t do that. I just did Rv.

And the simple multiplication actually works as long as you’re only rotating vectors on a plane that they actually lie on. For example rotating the vector x on the xy plane works out: xy*x = -y. The problems start if we’re rotating a vector that doesn’t completely lie on the plane that you’re rotating on. So let’s say I’m rotating the vector 2x + 2z on the xy plane:

xy(2x + 2z) = 2xyx + 2xyz

= -2y + 2xyz

That’s strange: Some of our vector part has disappeared, and instead we have a trivector. This is not good. You don’t want part of the vector to disappear after a rotation. Rotating with Rv\overline R fixes the problem, because the trivector part cancels out:

xy*(2x + 2z)*(-xy) = (2xyx + 2xyz)*(-xy)

= -2xyxxy - 2xyzxy

= -2x + 2z

So now the part that’s on the plane (the x component) got rotated, but the part that’s not on the plane (the z component) was left unchanged. This is exactly what we want.
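The same comparison in code, as a sketch (the helpers blade_mul and gp and the dictionary representation are assumptions made for this example):

```python
def blade_mul(a, b):
    """Multiply two words of basis vectors: blade_mul('xy', 'zx') == (1, 'yz')."""
    word, sign = list(a + b), 1
    # Sort alphabetically; each swap of different basis vectors flips the sign.
    for _ in word:
        for i in range(len(word) - 1):
            if word[i] > word[i + 1]:
                word[i], word[i + 1] = word[i + 1], word[i]
                sign = -sign
    # Equal neighbours collapse, because xx = yy = zz = 1.
    out = []
    for c in word:
        if out and out[-1] == c:
            out.pop()
        else:
            out.append(c)
    return sign, "".join(out)


def gp(A, B):
    """Geometric product of two multivectors stored as {word: coefficient}."""
    result = {}
    for wa, ca in A.items():
        for wb, cb in B.items():
            sign, word = blade_mul(wa, wb)
            result[word] = result.get(word, 0) + sign * ca * cb
    return {w: c for w, c in result.items() if abs(c) > 1e-12}


R = {"xy": 1}
R_conj = {"xy": -1}          # the conjugate flips the bivector sign
v = {"x": 2, "z": 2}

print(gp(R, v))              # {'y': -2, 'xyz': 2}: a trivector appears
print(gp(gp(R, v), R_conj))  # {'x': -2, 'z': 2}: a pure vector again
```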

But look at what happened: The first rotation was a 90 degree rotation and the part that’s on the plane ended up at -2y. And now we did a full 180 degree rotation and that part ended up at -2x. How did that happen?

Well it actually makes sense. We are multiplying with the quaternion twice after all. Of course it would do a double rotation. It’s clearest if you multiply it all out, but the short explanation is that the conjugate allows us to rotate roughly in the same direction while multiplying from the other side: Ra \approx a\overline R, and we went ahead and just multiplied on both sides Ra\overline R. So if we multiply on both sides of course we get twice the rotation.

This is literally where the half-angles of quaternions and the double cover come from: From the way we multiply quaternions with vectors. Internally quaternions actually don’t have double cover. If you multiply one 90 degree quaternion with a different quaternion, then after four rotations that second quaternion will end up exactly where it started. But then we chose a vector multiplication function that applies the quaternion twice. So we have to change the interpretation and that 90 degree quaternion becomes a 180 degree quaternion. And actually my visualizations above don’t make sense any more because the vector multiplication always does that operation twice.

Killing Double Cover

So if the vector multiplication is the problem, could we define a vector multiplication that doesn’t lead to double cover? That would make quaternions much simpler.

And the answer is that yes, we can. Remember that rotating vectors that lie on the plane already worked correctly. The problem was that a vector orthogonal to the plane gets turned into a trivector, when a rotation should leave orthogonal vectors unchanged. The solution is that we have to first project the vector down onto the plane, then rotate within the plane, and then apply the original offset again. Here is an outline of the algorithm:

  1. Compute the normal of the plane by multiplying with the trivector (very fast)
  2. Project the vector onto that normal (fast, as long as you use the version without a square root)
  3. Subtract that projected part (very fast)
  4. Multiply the vector with the quaternion
  5. Add the projected part (very fast)

So now we only have to do a single multiplication instead of two multiplications. And since all other operations are fast, this might even be faster than the double-cover-giving quaternion/vector multiplication.
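The five steps above could be sketched like this in Python. The data layout is an assumption: the quaternion is a tuple (s, b_xy, b_yz, b_zx), the vector is (x, y, z), and the function name `rotate_single_cover` is made up for this example. The quaternion is interpreted without double cover, so 0 + xy is a 90 degree rotation.

```python
def rotate_single_cover(q, v):
    """Rotate v = (x, y, z) by q = (s, b_xy, b_yz, b_zx), applying q only once."""
    s, b_xy, b_yz, b_zx = q

    # 1. Normal of the plane: multiplying the bivector with the trivector just
    #    relabels its components (up to sign, which the projection ignores).
    n = (b_yz, b_zx, b_xy)
    nn = n[0] * n[0] + n[1] * n[1] + n[2] * n[2]
    if nn == 0.0:
        # No bivector part, so q is a plain scalar and the product is s * v.
        return tuple(s * c for c in v)

    # 2. Project v onto the normal, dividing by n.n instead of normalizing n.
    k = (v[0] * n[0] + v[1] * n[1] + v[2] * n[2]) / nn
    offset = tuple(k * c for c in n)

    # 3. Subtract the projected part; p now lies on the plane.
    p = tuple(a - b for a, b in zip(v, offset))

    # 4. Multiply p with the quaternion: (s + B) * p. For a vector on the
    #    plane, B * p is again a vector (using xy*x = -y, xy*y = x, etc.).
    bp = (b_xy * p[1] - b_zx * p[2],
          b_yz * p[2] - b_xy * p[0],
          b_zx * p[0] - b_yz * p[1])
    rotated = tuple(s * a + b for a, b in zip(p, bp))

    # 5. Add the projected part back.
    return tuple(a + b for a, b in zip(rotated, offset))

print(rotate_single_cover((0.0, 1.0, 0.0, 0.0), (2.0, 0.0, 2.0)))
```

For the example from earlier, rotating 2x + 2z with xy, this gives -2y + 2z: a single 90 degree rotation of the part on the plane, with the z part left alone.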

And yes, this totally works and it’s faster and it’s less confusing. But you don’t want to use it. The reason is that as soon as I didn’t have double cover in my quaternions, I discovered why double cover is actually awesome.

Why We Need Double Cover

Double cover is what makes quaternion interpolation so great. (by interpolation I mean getting from rotation a to rotation b in multiple small steps as opposed to one large step) Without double cover, there are some quaternions that you can not interpolate between. Having to worry about those special cases makes interpolation a giant pain and defeats the whole point of why we used quaternions to begin with.

To explain what the problem is, let’s do a couple 90 degree rotations on the xy plane, once using double cover and once not using double cover:

Rotation  | Single Cover | Double Cover
----------+--------------+------------------
0^\circ   | 1 + 0xy      | 1 + 0xy
90^\circ  | 0 + xy       | 0.707 + 0.707xy
180^\circ | -1 + 0xy     | 0 + xy
270^\circ | 0 - xy       | -0.707 + 0.707xy
360^\circ | 1 + 0xy      | -1 + 0xy

If we interpreted the two numbers of each quaternion as a 2D vector, the double cover version would rotate that vector by 45 degrees from one row to the next. But since the double cover quaternion will rotate twice, this will actually give us a 90 degree rotation from one row to the next.
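The numbers in the table come straight from sines and cosines: the single cover column uses the full angle and the double cover column uses half of it. Here is a short sketch that regenerates them. The function names are made up for this example.

```python
import math

def single_cover(degrees):
    """(scalar, xy) pair when the quaternion is applied once."""
    t = math.radians(degrees)
    return (math.cos(t), math.sin(t))

def double_cover(degrees):
    """(scalar, xy) pair when the quaternion is applied twice: half the angle."""
    t = math.radians(degrees) / 2
    return (math.cos(t), math.sin(t))

for degrees in (0, 90, 180, 270, 360):
    s = single_cover(degrees)
    d = double_cover(degrees)
    print(f"{degrees:3d}  {s[0]:+.3f} {s[1]:+.3f}xy  {d[0]:+.3f} {d[1]:+.3f}xy")
```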

Here is a visualization of the same numbers. The idea here is that I put the scalar value on the x axis and the xy bivector on the y axis:

[Figure: 90_degree_rotations.png]

I drew the double cover as two lines, and the single cover as one line. Once again we see that a quaternion that uses double cover rotation is simply half-way towards the quaternion that uses single cover rotation.

I said that double cover is what makes quaternion interpolation so great. To see why, let’s try interpolating between these. To keep it simple I won’t do a slerp, but I’ll just try to find the rotation half-way between any of these rotations. We do that by adding the quaternions and then renormalizing them. Interpolating from the 0^\circ rotation to the 90^\circ rotation is pretty easy in both cases:

For single cover: (1 + 0xy) + (0 + xy) = 1 + xy and after normalization that comes out to be 0.707 + 0.707xy which is a 45 degree rotation.

For double cover: (1 + 0xy) + (0.707 + 0.707xy) = 1.707 + 0.707xy and after normalization that comes out to be 0.924 + 0.383xy, which is a 22.5 degree rotation, or with the double cover it’s a 45 degree rotation.

So interpolating a 90 degree rotation works just fine in both cases.

However we run into problems when interpolating from the 0^\circ rotation to the 180^\circ rotation:

For single cover: (1 + 0xy) + (-1 + 0xy) = 0. Huh. We can’t find the half-way rotation between these two because we just get 0, which we can’t normalize. You may think that this is just a problem because I chose to find the exact midpoint between these two vectors. But this is also a problem if we want to slerp from one to the other. It all collapses and we’re left with a zero vector.

So let’s reason through this manually. How would we interpolate from +1 to -1? We could rotate on the xy plane or on the yz plane or on the zx plane, or on any combined bivector. How do we know which bivector to choose? They’re all zero in both of our inputs. We’re missing information. In order to interpolate between two rotations, we need to know a plane on which we want to interpolate.

Let’s see how the double cover solves this: (1 + 0xy) + (0 + xy) = 1 + xy and after normalization we’re left with 0.707 + 0.707xy which was our 90 degree rotation, which is exactly the half-way point between the 0 degree rotation and the 180 degree rotation.
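The half-way calculations are simple enough to try out directly. Here is a minimal sketch, again using (scalar, xy) pairs. The name `halfway` and the choice to return None when the sum collapses are assumptions for this example.

```python
import math

def halfway(p, q):
    """Half-way rotation between two (scalar, xy) quaternions: add, renormalize."""
    s, b = p[0] + q[0], p[1] + q[1]
    length = math.hypot(s, b)
    if length < 1e-12:
        return None   # the sum collapsed to 0, there is nothing to normalize
    return (s / length, b / length)

# 0 to 90 degrees works in both versions.
print(halfway((1.0, 0.0), (0.0, 1.0)))       # single cover
print(halfway((1.0, 0.0), (0.707, 0.707)))   # double cover

# 0 to 180 degrees only works with double cover.
print(halfway((1.0, 0.0), (-1.0, 0.0)))      # single cover: None
print(halfway((1.0, 0.0), (0.0, 1.0)))       # double cover
```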

Isn’t that neat? In the double cover version one of our quaternions had an xy component, so we could interpolate on that plane. In fact you could build many possible 180 degree rotations in the double cover version. We could build a 180 degree rotation that rotates on the yz plane or on a linear combination of the xy and zx planes, or on any arbitrary plane. They all look different and they all interpolate differently. That’s a great property because we want to be able to interpolate on any plane of our choosing. In the single cover version however we only have one way to rotate 180 degrees and it looks the same no matter which plane you’re on. Which works fine if all you want to do is rotate 180 degrees, but it doesn’t work if you want to interpolate from one rotation to the other.
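To see two different 180 degree rotations interpolate differently, we need all three bivector components. Here is a sketch that assumes a (scalar, xy, yz, zx) tuple layout. The layout and the name `halfway_3d` are choices made for this example.

```python
import math

def halfway_3d(p, q):
    """Add two (scalar, xy, yz, zx) quaternions and renormalize."""
    total = [a + b for a, b in zip(p, q)]
    length = math.sqrt(sum(c * c for c in total))
    return tuple(c / length for c in total)

identity = (1.0, 0.0, 0.0, 0.0)
half_turn_xy = (0.0, 1.0, 0.0, 0.0)   # double cover: 180 degrees on the xy plane
half_turn_yz = (0.0, 0.0, 1.0, 0.0)   # double cover: 180 degrees on the yz plane

mid_xy = halfway_3d(identity, half_turn_xy)
mid_yz = halfway_3d(identity, half_turn_yz)
print(mid_xy)
print(mid_yz)
```

The two midpoints have their bivector part on different planes. In the single cover version both half turns would be stored as -1 with no bivector part at all, so there would be nothing to tell them apart.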

One way of thinking of this is that the trick of double cover is that you can express any rotation as a rotation of at most 90 degrees that gets applied twice. We already saw that if we want to go 180 degrees, we just go 90 degrees twice. Want to go 270 degrees? That is the same as -90 degrees, so just go -45 degrees twice. Like that we can always stay far away from the problem point of the 180 degree rotation that we would run into often if we used the single cover version of quaternions. And like that we always keep the information of which plane we are rotating on, making interpolation easy.

Another way of thinking of this is that the double cover version always gives us a midpoint of the rotation which we can use to interpolate. For some pairs of rotations, there are a lot of possible midpoints depending on which plane we want to interpolate on. Double cover solves that problem by giving us one midpoint, which narrows our choices down to one plane. And we can derive any other desired interpolation if we have the midpoint.

You may be wondering if there is a problem point where the double cover breaks down. Looking at the table above, we can find one: Rotating by 360 degrees: (1 + 0xy) + (-1 + 0xy) = 0. Which we can not renormalize. But that case is easy to handle, and in fact every slerp implementation already handles this: We detect if the dot product of the quaternions is negative, and if it is we flip the target quaternion. So then we interpolate from (1 + 0xy) to (1 + 0xy) which is just a 0 degree rotation. Which is exactly what we wanted. So as long as we handle the “negative dot product” case in our interpolation function, we can handle all possible rotations. Because there are two possible ways to express every rotation, and if we run into one that’s inconvenient, we just switch to the other one.
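Here is a sketch of a slerp with that check in it. It assumes unit quaternions stored as (scalar, xy, yz, zx) tuples. The function is written for illustration and is not taken from any particular library. Real implementations usually fall back to a normalized lerp for very small angles, which this sketch skips.

```python
import math

def slerp(p, q, t):
    """Spherical interpolation between unit quaternions p and q, with 0 <= t <= 1."""
    dot = sum(a * b for a, b in zip(p, q))
    if dot < 0.0:
        # q and -q are the same rotation, so switch to the closer one.
        q = tuple(-c for c in q)
        dot = -dot
    theta = math.acos(min(dot, 1.0))   # angle between p and q as 4D unit vectors
    if theta < 1e-6:
        return p                       # practically identical, nothing to interpolate
    sin_theta = math.sin(theta)
    weight_p = math.sin((1.0 - t) * theta) / sin_theta
    weight_q = math.sin(t * theta) / sin_theta
    return tuple(weight_p * a + weight_q * b for a, b in zip(p, q))

identity = (1.0, 0.0, 0.0, 0.0)
full_turn = (-1.0, 0.0, 0.0, 0.0)      # 360 degrees in the double cover version
half_turn = (0.0, 1.0, 0.0, 0.0)       # 180 degrees on the xy plane

print(slerp(identity, full_turn, 0.5))
print(slerp(identity, half_turn, 0.5))
```

The first call flips the target and returns the identity instead of collapsing to zero. The second call returns the 90 degree rotation on the xy plane.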

So I hope I have convinced you that you want to have double cover. It’s a neat trick that makes interpolation easy. Quaternions do not “naturally” have double cover, but the double cover comes from the way we define the vector multiplication. If we used a different algorithm to multiply a quaternion with a vector (I outlined one above) then we could get rid of the double cover, but we would be making interpolation more difficult. I actually think that the double cover trick is not unique to quaternions. I think we could also apply it to rotation matrices to make them easier to interpolate. I haven’t done the math for that though.

Summary

So in summary I hope that I was able to make quaternions a whole lot less weird. The geometric algebra interpretation of quaternions shows us that they are normal 3D constructs, not weird four-dimensional beasts. They consist of a scalar and three bivectors. Bivectors do 90 degree rotations followed by scaling, and we saw how we can create any rotation just from those 90 degree rotations and linear scaling. The rules that govern these constructs are simple, making the equations easy to derive and understand. (as opposed to the quaternion equations which can only be memorized) Also quaternions do not naturally have a double cover. The double cover comes from the way we define the multiplication of vectors and quaternions. We could get rid of it, but the double cover is a great trick for making interpolations easier.

Unfortunately this still only makes it slightly easier to understand the numbers in a quaternion. The double cover makes it so that each rotation actually gets applied twice, so my visualizations above only show half of what’s going on. This also makes it difficult to interpret the numbers because you have to know what happens if a rotation gets applied twice, which is a whole lot harder to do in your head than doing a single rotation. But still I now have a picture of quaternions, and I know what each component means, and why they behave the way they do. I hope I was able to do something similar for you.

I also think that Geometric Algebra is a very interesting field that merits further study. The fact that quaternions came out so naturally (in fact they almost don’t even need a special name) and that if we do the same derivation in 2D we end up with complex numbers is fascinating to me. The paper I linked at the beginning, Imaginary Numbers are not Real, spends a lot of time talking about how various equations in physics come out much simpler if we use geometric algebra instead of imaginary numbers and matrices. Simplicity like that is a good hint that there is something good going on here. If you’re interested in this for doing 3D math, there is something called Conformal Geometric Algebra which adds translation to quaternions. I didn’t look too much into it, but a brief glance shows that it might be related to dual quaternions. So there’s much more to discover.