# SIMD Scalar Accessor

*How to make the type system work for you.*

One day you find yourself writing code for a C++ vector math library. Of course, it is written with templates so that you can parametrize the scalar type and vector dimensions. You will have something like this:

```
template <typename ScalarType, int Size>
struct Vector
{
    ScalarType data[Size];
};
```

It can't get easier than that, right? Then the requirements start piling up and the code begins to bloat. This is perfectly normal and nothing to be alarmed about. One of the first things that typically happens at this stage is that the template is specialized for specific parameters. 2-, 3- and 4-dimensional vectors are very common, with well-established names for the vector components, so something like this is scribbled into the text editor:

```
template <typename ScalarType>
struct Vector<ScalarType, 4>
{
    union
    {
        ScalarType data[4];
        struct { ScalarType x, y, z, w; };
    };
};
```

Looks fairly innocent, right? Wrong! The data members alias each other, so the compiler can no longer treat the vector as a single value; it has to treat the members as distinct objects that happen to share a storage location. Every read or write through one of them must stay coherent with the others, which tends to push the data through memory instead of keeping it in registers.

The alternative is to write methods to access the vector components like so:

```
template <typename ScalarType>
struct Vector<ScalarType, 4>
{
    ScalarType data[4];
    ScalarType getX() const { return data[0]; }
    void setX(ScalarType x) { data[0] = x; }
};
```

Don't get hung up on the setter/getter member function names; some like lower-case get_x(), some like to call them x(), and so on. This depends on coding conventions and other factors we don't care about in this example, because this post is about making all of this disappear. The best solution I have found so far for this design problem is what I have dubbed the "Scalar Accessor", and it works like this:

```
template <typename ScalarType, typename VectorType, int Index>
struct ScalarAccessor
{
    VectorType m;

    // conversion into a scalar value ("accessor-to-scalar" conversion)
    operator ScalarType () const
    {
        return simd::get_component<Index>(m);
    }

    // assignment of a scalar value ("scalar-to-accessor" conversion)
    ScalarAccessor& operator = (ScalarType s)
    {
        m = simd::set_component<Index>(m, s);
        return *this;
    }
};
```
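The two helpers used above, simd::get_component and simd::set_component, are not shown in this post. As a rough sketch, assuming simd::float32x4 is simply __m128 and that SSE2 is available, they could look something like this. The bodies are an assumption made for illustration, not the library's actual implementation:

```cpp
#include <cassert>
#include <emmintrin.h> // SSE2; also pulls in the SSE header <xmmintrin.h>

namespace simd
{
    // Assumption: the vector type is a plain SSE register.
    using float32x4 = __m128;

    // Broadcast lane Index to all lanes, then read lane 0 as a scalar.
    template <int Index>
    inline float get_component(float32x4 v)
    {
        static_assert(Index >= 0 && Index < 4, "Index out of range");
        return _mm_cvtss_f32(_mm_shuffle_ps(v, v, _MM_SHUFFLE(Index, Index, Index, Index)));
    }

    // Return a copy of v with lane Index replaced by s, using a bit-mask blend.
    template <int Index>
    inline float32x4 set_component(float32x4 v, float s)
    {
        static_assert(Index >= 0 && Index < 4, "Index out of range");
        // All bits set in the selected lane, zero in the other three.
        const __m128 mask = _mm_castsi128_ps(_mm_setr_epi32(
            Index == 0 ? -1 : 0, Index == 1 ? -1 : 0,
            Index == 2 ? -1 : 0, Index == 3 ? -1 : 0));
        // (mask & s) | (~mask & v)
        return _mm_or_ps(_mm_and_ps(mask, _mm_set1_ps(s)), _mm_andnot_ps(mask, v));
    }
}

// Usage: replace one lane, the others stay untouched.
inline void component_example()
{
    simd::float32x4 v = _mm_setr_ps(1.0f, 2.0f, 3.0f, 4.0f);
    v = simd::set_component<2>(v, 9.0f);
    assert(simd::get_component<0>(v) == 1.0f);
    assert(simd::get_component<2>(v) == 9.0f);
    assert(simd::get_component<3>(v) == 4.0f);
}
```

Both helpers take and return the vector by value, so nothing in them requires the vector to have an address.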

ScalarAccessor is a type which stores a SIMD vector and has operators to convert between vector and scalar values. The Index template parameter chooses the vector component which is being accessed. The low-level SIMD code encapsulates the implementation details. Now this template type is used to build the accessor into the vector class.

```
template <>
struct Vector<float, 4>
{
    union
    {
        simd::float32x4 xyzw;
        ScalarAccessor<float, simd::float32x4, 0> x;
        ScalarAccessor<float, simd::float32x4, 1> y;
        ScalarAccessor<float, simd::float32x4, 2> z;
        ScalarAccessor<float, simd::float32x4, 3> w;
    };
};
```

We have a union just like at the beginning, but there is a subtle difference: it is a union of types with identical layout, each one wrapping the same vector (the standard calls this a common initial sequence, 9.5.1). Accessing x, y, z or w now accesses the same object as xyzw, with one difference: xyzw yields the whole vector, while x and its kin yield a single component of it. This is VERY IMPORTANT: the code runs entirely in CPU registers and does NOT write the vector to a temporary memory location to access a component.
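To see the type-system side of this in isolation, here is a small sketch of the accessor wired into the vector class. To keep it short and compilable anywhere, a plain four-float struct stands in for simd::float32x4 and the two simd helpers are trivial; both are assumptions of this example and say nothing about code generation. Like the technique itself, it relies on compilers tolerating access through a union member other than the one last written.

```cpp
#include <cassert>
#include <type_traits>

namespace simd
{
    // Stand-in for the real SIMD register type (assumption for this sketch only).
    struct float32x4 { float lane[4]; };

    template <int Index>
    inline float get_component(float32x4 v) { return v.lane[Index]; }

    template <int Index>
    inline float32x4 set_component(float32x4 v, float s) { v.lane[Index] = s; return v; }
}

template <typename ScalarType, typename VectorType, int Index>
struct ScalarAccessor
{
    VectorType m;
    operator ScalarType () const { return simd::get_component<Index>(m); }
    ScalarAccessor& operator = (ScalarType s)
    {
        m = simd::set_component<Index>(m, s);
        return *this;
    }
};

template <typename ScalarType, int Size>
struct Vector; // primary template, not needed for this sketch

template <>
struct Vector<float, 4>
{
    union
    {
        simd::float32x4 xyzw;
        ScalarAccessor<float, simd::float32x4, 0> x;
        ScalarAccessor<float, simd::float32x4, 1> y;
        ScalarAccessor<float, simd::float32x4, 2> z;
        ScalarAccessor<float, simd::float32x4, 3> w;
    };
};

// The accessors add no storage: each is exactly as large as the vector it wraps.
static_assert(sizeof(ScalarAccessor<float, simd::float32x4, 2>) == sizeof(simd::float32x4),
              "accessor must not add storage");
static_assert(sizeof(Vector<float, 4>) == sizeof(simd::float32x4),
              "union must not add storage");

inline float accessor_example()
{
    Vector<float, 4> v;
    v.xyzw = simd::float32x4{ { 1.0f, 2.0f, 3.0f, 4.0f } };

    // v.x looks like a float member but is a proxy object.
    static_assert(!std::is_same<decltype(v.x), float>::value, "v.x is a proxy, not a float");

    v.z = 10.0f;                               // scalar-to-accessor: rewrites lane 2 only
    const float sum = v.x + v.y + v.z + v.w;   // accessor-to-scalar conversions
    assert(sum == 17.0f);
    return sum;
}
```

One consequence of the proxy design is worth keeping in mind: auto a = v.x; deduces the accessor type rather than float, so spell out float when a scalar copy is what you want.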

How well does this work in practice?

```
float32x4 test(float32x4 a, float32x4 b)
{
    return a * b.x;
}
```

Let's compile!

```
shufps xmm1, xmm1, 0
mulps xmm0, xmm1
```

Looks about like what you would expect the compiler to do anyway, so what's the big deal? Look at what the compiler does with the union of float values and a vector register:

```
shufps xmm2, xmm2, 0
movq QWORD PTR [rsp-24], xmm0
movq QWORD PTR [rsp-16], xmm1
mulps xmm2, XMMWORD PTR [rsp-24]
movaps XMMWORD PTR [rsp-24], xmm2
mov rax, QWORD PTR [rsp-16]
movq xmm0, QWORD PTR [rsp-24]
mov QWORD PTR [rsp-24], rax
movq xmm1, QWORD PTR [rsp-24]
```

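What's wrong with this picture? Can anyone spot the problem? The vector takes a detour through the stack (the [rsp-24] and [rsp-16] accesses) instead of staying in registers.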
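The source behind that second listing is not shown here. It was presumably something along these lines; the type name naive_float32x4, the operator and the function name test_naive are guesses made for this sketch, and the anonymous struct is a compiler extension (accepted by GCC, Clang and MSVC) rather than standard C++:

```cpp
#include <cassert>
#include <xmmintrin.h>

// Guess at the "union of float values and vector register" variant.
union naive_float32x4
{
    __m128 m;
    struct { float x, y, z, w; }; // anonymous struct: compiler extension
};

// Multiply every lane by a scalar.
inline naive_float32x4 operator * (naive_float32x4 a, float s)
{
    naive_float32x4 r;
    r.m = _mm_mul_ps(a.m, _mm_set1_ps(s));
    return r;
}

// Same shape as the test function above; b.x is now a plain float
// that lives in the same storage as the vector register.
inline naive_float32x4 test_naive(naive_float32x4 a, naive_float32x4 b)
{
    return a * b.x;
}

inline void naive_example()
{
    naive_float32x4 a, b;
    a.m = _mm_setr_ps(1.0f, 2.0f, 3.0f, 4.0f);
    b.m = _mm_setr_ps(3.0f, 0.0f, 0.0f, 0.0f);
    const naive_float32x4 r = test_naive(a, b);
    assert(r.x == 3.0f && r.w == 12.0f);
}
```

The result is the same as with the accessor version. Only the generated code differs, and how much it differs will depend on the compiler and the calling convention.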

The same technique can be used to implement vector shuffling, in the same way that GLSL and other shading languages expose element permutations. Here is a simple x86 / SSE implementation:

```
#include <cstddef>     // size_t
#include <xmmintrin.h> // __m128, _mm_shuffle_ps, _MM_SHUFFLE

template <typename vec, size_t x, size_t y, size_t z, size_t w>
struct ShuffleAccessor
{
    __m128 data;

    // conversion into a shuffled vector
    operator vec () const
    {
        return { _mm_shuffle_ps(data, data, _MM_SHUFFLE(w, z, y, x)) };
    }
};

union float32x4
{
    ShuffleAccessor<float32x4, 0, 1, 2, 3> xyzw;
    ShuffleAccessor<float32x4, 0, 0, 1, 1> xxyy;
};

float32x4 test_shuffle(float32x4 a)
{
    return a.xxyy;
}
```

This compiles into one single instruction:

```
unpcklps xmm0, xmm0
```
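Note the reversed argument order in _MM_SHUFFLE(w, z, y, x): the macro takes the selector for the highest lane first. The sketch below checks that with a free-function form of the same shuffle; the names swizzle and swizzle_example are made up for this example:

```cpp
#include <cassert>
#include <cstddef>
#include <xmmintrin.h>

// _MM_SHUFFLE packs four 2-bit lane selectors, highest lane first.
// xxyy selects lanes (0, 0, 1, 1), which encodes as 0b01010000.
static_assert(_MM_SHUFFLE(1, 1, 0, 0) == 0x50, "xxyy selector");

// Hypothetical free-function form of the accessor's conversion operator:
// result lane 0 = v[x], lane 1 = v[y], lane 2 = v[z], lane 3 = v[w].
template <std::size_t x, std::size_t y, std::size_t z, std::size_t w>
inline __m128 swizzle(__m128 v)
{
    return _mm_shuffle_ps(v, v, _MM_SHUFFLE(w, z, y, x));
}

inline void swizzle_example()
{
    const __m128 v = _mm_setr_ps(10.0f, 20.0f, 30.0f, 40.0f);
    float out[4];
    _mm_storeu_ps(out, swizzle<0, 0, 1, 1>(v)); // xxyy
    assert(out[0] == 10.0f && out[1] == 10.0f);
    assert(out[2] == 20.0f && out[3] == 20.0f);
}
```

For xxyy the shuffle duplicates the two low lanes pairwise, which is exactly what unpcklps with both operands set to the same register does, so the compiler is free to pick that instruction.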

Please do check out a math library I have been working on; it is a little bit more than that, but that is another story.