… and ended (as always) with code

Our requirements from an index-type are fairly minimal…

public interface IIndex
{
	int Value { get; }
}

… and this would be enough for a number of scenarios, but not for one particular scenario that I wanted: constructing a typed index from an int. This is needed to expose existing properties of an array as their typed equivalents; in particular, I wanted to expose the length of the array in each dimension.

public interface IIndex<TIndex> : IIndex
	where TIndex : struct, IIndex<TIndex>
{
	TIndex WithValue(int value);
}

This is then combined with the fact that T : struct implies T : new() to gain, in effect, a T : new(int) constraint: var t = new T().WithValue(i). I could call this a tweak on the prototype pattern, but at the end of the day it’s a bit of a hack around the limited type constraints that C# generics allow (only a parameterless new() constraint can be expressed).
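To make the trick concrete, here is a minimal sketch of one index type and a generic factory built on it. Sim and IndexFactory.FromInt are hypothetical names chosen for illustration, and the two interfaces are repeated so that the block compiles on its own.

```csharp
// The two index interfaces, repeated so this sketch is self-contained.
public interface IIndex
{
    int Value { get; }
}

public interface IIndex<TIndex> : IIndex
    where TIndex : struct, IIndex<TIndex>
{
    TIndex WithValue(int value);
}

// Hypothetical index type for the simulation dimension.
public struct Sim : IIndex<Sim>
{
    private readonly int value;

    public Sim(int value) { this.value = value; }

    public int Value { get { return value; } }

    // Returns a fresh Sim; the receiver (typically default(Sim)) only acts as a prototype.
    public Sim WithValue(int value) { return new Sim(value); }
}

// Hypothetical helper emulating a "TIndex : new(int)" constraint.
public static class IndexFactory
{
    public static TIndex FromInt<TIndex>(int value)
        where TIndex : struct, IIndex<TIndex>
    {
        // The struct constraint guarantees new TIndex() compiles;
        // WithValue then supplies the int that new() cannot take.
        return new TIndex().WithValue(value);
    }
}
```

Generic code can then write IndexFactory.FromInt<Sim>(42) without knowing anything about Sim beyond its constraints.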

Because index types need to be structs we can’t use inheritance, which makes implementing an index type a rather tedious and unpleasant copy-and-paste affair. While this obviously has plenty of disadvantages, it has the benefit that each index type is free to enable – or not – methods and operations as it sees fit, for example IComparable<>, IEquatable<>, operator overloads, &c. No more than five or six index types will be created, some of which need to be comparable (e.g. yearA < yearB) and others of which must explicitly not be comparable (e.g. simulation – each is fully independent of all others), so I consider these trade-offs acceptable.
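As an illustration of that opt-in freedom, the sketch below shows a hypothetical Year index that exposes equality, ordering and relational operators, next to a hypothetical Sim index that deliberately exposes none, so that simA < simB fails to compile. Both type names are assumptions, and the interfaces are repeated so the block stands alone.

```csharp
using System;

public interface IIndex
{
    int Value { get; }
}

public interface IIndex<TIndex> : IIndex
    where TIndex : struct, IIndex<TIndex>
{
    TIndex WithValue(int value);
}

// Hypothetical Year index: years have a natural order, so this type opts in
// to equality, comparison and the relational operators.
public struct Year : IIndex<Year>, IEquatable<Year>, IComparable<Year>
{
    private readonly int value;
    public Year(int value) { this.value = value; }
    public int Value { get { return value; } }
    public Year WithValue(int value) { return new Year(value); }

    public bool Equals(Year other) { return value == other.value; }
    public override bool Equals(object obj) { return obj is Year && Equals((Year)obj); }
    public override int GetHashCode() { return value; }
    public int CompareTo(Year other) { return value.CompareTo(other.value); }

    public static bool operator ==(Year a, Year b) { return a.value == b.value; }
    public static bool operator !=(Year a, Year b) { return a.value != b.value; }
    public static bool operator <(Year a, Year b) { return a.value < b.value; }
    public static bool operator >(Year a, Year b) { return a.value > b.value; }
    public static bool operator <=(Year a, Year b) { return a.value <= b.value; }
    public static bool operator >=(Year a, Year b) { return a.value >= b.value; }
}

// Hypothetical Sim index: simulations are independent of one another, so no
// ordering is exposed and comparing two Sims with < is a compile-time error.
public struct Sim : IIndex<Sim>
{
    private readonly int value;
    public Sim(int value) { this.value = value; }
    public int Value { get { return value; } }
    public Sim WithValue(int value) { return new Sim(value); }
}
```

The cost is the boilerplate; the benefit is that the compiler, rather than code review, enforces which dimensions may be ordered.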

Implementing the wrappers for arrays was also a rather tedious job, as each rank requires a separate implementation with the correct number of type parameters.

// StrongArray (the non-generic base class) is not shown here.
// The struct constraints are required by IIndex<T>'s own constraint,
// and they are also what make new TIndex0() legal below.
public class StrongArray<TValue, TIndex0, TIndex1> : StrongArray
	where TIndex0 : struct, IIndex<TIndex0>
	where TIndex1 : struct, IIndex<TIndex1>
{
	private readonly TValue[,] array;

	public StrongArray(TIndex0 length0, TIndex1 length1)
		: this(new TValue[length0.Value, length1.Value])
	{
	}

	public StrongArray(TValue[,] array)
		: base(array)
	{
		this.array = array;
	}

	public TValue this[TIndex0 index0, TIndex1 index1]
	{
		get { return array[index0.Value, index1.Value]; }
		set { array[index0.Value, index1.Value] = value; }
	}

	public TIndex0 Length0
	{
		get { return new TIndex0().WithValue(array.GetLength(0)); }
	}

	public TIndex1 Length1
	{
		get { return new TIndex1().WithValue(array.GetLength(1)); }
	}

	public TValue[,] Array
	{
		get { return array; }
	}
}

And, no, I couldn’t think of a better name at the time!
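Following the same pattern, a rank-1 wrapper might look something like the sketch below. It is an assumption rather than the original code: the non-generic StrongArray base class is left out, and Sim is a hypothetical index type. Because int has no conversion to Sim, an accidental pnl[2] does not compile.

```csharp
public interface IIndex
{
    int Value { get; }
}

public interface IIndex<TIndex> : IIndex
    where TIndex : struct, IIndex<TIndex>
{
    TIndex WithValue(int value);
}

// Hypothetical index type for the simulation dimension.
public struct Sim : IIndex<Sim>
{
    private readonly int value;
    public Sim(int value) { this.value = value; }
    public int Value { get { return value; } }
    public Sim WithValue(int value) { return new Sim(value); }
}

// Hypothetical rank-1 counterpart of the rank-2 wrapper (no base class).
public class StrongArray<TValue, TIndex0>
    where TIndex0 : struct, IIndex<TIndex0>
{
    private readonly TValue[] array;

    // Allocates an array whose length is given as a typed index.
    public StrongArray(TIndex0 length0)
        : this(new TValue[length0.Value])
    {
    }

    // Wraps an existing array without copying it.
    public StrongArray(TValue[] array)
    {
        this.array = array;
    }

    // Only a TIndex0 can be used to index; a plain int is rejected by the compiler.
    public TValue this[TIndex0 index0]
    {
        get { return array[index0.Value]; }
        set { array[index0.Value] = value; }
    }

    // The length, exposed as a typed index via the WithValue trick.
    public TIndex0 Length0
    {
        get { return new TIndex0().WithValue(array.Length); }
    }

    public TValue[] Array
    {
        get { return array; }
    }
}
```

Usage would be along the lines of var pnl = new StrongArray<double, Sim>(new Sim(3)); pnl[new Sim(1)] = 2.5;.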

Was it worth it?

It took about a day and a half of coding and, according to Code Metrics, 125 lines of code (the biggest contributor being a class of helper methods such as Enumerable.Range equivalents), and it carries a very modest performance penalty. In return, this thin abstraction over arrays has significantly increased how much information our code conveys, and has reduced the time and effort needed to understand a piece of code.
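As an example of the kind of helper mentioned, here is a sketch of a typed counterpart to Enumerable.Range. The Indices class, its Range method and the Year type are all hypothetical names; the original helper class is not shown in this post.

```csharp
using System.Collections.Generic;

public interface IIndex
{
    int Value { get; }
}

public interface IIndex<TIndex> : IIndex
    where TIndex : struct, IIndex<TIndex>
{
    TIndex WithValue(int value);
}

// Hypothetical index type for the year dimension.
public struct Year : IIndex<Year>
{
    private readonly int value;
    public Year(int value) { this.value = value; }
    public int Value { get { return value; } }
    public Year WithValue(int value) { return new Year(value); }
}

// Hypothetical helper class; the name Indices is illustrative only.
public static class Indices
{
    // Typed counterpart of Enumerable.Range(0, count.Value): yields
    // 0, 1, ..., count.Value - 1, each wrapped as a TIndex.
    public static IEnumerable<TIndex> Range<TIndex>(TIndex count)
        where TIndex : struct, IIndex<TIndex>
    {
        TIndex prototype = new TIndex();
        for (int i = 0; i < count.Value; i++)
        {
            yield return prototype.WithValue(i);
        }
    }
}
```

Combined with the typed Length properties, a loop such as foreach (Year year in Indices.Range(prices.Length1)), where prices is a hypothetical StrongArray<double, Sim, Year>, can only ever produce indices of the right dimension.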

It all started with Matlab…

O, for a green field

We’re currently preparing to rewrite a reasonable amount of Matlab into C#. Starting afresh is always enjoyable but rarely sensible; however, one of the major reasons for doing so is that we no longer have the skills to maintain the codebase as it stands. For a start, we no longer have any experienced Matlab developers, and secondly, the existing code is a glorious combination of spaghetti logic, single-letter variables, and cryptic – and often untrue – comments.

We intend to do the rewrite in stages, moving into the world of C#, Visual Studio, and heaps of testing and refactoring tools as quickly as possible, then incrementing our way to a reliable, performant, maintainable solution. Our first step will be to translate the Matlab code, as verbatim as possible, into a very simple, stripped-down, Matlab-like set of C# classes; we’ll use a barrage of regression tests to make sure this is done correctly. Next, we’ll refactor that code from the loosely-typed ‘framework’ we’ve created into arrays whose types, dimensions, &c. are known; during this process we’ll familiarise ourselves with the code and add simple structure as we go. Finally, we’ll convert what we then have into idiomatic C#, moving away from the array-based, Matlab-style approach where appropriate and using the full power of the language and the data structures available to us.

One of the major struggles we’ve seen with the current code is tracking dimensions. With an array of simulation * year * ... * series * tranche being manipulated and combined with other arrays in sometimes unconventional ways, inside thousand-line methods, it becomes almost impossible to know whether the second dimension of arrPenIcrPost was year, simulation, or something else entirely.

With this in mind, I wrote a thin wrapper for C# arrays which, along with some helper classes, will allow ‘strongly-typed’ indexing: for example, an array of double<Sim,Year>[,], whose dimensions are indexed not by plain int but by custom types. The dimension types will be an integral part of the object’s type, will be enforced by the language, and will as a consequence be impossible to mix up.

Performance anxiety

My main concern with doing this was performance: I truly wasn’t sure how well the code I was writing would perform. I could certainly see ways in which I hoped the code would be optimised, such as inlining the underlying values of the structs used as indexes, but I could also see potential problems, mostly around the boxing of value types and the overhead of pointer-chasing.

To my delight (and slight surprise), the performance overhead could be made extremely small with relatively little work. The most important factors I found during development were using structs for the index types, avoiding inheritance and virtual methods in certain situations, and using class-level generics rather than method-level generics.

The first is fairly obvious and was the approach I took anyway; I only switched to classes to see what impact it would have. The second is again relatively straightforward (vtables and all that). The third was slightly more surprising, and I believe it has to do with the optimiser being able to avoid boxing value types accessed through an interface when the entire class is generic.
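The boxing concern can be made concrete with two ways of reading through an index. In the sketch below, BoxingLookup, GenericLookup and this cut-down Sim (which implements only the non-generic IIndex) are hypothetical. Passing a struct as the IIndex interface type boxes it, whereas a class that is generic in its index type lets the JIT generate code specialised to the struct and call Value on it directly. This is a sketch of the mechanism, not a benchmark.

```csharp
public interface IIndex
{
    int Value { get; }
}

// Hypothetical index type.
public struct Sim : IIndex
{
    private readonly int value;
    public Sim(int value) { this.value = value; }
    public int Value { get { return value; } }
}

// Hypothetical lookup taking its index as the interface type: each call
// with a Sim argument boxes the struct into a heap-allocated object.
public sealed class BoxingLookup
{
    private readonly double[] data;
    public BoxingLookup(double[] data) { this.data = data; }
    public double Get(IIndex index) { return data[index.Value]; }
}

// Hypothetical lookup that is generic at class level, in the style of
// StrongArray: for a struct TIndex the JIT compiles a specialised copy,
// and index.Value is called on the struct itself without boxing.
public sealed class GenericLookup<TIndex> where TIndex : struct, IIndex
{
    private readonly double[] data;
    public GenericLookup(double[] data) { this.data = data; }
    public double this[TIndex index] { get { return data[index.Value]; } }
}
```

Both lookups return the same values; the difference lies only in whether a Sim is copied to the heap on each call.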

There will be code

That’s rather a lot of text without so much as a hint of code. Next time I’ll be looking at the code that actually achieves all of the above!