It all started with Matlab…

O, for a green field

We’re currently preparing to rewrite a reasonable amount of Matlab into C#. Starting afresh is always enjoyable but rarely sensible; however, one of the major reasons for us doing so is that we no longer have the skills to maintain the codebase as it stands. For a start, we no longer have any experienced Matlab developers; for another, the existing code is a glorious combination of spaghetti logic, single-letter variables, and cryptic – and often untrue – comments.

We intend to do the rewrite in stages, moving into the world of C#, Visual Studio, and heaps of testing and refactoring tools as quickly as possible, then incrementing our way to a reliable, performant, maintainable solution. Our first step will be to rewrite the Matlab code, as verbatim as possible, into a very simple, stripped-down, Matlab-like set of classes in C#; we’ll use a barrage of regression tests to make sure that is done correctly. Next, we’ll refactor that code from the loosely-typed ‘framework’ we’ve created into arrays where we know types, dimensions, and so on; during this process we’ll be familiarising ourselves with the code and adding simple structure as we go. Finally, we’ll convert what we now have into idiomatic C#, moving – where appropriate – away from the array-based Matlab-style approach and using the full power of the language and data structures available to us.

One of the major struggles that we’ve seen with the current code is tracking dimensions. With an array of simulation * year * ... * series * tranche that is being manipulated and combined with other arrays in sometimes unconventional ways and inside thousand-line methods, it becomes almost impossible to know whether the second dimension of arrPenIcrPost was year, simulation, or something else entirely.

With this in mind, I wrote a thin wrapper for C# arrays which, along with some helper classes, will allow for ‘strongly-typed’ indexing: for example, having something like an array of double<Sim,Year>[,] (shorthand rather than real C# syntax), so that the dimensions are referenced not simply by int but by custom types. The types of the dimensions will be an integral part of the type of the object, will be enforced by the language, and will as a consequence be impossible to mistake.
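
As a rough illustration of the idea (a minimal sketch, not the actual wrapper), the shape might be something like the following. The names IIndex, Sim, Year, and Array2 are invented for this example, and the design is only an assumption about how such a wrapper could look.

```csharp
using System;

// Hypothetical marker interface: every index type exposes its underlying int.
public interface IIndex
{
    int Value { get; }
}

// Hypothetical index types. Each wraps a plain int but is a distinct type,
// so a Year cannot be passed where a Sim is expected.
public readonly struct Sim : IIndex
{
    public Sim(int value) { Value = value; }
    public int Value { get; }
}

public readonly struct Year : IIndex
{
    public Year(int value) { Value = value; }
    public int Value { get; }
}

// Hypothetical thin wrapper over a two-dimensional array whose dimensions
// are part of its type: Array2<double, Sim, Year> is simulation-by-year.
public sealed class Array2<T, TRow, TCol>
    where TRow : struct, IIndex
    where TCol : struct, IIndex
{
    private readonly T[,] _data;

    public Array2(int rows, int cols)
    {
        _data = new T[rows, cols];
    }

    public T this[TRow row, TCol col]
    {
        get { return _data[row.Value, col.Value]; }
        set { _data[row.Value, col.Value] = value; }
    }
}

public static class Demo
{
    // Writes one cell and reads it back through the typed indexer.
    public static double RoundTrip()
    {
        var returns = new Array2<double, Sim, Year>(1000, 40);
        returns[new Sim(3), new Year(10)] = 0.05;

        // The next line would not compile, because the indexes are the
        // wrong way round for an Array2<double, Sim, Year>:
        // returns[new Year(10), new Sim(3)] = 0.05;

        return returns[new Sim(3), new Year(10)];
    }

    public static void Main()
    {
        Console.WriteLine(RoundTrip());
    }
}
```

The point of the sketch is the commented-out line: mixing up year and simulation becomes a compile error rather than a silent bug.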

Performance anxiety

My main concern with doing this was performance: I truly wasn’t sure how well the code I was writing would perform. I could certainly see ways in which I hoped the code would be optimised, such as inlining the underlying value of the structs being used as indexes, but I could also see potential problems, mostly around the boxing of value types and the overhead of pointer-chasing.

To my delight (and slight surprise) the performance overhead could be kept very small with relatively little work. The most important factors that I hit while developing were using structs for the index types, avoiding inheritance and virtual methods in certain situations, and using class-level generics rather than method-level generics.
The first is fairly obvious and was the approach I took anyway; I simply changed to classes to see what impact it would have. The second is again relatively straightforward: vtables and all that. The third was slightly more surprising, and I believe it has to do with the optimiser being able to avoid boxing value types when accessing them through an interface if the entire class is generic.
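
For anyone wanting to check this sort of thing themselves, a crude timing comparison can be sketched as below. This is not the benchmark I ran, just a minimal example under assumptions: the types IIndex, Sim, Year, and Array2 are hypothetical, the array sizes are arbitrary, and Stopwatch timings like these are only a rough guide (run a Release build, and expect noise). The sketch uses struct index types, a sealed non-virtual wrapper, and class-level type parameters.

```csharp
using System;
using System.Diagnostics;

// Hypothetical index interface and struct index types.
public interface IIndex { int Value { get; } }

public readonly struct Sim : IIndex
{
    public Sim(int value) { Value = value; }
    public int Value { get; }
}

public readonly struct Year : IIndex
{
    public Year(int value) { Value = value; }
    public int Value { get; }
}

// Hypothetical wrapper: sealed, no virtual members, generic at class level.
public sealed class Array2<T, TRow, TCol>
    where TRow : struct, IIndex
    where TCol : struct, IIndex
{
    private readonly T[,] _data;
    public Array2(int rows, int cols) { _data = new T[rows, cols]; }
    public int Rows => _data.GetLength(0);
    public int Cols => _data.GetLength(1);
    public T this[TRow row, TCol col]
    {
        get => _data[row.Value, col.Value];
        set => _data[row.Value, col.Value] = value;
    }
}

public static class Benchmark
{
    // Sums every cell of a plain array using int indexes.
    public static double SumRaw(double[,] data)
    {
        double total = 0;
        for (int s = 0; s < data.GetLength(0); s++)
            for (int y = 0; y < data.GetLength(1); y++)
                total += data[s, y];
        return total;
    }

    // Sums every cell of the wrapper using typed indexes.
    public static double SumTyped(Array2<double, Sim, Year> data)
    {
        double total = 0;
        for (int s = 0; s < data.Rows; s++)
            for (int y = 0; y < data.Cols; y++)
                total += data[new Sim(s), new Year(y)];
        return total;
    }

    public static void Main()
    {
        const int sims = 2000, years = 500; // arbitrary sizes for the sketch
        var raw = new double[sims, years];
        var typed = new Array2<double, Sim, Year>(sims, years);
        for (int s = 0; s < sims; s++)
            for (int y = 0; y < years; y++)
            {
                raw[s, y] = s + y;
                typed[new Sim(s), new Year(y)] = s + y;
            }

        // Call each method once first so JIT compilation is not timed.
        SumRaw(raw);
        SumTyped(typed);

        var watch = Stopwatch.StartNew();
        double rawTotal = SumRaw(raw);
        watch.Stop();
        long rawTicks = watch.ElapsedTicks;

        watch.Restart();
        double typedTotal = SumTyped(typed);
        watch.Stop();
        long typedTicks = watch.ElapsedTicks;

        Console.WriteLine("raw:   " + rawTicks + " ticks, total " + rawTotal);
        Console.WriteLine("typed: " + typedTicks + " ticks, total " + typedTotal);
    }
}
```

Swapping the structs for classes, or unsealing the wrapper and making the indexer virtual, and re-running is a quick way to see how much each of the factors above matters on a given runtime.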

There will be code

That’s rather a lot of text and precious little of the real code. Next time I’ll be looking at the code that actually achieves all of the above!
