Refactoring Cassowary

Now the work begins…
As I mentioned before, my aim is for this codebase to become:

clean, modern, idiomatic, tested

If Refactoring and Working Effectively with Legacy Code have taught me anything, the first step is to get tests in place so that we can “Cover and Modify” and not have to “Edit and Pray”. There are a number of ports of the Cassowary engine out there (Python, JavaScript) and I was able to get a decent amount of tests written very quickly by porting their existing test suites to C#. This left me with a lot of integration / end-to-end tests, but with relatively few unit tests – for those there was no free lunch.

Let’s get typing

After tests, the first item on the agenda was to improve the type-sanity of the code.
It made heavy use of ArrayList and Hashtable (no generics in sight) and had a lot of casting scattered throughout the code. Replacing the above with List<T>, Hashset<T>, and Dictionary<TKey,TValue> would make the code clearer, cleaner, and safer to work with. I’m a big fan of letting the compiler / type system do as much of the work as possible – it’s much less prone to idiocy than me.

foreach (var idiom …

The next change was another relatively straightforward one: be more idiomatic.
For example, in C# foreach loops are more idiomatic than for loops, and certainly more idiomatic that the direct use of IEnumerators that was to be found in the original code! As I mentioned before, this was a port from a Java implementation of Cassowary and you could tell – it felt like Java. Using bool Try...() methods rather than returning null; properties instead of Get... and Set... methods; all changes to make this codebase feel more like C# and to help future developers (including myself) fall into the pit of success.

Change is tough

And I’m not talking about the refactoring… I’m talking about mutability.
The original code had a lot of publicly mutable state, with comments warning that it shouldn’t be modified when x, y, or z – at least there were comments! Some of the very first steps to tackle this were simply making this internal/protected/private, but that was quickly followed up by removing as much inappropriate mutability as possible.

Currently I’m still wrangling with a few spots of unwanted mutability right in the heart of the solver that are turning out to be a little more stubborn than the rest.

How do you call it?

Where we’re at

Following on from last time, we have our MethodInfo object and now need to actually invoke it. However we want to do this in a way that is as clean, safe and performant as possible.

Keeping it simple

We could simply call invoke on the MethodInfo object:

var foo = (Foo)methodInfo.Invoke(fooFactory, new object[] { 42 });

Last week, we put a lot of effort into trying to make what we were doing as safe as possible, so to throw away all the safety on the final hurdle seems rather silly. No type checking on the method’s parameters, an ugly cast of the return value to (Foo), and a fairly serious performance overhead make this solution less than ideal.

A nicer way

We can use System.Linq.Expressions to compile ourselves a Func<int, Foo> representing the method we wish to invoke:

var parameter = Expression.Parameter(typeof (int));
var lambda = Expression.Lambda<Func<int, Foo>>(
	Expression.Call(
		Expression.Constant(fooFactory), 
		methodInfo, 
		parameter),
	parameter);
var getFoo = lambda.Compile();
var foo = getFoo(42);

And this works… almost.

For reasons that I won’t go into deeply here, System.Linq.Expressions doesn’t deal with inheritance; it covers only a subset of language features and inheritance is not one of them. If we’re using an interface or base class for fooFactory then this will fail when we try to construct to Func.

The work around for this restriction is extremely simple though, we make the normal compiler do the work with inheritance by wrapping the call with our own concrete method.

private Foo GetFoo<T>(int id)
	where T : FlagBase
{
	return fooFactory.GetFoo<T>(id);
}

Since we’re now compiling this call to a Func<int, Foo> we can do this ahead of time and store the result, failing fast if anything is wrong and allowing us to invoke it repeatedly while avoiding the performance penalties associated with reflection. This leaves our full code looking like:

public abstract class FlagBase
{
	private readonly Func<int, Foo> getFoo;
	private readonly IFooFactory fooFactory = new FooFactory();
	
	public FlagBase()
	{
		var runtimeType = this.GetType();
		Func<int, Foo> methodGroup = GetFoo<FlagBase>;
		var methodInfo = methodGroup.Method
			.GetGenericMethodDefinition()
			.MakeGenericMethod(runtimeType);
		
		var parameter = Expression.Parameter(typeof (int));
		var lambda = Expression.Lambda<Func<int, Foo>>(
			Expression.Call(
				Expression.Constant(this),
				methodInfo, 
				parameter),
			parameter);
		
		this.getFoo = lambda.Compile();
	}
	
	public void Frobnicate()
	{
		var foo = getFoo(42);
		// ...
	}
	
	private Foo GetFoo<T>(int id)
		where T : FlagBase
	{
		return fooFactory.GetFoo<T>(id);
	}
}

Conclusion

What we were trying to do wasn’t particularly pretty and I certainly wouldn’t want to have to do it regularly, but as I mentioned last week this was about striking a balance between doing this strictly with the appropriate information at compile time, and making our API easier and cleaner to consume. Something like this is very much down to personal preference / style but I’m happy with the compromises that have been made, and with a good suite of tests I’m confident that the behaviour of this ‘hack’ is safe and reliable.

Not enough information to be generic…

This problem arose in a library I was writing last week, there were a number of ways to resolve it, all of them involving some amount of pain / ugliness in the code. The only question was where that pain would be.

The problem

We want to call a generic method with the exact runtime type of the current class.

public class FooFactory
{
	public Foo GetFoo<T>(int id) 
		where T : FlagBase
	{
		// ...
	}
}
public abstract class FlagBase
{
	private void Frobnicate()
	{
		var foo = fooFactory.GetFoo<???>(42);
		// ...
	}
}

So, our goal is to get the runtime type of the class in place of the ???.

The solution

It’s relatively simple to imagine ways we can get the deriving class to provide this information.

Since we’re using inheritance, an abstract method fits the bill:

public abstract class FlagBase
{
	protected abstract Foo GetFoo(FooFactory fooFactory, int id);
}

But now we must override this method in each and every deriving class, must get the type parameter correct each time, and must assume the deriving class is actually calling fooFactory correctly.

Another alternative could be to go for a CRTP pattern:

public abstract class FlagBase {}

public abstract class FlagBase<T> : FlagBase
	where T : FlagBase<T>
{
	private void Frobnicate()
	{
		var foo = fooFactory.GetFoo<T>(42);
		// ...
	}
}

This keeps the deriving classes clean but there’s a huge caveat, and one that we can’t enforce. If we create a derived class FlagA : FlagBase<FlagA> we can then create FlagB : FlagA. Unfortunately, we now have FlagB : FlagBase<FlagA> and the type parameter is not the actual runtime type of the class! To make this work we would have to restrict the inheritance from FlagBase<T> to be only one level deep, something that we cannot do.

There are other patterns that we can attempt to apply, but none of them allow us to keep the consuming code out of the picture. In this scenario at least, requiring the caller to provide this information is a bad idea:

  • It’s remarkably easy to provide the wrong information.
  • Every extra requirement you make of a consumer is another opportunity for a mistake.
  • It simply makes the library harder to consume.

Sometimes forcing the consumer to do the extra work is just something we have to live with for the type system we’re in, but here there’s a cleaner alternative – for the consumer at least – and a way that we can ‘internalise’ the pain.

What’s left in the toolbag?

this.GetType() and reflection to the rescue!
Not a sentence you often want to hear…

Searching Google or StackOverflow for anything reflection-based invariably leads to suggestions of invoking the required object by name and, particularly if you’re not very familiar with C#, it can be easy to be led down the garden path. In my opinion (and experience) hard coded object names scream of refactoring pain in the future. While any mistake should be picked up in your test suite, you want to strive to make refactoring as easy as possible, and why have a strongly typed language and a compiler if you’re not going to make them do at least some of the leg work?

For what we’re trying to achieve here, I prefer to get hold of a ‘strongly typed’ method group then change the generic type of that reference. It avoids strings and if applied carefully can be done safely.

public abstract class FlagBase
{
	private void Frobnicate()
	{
		var runtimeType = this.GetType();
		Func<int, Foo> methodGroup = fooFactory.GetFoo<FlagBase>;
		var methodInfo = methodGroup.Method
			.GetGenericMethodDefinition()
			.MakeGenericMethod(runtimeType);
		
		//var foo = fooFactory.GetFoo<???>(42);
		// ...
	}
}

The main type safety (yes, I’m using that term fast-and-loose) mistake we can make here is supplying the type parameters incorrectly, either the wrong number or types that don’t satisfy any type constraints. Fortunately, having the reference to the method right next to us in code helps identify the former at a glance. Satisfying T : FlagBase is guaranteed here, as we’re doing this work inside FlagBase and the runtime type of the object clearly satisfies the constraint.

Now we’re one step further along: we have a reference to the correctly-generically-typed method we want to call. But how do we actually go about calling it?

Using what we’ve built

We could simply call invoke on the MethodInfo object.

var foo = (Foo)methodInfo.Invoke(fooFactory, new object[] { 42 });

This leads to questions of type safety of the method’s parameters and any return value, as well as concerns about performance. Also, there’s only so much safety that I’m willing to throw away to make a library easier to consume.

The answer lies in System.Linq.Expressions for help in compiling this reference to something safely callable. Unfortunately for us, that has its own pitfalls to overcome…