Friday, January 29, 2016

Comparing two lists in C#

Here is some C# code for comparing two lists whose elements have different types, as long as each type offers some way (perhaps unique to that type) to extract a string for comparison. The code takes actions that say what to do with an element that is in one list but not the other, and what to do with matches. The argument list is a little long for my tastes, but I resolved the ambiguity by naming the parameters on invocation. I feel the parameter names are clear and easy to understand. One weakness of this code is that it assumes the get-tag functions return unique strings. In the case where I use it, this is a valid assumption, and the assumption is noted in the comment above the function. The code also requires reference types (the class constraints on A and B), because null marks a missing element. That isn't too burdensome, but it should be noted, and the code could easily be rewritten to remove this requirement.

I put up this code because I wanted to explain why I wrote it like this. The naive answer is reusability. I won't say that reusability isn't nice, but more importantly, the logic of comparing two lists has nothing to do with my domain. Embedding that logic in the context where I used it would violate the Single Responsibility Principle: instead of being about my domain, my code would be about comparison. With the comparison factored out, anyone reading the calling code can set the comparison aside and focus on the domain.

Another interesting question is, "Why strings, and why require class?" It seems that if you want more flexibility, you should not rely on either. That is a somewhat valid point, but the answer is that this is the code I needed. When I write code for the first time, I don't care about reuse. I want to create the code that fits my situation, and I don't want to add flexibility without value. When I find a piece of code I could almost use, I will refactor it at the time of reuse.

This code is also very unit testable. It is written in a functional style: everything it needs comes in through its arguments, so the logic lends itself to testing very well and nothing needs to be mocked. Testing small, isolated pieces is good. Testing giant chunks is bad.

// Note that this code assumes the getTag functions will return
// unique strings.
// Needs: using System; using System.Collections.Generic;
public static void CompareLists<A,B>(
    IEnumerable<A> listA, IEnumerable<B> listB,
    Func<A,string> getTagFromA, Func<B,string> getTagFromB,
    Action<A> inAnotB, Action<B> inBnotA,
    Action<A, B> inBoth) where A : class where B : class {
    // Maps each tag to the (A, B) pair carrying it; a null slot means
    // that list had no element with the tag.
    var hash = new Dictionary<string, Tuple<A, B>>();
    // Plain foreach loops are used because IEnumerable<T> has no
    // built-in ForEach method.
    foreach (A thingA in listA) {
        // Only A items have been added so far, so the B slot is empty.
        hash[getTagFromA(thingA)] = new Tuple<A, B>(thingA, null);
    }
    foreach (B thingB in listB) {
        string tag = getTagFromB(thingB);
        Tuple<A, B> existing;
        if (hash.TryGetValue(tag, out existing)) {
            hash[tag] = new Tuple<A, B>(existing.Item1, thingB);
        } else {
            hash[tag] = new Tuple<A, B>(null, thingB);
        }
    }
    foreach (Tuple<A, B> tuple in hash.Values) {
        bool hasA = tuple.Item1 != null;
        bool hasB = tuple.Item2 != null;
        if (hasA && hasB) {
            inBoth(tuple.Item1, tuple.Item2);
        } else if (hasA) {
            inAnotB(tuple.Item1);
        } else if (hasB) {
            inBnotA(tuple.Item2);
        }
    }
}
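
For illustration only, here is a minimal sketch of what the refactoring mentioned earlier (dropping the string and class requirements) might look like if reuse ever called for it. It is not the code above. ListComparer, Demo, TKey, getKeyFromA and getKeyFromB are hypothetical names introduced for this sketch. It still assumes the key functions return unique, non-null keys within each list, and it invokes the callbacks in a different order than the original (list A order first, then unmatched B items).

```csharp
using System;
using System.Collections.Generic;

public static class ListComparer
{
    // Hypothetical variant: any key type, no "class" constraint.
    // Assumes each key function returns unique, non-null keys per list.
    public static void CompareLists<A, B, TKey>(
        IEnumerable<A> listA, IEnumerable<B> listB,
        Func<A, TKey> getKeyFromA, Func<B, TKey> getKeyFromB,
        Action<A> inAnotB, Action<B> inBnotA,
        Action<A, B> inBoth)
    {
        // Index B by key. Presence in the dictionary replaces the null
        // checks, which is why value types are allowed here.
        var remainingB = new Dictionary<TKey, B>();
        foreach (B thingB in listB)
        {
            remainingB[getKeyFromB(thingB)] = thingB;
        }

        foreach (A thingA in listA)
        {
            TKey key = getKeyFromA(thingA);
            B match;
            if (remainingB.TryGetValue(key, out match))
            {
                inBoth(thingA, match);
                remainingB.Remove(key);
            }
            else
            {
                inAnotB(thingA);
            }
        }

        // Anything still in the dictionary was never matched by an A.
        foreach (B thingB in remainingB.Values)
        {
            inBnotA(thingB);
        }
    }
}

public static class Demo
{
    // Named arguments keep the long parameter list readable, and the
    // recorded log shows that nothing has to be mocked to test this.
    public static List<string> Run()
    {
        var log = new List<string>();
        var ids = new[] { 1, 2, 3 };          // value types are fine here
        var names = new[] { "2", "3", "4" };
        ListComparer.CompareLists(
            listA: ids, listB: names,
            getKeyFromA: id => id,
            getKeyFromB: name => int.Parse(name),
            inAnotB: id => log.Add("only A: " + id),
            inBnotA: name => log.Add("only B: " + name),
            inBoth: (id, name) => log.Add("both: " + id));
        return log;
    }
}
```

Calling Demo.Run() returns the entries "only A: 1", "both: 2", "both: 3", "only B: 4", in that order. Whether this extra flexibility is worth having is exactly the question to answer at the time of reuse, not before.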
