Tuesday, March 23, 2010

Sneaky Equality Bugs

While trying to figure out why running a single test with Gallio causes ReSharper 5.0 to throw up the test failure in a message box (which it never did it before), I found this sneaky little bug.

public abstract class RemoteTask : IEquatable<RemoteTask>
{
    public string RunnerId { get; set; }

    public override bool Equals(RemoteTask other)
    {
        return Equals(other as RemoteTask);
    }

    public bool Equals(RemoteTask other)
    {
        return other != null && RunnerId == other.RunnerId;
    }

    public override int GetHashCode()
    {
        return RunnerId.GetHashCode();
    }
}

Can you spot the bug?

Magic Triad: Dictionary<TKey, TValue>, EqualityComparer<T> and IEquatable<T>

Whenever we construct a Dictionary<TKey, TValue> without providing an explicit IEqualityComparer<TKey> in the constructor (which is most of the time), the .Net framework automatically provides one for us.  Namely, it uses EqualityComparer<TKey>.Default.

EqualityComparer<T> is one of those really useful generic classes that is really easy to overlook.  Its job is to choose an appropriate implementation of IEqualityComparer<T> for any given type T.  Here are the cases it considers.

  • If T is type byte, then use a ByteEqualityComparer that does the obvious byte comparison.
  • If T is assignable to type IEquatable<T>, then use a GenericEqualityComparer<T> that will call IEquatable<T>.Equals method to determine equality.
  • If T is nullable and its value type TV is assignable to type IEquatable<TV>, then use a NullableEqualityComparer<T> that performs a null-check then calls IEquatable<TV>.Equals(TV) method to determine equality of non-null values.
  • If T is an enum whose underlying type is int, then use an EnumEqualityComparer<T> that compares enum values as integers.
  • Otherwise, use an ObjectEqualityComparer<T> that performs a null-check then compares objects using Object.Equals.

All of this magic is designed to do one thing: minimize the number of boxing conversions and casts required to compare objects.

The Bug

Did you notice that RemoteTask is abstract?  That tells you that it is intended to be subclassed.

To ensure correct equality and hash code semantics, a subclass must override at least the following two methods:

  • bool IEquatable<RemoteTask>.Equals(RemoteTask)
  • int Object.GetHashCode()

(Note: It is not necessary to override object.Equals(object) because all it does is delegate to IEquatable<RemoteTask>.Equals(RemoteTask).  Due to covariance of the equality relation, this code will work fine for all subclasses as long as they override IEquatable<RemoteTask>.Equals(RemoteTask) with their extra logic.)

Unfortunately RemoteTask’s implementation of IEquatable<RemoteTask>.Equals(RemoteTask) is not virtual!  Consequently subclasses cannot override it.  No matter what they do, they are stuck with an implementation that only considers the RunnerId field when determining equality.

As a matter of fact, all subclasses of RemoteTask override object.Equals(object) but not IEquatable<RemoteTask>.Equals(RemoteTask). That’s really bad because the methods will probably return different results!  Any code that uses IEquatable<RemoteTask> to compare RemoteTask instances will be broken.

This bug has been in ReSharper for a few releases now but it only just reared its ugly head due to an internal refactoring.

Previous Usage of RemoteTask.Equals (Hides the Bug)

Previously, ReSharper performed a lookup of tasks using linear search, something like this:

private List<KeyValuePair<RemoteTask, UnitTestElement> tasks;

public UnitTestElement GetTestElement(RemoteTask task)
{
    foreach (RemoteTask candidate in tasks)
    {
        if (object.Equals(task, candidate.Key))
            return candidate.Value;
    }
    return null;
}

Notice that this code calls the object.Equals(object, object) static helper function which performs a null check then determines equality using object.Equals(object). In this case, subclasses that just override object.Equals(object) will be fine.

Refactored Usage of RemoteTask.Equals (Triggers the Bug)

For efficiency, ReSharper now performs a lookup of tasks using a hash table, something like this:

private Dictionary<RemoteTask, UnitTestElement> tasks;

public UnitTestElement GetTestElement(RemoteTask task)
{
    UnitTestElement element;
    tasks.TryGetValue(task, out element);
    return element;
}

What’s new here is that under the hood, the Dictionary<RemoteTask, UnitTestElement> uses EqualityComparer<RemoteTask>.Default.  That in turn compares values using IEquatable<RemoteTask>.Equals(RemoteTask)

Oh yeah, that’s the method that subclasses can’t override…

The net result is that the hash table considers all values to be equal as long as they have the same RunnerId, regardless of their type or whatever other properties they may consider in their implementation of object.Equals(object).

For the most part, this is only a problem when there are hash code collisions.  If the hash codes are always distinct then it won’t matter to the hash table that the equality comparer it is using returns true more often than it should…

In other words, the fact that this code appears to work at all today is a matter of pure luck given that hash code collisions are fairly rare (for well designed hash functions, at least).

Moral

If you implement IEquatable<T> in a class that is intended to be subclasses, make sure to make it virtual!  Also make sure the subclasses override it as needed!

No comments: