How to Choose the Right .NET Collection Class?

Posted by: Damir Arh , on 11/25/2018, in Category C#
Views: 33595
Abstract: The .NET framework Base Class Library contains many collection classes. This can make it difficult to decide when to use which. Grouping them together based on their properties can make the choice for a specific scenario, much easier.

.NET Collection Class - Generic Collections

Most of the collection classes are placed in the System.Collections and the System.Collections.Generic namespaces, but many of them are placed in other namespaces as well. Although, in most cases, only collections in the System.Collections.Generic namespace should be considered.

The collections in System.Collections are all non-generic, originating from the time before generics were added to C# with the release of .NET framework 2.0 and C# 2.0. Unless you are maintaining legacy code, you should avoid them because these don’t provide type safety and have the worst performance when compared to their generic counterparts when used with value types.

The collections in other namespaces are highly specialized for specific scenarios and are usually not applicable to code from other problem domains. E.g. the System.Dynamic.ExpandoObject class implements multiple collection interfaces but is primarily meant for implementing dynamic binding.

Even when only looking at generic collections in the System.Collections.Generic namespace, the choice can still seem overwhelming.

Fortunately, the collections can be easily grouped in a few categories based on the interfaces they implement. These determine which operations are supported by a collection and consequently in which scenarios can they be used.

The common interface for collections is the ICollection<T> interface. It inherits from the IEnumerable<T> interface which provides the means for iterating through a collection of items:

IEnumerable<int> enumerable = new List<int>() { 1, 2, 3, 4, 5 };

foreach(var item in enumerable)
{
    // process items
}

The ICollection<T> interface adds the Count property and methods for modifying the collection:

ICollection<int> collection = new List<int>() { 1, 2, 3, 4, 5 };

var count = collection.Count;
collection.Add(6);
collection.Remove(2);
collection.Clear();

The authors of the Base Class Library (BCL) believed that these suffice for implementing a simple collection. Three different interfaces extend this base interface in different ways to provide additional functionalities.

Editorial Note: Read Using Generics in C# to Improve Application Maintainability to make our software more resilient to data-related changes, thereby improving its maintainability.

1. Lists

The IList<T> interface describes collections with items which can be accessed by their index. For this purpose, it adds an indexer for getting and setting a value:

IList<int> list = new List<int> { 1, 2, 3, 4, 5 };

var item = list[2]; // item = 3
list[2] = 6;        // list = { 1, 2, 6, 4, 5 }

It also includes methods for inserting and removing a value at a specific index:

list.Insert(2, 0); // list = { 1, 2, 0, 6, 4, 5 }
list.RemoveAt(2);  // list = { 1, 2, 6, 4, 5 }

The most commonly used class implementing the IList<T> interface is List<T>. It will work great in most scenarios, but there are other implementations available for more specific requirements. For example, the SynchronizedCollection<T> has a built-in synchronization object which can be used to make it thread-safe.

2. Sets

ISet<T> interface describes a set, i.e. a collection of unique items which doesn’t guarantee to preserve their order. The Add method can still be used for adding individual items to the collection:

ISet<int> set = new HashSet<int> { 1, 2, 3, 4, 5 };

var added = set.Add(0);

However, it will only add the item to the collection if it’s not already present in it. The return value will indicate if the item was added.

Rest of the methods in the interface implement different standard set operations from set theory. The following methods modify the existing collection:

ISet<int> set = new HashSet<int> { 1, 2, 3, 4, 5 };

set.UnionWith(new[] { 5, 6 });              // set = { 1, 2, 3, 4, 5, 6 }
set.IntersectWith(new[] { 3, 4, 5, 6, 7 }); // set = { 3, 4, 5, 6 }
set.ExceptWith(new[] { 6, 7 });             // set = { 3, 4, 5 }
set.SymmetricExceptWith(new[] { 4, 5, 6 }); // set = { 3, 6 }

The others only perform tests on the collection and return a boolean value:

ISet<int> set = new HashSet<int> { 1, 2, 3, 4, 5 };

var isSubset = set.IsSubsetOf(new[] { 1, 2, 3, 4, 5 }); // = true
var isProperSubset = set.IsProperSubsetOf(new[] { 1, 2, 3, 4, 5 }); // = false
var isSuperset = set.IsSupersetOf(new[] { 1, 2, 3, 4, 5 }); // = true
var isProperSuperset = set.IsProperSupersetOf(new[] { 1, 2, 3, 4, 5 }); // = false
var equals = set.SetEquals(new[] { 1, 2, 3, 4, 5 }); // = true
var overlaps = set.Overlaps(new[] { 5, 6 }); // = true

The most basic implementation of ISet<T> is the HashSet<T> class. If you want the items in the set to be sorted, you can use SortedSet<T> instead.

3. Dictionaries

IDictionary<TKey, TValue> is the third interface extending the ICollection<T> interface. In contrast to the previous two, this one is for storing key-value pairs instead of standalone values, i.e. it uses KeyValuePair<TKey, TValue> as the generic parameter in ICollection<T>. Its members are designed accordingly. The indexer allows getting and setting the values based on a key instead of an index:

IDictionary<string, string> dictionary = new Dictionary<string, string>
{
    ["one"] = "ena",
    ["two"] = "dva",
    ["three"] = "tri",
    ["four"] = "štiri",
    ["five"] = "pet"
};
var value = dictionary["two"];
dictionary["zero"] = "nič" ;

The Add method can be used instead of the indexer to add a pair to the collection. Unlike the indexer it will throw an exception if the key is already in the collection.

dictionary.Add("zero", "nič");

Of course, there are methods for checking if a key is in the collection and for removing an existing pair based on its key value:

var contains = dictionary.ContainsKey("two");
var removed = dictionary.Remove("two");

The latter will return a value indicating whether or not the key was removed because it wasn’t there in the collection in the first place.

There are also properties available for retrieving separate collections of keys and values:

ICollection<int> keys = dictionary.Keys;
ICollection<string> values = dictionary.Values;

For scenarios with no special requirements, the Dictionary<TKey, TValue> class is the go-to implementation of the IDictionary<TKey, TValue> interface.

Two different implementations are available if keys need to be sorted (SortedDictionary<TKey, TValue> and SortedList<TKey, TValue>), each with its own advantages and disadvantages.

Many more implementations of the interface are available in the Base Class Library.

Queue and Stack

There are two collection classes worth mentioning which unlike the others so far, implement none of the specialized interface derived from the ICollection<T> interface.

The Queue<T> class implements a FIFO (first in, first out) collection.

Only a single item in it is directly accessible, i.e. the one that’s in it for the longest time. The methods for adding and removing an item have standard names for such a datatype:

var queue = new Queue<int>(new[] { 1, 2, 3, 4, 5 });

queue.Enqueue(6);
var dequeuedItem = queue.Dequeue();

There’s also the Peek method which returns the same item as the Dequeue method but leaves it in the collection:

var peekedItem = queue.Peek();

The Stack<T> class is similar to Queue<T>, but it implements a LIFO (last in, first out) collection. The single item that’s directly accessible in this collection is the one that was added the most recently. The methods names reflect this:

var stack = new Stack<int>(new[] { 1, 2, 3, 4, 5 });
stack.Push(6);
var poppedItem = stack.Pop();
var peekedItem = stack.Peek();

Thread Safety

The regular generic classes in the Base Class Library have one very important deficiency if you need to use them in a multi-threaded application: they are not thread-safe.

...or at least not entirely thread-safe.

While most of them support several concurrent readers, the reading operations are still not thread-safe as no concurrent write access is allowed.

Editorial Note: Read C# Sharding and Multithreading - Deep Dive as well as Concurrent Programming in .NET Core.

As soon as the collection has to be modified, any access to it from multiple threads must be synchronized.

The simplest approach to implementing such synchronization involves using the lock statement with a common synchronization object:

public class DictionaryWithLock<TKey, TValue>
{
    private readonly object syncObject = new object();
    private readonly Dictionary<TKey, TValue> dictionary;

    public DictionaryWithLock()
    {
        dictionary = new Dictionary<TKey, TValue>();
    }

    public TValue this[TKey key]
    {
        get
        {
            lock (syncObject)
            {
                return dictionary[key];
            }
        }
        set
        {
            lock (syncObject)
            {
                dictionary[key] = value;
            }
        }
    }

    public void Add(TKey key, TValue value)
    {
        lock (syncObject)
        {
            dictionary.Add(key, value);
        }
    }
}

Although this approach would work, it is far from optimal.

All access to the inner dictionary is fully serialized, i.e. the next operation can only start when the previous one is completed. Since the collection allows multiple concurrent readers, it can significantly improve performance if it takes that into account and only restricts concurrent access for writing operations.

Fortunately, there’s no need for implementing this functionality from scratch.

The Base Class Library comes with the ReaderWriterLockSlim class which can be used to implement this specific functionality in a relatively simple manner:

public class DictionaryWithReaderWriterLock<TKey, TValue>
{
    private readonly ReaderWriterLockSlim dictionaryLock = new ReaderWriterLockSlim();
    private readonly Dictionary<TKey, TValue> dictionary;

    public DictionaryWithReaderWriterLock()
    {
        dictionary = new Dictionary<TKey, TValue>();
    }

    public TValue this[TKey key]
    {
        get
        {
            dictionaryLock.EnterReadLock();
            try
            {
                return dictionary[key];
            }
            finally
            {
                dictionaryLock.ExitReadLock();
            }
        }
        set
        {
            dictionaryLock.EnterWriteLock();
            try
            {
                dictionary[key] = value;
            }
            finally
            {
                dictionaryLock.ExitWriteLock();
            }
        }
    }

    public void Add(TKey key, TValue value)
    {
        dictionaryLock.EnterWriteLock();
        try
        {
            dictionary.Add(key, value);
        }
        finally
        {
            dictionaryLock.ExitWriteLock();
        }
    }
}

There are two different lock methods used, depending on the type of operation being synchronized: reading or writing.

The class will allow any number of concurrent read operations. On the other hand, the write lock will be exclusive and won’t allow any other read or write access at the same time.

Concurrent Collections

Even with the use of synchronization primitives provided by C# and the .NET framework, writing production-quality synchronization code is no small feat. Both of my wrapper classes implement synchronization for only a small subset of operations. You would want to implement at least the full IDictionary<TKey, TValue> interface if not all the members of the wrapped Dictionary<TKey, TValue> class.

That’s exactly what the concurrent collections in the System.Collections.Concurrent namespace do. They provide thread-safe implementations of collection interfaces.

The exceptions are the ConcurrentQueue<T> and the ConcurrentStack<T> classes which provide independent implementations similar to their non-concurrent counterparts Queue<T> and Stack<T> because there are no suitable interfaces in the .NET framework to implement.

The concurrent collection classes can be safely used in multi-threaded applications. They even have extra members in addition to those from the implemented interfaces which can prove useful in multi-threaded scenarios.

Still, there are minor caveats when using these collections. For example, the ConcurrentDictionary<TKey, TValue> class includes methods which are not fully atomic. The overloads of GetOrAdd and AddOrUpdate methods which accept delegates as parameters will invoke these delegates outside the synchronization locks.

Let’s inspect the following line of code to understand the implications of this:

var resource = concurrentDictionary.GetOrAdd(newKey, key => ExpensiveResource.Create(key));

Such method calls are common when the dictionary is used as a cache for instances of classes which might be expensive to create but can safely be used from multiple threads.

Without knowing the details of how the ConcurrentDictionary<TKey, TValue> is implemented, one would assume that this line of code will either return an instance from the cache or create a single new instance using the provided delegate when there’s no matching instance yet in the dictionary. It should then return that instance on all subsequent requests.

However, since the delegate is invoked outside the synchronization lock, it could be invoked from multiple different threads if it is not yet in the dictionary when this line of code is reached.

Although only one of the created instances will be stored in the dictionary in the end, the fact that multiple instances were created could be an issue. At the very least, it will affect performance if the creation takes a long time. If the factory method should only ever be called once for a specific key, you will need to write additional synchronization code yourself.

It is important to always thoroughly read the documentation for all concurrent collection classes to be aware of such implementation details.

Immutable Collections

Immutable collections aren’t included in the Base Class Library. To use them, the System.Collections.Immutable NuGet package must be installed in the project.

They take a different approach to making collections thread-safe. Instead of using synchronization locks as concurrent collections do, immutable collections can’t be changed after they are created. This automatically makes them safe to use in multi-threaded scenarios since there’s no way for another thread to modify them and make the state inconsistent.

This fundamental design decision affects the API of immutable collection classes. They don’t even have public constructors. There are two other ways to create a new instance though:

· A regular collection can be converted into an immutable one using an extension method:

var immutableList = new[] { 1, 2, 3, 4, 5 }.ToImmutableList();

· An empty collection which is exposed as a static property can be modified by adding new items to it:

var immutableList = ImmutableList<int>.Empty.AddRange(new[] { 1, 2, 3, 4, 5 });

All operations which would normally modify the collection return a new instance instead. It’s important to remember that the returned value must be used from there on instead of the original instance:

immutableList = immutableList.Add(6);

That each method returns a new instance of the same class makes it easy to chain multiple method calls:

immutableList = immutableList
    .Add(6)
    .Add(7)
    .Add(8);

However, this approach should be avoided whenever possible because each method call in such a chain creates a new instance of the collection. Although the immutable collections are implemented in a way to reuse as much of the original collection as possible when creating a new instance, some memory allocations are still required. This means more work for the garbage collector.

The best way to minimize this problem is to use methods which can perform the required modifications in a single call. The above chain of method calls could for example be replaced with the following single method call:

immutableList = immutableList.AddRange(new[] { 6, 7, 8 });

Even more complex operations can be performed with a single method call. By following the naive approach from mutable collections, the following block of code would be used to remove all even numbers from a list:

for (var i = immutableList.Count - 1; i >= 0; i--)
{
    if (immutableList[i] % 2 == 0)
    {
        immutableList = immutableList.RemoveAt(i);
    }
}

To improve performance, the following equivalent line of code should be used instead:

immutableList = immutableList.RemoveAll(n => n % 2 == 0);

However, specialized methods are not available for all types of complex modifications. For example, there’s no single method available to both add and remove specific items as below:

immutableList = immutableList
    .Add(6)
    .Remove(2);

In such cases a builder can be used instead:

var builder = immutableList.ToBuilder();
builder.Add(6);
builder.Remove(2);
immutableList = builder.ToImmutable();

The ToBuilder method will create a builder for an immutable collection which will implement the interface of the corresponding mutable collection. Its internal memory structure will still match the one of the immutable collection, but the operations will modify the same instance instead of always creating a new one. Only when calling the ToImmutable method, will the instance be made immutable again very efficiently. This will reduce the amount of work for the garbage collector as much as possible.

So, when should immutable collections be used instead of regular or concurrent mutable ones?

A typical use case are multi-threaded applications, especially when a thread need not have access to the latest state of the collection and can use a potentially stale immutable instance which was originally passed to it. When that’s the case, immutable collections might offer better performance despite the additional work for the garbage collector because there are no synchronization locks required.

If an application needs to undo the operations on collections, it might make sense to use immutable collections even in single-threaded scenarios. Snapshots of previous states are a side product of immutable collections and require none additional resources to create. They can, for example, be stored in a stack to undo the last change:

snapshots.Push(immutableList);
immutableList = immutableList.Add(6);

immutableList = snapshots.Pop(); // undo: revert list to previous state

To achieve the same functionality with mutable collections, copies must be created before each modification:

snapshots.Push(list.ToList());
list.Add(6);

list = snapshots.Pop();

Not only is creating copies time consuming, the copies also require more memory than their immutable counterparts. The copies of mutable collections don’t share any memory between them, unlike immutable collections which often share large parts when they were created through modifications.

Conclusion:

When choosing a collection to use in your code, always start by thinking through which operations you will need to perform on that collection. Based on that, you can select the most appropriate collection interface.

Unless you have any other special requirements, go with an implementation from the System.Collections.Generic namespace. If you’re writing a multithreaded application and will need to modify the collection from multiple threads, choose the concurrent implementation of the same interface instead. Consider immutable collections if their behavior and performance match your requirements best.

This article has been editorially reviewed by Suprotim Agarwal.

Absolutely Awesome Book on C# and .NET

C# and .NET have been around for a very long time, but their constant growth means there’s always more to learn.

We at DotNetCurry are very excited to announce The Absolutely Awesome Book on C# and .NET. This is a 500 pages concise technical eBook available in PDF, ePub (iPad), and Mobi (Kindle).

Organized around concepts, this Book aims to provide a concise, yet solid foundation in C# and .NET, covering C# 6.0, C# 7.0 and .NET Core, with chapters on the latest .NET Core 3.0, .NET Standard and C# 8.0 (final release) too. Use these concepts to deepen your existing knowledge of C# and .NET, to have a solid grasp of the latest in C# and .NET OR to crack your next .NET Interview.

Click here to Explore the Table of Contents or Download Sample Chapters!

What Others Are Reading!
Was this article worth reading? Share it with fellow developers too. Thanks!
Share on LinkedIn
Share on Google+

Author
Damir Arh has many years of experience with software development and maintenance; from complex enterprise software projects to modern consumer-oriented mobile applications. Although he has worked with a wide spectrum of different languages, his favorite language remains C#. In his drive towards better development processes, he is a proponent of Test-driven development, Continuous Integration, and Continuous Deployment. He shares his knowledge by speaking at local user groups and conferences, blogging, and writing articles. He is an awarded Microsoft MVP for .NET since 2012.


Page copy protected against web site content infringement 	by Copyscape




Feedback - Leave us some adulation, criticism and everything in between!