Understanding HashSet in C# with Examples

Posted by: Mahesh Sabnis, on 4/28/2017, in Category C#
Abstract: The HashSet collection type was introduced in .NET Framework 3.5, which shipped alongside C# 3.0. This article explores the features of HashSet and compares its performance with List.

In .NET, the collection framework plays a very important role.

The System.Collections namespace and the System.Collections.Generic namespace provide a set of collection classes which are used to store data in memory for a running application.

Note: HashSet is a collection of distinct values.

Data-centric applications, i.e. applications which perform a large number of read/write operations on a database, often need to store the state of the data in the local memory (cache) of the machine where the client application is executed. Caching is not mandatory, but it reduces the number of calls made to the database and hence improves the performance of the application.

The collection classes are helpful for maintaining the state of the data in memory, where it remains available for later read/write operations.

Introducing HashSet in C#

In the .NET framework, there are several classes available for performing these operations. Some of the classes are as follows:

  • List
  • Dictionary
  • HashSet
  • Queue

Sets

In C# programming, collections like ArrayList and List simply add values without checking for duplicates. To avoid storing duplicate data, .NET provides a collection type called a set. This is a collection type with distinct items.

There are two types of sets, SortedSet and HashSet. The SortedSet stores data in sorted order and also eliminates duplication.

HashSet vs SortedSet

Both classes store non-duplicate items. If you want performance and do not care that the items are stored unsorted, go for HashSet, whose Add, Contains and Remove operations take roughly constant time on average. If you want the items kept in sorted order and are ready to take a performance hit (SortedSet operations take logarithmic time), choose SortedSet.
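The difference can be seen with a small sketch. The class name SetComparisonDemo and the helper Describe are made up for this illustration, and an ordinal comparer is passed to the SortedSet only to keep the ordering independent of the current culture.

```csharp
using System;
using System.Collections.Generic;

static class SetComparisonDemo
{
    // Hypothetical helper: joins the items of any sequence in enumeration order.
    public static string Describe(IEnumerable<string> items)
    {
        return string.Join(", ", items);
    }

    static void Main()
    {
        string[] names = { "vikram", "mahesh", "saket", "mahesh" };

        // Both sets drop the second "mahesh".
        var hashSet = new HashSet<string>(names);
        var sortedSet = new SortedSet<string>(names, StringComparer.Ordinal);

        Console.WriteLine(hashSet.Count);   // 3
        Console.WriteLine(sortedSet.Count); // 3

        // SortedSet enumerates in comparer order; HashSet makes no ordering promise.
        Console.WriteLine(Describe(sortedSet)); // mahesh, saket, vikram
    }
}
```

The enumeration order of the HashSet is deliberately not printed here, since the class does not guarantee any particular order.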

This article is divided into six sections which are as follows:

Section 1: Features of HashSet
Section 2: Eliminating Duplicate data entry in HashSet
Section 3: Modify HashSet using UnionWith() Method
Section 4: Modify HashSet using ExceptWith() Method
Section 5: Modify HashSet using SymmetricExceptWith() Method
Section 6: Checking performance of operations like Add, Remove, Contains on HashSet and List

Let us get started.

Section 1: Features of HashSet

Here are some salient features of HashSet.

  • This class represents a set of values.
  • This class provides high-performance set operations.
  • It is a collection that contains no duplicate elements, and the elements stored in it are in no particular order.
  • Starting with the .NET Framework 4.6 release, HashSet implements the IReadOnlyCollection interface along with the ISet interface.
  • The HashSet class has no fixed maximum capacity. Its capacity grows automatically as elements are added to it.
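A minimal sketch of these features follows. The names HashSetFeaturesDemo and CountRejected are hypothetical and exist only for this example; the IReadOnlyCollection assignment assumes .NET Framework 4.6 or later, as stated in the list above.

```csharp
using System;
using System.Collections.Generic;

static class HashSetFeaturesDemo
{
    // Hypothetical helper: adds every item and reports how many were rejected as duplicates.
    public static int CountRejected(HashSet<string> set, IEnumerable<string> items)
    {
        int rejected = 0;
        foreach (string item in items)
        {
            // Add returns false (and leaves the set unchanged) when an equal item is already present.
            if (!set.Add(item))
            {
                rejected++;
            }
        }
        return rejected;
    }

    static void Main()
    {
        var set = new HashSet<string>();
        int rejected = CountRejected(set, new[] { "mahesh", "vikram", "mahesh" });

        Console.WriteLine(rejected);  // 1
        Console.WriteLine(set.Count); // 2

        // The same object can be exposed through the interfaces it implements.
        ISet<string> asSet = set;
        IReadOnlyCollection<string> readOnlyView = set; // assumes .NET Framework 4.6 or later
        Console.WriteLine(asSet.Contains("vikram")); // True
        Console.WriteLine(readOnlyView.Count);       // 2
    }
}
```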

Section 2: Eliminating Duplicates in C# HashSet

Step 1: Open Visual Studio and create a Console Application of name CS_Using_HashSet.

Step 2: Replace the contents of Program.cs with the following code:

using System;
using System.Collections.Generic;

namespace CS_Using_HashSet
{
    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine("Using HashSet");
            //1. Defining String Array (Note that the string "mahesh" is repeated) 
            string[] names = new string[] {
                "mahesh",
                "vikram",
                "mahesh",
                "mayur",
                "suprotim",
                "saket",
                "manish"
            };
            //2. Length of Array and Printing array
            Console.WriteLine("Length of Array " + names.Length);
            Console.WriteLine();
            Console.WriteLine("The Data in Array");
            foreach (var n in names)
            {
                Console.WriteLine(n);
            }

            Console.WriteLine();
            //3. Defining HashSet by passing an Array of string to it
            HashSet<string> hSet = new HashSet<string>(names);
            //4. Count of Elements in HashSet
            Console.WriteLine("Count of Data in HashSet " + hSet.Count);
            Console.WriteLine();
            //5. Printing Data in HashSet, this will eliminate duplication of "mahesh" 
            Console.WriteLine("Data in HashSet");
            foreach (var n in hSet)
            {
                Console.WriteLine(n);
            }     
            Console.ReadLine();
        }
    }
}

The above code does the following (Note: the comment numbers in the code match the following numbering):

1. Declares an array of strings called names. This array has a duplicate entry for the string “mahesh”.

2. Prints the length of the array and the data in it.

3. Defines a HashSet of type string. The object is initialized from the array, which adds the items of the array to the HashSet automatically.

4. As discussed in Section 1, the HashSet does not allow duplicate entries, hence the count of the data in the HashSet (6) is less than the length of the array (7).

5. Displays the data in the HashSet.

Run the application, and the following result will be displayed:

[Screenshot: hashset-duplicate-eliminate]

Section 3: Modify HashSet Using UnionWith() Method

The UnionWith() method modifies the HashSet so that it contains all of its own elements along with the elements of the other (IEnumerable) collection passed to it.

The following code demonstrates UnionWith().

Step 1: Add the following code in the Main() method of the project.

//1.
string[] names1 = new string[] {
    "mahesh","sabnis","manish","sharma","saket","karnik" 
};

string[] names2 = new string[] {
    "suprotim","agarwal","vikram","pendse","mahesh","mitkari"
};
//2.

HashSet<string> hSetN1 = new HashSet<string>(names1);
Console.WriteLine("Data in First HashSet");
foreach (var n in hSetN1)
{
    Console.WriteLine(n);
}
Console.WriteLine("_______________________________________________________________");
HashSet<string> hSetN2 = new HashSet<string>(names2);
Console.WriteLine("Data in Second HashSet");
foreach (var n in hSetN2)
{
    Console.WriteLine(n);
}
Console.WriteLine("________________________________________________________________");
//3.
Console.WriteLine("Data After Union");
hSetN1.UnionWith(hSetN2);
foreach (var n in hSetN1)
{
    Console.WriteLine(n);
}

The above code does the following (Note: the following numbering matches the comments in the code):

1. Declares the arrays names1 and names2, which contain string data.

2. Defines two HashSet objects, hSetN1 and hSetN2, based on names1 and names2 respectively, and prints the data from both.

3. Calls the UnionWith() method on hSetN1, passes the hSetN2 object to it, and displays all the data in hSetN1 after the union.
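Because UnionWith() changes the set it is called on, a copy is needed when the original set must stay intact. The sketch below illustrates this; UnionDemo and UnionCopy are hypothetical names introduced only for this example. It also shows that the argument can be any IEnumerable, here a List containing duplicates.

```csharp
using System;
using System.Collections.Generic;

static class UnionDemo
{
    // Hypothetical helper: returns the union as a new set and leaves both inputs untouched.
    public static HashSet<string> UnionCopy(IEnumerable<string> first, IEnumerable<string> second)
    {
        var result = new HashSet<string>(first); // copy, so 'first' is not modified
        result.UnionWith(second);                // accepts any IEnumerable<string>
        return result;
    }

    static void Main()
    {
        var set = new HashSet<string> { "mahesh", "sabnis" };
        var list = new List<string> { "mahesh", "vikram", "vikram" };

        HashSet<string> union = UnionCopy(set, list);

        Console.WriteLine(union.Count); // 3 (mahesh, sabnis, vikram)
        Console.WriteLine(set.Count);   // 2 (the original set is unchanged)
    }
}
```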

Run the application and the following result will be displayed:

[Screenshot: hashset-unionwith]

Section 4: Modify HashSet Using ExceptWith() Method

This method is used to modify the HashSet by removing all elements which match with elements in another collection.

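Step 1: Add the following code in the Main() method. The code uses hSetN2 declared in Section 3 and declares a new HashSet, hSetN3, from the names1 array (the same array that was used to create hSetN1). A fresh set is needed because the UnionWith() call in Section 3 has already modified hSetN1.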

Console.WriteLine();
Console.WriteLine("_________________________________");
Console.WriteLine("Data in HashSet before using Except With");
Console.WriteLine("_________________________________");
//creating a new HashSet hSetN3 from the names1 array
HashSet<string> hSetN3 = new HashSet<string>(names1);
foreach (var n in hSetN3)
{
    Console.WriteLine(n);
}
Console.WriteLine();
Console.WriteLine("_________________________________");
Console.WriteLine("Using Except With");
Console.WriteLine("_________________________________");
hSetN3.ExceptWith(hSetN2);
foreach (var n in hSetN3)
{
    Console.WriteLine(n);
}

After running the application, the following result will be displayed:

[Screenshot: except-with-result]

The above result shows that when the ExceptWith() method is called on the hSetN3 HashSet with hSetN2 passed as the parameter, the matching string ‘mahesh’ is removed from hSetN3 and the remaining strings are displayed.
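The same result can be checked in code rather than by reading the console output. This is a small sketch using the two arrays from Section 3; ExceptDemo and Difference are hypothetical names used only here.

```csharp
using System;
using System.Collections.Generic;

static class ExceptDemo
{
    // Hypothetical helper: returns the items of 'source' that do not occur in 'toRemove'.
    public static HashSet<string> Difference(IEnumerable<string> source, IEnumerable<string> toRemove)
    {
        var result = new HashSet<string>(source);
        result.ExceptWith(toRemove);
        return result;
    }

    static void Main()
    {
        string[] names1 = { "mahesh", "sabnis", "manish", "sharma", "saket", "karnik" };
        string[] names2 = { "suprotim", "agarwal", "vikram", "pendse", "mahesh", "mitkari" };

        HashSet<string> remaining = Difference(names1, names2);

        Console.WriteLine(remaining.Count);              // 5
        Console.WriteLine(remaining.Contains("mahesh")); // False

        // SetEquals compares contents only; order does not matter.
        string[] expected = { "sabnis", "manish", "sharma", "saket", "karnik" };
        Console.WriteLine(remaining.SetEquals(expected)); // True
    }
}
```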

 

Section 5: Modify HashSet Using SymmetricExceptWith() Method

This method modifies the HashSet object so that it contains only those elements which are present in exactly one of the two collections, but not in both.

All the matching elements will be removed.

Step 1: Add the following code in the Main() method. The code uses hSetN2 declared in Section 3 and declares a new HashSet hSetN4 using the names1 array.

HashSet<string> hSetN4 = new HashSet<string>(names1);
Console.WriteLine("_________________________________");
Console.WriteLine("Elements in HashSet before using SymmetricExceptWith");
Console.WriteLine("_________________________________");
Console.WriteLine("HashSet 1"); 
foreach (var n in hSetN4)
{
    Console.WriteLine(n);
}
Console.WriteLine("HashSet 2");
foreach (var n in hSetN2)
{
    Console.WriteLine(n);
}
Console.WriteLine("_________________________________");
Console.WriteLine("Using SymmetricExceptWith");
Console.WriteLine("_________________________________");
hSetN4.SymmetricExceptWith(hSetN2);
foreach (var n in hSetN4)
{
    Console.WriteLine(n);
}

The SymmetricExceptWith() method is called on the hSetN4 HashSet by passing the hSetN2 HashSet to it. Both these HashSets contain the string “mahesh”.

hSetN4 is merged with the values from hSetN2, and the matching entry is eliminated. After running the application, the result will be as follows:

[Screenshot: symmetric-except-result]
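One way to read SymmetricExceptWith() is as "the union of both sets minus the elements they have in common". The sketch below builds that by hand and compares it with the built-in method, using the arrays from Section 3. SymmetricExceptDemo and ManualSymmetricDifference are hypothetical names for this illustration only.

```csharp
using System;
using System.Collections.Generic;

static class SymmetricExceptDemo
{
    // Hypothetical helper: (a union b) minus (a intersect b), built from the other set methods.
    public static HashSet<string> ManualSymmetricDifference(IEnumerable<string> a, IEnumerable<string> b)
    {
        var union = new HashSet<string>(a);
        union.UnionWith(b);

        var common = new HashSet<string>(a);
        common.IntersectWith(b);

        union.ExceptWith(common);
        return union;
    }

    static void Main()
    {
        string[] names1 = { "mahesh", "sabnis", "manish", "sharma", "saket", "karnik" };
        string[] names2 = { "suprotim", "agarwal", "vikram", "pendse", "mahesh", "mitkari" };

        var builtIn = new HashSet<string>(names1);
        builtIn.SymmetricExceptWith(names2);

        HashSet<string> manual = ManualSymmetricDifference(names1, names2);

        Console.WriteLine(builtIn.Count);              // 10
        Console.WriteLine(builtIn.Contains("mahesh")); // False
        Console.WriteLine(builtIn.SetEquals(manual));  // True
    }
}
```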

Section 6: Checking performance of operations like Add, Remove, Contains on HashSet vs List.

All of the above sections have explained various methods of the HashSet.

But when a developer has to select the most suitable collection type based on performance, it is important to check which operations are performed frequently on the collection.

Generally, Add, Remove and Contains are the operations performed on in-memory collections. To compare List and HashSet for the Add, Remove and Contains operations, the following string array is used. (Note: You can use any other data)

static string[] names = new string[] {
    "Tejas", "Mahesh", "Ramesh", "Ram", "GundaRam", "Sabnis", "Leena",
    "Neema", "Sita" , "Tejas", "Mahesh", "Ramesh", "Ram",
    "GundaRam", "Sabnis", "Leena", "Neema", "Sita" ,
    "Tejas", "Mahesh", "Ramesh", "Ram", "GundaRam",
    "Sabnis", "Leena", "Neema", "Sita" , "Tejas",
    "Mahesh", "Ramesh", "Ram", "GundaRam", "Sabnis",
    "Leena", "Neema", "Sita",
    "Tejas", "Mahesh", "Ramesh", "Ram", "GundaRam", "Sabnis", ……            };

(The total number of strings is 550.)

Add the following method to Program.cs (add using System.Diagnostics; at the top of the file for the Stopwatch class):

static void Get_Add_Performance_HashSet_vs_List()
{
    
    Console.WriteLine("____________________________________");
    Console.WriteLine("List Performance while Adding Item");
    Console.WriteLine();
    List<string> lstNames = new List<string>();
    var s2 = Stopwatch.StartNew();
    foreach (string s in names)
    {
        lstNames.Add(s);
    }
    s2.Stop();

    Console.WriteLine(s2.Elapsed.TotalMilliseconds.ToString("0.000 ms"));
    Console.WriteLine();
    Console.WriteLine("Ends Here");
    Console.WriteLine();
    Console.WriteLine("____________________________________");
    Console.WriteLine("HashSet Performance while Adding Item");
    Console.WriteLine();
   
    HashSet<string> hStringNames = new HashSet<string>(StringComparer.Ordinal);
    var s1 = Stopwatch.StartNew();
    foreach (string s in names)
    {
        hStringNames.Add(s);
    }
    s1.Stop();

    Console.WriteLine(s1.Elapsed.TotalMilliseconds.ToString("0.000 ms"));
    Console.WriteLine();
    Console.WriteLine("Ends Here");
    Console.WriteLine("____________________________________");
    Console.WriteLine();
   
}

HashSet vs List – Add() method

The above method performs the Add() operation on the List and the HashSet by iterating over the strings in the names array.

The time taken is measured using the Stopwatch class from the System.Diagnostics namespace.
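The method above creates the HashSet with StringComparer.Ordinal. The comparer passed to the constructor decides which strings count as duplicates, as the following sketch suggests. ComparerDemo and DistinctCount are hypothetical names used only for this illustration.

```csharp
using System;
using System.Collections.Generic;

static class ComparerDemo
{
    // Hypothetical helper: counts the distinct items according to the given comparer.
    public static int DistinctCount(IEnumerable<string> items, IEqualityComparer<string> comparer)
    {
        return new HashSet<string>(items, comparer).Count;
    }

    static void Main()
    {
        string[] names = { "Mahesh", "mahesh", "MAHESH", "Sabnis" };

        // Ordinal comparison is case-sensitive, so all four strings are different.
        Console.WriteLine(DistinctCount(names, StringComparer.Ordinal));           // 4

        // Ignoring case, the three spellings of "mahesh" collapse into one entry.
        Console.WriteLine(DistinctCount(names, StringComparer.OrdinalIgnoreCase)); // 2
    }
}
```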

Run the application and the following result will be shown:

Please note that the following results show the timings on my machine; they may differ on your machine when the sample is executed.

[Screenshot: perf-adding-item]

The List<> takes less time to add strings than the HashSet.

The reason is that List.Add() simply appends the item to the list, whereas HashSet.Add() first has to compute the hash code of the item and check whether an equal item already exists, and skips the new item if it does. This extra work makes HashSet.Add() slower than List.Add().
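A single pass over a few hundred strings takes very little time, so one Stopwatch reading can be dominated by JIT compilation and other noise. A slightly more careful way to time the same operations is sketched below, under the assumption that a warm-up run plus an average over many repetitions is acceptable for this comparison. TimingDemo and AverageMilliseconds are hypothetical names, and the short array is a stand-in for the 550-string array of the article.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

static class TimingDemo
{
    // Hypothetical helper: runs the action once as a warm-up,
    // then returns the average time per run in milliseconds.
    public static double AverageMilliseconds(Action action, int iterations)
    {
        action(); // warm-up, so that JIT compilation is not part of the measurement

        var watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        watch.Stop();
        return watch.Elapsed.TotalMilliseconds / iterations;
    }

    static void Main()
    {
        // Stand-in data; the article's full array is not reproduced here.
        string[] names = { "Tejas", "Mahesh", "Ramesh", "Ram", "GundaRam", "Sabnis", "Leena", "Neema", "Sita" };

        double listMs = AverageMilliseconds(() =>
        {
            var list = new List<string>();
            foreach (string s in names)
            {
                list.Add(s);
            }
        }, 10000);

        double setMs = AverageMilliseconds(() =>
        {
            var set = new HashSet<string>(StringComparer.Ordinal);
            foreach (string s in names)
            {
                set.Add(s);
            }
        }, 10000);

        Console.WriteLine("List.Add:    " + listMs.ToString("0.000000") + " ms per pass");
        Console.WriteLine("HashSet.Add: " + setMs.ToString("0.000000") + " ms per pass");
    }
}
```

The numbers printed depend on the machine and runtime, so no expected output is given.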

HashSet vs List – Contains() method

Add the following method in Program.cs

static void Get_Contains_Performance_HashSet_vs_List()
{
 
    Console.WriteLine("____________________________________");
    Console.WriteLine("List Performance while checking Contains operation");
    Console.WriteLine();
    // Populate the list before timing so that Contains() searches real data
    List<string> lstNames = new List<string>(names);
    var s2 = Stopwatch.StartNew();
    foreach (string s in names)
    {
        lstNames.Contains(s);
    }
    s2.Stop();

    Console.WriteLine(s2.Elapsed.TotalMilliseconds.ToString("0.000 ms"));
    Console.WriteLine();
    Console.WriteLine("Ends Here");
    Console.WriteLine();
    Console.WriteLine("____________________________________");
    Console.WriteLine("HashSet Performance while checking Contains operation");
    Console.WriteLine();

    // Populate the set as well (duplicate strings in names are dropped)
    HashSet<string> hStringNames = new HashSet<string>(names, StringComparer.Ordinal);
    var s1 = Stopwatch.StartNew();
    foreach (string s in names)
    {
        hStringNames.Contains(s);
    }
    s1.Stop();

    Console.WriteLine(s1.Elapsed.TotalMilliseconds.ToString("0.000 ms"));
    Console.WriteLine();
    Console.WriteLine("Ends Here");
    Console.WriteLine("____________________________________");
    Console.WriteLine();

}

The above method checks whether the List and the HashSet contain the item passed as an input parameter to the Contains() method. Run the application; the result will be as shown in the following image:

[Screenshot: perf-contains-operation]

The result clearly shows that the HashSet provides a faster lookup for the element than the List.

This is because of the way the HashSet stores its items. The HashSet computes a hash code for each item (for a string, from the whole string) and uses that hash code to place the item in a bucket.

When a lookup occurs, the HashSet hashes the value being searched for, jumps directly to the matching bucket and compares only the few items stored there. List.Contains(), in contrast, has to walk the list from the first element until it finds a match, so its cost grows with the size of the list.
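The bucket idea can be illustrated with a toy class. This is only a sketch of the general technique and not how HashSet is actually implemented: ToyStringSet is a hypothetical class, it uses a fixed number of buckets, and it never resizes.

```csharp
using System;
using System.Collections.Generic;

// Toy illustration only; the real HashSet uses a more compact internal layout.
class ToyStringSet
{
    private readonly List<string>[] buckets;

    public ToyStringSet(int bucketCount)
    {
        buckets = new List<string>[bucketCount];
        for (int i = 0; i < bucketCount; i++)
        {
            buckets[i] = new List<string>();
        }
    }

    // Maps the hash code of the whole string to a bucket index.
    private int BucketIndex(string item)
    {
        int hash = StringComparer.Ordinal.GetHashCode(item);
        return (hash & 0x7FFFFFFF) % buckets.Length; // clear the sign bit before taking the remainder
    }

    public bool Add(string item)
    {
        List<string> bucket = buckets[BucketIndex(item)];
        if (bucket.Contains(item))
        {
            return false; // an equal item is already stored
        }
        bucket.Add(item);
        return true;
    }

    public bool Contains(string item)
    {
        // Only the one bucket selected by the hash code is searched.
        return buckets[BucketIndex(item)].Contains(item);
    }
}

static class ToyStringSetDemo
{
    static void Main()
    {
        var set = new ToyStringSet(16);
        Console.WriteLine(set.Add("mahesh"));      // True
        Console.WriteLine(set.Add("mahesh"));      // False
        Console.WriteLine(set.Contains("mahesh")); // True
        Console.WriteLine(set.Contains("vikram")); // False
    }
}
```

With enough buckets, each bucket holds only a handful of items, which is why the lookup cost stays roughly constant as the set grows.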

HashSet vs List – Remove() method

Add the following method in Program.cs

static void Get_Remove_Performance_HashSet_vs_List()
{
  
    Console.WriteLine("____________________________________");
    Console.WriteLine("List Performance while performing Remove item operation");
    Console.WriteLine();
    // Populate the list before timing so that Remove() has items to remove
    List<string> lstNames = new List<string>(names);
    var s2 = Stopwatch.StartNew();
    foreach (string s in names)
    {
        lstNames.Remove(s);
    }
    s2.Stop();

    Console.WriteLine(s2.Elapsed.TotalMilliseconds.ToString("0.000 ms"));
    Console.WriteLine();
    Console.WriteLine("Ends Here");
    Console.WriteLine();
    Console.WriteLine("____________________________________");
    Console.WriteLine("HashSet Performance while performing Remove item operation");
    Console.WriteLine();

    // Populate the set as well (duplicate strings in names are dropped)
    HashSet<string> hStringNames = new HashSet<string>(names, StringComparer.Ordinal);
    var s1 = Stopwatch.StartNew();
    foreach (string s in names)
    {
        hStringNames.Remove(s);
    }
    s1.Stop();

    Console.WriteLine(s1.Elapsed.TotalMilliseconds.ToString("0.000 ms"));
    Console.WriteLine();
    Console.WriteLine("Ends Here");
    Console.WriteLine("____________________________________");
    Console.WriteLine();

}

The above method performs the remove operation on the List and the HashSet using the Remove() method. Run the application; the result will be displayed as shown in the following image:

[Screenshot: perf-remove-operation]

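The above image clearly shows that the Remove operation of the HashSet is faster than that of the List. Remove works similarly to Contains: the HashSet locates the item through its hash code, whereas the List searches for the item linearly and then shifts the elements that follow it.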
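Apart from speed, the two Remove() methods also differ in what they leave behind when the data contains duplicates. The following short sketch shows this; RemoveDemo is a hypothetical class name used only here.

```csharp
using System;
using System.Collections.Generic;

static class RemoveDemo
{
    static void Main()
    {
        var list = new List<string> { "mahesh", "vikram", "mahesh" };
        var set = new HashSet<string>(list); // holds "mahesh" only once

        // List.Remove deletes only the first matching element.
        Console.WriteLine(list.Remove("mahesh"));   // True
        Console.WriteLine(list.Count);              // 2
        Console.WriteLine(list.Contains("mahesh")); // True (the second copy is still there)

        // HashSet.Remove deletes the single stored entry.
        Console.WriteLine(set.Remove("mahesh"));    // True
        Console.WriteLine(set.Contains("mahesh"));  // False
        Console.WriteLine(set.Remove("mahesh"));    // False (nothing left to remove)
    }
}
```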

Conclusion: HashSet in C# .NET is a high-performance collection store. Its advantage is that standard set operations like Union, Intersection, etc. are available as methods, which makes for an easy and maintainable coding experience. On the other hand, the List object preserves the order of its items and allows duplicates.

So based on the data handling requirements, one can make an informed decision when choosing an appropriate collection.

This article was technically reviewed by Damir Arh.

Download the entire source code of this article (Github).

Author
Mahesh Sabnis is a DotNetCurry author and Microsoft MVP having over 17 years of experience in IT education and development. He is a Microsoft Certified Trainer (MCT) since 2005 and has conducted various Corporate Training programs for .NET Technologies (all versions). Follow him on twitter @maheshdotnet