Using Generics in C# to Improve Application Maintainability

Posted by: Yacoub Massad, on 8/5/2017, in Category Patterns & Practices
Abstract: This article describes how we can use Generics in C# to make our software more resilient to data-related changes, thereby improving its maintainability.

In a previous article, Data and Encapsulation in complex C# applications, I talked about two ways to treat runtime data. More specifically, I talked about data encapsulation and behavior-only encapsulation.

In a nutshell, encapsulating data means that we have units (e.g. classes) that contain data in a manner that:

- prevents consumers from accessing it directly (e.g. by putting it in private fields),

- allows consumers to use public methods that interact with it in an indirect way.

[Figure: data encapsulation]

A different approach is to separate data and behavior into separate units.

Data units contain only data that is accessible directly, and behavior units contain only behavior. Such behavior units are similar to functions and procedures in functional and procedural programming, respectively.

Behavior encapsulation means that consumers of behavior units consume them indirectly via interfaces and therefore they don’t know and don’t care how they are implemented.

[Figure: behavior-only encapsulation]

When we encapsulate data, we can change the internal representation of the data without affecting the consumers of the unit encapsulating the data.

On the other hand, when we don’t encapsulate data, the behavior units will access data (received as method parameters for example) directly and therefore they have a hard dependency on the internal representation of the data.

Therefore, when doing behavior-only encapsulation, it is harder to change the internal representation of data.
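To make the contrast concrete, here is a minimal sketch of the two styles side by side. All names in it (EncapsulatedDocument, DocumentData, IWordCounter, WordCounter) are hypothetical and chosen only for illustration; they do not come from the previous article.

```csharp
using System;

// Style 1: data encapsulation. The content is private and consumers reach it
// only through public methods, so its representation can change freely.
public sealed class EncapsulatedDocument
{
    private readonly string content; // could become string[] without affecting consumers

    public EncapsulatedDocument(string content) => this.content = content;

    public int CountWords() =>
        content.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries).Length;
}

// Style 2: behavior-only encapsulation. The data unit exposes its data directly.
public sealed class DocumentData
{
    public string Content { get; }

    public DocumentData(string content) => Content = content;
}

// The behavior unit is consumed through an interface.
public interface IWordCounter
{
    int CountWords(DocumentData document);
}

// This implementation reads Content directly, so it has a hard dependency
// on Content being a single string.
public sealed class WordCounter : IWordCounter
{
    public int CountWords(DocumentData document) =>
        document.Content.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries).Length;
}
```

If Content in DocumentData changed type, WordCounter (and every other behavior unit that reads it) would have to change, whereas a change to the private field in EncapsulatedDocument stays inside that class.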

In this article, I am going to discuss how we can use generics in C# to make it easier to handle data-related change requests in software applications when doing behavior-only encapsulation.

Note: this article is not an introduction to generics in C#. I assume that the reader is already familiar with generics. For more information about generics, consider this guide from Microsoft.

Data-related software changes

When maintaining an application, the development team receives many requirements that involve changing behavior with little or no changes to data structure/representation.

For example, one requirement might ask that we save documents to the database instead of the file system. Such requirements can be mostly met by creating new behavior units that contain the new behavior and then composing them correctly in the Composition Root.
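As a rough sketch of such a behavior-only change, assume a hypothetical ITextDocumentStore interface with an existing file system implementation. The two store classes below are in-memory stand-ins (they perform no real I/O), and TextDocumentArchiver is an invented consumer used only for this example.

```csharp
using System;
using System.Collections.Generic;

public sealed class TextDocument
{
    public string Identifier { get; }
    public string Content { get; }

    public TextDocument(string identifier, string content)
    {
        Identifier = identifier;
        Content = content;
    }
}

public interface ITextDocumentStore
{
    void StoreDocument(TextDocument document);
}

// Stand-in for the existing behavior: records a file name instead of writing a file.
public sealed class FileSystemTextDocumentStore : ITextDocumentStore
{
    public List<string> WrittenPaths { get; } = new List<string>();

    public void StoreDocument(TextDocument document) =>
        WrittenPaths.Add(document.Identifier + ".txt");
}

// The new behavior unit: a dictionary stands in for a database table.
public sealed class DatabaseTextDocumentStore : ITextDocumentStore
{
    public Dictionary<string, string> Rows { get; } = new Dictionary<string, string>();

    public void StoreDocument(TextDocument document) =>
        Rows[document.Identifier] = document.Content;
}

// A consumer that knows only the interface, so it does not change at all.
public sealed class TextDocumentArchiver
{
    private readonly ITextDocumentStore store;

    public TextDocumentArchiver(ITextDocumentStore store) => this.store = store;

    public void Archive(TextDocument document) => store.StoreDocument(document);
}
```

Under these assumptions, the only existing code that changes is the line in the Composition Root that picks the store: new TextDocumentArchiver(new DatabaseTextDocumentStore()) instead of new TextDocumentArchiver(new FileSystemTextDocumentStore()).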

On other occasions, however, requirements require our data objects to change or to support new data objects.

For example, in an application that processes (e.g. translates, prints, etc.) plain text documents, we might receive a requirement to support PDF documents or documents in Microsoft Word format.

Consider the following interface:

public interface ITextDocumentTranslator
{
    TextDocument Translate(TextDocument document, Language language);
}

public class TextDocument
{
    public string Identifier { get; }
    public string Content { get; }

    public TextDocument(string identifier, string content)
    {
        Identifier = identifier;
        Content = content;
    }
}

The ITextDocumentTranslator interface abstracts the process of translating a plain text document to a different language. The TextDocument class is a simple data object that represents a text document.

The ITextDocumentTranslator interface and its implementations have a hard dependency on TextDocument. They access the internals of this class (e.g. the Content property) freely. In a large application, we can expect to find tens of classes/interfaces that have a hard dependency on TextDocument.

In order to support PDF documents, we can create new interfaces and classes that deal with a PdfDocument data object. For instance, we can create an IPdfDocumentTranslator interface that looks like this:

public interface IPdfDocumentTranslator
{
    PdfDocument Translate(PdfDocument document, Language language);
}

The implementation of this interface would know how to translate a PdfDocument to a different language.

If the application originally had fifty classes and ten interfaces to deal with plain text documents, we would expect to have a similar number of classes and interfaces for also dealing with PDF documents.

A portion of the code in the new fifty (or so) classes is going to be very specific to PDF documents. After all, PDF documents have different features than plain text documents and processing them is going to be different than processing plain text documents.

On the other hand, another portion of the code in the new classes is going to be very similar to the old code. This is true because both sets of code are about processing documents.

Consider the following example:

public class TextDocumentProcessor : ITextDocumentProcessor
{
    private readonly ITextDocumentTranslator translator;
    private readonly ITextDocumentStore store;

    //..

    public void Process(TextDocument document)
    {
        var englishVersion = translator.Translate(document, Language.English);

        store.StoreDocument(englishVersion);
    }
}

This class is an orchestrator.

It receives a text document, invokes a translator dependency to translate it to English and then it invokes a store dependency to store the English version of the document. All it does is orchestration of operations; it doesn’t know how the document is translated or where it is going to be stored.

This orchestrator is data-independent.

Although it references the “document” parameter, it only uses it to pass it to the other dependencies. It has no dependency on the internals of the TextDocument data object.

If we create a similar class for PDF documents, its logic is going to be exactly the same as this one; only the type names differ. This is clearly duplicate code:

public class PdfDocumentProcessor : IPdfDocumentProcessor
{
    private readonly IPdfDocumentTranslator translator;
    private readonly IPdfDocumentStore store;

    //..

    public void Process(PdfDocument document)
    {
        var englishVersion = translator.Translate(document, Language.English);

        store.StoreDocument(englishVersion);
    }
}

Using Generics to remove duplication

Instead of creating duplicate code, we can refactor the original code to deal with a generic document type.

We do this with both classes and interfaces like this:

public interface IDocumentProcessor<TDocument>
{
    void Process(TDocument document);
}

public interface IDocumentTranslator<TDocument>
{
    TDocument Translate(TDocument document, Language language);
}

public interface IDocumentStore<TDocument>
{
    void StoreDocument(TDocument document);
}

public class DocumentProcessor<TDocument> : IDocumentProcessor<TDocument>
{
    private readonly IDocumentTranslator<TDocument> translator;
    private readonly IDocumentStore<TDocument> store;

    //..
    
    public void Process(TDocument document)
    {
        var englishVersion = translator.Translate(document, Language.English);

        store.StoreDocument(englishVersion);
    }
}

For this particular example, we have reduced the number of interfaces from six to three.

Also, now we have a single generic DocumentProcessor<TDocument> class instead of two classes. Because the orchestration is data-independent, it does not care about the internals of the document object and thus we were able to convert it into a generic class where the document type is generic.
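One way to convince ourselves that the orchestrator is data-independent is to run it with types that have nothing to do with documents. The sketch below fills in the constructor that the listing above elides (its parameter order is an assumption), defines a Language enum whose members are assumed, and adds two hypothetical helpers, DelegateTranslator and InMemoryStore, that exist only for this demonstration.

```csharp
using System;
using System.Collections.Generic;

public enum Language { English, French } // assumed; the article does not show this type

public interface IDocumentTranslator<TDocument>
{
    TDocument Translate(TDocument document, Language language);
}

public interface IDocumentStore<TDocument>
{
    void StoreDocument(TDocument document);
}

public sealed class DocumentProcessor<TDocument>
{
    private readonly IDocumentTranslator<TDocument> translator;
    private readonly IDocumentStore<TDocument> store;

    // Assumed constructor shape: translator first, then store.
    public DocumentProcessor(
        IDocumentTranslator<TDocument> translator,
        IDocumentStore<TDocument> store)
    {
        this.translator = translator;
        this.store = store;
    }

    public void Process(TDocument document)
    {
        var englishVersion = translator.Translate(document, Language.English);

        store.StoreDocument(englishVersion);
    }
}

// Hypothetical helper: wraps a function so any TDocument can be "translated".
public sealed class DelegateTranslator<TDocument> : IDocumentTranslator<TDocument>
{
    private readonly Func<TDocument, Language, TDocument> translate;

    public DelegateTranslator(Func<TDocument, Language, TDocument> translate) =>
        this.translate = translate;

    public TDocument Translate(TDocument document, Language language) =>
        translate(document, language);
}

// Hypothetical helper: keeps stored documents in a list.
public sealed class InMemoryStore<TDocument> : IDocumentStore<TDocument>
{
    public List<TDocument> Stored { get; } = new List<TDocument>();

    public void StoreDocument(TDocument document) => Stored.Add(document);
}
```

With these helpers, DocumentProcessor<string> and DocumentProcessor<int> both compile and behave the same way, which is only possible because Process never looks inside the document it is given.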

For the plain text part of the application, we can construct an instance of DocumentProcessor<TextDocument> like this:

var textDocumentProcessor =
    new DocumentProcessor<TextDocument>(
        new TextDocumentTranslator(...),
        new TextDocumentStore(...));

And for the PDF part of the application, we can construct an instance of DocumentProcessor<PdfDocument> like this:

var pdfDocumentProcessor =
    new DocumentProcessor<PdfDocument>(
        new PdfDocumentTranslator(...),
        new PdfDocumentStore(...));

The TextDocumentTranslator, TextDocumentStore, PdfDocumentTranslator, and PdfDocumentStore classes are non-generic classes that implement the generic interfaces for a specific TDocument.

Here is one example:

public class TextDocumentStore : IDocumentStore<TextDocument>
{
    private readonly string connectionString;

    //..
    public void StoreDocument(TextDocument document)
    {
        using (var context = new DataContext(connectionString))
        {
            context.Documents.Add(
                new StoredDocument
                {
                    Type = StoredDocumentType.PlainText,
                    Identifier = document.Identifier,
                    BinaryContent = Encoding.UTF8.GetBytes(document.Content),
                });

            context.SaveChanges();
        }
    }
}

This class is non-generic, and it implements the IDocumentStore<TextDocument> interface. It deals specifically with the TextDocument class. It accesses the Identifier and Content properties of the TextDocument data object.

Therefore, the class is data-dependent.

This class uses Entity Framework to store the document in the database. The DataContext class is derived from Entity Framework’s DbContext class to enable access to the database.

See this article from Microsoft for more details.

Here is another one:

public class PdfDocumentStore : IDocumentStore<PdfDocument>
{
    private readonly string connectionString;

    //..
    public void StoreDocument(PdfDocument document)
    {
        using (var context = new DataContext(connectionString))
        {
            context.Documents.Add(
                new StoredDocument
                {
                    Type = StoredDocumentType.Pdf,
                    Identifier = document.Identifier,
                    BinaryContent = SerializeDocument(document)
                });

            context.SaveChanges();
        }
    }

    private byte[] SerializeDocument(PdfDocument document)
    {
        //...
    }
}

This class is also non-generic, and it implements the IDocumentStore<PdfDocument> interface.

Notice how it accesses specific properties from the PdfDocument data object. The SerializeDocument method accesses the internals of the PdfDocument object and somehow serializes the content of the PDF document into a byte array to store it in the database.

The exact details of how this is done are not relevant to the subject of this article.

Maximizing data-independency

We have seen in the previous sections that it is data-independency that allows us to make behavior units generic and therefore remove duplication.

It follows that the more data-independent behavior units we have, the more code duplication we can remove.

But how can we do this?

Let’s consider the TextDocumentStore and PdfDocumentStore units. These two units are data-dependent, each on its specific document type.

These two classes contain some shared logic. For example,

- both create an instance of the DataContext class passing a connection string,

- both create a new instance of the StoredDocument class and add it to the Documents DbSet,

- both invoke SaveChanges to commit the changes to the database, and

- both manage the connection by using the “using” statement.

However, this shared logic is itself data-independent: none of it needs to know which document type is being stored.

We can extract this shared logic into a data-independent class that replaces both the TextDocumentStore and the PdfDocumentStore classes, and then create two data-dependent classes just to extract the data specific to the two data types:

public interface IContentExtractor<TDocument>
{
    IdAndContent Extract(TDocument document);
}

public class DocumentStore<TDocument> : IDocumentStore<TDocument>
{
    private readonly IContentExtractor<TDocument> contentExtractor;
    private readonly string connectionString;
    private readonly StoredDocumentType documentType;

    //..

    public void StoreDocument(TDocument document)
    {
        var content = contentExtractor.Extract(document);

        using (var context = new DataContext(connectionString))
        {
            context.Documents.Add(
                new StoredDocument
                {
                    Type = documentType,
                    Identifier = content.Identifier,
                    BinaryContent = content.Content
                });

            context.SaveChanges();
        }
    }
}

public class PdfContentExtractor : IContentExtractor<PdfDocument>
{
    public IdAndContent Extract(PdfDocument document)
    {
        return new IdAndContent(document.Identifier, SerializeDocument(document));
    }

    private byte[] SerializeDocument(PdfDocument document)
    {
        //Serialize Pdf document here
    }
}

public class TextContentExtractor : IContentExtractor<TextDocument>
{
    public IdAndContent Extract(TextDocument document)
    {
        return new IdAndContent(
            document.Identifier,
            Encoding.UTF8.GetBytes(document.Content));
    }
}

The amount of duplication has decreased.

The DocumentStore class is now a generic data-independent class that contains the logic to write data to the database.

When it needs data-type specific data, it uses the IContentExtractor<TDocument> dependency to obtain such data. For each data-type, we have a data-dependent implementation of this interface to extract the data.
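The IdAndContent type used above is not shown in this article. Judging from how it is constructed and read (an identifier plus a byte array), a minimal sketch could look like the following; the exact shape is an assumption.

```csharp
// Hypothetical shape of IdAndContent, inferred from its usage in the
// extractors (constructor arguments) and in DocumentStore (property reads).
public sealed class IdAndContent
{
    public string Identifier { get; }
    public byte[] Content { get; }

    public IdAndContent(string identifier, byte[] content)
    {
        Identifier = identifier;
        Content = content;
    }
}
```

It is a plain data object: it carries exactly the two values that the generic DocumentStore<TDocument> needs and nothing that is specific to a document type.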

Here is what the Composition Root looks like now:

var textDocumentProcessor =
    new DocumentProcessor<TextDocument>(
        new TextDocumentTranslator(),
        new DocumentStore<TextDocument>(
            new TextContentExtractor(),
            connectionString,
            StoredDocumentType.PlainText));

var pdfDocumentProcessor =
    new DocumentProcessor<PdfDocument>(
        new PdfDocumentTranslator(),
        new DocumentStore<PdfDocument>(
            new PdfContentExtractor(),
            connectionString,
            StoredDocumentType.Pdf));

If the translation code in TextDocumentTranslator and PdfDocumentTranslator has shared logic, we can do a similar refactoring to move the shared data-independent part into generic data-independent classes, and move the data-dependent parts into their own classes.
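What such a refactoring might look like depends entirely on what the translators actually share, which this article does not show. Purely as an illustration, suppose both translators extract text, translate it, and put it back. The sketch below invents ITextTranslator, ITextAccessor<TDocument>, and a Language enum for that scenario; UpperCaseTextTranslator is a stand-in that upper-cases text instead of translating it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum Language { English, French } // assumed; the article does not show this type

public interface IDocumentTranslator<TDocument>
{
    TDocument Translate(TDocument document, Language language);
}

// Hypothetical data-independent dependency: translates raw text.
public interface ITextTranslator
{
    string Translate(string text, Language language);
}

// Hypothetical data-dependent dependency: reads text out of a document and
// builds a copy of the document that carries the translated text.
public interface ITextAccessor<TDocument>
{
    IReadOnlyList<string> GetTextSegments(TDocument document);
    TDocument WithTextSegments(TDocument document, IReadOnlyList<string> segments);
}

// The shared, data-independent translation flow.
public sealed class DocumentTranslator<TDocument> : IDocumentTranslator<TDocument>
{
    private readonly ITextTranslator textTranslator;
    private readonly ITextAccessor<TDocument> textAccessor;

    public DocumentTranslator(ITextTranslator textTranslator, ITextAccessor<TDocument> textAccessor)
    {
        this.textTranslator = textTranslator;
        this.textAccessor = textAccessor;
    }

    public TDocument Translate(TDocument document, Language language)
    {
        var translated = textAccessor.GetTextSegments(document)
            .Select(segment => textTranslator.Translate(segment, language))
            .ToList();

        return textAccessor.WithTextSegments(document, translated);
    }
}

public sealed class TextDocument
{
    public string Identifier { get; }
    public string Content { get; }

    public TextDocument(string identifier, string content)
    {
        Identifier = identifier;
        Content = content;
    }
}

// The small data-dependent part for plain text documents.
public sealed class TextDocumentTextAccessor : ITextAccessor<TextDocument>
{
    public IReadOnlyList<string> GetTextSegments(TextDocument document) =>
        new[] { document.Content };

    public TextDocument WithTextSegments(TextDocument document, IReadOnlyList<string> segments) =>
        new TextDocument(document.Identifier, string.Concat(segments));
}

// Stand-in for a real translation service.
public sealed class UpperCaseTextTranslator : ITextTranslator
{
    public string Translate(string text, Language language) => text.ToUpperInvariant();
}
```

A PDF version would then need only its own ITextAccessor<PdfDocument> implementation; the DocumentTranslator<TDocument> class would be reused as is.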

After we are done with refactoring, data-independent code will be in generic data-independent classes, and for each data type, we will have smaller non-generic data-dependent classes to do the things that are specific to each data type.

 

[Figure: splitting units into data-independent and data-dependent units]

Supporting changes within the context of a single data type

Separation of data-independent behavior and data-dependent behavior also has benefits within the context of a single data type.

Many times, we are required to change the internal structure of our data types.

For example, we might decide to change the type of the Content property in the TextDocument data class from a simple string to a string array representing the text in each paragraph of the document. In this case, we will have to modify many of the data-dependent units because they depend on the old representation of the Content property.

When we separate data-independent behavior and data-dependent behavior into different units, we minimize the number and size of data-dependent units, and therefore it becomes easier to make changes to the internal structure of existing data types.
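Applied to the storage example, a sketch of that change might look as follows. IdAndContent is the assumed shape of a type this article does not show, and joining paragraphs with a newline is an arbitrary choice made for this sketch.

```csharp
using System;
using System.Text;

// Assumed shape; the article does not show this type.
public sealed class IdAndContent
{
    public string Identifier { get; }
    public byte[] Content { get; }

    public IdAndContent(string identifier, byte[] content)
    {
        Identifier = identifier;
        Content = content;
    }
}

public interface IContentExtractor<TDocument>
{
    IdAndContent Extract(TDocument document);
}

// TextDocument after the change: Content holds one string per paragraph.
public sealed class TextDocument
{
    public string Identifier { get; }
    public string[] Content { get; }

    public TextDocument(string identifier, string[] content)
    {
        Identifier = identifier;
        Content = content;
    }
}

// The only unit of the storage example that has to be modified.
public sealed class TextContentExtractor : IContentExtractor<TextDocument>
{
    public IdAndContent Extract(TextDocument document)
    {
        // Assumption: paragraphs are stored as one newline-separated text.
        var text = string.Join("\n", document.Content);

        return new IdAndContent(document.Identifier, Encoding.UTF8.GetBytes(text));
    }
}
```

The generic DocumentStore<TDocument> and DocumentProcessor<TDocument> classes never read the Content property, so they compile unchanged after this modification.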

Moving data-dependent logic into the data objects

The PdfContentExtractor and TextContentExtractor classes from the previous example contain logic that is specific to PDF and plain text documents respectively.

One might be tempted to move such logic into the PdfDocument and TextDocument classes respectively. This is especially true when such logic is simple.

Here is what the relevant code would look like:

public interface IDocument
{
    string GetIdentifier();
    byte[] GetBinaryContent();
    StoredDocumentType GetTypeForStore();
}

public class TextDocument : IDocument
{
    public string Identifier { get; }
    public string Content { get; }

    //..

    public string GetIdentifier() => Identifier;

    public byte[] GetBinaryContent() => Encoding.UTF8.GetBytes(Content);

    public StoredDocumentType GetTypeForStore() => StoredDocumentType.PlainText;
}

public class PdfDocument : IDocument
{
    public string Identifier { get; }
    //..

    public string GetIdentifier() => Identifier;

    public byte[] GetBinaryContent()
    {
        //Serialize Pdf document here
    }

    public StoredDocumentType GetTypeForStore() => StoredDocumentType.Pdf;
}

public class DocumentStore<TDocument> : IDocumentStore<TDocument>
    where TDocument : IDocument
{
    private readonly string connectionString;

    //..
    public DocumentStore(string connectionString)
    {
        this.connectionString = connectionString;
    }

    public void StoreDocument(TDocument document)
    {
        using (var context = new DataContext(connectionString))
        {
            context.Documents.Add(
                new StoredDocument
                {
                    Type = document.GetTypeForStore(),
                    Identifier = document.GetIdentifier(),
                    BinaryContent = document.GetBinaryContent()
                });

            context.SaveChanges();
        }
    }
}

The DocumentStore class in this case does not need a dependency on IContentExtractor<TDocument> because the data-dependent logic was moved to the document object itself. The DocumentStore can invoke methods via the IDocument interface to get the data-type specific data.

In this particular example, since it is very simple, such an approach is OK.

However, as I discussed in the Data and Encapsulation in complex C# applications article, this moves us towards data encapsulation, and in complex applications, this raises issues that we need to consider.

You aren’t gonna need it (The YAGNI principle)

Should we always separate data-independent and data-dependent logic into different classes?

Should we make our data-independent classes generic from the start?

Well, it depends.

But in most cases, we don’t have to.

In most cases, applications start small, and then they evolve with time. It is important to first concentrate efforts on meeting the requirements we have at hand. We can always refactor later to meet new requirements.

In a document processing application that processes plain text documents only, we can start normally without making the document type generic in the classes that deal with documents.

Later, when we need to introduce different document types, we can refactor existing classes/interfaces to become generic, and we can also refactor to separate data-independent behavior and data-dependent behavior into different classes.

If we follow the SOLID principles (the Single Responsibility Principle in particular), chances are that the degree of separation between data-independent and data-dependent behavior is already high. Also, refactoring in this case would be a lot easier than when our classes are very long.

Conclusion:

In this article, I discussed some data-related software changes and how C# Generics can help us with them.

One particular type of change is supporting new data types.

In many cases, the main application/domain logic doesn’t change much when we add new data types. If we can separate the logic that is data-independent from the logic that is data-dependent, we can reuse a lot of code when we add new data types.

Also, such separation helps us with changing the internal structure of data types by minimizing the number and size of behavior units that need to change as a result of such changes.

Generics in C# allow us to create data-independent behavior units that can be used for different data types.

This article was technically reviewed by Damir Arh.

Author
Yacoub Massad is a software developer that works mainly with Microsoft technologies. Currently, he works at Zeva International where he uses C#, .NET, and other technologies to create eDiscovery solutions. He is interested in learning and writing about software design principles that aim at creating maintainable software. You can view his blog posts at criticalsoftwareblog.com

