Shipping Pseudocode to Production

Posted by: Dobromir Nikolov , on 6/1/2019, in Category Patterns & Practices
Views: 26037
Abstract: The article talks about type expressiveness and how we can use the functional Either monad in C# to implement complex business logic that reads as simply as a description of sequential operations. We’ll also look at a technique that allows us to express a complex scenario involving multiple systems as pseudocode and then convert that pseudocode into something we can actually ship to production.

Clean code is simple and direct. Clean code reads like well-written prose. Clean code never obscures the designer’s intent but rather is full of crisp abstractions and straightforward lines of control.

-Grady Booch - author of Object Oriented Analysis and Design with Applications

We’ve all heard that line. It seems intuitive and obvious. Everyone strives to write clean code, everyone strives towards simplicity and maintainability, or at least everyone claims to do so.

Sadly, if it were true, we wouldn’t have ended up with these huge unmaintainable messes that everyone is complaining about.

If it were true, we would’ve never ended up with 5000 line functions containing critical business logic that no one seems to be aware of. (if you haven’t seen that, good for you)!

What’s the cause? Tight deadlines? Obscure requirements? Incompetent developers?

Of course, it may be any or all of these, but in my opinion the most common cause that results in unmaintainable code is a wrong mindset. A “code-first” mindset that pushes the developers to just start patching around the codebase without thinking too much about what it is exactly that they’re trying to accomplish.

What if we could establish a different mindset? And even better, what if we could make our codebase push us towards following that mindset, as if the code itself is working with us to achieve a common goal?

Disclaimer: This article concerns enterprise development, where priority is maintainability and robustness, some of the points made will be invalid for performance critical applications.

Writing Code - The problem

What does this function do?

RowData ParseRow(ExcelRow row);

Parse an excel row, right?

Looks like it, but excel rows are unstructured user input. You know there are going to be columns, but there are no guarantees about the content inside them or how many there are going to be.

Now, I’m no expert, but I’ve done a fair amount of work with unstructured user input and one thing is for certain - you can’t trust it!

If you can’t trust something, then you must think about handling the error case. What if this row is not in the format you’re expecting it to be? Do you throw an exception? Return null?

Sadly, the function definition itself is not telling us any of these things. If the ParseRow function came from an undocumented closed source library and we didn’t have access to the code, the only way of understanding what it does in case of an error is - by trial and error.

Luckily, we’re the ones implementing ParseRow, so we have the luxury of following best practices.

In the case of errors, we’re not going to return null, as that is error-prone and we’ll also lose the context that the error occurred in. What we’ll do instead is throw a custom ParsingException. This exception is going to contain all of the relevant information that we could use while debugging/displaying error messages.

Parsing an excel file now looks like this:

try
{
   ExcelRow[] rows = ReadExcelRows(excelFile);

   RowData[] parsedData = rows
       .Select(ParseRow)
       .ToArray();

   return parsedData;
}
catch (ParsingException e)
{
   // Handle
}

Nothing complicated. The code sits there and works quietly for a while now.

Then comes John. John is the manager for the service delivery team.

He says that he loves the functionality, but now that the company has grown and the Excel files his team is inputting are larger, they are losing a lot of time by stopping the process on the first broken row (remember that’s what the exception will do).

John asks if they could receive all errors at once as some sort of a report.

Good point, John! Definitely sounds like a reasonable request. Let’s get to work!

Refactoring

 

So we’ll have to aggregate all of the errors. The most straightforward solution is to add an Error property to the RowData class like so:

class RowData
{
  ...
  string Error { get; set; }
  ...
}

Fortunately, we’re serious professionals, we don’t joke around with the single responsibility principle, so we create a ParsingResult class.

class ParsingResult
{
  RowData Data { get; set; }
  string Error { get; set; }
}

The ParseRow function now looks like this:

ParsingResult ParseRow(ExcelRow row);

Extracting the errors is now simple. No try-catch blocks:

ParsingResult[] parsedData = rows
    .Select(ParseRow)
    .ToArray();

string[] errors = parsedData
    .Select(r => r.Error)
    .ToArray();

Here’s the catch.

What if r.Error is null? Of course, we can always add a .Where clause, but this is way too easy to forget.

And what if we want to use the ParseRow function on one row only?

ExcelRow row = ReadExcelRow();
ParsingResult result = ParseRow(row);

if (string.IsNullOrEmpty(result.Error))
{
   // Do something with result.Data
}
else
{
   // Handle the error
}

Smell anything? What if someone forgot the if statement and went straight for result.Data? He’d get a NullReferenceException.

We’ve “improved” the design by including the potential error inside the ParsingResult class, which has allowed us to easily generate an error report, but the cost is that now every time someone calls the ParseRow function, he’s required to not forget to check result.Error or result.Data for null.

This is a code smell, and a serious one. As the codebase grows there will be hundreds of situations where someone will be required to not forget to check for null/some other condition. Just think about how many bugs you’ve fixed by adding yet another if statement or a try-catch block.

If anything can be forgotten, it will be forgotten.

 

Why did this happen? What was wrong with our approach?

Expressiveness

The answer is lack of expressiveness.

If the ParseRow function was expressive enough, then it would’ve simply told the caller that they are supposed to handle two cases - one for the error and one for the success. We wouldn’t have had to rely on someone not forgetting to check for null or some other condition.

Functions telling the caller what to do definitely sounds great, but what does it mean exactly? How can a function be expressive? Let’s take a look.

Our first expressive function

 

Let’s go through another example of an operation that is likely to fail - performing an HTTP request.

The HTTP protocol has 62 status codes defined, and only 10 of them mean success. 84% of the status codes mean some sort of an error. This means something. If we were to model a function that performs an HTTP request, it better communicate clearly with the caller about how likely it is to fail.

But how can a function communicate?

Well, let’s think about all of the possible ways:

- Through its name - We obviously can’t name our function PerformAnHttpRequestAndRememberItsLikelyToFail (I know we can, but we can’t), so the name won’t help us much.

- Through its arguments (and their names) - There is the possibility of adding successHandler and errorHandler function arguments, but what is the function going to return then (void?)? It seems like a decent approach, but we can do better.

- Through its return type - Names and arguments won’t do the job, but what about the return type? What if we could model a type that represents an operation with two possible outcomes. Then the compiler could force us towards handling both the success and error case and we would’ve eliminated the easily forgettable if statements.

We’ll take the return type approach as it seems to fit our needs nicely. Let’s define some requirements for the type we’ll be modeling:

1. It should be able to contain 2 other types - one that is going to be returned if the operation went successfully and another one if it failed

- In the HTTP request case, we would like one type of response if the server returned a success status code (in the form of 2XX) and another one if it failed

2. It should never be null

- It makes no sense for a type that signals a likely error to be null as that will lead to even more errors

Reading the requirements, it becomes obvious that we’ll need some sort of a generic struct.

public struct Either<T, TException>
{
  // Constructors omitted for brevity
  public bool IsSuccessful { get; }

  public static Either<T, TException> Success(T value) =>
      new Either<T, TException>(value);

  public static Either<T, TException> Error(TException exception) =>
      new Either<T, TException>(exception);
}

Using the newly defined Either type, we can model our HTTP request function like so.

Either<SucessfulHttpResponse, HttpError> PerformHttpRequest(...);

It reads pretty nicely. We either get a SuccessfulHttpResponse (just some arbitrary type to illustrate a point) or an HttpError. But how do we make use of it?

Either<SucessfulHttpResponse, HttpError> responseResult =
       PerformHttpRequest("GET", "http://someurl.com/");

// ????

We obviously can’t access two values at once, but the good thing is we won’t have to. The Either type signals that there are two possible outcomes. The actual outcome will be known in runtime.

As you probably saw, we didn’t expose direct value accessors when defining the Either type. What we’ll do instead is “force” the consumer to always handle both the success and the error case during compile time through a Match function.

public struct Either<T, TException>
{
    public bool IsSuccessful { get; private set; }

    public void Match(Action<T> success, Action<TException> error)
    {
        if (IsSuccessful)
            success(value);
        else
            error(exception);
    }

    public TResult Match<TResult>(Func<T, TResult> success, Func<TException, TResult> error) =>
        IsSuccessful ? success(value) : error(exception);
}

Match abstracts away the success/error condition and forces us to handle both cases. Now if someone tried to handle only one case (like going straight for result.Data in the Excel parsing example), they would get a compiler error.

Either<SucessfulHttpResponse, HttpError> responseResult =
       PerformHttpRequest("GET", "http://someurl.com/");

// Compiler error
responseResult.Match(
    some: response => ...
);

The only correct way to consume an Either type (for now) is by always providing both handlers.

Either<SucessfulHttpResponse, HttpError> responseResult =
   PerformHttpRequest("GET", "http://someurl.com/");

responseResult.Match(
   some: response => /* Happy path */,
   none: error => /* Sad path */
);

This seems good enough for a simple case when you only need to perform one request, but what about when you need to perform a series?

In an imperative approach, performing consecutive requests that depend on each other would look something like:

// Note that HttpResponse is just an arbitrary type to illustrate a point
HttpResponse response = HttpRequest(...);

if (response.IsSuccessful)
{
   HttpResponse anotherResponse = HttpRequest(... response.Data);

   if (anotherResponse.IsSuccessful)
   {
       HttpResponse thirdResponse = HttpRequest(... anotherResponse.Data);

       if (thirdResponse.IsSuccessful)
       {
           ...
       }
   }
}

A crazy amount of nesting and the need of having to always remember to check whether the response is successful. With our Either type, we’ve eliminated the need to remember to check the IsSuccessful property, but the code looks like this:

var responseResult = HttpRequest(...);

responseResult.Match(
   success: response =>
   {
       var anotherResponseResult = HttpRequest(... response.Data);

       anotherResponseResult.Match(
           success: anotherResponse =>
           {
               var thirdResponseResult = HttpRequest(... anotherResponse.Data);

               thirdResponseResult.Match(
                   success: thirdResponse
                   {

                   },
                   error: /* Error handling */
               )
           },
           error: /* Error handling */
       )
   },
   error: /* Error handling */
);

Definitely not much better. So what can we do?

Let’s take a step back and look at what we’re doing.

1. Perform a request

2. If the request is successful, perform another request

3. If the consequent request is successful, perform another request

4. ...

Notice anything?

1. Perform a request

2. If the request is successful, perform another request

3. If the consequent request is successful, perform another request

4. If the consequent request is successful, perform another request

There’s a pattern.

And since our Either type doesn’t know about HTTP requests in particular, we can take another step back and just look at it in terms of operations (where operations are nothing but functions).

1. Perform an operation

2. If the request operation is successful, perform another request operation

3. If the consequent request operation is successful, perform another request operation

4. If the consequent request operation is successful, perform another request operation

This pattern can be translated into a function that we can add to our Either type:

public struct Either<T, TException>
{
   public Either<TResult, TException> Map<TResult>(Func<T, TResult> mapping) =>
      Match(
          success: value => Either<TResult, TException>.Success(mapping(value)),
          error: exception => Either<TResult, TException>.Error(exception)
      );

}

Map (also called Select in LINQ) acts the same way as it does on lists. If the Either has a success value inside it (compare this to a list that has elements), it applies a transformation function to it. Otherwise, if it has an exception (compare this to an empty list), it will simply return the exception value in a new “transformed” Either. (similarly, if you map an empty List<string> with int.Parse, the list type would change to List<int>, regardless of whether the List is empty)

 

It may sound confusing, but the concept is very simple. For a type container C<T>, map acts like so:

(C<T>, (T => T2)) => C<T2>

(List<string>, int.Parse) => List<int> (even if it was empty)

(Either<T, TException>, (T => TResult)) => Either<TResult, TException> (even if there was no actual T value inside)

It takes the container and transforms the inner value using the provided (T => T2) function.

Types that implement a map function are called Functors in the functional programming world.

Map allows us to simplify our code like so:

var responseResult = HttpRequest(...);

responseResult.Map(response =>
{
  var anotherResponseResult = HttpRequest(... response.Data);

  anotherResponseResult.Map(anotherResponse =>
  {
      var thirdResponseResult = HttpRequest(... anotherResponse.Data);

      thirdResponseResult.Map(thirdResponse =>
      {
          var fourthResponseResult = HttpRequest(... thirdResponse.Data);

          fourthResponseResult.Map(fourthResponse =>
          {
              ...
          });
      });
   });
});

The need to pass an error handler every time is now gone. It still looks messy, but we can rearrange it a bit:

HttpRequest(...).Map(response =>
HttpRequest(... response.Data).Map(anotherResponse =>
HttpRequest(... anotherResponse.Data).Map(thirdResponse =>
HttpRequest(... thirdResponse.Data))));

Four consecutive requests - four lines of code. And the great part - if an error occurred in any of these chained requests, it will simply stop there and return you the error put inside an Either instance.

Does that chain mean we can define a function similar to:

Either<SuccessfulHttpResponse, HttpError> ChainFour() =>
  HttpRequest(...).Map(response =>
  HttpRequest(... response.Data).Map(anotherResponse =>
  HttpRequest(... anotherResponse.Data).Map(thirdResponse =>
  HttpRequest(... thirdResponse.Data))));

Sadly, no. But we’ll get there.

The problem with defining functions like that lies within the chain’s type. The type of these 4 chained requests is

Either<Either<Either<Either<SucessfulHttpResponse, HttpError>, HttpError>, HttpError>, HttpError>

which frankly, doesn’t look friendly enough to work with.

What’s interesting about this nested type is that although it looks quite intimidating, it will actually contain either only one value or only one error.

1.    HttpRequest(...).Map(response =>
2.    HttpRequest(... response.Data).Map(anotherResponse =>
3.    HttpRequest(... anotherResponse.Data).Map(thirdResponse =>
4.    HttpRequest(... thirdResponse.Data))));

If 1 passed successfully, its result is going to be passed into 2 and then discarded. If 2 passed successfully, its result is going to be passed into 3 and then discarded, and so on. There is only going to be one SuccessfulHttpResponse, that is if the whole chain completed successfully, and there is potentially only going to be one error, that is the first one that occurred.

Therefore, we can perform the following transformation without losing any data:

Either<Either<Either<Either<SucessfulHttpResponse, HttpError>, HttpError>, HttpError>, HttpError>

=>

Either<SucessfulHttpResponse, HttpError>

The nesting appears to be meaningless.

In order to get rid of it, we must first understand why it happened. If you remember, the map function takes a container of type C<T> and applies a (T => T2) function to it.

(C<T>, (T => T2)) => C<T2>

If the transforming function happened to return another C<T>, then we would get a nested type C<C<T>> as it happens in our case.

Or more concretely:

Take an Either<SuccessfulHttpResponse, HttpError>

=>

Map the SuccessfulHttpResponse into Either<SuccessfulHttpResponse, HttpError>

=>

Receive an Either<Either<SuccessfulHttpResponse, HttpError>, HttpError>

The solution to this is a FlatMap function.

public struct Either<T, TException>
{
  public Either<TResult, TException> FlatMap<TResult>(Func<T, Either<TResult, TException>> mapping) =>
      Match(
          success: mapping, // We skip wrapping the value into an Either
          error: exception => Either<TResult, TException>.Error(exception)
      );

}

FlatMap is very similar to Map except that instead of taking any transformation function, it only accepts those that return another Either. This allows us to skip wrapping the result over and over again.

Map:

(C<T>, (T => T2)) => C<T2>

FlatMap:

(C<T>, (T => C<T2>)) => C<T2>

Types that implement a FlatMap function (along with a few other things) are called Monads in the functional programming world.

Now we can simply substitute the Map calls for FlatMap in order for the ChainFour function definition to become valid:

Either<SuccessfulHttpResponse, HttpError> ChainFour() =>
  HttpRequest(...).FlatMap(response =>
  HttpRequest(... response.Data).FlatMap(anotherResponse =>
  HttpRequest(... anotherResponse.Data).FlatMap(thirdResponse =>
  HttpRequest(... thirdResponse.Data))));

We went from the imperative:

HttpResponse response = HttpRequest(...);

if (response.IsSuccessfull)
{
   HttpResponse anotherResponse = HttpRequest(... response.Data);

   if (anotherResponse.IsSuccessfull)
   {
       HttpResponse thirdResponse = HttpRequest(... anotherResponse.Data);

       if (thirdResponse.IsSuccessfull)
       {
           HttpResponse fourthResponse = HttpRequest(... thirdResponse.Data);

           if (fourthResponse.IsSuccessfull)
           {
                
           }
       }
   }
}

To the functional:

HttpRequest(...).FlatMap(response =>
HttpRequest(... response.Data).FlatMap(anotherResponse =>
HttpRequest(... anotherResponse.Data).FlatMap(thirdResponse =>
HttpRequest(... thirdResponse.Data))));

Going into production

Now this may seem cool and all, but I seriously doubt that most of you have ran into the exact scenario where you perform consequent HTTP requests that rely on one another. Real life problems look more like this:

psuedocode-to-production

  1. A user uploads a file to a server with a category.
  2. The server checks if the user has rights for this category.
  3. The server stores the file into some cloud storage (AWS S3 for example).
  4. The server takes the cloud storage id and stores it into a database with a newly generated unique key.
  5. The unique key is then sent to an external service for asynchronous processing.

Enter the pseudocode

 

This article is called “Shipping Pseudocode to Production” for a reason.

Let’s see how we can implement this scenario in a robust and readable way using our Either type and some pseudocode.

If you convert the above diagram into pseudocode, it will probably look something like this:

// function ProcessDocument userId, category, file
//   1. check if the user is authorized
//   2. store the file into the cloud
//   3. store database log with unique key
//   4. send the unique key to an external service for post-processing

We can clearly see there are 4 distinct operations. For each operation we can model a corresponding function definition that describes it using Either:

1.    Either<User, Error> CheckIfUserIsAuthorized(string userId, string category);
2.    Either<CloudRecord, Error> StoreTheFileIntoTheCloud(File file, string category);
3.    Either<Guid, Error> StoreDatabaseLog(CloudRecord record);
4.    Either<DocumentProcessedResult, Error> SendToExternalService(Guid key);

The implementation of the ProcessDocument function can now be looked upon as an arrangement of these 4 operations. Conveniently, we’ve implemented the “arrangement” mechanism through FlatMap and Map.

Either<DocumentProcessedResult, Error> ProcessDocument(
  string userId,
  string category,
  File file) =>
  CheckIfUserIsAuthorized(userId, category).FlatMap(user =>
  StoreTheFileIntoTheCloud(file, category).FlatMap(cloudRecord =>
  StoreDatabaseLog(cloudRecord).FlatMap(uniqueKey =>
  SendToExternalService(uniqueKey))));

It really is that simple. The workflow behind the implementation is readable even to someone who doesn’t regularly read code.

Imagine if you scaled this to 15 operations. You will not lose any readability, as 15 operations are still 15 lines of code.

If you wanted to expose this function as an ASP.NET endpoint, for example, the usage is quite simple:

public IActionResult ProcessDocument(string category, File file) =>
   _documentService.ProcessDocument(User.Id, category, file)
       .Match<IActionResult>(Ok, BadRequest);

What status code will be returned will be decided by the Match function.

Going back to Grady Booch’s quote:

“Clean code is simple and direct. Clean code reads like well-written prose. Clean code never obscures the designer’s intent but rather is full of crisp abstractions and straightforward lines of control.”

  1. Simple and direct - check
  2. Reads like well-written(?) prose - check
  3. Doesn’t obscure the designer’s intent - check
  4. Full of crisp abstractions and straightforward lines of control - check

Compare this to an imperative approach:

DocumentProcessedResult ProcessDocument(
   string userId,
   string category,
   File file)
{
   var user = _userRepository.GetById(userId);

   if (user != null)
   {
       var isAuthorized = _authService.IsUserAuthorized(user, category);

       if (isAuthorized)
       {
           try
           {
               var cloudRecord = _cloudStorageService.Store(file, category);

               var databaseStorageResult = _recordsRepository.Save(new DatabaseRecord
               {
                   Key = Guid.NewGuid(),
                   CloudRecordId = cloudRecord.Id
               });

               // ...and so on
           }
           catch (CloudStorageException)
           {
               // Handle
           }
           catch (DatabaseException)
           {
               // Handle
           }
       }
   }   
}

It’s really hard to see the actual workflow, and if we had to add more logic, the function would very soon become bloated to the point of unmaintainability.

The mindset

How does this resonate with the mindset concept I started talking about early in the article?

Since one Either instance represents one operation, using it pushes you towards splitting what you’re implementing into logical chunks. Each logical chunk is a separate operation (function) that doesn’t know anything about where it’s going to stand once put in the big picture.

Compare extending the chained (functional) and imperative implementations of the ProcessDocument function.

Say we want to add an additional operation to the workflow - emitting a DocumentSentForProcessing event.

Although the imperative version follows common best practices - it has abstracted the persistence logic in repositories, the communication logic in services, you are still inevitably thrown inside this huge nested pile of code that doesn’t have clear “borders” around the logical sections of the workflow.

You are required to read attentively and be careful not to break some existing logic. You can’t really “extend” the implementation without getting your hands dirty with what’s already there.

In the chained version, however, adding an additional operation is as simple as adding an additional line of code.

CheckIfUserIsAuthorized(userId, category).FlatMap(user =>
StoreTheFileIntoTheCloud(file, category).FlatMap(cloudRecord =>
StoreDatabaseLog(cloudRecord).FlatMap(uniqueKey =>
SendToExternalService(uniqueKey).FlatMap(documentProcessedResult =>
PublishEvent(new DocumentSentForProcessing(documentProcessedResult))))));

Now what you’re concerned with is only implementing the PublishEvent function correctly. You didn’t have to get familiar with what’s inside the other operations, nor could you break something inside them by not being careful.

Splitting the logic into isolated sections and implementing them one by one helps maintainability, extensibility and debugging. It is much easier to avoid bugs and reason about your code when implementing only one function at a time.

You also get the added benefit of directly translating pseudocode that you have discussed with some non-technical domain expert, for example, into actual code that you can ship to production.

Conclusion

Establishing the mindset of reasoning about your code as a composition of operations helps tremendously with identifying bugs and design issues early on.

This modular mentality, paired together with the Either monad results in code that is readable, maintainable, and scalable in terms of complexity. The readability will make sure that after getting back to the code 6 months later you’ll remember what it’s about and the compile-time errors will keep you from forgetting about the edge cases and handling them adequately.

If you’re interested, you can go to https://github.com/dnikolovv/bar-event-sourcing for a “real” application made using this approach. (note that the Either monad that is used in this project is called Option<T, TException> instead of Either<T, TException>, but its behavior is the same).

This article was technically reviewed by Yacoub Massad.

 

This article has been editorially reviewed by Suprotim Agarwal.

Absolutely Awesome Book on C# and .NET

C# and .NET have been around for a very long time, but their constant growth means there’s always more to learn.

We at DotNetCurry are very excited to announce The Absolutely Awesome Book on C# and .NET. This is a 500 pages concise technical eBook available in PDF, ePub (iPad), and Mobi (Kindle).

Organized around concepts, this Book aims to provide a concise, yet solid foundation in C# and .NET, covering C# 6.0, C# 7.0 and .NET Core, with chapters on the latest .NET Core 3.0, .NET Standard and C# 8.0 (final release) too. Use these concepts to deepen your existing knowledge of C# and .NET, to have a solid grasp of the latest in C# and .NET OR to crack your next .NET Interview.

Click here to Explore the Table of Contents or Download Sample Chapters!

What Others Are Reading!
Was this article worth reading? Share it with fellow developers too. Thanks!
Share on LinkedIn
Share on Google+

Author
Dobromir Nikolov is a software developer working mainly with Microsoft technologies, with his specialty being enterprise web applications and services. Very driven towards constantly improving the development process, he is an avid supporter of functional programming and test-driven development. In his spare time, you’ll find him tinkering with Haskell, building some project on GitHub (https://github.com/dnikolovv), or occasionally talking in front of the local tech community.


Page copy protected against web site content infringement 	by Copyscape




Feedback - Leave us some adulation, criticism and everything in between!