DotNetCurry Logo

Designing a Lossless Information System with a Proof Of Concept in ASP.NET, Entity Framework and Ninject

Posted by: Sumit Maitra , on 12/17/2013, in Category ASP.NET
Views: 78284
Abstract: This article introduces a database design pattern with a POC in ASP.NET, Ninject and Entity Framework that ensures data once inserted into the database is never lost, a must have for financial systems.

Almost a year and half ago, Clemens Vasters (Architect at Microsoft) tweeted saying both Updates and Deletes (in databases) result in Destruction of data. This struck me as a very interesting take on maintenance of Historical data. It turns out, it’s not an Architectural fantasy but a hard bound requirement in the Finance world where ‘destruction’ of any type of data is deemed a criminal offense.

Ever since that Tweet from Clemens, I have been meaning to explore this pattern. Finally, when suggestions for an Architecture focused article in the magazine popped up, I decided to do a deep dive in what I refer to as - a Lossless Information System design.

A Lossless Information System (LLIS) is one that ensures that every data operation in it, is tracked and no information is ever deleted (archived, but not deleted). Projects in Finance domain are the biggest targets for Lossless Information Systems because of stringent Audit trail requirements. This type of data is often referred to as immutable data. Today we will see how we can build such a system using a traditional RDBMS solution.

The DotNetCurry .NET Magazine – A Free High Quality Digital Magazine for .NET professionals is published once every two months. Subscribe to this eMagazine for Free and get access to hundreds of free tutorials from experts

A Little History of LLIS

LLIS is not a new concept and has been academically researched extensively in the 1980s and 90s. One of the notable works on LLIS is an IEEE paper by Bhargava, G and Gadia S.K. Their paper was published in IEEE journal – IEEE Transactions on Knowledge and Data Engineering in Feb 1993 and this article leans heavily on the paper for theoretical relevance.

For simplicity, we will use the scenario where we need to save User Account information in a way that every change is tracked, leaving us an easy to trace history trail. Our sample schema will save UserAccount + UserDepartment association information.

originalschema

As we see above, the two tables store information about a user and their department. [UserDepartment.UserID] is the foreign key (FK) to the UserAccount table.

Traditional Usage and limitations

Traditional Usage of the above schema is to simply timestamp on creation and modify the UpdatedBy timestamp at the time of updates. It provides a creation history and the last update history. It depends on backend archival and backup systems to retrieve old data. This suffices where audit requirements do not mandate exhaustive history. However, Finance domain audit requirements often mandate that data once entered in a system, should never be destroyed. This implies every bit of data be traceable. This gives us the initial impression that data cannot be deleted or in other words the DELETE operation is restricted. However, if we look deeper we will realize that even updating a value from False to True is in fact ‘destruction’ of data. The value of False is now ‘Replaced’ and hence no longer retrievable and thus considered ‘destroyed’. This extends our restrictions to DELETE and UPDATE operations.

In a world of transactional RDBMS when CREATE, READ, UPDATE and DELETE is the norm, taking out UPDATE and DELETE kind of throws a major spanner in the works, at least on the outset. Let’s see what we can do to try and conform to this ‘requirement’.

Modifying Schema for History Tracking

losslessschema

As we see above, we have split each table into an Id table and an Operations table. The ID table has what is referred to as non-volatile data (because a User.Id value is immutable throughout the lifetime of the entity). The volatile data, data that is updated in the lifetime of the entity, is moved to the Operations table. Also notice the Action column that has been added to the volatile data table. This can have three values C(reate), U(pdate) and D(elete), depending on the action. Id is the auto-increment PK.

Database Operations

Given the above changes, let’s see how we can do the CRUD operations or their equivalents.

CREATE Operation

This involves Inserting a row of data in the Id table. This gives us the Primary Key to be used in the Operations table. Once we have the Primary Key, we insert the volatile data in the Operations table with the action flag value of ‘C’ (for create).

UPDATE Operation

To update the data entered above, we do another INSERT operation. However we need to touch (the volatile) data that’s in the operations table only. So an UPDATE operation actually does an INSERT with the Action flag set to ‘U’. All fields that have been updated in that operation are INSERTed in the operations table.

DELETE Operation

To delete data, we essentially mark an entry as deleted by inserting a row in the operations table with the Action flag ‘D’.

READ Operation

Now that we have seen all the update operations, we have one challenge to overcome. That is SELECT. Since we have multiple possible rows of volatile data (in the operations table) for each non-volatile data item, the SELECT operation needs to be able to select the Active or the most recent ‘row’ of data. Max value of the Id column in the operations table, for a given Foreign Key (UserAccountId or UserDepartmentId) will always have the latest value. So in our filter, we check for the Max value of the Id of the Operations table for each row in the Id table. Next we check if the latest field’s Action is ‘D’ and exclude all rows marked as such. This returns us the last created/updated data that is not deleted.

For easy access to this set of data, we create a View on top of the Id and Operations tables with the above filtering constraints.

Implementing the LLIS

Verifying data at SQL Server

Before we get into MVC and EntityFramework, let’s validate our design by entering some data in our tables. Let’s say we want to Create a new User ‘Sumit’ at location ‘Pune’, we would fire the following two queries

INSERT INTO UserAccounts (CreatedById, CreatedDate)
Values(1, GETDATE());

INSERT INTO UserAccountOperations
           ([UserAccountId]
           ,[UserName]
           ,[Address]
           ,[Action]
           ,[UpdatedById]
           ,[UpdatedDate])
     VALUES
           (@@IDENTITY
           ,'Sumit'
           ,'Pune'
           ,'C'
           ,1
           ,GETDATE())

The UserAcccountOperations table is using the Identity key from the UserAccounts table and updating itself with the required data and status flag as ‘C’

Once this data is entered, execute the following query

SELECT
UserAccounts.Id, UserAccounts.
CreatedById, UserAccounts.
CreatedDate, UserAccountOperations.
Id
AS
OperationsId, UserAccountOperations.UserName, UserAccountOperations.Address, UserAccountOperations.Action, UserAccountOperations.UpdatedById,
UserAccountOperations.UpdatedDate
FROM           
UserAccountOperations INNER JOIN UserAccounts ON
UserAccountOperations.UserAccountId = UserAccounts.Id
WHERE       
(UserAccountOperations.Action <> 'D') AND
(UserAccountOperations.Id =
(SELECT MAX(Id) AS MaxId
  FROM UserAccountOperations AS UAO
  WHERE (UserAccountId = UserAccountOperations.UserAccountId)
)
)


This query returns the following data

create

Now we have two rows in the Operations table for the same user, but if we use the query for our View, we will get only the last UPDATED row.

To delete this row of data we again do an INSERT operation into UserAccountOperations table with the ActionFlag set to ‘D’.

INSERT INTO UserAccountOperations
           ([UserAccountId]
           ,[UserName]
           ,[Address]
           ,[Action]
           ,[UpdatedById]
           ,[UpdatedDate])
     VALUES
           (@@IDENTITY
           ,'Sumit'
           ,'Mumbai'
           ,'D'
           ,1
           ,GETDATE())

Now when we use the ‘View’ query, we end up with no rows of data even though we actually have three historic rows of data. The following query shows us the History

SELECT * FROM UserAccountOperations WHERE UserAccountId = 1

completehistory

Creating an ASP.NET Application for a Proof of Concept

projectstructure

We create an ASP.NET WebAPI Solution and add three Class library projects to it.

Along with the View project, we have the CompositionRoot for mapping dependencies, Data for interacting with Database using EntityFramework, Domain layer for defining Business Entities and constraints. In the View layer we make use of Ninject as our Dependency Injection framework.

Generating the Data Layer using EF Database

Since we designed the DB Schema first, we’ll use the Entity Framework EDMX Designer to reverse engineer our data layer. Once we have the DB Diagram we right click on it and add the Code Generation T4 file EF5.x Db Context Generator. This T4 template ensures the entities generated via the EMDX are EF Code First compliant.

addcodegenerationitem

Setting up Repository

In the Domain class, we’ll setup the Repository Interface ISqlServerRepository

public interface IUserDataRepository
{
User CreateUser(User newUser);
User UpdateUser(User existingUser);
User SelectUser(int userId);
IEnumerable SelectAllUsers();
IEnumerable SelectAllUsersHistory();
IEnumerable SelectUserHistory(int userId);
bool DeleteUser(int userId);

UserDepartment CreateUserDepartment(UserDepartment newUser);
UserDepartment UpdateUserDepartment(UserDepartment existingUser);
UserDepartment SelectUserDepartment(int userId);
IEnumerable SelectAllUserDepartments();
bool DeleteUserDepartment(int userId);
}

Here User and UserDepartment are domain objects that don’t map to the database schema, instead they encapsulate data that the User and UserDeparment entities should contain from a business perspective.

In the Llis.Data library that contains the EF Code first data objects, we implement the IUserDataRepository in SqlServerRepository class. This is where we deviate from traditional implementations. Instead of using the DBContext to call the respective operations, we change the implementation such that the Updates and Deletes are replaced by Insert statements and the Read/Select operation uses the View Poco to retrieve required data.

Setting up Ninject

To effectively use the Repository pattern, we will go with a DI framework. Ninject is pretty easy to use and get started with. If you are new to it, read Dependency Injection Using Ninject In ASP.NET MVC. In the Llis.View project add the Ninject package via Nuget. The Console command is

Install-package Ninject

Next we add the MVC extensions, these are a part of the Ninject.MVC3 package. Don’t be alarmed they work with MVC4 just fine.

Install-package Ninject.MVC3

Now that these two dependencies are in place, open the App_Start\NinjectWebCommon.cs file that’s added by the Ninject.MVC3 package. This class is responsible for hooking Ninject on to MVC as the DI Framework. Here update the empty RegisterServices method as follows

private static void RegisterServices(IKernel kernel)
{
kernel.Load(Assembly.GetExecutingAssembly(),
  Assembly.Load("Llis.Domain"),
  Assembly.Load("Llis.Data"),
  Assembly.Load("Llis.CompositionRoot"));
}

We are providing Ninject will all possible dependencies in that might have the Concrete implementations of interfaces that we may need. Next in the Llis.CompositionRoot project we add the NInject package first and then add a class called DependencyMapper as follows:

public class DependencyMapper : NinjectModule
{
public override void Load()
{
  this.Bind().To();
}
}

The CompositionRoot library is traditionally where an IoC container composes objects. In our case, the Interface and Implementation are in different libraries (justifiably so) as we don’t want the View layer to carry any direct dependency of the Data layer. We use the CompositionRoot to bridge this problem.

Please note, all Interfaces whose concrete instances are required, need to be registered here.

Setting up an Index and History View

Now that we have setup the backend, let’s create a couple of Views, one for the Index and one for the History of User data. The data is returned to us via the Implementations in the repository. Their implementations are as follows:

IEnumerable SelectAllUsers();
IEnumerable SelectAllUsersHistory();

public IEnumerable SelectAllUsersHistory()
{
List entities = new List();
using (llisdb _context = new llisdb())
{
foreach (var item in _context.UserAccounts.Include("UserAccountOperations"))
{
  foreach (var operItem in item.UserAccountOperations)
  {
   entities.Add(new Domain.Entities.User
   {
    Id = item.Id,
    Address = operItem.Address,
    OperationId = operItem.Id,
    UserName = operItem.UserName,
    Action = operItem.Action,
    CreateDate = item.CreatedDate,
    UpdatedDate = operItem.UpdatedDate
    });
   }
  }
}
return entities.OrderBy(k => k.OperationId);
}
public IEnumerable SelectAllUsers()
{
List entities = new List();
using (llisdb _context = new llisdb())
{
  var items = _context.UserAccountsViews.ToList();
  foreach (var operItem in items)
  {
   entities.Add(new Domain.Entities.User
   {
    Id = operItem.Id,
    Address = operItem.Address,
    OperationId = operItem.OperationsId,
    UserName = operItem.UserName,
    Action = operItem.Action,
    CreateDate = operItem.CreatedDate,
    UpdatedDate = operItem.UpdatedDate
   });
  }
}
return entities;
}

As we can see, SelectAllUsersHistory actually fetches data using the User (and UserOperations) tables. This returns us the complete history of the data. SelectAllUsers on the other hand, uses the UserAccountView that returns only the latest data.

From the given script (UserData.sql) if we run the INSERT queries only, we get data as follows:

newlyinserted

As we can see in the image, there are 4 Users and they have four corresponding entries in the history table. All their actions are ‘C’ (for Create). Next if we run the Update portion of the included script and refresh the above two page we’ll see data as below

afterupdates

As highlighted in the Users List, the User ‘Sumit’ seems to have moved to ‘Pune’ and the User ‘Akhi’ has moved to ‘Milpitas’.

However if we check the history table User ‘Sumit’ in fact made a pit stop at ‘Milpitas’ before heading out to ‘Pune’. We can also see the time at which the changes were made and in the database we have the UserId to show who made the change.

As a result we now have an easy to maintain Audit trail of our User Data.

Conclusion

Trying to maintain a lossless system has many approaches like copying data over to archive before updating or using built in database transaction logs and so on. When business requirements impose strict restrictions on what DML operations can be allowed, the above explained design can be used to build a simple Loss Less Information System.

There are two major considerations, first being volume of transactions and second being performance. There will be a trade-off, however by baking in reasonable performance expectations with stringent archival policy can ensure performance never goes below given SLA and system remains responsive for the longest period.

Source code for this sample can be downloaded from Github

Was this article worth reading? Share it with fellow developers too. Thanks!
Share on LinkedIn
Share on Google+
Further Reading - Articles You May Like!
Author
Sumit is a .NET consultant and has been working on Microsoft Technologies since his college days. He edits, he codes and he manages content when at work. C# is his first love, but he is often seen flirting with Java and Objective C. You can follow him on twitter at @sumitkm or email him at sumitkm [at] gmail


Page copy protected against web site content infringement 	by Copyscape




Feedback - Leave us some adulation, criticism and everything in between!
Comment posted by David on Tuesday, December 17, 2013 7:46 AM
Another domain where data cannot be removed is in the medical records world of HIPAA requirements.

Speaking from experience in that world the method that you are using to determine the "Active" SQL record will not work when the number of records starts to have anything approaching real life volumes.  We found that the best way to approach it, that the auditors and legal personages accept, is to add another field to *Operations table, IsCurrent. This is a bit field, that is only updatable via the ON INSERT trigger.  Prior to the insert, all existing records for the primary ID (i.e. UserAccountId) are updated with the IsCurrent to false, then the new record is inserted with the IsCurrent set to true.  Note that the view against the tables will be orders of magnitude faster.  For more performance, add a composite index that is ( UserAccountId, IsCurrent ).  Also note that the view result set, and the insert schema do not include the IsCurrent field as this field can only be set in the trigger.
Comment posted by Edd on Tuesday, December 17, 2013 10:47 AM
Good point David
Comment posted by Greg Engle on Tuesday, December 17, 2013 10:55 AM
My personal opinion is I believe CQRS with Event Sourcing would be a more attractive solution in this situation.  In such a design, the read model doesn't need to suffer the over-complexities of the archival requirements. Leaving the archival requirements inside the Event Steam will allow appropriate SCD2 dimentions/OLAP cube creation if needed.
Comment posted by Vasil Germanov on Wednesday, December 18, 2013 4:04 AM
I thought LLIS stood for lesson learned information system. Nevertheless it was a good read.
Comment posted by IT-IT on Wednesday, December 18, 2013 8:38 AM
Nothing new in this, was doing it years ago. You have to maintain data, you architect a solution. MVC and EF just wrappers/links to the REALLY important layer ie the database.
Comment posted by nick on Thursday, January 9, 2014 9:57 AM
very good stuff. now implement Mock<Interface> to show without a database how the concept works before needing a db.
Comment posted by Mousavi on Thursday, January 9, 2014 12:01 PM
Thank you, It was good for a newbie like me.
Comment posted by Sekhar on Sunday, January 26, 2014 4:52 AM
Good solution to start with but in wrong direction.

Greg made a good point. The upstream systems should not be effected by the archival policies. This only adds complexities to the entire system.

You need to separate the history repositories from the actual ones. Archival system should receive information by an asynchronous event based mechanism.