DotNetCurry Logo

Migrating from SQL Server to NoSQL using Azure DocumentDB

Posted by: Mahesh Sabnis , on 5/12/2016, in Category Microsoft Azure
Views: 12114
Abstract: Moving from a Relational database to NoSQL is a difficult decision. Azure DocumentDB comes handy here to export-import data from any data source like SQL Server to DocumentDB

RDBMS have held the bulk of the world’s enterprise data for over three decades now. For developers working on the Microsoft stack, SQL Server is clearly the de facto relational database server of choice. However over the past few years, RDBMS have been struggling to keep up with the variety and volume of data exchanged in this modern web application tsunami. The type and format of the data uploaded from such web applications varies and is mostly schema independent. In such cases, the data store must also be schema-free and more importantly, flexible. That’s where developers have shown interest in schema-free No-SQL Databases.

This article is published from the DNC Magazine for Developers and Architects. Download this magazine from here [Zip PDF] or Subscribe to this magazine for FREE and download all previous and current editions

DocumentDB, by Microsoft, is a schema-free NoSQL Database. It is a fairly simple, fast database with the capability of data read/write operations with SQL Query support. DocumentDB is designed for modern web and mobile applications. If you are new to DocumentDB, please read my previous article - Working with NoSQL using Azure DocumentDB at http://www.dotnetcurry.com/windows-azure/1262/documentdb-nosql-json-introduction .

To use DocumentDB, we must have an active Microsoft Azure Subscription. The subscription can be purchased or a free trial subscription can be used. To use the DocumentDB Service, we must create a DocumentDB Account. The resource model of DocumentDB can be represented using the following image:

azure-documentdb-architecture

A DocumentDB Account can have multiple Databases in it. Each database has users and permissions. The Database further contains collections. Each collection can have Stored Procedures, Triggers and User Defined Functions (UDF). The collection contains actual Document data. This is stored in JSON form.

The data document in DocumentDB can be created using following mechanisms:

In this article, we will use Microsoft Azure Portal to create a DocumentDB Account and Database. To create collection in this database, we will make use of the DocumentDB Data Migration Tool.

Using the DocumentDB Data Migration Tool

The DocumentDB Data Migration Tool is an open source project and can be downloaded from this link. This tool is used to migrate data from various data sources into DocumentDB. Some of the data sources supported by this tool includes SQL Server, CSV files, JSON files, MongoDB, Azure Table Storage, HBase, Amazon DynamoDB, and DocumentDB.

For our example, we will use SQL Server Data Source to migrate data to DocumentDB. When we use the SQL Server Data source, we need to make some changes in the relational data form, so as to easily migrate it to DocumentDB. Typically, these changes includes creation of an id field property for the primary key, reading data from the source table/view based on condition (where clause), and so on. We will implement these changes in our next steps.

Step 1: Visit Microsoft Azure Portal, login with your credentials. Create a DocumentDB account using New > Data+Storage > Azure DocumentDB panel as shown in the following image:

create-new-documentdb-account

Enter DocumentDB Account Details as shown in the following image:

dcoumentdb-account-name

Add the database in the account using the Add Database option as shown here:

azure-add-database

add-database-name

Step 2: Once the account is created, it’s time to make some changes in SQL Server as discussed earlier. We will make use of a database created using the following script:

Create Database Company

USE [Company]
GO

/****** Object:  Table [dbo].[Department]    Script Date: 1/6/2016 11:23:03 AM ******/
SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

SET ANSI_PADDING ON
GO

CREATE TABLE [dbo].[Department](
    [DeptNo] [int] NOT NULL,
    [Dname] [varchar](50) NOT NULL,
    [Location] [varchar](50) NOT NULL,
 CONSTRAINT [PK_Department] PRIMARY KEY CLUSTERED 
(
    [DeptNo] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

SET ANSI_PADDING OFF
GO

USE [Company]
GO

/****** Object:  Table [dbo].[Employee]    Script Date: 1/6/2016 11:23:13 AM ******/
SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

SET ANSI_PADDING ON
GO

CREATE TABLE [dbo].[Employee](
    [EmpNo] [int] NOT NULL,
    [EmpName] [varchar](50) NOT NULL,
    [Salary] [int] NOT NULL,
    [DeptNo] [int] NOT NULL,
 CONSTRAINT [PK_Employee] PRIMARY KEY CLUSTERED 
(
    [EmpNo] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

SET ANSI_PADDING OFF
GO

ALTER TABLE [dbo].[Employee]  WITH CHECK ADD  CONSTRAINT [FK_Employee_Department] FOREIGN KEY([DeptNo])
REFERENCES [dbo].[Department] ([DeptNo])
GO

ALTER TABLE [dbo].[Employee] CHECK CONSTRAINT [FK_Employee_Department]
GO

USE [Company]
GO

/****** Object:  View [dbo].[DeptEmp]    Script Date: 1/6/2016 11:23:35 AM ******/
SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE VIEW [dbo].[DeptEmp]
AS
SELECT dbo.Employee.EmpNo, dbo.Employee.EmpName, dbo.Employee.Salary, dbo.Department.Dname, dbo.Department.Location
FROM     dbo.Department INNER JOIN
                  dbo.Employee ON dbo.Department.DeptNo = dbo.Employee.DeptNo

GO 

This SQL Server Company database contains Department and Employee tables and a DeptEmp View. This view shows data based on a JOIN of two tables. Insert some test data in Department and Employee tables. Once the Select Query in this View is processed, the data can be viewed as shown here.

view-execution

As you can see, we have 40 rows. Here each row from the above result can be mapped with a JSON document in Document DB. We need to export data from SQL server, as well as shape our hierarchical JSON document. In our case, we will use the Department Name (Dname) as document and other columns like EmpName, Salary and Location as hierarchical properties of each document. To implement this we will define the shape using the following script.

select cast(EmpNo as varchar) as id,
        EmpName as employeeName,
        Salary as salary,
        Location as workLocation
from DeptEmp
where Dname = 'HRD'

Since the JSON document has the id property as string, we are casting the EmpNo to id as string.

The result returned will be as shown in the following image:

share-data-view

Number of rows returned are 9.

Note: Feel free to use the Adventure Works Database which can be downloaded from this link.

Step 3: Unzip the Data Migration Tool to reveal two executable files

· dt.ext - command line utility.

· dtui.exe - UI which will provide the wizard for migration.

Step 4: Start the tool, click ‘Next’ on the Welcome page. On this page, we need to select the Source Information. The Import from dropdown shows all available option. Select SQL from the dropdown list. You should now see the Connection String textbox. Enter the connection string. Click on the verify button to verify the connection string. Select Enter Query radio button and enter the query in it as shown in the following image

dmt-sql-details

Click on Next.

This is the Target Information window where we need to set the information of the DocumentDB target. Here Export to drop down shows options to export to DocumentDB or JSON file. This option has the following parts:

- DocumentDB Bulk import - Here the migration tool creates a stored procedure, which is called by the migration tool to import data in batches. Once the import is completed, the stored procedure is released.

- DocumentDB Sequential Record import - The data from the data source is imported record-by-record.

- JSON File - This export contains array of JSON documents in JSON file. The data from the source is exported to this file to store data in JSON form.

Select DocumentDB Bulk Import. Now we need to enter the DocumentDB Connection string. This string has a unique format and we need to copy it from the Azure portal. This is a connection point to the DocumentDB Account, so we need to manually add the database name to the connection string.

In the portal for the DocumentDB account panel, click on ‘key’ icon as shown in the following image:

dcoumentdb-account-key

This will open the Keys panel, where we have the following:

· URI - Endpoint URL of the account to be connected

· PRIMARY KEY - authorization key used by the client application to connect to DocumentDB using REST and .NET, JavaScript, Node.js client SDKs. This is like UserName and Password for the account.

· SECONDARY KEY

· PRIMARY CONNECTION STRING - includes the Endpoint URL and authentication information. We need the primary connection string in our case.

· SECONDARY CONNECTION STRING

The following image shows the Keys panel:

keys

Copy the PRIMARY CONNECTION STRING and paste it in the Connection String textbox of the migration tool. Here if we click the Verify button, we will get an error because the database name is not specified. Add the Database name at the end of the connection string after (;) e.g. Database=DeptEmpInfoDB. If the verify button is clicked, a successful connection message will be displayed. In the Collection textbox, enter the collection name. Here we can enter a name as per our choice. If it is not already present in the DocumentDB database, it will be created. Add a name and click on the Add button. The Partition Key textbox is needed if we want to put the exported data into multiple collections. The Collection Pricing Tier can be chosen as per our need. We have S1, S2, S3, and S1 is the cheapest. The Id Field textbox can be set with the property from the source to be used as id property, but in our case, we have already set the id property in the query. The following image shows the data entered so far

migration-tool-export-part1

Click on the Advanced Options. From here we can select the Indexing policy. This policy is based on the Where, Order By clause in the query. Right-click on the Enter Indexing Policy textbox and select the Range option as shown in the following image

advanced-options-part1

Click on ‘Next’. In the Advanced page, click again on ‘Next’ to see a summary. In the Summary page, click on the Import button. This will start the data importing process as shown in the following image

import

Once the import is complete, the following result will be displayed:

import-res

Step 5: Go back to the portal and in the DeptEmpInfoDB database, you will find that the EmpDeptCollection collection is created with the exported data in the document explorer.

import-res-portal

We can see that 9 records were imported from SQL Server database. Click on any Document record (e.g. 101) and a JSON document will be displayed as shown in the following image.

import-res-final

And that’s how it is done. FYI, SQL Server 2016 has built-in JSON support.

Conclusion

Moving from a Relational database to NoSQL is a difficult decision to make and involves many decision points. But once you have decided, tools like the DocumentDB Data Migration tool, comes in handy to export-import data from any data source to DocumentDB.

Download the entire source code from our GitHub Repository at bit.ly/dncm24-sql-nosql

Was this article worth reading? Share it with fellow developers too. Thanks!
Share on LinkedIn
Share on Google+
Further Reading - Articles You May Like!
Author
Mahesh Sabnis is a DotNetCurry author and Microsoft MVP having over 17 years of experience in IT education and development. He is a Microsoft Certified Trainer (MCT) since 2005 and has conducted various Corporate Training programs for .NET Technologies (all versions). Follow him on twitter @maheshdotnet


Page copy protected against web site content infringement 	by Copyscape




Feedback - Leave us some adulation, criticism and everything in between!