Be it social networks, transportation and flight networks, financial traffic routes, fraud detection to website traffic analysis or predicting the likelihood to purchase; Graphs are ubiquitous.
Graph Databases are emerging as a strong technology to solve problems in these areas, where relationships are the first-class entities for the application. These workloads comprise of highly connected data and one must navigate these data points to generate results like recommendations, fraud detections, shortest paths or predicting the best time to run a deal on a retail website.
Are you keeping up with new developer technologies? Advance your IT career with our Free Developer magazines covering C#, Patterns, .NET Core, MVC, Azure, Angular, React, and more. Subscribe to the DotNetCurry (DNC) Magazine for FREE and download all previous, current and upcoming editions.
But, do you need to install a new graph database product to solve these problems?
In this article, we will look at the graph database features introduced with SQL Server 2017 and Azure SQL Database. We will discuss the benefits of using SQL Server Graph, typical graph database use-cases and how you can start using the graph features in your application today.
Editorial Note: If you haven’t already, download SQL Server 2017 or start with a Free Trial of Azure SQL Database subscription to explore the concepts explained in this tutorial.
What is a graph database?
A graph database is a collection of nodes (or vertices) and edges (or relationships). A node represents an entity (for example, a person or an organization) and an edge represents a relationship between the two nodes that it connects (for example, likes or friends).
Both nodes and edges may have properties associated with them.
One may ask, if a graph database is also a collection of entities and relationships, then how is it different from a relational database?
Graph Database vs Relational Database
Here are some features that graph databases offer, which make them unique:
- Unlike a relational database, where foreign keys represent one-to-many relationships, an edge in a graph database can represent many-to-many relationships. An edge can also be heterogeneous in nature, that is, you can connect different type of nodes with each other using the same edge. For example, in the property graph shown above, you can connect a ‘Person’ to a ‘Project’ or a ‘Campaign’ via the works_on edge.
- Relationships become first class entities in a graph database and may or may not have any properties associated to them. For example, in the graph shown in Figure 1, a ‘Person’ works_for a ‘Company’ and you can store the StartDate (the date since when the employee started working for the company) as an attribute of the works_for relationship.
- Graph databases provides you a query language which allows you to express pattern matching and graph traversal queries easily.
- You can heterogeneously connect to various data points and express arbitrary length traversal queries easily.
What is SQL Graph?
SQL Server 2017 and Azure SQL Database now support graph data processing.
SQL Server implements a property graph model and can natively store nodes and edges. The T-SQL extensions also let you write join-free pattern matching and traversal queries for your graph database. In this section, we will review the features that were introduced with SQL Server 2017 and are also available on Azure SQL Database.
Creating a Node or Edge table
Transact-SQL (T-SQL) extensions allow users to create node or edge tables. Both nodes and edges can have properties associated to them.
Since, nodes and edges are stored as tables, all the operations that are supported on relational tables, are also supported on a node or edge table.
Additionally, all types of constraints and indexes are also supported. That is, along with edges, you can also create foreign key relationships between your node and edge tables. Clustered and non-clustered Columnstore indexes are also supported, which can be very useful in analytics-heavy graph workloads.
The following is an example of the new DDL that has been introduced. Note the two new keywords 'AS NODE’ and 'AS EDGE'.
-- DDL for creating node and edge tables.
CREATE TABLE Person (ID INTEGER PRIMARY KEY, Name VARCHAR(100), Age INT) AS NODE;
CREATE TABLE works_for (StartDate DATE, EmploymentStatus VARCHAR(100)) AS EDGE;
For every NODE table, the engine adds one implicit column $node_id to the table, which uniquely identifies each node in the database. Values for this column are automatically generated by the engine every time a row is inserted into the edge table.
For every edge table, the engine adds three implicit columns $edge_id, $from_id and $to_id to the table.
$edge_id uniquely identifies every edge in the database and just like $node_id, values for $edge_id are automatically generated. The $from_id and $to_id implicit columns hold the $node_ids of the nodes that a given edge is connecting to. Users are expected to insert $node_id values into the $from_id and $to_id columns to identify the nodes that the edge is connecting.
Check out some SQL Graph insert examples and best practices.
Traversing or querying the graph
The new MATCH built-in function is introduced to support pattern matching and join-free multi-hop navigation or traversal through the graph.
MATCH uses ASCII-art style syntax for pattern matching. For example, if a user wants to find the names of the people who work for Microsoft, they can write the following query.
-- Find the people who work for Microsoft
FROM Person, works_for, Company
AND Company.Name = 'Microsoft';
In a MATCH pattern, you go from one node to another via an edge. Entities appearing at the two ends of an arrow are nodes and the one appearing inside parenthesis is an edge. The direction of the arrow in MATCH corresponds to the direction of the edge.
Note: Edges in SQL Graph are directed and they always go from one node to another node.
Native to the SQL Server engine
Graph extensions are fully integrated into the SQL Server engine. It uses the same SQL Server storage engine, metadata, and query processor to store and query graph data. This enables users to query across their graph and relational data in a single query.
Users can also benefit from combining graph capabilities with other SQL Server technologies like Columnstore, Availability Groups, R services, and more.
Tools and ecosystem
Users benefit from SQL Server’s existing tools ecosystem. Tools like backup and restore, import and export, BCP and more work out of the box. Other tools or services like SSIS, SSRS or Power BI work with graph tables, just the way they work with relational tables.
Benefits of using SQL Server Graph
1. Relational and Graph on a single platform
SQL Server now provides you the best of both relational and graph databases on a single platform.
Customers do not have to turn to a different product, deal with multiple licensing costs or perform complex ETL operations to take data out of one system and move it to another system for analysis. A single platform can now hold both the relational and graph data.
You can also query across your graph and relational data in the same query.
2. T-SQL extensions for graph traversal queries
The T-SQL extensions let you create and traverse the graph using a known language and in a known environment. You do not need to learn a new language, tools or ecosystem just to process different kinds of data. Your application, which is already talking to the relational database, can query the graph database using the same set of tools.
3. Performance and Scale
As per the TPC-H 10TB benchmark result by Hewlett Packard Enterprise (HPE), SQL Server attained a new world record in database performance for both analytical and transactional workloads? You get the same performance and scale benefits when you use the graph features in SQL Server.
4. High Availability
SQL Server can be easily configured for high availability. On Azure SQL Database, you can ensure high availability with three hot replicas and built-in automatic failover that guarantees a 99.99% availability SLA. You can accelerate recovery from catastrophic failures and regional outages to an RPO of less than five seconds with active-geo replication.
5. Security and Compliance
A defense-in-depth strategy, with overlapping layers of security, is the best way to counter security threats.
SQL Server provides a security architecture that is designed to allow database administrators and developers to create secure database applications and counter threats.
SQL Database helps you build security-enhanced apps in the cloud by providing advanced built-in protection and security features that dynamically mask sensitive data and encrypt it at rest and in motion.
With physical and operational security, SQL Database helps you meet the most stringent regulatory compliances, such as ISO/IEC 27001/27002, Fed RAMP/FISMA, SOC, HIPPA, and PCI DSS.
6. Fully managed on Azure SQL Database
Azure SQL Database is intelligent, fully-managed cloud database service built for developers.
You can accelerate your graph database application development and make the maintenance easy using the SQL tools that you already know. You can also take advantage of built-in intelligence that learns app patterns and adapts to maximize performance, reliability, and data protection.
When to use SQL Graph?
Graph databases shine when relationships between various data points are equally or more important than the actual data itself.
For example, an online retailer may decide to run a marketing campaign based on the connections that people share with other people, locations or weather conditions. Or a bank may study the links between various account holders to identify fraud rings formed by these account holders.
Up until few years ago, graph databases were considered suitable for networking problems, like social networks, transportation networks, and computer networks. But, recently with the advent of big data and evolving modern applications, graph databases are garnering lot of attention in traditionally relational applications like banking, insurance, CRM and more. Organizations running these type of applications need access to both relational and graph databases.
SQL Graph helps keep your graph and relational data on a single platform.
In this section we will discuss some scenarios where graph databases can be useful.
Graph Database Examples
Social Networks are inherently graphs. It is only natural to model them as a graph.
Graph databases not only allow modeling of social networks as graphs, they also let you traverse and query the graph in an intuitive way.
There are many modern applications like online dating, job websites and social media marketing, which depend largely on the connections that people share with each other for better data analysis or to provide services to their customers like providing recommendations. Graph databases make it easy to answer questions like “find me all my friends who are 2-5 hops away from me and who live in Seattle”.
Insurance and banking companies lose millions of dollars to fraud every year. SQL Graph can make it easier to identify fraud transactions that connect back to a stolen device or credit card.
Using SQL Graph, you can also more easily identify fraud rings formed by a group of people who do not share direct connections with each other, but do share some common information like address, phone or SSN in their account details.
SQL Graph can be easily used to generate recommendations.
For example, an airline company may want to generate flight recommendations for a customer booking tickets to a destination, based on “people who travelled to this destination also travelled to these other destinations”, they can use SQL Graph and simply develop a recommendation engine for such queries.
The graph queries are not only easy to understand, but also very easy to maintain and extend to meet the requirements of an evolving application. Here are some recommendation engine examples:
CRM software helps with managing the relationships between sales people, accounts and opportunities. The schema can be modeled as a graph to uncover complex relationships between various entities.
SQL Graph can help with analyzing highly interconnected data and understand the correlation between different entities in the database, for example, “which campaign was most effective in converting opportunities into accounts”.
SQL Graph can also help the sales people find the shortest path that exists between them and another sales person in the organization who can help them run a campaign for an account that they work with.
Identity access management is hierarchical in nature and hierarchies can be easily implemented in a relational database. However, with the increase in complex access management rules in the modern organizations, modelling identity access management rules as a perfect tree structure is not enough.
Often times, finding answers to simple questions like, “find users who have access to a given security group” requires recursively traversing multiple levels of access rules. This is done by going deep into the multiple levels and traversing the complex connections that exist between users and access roles.
Graph databases help model this type of information more intuitively and can make deep traversal queries much easier to write.
How can I get started with SQL Graph?
To get started with SQL Graph, download SQL Server 2017 or start with a Free Trial of Azure SQL Database subscription.
Here are some resources that will help you in getting started:
Graph databases are garnering a lot of interest across many industry domains.
If you are already using SQL Server to store your relational data, there is no need to turn to a new product for your graph data requirements. SQL Server 2017 and Azure SQL Database can help you solve your graph problems easily with the help of new features and T-SQL extensions.
You get the same performance, security and data durability guarantees on your data as you get with the relational data.
This article was technically reviewed by Suprotim Agarwal.