如何在SQL Server 2017中实现图形数据库-白红宇

如何在SQL Server 2017中实现图形数据库

阅读量：2509 次

发布时间：2019-05-11

本文共 14347 字，大约阅读时间需要 47 分钟。

介绍 (Introduction)

Graph database

图形数据库

A graph database is a type of database whose concept is based on nodes and edges.

图数据库是一种数据库，其概念基于节点和边。

Graph databases are based on graph theory (a graph is a diagram of points and lines connected to the ). Nodes represent data or entity and edges represent connections between nodes. Edges own properties that can be related to nodes. This capability allows us to show more complex and deep interactions between our data. Now, to explain this interaction we will show it in a simple diagram

图形数据库基于图形理论（图形是点和连接到这些线的图）。节点表示数据或实体，边表示节点之间的连接。边拥有可以与节点相关的属性。此功能使我们能够显示数据之间更复杂，更深入的交互。现在，为了解释这种相互作用，我们将在一个简单的图中显示它

The diagram above shows the basic model of the graph database concept.

上图显示了图形数据库概念的基本模型。

The nodes are Andera, Bob and Camila and Follows (edges) provide connections between nodes. This database model cannot be treated as an alternative to a relational database model but faced with some specific problems the graph database model can be alternative and effective.

节点是Andera，Bob和Camila，而Follows（边）提供节点之间的连接。该数据库模型不能被视为关系数据库模型的替代方案，但是面对某些特定问题，图形数据库模型可能是替代方案并且有效。

If you look at the diagram closely, maybe you can design this data model in a relational database by joins but imagine that if you have a lot of nodes and edges then how many joins will you need? And, another consideration could be how this design would perform? For this reason, when handling some business problems we need a graph database.

如果仔细观察该图，也许可以通过联接在关系数据库中设计此数据模型，但是可以想象，如果您有很多节点和边，那么您将需要多少个联接？并且，另一个考虑因素可能是该设计如何执行？因此，在处理一些业务问题时，我们需要一个图形数据库。

In the context of social media, for example, there are a lot of social actions like connect, follow etc. and each social action creates a mark. When we combine these marks, it looks like a spider’s web. The graph database model is ideally suited to store this type of data.

例如，在社交媒体的环境中，有很多社交行为，例如连接，关注等，每个社交行为都会创造一个标记。当我们结合这些标记时，它看起来就像是蜘蛛网。图形数据库模型非常适合存储此类数据。

SQL Server 2017 and graph database

SQL Server 2017和图形数据库

Microsoft announced graph database in SQL Server 2017. This feature allows us to create graph data models. SQL Server 2017 and graph database architecture contains two types of tables. They include the node table and edge table.

Microsoft在SQL Server 2017中发布了图形数据库。此功能使我们可以创建图形数据模型。 SQL Server 2017和图数据库体系结构包含两种类型的表。它们包括节点表和边缘表。

We can demonstrate it with a diagram.

我们可以用图来演示它。

NODE TABLE: Node table defines entity in a graph model.

节点表：节点表定义图模型中的实体。

DROP TABLE IF EXISTS UsersCREATE TABLE Users (ID INTEGER PRIMARY KEY, NickName VARCHAR(100)) AS NODE;INSERT INTO UsersVALUES (1,'Andera'),(2,'Bob'),(3,'Camila') SELECT * FROM Users

$NODE_ID: It is an important column in a node table. When a node table is created, this calculated field is automatically generated by the SQL engine. This field describes a given node uniquely. After we create the objects, we will look at the objects through the object explorer in Management Studio. You will see a new folder named as Graph Tables. This folder contains all graph tables.

$ NODE_ID：这是节点表中的重要列。创建节点表后，此计算字段由SQL引擎自动生成。该字段唯一地描述给定节点。创建对象之后，我们将通过Management Studio中的对象资源管理器查看对象。您将看到一个名为Graph Tables的新文件夹。此文件夹包含所有图形表。

SQL Server adds a GUID to the end of $NODE_ID column’s name but we can also use this column without GUID extension (pseudo-column). If we do not create a unique constraint or index on $NODE_ID column, the SQL engine automatically creates unique, non-clustered indexes when the node table is created. This guarantees the uniqueness of $NODE_ID column.

SQL Server在$ NODE_ID列名的末尾添加了一个GUID，但是我们也可以使用没有GUID扩展名（伪列）的该列。如果我们不在$ NODE_ID列上创建唯一约束或索引，则在创建节点表时，SQL引擎会自动创建唯一的非聚集索引。这样可以保证$ NODE_ID列的唯一性。

EDGE TABLE: An edge table defines connection between node table entities

EDGE TABLE：边缘表定义节点表实体之间的连接

DROP TABLE IF EXISTS FlowInfoCREATE TABLE FlowInfo  AS EDGESELECT * FROM FlowInfo

When we create an edge table, the SQL engine creates three implicit columns.

当我们创建边缘表时，SQL引擎将创建三个隐式列。

$EDGE_ID: It defines unique edge in edge table. For this reason, the SQL engine automatically creates a unique non clustered index

$ EDGE_ID：它在边表中定义唯一边。因此，SQL引擎会自动创建唯一的非聚集索引

$FROM_ID: It defines the starting point for the entity of edge.

$ FROM_ID：它定义的边缘实体的起点。

$TO_ID: It defines the end point for the entity of edge.

$ TO_ID：它定义了边缘实体的终点。

Now we will define an edge connection to the edge table.

现在，我们将定义到边表的边连接。

The insert statement for the step when Bob follows Andera is

Bob跟随Andera时的步骤的插入语句为

INSERT INTO FlowInfo ($from_id ,$to_id )VALUES ((SELECT $node_id from Users where ID=2),(SELECT $node_id from Users where ID=1))

The insert statement for the step when Camila follows Andera is

Camila跟随Andera时的步骤的插入语句为

INSERT INTO FlowInfo ($from_id ,$to_id )VALUES ((SELECT $node_id from Users where ID=3),(SELECT $node_id from Users where ID=1))

The insert statement for the step when Camila follows Bob is

Camila跟随Bob的步骤的插入语句为

INSERT INTO FlowInfo ($from_id ,$to_id )VALUES ((SELECT $node_id from Users where ID=3),(SELECT $node_id from Users where ID=2))select * from FlowInfo

The following query explains Camila’s connections

以下查询说明了Camila的连接

SELECT UsersFol.NickName FROM Users Users, FlowInfo, Users UsersFolWHERE MATCH(Users-(FlowInfo)->UsersFol)AND Users.NickName = 'Camila';

In this query we saw some new T-SQL syntax “MATCH”, “-“, “->”.

在此查询中，我们看到了一些新的T-SQL语法“ MATCH”，“-”，“->”。

“-“sign represents $FROM_ID and “->” sign represents $TO_ID field on edge table.

“-”符号表示$ FROM_ID，“->”符号表示边缘表上的$ TO_ID字段。

Have a look at the execution plan of this query

看一下这个查询的执行计划

There is a table scan in the execution plan of this query because there isn’t any index in the $FROM_ID and $TO_ID columns in FlowInfo edge table. We will create a unique non clustered index in these columns and we will look at the execution plan again

此查询的执行计划中有表扫描，因为FlowInfo边缘表的$ FROM_ID和$ TO_ID列中没有任何索引。我们将在这些列中创建唯一的非聚集索引，然后再次查看执行计划

CREATE UNIQUE NONCLUSTERED INDEX IX_IndexEdge1 ON [dbo].[FlowInfo](	$from_id,$to_id)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

After adding an index and then avoiding table scans from the key lookup and nested loop we can see its performance. In case of having a heavy read request on a graph model, we have to add an index.

添加索引然后避免从键查找和嵌套循环进行表扫描后，我们可以看到其性能。如果对图形模型有大量读取请求，则必须添加索引。

Now, we will create a little bit more complex example using the graph database model. Imagine that we have an online book application and customers. Customers can read books online and can connect with other customers. And while using the application we want to show a pop-up to our customers that if your connections like this author or authors

现在，我们将使用图数据库模型来创建一些更复杂的示例。想象一下，我们有一个在线图书申请和客户。客户可以在线阅读书籍，并可以与其他客户建立联系。在使用该应用程序时，我们希望向客户显示一个弹出窗口，表明如果您的联系人喜欢这个作者或其他作者，

After this, we will create our graph database model objects

之后，我们将创建我们的图形数据库模型对象

DROP TABLE IF EXISTS CustomerCREATE TABLE Customer (ID INT PRIMARY KEY IDENTITY(1,1), CustName VARCHAR(100)) AS NODEINSERT INTO Customer VALUES('James'),('Brian'),('Jason'),('Edward') DROP TABLE IF EXISTS AuthorCREATE TABLE Author (ID INT PRIMARY KEY IDENTITY(1,1), AuthorName VARCHAR(100)) AS NODEINSERT INTO Author VALUES('William Shakespeare'),('William Golding'),('Fyodor Dostoyevsky'),('Kathryn Stockett') DROP TABLE IF EXISTS BooksCREATE TABLE Books (ID INT PRIMARY KEY IDENTITY(1,1), BookName VARCHAR(100)) AS NODEINSERT INTO Books VALUES('Romeo and Juliet'),('Lord of the Flies'),('Crime and Punishment'),('The Help')

In the next step, edge tables will be created and connections between nodes are defined

在下一步中，将创建边缘表并定义节点之间的连接

DROP TABLE IF EXISTS CustomerConnect  CREATE TABLE CustomerConnect  AS EDGE --James connects JasonINSERT INTO CustomerConnect VALUES ((SELECT $node_id from Customer where ID=1),(SELECT $node_id from Customer where ID=3))--Brian connects JasonINSERT INTO CustomerConnect VALUES ((SELECT $node_id from Customer where ID=2),(SELECT $node_id from Customer where ID=3))--Brian connects EdwardINSERT INTO CustomerConnect VALUES ((SELECT $node_id from Customer where ID=2),(SELECT $node_id from Customer where ID=4)) DROP TABLE IF EXISTS CustomerLikeAuthorCREATE TABLE CustomerLikeAuthor AS EDGE --James Likes William Shakespeare INSERT INTO CustomerLikeAuthor VALUES ((SELECT $node_id from Customer where ID=1),(SELECT $node_id from Author where ID=1))--Edward Likes Kathryn StockettINSERT INTO CustomerLikeAuthor VALUES ((SELECT $node_id from Customer where ID=4),(SELECT $node_id from Author where ID=4)) DROP TABLE IF EXISTS CustomerLikeBooksCREATE TABLE CustomerLikeBooks AS EDGE--James Likes Crime and PunishmentINSERT INTO CustomerLikeBooks VALUES ((SELECT $node_id from Customer where ID=1),(SELECT $node_id from Books  where ID=3))--Edward likes The HelpINSERT INTO CustomerLikeAuthor VALUES ((SELECT $node_id from Customer where ID=4),(SELECT $node_id from Books where ID=4))  DROP TABLE IF EXISTS AuthorWriteBooks CREATE TABLE AuthorWriteBooks AS EDGEINSERT INTO AuthorWriteBooks VALUES ((SELECT $node_id from Books where ID=1),(SELECT $node_id from Author where ID=1)) INSERT INTO AuthorWriteBooks VALUES ((SELECT $node_id from Books where ID=2),(SELECT $node_id from Author where ID=2)) INSERT INTO AuthorWriteBooks VALUES ((SELECT $node_id from Books where ID=3),(SELECT $node_id from Author where ID=3)) INSERT INTO AuthorWriteBooks VALUES ((SELECT $node_id from Books where ID=4),(SELECT $node_id from Author where ID=4))

This query shows the book with its author’s name in the adjacent column to it

此查询在书的相邻列中显示该书及其作者的名字

SELECT Books.BookName ,Author.AuthorNameFROM Author,AuthorWriteBooks,BooksWHERE MATCH(Books-(AuthorWriteBooks)->Author)

This query gives an answer to our question mentioned above

该查询为上述问题提供了答案

SELECT CustTo.CustName ,CustFrom.CustName AS CustomerConnectionName,Author.AuthorName FROMCustomer CustTo, CustomerConnect CustCon ,Customer CustFrom ,CustomerLikeAuthor ,Authorwhere MATCH (CustTo-(CustCon)->CustFrom-(CustomerLikeAuthor)->Author)and CustTo.CustName='Brian'

Graph database model or relational database model

图数据库模型或关系数据库模型

The major difference between these two database models is how they define relations within your data. In relational database model we can create relationships with primary or foreign keys but, on the other hand, in a graph database model we define connections (edges) and we can add properties to these connections (edges). Let us differentiate it with an example.

这两个数据库模型之间的主要区别在于它们如何定义数据中的关系。在关系数据库模型中，我们可以使用主键或外键创建关系，但另一方面，在图数据库模型中，我们定义连接（边），并且可以向这些连接（边）添加属性。让我们用一个例子来区分它。

Imagine that our customers rank books. We want to store this rank point of books. If we create this model in relational model it will look like:

想象一下，我们的客户对书籍进行排名。我们要存储书籍的这一点。如果我们在关系模型中创建此模型，它将看起来像：

Now, as in case of graph model

现在，就像图模型一样

Now we will create node and edge tables. The key point here is that we will create our edge table with a rank property.

现在，我们将创建节点表和边表。这里的关键点是，我们将使用rank属性创建边表。

The node table is defined as follows:

节点表定义如下：

DROP TABLE IF EXISTS CustomerCREATE TABLE Customer (ID INT PRIMARY KEY IDENTITY(1,1), CustName VARCHAR(100)) AS NODEINSERT INTO Customer VALUES('James'),('Brian'),('Jason'),('Edward')  DROP TABLE IF EXISTS BooksCREATE TABLE Books (ID INT PRIMARY KEY IDENTITY(1,1), BookName VARCHAR(100)) AS NODEINSERT INTO Books VALUES('Romeo and Juliet'),('Lord of the Flies'),('Crime and Punishment'),('The Help')

The edge table with rank property is defined as follows

具有rank属性的边表定义如下

CREATE TABLE CustomerBookRate(ID INT PRIMARY KEY IDENTITY(1,1),RankAmount SMALLINT) AS EDGESELECT * FROM CustomerBookRate

Firstly, we will insert data in the edge table with connections and rank amount:

首先，我们将在边表中插入具有连接和等级数量的数据：

“Brian” gives “Lord of the Flies” “5” points.

“布莱恩”给出“苍蝇之王”“ 5”分。

INSERT INTO CustomerBookRateVALUES((SELECT $node_id from Customer   where ID =2) ,(SELECT $node_id from Books  where ID =2),5)

After this, we will generate some dummy data

之后，我们将生成一些虚拟数据

INSERT INTO CustomerBookRateVALUES((SELECT $node_id from Customer   where ID =1) ,(SELECT $node_id from Books  where ID =2),3) INSERT INTO CustomerBookRateVALUES((SELECT $node_id from Customer   where ID =1) ,(SELECT $node_id from Books  where ID =4),3) INSERT INTO CustomerBookRateVALUES((SELECT $node_id from Customer   where ID =1) ,(SELECT $node_id from Books  where ID =1),2) INSERT INTO CustomerBookRateVALUES((SELECT $node_id from Customer   where ID =4) ,(SELECT $node_id from Books  where ID =1),2) select * from CustomerBookRate

These queries will show us which customer likes which book and which book is highly ranked

这些查询将向我们显示哪些客户喜欢哪本书以及哪本书排名很高

select Customer.CustName,Books.BookName,CustomerBookRate.RankAmount from Customer,CustomerBookRate,Bookswhere MATCH (Customer-(CustomerBookRate)->Books)

select Books.BookName,sum(CustomerBookRate.RankAmount) from Customer,CustomerBookRate,Bookswhere MATCH (Customer-(CustomerBookRate)->Books)group by Books.BookName

Another major difference is that the graph database model happens to give better performance in heavy connections. Like, in some business problems, the application needs a complex hierarchy. A graph database would be a compelling option in that case because graph database offers better performance and simple data modeling.

另一个主要区别是，图数据库模型恰好在重连接中提供了更好的性能。就像在某些业务问题中一样，应用程序需要复杂的层次结构。在这种情况下，图形数据库将是一个引人注目的选择，因为图形数据库提供了更好的性能和简单的数据建模。

There is the possibility of finding other differences as well but generally these two topics are discussed.

也有可能发现其他差异，但通常会讨论这两个主题。

结论 (Conclusions)

In this article, we discussed graph database and SQL Server 2017 graph database features. SQL Server graph database is a fantastic feature. We can implement both graph database and relational database models in the same database engine. This hybrid architecture allows us to use SQL Server engine capabilities with a graph database. T-SQL syntax support graph database queries. Graph database does some limitations notwithstanding these limitations there are many exceptional features in SQL Server 2017, that make it a compelling technology to consider.

在本文中，我们讨论了图数据库和SQL Server 2017图数据库功能。 SQL Server图形数据库是一个很棒的功能。我们可以在同一数据库引擎中实现图形数据库模型和关系数据库模型。这种混合体系结构使我们可以将SQL Server引擎功能与图形数据库一起使用。 T-SQL语法支持图形数据库查询。尽管存在这些限制，但Graph数据库还是有一些限制，因为SQL Server 2017中有许多例外功能，这使其成为值得考虑的引人注目的技术。