Database Normalize 数据库正则化

Database normalization is the process of structuring a relational database in accordance with a series of so-called normal forms in order to reduce data redundancy and improve data integrity. It was first proposed by Edgar F. Codd as an integral part of his relational model. Normalization entails organizing the columns (attributes) and tables (relations) of a database to ensure that their dependencies are properly enforced by database integrity constraints. It is accomplished by applying some formal rules either by a process of synthesis (creating a new database design) or decomposition (improving an existing database design).

数据库规范化，又称正规化、标准化，是数据库设计的一系列原理和技术，以减少数据库中数据冗余，增进数据的一致性。关系模型的发明者埃德加·科德最早提出这一概念，并于1970年代初定义了第一范式、第二范式和第三范式的概念，还与Raymond F. Boyce于1974年共同定义了第三范式的改进范式——BC范式。除外还包括针对多值依赖的第四范式，连接依赖的第五范式、DK范式和第六范式。

Concepts

Superkey 超键

A superkey or super-key is defined in the relational model of database organization as a set of attributes of a relation variable for which it holds that in all relations assigned to that variable, there are no two distinct tuples (rows) that have the same values for the attributes in this set.[1] It can be defined as a set of attributes of a relation schema upon which all attributes of the schema are functionally dependent.

在关系模式中能够唯一标示多元组（即“行”）的属性集

Candidate Key 候选键

A minimal superkey for that relation; that is, a set of attributes such that:

the relation does not have two distinct tuples (i.e. rows or records in common database language) with the same values for these attributes (which means that the set of attributes is a superkey)
there is no proper subset of these attributes for which (1) holds (which means that the set is minimal).

超键且不存在真子集亦为超键

Functionally Dependent 函数依赖

Given a relation R, a set of attributes X in R is said to functionally determine another set of attributes Y, also in R, (written $X → Y$ ) if, and only if, each X value in R is associated with precisely one Y value in R; R is then said to satisfy the functional dependency $X → Y$ . Equivalently, the projection $\Pi _{X,Y}R$ is a function, i.e. Y is a function of X. In simple words, if the values for the X attributes are known (say they are x), then the values for the Y attributes corresponding to x can be determined by looking them up in any tuple of R containing x. Customarily X is called the determinant set and Y the dependent set. A functional dependency FD: $X → Y$ is called trivial if Y is a subset of X.

在关系R中，X的值可以确定Y的值，称Y函数依赖X，记作 $X \rightarrow Y$

Fully Functionally Dependent 完全函数依赖

Partial Functionally Dependent 部分函数依赖

Delivery Functionally Dependent 传递函数依赖

Multivalued Dependency 多值依赖

if we denote by $(x,y,z)$ the tuple having values for $\alpha, \beta, R-\alpha -\beta$ collectively equal to $x, y, z$ , correspondingly, then whenever the tuples $(a,b,c)$ and $(a,d,e)$ exist in {\displaystyle r} r, the tuples {\displaystyle (a,b,e)} (a,b,e) and $(a,d,c)$ should also exist in $r$ .

对于关系 $R$ 的一组分解 $(A,B,C)$ ，对于给定的 $a \in A$ ，存在 $C^* \subseteq C$ ，对于任意 $b \in B$ ，值域 $C_(a,b)=C^*$ ，称C多值依赖A，记作 $X \twoheadrightarrow Y$

Join Dependency 连接依赖

A table $T$ is subject to a join dependency if $T$ can always be recreated by joining multiple tables each having a subset of the attributes of $T$ . If one of the tables in the join has all the attributes of the table $T$ , the join dependency is called trivial.

对于关系 $R$ 的一组分解 $(A_1, A_2, ... A_n)$ ，可以通过连接重建关系 $R$ ，称 $R$ 有连接依赖 $*(A_1, A_2, ... A_n)$

1NF

Definition

The domain of each attribute contains only atomic (indivisible) values, and the value of each attribute contains only a single value from that domain.

所有列的值域都是由原子值组成；所有字段的值都只能是单一值。

Sample

Name	Type	Phone
Dean	CHD GUI	1801234567
Ashkan	CHD Server	9962653205,86010123
Jason	D4C GUI	1807654321
David	D4C Server	9962653208,86021230

Defects

Find all GUI team members or find all CHD team members
Dean get a new phone number or Ashkan need to remove a phone number

With 1NF

Click to show

Name	Product	Type
Dean	CHD	GUI
Ashkan	CHD	Server
Jason	D4C	GUI
David	D4C	Server

Name	Phone
Dean	1801234567
Ashkan	9962653205
Ashkan	86010123
Jason	1807654321
David	9962653208
David	86021230

2NF

Definition

Any non-prime attribute that is functionally dependent on any proper subset of any candidate key of the relation. A non-prime attribute of a relation is an attribute that is not a part of any candidate key of the relation.

任意非键字段都完全依赖每个候选键

Sample

Name	Location	Team	Manager	Type	ProjectName
WenZhe	SZ	PLT	Liang	SharkTank	FindCode
XiaoLong	SJ	PM	George	SharkTank	STAR
WenZhe	SZ	PLT	Liang	ABC	Golang

Defects

Some data duplicate many times
WenZhe transfer to another team
If XiaoLong exit the project, after we delete the row. Where is XiaoLong? Who is his manager?

With 2NF

Click to show

Name	Location	Team	Manager
WenZhe	SZ	PLT	Liang
XiaoLong	SJ	PM	George

Name	Type	ProjectName
WenZhe	SharkTank	FindCode
XiaoLong	SharkTank	STAR
WenZhe	ABC	Golang

3NF

Definition

No non-prime (non-key) attribute is transitively dependent of any key i.e. no non-prime attribute depends on other non-prime attributes. All the non-prime attributes must depend on the primary key only.

不存在非键字段对其他非键字段的依赖

Sample

Name	Location	Team	Manager
WenZhe	SZ	PLT	Liang
ZhengPu	SZ	PLT	Liang
XiaoLong	SJ	PM	George

Defects

WenZhe transfer to another team
PLT transfer to another manager
Team without member has no manager

With 3NF

Click to show

Name	Location	Team
WenZhe	SZ	PLT
ZhengPu	SZ	PLT
XiaoLong	SJ	PM

Location	Team	Manager
SZ	PLT	Liang
SJ	PM	George

BCNF

Definition

For every dependency $X \rightarrow Y$ , one of the following conditions hold true:

$X \rightarrow Y$ is a trivial functional dependency (i.e., $Y$ is a subset of $X$ )
$X$ is a superkey for schema R

任意非平凡依赖 $X \rightarrow Y$ ， $X$ 是超键

4NF

Definition

For every one of its non-trivial multivalued dependencies $X \twoheadrightarrow Y$ , $X$ is a superkey—that is, $X$ is a super key.

任意非平凡多值依赖 $X \twoheadrightarrow Y$ ， $X$ 是超键

Sample

Restaurant	Food	Delivery Area
KFC	Burger	W.Trimble
KFC	Burger	E.Trimble
KFC	Rice	W.Trimble
KFC	Rice	E.Trimble
BK	Burger	W.Trimble

Suppose that for each area they have same food

Defects

KFC has new food: dumpling
KFC has new Delivery Area: N.Trimble

With 4NF

Click to show

Restaurant	Food
KFC	Burger
KFC	Rice
BK	Burger

Restaurant	Delivery Area
KFC	W.Trimble
KFC	E.Trimble
BK	W.Trimble

5NF

Definition

Every non-trivial join dependency in that table is implied by the candidate keys.

任意非平凡连接依赖 { $A_1, A_2, A_3 ... A_n$ }, $A_i$ 是超键

Sample

Restaurant	Food	Delivery Area
KFC	Burger	China
KFC	Burger	US
KFC	Rice	China
BK	Burger	US

Defects

KFC has new food dumpling that only Chinese eat it
KFC has new Delivery Area, North Korea, where need Rice but not Burger

With 5NF

Click to show

Restaurant	Food
KFC	Burger
KFC	Rice
BK	Burger

Restaurant	Delivery Area
KFC	China
KFC	US
BK	US

Food	Delivery Area
Burger	China
Burger	US
Rice	China

6NF

Definition

No nontrivial join dependencies at all.

不存在非平凡连接依赖

Summary

数据库规范化旨在减少数据库中数据冗余，保证一致性，防止异常操作。通过各级范式可以快速发现数据库设计中的缺陷。

然而也有另一种声音存在，即反正则化。例如有些场景下会放弃伸缩性来获取性能。

但是这并不意味着正则化存在缺陷，反而只有在深入理解了正则化后才能设计出兼顾逻辑与性能的优秀数据库。

Un-Normalize 看山是山
Normalize 看山不是山
De-Normalize (Not-Only-Normalize) 看山还是山(看山不只是山)