Database Normalize 数据库正则化
2019-06-27
Coding
Database
👋 ‍️‍️阅读
❤️ 喜欢
💬 评论

Database Normalize 数据库正则化

Database normalization is the process of structuring a relational database in accordance with a series of so-called normal forms in order to reduce data redundancy and improve data integrity. It was first proposed by Edgar F. Codd as an integral part of his relational model. Normalization entails organizing the columns (attributes) and tables (relations) of a database to ensure that their dependencies are properly enforced by database integrity constraints. It is accomplished by applying some formal rules either by a process of synthesis (creating a new database design) or decomposition (improving an existing database design).

数据库规范化,又称正规化、标准化,是数据库设计的一系列原理和技术,以减少数据库中数据冗余,增进数据的一致性。 关系模型的发明者埃德加·科德最早提出这一概念,并于1970年代初定义了第一范式、第二范式和第三范式的概念,还与Raymond F. Boyce于1974年共同定义了第三范式的改进范式——BC范式。 除外还包括针对多值依赖的第四范式,连接依赖的第五范式、DK范式和第六范式。

Concepts

Superkey 超键

A superkey or super-key is defined in the relational model of database organization as a set of attributes of a relation variable for which it holds that in all relations assigned to that variable, there are no two distinct tuples (rows) that have the same values for the attributes in this set.[1] It can be defined as a set of attributes of a relation schema upon which all attributes of the schema are functionally dependent.

在关系模式中能够唯一标示多元组(即“行”)的属性集

Candidate Key 候选键

A minimal superkey for that relation; that is, a set of attributes such that:

  • the relation does not have two distinct tuples (i.e. rows or records in common database language) with the same values for these attributes (which means that the set of attributes is a superkey)
  • there is no proper subset of these attributes for which (1) holds (which means that the set is minimal).

超键且不存在真子集亦为超键

Functionally Dependent 函数依赖

Given a relation R, a set of attributes X in R is said to functionally determine another set of attributes Y, also in R, (written XYX → Y) if, and only if, each X value in R is associated with precisely one Y value in R; R is then said to satisfy the functional dependency XYX → Y. Equivalently, the projection ΠX,YR\Pi _{X,Y}R is a function, i.e. Y is a function of X. In simple words, if the values for the X attributes are known (say they are x), then the values for the Y attributes corresponding to x can be determined by looking them up in any tuple of R containing x. Customarily X is called the determinant set and Y the dependent set. A functional dependency FD: XYX → Y is called trivial if Y is a subset of X.

在关系R中,X的值可以确定Y的值,称Y函数依赖X,记作 XYX \rightarrow Y

Fully Functionally Dependent 完全函数依赖

Partial Functionally Dependent 部分函数依赖

Delivery Functionally Dependent 传递函数依赖

Multivalued Dependency 多值依赖

if we denote by (x,y,z)(x,y,z) the tuple having values for α,β,Rαβ\alpha, \beta, R-\alpha -\beta collectively equal to x,y,zx, y, z, correspondingly, then whenever the tuples (a,b,c)(a,b,c) and (a,d,e)(a,d,e) exist in {\displaystyle r} r, the tuples {\displaystyle (a,b,e)} (a,b,e) and (a,d,c)(a,d,c) should also exist in rr.

对于关系RR的一组分解(A,B,C)(A,B,C),对于给定的aAa \in A,存在 CCC^* \subseteq C,对于任意 bBb \in B,值域 C(a,b)=CC_(a,b)=C^*,称C多值依赖A,记作 XYX \twoheadrightarrow Y

Join Dependency 连接依赖

A table TT is subject to a join dependency if TT can always be recreated by joining multiple tables each having a subset of the attributes of TT. If one of the tables in the join has all the attributes of the table TT, the join dependency is called trivial.

对于关系RR的一组分解(A1,A2,...An)(A_1, A_2, ... A_n),可以通过连接重建关系RR,称RR有连接依赖 (A1,A2,...An)*(A_1, A_2, ... A_n)

1NF

Definition

The domain of each attribute contains only atomic (indivisible) values, and the value of each attribute contains only a single value from that domain.

所有列的值域都是由原子值组成;所有字段的值都只能是单一值。

Sample

NameTypePhone
DeanCHD GUI1801234567
AshkanCHD Server9962653205,86010123
JasonD4C GUI1807654321
DavidD4C Server9962653208,86021230

Defects

  • Find all GUI team members or find all CHD team members
  • Dean get a new phone number or Ashkan need to remove a phone number

With 1NF

Click to show


NameProductType
DeanCHDGUI
AshkanCHDServer
JasonD4CGUI
DavidD4CServer

NamePhone
Dean1801234567
Ashkan9962653205
Ashkan86010123
Jason1807654321
David9962653208
David86021230

2NF

Definition

Any non-prime attribute that is functionally dependent on any proper subset of any candidate key of the relation. A non-prime attribute of a relation is an attribute that is not a part of any candidate key of the relation.

任意非键字段都完全依赖每个候选键

Sample

NameLocationTeamManagerTypeProjectName
WenZheSZPLTLiangSharkTankFindCode
XiaoLongSJPMGeorgeSharkTankSTAR
WenZheSZPLTLiangABCGolang

Defects

  • Some data duplicate many times
  • WenZhe transfer to another team
  • If XiaoLong exit the project, after we delete the row. Where is XiaoLong? Who is his manager?

With 2NF

Click to show


NameLocationTeamManager
WenZheSZPLTLiang
XiaoLongSJPMGeorge

NameTypeProjectName
WenZheSharkTankFindCode
XiaoLongSharkTankSTAR
WenZheABCGolang

3NF

Definition

No non-prime (non-key) attribute is transitively dependent of any key i.e. no non-prime attribute depends on other non-prime attributes. All the non-prime attributes must depend on the primary key only.

不存在非键字段对其他非键字段的依赖

Sample

NameLocationTeamManager
WenZheSZPLTLiang
ZhengPuSZPLTLiang
XiaoLongSJPMGeorge

Defects

  • WenZhe transfer to another team
  • PLT transfer to another manager
  • Team without member has no manager

With 3NF

Click to show


NameLocationTeam
WenZheSZPLT
ZhengPuSZPLT
XiaoLongSJPM

LocationTeamManager
SZPLTLiang
SJPMGeorge

BCNF

Definition

For every dependency XYX \rightarrow Y, one of the following conditions hold true:

  • XYX \rightarrow Y is a trivial functional dependency (i.e., YY is a subset of XX)
  • XX is a superkey for schema R

任意非平凡依赖XYX \rightarrow YXX是超键

4NF

Definition

For every one of its non-trivial multivalued dependencies XYX \twoheadrightarrow Y, XX is a superkey—that is, XX is a super key.

任意非平凡多值依赖XYX \twoheadrightarrow YXX是超键

Sample

RestaurantFoodDelivery Area
KFCBurgerW.Trimble
KFCBurgerE.Trimble
KFCRiceW.Trimble
KFCRiceE.Trimble
BKBurgerW.Trimble

Suppose that for each area they have same food

Defects

  • KFC has new food: dumpling
  • KFC has new Delivery Area: N.Trimble

With 4NF

Click to show


RestaurantFood
KFCBurger
KFCRice
BKBurger

RestaurantDelivery Area
KFCW.Trimble
KFCE.Trimble
BKW.Trimble

5NF

Definition

Every non-trivial join dependency in that table is implied by the candidate keys.

任意非平凡连接依赖 {A1,A2,A3...AnA_1, A_2, A_3 ... A_n}, AiA_i是超键

Sample

RestaurantFoodDelivery Area
KFCBurgerChina
KFCBurgerUS
KFCRiceChina
BKBurgerUS

Defects

  • KFC has new food dumpling that only Chinese eat it
  • KFC has new Delivery Area, North Korea, where need Rice but not Burger

With 5NF

Click to show


RestaurantFood
KFCBurger
KFCRice
BKBurger

RestaurantDelivery Area
KFCChina
KFCUS
BKUS

FoodDelivery Area
BurgerChina
BurgerUS
RiceChina

6NF

Definition

No nontrivial join dependencies at all.

不存在非平凡连接依赖

Summary

数据库规范化旨在减少数据库中数据冗余,保证一致性,防止异常操作。通过各级范式可以快速发现数据库设计中的缺陷。

然而也有另一种声音存在,即反正则化。例如有些场景下会放弃伸缩性来获取性能。

但是这并不意味着正则化存在缺陷,反而只有在深入理解了正则化后才能设计出兼顾逻辑与性能的优秀数据库。

  1. Un-Normalize 看山是山
  2. Normalize 看山不是山
  3. De-Normalize (Not-Only-Normalize) 看山还是山(看山不只是山)

Copyright © 2020-2024 Dean Xu. All Rights reserved.