SDSC5003
ER Design
constraints策略
One or Many?
ER图的一对一、多对多关系箭头怎么画?
假设A->B中,A只能对应一个B,那么A画到关系中需要加箭头,反过来相同
Participation Constraints
参与是可以是完全(存在约束)或部分(无约束)的
完全参与的关系必须使用双实线进行连接
eg:Children attend pre-school, assume that
- Children can only attend one pre-school
- A pre-school must have children
eg:Employees work for companies
- An employee must be employed by someone (or they wouldn’t be an
employee) 每个员工都必须有公司,必须实现works_for
- People often have more than one job 每个员工都必须有员工
- 二者没有one
- 的限制,即多对多
Extended ER Model
Weak Entities 弱实体
弱实体简单来说就是完全依赖另一方的实体
ISA (‘is a’) Hierarchies 继承
Aggregation 聚和
ER-to-SQL
删除策略
CREATE TABLE 子表 (
id INTEGER PRIMARY KEY,
main_id INTEGER,
-- 外键约束,联级删除
FOREIGN KEY (main_id) REFERENCES 主表(id) ON DELETE CASCADE
);
- CASCAD
ON DELETE CASCADE
- SET NULL
ON DELETE SET NULL
- SET DEFAULT +
ON DELETE SET DEFAULT
- RESTRICT 如果从表中有引用该记录的数据,删除操作会被阻止
ON DELETE RESTRICT
基本方法
对于强制是否存在,主要靠外键是否为空+删除策略来规定
一对一 两方必须存在/一对一 一方必须存在
每一个实体都对应转化为一张SQL表,并选择“可以容忍自己没有对应元素”作为主表(因为主表不存外键),把它的主键放入另一个实体对应的SQL表中作为外键,外键不允许为空,联级删除选择CASCAD
一对一 两方都可选
任意选择一张表为主表,把它的主键放入另一个实体对应的SQL表中作为外键,外键允许为空,联级删除选择 SET NULL
一对多 两实体都为强制存在 / WEAK ENTITES
将 多 的实体(例如在部门员工表中:一个员工只能指向一个部门,一个部门可以被多个员工指,此时部门就是“多”)设置为主表,把它的主键放入另一个实体对应的SQL表中作为外键,外键不允许为空,联级删除选择CASCAD
一对多 两方都可选
将 多 的实体设置为主表,把它的主键放入另一个实体对应的SQL表中作为外键,外键允许为空,联级删除选择 SET NULL
多对多
需要一张新关系表包含两个实体的主键。无论两边实体是否为可选存在的,其转化形式一致,关系表中的外键列不能为NULL。
Relational Algebra
Relational Algebra Operations
- Basic operations:
- Selection (
): Selects a subset of rows from a relation. - Projection (
): Selects a subset of columns from a relation. 选择列 - Cross-product (
): Allows us to combine two relations. - Set-difference (
): Tuples in relation 1, but not in relation 2. - Union (
): Tuples in relation 1 and in relation 2.
- Selection (
- Additional derived operations:
- Intersection
- join (
) eg: from S1, R1 where S1.sid<R1.sid - division (
) - renaming (
/ ) eg: - Not essential, but very useful.
- Note: Since each operation returns a relation, operations can be composed! 注意:每个操作都返回关系,因此可以组合操作!
Natural Join
Equi-join on all common fields. 在所有共同字段上进行等值连接。
Division
eg: Find the names of sailors who’ve reserved all boats
利用除法实现查找“所有”的功能
Skills
How to use algebra to expree select max(value) from T group by name
?
使用笛卡尔积+减
How to translate algebra
Table A (EmployeeSkills): Stores EmployeeID and Skill each employee possesses.
Table B (RequiredSkills): Stores Skill values that represent the required skills.
SELECT DISTINCT A.EmployeeID
FROM EmployeeSkills A
WHERE NOT EXISTS (
SELECT *
FROM RequiredSkills B
WHERE NOT EXISTS (
SELECT *
FROM EmployeeSkills A2
WHERE A2.EmployeeID = A.EmployeeID
AND A2.Skill = B.Skill
)
);
or using inner join and count(distinct )
SELECT A.EmployeeID
FROM EmployeeSkills A
JOIN RequiredSkills B ON A.Skill = B.Skill
GROUP BY A.EmployeeID
HAVING COUNT(DISTINCT A.Skill) = (SELECT COUNT(*) FROM RequiredSkills);
Design Theory
Target
Avoid anomalies 避免异常
Anomaly Kinds
- 数据冗余 redundant
- 更新异常 update anomaly
- 删除异常 delete anomaly
- 插入异常 insert anomaly
Normal Forms
- 1NF: All table are flat
- 2NF: 1NF & No non-prime attribute FD part candidate key 要求实体的属性完全依赖于主关键字
- 3NF: 2NF & no Transitive Dependency 无传递依赖
- BCNF: no bad FDs
Functional Dependency (FD)
We write
for any tuples
and we call
A的值可以直接决定B的是什么,则B依赖于A
example
- Name -> Color : FD holds
- Name -> Price : FD dose not hold eg: Gizmo have two diff Departments
- Category -> Department : DF holds
- Color, Category -> Price
Because of Name -> Color
Category -> Department
Color, Category -> Price
We can infer that Name, Category -> Price
, 可以看出一种传递依赖
Inferred FDs
Split/Combine
And vice-versa
Reduction/Tricial
Transitive
example
Closure
example
Algorithm to find
Non-trivial FD
非平凡函数依赖(non-trivial functional dependency)是指函数依赖
Exercise - find all FDs implied by the given FDs
Find all non-trivial functional dependency implied by
Requirements:
- Non-trivial FDs (we avoid reflexive dependencies like
). - Each FD must have a single attribute on the right-hand side.
Solove:
Compute Attribute Closures, and get new FDs from Closures by
Inferred FDs
methods, especially usingSplit
Identify Non-Redundant FDs
https://chatgpt.com/share/671ffa9a-e5b8-8008-820b-ddbb74006e93
FDs to Candidate keys
按以下步骤求Candidate keys:
- 只在FD右部出现的属性,不属于候选码;
- 只在FD左部出现的属性,一定存在于某候选码当中;
- 其他属性逐个与2,3的属性组合,求属性闭包,直至X的闭包等于U(全部元素),若等于U,则X为候选码。
“Good” vs. “Bad” FDs
What is “Bad” FDs? and what is 2NF/3NF/BFNF?
- Partial Dependency 部分依赖 (2NF) :
某些非主属性依赖于主键的一部分,而不是整个主键
- Transitive Dependency 传递依赖 (3NF):
3NF:
- First, it should be in 2NF
- No non-prime attribute非主属性 should be transitively dependent on the Key of the table.
非主属性依赖于另一个非主属性,而这个非主属性又依赖于主键
X is prime attribute, Y Z is non-prime attribute:
- Non-Candidate Key Dependencies 存在非候选键依赖项 (BCNF):
存在非平凡函数依赖的左手边(决定因素)不是是候选键的情况
BCNF Decomposition Algorithm
example
BCNFDecomp(R(A, B, C, D, E)):
for FD:
1.try {C} -> {D}
C+ = {C, D}
as for R(A, B, C, D, E)
{C, D} in C+, {A, B, E} not in C+
split R(A, B, C, D, E) into
R(C, D) and R(C, A, B, E)
return BCNFDecomp(R(C, D))
and BCNFDecomp(R(A, B, C, E))
BCNFDecomp(R(C, D)):
for FD:
1.try {C} -> {D}
C+ = {C, D}
as for R(C, D)
{C, D} in C+, {} not in C+
can't split
2.try {A} -> {B,C}
A+ = {A, B, C, D}
as for R(C, D)
{C, D} in A+, {} not in A+
can't split
return R(C, D)
BCNFDecomp(R(A, B, C, E)):
for FD:
1.try {C} -> {D}
C+ = {C, D}
as for R(A, B, C, E)
{C} in C+, {A, B, E} not in C+
can't split, because split is meanless
2.try {A} -> {B,C}
A+ = {A, B, C, D}
as for R(A, B, C, E)
{A, B, C} in A+, {E} not in A+
split R(A, B, C, D, E) into
R(A, B, C) and R(A, E)
return BCNFDecomp(R(A, B, C))
and BCNFDecomp(R(A, E))
BCNFDecomp(R(A, B, C)):
for FD:
1.try {C} -> {D}
C+ = {C, D}
as for R(A, B, C):
{C} in C+, {A, B} not in C+
can't split, because split is meanless
2.try {A} -> {B,C}
A+ = {A, B, C, D}
as for R(A, B, C):
{A, B, C} in A+, {}
can't split
return R(A, B, C)
BCNFDecomp(R(A, E)):
1.try {C} -> {D}
C+ = {C, D}
as for R(A, E):
{} in C+, {A, E} not in C+
can't split
2.try {A} -> {B,C}
A+ = {A, B, C, D}
as for R(A, E):
{A} in A+, {E} not in A+
can't split, because split is meanless
return R(A, E)
Answer: {C, D} {A, B, C} {A, E}
A Problem with BCNF
If we use BCNFDecomp(R(A, B, C))
: {A, B} {A, C}
We lose the FD :
THE ANSWER IS THAT: One can always losslessly decompose to 3NF while preserving FDs but BCNF might not preserve them. 3NF的转换可以做到无损,但是BCNF可能会丢失依赖
Indexing
方式 | 数据存储形式 |
---|---|
Alternative (1) | <A, rid> |
Alternative (2) | <A, (B, C)> |
Alternative (3) | <A, L> |
rid 是指向记录的唯一标识符(Record ID) | |
A 是索引键(Search Key),(B, C) 是物理位置的具体描述: | |
• B 表示数据所在的页号(Page Number) | |
• C 表示数据在页内的偏移量(Slot/Location) | |
L 是记录 rid 的列表(List of Record IDs) |