Skip to content

SDSC5003

ER Design

constraints策略

One or Many?

ER图的一对一、多对多关系箭头怎么画?

假设A->B中,A只能对应一个B,那么A画到关系中需要加箭头,反过来相同

Participation Constraints

参与是可以是完全(存在约束)或部分(无约束)的

完全参与的关系必须使用双实线进行连接

eg:Children attend pre-school, assume that

  • Children can only attend one pre-school
  • A pre-school must have children

image-20241014112220570 AM

eg:Employees work for companies

  • An employee must be employed by someone (or they wouldn’t be an

employee) 每个员工都必须有公司,必须实现works_for

  • People often have more than one job 每个员工都必须有员工
  • 二者没有one
  • 的限制,即多对多

image-20241014112418396 AM

Extended ER Model

Weak Entities 弱实体

弱实体简单来说就是完全依赖另一方的实体

image-20241014115935791 AM

ISA (‘is a’) Hierarchies 继承

image-20241014120022341 PM

Aggregation 聚和

image-20241014120104527 PM

ER-to-SQL

删除策略

sql
CREATE TABLE 子表 (
    id INTEGER PRIMARY KEY,
    main_id INTEGER,
    -- 外键约束,联级删除
    FOREIGN KEY (main_id) REFERENCES 主表(id) ON DELETE CASCADE
);
  • CASCAD
    ON DELETE CASCADE
  • SET NULL
    ON DELETE SET NULL
  • SET DEFAULT + ON DELETE SET DEFAULT
  • RESTRICT 如果从表中有引用该记录的数据,删除操作会被阻止
    ON DELETE RESTRICT

基本方法

对于强制是否存在,主要靠外键是否为空+删除策略来规定

一对一 两方必须存在/一对一 一方必须存在

每一个实体都对应转化为一张SQL表,并选择“可以容忍自己没有对应元素”作为主表(因为主表不存外键),把它的主键放入另一个实体对应的SQL表中作为外键,外键不允许为空,联级删除选择CASCAD

一对一 两方都可选

任意选择一张表为主表,把它的主键放入另一个实体对应的SQL表中作为外键,外键允许为空,联级删除选择 SET NULL

一对多 两实体都为强制存在 / WEAK ENTITES

将 多 的实体(例如在部门员工表中:一个员工只能指向一个部门,一个部门可以被多个员工指,此时部门就是“多”)设置为主表,把它的主键放入另一个实体对应的SQL表中作为外键,外键不允许为空,联级删除选择CASCAD

一对多 两方都可选

将 多 的实体设置为主表,把它的主键放入另一个实体对应的SQL表中作为外键,外键允许为空,联级删除选择 SET NULL

多对多

需要一张新关系表包含两个实体的主键。无论两边实体是否为可选存在的,其转化形式一致,关系表中的外键列不能为NULL。

Relational Algebra

Relational Algebra Operations

  • Basic operations:
    • Selection (σ): Selects a subset of rows from a relation.
      • σIname=Summer(Customer)
    • Projection (π): Selects a subset of columns from a relation. 选择列
    • Cross-product (×): Allows us to combine two relations.
    • Set-difference (): Tuples in relation 1, but not in relation 2.
    • Union (): Tuples in relation 1 and in relation 2.
  • Additional derived operations:
    • Intersection
    • join) eg:S1S1.sid<R1.sidR1 from S1, R1 where S1.sid<R1.sid
    • division (÷)
    • renaming (C/ρ ) eg: ρnewName/oldName(Relation)
      • Not essential, but very useful.
  • Note: Since each operation returns a relation, operations can be composed! 注意:每个操作都返回关系,因此可以组合操作!

Natural Join

Equi-join on all common fields. 在所有共同字段上进行等值连接。

Division

image-2024101463033001 PM

eg: Find the names of sailors who’ve reserved all boats

image-2024101463059301 PM

利用除法实现查找“所有”的功能

πsid(Reserves)÷πbid(Boats)

Skills

How to use algebra to expree select max(value) from T group by name ?

使用笛卡尔积+减

A:=πname,value(T)B:=ρ(name,value,name,value)(T×T)C:=πname,value(σname=name\andvalue<value(B))Result=AC

How to translate algebra ÷ to SQL?

  1. Table A (EmployeeSkills): Stores EmployeeID and Skill each employee possesses.

  2. Table B (RequiredSkills): Stores Skill values that represent the required skills.

sql
SELECT DISTINCT A.EmployeeID
FROM EmployeeSkills A
WHERE NOT EXISTS (
    SELECT *
    FROM RequiredSkills B
    WHERE NOT EXISTS (
        SELECT *
        FROM EmployeeSkills A2
        WHERE A2.EmployeeID = A.EmployeeID
          AND A2.Skill = B.Skill
    )
);

or using inner join and count(distinct )

sql
SELECT A.EmployeeID
FROM EmployeeSkills A
JOIN RequiredSkills B ON A.Skill = B.Skill
GROUP BY A.EmployeeID
HAVING COUNT(DISTINCT A.Skill) = (SELECT COUNT(*) FROM RequiredSkills);

Design Theory

Target

Avoid anomalies 避免异常

Anomaly Kinds

  • 数据冗余 redundant
  • 更新异常 update anomaly
  • 删除异常 delete anomaly
  • 插入异常 insert anomaly

Normal Forms

  • 1NF: All table are flat
  • 2NF: 1NF & No non-prime attribute FD part candidate key 要求实体的属性完全依赖于主关键字
  • 3NF: 2NF & no Transitive Dependency 无传递依赖
  • BCNF: no bad FDs

Functional Dependency (FD)

We write AB or say A functionally determines B if,

for any tuples t1 and t2 :

t1[A]=t2[A] implies t1[B]=t2[B]

and we call AB a functional dependency

A的值可以直接决定B的是什么,则B依赖于A

example

image-2024100774450333 PM

  • Name -> Color : FD holds
  • Name -> Price : FD dose not hold eg: Gizmo have two diff Departments
  • Category -> Department : DF holds
  • Color, Category -> Price

Because of Name -> Color Category -> Department Color, Category -> Price We can infer that Name, Category -> Price, 可以看出一种传递依赖

Inferred FDs

Split/Combine

A1,,AmB1,,Bnis equivalent to the following n FDs:A1,,AmBifor i=1,,n

And vice-versa

A1,,AmBifor i=1,,nis equivalent to the following n FDs:A1,,AmB1,,Bn

Reduction/Tricial

A1,,AmAjfor any j=1,,m

Transitive

A1,,AmB1,,Bn and B1,,BnC1,,CkimpliesA1,,AmC1,,Ck

example

image-2024100781506841 PM

Closure

example

Example:F={namecolorcategorydepartmentcolor, categorypriceClosures:{name}+={name, color}{name, category}+={name, category, color, dept, price}{color}+={color}

Algorithm to find X+

Start with X={A1,,An}, FDs F.Repeat until X doesn’t change;do:if{B1,,Bn}C is in Fand{B1,,Bn}X:then add C to X.Return X as X+.

Non-trivial FD

非平凡函数依赖(non-trivial functional dependency)是指函数依赖 XY 中的右手边 Y 不包含在左手边 X 中的情况。换句话说,Y 中的元素不在 X 中出现

Exercise - find all FDs implied by the given FDs

Find all non-trivial functional dependency implied by

A,BCA,DBBD

Requirements:

  1. Non-trivial FDs (we avoid reflexive dependencies like A,BA ).
  2. Each FD must have a single attribute on the right-hand side.

Solove:

  1. Compute Attribute Closures, and get new FDs from Closures by Inferred FDs methods, especially using Split

  2. Identify Non-Redundant FDs

https://chatgpt.com/share/671ffa9a-e5b8-8008-820b-ddbb74006e93

FDs to Candidate keys

按以下步骤求Candidate keys:

  1. 只在FD右部出现的属性,不属于候选码;
  2. 只在FD左部出现的属性,一定存在于某候选码当中;
  3. 其他属性逐个与2,3的属性组合,求属性闭包,直至X的闭包等于U(全部元素),若等于U,则X为候选码。

“Good” vs. “Bad” FDs

What is “Bad” FDs? and what is 2NF/3NF/BFNF?

  • Partial Dependency 部分依赖 (2NF) :

某些非主属性依赖于主键的一部分,而不是整个主键

A,BCBD (Bad FDs)
  • Transitive Dependency 传递依赖 (3NF):

3NF:

  • First, it should be in 2NF
  • No non-prime attribute非主属性 should be transitively dependent on the Key of the table.

非主属性依赖于另一个非主属性,而这个非主属性又依赖于主键

QUESTIONS ON THIRD NORMAL FORM

X is prime attribute, Y Z is non-prime attribute:

X is Primary keyXYYZ (Bad FDs)
  • Non-Candidate Key Dependencies 存在非候选键依赖项 (BCNF):

存在非平凡函数依赖的左手边(决定因素)不是是候选键的情况

A,BC (A and B is Primary key)CA (Bad FDs, because C is not Primary key)

BCNF Decomposition Algorithm

BCNFDecomp(R):Find a non-trivial bad FD: XYif(not found) then ReturnRSplitR into X+and [X+[rest attributes]] as R1 and R2Return BCNFDecomp(R1),BCNFDecomp(R2)

example

R(A,B,C,D,E){C}{D}{A}{B,C}
BCNFDecomp(R(A, B, C, D, E)):
for FD:
  1.try {C} -> {D}
    C+ = {C, D} 
    as for R(A, B, C, D, E)
      {C, D} in C+, {A, B, E} not in C+ 
    split R(A, B, C, D, E) into
      R(C, D) and R(C, A, B, E)
    return BCNFDecomp(R(C, D)) 
      and BCNFDecomp(R(A, B, C, E))
      		
      		
BCNFDecomp(R(C, D)):
for FD:
  1.try {C} -> {D}
    C+ = {C, D} 
    as for R(C, D)
    	{C, D} in C+, {} not in C+ 
    can't split
    
  2.try {A} -> {B,C}
  	A+ = {A, B, C, D}
  	as for R(C, D)
  		{C, D} in A+, {} not in A+ 
  		can't split
  		
  return R(C, D)
  
  
BCNFDecomp(R(A, B, C, E)):
for FD:
  1.try {C} -> {D}
    C+ = {C, D} 
    as for R(A, B, C, E)
      {C} in C+, {A, B, E} not in C+ 
    can't split, because split is meanless
    
  2.try {A} -> {B,C}
  	A+ = {A, B, C, D}
  	as for R(A, B, C, E)
  		{A, B, C} in A+, {E} not in A+ 
    split R(A, B, C, D, E) into
      R(A, B, C) and R(A, E)
    return BCNFDecomp(R(A, B, C)) 
      and BCNFDecomp(R(A, E))
      
      
BCNFDecomp(R(A, B, C)):
for FD:
  1.try {C} -> {D}
    C+ = {C, D} 
    as for R(A, B, C):
    	{C} in C+, {A, B} not in C+ 
    can't split, because split is meanless
    
  2.try {A} -> {B,C}
  	A+ = {A, B, C, D}
  	as for R(A, B, C):
  		{A, B, C} in A+, {}
			can't split
			
  return R(A, B, C)
			
			
BCNFDecomp(R(A, E)):
  1.try {C} -> {D}
    C+ = {C, D} 
    as for R(A, E):
    	{} in C+, {A, E} not in C+ 
    can't split
    
  2.try {A} -> {B,C}
  	A+ = {A, B, C, D}
  	as for R(A, E):
  		{A} in A+, {E} not in A+
			can't split, because split is meanless
			
  return R(A, E)

image-2024100855957033 PM

Answer: {C, D} {A, B, C} {A, E}

A Problem with BCNF

ABB,CA

If we use BCNFDecomp(R(A, B, C)) : {A, B} {A, C}

We lose the FD : B,CA

THE ANSWER IS THAT: One can always losslessly decompose to 3NF while preserving FDs but BCNF might not preserve them. 3NF的转换可以做到无损,但是BCNF可能会丢失依赖

Refrence: https://stackoverflow.com/questions/23965143/converting-3nf-to-bcnf-when-there-is-a-circular-dependency

Indexing

方式数据存储形式
Alternative (1)<A, rid>
Alternative (2)<A, (B, C)>
Alternative (3)<A, L>
rid 是指向记录的唯一标识符(Record ID)
A 是索引键(Search Key),(B, C) 是物理位置的具体描述:
B 表示数据所在的页号(Page Number)
C 表示数据在页内的偏移量(Slot/Location)
L 是记录 rid 的列表(List of Record IDs)