SDSC6012

References

Shumway, R. H., & Stoffer, D. S. (2017). Time Series Analysis and Its Applications : With R Examples (4th ed. 2017.). Springer International Publishing. https://doi.org/10.1007/978-3-319-52452-8

The PDF version can be downloaded from the CityU library.

Course SDSC6012 slides
ChatGPT 4o
My Thinking …

Lecture1

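Goals of time series analysis: description, forecasting, control, and hypothesis testing.

Random variables in time series

Random variable: a random variable $X$ is a function that maps each outcome in a sample space to a real number (or another numerical value). Its value is random, that is, governed by a probability distribution.
In time series analysis, the random variable $X_{t}$ represents the observation at time $t$. At each time point it is random, and its value follows some probability distribution.

Stochastic process

The collection of random variables $X_{t}$ indexed by time is a stochastic process.
It is a modeling device for studying how the values at different time points relate to each other.
The observed values of a stochastic process are referred to as a realization of the stochastic process. In other words, an observed time series is just one realization of the underlying stochastic process.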

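What is a "stochastic process"? Is "stochastic" the same idea as "random"?
GPT:
Yes, "stochastic" here means random. A stochastic process describes a system or phenomenon that evolves over time with outcomes that are not deterministic. The data point (or state) at each time is represented by a random variable.
More precisely, a stochastic process is a collection of random variables ordered in time that describes how a system changes over time. The outcomes cannot be known in advance: they are generated according to a probability distribution rather than fixed.
For example, stock prices, weather, and population growth can all be described by stochastic processes. Their values at different times are driven by many factors and cannot be predicted exactly, so they are treated as random.
In short, a stochastic process is a mathematical model of how a system changes randomly over time, where "stochastic" expresses uncertainty and probabilistic behavior.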

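Stochastic process vs. random variable:

In time series analysis we usually work with a stochastic process $\{X_{t}\}$ (note the braces: a collection of variables), which is a sequence of random variables $X_{t}$ indexed by time (no braces: a single variable).

In other words, the family of random variables $X_{t}$, one for each time point, makes up the stochastic process $\{X_{t}\}$.

Given a stochastic process $\{X_{t}\}$, at each time point $t$ we observe one random variable $X_{t}$.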

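White noise

White noise is a random signal with mean zero and constant variance $σ_{w}^{2}$ whose values at different time points are uncorrelated.

Its autocorrelation function is zero at every nonzero lag. Note that uncorrelated does not by itself mean independent: independence is a stronger assumption.

$w_{t} \sim wn (0, σ_{w}^{2})$

Gaussian white noise

A special case of white noise in which the $w_{t}$ are independent and normally distributed:

$w_{t} \sim \text{iid } N (0, σ_{w}^{2})$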
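As a quick sanity check of the definition, Gaussian white noise can be simulated and its sample mean, sample variance, and lag-1 sample correlation compared with the theoretical values. This is only a sketch using Python's standard library; the helper names, the seed, and the choice $σ_{w} = 2$ are illustrative assumptions, not part of the course material.

```python
import random
import statistics

def gaussian_white_noise(n, sigma=1.0, seed=0):
    """Return n independent draws from N(0, sigma^2)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, sigma) for _ in range(n)]

def lag1_sample_correlation(x):
    """Sample correlation between x_t and x_{t+1}."""
    mean = statistics.fmean(x)
    num = sum((a - mean) * (b - mean) for a, b in zip(x[:-1], x[1:]))
    den = sum((a - mean) ** 2 for a in x)
    return num / den

w = gaussian_white_noise(10_000, sigma=2.0)
sample_mean = statistics.fmean(w)        # should be close to 0
sample_var = statistics.pvariance(w)     # should be close to sigma^2 = 4
r1 = lag1_sample_correlation(w)          # should be close to 0
print(sample_mean, sample_var, r1)
```

The three printed numbers fluctuate with the seed, which is why they are described only as "close to" their targets.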

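Time series modeling

The basic structure of a time series model can be written as:

x_{t} = m_{t} + s_{t} + e_{t}

where:

$x_{t}$ is the observed value of the time series;

Note that this is an observed value, which is why it is a lowercase $x$ rather than an uppercase $X$.

$m_{t}$ is the trend component, the long-run movement of the series;
$s_{t}$ is the seasonal component, the periodic fluctuation of the series (seasonal, monthly, and so on);
$e_{t}$ is the residual, the random noise or error that the trend and seasonal components cannot explain, for example $w_{t}$.

The model splits the time series into three parts so that its behavior is easier to understand and forecast.

Which part does noise belong to?
GPT:
In this model, noise belongs to the residual term $e_{t}$. The residual is the part of the series that the trend ($m_{t}$) and the seasonal component ($s_{t}$) cannot explain.
Specifically:
Noise is the fluctuation or error in the series caused by random factors, usually regarded as unpredictable.
The residual is the difference between the observed value and the value predicted by the model, and it includes the noise.
So in $x_{t} = m_{t} + s_{t} + e_{t}$ the noise sits in $e_{t}$, which is usually modeled as a random variable following some distribution (white noise, Gaussian white noise, and so on) and describes the irregular fluctuations.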

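Properties of the mean

The mean is linear (linearity)

$E (a X + b Y) = a E (X) + b E (Y)$

The expectation of a constant is the constant itself (constant property)

$E (c) = c$

The mean of an average (or sum) equals the average (or sum) of the means. This follows from linearity and does not require independence.

$E (\frac{1}{n} \sum_{i = 1}^{n} X_{i}) = \frac{1}{n} \sum_{i = 1}^{n} E (X_{i})$

For independent random variables, the mean of the product equals the product of the means (this one does require independence)

$E (X_{1} X_{2} \dots X_{n}) = E (X_{1}) \cdot E (X_{2}) \cdot \dots \cdot E (X_{n})$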

The mean of some stochastic processes

Moving Average Series

$v_{t} = \frac{1}{3} (w_{t - 1} + w_{t} + w_{t + 1})$

What is $E (v_{t})$?

By linearity of the mean:

\begin{aligned} E (v_{t}) & = \frac{1}{3} E (w_{t - 1} + w_{t} + w_{t + 1}) \\ & = \frac{1}{3} E (w_{t - 1}) + \frac{1}{3} E (w_{t}) + \frac{1}{3} E (w_{t + 1}) \\ & = 0 \end{aligned}

Property: no matter how many zero-mean noise terms are combined linearly, the expectation is 0.

Random Walk with Drift

$x_{t} = δ t + \sum_{j = 1}^{t} w_{j}$

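What is $E (x_{t})$?

Note that $t$ here is a fixed time index and must be treated as a constant.

By linearity and the constant property of the mean:

\begin{aligned} E (x_{t}) & = E (δ t + \sum_{j = 1}^{t} w_{j}) \\ & = δ t + \sum_{j = 1}^{t} E (w_{j}) \\ & = δ t \end{aligned}

Property: in a model with a linear trend, the expectation picks out the trend part of the series and is unaffected by the zero-mean noise.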

Signal Plus Noise

$x_{t} = A \cos (2 π ω t + ϕ) + w_{t}$

What is $E (x_{t})$?

As in the previous example, $t$ is treated as a constant.

\begin{aligned} E (x_{t}) & = E (A \cos (2 π ω t + ϕ) + w_{t}) \\ & = A \cos (2 π ω t + ϕ) + E (w_{t}) \\ & = A \cos (2 π ω t + ϕ) \end{aligned}

Zero-mean noise does not change the expectation of the series.

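Covariance

Definition of covariance

The covariance $Cov (X, Y)$ is a statistic that measures the linear relationship between two random variables. It reflects how the two variables vary together.

$Cov (X, Y) = E [(X - E [X]) (Y - E [Y])]$

$E [X]$ and $E [Y]$ are the expectations of the random variables $X$ and $Y$
$(X - E [X])$ is the deviation of $X$ from its expectation
$(Y - E [Y])$ is the deviation of $Y$ from its expectation
The covariance is the expectation of the product of these two deviations

Turning a mean of squares into a variance / covariance

When $E [X] = 0$, $E [X^{2}] = Var (X)$

In particular:

E [w_{t}^{2}] = σ_{w}^{2}

Interpretation of covariance

A positive covariance means the two variables tend to move in the same direction
A negative covariance means they tend to move in opposite directions
A covariance close to 0 means there is little or no linear relationship between the two variables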

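Properties of covariance

Symmetry

$Cov (X, Y) = Cov (Y, X)$

Reduction to variance

Variance is a special case of covariance. When $X = Y$, the covariance is the variance of the random variable:

$Cov (X, X) = Var (X)$

Scaling and shift

For constants $a$ and $b$:

$Cov (a X + b, Y) = a \cdot Cov (X, Y)$

That is, under a linear transformation of one variable the covariance is scaled by the factor $a$.

The additive constant $b$ has no effect on the covariance.

Distributive property

For three random variables $X$, $Y$ and $Z$

$Cov (X + Y, Z) = Cov (X, Z) + Cov (Y, Z)$

$Cov (X, Y + Z) = Cov (X, Y) + Cov (X, Z)$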
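The covariance properties listed above also hold exactly for the sample covariance, so they can be checked numerically up to floating-point rounding. The sketch below does this on simulated data; `sample_cov`, the data, and the constants `a`, `b` are illustrative assumptions.

```python
import random

def sample_cov(x, y):
    """Unbiased sample covariance of two equally long lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((u - mx) * (v - my) for u, v in zip(x, y)) / (n - 1)

rng = random.Random(0)
X = [rng.gauss(0, 1) for _ in range(500)]
Y = [u + rng.gauss(0, 1) for u in X]          # correlated with X
Z = [rng.uniform(-1, 1) for _ in range(500)]
a, b = 3.0, 7.0

# Scaling and shift: Cov(aX + b, Y) = a Cov(X, Y)
lhs_scale = sample_cov([a * u + b for u in X], Y)
rhs_scale = a * sample_cov(X, Y)

# Distributive property: Cov(X + Y, Z) = Cov(X, Z) + Cov(Y, Z)
lhs_sum = sample_cov([u + v for u, v in zip(X, Y)], Z)
rhs_sum = sample_cov(X, Z) + sample_cov(Y, Z)

print(lhs_scale - rhs_scale, lhs_sum - rhs_sum)   # both differences are rounding noise
```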

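Autocovariance function $γ (s, t)$

It measures dependence within a single process:

$γ (s, t) = Cov (X_{s}, X_{t}) = E [(X_{s} - μ_{s}) (X_{t} - μ_{t})]$

It measures the linear dependence between the random variables of the same stochastic process at different time points.

Why can we take a covariance between time points? Isn't covariance defined for random variables, which stand for many possible values?
Although $s$ and $t$ are time points, for the stochastic process $\{X_{t}\}$ they label the random variables $X_{s}$ and $X_{t}$. What we actually compute is the covariance between the random variables $X_{s}$ and $X_{t}$.

$γ (s, t) = 0$ means $X_{s}$ and $X_{t}$ are uncorrelated (no linear dependence)
When $s = t$, the covariance reduces to the variance

Why does the covariance reduce to the variance when $s = t$?
From the formula:
$μ_{s} = E [X_{s}]$
$Var (X_{s}) = E [(X_{s} - E [X_{s}])^{2}] = E [(X_{s} - μ_{s}) (X_{s} - μ_{s})] = Cov (X_{s}, X_{s})$
From the meaning:
Covariance measures how two random variables vary together. When both are the same random variable ($s = t$), it measures the variability of that variable itself, which is the variance.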

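Covariance of some stochastic processes

Example1 - white noise

White noise $\{w_{t}\}$

What is $γ (s, t)$?

When the two time points coincide, the covariance reduces to the variance

$γ (s, s) = Cov (w_{s}, w_{s}) = Var (w_{s}) = σ_{w}^{2}$

When $s \neq r$, white noise at different time points is uncorrelated by definition

$γ (s, r) = Cov (w_{s}, w_{r}) = 0$

Equivalently, from the formula:
$Cov (w_{s}, w_{r}) = E [(w_{s} - μ_{s}) (w_{r} - μ_{r})] = E [w_{s} w_{r}]$
because $μ_{s} = μ_{r} = 0$, and $E [w_{s} w_{r}] = 0$ for $s \neq r$ because the terms are uncorrelated, so
$Cov (w_{s}, w_{r}) = 0$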

Example2

$v_{t} = \frac{1}{3} (w_{t - 1} + w_{t} + w_{t + 1})$

What is $γ (s, t)$?

\begin{aligned} γ (s, t) & = cov (v_{s}, v_{t}) \\ & = cov \{\frac{1}{3} (w_{s - 1} + w_{s} + w_{s + 1}), \frac{1}{3} (w_{t - 1} + w_{t} + w_{t + 1})\} \end{aligned}

By the scaling property, pull a factor $\frac{1}{3}$ out of each argument

\begin{aligned} γ (s, t) & = cov \{\frac{1}{3} (w_{s - 1} + w_{s} + w_{s + 1}), \frac{1}{3} (w_{t - 1} + w_{t} + w_{t + 1})\} \\ & = \frac{1}{9} cov \{(w_{s - 1} + w_{s} + w_{s + 1}), (w_{t - 1} + w_{t} + w_{t + 1})\} \end{aligned}

By the distributive property of covariance

γ (s, t) = \frac{1}{9} (Cov (w_{s - 1}, w_{t - 1}) + Cov (w_{s - 1}, w_{t}) + Cov (w_{s - 1}, w_{t + 1}) + \dots)

This sums the covariances over all nine pairs such as $(w_{s - 1}, w_{t - 1})$, $(w_{s}, w_{t})$, and so on.

From Example1 we know that white noise at different times has covariance 0. So among all the pairs, only those with equal time indices are nonzero, and each of them equals $σ_{w}^{2}$.

When $s = t$

γ (t, t) = \frac{1}{9} (cov (w_{t - 1}, w_{t - 1}) + cov (w_{t}, w_{t}) + cov (w_{t + 1}, w_{t + 1})) = \frac{3}{9} σ_{w}^{2}

When $s = t + 1$ or $s = t - 1$

γ (t + 1, t) = \frac{1}{9} [cov (w_{t}, w_{t}) + cov (w_{t + 1}, w_{t + 1})] = \frac{2}{9} σ_{w}^{2}

When $s = t + 2$ or $s = t - 2$

γ (t + 2, t) = \frac{1}{9} cov (w_{t + 1}, w_{t + 1}) = \frac{1}{9} σ_{w}^{2}

In summary:

γ (s, t) = \begin{cases} \frac{3}{9} σ_{w}^{2} & s = t, \\ \frac{2}{9} σ_{w}^{2} & | s - t | = 1, \\ \frac{1}{9} σ_{w}^{2} & | s - t | = 2, \\ 0 & | s - t | > 2 \end{cases}
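The "count the overlapping noise terms" argument can be written as a short function: for a filter $v_{t} = \sum_{j} a_{j} w_{t - j}$ the autocovariance at lag $h$ is $σ_{w}^{2} \sum_{j} a_{j} a_{j + h}$. The sketch below evaluates this exactly with fractions for the three-point moving average. The function name is an illustrative assumption, and the filter is indexed from 0 rather than centered, which does not change the autocovariance.

```python
from fractions import Fraction

def filter_autocovariance(weights, lag, sigma2=1):
    """gamma(lag) for v_t = sum_j weights[j] * w_{t-j}, with w_t white noise."""
    lag = abs(lag)
    # Only pairs of noise terms with the same time index contribute.
    return sigma2 * sum(
        weights[j] * weights[j + lag] for j in range(len(weights) - lag)
    )

third = Fraction(1, 3)
weights = [third, third, third]
gammas = [filter_autocovariance(weights, h) for h in range(4)]
print(gammas)   # lags 0, 1, 2, 3 in units of sigma_w^2
```

With $σ_{w}^{2} = 1$ the four values correspond to $\frac{3}{9}$, $\frac{2}{9}$, $\frac{1}{9}$ and $0$ (printed in lowest terms).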

Example3 - random walk

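Random walk $x_{t} = \sum_{j = 1}^{t} w_{j}$

What is $γ (s, t)$?

$x_{t}$ is a sum of white noise terms, and $s$, $t$ are the numbers of terms in the two sums

From Example2 we know that the covariance of two sums of white noise terms is determined by how many time indices the two sums share

γ (s, t) = cov (x_{s}, x_{t}) = cov (\sum_{j = 1}^{s} w_{j}, \sum_{k = 1}^{t} w_{k}) = \min \{s, t\} σ_{w}^{2}

Intuition for a random walk:

A variable (a price, a position, and so on) starts from an initial point and is updated at every step by a random increment. The increment may be positive or negative, so the path looks like an irregular curve.

Random Walk

The variance of a random walk, $γ (t, t) = t σ_{w}^{2}$, grows with time. Its statistical properties are therefore not constant over time.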
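The formula $γ (s, t) = \min \{s, t\} σ_{w}^{2}$ can be checked by simulating many independent random walks and averaging $x_{s} x_{t}$ across paths (the mean is 0, so this estimates the covariance). This is a Monte Carlo sketch with $σ_{w} = 1$; the number of paths, the seed, and the function names are illustrative assumptions, and the estimates only approximate the theoretical values.

```python
import random

def simulate_random_walks(n_paths, n_steps, sigma=1.0, seed=0):
    """Return n_paths lists, each holding x_1, ..., x_{n_steps}."""
    rng = random.Random(seed)
    paths = []
    for _ in range(n_paths):
        x, path = 0.0, []
        for _ in range(n_steps):
            x += rng.gauss(0.0, sigma)   # x_t = x_{t-1} + w_t
            path.append(x)
        paths.append(path)
    return paths

def cross_moment(paths, s, t):
    """Average of x_s * x_t over all paths (1-based time indices)."""
    return sum(p[s - 1] * p[t - 1] for p in paths) / len(paths)

paths = simulate_random_walks(n_paths=5000, n_steps=40)
var_10 = cross_moment(paths, 10, 10)      # theory: 10
var_40 = cross_moment(paths, 40, 40)      # theory: 40
cov_10_40 = cross_moment(paths, 10, 40)   # theory: min(10, 40) = 10
print(var_10, var_40, cov_10_40)
```

The variance at $t = 40$ comes out roughly four times the variance at $t = 10$, which is the non-constant behavior described above.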

Lecture2

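Strict stationarity

Definition: a time series $\{X_{t}\}$ is called strictly stationary (or strongly stationary) if, for every $k$, all time points $t_{1}, \dots, t_{k}$, and every shift $h$,

(x_{t_{1}}, \dots, x_{t_{k}}) and (x_{t_{1} + h}, \dots, x_{t_{k} + h})

have the same joint distribution.

This is usually written as:

P \{x_{t_{1}} \leq c_{1}, \dots, x_{t_{k}} \leq c_{k}\} = P \{x_{t_{1} + h} \leq c_{1}, \dots, x_{t_{k} + h} \leq c_{k}\}

for all constants $c_{1}, \dots, c_{k}$.

Shifting in time leaves the joint distribution unchanged, so every statistical property of the series that exists (mean, variance, covariance, and so on) does not change over time.

Why does $P$ only involve events of the form "less than or equal to some value $c$"?
GPT:
In probability theory, events are usually written as inequalities. For instance, $x_{t_{1}} \leq c_{1}$ describes the event that the random variable $X_{t_{1}}$ at time $t_{1}$ is at most $c_{1}$. The formula is the probability that several random variables satisfy such inequalities simultaneously. This is the joint cumulative distribution function, and the joint CDF determines the joint distribution.
Strict stationarity asks whether the random variables at different time points have the same joint distribution. $P \{x_{t_{1}} \leq c_{1}, \dots, x_{t_{k}} \leq c_{k}\}$ gives the probability that the variables at several time points satisfy the conditions simultaneously. If these probabilities are equal for every time shift $h$, the series is strictly stationary.

Why does strict stationarity require the same joint distribution rather than just the same mean and variance?
GPT:
Because it concerns the complete statistical behavior of the process. The mean and variance are only partial summaries and cannot fully describe the process, whereas the joint distribution contains all the statistical information. Strict stationarity is therefore a stronger and more complete requirement.

Strict stationarity and the autocovariance:

Derivation: from the $k$-dimensional joint distribution to the two-dimensional one ($k = 2$), assuming the variance is finite

\begin{aligned} P \{x_{t_{1}} \leq c_{1}, \dots, x_{t_{k}} \leq c_{k}\} & = P \{x_{t_{1} + h} \leq c_{1}, \dots, x_{t_{k} + h} \leq c_{k}\} \\ & ⇓ (k = 2, t_{1} = s, t_{2} = t) \\ P \{x_{s} \leq c_{1}, x_{t} \leq c_{2}\} & = P \{x_{s + h} \leq c_{1}, x_{t + h} \leq c_{2}\} \\ & ⇓ \\ γ (s, t) & = γ (s + h, t + h) \end{aligned}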

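Weak stationarity

Compared with strict stationarity, weak stationarity imposes fewer constraints: it only restricts the first two moments and does not require the joint distributions to be invariant under time shifts.

Definition: a time series $\{X_{t}\}$ with finite variance is called weakly stationary if the following conditions hold:

The mean does not depend on time

E [X_{t}] = μ, \forall t

The autocovariance depends only on the time lag

γ (s, t) = γ (s + h, t + h)

Autocovariance function $γ (h)$

Assume the time series is stationary

Definition:

γ (h) = γ (t, t + h) = Cov (X_{t}, X_{t + h}) = E [(X_{t} - μ) (X_{t + h} - μ)]

where:

$h$ is the lag, the time gap between the two time points
$μ$ is the mean of the series, which does not depend on time because the series is weakly stationary

Properties of $γ (h)$

$γ (0) \geq 0$
$| γ (h) | \leq γ (0)$: by the Cauchy-Schwarz inequality, the covariance at any lag can never exceed the variance in absolute value. In other words, a variable is never more correlated with another variable than with itself
$γ (h) = γ (- h)$: by the symmetry $Cov (X, Y) = Cov (Y, X)$, the autocovariance function is symmetric

Turning a product of the series into $γ (h)$

When $x_{t}$ is weakly stationary with mean $μ = 0$, $E (x_{t} x_{t + h}) = γ (h)$

This identity is used later for the PACF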

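Autocorrelation Function (ACF)

Definition:

The autocorrelation function (ACF) $ρ (h)$ of a stationary process $\{X_{t}\}$ measures the correlation between values of the series $h$ time units apart:

ρ (h) = \frac{γ (t + h, t)}{\sqrt{γ (t + h, t + h) γ (t, t)}} = \frac{γ (h)}{γ (0)}

Why is it not $ρ (h) = \frac{γ (t + h)}{γ (t)}$?
From the formula: in $γ (h) = Cov (X_{t}, X_{t + h})$ the argument of $γ$ is the lag, and the time $t$ already appears inside $Cov$. The denominator is $\sqrt{γ (0) γ (0)} = γ (0)$ because both variances equal $γ (0)$
From the definition: since the process is assumed stationary, the autocovariance function depends only on the lag $h$, not on the particular time $t$

From the property $| γ (h) | \leq γ (0)$ we get:

- 1 \leq ρ (h) \leq 1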

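Stationarity, ACF, and stochastic processes

white noise

White noise $\{w_{t}\}$: is it stationary?

Step1: Is the mean independent of $t$?

E (w_{t}) = 0 (independent of t)

Step2: Does the autocovariance depend only on the lag $h$?

From the covariance of white noise derived earlier:

γ (s, t) = Cov (w_{s}, w_{t}) = \begin{cases} σ_{w}^{2} & s = t (s - t = 0) \\ 0 & s \neq t (s - t \neq 0) \end{cases} (depends only on | s - t |)

So white noise $\{w_{t}\}$ is weakly stationary, with

γ (h) = \begin{cases} σ_{w}^{2} & h = 0 \\ 0 & h \neq 0 \end{cases}

The ACF is therefore $ρ (0) = 1$ and $ρ (h) = 0$ for $h \neq 0$: its plot has a single spike at lag 0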

random walk

Random walk $x_{t} = \sum_{j = 1}^{t} w_{j}$: is it stationary?

Step1: Is the mean independent of $t$?

E (x_{t}) = E (\sum_{j = 1}^{t} w_{j}) = \sum_{j = 1}^{t} E (w_{j}) = 0 (independent of t)

Step2: Does the autocovariance depend only on the lag $h$?

From the autocovariance of the random walk:

γ (s, t) = \min \{s, t\} σ_{w}^{2} (depends on both s and t)

Because the autocovariance depends on $s$ and $t$ themselves and not only on $| s - t |$, the random walk $x_{t} = \sum_{j = 1}^{t} w_{j}$ is not weakly stationary

MA(1) process

MA(1) process (moving average)

$x_{t} = w_{t} + θ w_{t - 1}, \quad \{w_{t}\} \sim wn (0, σ^{2})$

Is it stationary?

Step1: Is the mean independent of $t$?

E (x_{t}) = E (w_{t} + θ w_{t - 1}) = E (w_{t}) + θ E (w_{t - 1}) = 0 (independent of t)

Step2: Does the autocovariance depend only on the lag $h$?

\begin{aligned} γ (s, t) & = Cov (x_{s}, x_{t}) \\ & = Cov (w_{s} + θ w_{s - 1}, w_{t} + θ w_{t - 1}) \\ & = Cov (w_{s}, w_{t}) + θ Cov (w_{s}, w_{t - 1}) + θ Cov (w_{s - 1}, w_{t}) + θ^{2} Cov (w_{s - 1}, w_{t - 1}) \end{aligned}

By the covariance of white noise, each term equals $σ^{2}$ when its two time indices coincide and 0 otherwise

When $s = t$:

$γ (t, t) = Cov (w_{t}, w_{t}) + θ^{2} Cov (w_{t - 1}, w_{t - 1}) = (1 + θ^{2}) σ^{2}$

When $s = t - 1$ or $t = s - 1$:

$γ (t - 1, t) = θ Cov (w_{t - 1}, w_{t - 1}) = θ σ^{2}$

$γ (s, s - 1) = θ Cov (w_{s - 1}, w_{s - 1}) = θ σ^{2}$

When $| s - t | \geq 2$:

$γ (s, t) = 0$

In summary

γ (s, t) = \begin{cases} (1 + θ^{2}) σ^{2}, & s = t \\ θ σ^{2}, & | s - t | = 1 \\ 0, & | s - t | \geq 2 \end{cases} (depends only on | s - t |)

So the MA(1) process is weakly stationary, with

γ (h) = \begin{cases} (1 + θ^{2}) σ^{2}, & h = 0 \\ θ σ^{2}, & | h | = 1 \\ 0, & | h | \geq 2 \end{cases}

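Background: infinite geometric series

Reference: Infinite series (shuxuele.com)

For the infinite series $S = a + a r + a r^{2} + a r^{3} + \dots$, what is $S$?

When $| r | < 1$ the geometric series has a finite sum, that is, it converges. If $| r | \geq 1$ (and $a \neq 0$), the series does not converge

Derivation for $| r | < 1$:

Let the sum of the series be $S$

$S = a + a r + a r^{2} + a r^{3} + \dots$

Multiply this expression by $r$

$r S = a r + a r^{2} + a r^{3} + a r^{4} + \dots$

Subtract $r S$ from $S$ and factor

$S (1 - r) = a$

Solve for $S$

$S = \frac{a}{1 - r}$ when $| r | < 1$

Background: finite geometric sum

$S_{n} = a + a r + a r^{2} + \dots + a r^{n - 1}$

$S_{n} = \frac{a (1 - r^{n})}{1 - r}$, if $r \neq 1$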

AR(1) process

AR(1) process (autoregressive)

$x_{t} = ϕ x_{t - 1} + w_{t}, \quad 0 < | ϕ | < 1, \quad \{w_{t}\} \sim wn (0, σ^{2})$

$w_{t}$ is uncorrelated with $x_{s}$ for $s < t$

Is it stationary?

First expand the recursion:

\begin{aligned} x_{t} & = ϕ x_{t - 1} + w_{t} \\ & = ϕ (ϕ x_{t - 2} + w_{t - 1}) + w_{t} \\ & = ϕ^{2} x_{t - 2} + ϕ w_{t - 1} + w_{t} \\ & = \dots \\ & = \sum_{j = 0}^{\infty} ϕ^{j} w_{t - j} \end{aligned}

The last step uses $| ϕ | < 1$: after $k$ substitutions the remainder is $ϕ^{k} x_{t - k}$, which vanishes (in mean square) as $k \to \infty$ provided $x_{t}$ has bounded variance.

Step1: Is the mean independent of $t$?

Clearly $E (x_{t}) = 0$ (independent of $t$)

Step2: Does the autocovariance depend only on the lag $h$?

$γ (s, t) = Cov (\sum_{j = 0}^{\infty} ϕ^{j} w_{s - j}, \sum_{j = 0}^{\infty} ϕ^{j} w_{t - j})$

When $s = t$:

\begin{aligned} γ (t, t) & = cov (\sum_{j = 0}^{\infty} ϕ^{j} w_{t - j}, \sum_{i = 0}^{\infty} ϕ^{i} w_{t - i}) \\ & = \sum_{j = 0}^{\infty} ϕ^{2 j} cov (w_{t - j}, w_{t - j}) \\ & = \frac{σ^{2}}{1 - ϕ^{2}} \end{aligned}

By the distributive property of covariance, every term of the first sum is paired with every term of the second sum, and only the pairs with equal indices are nonzero, which gives $\sum_{j = 0}^{\infty} ϕ^{2 j} cov (w_{t - j}, w_{t - j})$

Using the infinite geometric series:

$S = \frac{a}{1 - r}$ with $a = cov (w_{t - j}, w_{t - j}) = σ^{2}$ and $r = ϕ^{2}$

$S = \frac{σ^{2}}{1 - ϕ^{2}}$

When $s \neq t$, write $s = t + h$ with $h > 0$:

$x_{t + h}$ contains $h$ more noise terms than $x_{t}$:

$x_{t + h} = \sum_{j = 0}^{h - 1} ϕ^{j} w_{t + h - j} + \sum_{j = h}^{\infty} ϕ^{j} w_{t + h - j}$

The sum for $x_{t}$ starts at 0, so we re-index the sum that starts at $h$ (the second term above) with $j = k + h$

\begin{aligned} \sum_{j = h}^{\infty} ϕ^{j} w_{t + h - j} & = \sum_{k = 0}^{\infty} ϕ^{k + h} w_{t + h - (k + h)} \\ & = \sum_{k = 0}^{\infty} ϕ^{k + h} w_{t - k} \\ & = ϕ^{h} \sum_{k = 0}^{\infty} ϕ^{k} w_{t - k} \\ & = ϕ^{h} x_{t} \end{aligned}

That is

$x_{t + h} = \sum_{j = 0}^{h - 1} ϕ^{j} w_{t + h - j} + ϕ^{h} x_{t}$

Hence

\begin{aligned} γ (t + h, t) & = cov (x_{t + h}, x_{t}) \\ & = cov (\sum_{j = 0}^{h - 1} ϕ^{j} w_{t + h - j} + ϕ^{h} x_{t}, x_{t}) \end{aligned}

The noise terms $w_{t + 1}, \dots, w_{t + h}$ in the first sum share no index with the terms $w_{t}, w_{t - 1}, \dots$ that make up $x_{t}$, so every pair formed between $\sum_{j = 0}^{h - 1} ϕ^{j} w_{t + h - j}$ and $x_{t}$ has covariance 0. Therefore

\begin{aligned} γ (t + h, t) & = cov (\sum_{j = 0}^{h - 1} ϕ^{j} w_{t + h - j} + ϕ^{h} x_{t}, x_{t}) \\ & = cov (ϕ^{h} x_{t}, x_{t}) \\ & = ϕ^{h} cov (x_{t}, x_{t}) \\ & = \frac{ϕ^{h} σ^{2}}{1 - ϕ^{2}} (h > 0) \end{aligned}

So the AR(1) process is weakly stationary, with

γ (h) = \begin{cases} \frac{σ^{2}}{1 - ϕ^{2}}, & h = 0 \\ ϕ^{| h |} \frac{σ^{2}}{1 - ϕ^{2}}, & | h | \geq 1 \end{cases}
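Dividing the AR(1) autocovariance by $γ (0)$ gives $ρ (h) = ϕ^{| h |}$, which can be compared with the sample ACF of a simulated series. The sketch below is a simulation under assumed values ($ϕ = 0.6$, $σ = 1$, a burn-in of 500 steps so that the arbitrary start value $x_{0} = 0$ is forgotten). The function names and the seed are illustrative, and the match is only approximate.

```python
import random

def simulate_ar1(n, phi, sigma=1.0, burn_in=500, seed=0):
    """Simulate x_t = phi * x_{t-1} + w_t and drop the first burn_in values."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for i in range(n + burn_in):
        x = phi * x + rng.gauss(0.0, sigma)
        if i >= burn_in:
            out.append(x)
    return out

def sample_acf(x, max_lag):
    """Sample autocorrelations at lags 0, 1, ..., max_lag."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x)
    return [
        sum((x[t + h] - mean) * (x[t] - mean) for t in range(n - h)) / c0
        for h in range(max_lag + 1)
    ]

phi = 0.6
series = simulate_ar1(20_000, phi)
acf = sample_acf(series, 3)
theory = [phi**h for h in range(4)]
print(acf)
print(theory)
```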

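Using the ACF to check for white noise

For white noise, the sample autocorrelations $\hat{ρ} (h)$ satisfy, for large $n$:

(\begin{matrix} \hat{ρ} (1) \\ \hat{ρ} (2) \\ ⋮ \\ \hat{ρ} (K) \end{matrix}) \sim A N (0, \frac{1}{n} I)

That is, when the sample size $n$ is large, the sample autocorrelations are approximately independent and asymptotically normal (AN) with mean 0 and variance $\frac{1}{n}$

So the normal distribution can be used to judge whether a series is white noise: for white noise, roughly 95% of the $\hat{ρ} (h)$ should fall within $\pm 1.96 / \sqrt{n}$

The sample ACF also helps us recognize many non-white-noise (even non-stationary) time series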
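The asymptotic result suggests a simple check: count how many sample autocorrelations fall inside $\pm 1.96 / \sqrt{n}$. The sketch below applies it to simulated white noise and to the random walk built from the same noise. The sample size, number of lags, seed, and function name are illustrative assumptions.

```python
import math
import random

def sample_acf(x, max_lag):
    """Sample autocorrelations rho_hat(1), ..., rho_hat(max_lag)."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x)
    return [
        sum((x[t + h] - mean) * (x[t] - mean) for t in range(n - h)) / c0
        for h in range(1, max_lag + 1)
    ]

rng = random.Random(42)
n, max_lag = 2000, 40
noise = [rng.gauss(0.0, 1.0) for _ in range(n)]

# Random walk: cumulative sum of the same noise.
walk, level = [], 0.0
for w in noise:
    level += w
    walk.append(level)

bound = 1.96 / math.sqrt(n)
inside_noise = sum(abs(r) <= bound for r in sample_acf(noise, max_lag))
inside_walk = sum(abs(r) <= bound for r in sample_acf(walk, max_lag))
print(inside_noise, inside_walk)   # most lags inside for noise, almost none for the walk
```

For the white noise most of the 40 lags land inside the band, while the random walk's sample ACF stays far outside it at every small lag, which is the typical signature of a non-stationary series.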


Backshift and forward-shift operator

Backshift operator：

B x_{t} = x_{t - 1}

B^{k} x_{t} = x_{t - k}

Forward-shift operator:

x_{t} = B^{- 1} x_{t - 1}

First difference operator:

\nabla x_{t} = x_{t} - x_{t - 1} = (1 - B) x_{t}

Differences of order d:

\nabla^{d} = (1 - B)^{d}

Differencing

For the basic model $x_{t} = m_{t} + s_{t} + e_{t}$, differencing is a way to remove the trend term $m_{t}$

The first difference eliminates a linear trend:

e.g.:

x_{t} = β_{0} + β_{1} t + y_{t}

\begin{aligned} \nabla x_{t} & = x_{t} - x_{t - 1} \\ & = β_{0} + β_{1} t + y_{t} - (β_{0} + β_{1} (t - 1) + y_{t - 1}) \\ & = β_{1} + y_{t} - y_{t - 1} \end{aligned}

The second-order difference eliminates a quadratic trend:

e.g.:

x_{t} = β_{0} + β_{1} t + β_{2} t^{2} + y_{t}

\begin{aligned} \nabla x_{t} & = x_{t} - x_{t - 1} \\ & = β_{1} - β_{2} + 2 β_{2} t + y_{t} - y_{t - 1} \\ \nabla^{2} x_{t} & = \nabla (\nabla x_{t}) \\ & = 2 β_{2} + y_{t} - 2 y_{t - 1} + y_{t - 2} \end{aligned}

To get the second difference, first compute the first difference, then difference that expression once more

Differencing can also remove the seasonal term $s_{t}$

x_{t} = s_{t} + y_{t} (where s_{t} = s_{t - p} for all t)

The seasonal period is $p$

\begin{aligned} \nabla_{p} x_{t} & = x_{t} - x_{t - p} = (1 - B^{p}) x_{t} \\ \nabla_{p} x_{t} & = (s_{t} + y_{t}) - (s_{t - p} + y_{t - p}) = y_{t} - y_{t - p} = \nabla_{p} y_{t} \end{aligned}
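The three differencing results can be checked on small deterministic series. In the sketch below the stationary part $y_{t}$ is set to 0 so that only the trend or seasonal term remains; the coefficients, the seasonal pattern, and the function name are made-up values for illustration.

```python
def difference(x, lag=1):
    """Return [x_t - x_{t-lag}] for t = lag, ..., len(x) - 1."""
    return [x[t] - x[t - lag] for t in range(lag, len(x))]

t_values = range(12)
linear = [2 + 3 * t for t in t_values]                  # beta_0 = 2, beta_1 = 3
quadratic = [1 + 2 * t + 5 * t * t for t in t_values]   # beta_2 = 5
seasonal = [[10, -4, 7][t % 3] for t in t_values]       # period p = 3

first_diff = difference(linear)                  # constant beta_1
second_diff = difference(difference(quadratic))  # constant 2 * beta_2
seasonal_diff = difference(seasonal, lag=3)      # seasonal term removed
print(first_diff, second_diff, seasonal_diff)
```

Each differencing pass shortens the series by `lag` observations, which matters for short data sets.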

Lecture3 / Lecture4

Linear process

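A linear process is a basic concept in time series analysis: it writes the current value of the series as a linear combination of white noise terms

x_{t} = μ + \sum_{j = - \infty}^{\infty} ψ_{j} w_{t - j}, where \{w_{t}\} \sim wn (0, σ_{w}^{2}) and μ, ψ_{j} are parameters satisfying \sum_{j = - \infty}^{\infty} | ψ_{j} | < \infty

The mean and autocovariance of a linear process are:

\begin{aligned} E [x_{t}] & = μ + \sum_{j = - \infty}^{\infty} ψ_{j} E [w_{t - j}] = μ + 0 = μ \\ γ (h) & = Cov (\sum_{j = - \infty}^{\infty} ψ_{j} w_{t - j}, \sum_{k = - \infty}^{\infty} ψ_{k} w_{t + h - k}) = σ_{w}^{2} \sum_{j = - \infty}^{\infty} ψ_{j + h} ψ_{j} \end{aligned}

We know that the random walk $x_{t} = \sum_{j = 1}^{t} w_{j}$ is not stationary. The condition $\sum_{j = - \infty}^{\infty} | ψ_{j} | < \infty$ is what guarantees that the infinite sum converges and that the linear process is stationary

A stationary series stays stationary under a linear transformation: applying a linear filter with absolutely summable coefficients to a stationary process gives another stationary process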

AR(p)

Assume the current value is related to past values: $x_{t}$ is explained as a linear function of $x_{t - 1}, x_{t - 2}, \dots, x_{t - p}$ plus noise, which lets us forecast future values from the observed (current and past) values. This model is written $AR (p)$

In formulas:

\begin{aligned} & x_{t} = ϕ_{1} x_{t - 1} + ϕ_{2} x_{t - 2} + \dots + ϕ_{p} x_{t - p} + w_{t} \\ & w_{t} \sim wn (0, σ_{w}^{2}) \\ & ϕ_{1}, ϕ_{2}, \dots, ϕ_{p} \text{ are constants } (ϕ_{p} \neq 0) \end{aligned}

In addition: $w_{t}$ is uncorrelated with $x_{s}$ for $s < t$

It can also be expressed through a conditional expectation: $\hat{P_{t}} = E (x_{t} | x_{t - 1}, \dots, x_{t - p})$

In terms of the backshift operator:

\begin{aligned} x_{t} & = ϕ_{1} x_{t - 1} + ϕ_{2} x_{t - 2} + \dots + ϕ_{p} x_{t - p} + w_{t} \\ & ⇓ \\ (1 - ϕ_{1} B - ϕ_{2} B^{2} - \dots - ϕ_{p} B^{p}) x_{t} & = w_{t} \\ & ⇓ \\ ϕ (B) x_{t} & = w_{t} \end{aligned}

Mean and autocovariance function:

μ = 0, γ (h) = ϕ_{1} γ (h - 1) + ϕ_{2} γ (h - 2) + \dots + ϕ_{p} γ (h - p), h \geq 1

γ (0) = ϕ_{1} γ (1) + ϕ_{2} γ (2) + \dots + ϕ_{p} γ (p) + σ_{w}^{2}

For details, look up the Yule-Walker equations. Taking $h = 1, \dots, p$ in the recursion above gives their matrix form:

(\begin{matrix} γ (1) \\ γ (2) \\ ⋮ \\ γ (p) \end{matrix}) = (\begin{matrix} γ (0) & γ (1) & \dots & γ (p - 1) \\ γ (1) & γ (0) & \dots & γ (p - 2) \\ ⋮ & ⋮ & ⋱ & ⋮ \\ γ (p - 1) & γ (p - 2) & \dots & γ (0) \end{matrix}) (\begin{matrix} ϕ_{1} \\ ϕ_{2} \\ ⋮ \\ ϕ_{p} \end{matrix})

AR(1) model

Setting $p = 1$ in the autoregressive model gives the AR(1) model:

x_{t} = ϕ x_{t - 1} + w_{t}, 0 < | ϕ | < 1 (writing ϕ for ϕ_{1})

From the stationarity argument for the AR(1) above, expanding $x_{t}$ recursively gives

x_{t} = \sum_{j = 0}^{\infty} ϕ^{j} w_{t - j}

The recursive expansion is tedious, which is why we introduce the backshift operator to simplify the calculation

Mean and autocovariance function:

μ = 0; γ (h) = \frac{σ_{w}^{2} ϕ^{| h |}}{1 - ϕ^{2}}

In terms of the backshift operator:

\begin{aligned} x_{t} & = ϕ x_{t - 1} + w_{t} \\ & ⇓ \\ (1 - ϕ B) x_{t} & = w_{t} \end{aligned}

\begin{aligned} ϕ (B) x_{t} & = w_{t} \\ & ⇓ \\ ϕ (B) & = 1 - ϕ B \end{aligned}

As a linear process:

\begin{aligned} x_{t} & = \sum_{j = 0}^{\infty} ϕ^{j} w_{t - j} \\ & ⇓ \\ x_{t} & = ψ (B) w_{t} \\ & ⇓ \\ ψ (B) & = \sum_{j = 0}^{\infty} ϕ^{j} B^{j} \\ & = \frac{1}{1 - ϕ B} \end{aligned}

Note that $B^{j}$ shifts $w_{t}$: the recursive expansion produces the terms $w_{t - j} = B^{j} w_{t}$, so the sum can be written as $\sum_{j = 0}^{\infty} ϕ^{j} B^{j} w_{t}$

Explosive AR Models and Causality

For an AR(1) process with $| ϕ | > 1$, the forward recursion is called explosive because the values of the time series quickly become large in magnitude.

We can, however, modify that argument to obtain a stationary model as follows. Write $x_{t + 1} = ϕ x_{t} + w_{t + 1}$, in which case, solving for $x_{t}$ and iterating forward in time,

x_{t} = - \sum_{j = 1}^{\infty} ϕ^{- j} w_{t + j}

which means the process is stationary, but it is also future dependent.

When a process does not depend on the future, such as the AR(1) when $| ϕ | < 1$ , we will say the process is causal.

How to check whether an AR(p) is causal

$AR (p)$:

\begin{aligned} y_{t} & = ϕ_{1} y_{t - 1} + ϕ_{2} y_{t - 2} + \dots + ϕ_{p} y_{t - p} + ϵ_{t} \\ y_{t} - ϕ_{1} y_{t - 1} - ϕ_{2} y_{t - 2} - \dots - ϕ_{p} y_{t - p} & = ϵ_{t} \end{aligned}

Write the autoregressive operator:

ϕ (B) = 1 - ϕ_{1} B - ϕ_{2} B^{2} - \dots - ϕ_{p} B^{p}

Write the characteristic equation:

1 - ϕ_{1} z - ϕ_{2} z^{2} - \dots - ϕ_{p} z^{p} = 0

Solve for all (possibly complex) roots. If every root satisfies $| z | > 1$, the $AR (p)$ is causal; otherwise it is not

How should we understand why every root must have modulus greater than 1?
I don't know yet, still thinking...
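One possible way to approach this open question is the standard geometric-series argument, sketched here first for the AR(1) and then for one factor of a general AR(p). This is not taken from the slides: the idea is that $ψ (z) = 1 / ϕ (z)$ must expand into a power series with absolutely summable coefficients, and that only happens when every root lies outside the unit circle.

```latex
% AR(1): \phi(z) = 1 - \phi z has the single root z_1 = 1/\phi.
\frac{1}{\phi(z)} = \frac{1}{1 - \phi z} = \sum_{j=0}^{\infty} \phi^{j} z^{j}
\quad \text{converges for all } |z| \le 1
\iff |\phi| < 1 \iff |z_1| = \frac{1}{|\phi|} > 1 .

% AR(p): factor the polynomial over its roots z_1, \dots, z_p (note \phi(0) = 1).
\phi(z) = \prod_{i=1}^{p} \left( 1 - \frac{z}{z_i} \right), \qquad
\frac{1}{1 - z / z_i} = \sum_{j=0}^{\infty} \left( \frac{z}{z_i} \right)^{j}
\quad \text{converges for all } |z| \le 1 \iff |z_i| > 1 .
```

Multiplying the $p$ convergent series gives $ψ (z) = \sum_{j \geq 0} ψ_{j} z^{j}$ with $\sum | ψ_{j} | < \infty$, that is, a representation of $x_{t}$ that uses only present and past noise.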

Every Explosion Has a Cause

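It is important to keep the different representations straight:

autoregressive operator: $ϕ (B) x_{t} = w_{t}$
MA($\infty$) representation: $x_{t} = ψ (B) w_{t}$

Conclusion:

$ψ (B) = ϕ^{- 1} (B)$

Why? Look at the two formulas:

\begin{aligned} ϕ (B) x_{t} & = w_{t} \\ x_{t} & = ψ (B) w_{t} \end{aligned}

Substituting the second into the first gives $ϕ (B) ψ (B) w_{t} = w_{t}$, so $ψ (B) ϕ (B) = 1$

The key to the derivation is to treat $B$ as the variable of a polynomial, which is also why $B$ is introduced

$ϕ (B)$ to $ψ (B)$ by matching

$ψ (B) ϕ (B) = 1$

$(ψ_{0} + ψ_{1} B + ψ_{2} B^{2} + \dots) (1 - ϕ_{1} B - \dots - ϕ_{p} B^{p}) = 1$

Coefficient of:

$B^{0}$ : $ψ_{0} = 1$
$B^{1}$ : $ψ_{1} - ϕ_{1} ψ_{0} = 0$
$B^{2}$ : $ψ_{2} - ϕ_{1} ψ_{1} - ϕ_{2} ψ_{0} = 0$
$B^{3}$ : $ψ_{3} - ϕ_{1} ψ_{2} - ϕ_{2} ψ_{1} - ϕ_{3} ψ_{0} = 0$
$⋮$

In short, collect the coefficients of $B^{0}, B^{1}, \dots$ on the left in turn and match them with the coefficients on the right. The same idea is used later to convert MA and ARMA models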

MA(q)

The $MA (q)$ model is defined as:

\begin{aligned} & x_{t} = w_{t} + θ_{1} w_{t - 1} + θ_{2} w_{t - 2} + \dots + θ_{q} w_{t - q} \\ & w_{t} \sim wn (0, σ_{w}^{2}) \\ & θ_{1}, θ_{2}, \dots, θ_{q} (θ_{q} \neq 0) \text{ are parameters} \end{aligned}

Moving average operator:

\begin{aligned} x_{t} & = w_{t} + θ_{1} w_{t - 1} + θ_{2} w_{t - 2} + \dots + θ_{q} w_{t - q} \\ & ⇓ \\ x_{t} & = (1 + θ_{1} B + θ_{2} B^{2} + \dots + θ_{q} B^{q}) w_{t} \\ & ⇓ \\ x_{t} & = θ (B) w_{t} \end{aligned}

Mean and autocovariance function (with $θ_{0} = 1$):

μ = 0, γ (h) = \begin{cases} σ_{w}^{2} \sum_{j = 0}^{q - | h |} θ_{j} θ_{j + | h |}, & | h | \leq q \\ 0, & | h | > q \end{cases}

MA(1)

Setting $q = 1$ gives the MA(1) model:

x_{t} = w_{t} + θ w_{t - 1}

Mean, autocovariance, and autocorrelation function:

μ = 0; γ (h) = \begin{cases} (1 + θ^{2}) σ_{w}^{2}, & h = 0 \\ θ σ_{w}^{2}, & | h | = 1 \\ 0, & | h | > 1 \end{cases}

ρ (h) = \begin{cases} 1, & h = 0 \\ \frac{θ}{1 + θ^{2}}, & | h | = 1 \\ 0, & | h | > 1 \end{cases}

To express $w_{t}$ in terms of the $x$ values, rearrange and proceed as for the AR(1)

w_{t} = - θ w_{t - 1} + x_{t}

Expanding recursively (valid when $| θ | < 1$) gives

\begin{aligned} w_{t} & = \sum_{j = 0}^{\infty} (- θ)^{j} x_{t - j} \\ & = \sum_{j = 0}^{\infty} π_{j} x_{t - j} \\ & = π (B) x_{t} \end{aligned}

It follows that:

π (B) θ (B) = 1

Non-uniqueness of MA Models and Invertibility

MA models can run into the following problem:

We note that for an MA(1) model, $ρ (h)$ is the same for $θ$ and $\frac{1}{θ}$; try $5$ and $\frac{1}{5}$, for example. In addition, the pair $σ_{w}^{2} = 1$ and $θ = 5$ yields the same autocovariance function as the pair $σ_{w}^{2} = 25$ and $θ = \frac{1}{5}$:

γ (h) = \begin{cases} 26 & h = 0, \\ 5 & | h | = 1, \\ 0 & | h | > 1. \end{cases}
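The two parameter pairs can be plugged into the MA(1) formulas $γ (0) = (1 + θ^{2}) σ_{w}^{2}$ and $γ (1) = θ σ_{w}^{2}$ to confirm that they are indistinguishable from second moments. The sketch uses exact fractions; the function names are illustrative assumptions.

```python
from fractions import Fraction

def ma1_autocovariance(theta, sigma2):
    """Return (gamma(0), gamma(1)) for x_t = w_t + theta * w_{t-1}."""
    return ((1 + theta**2) * sigma2, theta * sigma2)

def ma1_rho1(theta):
    """Lag-1 autocorrelation of an MA(1)."""
    return theta / (1 + theta**2)

model_a = ma1_autocovariance(Fraction(5), Fraction(1))       # theta = 5,   sigma^2 = 1
model_b = ma1_autocovariance(Fraction(1, 5), Fraction(25))   # theta = 1/5, sigma^2 = 25
print(model_a == model_b)                                    # same autocovariance function
print(ma1_rho1(Fraction(5)) == ma1_rho1(Fraction(1, 5)))     # same rho(1)
```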

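This is a serious problem when we have observations and try to fit a model, because two different models can match the same data equally well. We therefore have to pick one of them: We will choose the model with an infinite AR representation. Such a process is called an invertible process.

In other words, the MA model we choose must be convertible to an AR form

From the formula, take the MA(1) written as $w_{t} = - θ w_{t - 1} + x_{t}$. An AR form contains only the single noise term $w_{t}$. To go from $w_{t} = - θ w_{t - 1} + x_{t}$ to $w_{t} = \sum_{j = 0}^{\infty} (- θ)^{j} x_{t - j}$ we repeatedly substitute for $w_{t - j}$, and the resulting infinite sum converges only if $| θ | < 1$. In other words, only when $| θ | < 1$ can we compute $π (B) = θ^{- 1} (B)$ and obtain the "AR form" $w_{t} = π (B) x_{t}$.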

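How to check whether an MA(q) is invertible

For an MA(q) model:

x_{t} = w_{t} + θ_{1} w_{t - 1} + θ_{2} w_{t - 2} + \dots + θ_{q} w_{t - q}

where $w_{t}$ is a white noise sequence, invertibility can be checked from the roots of the characteristic equation.

Write $θ (B)$

θ (B) = 1 + θ_{1} B + θ_{2} B^{2} + \dots + θ_{q} B^{q}

and examine the roots of the characteristic equation:

1 + θ_{1} z + θ_{2} z^{2} + \dots + θ_{q} z^{q} = 0

If every root has modulus greater than 1, the MA model is invertible.
If some root has modulus less than or equal to 1, the MA model is non-invertible.

$θ (B)$ to $π (B)$ by matching

The method is the same as for $ϕ (B)$ to $ψ (B)$: match coefficients in $π (B) θ (B) = 1$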

Lecture5

ARMA(p,q)

\begin{aligned} x_{t} - ϕ_{1} x_{t - 1} - \dots - ϕ_{p} x_{t - p} & = w_{t} + θ_{1} w_{t - 1} + \dots + θ_{q} w_{t - q} \\ & ⇓ \\ ϕ (B) x_{t} & = θ (B) w_{t} \\ & ⇓ \\ x_{t} & = ψ (B) w_{t} = \frac{θ (B)}{ϕ (B)} w_{t} \end{aligned}

Avoid parameter redundancy

To make sure the ARMA model is in its simplest form and uses no unnecessary parameters, the AR and MA polynomials must be coprime (have no common factors)

example:

Suppose:

x_{t} = 1.5 x_{t - 1} - 0.5 x_{t - 2} + w_{t} - w_{t - 1}

The AR and MA polynomials are:

$ϕ (z) = 1 - 1.5 z + 0.5 z^{2} = (1 - z) (1 - 0.5 z)$
$θ (z) = 1 - z$

Both contain the factor $(1 - z)$. Cancelling this common factor simplifies the model: what looks like an ARMA(2,1) reduces to an ARMA(1,0), that is, an AR(1)

x_{t} = 0.5 x_{t - 1} + w_{t}

Stationarity

If $ϕ$ and $θ$ have no common factors, a stationary solution to $ϕ (B) x_{t} = θ (B) w_{t}$ exists if and only if all the roots of $ϕ (z)$ satisfy $| z | \neq 1$

Causality

The ARMA(p,q) process is causal if and only if all the roots of $ϕ (z)$ satisfy $| z | > 1$

Invertibility

The ARMA(p,q) process is invertible if and only if all the roots of $θ (z)$ satisfy $| z | > 1$

Example of Stationarity, Causality, Invertibility

$x_{t} = \frac{5}{6} x_{t - 1} - \frac{1}{6} x_{t - 2} + w_{t} - 0.25 w_{t - 2}$

Step1: Move terms to identify the AR operator $ϕ (B)$ and the MA operator $θ (B)$

$ϕ (B) = 1 - \frac{5}{6} B + \frac{1}{6} B^{2}$

$θ (B) = 1 - \frac{1}{4} B^{2}$

Be careful with the signs when moving terms!

Step2: Solve for the roots and factor

$1 - \frac{5}{6} z + \frac{1}{6} z^{2} = 0 ⟹ z_{1} = 2; z_{2} = 3 ⟹ ϕ (z) = (1 - \frac{1}{2} z) (1 - \frac{1}{3} z)$

$1 - \frac{1}{4} z^{2} = 0 ⟹ z_{1} = 2; z_{2} = - 2 ⟹ θ (z) = (1 - \frac{1}{2} z) (1 + \frac{1}{2} z)$

So: $(1 - \frac{1}{2} B) (1 - \frac{1}{3} B) x_{t} = (1 - \frac{1}{2} B) (1 + \frac{1}{2} B) w_{t}$

Note: to avoid parameter redundancy, the common factor must be cancelled

$(1 - \frac{1}{3} B) x_{t} = (1 + \frac{1}{2} B) w_{t} ⟹ ARMA (1, 1)$

Step3: Check causality and invertibility

The root of the reduced $ϕ (z)$: $z_{1} = 3$. Every root has $| z |$ greater than 1, so the ARMA model is causal

The root of the reduced $θ (z)$: $z_{1} = - 2$. Every root has $| z |$ greater than 1, so the ARMA model is invertible
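The roots in Step 2 and the checks in Step 3 can be reproduced with the quadratic formula. The sketch below handles degree-2 polynomials only, using the standard-library `cmath` module so that complex roots would also work. The function names and the tolerance used to detect a shared root are illustrative assumptions.

```python
import cmath

def quadratic_roots(c0, c1, c2):
    """Roots of c0 + c1*z + c2*z^2 = 0 (c2 != 0), possibly complex."""
    disc = cmath.sqrt(c1 * c1 - 4 * c2 * c0)
    return ((-c1 + disc) / (2 * c2), (-c1 - disc) / (2 * c2))

def all_outside_unit_circle(roots):
    """True if every root has modulus greater than 1."""
    return all(abs(z) > 1 for z in roots)

ar_roots = quadratic_roots(1, -5 / 6, 1 / 6)   # phi(z)   = 1 - 5/6 z + 1/6 z^2
ma_roots = quadratic_roots(1, 0, -1 / 4)       # theta(z) = 1 - 1/4 z^2

# A root shared by both polynomials signals parameter redundancy.
common = [z for z in ar_roots if any(abs(z - u) < 1e-9 for u in ma_roots)]
reduced_ar = [z for z in ar_roots if z not in common]
reduced_ma = [u for u in ma_roots if all(abs(u - z) >= 1e-9 for z in common)]

print(ar_roots, ma_roots, common)
print(all_outside_unit_circle(reduced_ar), all_outside_unit_circle(reduced_ma))
```

For higher-order polynomials a general root finder would be needed. That is outside the scope of this sketch.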

Convert to MA process

For a causal ARMA(p,q) model, we may write:

x_{t} = \sum_{j = 0}^{\infty} ψ_{j} w_{t - j} = ψ (B) w_{t}

The coefficients of $ψ (B)$ can be found by matching coefficients:

ψ (B) = \frac{θ (B)}{ϕ (B)} ⟹ ϕ (B) ψ (B) = θ (B)

Example: convert ARMA to MA:

x_{t} = 0.9 x_{t - 1} + 0.5 w_{t - 1} + w_{t}

(1 - 0.9 B) x_{t} = (1 + 0.5 B) w_{t}

(1 - 0.9 B) (ψ_{0} + ψ_{1} B + ψ_{2} B^{2} + ψ_{3} B^{3} + \dots) = 1 + 0.5 B

Coefficient of:

$B^{0}$ : $ψ_{0} = 1$
$B^{1}$ : $ψ_{1} - 0.9 ψ_{0} = 0.5 \Rightarrow ψ_{1} = 1.4$
$B^{2}$ : $ψ_{2} - 0.9 ψ_{1} = 0 \Rightarrow ψ_{2} = 1.26$
$B^{3}$ : $ψ_{3} - 0.9 ψ_{2} = 0 \Rightarrow ψ_{3} = 1.134$
$⋮$
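Matching coefficients in $ϕ (B) ψ (B) = θ (B)$ gives the recursion $ψ_{j} = θ_{j} + \sum_{k = 1}^{\min (j, p)} ϕ_{k} ψ_{j - k}$ with $ψ_{0} = 1$ and $θ_{j} = 0$ for $j > q$. The sketch below implements it and reproduces the example values up to floating-point rounding. The function name and argument layout are illustrative assumptions.

```python
def psi_weights(phi, theta, n_weights):
    """First n_weights coefficients of psi(B) = theta(B) / phi(B).

    phi   = [phi_1, ..., phi_p]      from phi(B)   = 1 - phi_1 B - ... - phi_p B^p
    theta = [theta_1, ..., theta_q]  from theta(B) = 1 + theta_1 B + ... + theta_q B^q
    """
    psi = []
    for j in range(n_weights):
        if j == 0:
            value = 1.0
        else:
            # theta_j is zero beyond the MA order q
            value = theta[j - 1] if j <= len(theta) else 0.0
            for k in range(1, min(j, len(phi)) + 1):
                value += phi[k - 1] * psi[j - k]
        psi.append(value)
    return psi

weights = psi_weights([0.9], [0.5], 4)   # x_t = 0.9 x_{t-1} + 0.5 w_{t-1} + w_t
print(weights)
```

Calling `psi_weights([phi], [], n)` with an empty MA list gives the pure AR(1) weights $ϕ^{j}$, so the same routine covers the AR to MA($\infty$) conversion.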

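Autocovariance (and autocorrelation) function of an ARMA model

Method 1: convert the ARMA to an MA

γ (h) = σ_{w}^{2} (ψ_{0} ψ_{h} + ψ_{1} ψ_{h + 1} + ψ_{2} ψ_{h + 2} + \dots), h \geq 0

Method 2: use the recursion satisfied by $γ (h)$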
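Method 1 can be tried numerically on the ARMA(1,1) $x_{t} = 0.9 x_{t - 1} + 0.5 w_{t - 1} + w_{t}$: truncate the $ψ$ sum after many terms and compare with the closed forms $γ (0) = σ_{w}^{2} \frac{1 + 2 θ ϕ + θ^{2}}{1 - ϕ^{2}}$ and $γ (1) = σ_{w}^{2} \frac{(1 + θ ϕ) (ϕ + θ)}{1 - ϕ^{2}}$. These closed forms are the standard ARMA(1,1) results and are brought in here as an assumption, since they are not derived in these notes. The truncation length and function names are also illustrative.

```python
def arma11_psi(phi, theta, n_weights):
    """psi_0 = 1 and psi_j = (phi + theta) * phi^(j-1) for j >= 1."""
    return [1.0] + [(phi + theta) * phi ** (j - 1) for j in range(1, n_weights)]

def autocovariance_from_psi(psi, h, sigma2=1.0):
    """sigma^2 * sum_j psi_j * psi_{j+h}, truncated to the available weights (h >= 0)."""
    return sigma2 * sum(psi[j] * psi[j + h] for j in range(len(psi) - h))

phi, theta = 0.9, 0.5
psi = arma11_psi(phi, theta, 2000)
gamma0 = autocovariance_from_psi(psi, 0)
gamma1 = autocovariance_from_psi(psi, 1)

closed_gamma0 = (1 + 2 * theta * phi + theta**2) / (1 - phi**2)
closed_gamma1 = (1 + theta * phi) * (phi + theta) / (1 - phi**2)
print(gamma0, closed_gamma0)
print(gamma1, closed_gamma1)
```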

PACF

The main motivation for the PACF is to overcome the limitations of the ACF when analyzing AR or ARMA models:

For MA(q) models, the ACF will be zero for lags greater than $q$, and will not be zero at lag $q$.
For AR(p) and ARMA(p,q) models, the ACF tails off (a gradual decay of the autocorrelation values over the lags), so by itself it says little about the order.

For example, in an AR(1), $x_{t}$ depends on $x_{t - 2}$ through $x_{t - 1}$. Removing (partialling out) the effect of $x_{t - 1}$ breaks this chain of dependence: $Cov (x_{t} - ϕ x_{t - 1}, x_{t - 2} - ϕ x_{t - 1}) = 0$.

Why do we also subtract $x_{t - 1}$ from $x_{t - 2}$? $x_{t - 2}$ happens before $x_{t - 1}$, so shouldn't it be unaffected? Although $x_{t - 2}$ comes earlier in time, correlation has no direction: the correlation between $x_{t - 2}$ and $x_{t}$ is passed on through the intermediate variable $x_{t - 1}$, which is linearly related to both. To remove this mediated effect we remove the linear influence of $x_{t - 1}$ from both $x_{t}$ and $x_{t - 2}$, which is why we take the covariance of $x_{t} - ϕ x_{t - 1}$ and $x_{t - 2} - ϕ x_{t - 1}$. The goal is to break the dependence chain so that only the direct relationship is measured. In the AR(1) case, what remains of $x_{t}$ is just the white noise $w_{t}$.

Definition:

For $x_{t + h}$ and $x_{t}$, the variables to partial out are $\{x_{t + h - 1}, x_{t + h - 2}, \dots, x_{t + 1}\}$, which gives the regressions:

${\hat{x}}_{t + h} = β_{1} x_{t + h - 1} + β_{2} x_{t + h - 2} + \dots + β_{h - 1} x_{t + 1}$
${\hat{x}}_{t} = β_{1} x_{t + 1} + β_{2} x_{t + 2} + \dots + β_{h - 1} x_{t + h - 1}$

The coefficients $β_{1}, \dots, β_{h - 1}$ are the same in ${\hat{x}}_{t + h}$ and ${\hat{x}}_{t}$ (by stationarity). Only the order of the regressors differs: in ${\hat{x}}_{t + h}$ they run backward in time, in ${\hat{x}}_{t}$ they run forward in time

The variables partialled out are exactly the ones between $x_{t + h}$ and $x_{t}$!

The partial autocorrelation function (PACF) of a stationary process, $x_{t}$, denoted $ϕ_{h h}$, for $h = 1, 2, \dots$, is

ϕ_{11} = corr (x_{t + 1}, x_{t}) = ρ (1),

ϕ_{h h} = corr (x_{t + h} - {\hat{x}}_{t + h}, x_{t} - {\hat{x}}_{t}), h \geq 2

Note that this is $ρ (1)$ rather than $γ (1)$, and $corr$ rather than $cov$!

Recall the formula for $ρ$: $ρ (h) = \frac{γ (t + h, t)}{\sqrt{γ (t + h, t + h) γ (t, t)}}$

We then get

ϕ_{h h} = corr (x_{t + h} - {\hat{x}}_{t + h}, x_{t} - {\hat{x}}_{t}) = \frac{cov (x_{t + h} - {\hat{x}}_{t + h}, x_{t} - {\hat{x}}_{t})}{\sqrt{var (x_{t + h} - {\hat{x}}_{t + h})} \sqrt{var (x_{t} - {\hat{x}}_{t})}}, h \geq 2

PACF of an AR(1)

Consider the PACF of the AR(1) process given by $x_{t} = ϕ x_{t - 1} + w_{t}$ , with $| ϕ | < 1$ .
By definition, $ϕ_{11} = ρ (1) = ϕ$ . To calculate $ϕ_{22}$ , consider the regression of $x_{t + 2}$ on $x_{t + 1}$ , say, ${\hat{x}}_{t + 2} = β x_{t + 1}$ . We choose $β$ to minimize

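E (x_{t + 2} - {\hat{x}}_{t + 2})^{2} = E (x_{t + 2} - β x_{t + 1})^{2} = γ (0) - 2 β γ (1) + β^{2} γ (0) .

This uses the property that $E (x_{t} x_{t + h}) = γ (h)$ when $μ = 0$

Minimizing a quadratic is done by setting the derivative to zero: $- 2 γ (1) + 2 β γ (0) = 0$, so $β = γ (1) / γ (0) = ρ (1) = ϕ$

Why minimize? Take the AR(1): the aim is to remove the influence of $x_{t + 1}$ from $x_{t + 2}$. $x_{t + 2}$ is generated from $x_{t + 1}$ by $x_{t + 2} = ϕ x_{t + 1} + w_{t + 2}$, so finding $β$ amounts to approximating $ϕ$: we strip off $ϕ x_{t + 1}$ as far as possible and keep $w_{t + 2}$, which in formulas is minimizing $E (x_{t + 2} - {\hat{x}}_{t + 2})^{2}$. In the AR(1) case the minimizing $β$ equals $ϕ$ exactly, but in more complicated models the coefficients have to be found by solving the minimization problem.

By the same argument (and stationarity), the regression of $x_{t}$ on $x_{t + 1}$ is ${\hat{x}}_{t} = ϕ x_{t + 1}$. Hence,

\begin{aligned} ϕ_{22} & = corr (x_{t + 2} - {\hat{x}}_{t + 2}, x_{t} - {\hat{x}}_{t}) \\ & = corr (x_{t + 2} - ϕ x_{t + 1}, x_{t} - ϕ x_{t + 1}) \\ & = corr (w_{t + 2}, x_{t} - ϕ x_{t + 1}) = 0 \end{aligned}

because $w_{t + 2}$ is uncorrelated with $x_{t}$ and $x_{t + 1}$ (causality). Thus, $ϕ_{22} = 0$. This generalizes to the AR(p) model:

ϕ_{h h} = 0 for all h > p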

PACF of an Invertible MA(q)

For an invertible MA($q$), we can write $x_{t} = - \sum_{j = 1}^{\infty} π_{j} x_{t - j} + w_{t}$. Moreover, no finite representation exists. From this result, it should be apparent that the PACF will never cut off (it always tails off). In other words, no regression $\hat{x}$ on finitely many values can break a dependence chain with infinitely many links.

For an MA(1), $x_{t} = w_{t} + θ w_{t - 1}$, with $| θ | < 1$, calculations similar to the AR(1) example above will yield $ϕ_{22} = - θ^{2} / (1 + θ^{2} + θ^{4})$. For the MA(1) in general, we can show that

ϕ_{h h} = - \frac{(- θ)^{h} (1 - θ^{2})}{1 - θ^{2 (h + 1)}}, h \geq 1.
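Both PACF results (cut-off for the AR(1), tail-off for the MA(1)) can be checked numerically by computing the PACF from the ACF. The sketch below uses the Durbin-Levinson recursion, which is a standard algorithm brought in here for illustration rather than the derivation used in these notes. The parameter values $ϕ = 0.6$ and $θ = 0.5$ and the function names are assumptions.

```python
def pacf_from_acf(rho, max_lag):
    """Durbin-Levinson recursion. rho[h] is the ACF at lag h, with rho[0] = 1."""
    pacf = []
    prev = []   # coefficients phi_{n-1,1}, ..., phi_{n-1,n-1}
    for n in range(1, max_lag + 1):
        num = rho[n] - sum(prev[k] * rho[n - 1 - k] for k in range(n - 1))
        den = 1.0 - sum(prev[k] * rho[k + 1] for k in range(n - 1))
        phi_nn = num / den
        current = [prev[k] - phi_nn * prev[n - 2 - k] for k in range(n - 1)]
        current.append(phi_nn)
        pacf.append(phi_nn)
        prev = current
    return pacf

def ma1_pacf_formula(theta, h):
    """Closed form -(-theta)^h (1 - theta^2) / (1 - theta^(2(h+1)))."""
    return -((-theta) ** h) * (1 - theta**2) / (1 - theta ** (2 * (h + 1)))

phi, theta, max_lag = 0.6, 0.5, 5
ar1_rho = [phi**h for h in range(max_lag + 1)]
ma1_rho = [1.0, theta / (1 + theta**2)] + [0.0] * (max_lag - 1)

ar1_pacf = pacf_from_acf(ar1_rho, max_lag)   # phi at lag 1, then numerically zero
ma1_pacf = pacf_from_acf(ma1_rho, max_lag)   # never exactly zero, decays in size
print(ar1_pacf)
print(ma1_pacf)
```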

ACF & PACF for models

| | AR($p$) | MA($q$) | ARMA($p, q$) |
| --- | --- | --- | --- |
| ACF | Tails off | Cuts off after lag $q$ | Tails off |
| PACF | Cuts off after lag $p$ | Tails off | Tails off |

Lecture 6 / 7 /8

Forecasting

Objective:

Predict future values of a time series, $x_{n + m}$, $m = 1, 2, \dots$, based on the data collected to present, $x_{1 : n} = \{x_{1}, x_{2}, \dots, x_{n}\}$.

Mean square error (MSE):

E (x_{n + m} - g (x_{1 : n}))^{2}

where $g (x_{1 : n})$ is a function of the observations $x_{1 : n}$

Minimum mean square error (MSE) predictor:

x_{n + m}^{n} = E (x_{n + m} ∣ x_{1 : n})

In other words, under the MSE criterion the conditional expectation (itself a function of the observations) is the optimal predictor: it attains the minimum MSE

For prediction based on the infinite past, the predictor is usually not written as $x_{n + 1}^{n}$. Instead the forecast ${\hat{x}}_{t + 1}$ is written directly as a conditional expectation

Minimum mean square error (MSE)

E (x_{n + m} - x_{n + m}^{n})^{2}

Proof to be added. I have not fully understood it yet
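For reference, the usual argument for why the conditional expectation minimizes the MSE can be sketched as follows. This is the standard textbook decomposition, not a transcription of the course proof. The shorthand $\hat{x} = E (x_{n + m} \mid x_{1 : n})$ and the generic competitor $g = g (x_{1 : n})$ are notation introduced only for this sketch.

```latex
% Add and subtract \hat{x} inside the square.
E\,(x_{n+m} - g)^2
  = E\,(x_{n+m} - \hat{x})^2
  + 2\,E\big[(x_{n+m} - \hat{x})(\hat{x} - g)\big]
  + E\,(\hat{x} - g)^2 .

% The cross term vanishes: condition on x_{1:n}, noting that \hat{x} - g is a function of x_{1:n}.
E\big[(x_{n+m} - \hat{x})(\hat{x} - g)\big]
  = E\Big[(\hat{x} - g)\, E\big(x_{n+m} - \hat{x} \,\big|\, x_{1:n}\big)\Big] = 0 .

% Hence every predictor g does at least as badly as \hat{x}.
E\,(x_{n+m} - g)^2
  = E\,(x_{n+m} - \hat{x})^2 + E\,(\hat{x} - g)^2
  \;\ge\; E\,(x_{n+m} - \hat{x})^2 .
```

Equality holds only when $g = \hat{x}$ (with probability 1), so the conditional expectation is the minimum MSE predictor.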

Linear predictor

Predictors of the form:

Given data $\{x_{1}, x_{2}, \dots, x_{n}\}$

x_{n + m}^{n} = α_{0} + \sum_{k = 1}^{n} α_{k} x_{k}

Meaning of $x_{n + m}^{n}$:

if $n = m = 1$ , then $x_{2}^{1}$ is the one-step-ahead linear forecast of $x_{2}$ given $x_{1}$
- $x_{2}^{1} = α_{0} + α_{1} x_{1}$
if $n = 2$ , $x_{3}^{2}$ is the one-step-ahead linear forecast of $x_{3}$ given $x_{1}$ and $x_{2}$ .
- $x_{3}^{2} = α_{0} + α_{1} x_{1} + α_{2} x_{2}$
In general, the $α$ s in $x_{2}^{1}$ and $x_{3}^{2}$ will be different.

Best linear predictors (BLPs) for Stationary Processes

To minimize the MSE, differentiate with respect to each coefficient and set the derivative to zero

For the BLP we choose the $α_{k}$ so that the MSE is as small as possible:

Minimize $E (x_{n + m} - x_{n + m}^{n})^{2}$:

\frac{\partial E (x_{n + m} - x_{n + m}^{n})^{2}}{\partial α_{k}} = - 2 E [(x_{n + m} - x_{n + m}^{n}) x_{k}] = 0 ⟹ E [(x_{n + m} - x_{n + m}^{n}) x_{k}] = 0, k = 1, 2, \dots, n

With the convention $x_{0} = 1$, the equation for $α_{0}$ can be included, so for $α_{0}, α_{1}, \dots, α_{n}$:

E [(x_{n + m} - x_{n + m}^{n}) x_{k}] = 0, k = 0, 1, \dots, n

We generally take $E (x_{t}) = μ = 0$, in which case $α_{0} = 0$. Here is the proof:

When $k = 0$, $E (x_{n + m} - x_{n + m}^{n}) = 0$, so $E (x_{n + m}^{n}) = E (x_{n + m}) = μ$. Substituting $x_{n + m}^{n} = α_{0} + \sum_{k = 1}^{n} α_{k} x_{k}$ yields $μ = α_{0} + \sum_{k = 1}^{n} α_{k} μ$, that is, $α_{0} = μ (1 - \sum_{k = 1}^{n} α_{k})$.

Hence, the form of the BLP is

x_{n + m}^{n} = μ + \sum_{k = 1}^{n} α_{k} (x_{k} - μ)

which reduces to $x_{n + m}^{n} = \sum_{k = 1}^{n} α_{k} x_{k}$ when $μ = 0$ (so that $α_{0} = 0$)

One-step ahead prediction

The BLP of $x_{n + 1}$ :

x_{n + 1}^{n} = ϕ_{n 1} x_{n} + ϕ_{n 2} x_{n - 1} + \dots + ϕ_{n n} x_{1}

Using the minimizing (orthogonality) property of the BLP:

\begin{aligned} E [(x_{n + 1} - x_{n + 1}^{n}) x_{n + 1 - k}] & = 0, \quad k = 1, 2, \dots, n \\ & ⇓ \\ E ((x_{n + 1} - \sum_{j = 1}^{n} ϕ_{n j} x_{n + 1 - j}) x_{n + 1 - k}) & = 0, \quad k = 1, 2, \dots, n \\ & ⇓ \\ \sum_{j = 1}^{n} ϕ_{n j} γ (k - j) & = γ (k), \quad k = 1, 2, \dots, n \end{aligned}

Note that we write $E [(x_{n + 1} - x_{n + 1}^{n}) x_{n + 1 - k}]$ rather than $E [(x_{n + 1} - x_{n + 1}^{n}) x_{k}]$ because after expanding it is easy to express in terms of $γ (k)$: $E [(x_{n + 1} - x_{n + 1}^{n}) x_{n + 1 - k}] \to E [x_{n + 1} x_{n + 1 - k} - x_{n + 1}^{n} x_{n + 1 - k}] \to γ (k) - E (x_{n + 1}^{n} x_{n + 1 - k})$

matrix form:

[\begin{matrix} γ (0) & γ (1) & \dots & γ (n - 1) \\ γ (1) & γ (0) & \dots & γ (n - 2) \\ ⋮ & ⋮ & ⋱ & ⋮ \\ γ (n - 1) & γ (n - 2) & \dots & γ (0) \end{matrix}] [\begin{matrix} ϕ_{n 1} \\ ϕ_{n 2} \\ ⋮ \\ ϕ_{n n} \end{matrix}] = [\begin{matrix} γ (1) \\ γ (2) \\ ⋮ \\ γ (n) \end{matrix}]

In compact notation:

\begin{matrix} (3.64) & Γ_{n} ϕ_{n} = γ_{n} \\ & ϕ_{n} = Γ_{n}^{- 1} γ_{n} \end{matrix}

where $Γ_{n}$ :

is a positive definite matrix
is a non-singular matrix (so the system has exactly one solution)

and where $ϕ_{n}$ and $γ_{n}$ :

$ϕ_{n}$ is the $n \times 1$ vector $(ϕ_{n 1}, . . ., ϕ_{n n})^{'}$
$γ_{n}$ is the $n \times 1$ vector $(γ (1), . . ., γ (n))^{'}$

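Why positive definiteness is useful:
The associated quadratic form has a unique minimum and its second derivative is also positive definite, so it can be minimized by differentiation
It defines an inner product, $< x, y >:= x^{t} A y$
How to check that a matrix $A$ is positive definite:
The quadratic form $x^{t} A x$ is strictly greater than 0 for every $x \neq 0$
All eigenvalues are greater than 0
All leading principal minors are greater than 0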

It is sometimes convenient to write the one-step-ahead forecast in vector notation

x_{n + 1}^{n} = ϕ_{n}^{'} x

where:

$x = (x_{n}, x_{n - 1}, . . ., x_{1})^{'}$
$ϕ_{n}^{'}$ is a $1 \times n$ vector $(ϕ_{n 1}, . . ., ϕ_{n n})$

Vectors written without $^{'}$ are column vectors; those written with it are row vectors

The mean square one-step-ahead prediction error is:

P_{n + 1}^{n} = E (x_{n + 1} - x_{n + 1}^{n})^{2} = γ (0) - γ_{n}^{'} Γ_{n}^{- 1} γ_{n}
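As a rough numerical illustration of (3.64) and the prediction-error formula above, the following sketch solves $Γ_{n} ϕ_{n} = γ_{n}$ with NumPy. The helper names `ar1_acov` and `blp_coefficients`, and the choice of an AR(1) autocovariance with $ϕ = 0.6$ as input, are assumptions made only for this example.

```python
import numpy as np

def ar1_acov(h, phi=0.6, sigma2=1.0):
    """Autocovariance of a causal AR(1): gamma(h) = sigma2 * phi^|h| / (1 - phi^2)."""
    return sigma2 * phi ** abs(h) / (1.0 - phi ** 2)

def blp_coefficients(acov, n):
    """Solve Gamma_n phi_n = gamma_n and return (phi_n, P_{n+1}^n)."""
    # Gamma_n has entries gamma(i - j); gamma_n = (gamma(1), ..., gamma(n))'
    Gamma = np.array([[acov(i - j) for j in range(n)] for i in range(n)])
    gamma = np.array([acov(k) for k in range(1, n + 1)])
    phi = np.linalg.solve(Gamma, gamma)      # solves the system without forming the inverse
    mse = acov(0) - gamma @ phi              # gamma(0) - gamma_n' Gamma_n^{-1} gamma_n
    return phi, mse

phi_n, mse = blp_coefficients(ar1_acov, n=4)
print(phi_n, mse)
```

For an AR(1) input the solution should put all the weight on the most recent observation, and the prediction error should equal the noise variance.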

Prediction for an AR(2)

AR2: $x_{t} = ϕ_{1} x_{t - 1} + ϕ_{2} x_{t - 2} + w_{t},$

The one-step-ahead prediction of $x_{2}$ based on $x_{1}$ , using equation (3.64):

x_{2}^{1} = ϕ_{11} x_{1} = \frac{γ (1)}{γ (0)} x_{1} = ρ (1) x_{1}

The prediction of $x_{3}$ based on the two observations $x_{1}$ and $x_{2}$ is $x_{3}^{2} = ϕ_{21} x_{2} + ϕ_{22} x_{1}$ . To solve for $ϕ_{21}$ and $ϕ_{22}$ :

\begin{aligned} ϕ_{21} γ (0) + ϕ_{22} γ (1) & = γ (1) \\ ϕ_{21} γ (1) + ϕ_{22} γ (0) & = γ (2) \end{aligned}

or:

(\begin{matrix} ϕ_{21} \\ ϕ_{22} \end{matrix}) = {(\begin{matrix} γ (0) & γ (1) \\ γ (1) & γ (0) \end{matrix})}^{- 1} (\begin{matrix} γ (1) \\ γ (2) \end{matrix}),

For an AR(2), it should be apparent from the model that $x_{3}^{2} = ϕ_{1} x_{2} + ϕ_{2} x_{1}$ , that is, $ϕ_{21} = ϕ_{1}$ and $ϕ_{22} = ϕ_{2}$ , because

\begin{array}{r} E {[x_{3} - (ϕ_{1} x_{2} + ϕ_{2} x_{1})] x_{1}} = E (w_{3} x_{1}) = 0 \\ E {[x_{3} - (ϕ_{1} x_{2} + ϕ_{2} x_{1})] x_{2}} = E (w_{3} x_{2}) = 0 \end{array}

If the time series is a causal AR( $p$ ) process, then, for $n \geq p$ , $$x_{n+1}^n = \phi_1 x_n + \phi_2 x_{n-1} + \cdots + \phi_p x_{n-p+1}. \tag{3.67}$$
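The claim $ϕ_{21} = ϕ_{1}$ , $ϕ_{22} = ϕ_{2}$ can be checked numerically. The sketch below is only an illustration: the coefficients $ϕ_{1} = 1.5$ , $ϕ_{2} = - 0.75$ are assumed values, and the ACF is obtained from the relations $ρ (1) = ϕ_{1} / (1 - ϕ_{2})$ and $ρ (2) = ϕ_{1} ρ (1) + ϕ_{2}$ .

```python
import numpy as np

phi1, phi2 = 1.5, -0.75          # assumed causal AR(2) coefficients

# ACF of the AR(2) at lags 1 and 2
rho1 = phi1 / (1.0 - phi2)
rho2 = phi1 * rho1 + phi2

# Dividing both prediction equations by gamma(0) gives a system in rho
R = np.array([[1.0, rho1],
              [rho1, 1.0]])
r = np.array([rho1, rho2])
phi21, phi22 = np.linalg.solve(R, r)
print(phi21, phi22)
```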

Durbin–Levinson Algorithm

Computes $x_{n + 1}^{n}$ and $P_{n + 1}^{n}$ recursively as

ϕ_{00} = 0, P_{1}^{0} = γ (0)

For $n \geq 1$ :

ϕ_{n n} = \frac{ρ (n) - \sum_{k = 1}^{n - 1} ϕ_{n - 1, k} ρ (n - k)}{1 - \sum_{k = 1}^{n - 1} ϕ_{n - 1, k} ρ (k)}, P_{n + 1}^{n} = P_{n}^{n - 1} (1 - ϕ_{n n}^{2})

For $n \geq 2$ :

ϕ_{n k} = ϕ_{n - 1, k} - ϕ_{n n} ϕ_{n - 1, n - k}, k = 1, 2, \dots, n - 1

example - Using the Durbin–Levinson Algorithm

To use the algorithm, start with $ϕ_{00} = 0$ , $P_{1}^{0} = γ (0)$ . Then, for $n = 1$ ,

ϕ_{11} = ρ (1), P_{2}^{1} = γ (0) [1 - ϕ_{11}^{2}] .

For $n = 2$ ,

ϕ_{22} = \frac{ρ (2) - ϕ_{11} ρ (1)}{1 - ϕ_{11} ρ (1)}, ϕ_{21} = ϕ_{11} - ϕ_{22} ϕ_{11},

P_{3}^{2} = P_{2}^{1} [1 - ϕ_{22}^{2}] = γ (0) [1 - ϕ_{11}^{2}] [1 - ϕ_{22}^{2}] .

For $n = 3$ ,

ϕ_{33} = \frac{ρ (3) - ϕ_{21} ρ (2) - ϕ_{22} ρ (1)}{1 - ϕ_{21} ρ (1) - ϕ_{22} ρ (2)},

ϕ_{32} = ϕ_{22} - ϕ_{33} ϕ_{21}, ϕ_{31} = ϕ_{21} - ϕ_{33} ϕ_{22},

P_{4}^{3} = P_{3}^{2} [1 - ϕ_{33}^{2}] = γ (0) [1 - ϕ_{11}^{2}] [1 - ϕ_{22}^{2}] [1 - ϕ_{33}^{2}],

and so on. Note that, in general, the standard error of the one-step-ahead forecast is the square root of

P_{n + 1}^{n} = γ (0) \prod_{j = 1}^{n} [1 - ϕ_{j j}^{2}] .
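The recursion can be sketched in a few lines of NumPy. This is a minimal illustration rather than a reference implementation: the function name `durbin_levinson`, its return layout, and the MA(1)-type ACF used as input ( $ρ (1) = 0.4$ , $γ (0) = 1.25$ ) are assumptions for the example.

```python
import numpy as np

def durbin_levinson(rho, gamma0, n_max):
    """Return (phi, P) with phi[n] = [phi_n1, ..., phi_nn] and P[n] = P_{n+1}^n.

    rho[h] is the ACF at lag h (rho[0] = 1); it must have more than n_max entries.
    """
    rho = np.asarray(rho, dtype=float)
    phi = {0: np.array([])}
    P = [gamma0]                                  # P_1^0 = gamma(0)
    for n in range(1, n_max + 1):
        prev = phi[n - 1]                         # phi_{n-1,1}, ..., phi_{n-1,n-1}
        k = np.arange(1, n)                       # k = 1, ..., n-1 (empty when n = 1)
        num = rho[n] - np.sum(prev * rho[n - k])
        den = 1.0 - np.sum(prev * rho[k])
        phi_nn = num / den
        # phi_{nk} = phi_{n-1,k} - phi_nn * phi_{n-1,n-k}, k = 1, ..., n-1
        head = prev - phi_nn * prev[::-1]
        phi[n] = np.append(head, phi_nn)
        P.append(P[-1] * (1.0 - phi_nn ** 2))     # P_{n+1}^n = P_n^{n-1} (1 - phi_nn^2)
    return phi, P

rho = np.array([1.0, 0.4, 0.0, 0.0])              # assumed ACF (MA(1) with theta = 0.5)
gamma0 = 1.25
phi, P = durbin_levinson(rho, gamma0, 3)
print(phi[3], P)
```

The coefficients `phi[n]` should agree with a direct solution of the $n \times n$ prediction equations, which is a convenient sanity check.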

example - The PACF of an AR(2)

AR2: $x_{t} = ϕ_{1} x_{t - 1} + ϕ_{2} x_{t - 2} + w_{t},$

\begin{aligned} ϕ_{11} & = ρ (1) = \frac{ϕ_{1}}{1 - ϕ_{2}} \\ ϕ_{22} & = \frac{ρ (2) - ρ (1)^{2}}{1 - ρ (1)^{2}} = \frac{[ϕ_{1} (\frac{ϕ_{1}}{1 - ϕ_{2}}) + ϕ_{2}] - {(\frac{ϕ_{1}}{1 - ϕ_{2}})}^{2}}{1 - {(\frac{ϕ_{1}}{1 - ϕ_{2}})}^{2}} = ϕ_{2} \\ ϕ_{21} & = ρ (1) [1 - ϕ_{2}] = ϕ_{1} \\ ϕ_{33} & = \frac{ρ (3) - ϕ_{1} ρ (2) - ϕ_{2} ρ (1)}{1 - ϕ_{1} ρ (1) - ϕ_{2} ρ (2)} = 0. \end{aligned}

In fact, for an AR( $p$ ) model, the same argument as in the AR(2) prediction above gives

\begin{aligned} x_{p + 1}^{p} & = ϕ_{p 1} x_{p} + ϕ_{p 2} x_{p - 1} + \dots + ϕ_{p p} x_{1} \\ & = ϕ_{1} x_{p} + ϕ_{2} x_{p - 1} + \dots + ϕ_{p} x_{1} . \end{aligned}

This result shows that for an AR( $p$ ) model, the partial autocorrelation coefficient at lag $p$ , $ϕ_{p p}$ , is also the last coefficient in the model, $ϕ_{p}$ .

In other words, the linear predictor of an AR model is just the AR model itself.
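The PACF values of an AR(2) can also be checked numerically by taking $ϕ_{h h}$ as the last coefficient of the order- $h$ prediction equations. In the sketch below, the coefficients ( $1.5$ , $- 0.75$ ) and the helper `pacf_at` are assumptions for illustration; the ACF is generated from the recursion $ρ (h) = ϕ_{1} ρ (h - 1) + ϕ_{2} ρ (h - 2)$ .

```python
import numpy as np

phi1, phi2 = 1.5, -0.75                     # assumed causal AR(2) coefficients

# ACF from rho(0) = 1, rho(1) = phi1 / (1 - phi2) and the AR(2) recursion
rho = [1.0, phi1 / (1.0 - phi2)]
for h in range(2, 6):
    rho.append(phi1 * rho[h - 1] + phi2 * rho[h - 2])
rho = np.array(rho)

def pacf_at(h):
    """phi_hh: last coefficient of the BLP of x_{h+1} based on x_h, ..., x_1."""
    R = np.array([[rho[abs(i - j)] for j in range(h)] for i in range(h)])
    return np.linalg.solve(R, rho[1:h + 1])[-1]

pacf = [pacf_at(h) for h in range(1, 5)]
print(np.round(pacf, 6))
```

The second value should equal $ϕ_{2}$ and the values at lags 3 and 4 should vanish up to rounding, in line with the PACF of an AR( $p$ ) cutting off after lag $p$ .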

The Innovations Algorithm

The one-step-ahead predictors, $x_{t + 1}^{t}$ , and their mean-squared errors, $P_{t + 1}^{t}$ , can be calculated iteratively as

x_{1}^{0} = 0, P_{1}^{0} = γ (0)

x_{t + 1}^{t} = \sum_{j = 1}^{t} θ_{t j} (x_{t + 1 - j} - x_{t + 1 - j}^{t - j}), t = 1, 2, \dots

P_{t + 1}^{t} = γ (0) - \sum_{j = 0}^{t - 1} θ_{t, t - j}^{2} P_{j + 1}^{j}, t = 1, 2, \dots

where, for $j = 0, 1, \dots, t - 1$ ,

θ_{t, t - j} = (γ (t - j) - \sum_{k = 0}^{j - 1} θ_{j, j - k} θ_{t, t - k} P_{k + 1}^{k}) / P_{j + 1}^{j} .

Given data $x_{1}, \dots, x_{n}$ , the innovations algorithm can be calculated successively for $t = 1$ , then $t = 2$ and so on, in which case the calculation of $x_{n + 1}^{n}$ and $P_{n + 1}^{n}$ is made at the final step $t = n$ . The $m$ -step-ahead predictor and its mean-square error based on the innovations algorithm are given by

x_{n + m}^{n} = \sum_{j = m}^{n + m - 1} θ_{n + m - 1, j} (x_{n + m - j} - x_{n + m - j}^{n + m - j - 1}),

P_{n + m}^{n} = γ (0) - \sum_{j = m}^{n + m - 1} θ_{n + m - 1, j}^{2} P_{n + m - j}^{n + m - j - 1},

where the $θ_{n + m - 1, j}$ are obtained by continued iteration of $θ_{t, t - j}$ .

example - Prediction for an MA(1)

The innovations algorithm lends itself well to prediction for moving average processes.

MA(1): $x_{t} = w_{t} + θ w_{t - 1}$ , with $γ (0) = (1 + θ^{2}) σ_{w}^{2}$ , $γ (1) = θ σ_{w}^{2}$ and $γ (h) = 0$ for $h > 1$

Using Innovations Algorithm:

\begin{aligned} θ_{n 1} & = θ σ_{w}^{2} / P_{n}^{n - 1} \\ θ_{n j} & = 0, j = 2, \dots, n \\ P_{1}^{0} & = (1 + θ^{2}) σ_{w}^{2} \\ P_{n + 1}^{n} & = (1 + θ^{2} - θ θ_{n 1}) σ_{w}^{2} \end{aligned}

Finally, the one-step-ahead predictor is

x_{n + 1}^{n} = θ (x_{n} - x_{n}^{n - 1}) \frac{σ_{w}^{2}}{P_{n}^{n - 1}}
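The innovations recursions can be sketched directly from the formulas for $θ_{t, t - j}$ and $P_{t + 1}^{t}$ . The code below is a minimal illustration under assumptions: the names `innovations`, `one_step_predictions` and `ma1_acov` are made up for the example, and the MA(1) parameters ( $θ = 0.5$ , $σ_{w}^{2} = 1$ ) and the two data points are arbitrary.

```python
import numpy as np

def innovations(acov, n):
    """Innovations algorithm for a zero-mean stationary process.

    Returns theta (theta[t][j] = theta_{t,j}, j = 1..t) and P (P[t] = P_{t+1}^t).
    """
    P = [acov(0)]                                  # P_1^0 = gamma(0)
    theta = {}
    for t in range(1, n + 1):
        theta[t] = np.zeros(t + 1)                 # index 0 is unused
        for j in range(t):                         # j = 0, ..., t-1
            s = sum(theta[j][j - k] * theta[t][t - k] * P[k] for k in range(j))
            theta[t][t - j] = (acov(t - j) - s) / P[j]
        P.append(acov(0) - sum(theta[t][t - j] ** 2 * P[j] for j in range(t)))
    return theta, P

def one_step_predictions(x, theta):
    """x_hat[t] = x_{t+1}^t for t = 0..n, where x[i] holds x_{i+1} and x_hat[0] = 0."""
    n = len(x)
    x_hat = np.zeros(n + 1)
    for t in range(1, n + 1):
        x_hat[t] = sum(theta[t][j] * (x[t - j] - x_hat[t - j]) for j in range(1, t + 1))
    return x_hat

theta_ma, sigma2 = 0.5, 1.0                        # assumed MA(1) parameters

def ma1_acov(h):
    h = abs(h)
    if h == 0:
        return (1 + theta_ma ** 2) * sigma2
    return theta_ma * sigma2 if h == 1 else 0.0

theta, P = innovations(ma1_acov, 5)
x_hat = one_step_predictions(np.array([1.0, 2.0]), theta)
print(theta[5], P, x_hat)
```

For the MA(1) input only $θ_{n 1}$ is non-zero, which matches the closed-form expressions in the example above.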

Forecasting ARMA models

Forecasting AR(p) and MA(q)

The Durbin-Levinson algorithm is convenient for AR(p) processes
The innovations algorithm is convenient for MA(q) processes.

Review causality and invertibility

Causality: the current value $x_{t}$ depends only on the current and past noise terms $w_{t}, w_{t - 1}, \dots$ , and not on the future terms $w_{t + 1}, w_{t + 2}, \dots$ . This means that for a future time $t > n$ , the conditional expectation of $w_{t}$ is zero: ${\tilde{w}}_{t} = E (w_{t} | x_{n}, x_{n - 1}, \dots) = 0$
Invertibility: the noise term $w_{t}$ can be expressed in terms of the current and past observations $x_{t}, x_{t - 1}, \dots$ . Hence any noise term $w_{t}$ with $t \leq n$ can be reconstructed from the observations up to time $n$ , so the conditional expectation is ${\tilde{w}}_{t} = E (w_{t} | x_{n}, x_{n - 1}, \dots) = w_{t}$

Thus:

\begin{matrix} (3.81) & {\tilde{w}}_{t} = E (w_{t} | x_{n}, x_{n - 1}, \dots) = \begin{cases} 0, & t > n \\ w_{t}, & t \leq n \end{cases} \end{matrix}

Forecasting ARMA Processes

We assume $x_{t}$ is a causal and invertible ARMA( $p$ , $q$ ) process, $ϕ (B) x_{t} = θ (B) w_{t}$ , where $w_{t} \sim iid N (0, σ_{w}^{2})$ . In the non-zero mean case, $E (x_{t}) = μ_{x}$ , simply replace $x_{t}$ with $x_{t} - μ_{x}$ in the model.

First, we consider two types of forecasts. We write $x_{n + m}^{n}$ to mean the minimum mean square error predictor of $x_{n + m}$ based on the data $\{x_{n}, \dots, x_{1}\}$ , that is,

x_{n + m}^{n} = E (x_{n + m} | x_{n}, \dots, x_{1}) .

For ARMA models, it is easier to calculate the predictor of $x_{n + m}$ , assuming we have the complete history of the process $\{x_{n}, x_{n - 1}, \dots, x_{1}, x_{0}, x_{- 1}, \dots\}$ . We will denote the predictor of $x_{n + m}$ based on the infinite past as

{\tilde{x}}_{n + m} = E (x_{n + m} | x_{n}, x_{n - 1}, \dots, x_{1}, x_{0}, x_{- 1}, \dots) .

In general, $x_{n + m}^{n}$ and ${\tilde{x}}_{n + m}$ are not the same, but the idea here is that, for large samples, ${\tilde{x}}_{n + m}$ will provide a good approximation to $x_{n + m}^{n}$ .

Now, write $x_{n + m}$ in its causal and invertible forms:

\begin{matrix} (3.82) & x_{n + m} = \sum_{j = 0}^{\infty} ψ_{j} w_{n + m - j}, ψ_{0} = 1 \end{matrix}

\begin{matrix} (3.83) & w_{n + m} = \sum_{j = 0}^{\infty} π_{j} x_{n + m - j}, π_{0} = 1. \end{matrix}

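Substituting the representation (3.82) of $x_{n + m}$ into the conditional expectation (taking conditional expectations):

{\tilde{x}}_{n + m} = E (\sum_{j = 0}^{\infty} ψ_{j} w_{n + m - j} | x_{n}, x_{n - 1}, \dots)

By the linearity of expectation, the sum and the constants $ψ_{j}$ can be moved outside the expectation:

{\tilde{x}}_{n + m} = \sum_{j = 0}^{\infty} ψ_{j} E (w_{n + m - j} | x_{n}, x_{n - 1}, \dots) .

By property (3.81), the conditional expectation is 0 for the terms with $j < m$ , and it is $w_{n + m - j}$ itself for the terms with $j \geq m$ , which gives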

\begin{matrix} (3.84) & {\tilde{x}}_{n + m} = \sum_{j = 0}^{\infty} ψ_{j} {\tilde{w}}_{n + m - j} = \sum_{j = m}^{\infty} ψ_{j} w_{n + m - j} . \end{matrix}

Similarly, taking conditional expectations in (3.83), we have

\begin{matrix} (3.85) & 0 = {\tilde{x}}_{n + m} + \sum_{j = 1}^{\infty} π_{j} {\tilde{x}}_{n + m - j}, \end{matrix}

Using (3.82) and (3.84), we can write

x_{n + m} - {\tilde{x}}_{n + m} = \sum_{j = 0}^{m - 1} ψ_{j} w_{n + m - j},

so the mean-square prediction error can be written as

\begin{matrix} (3.86) & P_{n + m}^{n} = E (x_{n + m} - {\tilde{x}}_{n + m})^{2} = σ_{w}^{2} \sum_{j = 0}^{m - 1} ψ_{j}^{2} . \end{matrix}

Long-Range Forecasts

For a process with mean $μ_{x}$ , replacing $x_{n + m}$ with $x_{n + m} - μ_{x}$ in (3.82) and taking conditional expectations as in (3.84) gives

\begin{matrix} (3.88) & {\tilde{x}}_{n + m} = μ_{x} + \sum_{j = m}^{\infty} ψ_{j} w_{n + m - j} . \end{matrix}

Noting that the $ψ$ -weights dampen to zero exponentially fast, it is clear that

\begin{matrix} (3.89) & {\tilde{x}}_{n + m} \to μ_{x} \end{matrix}

exponentially fast (in the mean square sense) as $m \to \infty$ .

Moreover, by (3.86), the mean square prediction error

\begin{matrix} (3.90) & P_{n + m}^{n} \to σ_{w}^{2} \sum_{j = 0}^{\infty} ψ_{j}^{2} = γ_{x} (0) = σ_{x}^{2}, \end{matrix}

exponentially fast as $m \to \infty$ .

It should be clear from (3.89) and (3.90) that ARMA forecasts quickly settle to the mean with a constant prediction error as the forecast horizon, $m$ , grows.
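The convergence in (3.90) can be seen numerically by computing the $ψ$ -weights and plugging them into (3.86). The sketch below assumes the usual coefficient-matching recursion $ψ_{j} = θ_{j} + \sum_{k} ϕ_{k} ψ_{j - k}$ ; the function names and the ARMA(1,1) parameters ( $ϕ = 0.9$ , $θ = 0.5$ , $σ_{w}^{2} = 1$ ) are assumptions chosen for illustration.

```python
import numpy as np

def psi_weights(phi, theta, n_weights):
    """psi_0, ..., psi_{n_weights-1} of a causal ARMA(p, q), by matching coefficients."""
    psi = np.zeros(n_weights)
    psi[0] = 1.0
    for j in range(1, n_weights):
        th = theta[j - 1] if j <= len(theta) else 0.0          # theta_j = 0 for j > q
        ar = sum(phi[k - 1] * psi[j - k] for k in range(1, min(j, len(phi)) + 1))
        psi[j] = th + ar
    return psi

def forecast_mse(phi, theta, sigma2, m):
    """P_{n+m}^n = sigma_w^2 * sum_{j=0}^{m-1} psi_j^2, as in (3.86)."""
    psi = psi_weights(phi, theta, m)
    return sigma2 * np.sum(psi ** 2)

phi, theta, sigma2 = [0.9], [0.5], 1.0                         # assumed ARMA(1,1)
horizons = (1, 2, 5, 20, 200)
mse = [forecast_mse(phi, theta, sigma2, m) for m in horizons]
# Variance of an ARMA(1,1): gamma(0) = sigma2 * (1 + (phi + theta)^2 / (1 - phi^2))
gamma0 = sigma2 * (1 + (phi[0] + theta[0]) ** 2 / (1 - phi[0] ** 2))
print(mse, gamma0)
```

The mean-square error starts at $σ_{w}^{2}$ for $m = 1$ and increases toward $γ_{x} (0)$ as the horizon grows.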

Truncated Prediction for ARMA

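Truncated prediction is an approximation used in time series forecasting: when predicting future values we only have a finite amount of history (the infinite past is not observed, and future observations cannot be used), so the model calculations are simplified and approximated.

ARMA model: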

x_{t} = ϕ_{1} x_{t - 1} + ϕ_{2} x_{t - 2} + \dots + ϕ_{p} x_{t - p} + θ_{1} w_{t - 1} + θ_{2} w_{t - 2} + \dots + θ_{q} w_{t - q} + w_{t},

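We only have the data $x_{1}, x_{2}, \dots, x_{n}$
The regression structure of the AR part (dependence on past $x_{t}$ ) naturally supports recursive prediction
The noise terms $w_{t}$ for $t > n$ or $t \leq 0$ are not observable, so truncated prediction simplifies the calculation with the following assumptions:
Assume the unknown noise terms are $w_{t} = 0$ (for $t > n$ or $t \leq 0$ )
Recursively use earlier forecasts in place of the unknown future values.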

The truncated prediction formula is given as:

\begin{matrix} (3.92) & {\tilde{x}}_{n + m}^{n} = ϕ_{1} {\tilde{x}}_{n + m - 1}^{n} + \dots + ϕ_{p} {\tilde{x}}_{n + m - p}^{n} + θ_{1} {\tilde{w}}_{n + m - 1}^{n} + \dots + θ_{q} {\tilde{w}}_{n + m - q}^{n} . \end{matrix}

Where:

Truncated predictions ${\tilde{x}}_{t}^{n}$ :
- ${\tilde{x}}_{t}^{n} = x_{t}$ for $1 \leq t \leq n$ (observed values).
- ${\tilde{x}}_{t}^{n} = 0$ for $t \leq 0$ (before the start of the series).
Truncated prediction errors ${\tilde{w}}_{t}^{n}$ :
- ${\tilde{w}}_{t}^{n} = 0$ for $t \leq 0$ or $t > n$ (unobserved noise outside the series range).
- For $1 \leq t \leq n$ , ${\tilde{w}}_{t}^{n}$ is calculated as:

{\tilde{w}}_{t}^{n} = ϕ (B) {\tilde{x}}_{t}^{n} - θ_{1} {\tilde{w}}_{t - 1}^{n} - \dots - θ_{q} {\tilde{w}}_{t - q}^{n} .
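For a concrete case, the truncated recursions can be sketched for a zero-mean ARMA(1,1), where $ϕ (B) {\tilde{x}}_{t}^{n} = x_{t} - ϕ x_{t - 1}$ . The function name `truncated_forecast_arma11` and the numbers used in the call are assumptions for illustration; the general ARMA( $p, q$ ) case follows the same pattern with more lags.

```python
import numpy as np

def truncated_forecast_arma11(x, phi, theta, m):
    """Truncated forecasts for horizons 1..m (m >= 1) of a zero-mean ARMA(1,1), following (3.92)."""
    n = len(x)
    # Truncated residuals with w_0 = 0 and x_0 = 0:
    # w_t = x_t - phi * x_{t-1} - theta * w_{t-1}, t = 1, ..., n
    w = np.zeros(n + 1)
    x_prev = 0.0
    for t in range(1, n + 1):
        w[t] = x[t - 1] - phi * x_prev - theta * w[t - 1]
        x_prev = x[t - 1]
    # Forecasts: future noise is set to 0 and future x values are replaced by forecasts
    forecasts = np.zeros(m)
    forecasts[0] = phi * x[-1] + theta * w[n]
    for h in range(1, m):
        forecasts[h] = phi * forecasts[h - 1]
    return forecasts

forecasts = truncated_forecast_arma11(np.array([1.0, 2.0]), phi=0.5, theta=0.4, m=3)
print(forecasts)
```

Only the one-step forecast uses a residual; from the second step on, the MA term drops out and the forecast decays geometrically at rate $ϕ$ .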

Examples: deriving the MMSE predictor and its MSE

Example1

MA(1) $x_{t} = w_{t} + θ w_{t - 1}$ . Derive the minimum mean-square error one-step forecast based on the infinite past, and determine the mean-square error of this forecast.

For a general MA model, we do not use the Durbin–Levinson algorithm or the innovations algorithm here; instead we solve the problem directly from the definition

Minimum mean square error (MSE) predictor:

x_{n + m}^{n} = E (x_{n + m} ∣ x_{1 : n})

Solution:

\begin{aligned} {\hat{x}}_{t + 1} & = E (x_{t + 1} | X_{- \infty : t}) \\ & = E (w_{t + 1} + θ w_{t} | X_{- \infty : t}) \\ & = E (θ w_{t} | X_{- \infty : t}) = θ w_{t} \end{aligned}

The derivation above is wrong as it stands: the forecast must be written in terms of the known observations, whereas $w_{t}$ is an unobservable noise term. The correct approach is to expand $w_{t}$ recursively as a function of $x_{t}, x_{t - 1}, \dots$ , so that the final forecast depends only on known observations.

By the definition of the MA(1) model:

x_{t + 1} = w_{t + 1} + θ w_{t} .

Write $w_{t}$ using its invertible expansion (valid for $| θ | < 1$ ):

w_{t} = \sum_{j = 0}^{\infty} (- θ)^{j} x_{t - j} .

Substituting into the expression for $x_{t + 1}$ gives:

x_{t + 1} = w_{t + 1} - \sum_{j = 0}^{\infty} (- θ)^{j + 1} x_{t - j}

Taking the conditional expectation:

{\hat{x}}_{t + 1} = E (x_{t + 1} | X_{- \infty : t}) = E (w_{t + 1} - \sum_{j = 0}^{\infty} (- θ)^{j + 1} x_{t - j} | X_{- \infty : t}) = - \sum_{j = 0}^{\infty} (- θ)^{j + 1} x_{t - j}

MSE:

E {(x_{t + 1} - {\hat{x}}_{t + 1})}^{2} = E (w_{t + 1}^{2}) = σ_{w}^{2}
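A quick numerical check of this forecast: simulate an MA(1), apply the sum over the available past, and compare the forecast error with $w_{t + 1}$ . Everything specific here is an assumption for illustration ( $θ = 0.5$ , the seed, the series length, and the helper `ma1_forecast`), and the infinite sum is truncated at the start of the simulated data.

```python
import numpy as np

theta = 0.5                                   # assumed invertible MA(1) parameter
rng = np.random.default_rng(0)
w = rng.normal(size=201)                      # w_0, w_1, ..., w_200
x = w[1:] + theta * w[:-1]                    # x[i] holds x_{i+1} = w_{i+1} + theta * w_i

def ma1_forecast(x_past, theta):
    """One-step forecast -sum_j (-theta)^(j+1) x_{t-j}, truncated at the first observation."""
    j = np.arange(len(x_past))
    return -np.sum((-theta) ** (j + 1) * x_past[::-1])

t = 150
x_hat = ma1_forecast(x[:t], theta)            # forecast of x_{t+1} from x_1, ..., x_t
error = x[t] - x_hat                          # x_{t+1} minus its forecast
print(error, w[t + 1])
```

With 150 past values the truncation effect is negligible, so the forecast error coincides with the new noise term, whose variance is $σ_{w}^{2}$ .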

Example2

For an AR(1) model, determine the general form of the $m$ -step-ahead forecast $x_{t + m}^{t}$ and MSE

AR(1) Model: $x_{t} = ϕ x_{t - 1} + w_{t}$

x_{t + m} = ϕ^{m} x_{t} + \sum_{j = 0}^{m - 1} ϕ^{j} w_{t + m - j}

\begin{aligned} x_{t + m}^{t} & = E (x_{t + m} | x_{1 : t}) \\ & = E (ϕ^{m} x_{t} + \sum_{j = 0}^{m - 1} ϕ^{j} w_{t + m - j} ∣ x_{1 : t}) \\ & = ϕ^{m} x_{t} \end{aligned}

Mean Squared Error (MSE):

\begin{aligned} MSE & = E [(x_{t + m} - x_{t + m}^{t})^{2}] \\ & = E {(\sum_{j = 0}^{m - 1} ϕ^{j} w_{t + m - j})}^{2} \\ & = σ_{w}^{2} \sum_{j = 0}^{m - 1} ϕ^{2 j} \\ & = \frac{1 - ϕ^{2 m}}{1 - ϕ^{2}} σ_{w}^{2} \end{aligned}
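The two AR(1) formulas are easy to evaluate. The sketch below uses assumed values ( $ϕ = 0.8$ , $σ_{w}^{2} = 1$ , last observation $x_{t} = 2$ ) and a hypothetical helper `ar1_forecast`, and compares the closed-form MSE with the direct sum.

```python
import numpy as np

def ar1_forecast(x_t, phi, sigma2, m):
    """m-step-ahead forecast phi^m * x_t and its MSE for a causal AR(1)."""
    forecast = phi ** m * x_t
    mse = sigma2 * (1.0 - phi ** (2 * m)) / (1.0 - phi ** 2)
    return forecast, mse

phi, sigma2, x_t = 0.8, 1.0, 2.0               # assumed values
results = {}
for m in (1, 2, 5, 50):
    f, mse = ar1_forecast(x_t, phi, sigma2, m)
    # direct evaluation of sigma2 * sum_{j=0}^{m-1} phi^(2j) for comparison
    direct = sigma2 * sum(phi ** (2 * j) for j in range(m))
    results[m] = (f, mse, direct)
    print(m, f, mse, direct)
```

As $m$ grows the forecast shrinks to the mean (zero here) and the MSE approaches the process variance $σ_{w}^{2} / (1 - ϕ^{2})$ .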

Estimation

We assume

we have $n$ observations, $x_{1}, \dots, x_{n}$
from a causal and invertible Gaussian ARMA( $p, q$ ) process
The data has zero mean

Our goal is

to estimate the parameters $ϕ_{1}, \dots, ϕ_{p}, θ_{1}, \dots, θ_{q}$ , and $σ_{w}^{2}$
to determine $p$ and $q$ (discussed later in this section).

Yule-Walker estimation

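The Yule-Walker equations give a method for estimating the parameters of an autoregressive (AR) model, based on matching the sample autocovariances to the theoretical autocovariances.

For an AR( $p$ ) model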

x_{t} = ϕ_{1} x_{t - 1} + ϕ_{2} x_{t - 2} + \dots + ϕ_{p} x_{t - p} + w_{t},

$ϕ_{1}, \dots, ϕ_{p}$ are the parameters to be estimated;
$w_{t}$ is zero-mean white noise with variance $σ_{w}^{2}$

The Yule-Walker equations in matrix form are

Γ_{p} ϕ = γ_{p} .

[\begin{matrix} γ (0) & γ (1) & γ (2) & \dots & γ (p - 1) \\ γ (1) & γ (0) & γ (1) & \dots & γ (p - 2) \\ γ (2) & γ (1) & γ (0) & \dots & γ (p - 3) \\ ⋮ & ⋮ & ⋮ & ⋱ & ⋮ \\ γ (p - 1) & γ (p - 2) & γ (p - 3) & \dots & γ (0) \end{matrix}] [\begin{matrix} ϕ_{1} \\ ϕ_{2} \\ ϕ_{3} \\ ⋮ \\ ϕ_{p} \end{matrix}] = [\begin{matrix} γ (1) \\ γ (2) \\ γ (3) \\ ⋮ \\ γ (p) \end{matrix}] .

Written out explicitly for $p = 3$ , the matrix form is:

[\begin{matrix} γ (0) & γ (1) & γ (2) \\ γ (1) & γ (0) & γ (1) \\ γ (2) & γ (1) & γ (0) \end{matrix}] [\begin{matrix} ϕ_{1} \\ ϕ_{2} \\ ϕ_{3} \end{matrix}] = [\begin{matrix} γ (1) \\ γ (2) \\ γ (3) \end{matrix}] .

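For MA and ARMA models, the Yule–Walker estimators are not optimal.

The BLP uses the same matrix system as the Yule–Walker estimators to obtain the coefficients of the linear predictor, but the goals differ: the BLP predicts future values from the observations, whereas the Yule–Walker estimators are used to estimate the model parameters.

Solving the linear system $Γ_{p} ϕ = γ_{p}$ of the Yule-Walker equations directly usually requires a matrix inversion, which is computationally expensive for high orders (large $p$ ). The Durbin-Levinson algorithm instead computes the solution of the Yule-Walker equations recursively, without inverting the covariance matrix directly.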
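A minimal sketch of Yule-Walker estimation on simulated data is given below. The helper names, the seed, the AR(2) parameters ( $1.5$ , $- 0.75$ ) and the sample size are assumptions for illustration; the noise-variance estimate ${\hat{σ}}_{w}^{2} = \hat{γ} (0) - {\hat{γ}}_{p}^{'} \hat{ϕ}$ is the usual companion formula and is not derived in these notes.

```python
import numpy as np

def sample_acov(x, max_lag):
    """Sample autocovariances gamma_hat(0..max_lag), using divisor n."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    return np.array([np.sum(xc[h:] * xc[:n - h]) / n for h in range(max_lag + 1)])

def yule_walker(x, p):
    """Yule-Walker estimates (phi_hat, sigma2_hat) for an AR(p)."""
    g = sample_acov(x, p)
    Gamma = np.array([[g[abs(i - j)] for j in range(p)] for i in range(p)])
    phi_hat = np.linalg.solve(Gamma, g[1:])
    sigma2_hat = g[0] - g[1:] @ phi_hat        # assumed standard variance estimate
    return phi_hat, sigma2_hat

# Simulate an AR(2) with assumed parameters, discarding a burn-in period
rng = np.random.default_rng(42)
phi_true = np.array([1.5, -0.75])
n, burn = 5000, 500
w = rng.normal(size=n + burn)
x = np.zeros(n + burn)
for t in range(2, n + burn):
    x[t] = phi_true[0] * x[t - 1] + phi_true[1] * x[t - 2] + w[t]

phi_hat, sigma2_hat = yule_walker(x[burn:], p=2)
print(phi_hat, sigma2_hat)
```

With a few thousand observations the estimates should land close to the true coefficients and to $σ_{w}^{2} = 1$ .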

Maximum likelihood estimator

Assume $\{x_{t}\}$ is Gaussian ( $w_{t}$ is i.i.d. Gaussian for an ARMA process)

For the causal AR(1) model, the process is defined as:

\begin{matrix} (3.106) & x_{t} = μ + ϕ (x_{t - 1} - μ) + w_{t}, \end{matrix}

where $| ϕ | < 1$ and $w_{t} \sim iid N (0, σ_{w}^{2})$ . Given data $x_{1}, x_{2}, \dots, x_{n}$ , the likelihood is:

L (μ, ϕ, σ_{w}^{2}) = f (x_{1}, x_{2}, \dots, x_{n} ∣ μ, ϕ, σ_{w}^{2}) .

Likelihood Decomposition

For an AR(1) process, the likelihood can be written as:

L (μ, ϕ, σ_{w}^{2}) = f (x_{1}) f (x_{2} ∣ x_{1}) \dots f (x_{n} ∣ x_{n - 1}),

Since $x_{t} ∣ x_{t - 1} \sim N (μ + ϕ (x_{t - 1} - μ), σ_{w}^{2})$ , we have:

f (x_{t} ∣ x_{t - 1}) = f_{w} ((x_{t} - μ) - ϕ (x_{t - 1} - μ))

where $f_{w} (\cdot)$ is the density of $w_{t}$ , the normal density with mean zero and variance $σ_{w}^{2}$

Final Likelihood

The likelihood can be written as:

L (μ, ϕ, σ_{w}^{2}) = f (x_{1}) \prod_{t = 2}^{n} f_{w} ((x_{t} - μ) - ϕ (x_{t - 1} - μ)) .

Using the causal representation:

x_{1} = μ + \sum_{j = 0}^{\infty} ϕ^{j} w_{1 - j},

we see that $x_{1}$ is normally distributed with mean $μ$ and variance $\frac{σ_{w}^{2}}{1 - ϕ^{2}}$ .

Full Likelihood Function

The likelihood for the AR(1) process is:

\begin{matrix} (3.107) & L (μ, ϕ, σ_{w}^{2}) = (2 π σ_{w}^{2})^{- n / 2} (1 - ϕ^{2})^{1 / 2} \exp [- \frac{S (μ, ϕ)}{2 σ_{w}^{2}}], \end{matrix}

where:

\begin{matrix} (3.108) & S (μ, ϕ) = (1 - ϕ^{2}) (x_{1} - μ)^{2} + \sum_{t = 2}^{n} {[(x_{t} - μ) - ϕ (x_{t - 1} - μ)]}^{2} . \end{matrix}
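As a sanity check on (3.107) and (3.108), the log of the AR(1) likelihood can be compared with the log-density of the joint normal distribution of $(x_{1}, \dots, x_{n})$ , whose covariance is $σ_{w}^{2} ϕ^{| s - t |} / (1 - ϕ^{2})$ . The function names, the data vector and the parameter values below are made up for illustration.

```python
import numpy as np

def ar1_loglik(x, mu, phi, sigma2):
    """Exact Gaussian log-likelihood of a causal AR(1): the log of (3.107)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    d = x - mu
    # S(mu, phi) from (3.108)
    S = (1 - phi ** 2) * d[0] ** 2 + np.sum((d[1:] - phi * d[:-1]) ** 2)
    return (-0.5 * n * np.log(2 * np.pi * sigma2)
            + 0.5 * np.log(1 - phi ** 2)
            - S / (2 * sigma2))

def mvn_loglik(x, mu, phi, sigma2):
    """Same quantity from the joint normal density of (x_1, ..., x_n)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    idx = np.arange(n)
    cov = sigma2 * phi ** np.abs(idx[:, None] - idx[None, :]) / (1 - phi ** 2)
    d = x - mu
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (n * np.log(2 * np.pi) + logdet + d @ np.linalg.solve(cov, d))

x = np.array([0.3, -0.1, 0.8, 1.2, 0.4])        # made-up data
ll_ar1 = ar1_loglik(x, mu=0.2, phi=0.6, sigma2=1.5)
ll_mvn = mvn_loglik(x, mu=0.2, phi=0.6, sigma2=1.5)
print(ll_ar1, ll_mvn)
```

The two values should agree to rounding error, which confirms that the factorization $f (x_{1}) \prod f (x_{t} ∣ x_{t - 1})$ reproduces the joint density.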

For a normal ARMA(p,q) model, the likelihood expression can be simplified in terms of the innovations

Model parameters:

β = (ϕ_{1}, \dots, ϕ_{p}, θ_{1}, \dots, θ_{q})^{'}

Likelihood function:

L (β, σ_{w}^{2}) = \prod_{t = 1}^{n} f (x_{t} ∣ x_{t - 1}, \dots, x_{1})

The conditional distribution of $x_{t}$ given the past values is:

x_{t} ∣ x_{t - 1}, \dots, x_{1} \sim N (x_{t}^{t - 1}, P_{t}^{t - 1})

where:

- $P_{t}^{t - 1} = γ (0) \prod_{j = 1}^{t - 1} (1 - ϕ_{j j}^{2})$
- $γ (0) = σ_{w}^{2} \sum_{j = 0}^{\infty} ψ_{j}^{2}$ is the autocovariance at lag 0.

Writing $P_{t}^{t - 1} = σ_{w}^{2} r_{t} (β)$ , where $r_{t} (β) = [\sum_{j = 0}^{\infty} ψ_{j}^{2}] [\prod_{j = 1}^{t - 1} (1 - ϕ_{j j}^{2})]$ depends only on $β$ , gives the following.

Full Likelihood Function

L(\beta, \sigma_w^2) = (2\pi \sigma_w^2)^{-n/2} \left[ r_1(\beta) r_2(\beta) \cdots r_n(\beta) \right]^{-1/2} \exp \left[ -\frac{S(\beta)}{2 \sigma_w^2} \right]

S(\beta) = \sum_{t=1}^n \left[ \frac{\left( x_t - x_t^{t-1}(\beta) \right)^2}{r_t(\beta)} \right]

Large sample distribution

For an ARMA( $p, q$ ) process, the maximum likelihood estimators (MLE) and the unconditional/conditional least squares estimators satisfy:

\begin{pmatrix} \hat{\phi} \\ \hat{\theta} \end{pmatrix} - \begin{pmatrix} \phi \\ \theta \end{pmatrix} \sim AN \left( 0, \frac{\sigma_w^2}{n} \begin{pmatrix} \Gamma_{\phi\phi} & \Gamma_{\phi\theta} \\ \Gamma_{\theta\phi} & \Gamma_{\theta\theta} \end{pmatrix}^{-1} \right),

where $AN$ stands for "asymptotically normal." The block matrix is given by:

\begin{pmatrix} \Gamma_{\phi\phi} & \Gamma_{\phi\theta} \\ \Gamma_{\theta\phi} & \Gamma_{\theta\theta} \end{pmatrix} = \text{Cov}((x, y), (x, y)),

where:

- $x = (x_1, \dots, x_p)'$ , corresponding to the AR( $p$ ) terms,
- $y = (y_1, \dots, y_q)'$ , corresponding to the MA( $q$ ) terms.

Conclusions:

- As the sample size $n \to \infty$ , the estimators (such as the MLE or the least squares estimators) are approximately normally distributed
- The precision of the estimators improves as $n$ increases
- The asymptotic distribution can be used to test the significance of the parameters and to judge whether the model is reasonable

Overfitting

Variance of the Estimator for an AR(1) Process with Large Sample Size

If we estimate an AR(1) model:

The variance of $\hat{\phi}_1$ is approximately:

\text{Var}(\hat{\phi}_1) \approx \frac{1 - \phi_1^2}{n}.

If we estimate an AR(2) model (while the true process is AR(1), so $\phi_2 = 0$ ):

The variance of $\hat{\phi}_1$ is approximately:

\text{Var}(\hat{\phi}_1) \approx \frac{1 - \phi_2^2}{n} = \frac{1}{n}.

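Conclusions:

Overfitting does not make the model incorrect
- For example, if an AR(2) is fitted to an AR(1) process, then for large samples the extra parameter satisfies $\phi_2 = 0$ , and the fitted result does not conflict with the true model
Overfitting can reduce the precision of the estimates
- If an AR(2) model is fitted to a process that is actually AR(1), the added parameter $\phi_2$ is in fact zero, but including it increases the variance of $\hat{\phi}_1$

Lecture 9

ARIMA models

A process $x_t$ is said to be ARIMA( $p, d, q$ ) if: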

\nabla^d x_t = (1 - B)^d x_t

is ARMA( $p, q$ ). The general ARIMA( $p, d, q$ ) model is written as:

\phi(B)(1 - B)^d x_t = \theta(B)w_t, \tag{3.144}
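The differencing operator $(1 - B)^{d}$ corresponds to `numpy.diff` with `n=d`. The sketch below builds an ARIMA(0,1,1) path by cumulatively summing an MA(1) series and checks that first differencing recovers it; the seed, series length and $θ = 0.4$ are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
w = rng.normal(size=300)

# y_t = w_t + theta * w_{t-1} is MA(1); its cumulative sum x_t is ARIMA(0,1,1)
theta = 0.4                                   # assumed MA coefficient
y = w[1:] + theta * w[:-1]
x = np.cumsum(y)

# (1 - B) x_t = x_t - x_{t-1}: the first difference recovers y (apart from its first value)
dx = np.diff(x, n=1)
print(np.allclose(dx, y[1:]))

# (1 - B)^2 x_t = x_t - 2 x_{t-1} + x_{t-2}, which np.diff computes with n=2
d2x = np.diff(x, n=2)
print(np.allclose(d2x, x[2:] - 2 * x[1:-1] + x[:-2]))
```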

Building ARIMA Models

Diagnostics

Diagnostics determine whether the model is appropriate for the data; they include residual analysis and model comparison.

Standardized Residuals:

e_t = \frac{x_t - \hat{x}_t^{t-1}}{\sqrt{\hat{P}_t^{t-1}}}

where:

- $\hat{x}_t^{t-1}$ : One-step-ahead prediction.
- $\hat{P}_t^{t-1}$ : Estimated one-step-ahead prediction error variance.

Model Evaluation Metrics:

- AIC: $AIC = -2 \ln(L) + 2k$
- BIC: $BIC = -2 \ln(L) + k \ln(n)$

where:

- $L$ is the likelihood of the model
- $k$ is the number of parameters
- $n$ is the sample size.

General Steps to Build ARIMA Models

1. Plot the data:
- Visualize the time series data to check for trends, seasonality, and any anomalies.
- Determine whether transformations (e.g., log or differencing) are necessary.
2. Transform the data (if needed):
- Apply transformations to stabilize variance (e.g., Box-Cox transformations).
- For instance, if variability increases over time, consider taking logarithms or differencing.

Example:

![](SDSC6012.assets/file-20241210225954612.png)

Taking logarithms is needed to obtain a stable process:

![](SDSC6012.assets/file-20241211012754732.png)

3. Identify the dependence orders ( $p, d, q$ ):
- Use Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF) plots to identify:
- $p$ : Order of the autoregressive (AR) part.
- $d$ : Number of differences required to make the series stationary.
- $q$ : Order of the moving average (MA) part.

Example:

![](SDSC6012.assets/file-20241210230053608.png)

From these plots, either $ARIMA(0,1,2)$ or $ARIMA(1,1,0)$ appears to be a suitable model.

4. Estimate parameters:
- Fit the model to estimate the parameters of ARIMA $(p, d, q)$ .

Example: fitting an $ARIMA(0,1,2)$ model gives the following parameter estimates (the subscripts in parentheses are the estimated standard errors):

\hat{x}_t = 0.008_{(0.001)} + 0.303_{(0.065)} \hat{w}_{t-1} + 0.204_{(0.064)} \hat{w}_{t-2} + \hat{w}_t .

5. Perform diagnostics:
- Check the residuals to ensure they resemble white noise.
- Evaluate model fit and adjust as necessary.
6. Model selection:
- Compare different models using criteria such as Akaike Information Criterion (AIC) or Bayesian Information Criterion (BIC).
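The model-selection step can be illustrated with the AIC and BIC formulas given under Diagnostics. In the sketch below the two candidate models, their log-likelihood values and the sample size are entirely hypothetical placeholders; in practice $\ln(L)$ would come from the fitted models.

```python
import numpy as np

def aic(loglik, k):
    """AIC = -2 ln(L) + 2k."""
    return -2.0 * loglik + 2.0 * k

def bic(loglik, k, n):
    """BIC = -2 ln(L) + k ln(n)."""
    return -2.0 * loglik + k * np.log(n)

# Hypothetical (log-likelihood, number of parameters) pairs for two candidate models
n = 200
candidates = {"model A": (-310.2, 3), "model B": (-311.0, 2)}
scores = {name: (aic(ll, k), bic(ll, k, n)) for name, (ll, k) in candidates.items()}
for name, (a, b) in scores.items():
    print(name, round(a, 2), round(b, 2))

# Smaller is better for both criteria; BIC penalizes extra parameters more once ln(n) > 2
best_by_bic = min(scores, key=lambda name: scores[name][1])
print(best_by_bic)
```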
