CS3958: Advanced Algorithms
These are my personal notes for the ACM class Fall 2022 Advanced Algorithms course. As usual, I only write them up properly during final review. Course framework:
- Basic Probabilistic Tools
- Concentration Inequalities
- Martingale
- Optimization
- First-Order Optimization
- Online Optimization
- Markov Chain and Sampling
- An Application of MCMC (Markov Chain Monte Carlo)
- Perspectives for studying MCMC
- Stochastic Process
- Spectrum
- Operator
Lec 1
Concentration Inequalities
Theorem 2 (Markov Inequality) For any non-negative r.v. $X$ and $a > 0$, $\mathbf{Pr}[X \ge a] \le \frac{\mathbf{E}[X]}{a}$.
Proof: $\mathbf{E}[X] \ge \mathbf{E}[X \cdot 1[X \ge a]] \ge a\,\mathbf{Pr}[X \ge a]$.
Example 4 (Coupon Collector) There are $n$ types of coupons; each draw yields a uniformly random type. Let $X$ be the number of draws needed to collect all $n$ types. Then $\mathbf{E}[X] = nH_n = n\ln n + O(n)$.
Proof: Let $X_i$ be the number of draws made while exactly $i$ distinct coupons have been collected, so that $X = \sum_{i=0}^{n-1} X_i$.
Equivalently: let $X_i$ be the number of draws needed to obtain a new coupon once $i$ coupons have already been collected.
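Spelled out, with $X_i$ geometric of success probability $\frac{n-i}{n}$ (assuming $n$ coupon types, each drawn uniformly):
$$
\mathbf{E}[X] = \sum_{i=0}^{n-1} \mathbf{E}[X_i] = \sum_{i=0}^{n-1} \frac{n}{n-i} = n \sum_{j=1}^{n} \frac{1}{j} = n H_n = n\ln n + O(n).
$$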
Theorem 3 (Chebyshev’s Inequality) For any r.v. $X$ with bounded $\mathbf{E}[X]$ and $a > 0$,
Proof:
Example 3 (Streaming Model) Input: a sequence $a_1, a_2, \dots, a_m$. Compute the length $m$ of the stream using as little memory as possible.
Brute force: maintain a counter $k$, using $O(\log m)$ bits of memory. This is the best possible if the exact answer is required.
Morris’ algorithm:
1. $X \leftarrow 0$.
2. For each element in the stream: with probability $2^{-X}$, update $X \leftarrow X + 1$.
3. Output $\hat{m} = 2^X - 1$.
Theorem 1: $\mathbf{E}[\hat{m}] = m$, where $\hat{m}$ is the output of Morris’ algorithm and $m$ is the true count.
Proof: By induction on $m$ one shows $\mathbf{E}[2^{X_m}] = m + 1$, where $X_m$ is the counter after $m$ updates; hence $\mathbf{E}[\hat{m}] = \mathbf{E}[2^{X_m}] - 1 = m$.
The algorithm uses only $O(\log \log m)$ bits of memory, since it stores only $X$, which is typically $O(\log m)$. We also want an accuracy guarantee of the form $\mathbf{Pr}[|\hat{m} - m| \ge \varepsilon m] \le \delta$.
For fixed $\varepsilon$, smaller $\delta$ is better.
Lemma 4: $\mathbf{E}[(2^{X_m})^2] = \frac{3}{2}m^2 + \frac{3}{2}m + 1$.
Proof: By induction on $m$; for the base case $m = 1$ we have $X_1 = 1$ deterministically, so $\mathbf{E}[(2^{X_1})^2] = 4$, matching the formula.
With Lemma 4, $\mathbf{Var}[\hat{m}] = \mathbf{E}[(2^{X_m})^2] - (m+1)^2 = \frac{1}{2}m^2 - \frac{1}{2}m \le \frac{m^2}{2}$.
With Chebyshev, $\mathbf{Pr}[|\hat{m} - m| \ge \varepsilon m] \le \frac{\mathbf{Var}[\hat{m}]}{\varepsilon^2 m^2} \le \frac{1}{2\varepsilon^2}$.
When $\varepsilon < \frac{1}{\sqrt{2}}$ we have $\frac{1}{2\varepsilon^2} > 1$, so the bound is useless.
The Averaging Trick
We can run Morris’ algorithm $t$ times independently in parallel and output the average $\hat{m}^* = \frac{1}{t}\sum_{i=1}^t \hat{m}_i$.
We have $\mathbf{E}[\hat{m}^*] = m$ and $\mathbf{Var}[\hat{m}^*] = \frac{\mathbf{Var}[\hat{m}]}{t} \le \frac{m^2}{2t}$.
For a fixed $\delta$, choosing $t \ge \frac{1}{2\varepsilon^2 \delta}$ gives $\mathbf{Pr}[|\hat{m}^* - m| \ge \varepsilon m] \le \delta$ by Chebyshev. The new algorithm uses $O(\frac{\log \log m}{\varepsilon^2 \delta})$ bits of memory.
Chernoff Bound
Theorem 5 (Chernoff Bound) Let $X_1, \dots, X_n$ be independent r.v. such that $X_i \sim \text{Ber}(p_i)$. Let $X = \sum_{i=1}^n X_i$ and $\mu = \mathbf{E}[X] = \sum_{i=1}^n p_i$. Then for any $\delta > 0$,
$$
\mathbf{Pr}[X \ge (1+\delta)\mu] \le \left(\frac{e^{\delta}}{(1+\delta)^{1+\delta}}\right)^{\mu},
$$
and if $0 < \delta < 1$,
$$
\mathbf{Pr}[X \le (1-\delta)\mu] \le \left(\frac{e^{-\delta}}{(1-\delta)^{1-\delta}}\right)^{\mu}.
$$
Proof: The core idea is to apply Markov to $f(X) = e^{aX}$ for a suitable $a > 0$, i.e. to bound the moment generating function.
Corollary 6 For any $0 < \delta < 1$,
$$
\mathbf{Pr}[X \ge (1+\delta)\mu] \le e^{-\mu\delta^2/3}, \qquad \mathbf{Pr}[X \le (1-\delta)\mu] \le e^{-\mu\delta^2/2}.
$$
Proof: Omitted.
The Median Trick
We can further boost the performance of Morris’ algorithm using the median trick. Choose $t = \frac{3}{2 \varepsilon^2}$ in the averaging trick (so each averaged run is good with probability at least $\frac{2}{3}$) and run it $s$ times in parallel.
Finally, output the median $\hat{m}^{*}$ of $\hat{m}_1^*, \hat{m}_2^*, \dots, \hat{m}_s^*$.
Let $Y_i$ be the indicator of the (good) event that $|\hat{m}_i^* - m| < \varepsilon m$.
Then $Y := \sum_{i=1}^s Y_i$ satisfies $\mathbf{E}[Y] \ge \frac{2}{3} s$. If the median is bad, then at least half of the $\hat{m}_i^*$ are bad, i.e.
$Y \le \frac{1}{2} s$. By Chernoff, $\mathbf{Pr}\big[Y \le \frac{1}{2}s\big] \le \mathbf{Pr}\big[Y \le (1-\frac{1}{4})\mathbf{E}[Y]\big] \le e^{-\Omega(s)}$.
Therefore, for $t = O(\frac{1}{\varepsilon^2})$ and $s = O(\log \frac{1}{\delta})$, we have $\mathbf{Pr}[|\hat{m}^* - m| \ge \varepsilon m] \le \delta$.
This new algorithm uses $O\left(\frac{\log \frac{1}{\delta}}{\varepsilon^2} \log \log m\right)$ bits of memory. ($O(\log \frac{1}{\delta}) << O(\frac{1}{\delta})$, so this improves on the averaging trick alone.)
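Below is a minimal Python sketch of Morris’ algorithm together with the averaging and median tricks. The names (`MorrisCounter`, `morris_median_of_means`) and the demo parameters are my own choices; this is a simulation of the estimator, not an actual low-memory implementation.

```python
import random
import statistics

class MorrisCounter:
    """Morris' approximate counter: store X and increment it with probability 2^{-X}."""
    def __init__(self):
        self.x = 0

    def update(self):
        # One stream element arrives: increment X with probability 2^{-X}.
        if random.random() < 2.0 ** (-self.x):
            self.x += 1

    def estimate(self):
        # E[2^X] = m + 1 after m updates, so 2^X - 1 is an unbiased estimate of m.
        return 2.0 ** self.x - 1


def morris_median_of_means(stream_len, t, s):
    """Averaging trick (t counters per group) plus median trick (s groups)."""
    groups = [[MorrisCounter() for _ in range(t)] for _ in range(s)]
    for _ in range(stream_len):
        for group in groups:
            for counter in group:
                counter.update()
    means = [sum(c.estimate() for c in group) / t for group in groups]
    return statistics.median(means)


if __name__ == "__main__":
    m = 10000
    print(morris_median_of_means(m, t=50, s=9), "vs true count", m)
```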
Lec 2
Threshold Behavior of Random Graphs
Threshold Behavior: For a random graph $G(n, p)$, there is a threshold function $r$ such that:
- when $p << r(n)$, almost no graph satisfies the property.
- when $p >> r(n)$, almost every graph satisfies the property.
Formally (threshold function): given a graph property $P$, a function $r: \mathbb{N} \rightarrow [0, 1]$ is called a threshold if:
- (a) if $p(n) << r(n)$, $\mathbf{Pr}[G \text{ satisfies } P] \rightarrow 0$ when $n \rightarrow \infty$;
- (b) if $p(n) >> r(n)$, $\mathbf{Pr}[G \text{ satisfies } P] \rightarrow 1$ when $n \rightarrow \infty$;
Theorem 2 The property “G contains a 4-clique” has a threshold function $n^{-\frac{2}{3}}$.
Proof: Let $X_S := 1[\text{the four vertices of } S \text{ form a clique}]$ for each $S \in \binom{[n]}{4}$.
Let $X = \sum_{S \in \binom{[n]}{4}} X_S$; then by linearity of expectation, $\mathbf{E}[X] = \binom{n}{4} p^6 = \Theta(n^4 p^6)$.
If $p << n^{-\frac{2}{3}}$, then $\mathbf{E}[X] = o(1)$, so by Markov $\mathbf{Pr}[X \ge 1] \le \mathbf{E}[X] = o(1)$, which proves (a).
For (b), we cannot use Markov because a large expectation does not imply large values with high probability. Therefore we have to consider the variance (second moment method + Chebyshev).
Computing $\mathbf{Var}[X]$ by considering pairs $S, T$ with $|S \cap T| = 2, 3$, we get $\mathbf{Var}[X] = o(\mathbf{E}[X]^2)$ when $p >> n^{-\frac{2}{3}}$, and hence by Chebyshev $\mathbf{Pr}[X = 0] \le \frac{\mathbf{Var}[X]}{\mathbf{E}[X]^2} = o(1)$.
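As a sanity check of the threshold behaviour, here is a small Monte Carlo experiment; the choices of $n$, the trial count, and the two values of $p$ on either side of $n^{-2/3}$ are mine.

```python
import random
from itertools import combinations

def has_4_clique(n, p):
    """Sample G(n, p) and brute-force check whether it contains a 4-clique."""
    adj = {(i, j) for i, j in combinations(range(n), 2) if random.random() < p}
    def edge(u, v):
        return (min(u, v), max(u, v)) in adj
    return any(all(edge(u, v) for u, v in combinations(S, 2))
               for S in combinations(range(n), 4))

if __name__ == "__main__":
    n, trials = 40, 30
    for p in (0.2 * n ** (-2 / 3), 5 * n ** (-2 / 3)):   # below / above the threshold
        freq = sum(has_4_clique(n, p) for _ in range(trials)) / trials
        print(f"p = {p:.3f}: fraction of graphs with a 4-clique = {freq:.2f}")
```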
Hoeffding’s Inequality
Example 1 (Tossing coins) Let $p$ be the probability of “head” and let $\hat{p}$ be the empirical fraction of heads. How many times should we toss the coin (denote this number by $T$) so that, with $99\%$ probability, $\hat{p} \in [(1-\varepsilon)p, (1+\varepsilon)p]$ for a given precision $\varepsilon$?
Using Chernoff (Corollary 6) on the number of heads $T\hat{p}$, $\mathbf{Pr}[|\hat{p} - p| \ge \varepsilon p] \le 2 e^{-Tp\varepsilon^2/3}$, and we want this to be at most $0.01$.
So it suffices to take $T \ge \frac{3 \log 200}{\varepsilon^2 p} = O(\frac{1}{\varepsilon^2 p})$.
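A quick numerical check of this sample-size bound; this is a sketch with arbitrary $p$, $\varepsilon$ and trial count, and the constant comes from solving $2e^{-Tp\varepsilon^2/3} \le 0.01$.

```python
import math
import random

def required_tosses(p, eps):
    """Smallest T with 2 * exp(-T * p * eps^2 / 3) <= 0.01."""
    return math.ceil(3 * math.log(200) / (eps ** 2 * p))

def empirical_failure_rate(p, eps, trials=500):
    T = required_tosses(p, eps)
    bad = 0
    for _ in range(trials):
        heads = sum(random.random() < p for _ in range(T))
        if abs(heads / T - p) > eps * p:
            bad += 1
    return T, bad / trials

if __name__ == "__main__":
    # The observed failure rate should be comfortably below 1%.
    print(empirical_failure_rate(p=0.3, eps=0.1))
```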
Chernoff only works for independent Bernoulli r.v. The following is a generalization to independent, almost surely bounded r.v. $X_i$.
Theorem 3 (Hoeffding’s inequality) Let $X_1, \dots , X_n$ be independent r.v. where $X_i \in [a_i, b_i]$ almost surely ($a_i \le b_i$). Let $X = \sum_{i=1}^n X_i$ and $\mu := \mathbf{E}[X]$. Then for any $t > 0$,
$$
\mathbf{Pr}[|X - \mu| \ge t] \le 2\exp\left(-\frac{2t^2}{\sum_{i=1}^n (b_i - a_i)^2}\right).
$$
Proof: The key is a good upper bound on the MGF (moment generating function), as in the proof of the Chernoff bound. First we need:
Lemma (Hoeffding’s lemma) Let $X$ be a r.v. with $\mathbf{E}[X] = 0$ and $X \in [a, b]$ almost surely. Then
$$
\mathbf{E}[e^{\alpha X}] \le \exp\left(\frac{\alpha^2 (b-a)^2}{8}\right)
$$
for all $\alpha \in \mathbb{R}$.
Back to the theorem: we may replace each $X_i$ by $X_i - \mathbf{E}[X_i]$, and by symmetry it suffices to prove the one-sided bound $\mathbf{Pr}[X - \mu \ge t] \le \exp\left(-\frac{2t^2}{\sum_{i=1}^n (b_i - a_i)^2}\right)$.
Then for any $\alpha > 0$, by Markov, independence, and Hoeffding’s lemma,
$$
\mathbf{Pr}[X - \mu \ge t] \le e^{-\alpha t} \prod_{i=1}^n \mathbf{E}\big[e^{\alpha (X_i - \mathbf{E}[X_i])}\big] \le \exp\left(-\alpha t + \frac{\alpha^2}{8}\sum_{i=1}^n (b_i - a_i)^2\right).
$$
Optimizing over $\alpha$ gives $\alpha = \frac{4t}{\sum_{i=1}^n (b_i - a_i)^2}$, which yields the claimed bound. Q.E.D.
Comparing Chernoff Bound and Hoeffding’s Inequality
Let $X_1,\ \dots,\ X_n$ be i.i.d. r.v. where $X_i \sim \text{Ber}(p)$. Let $X = \sum_{i=1}^n X_i$.
Let $t = \delta \mu$. Using Chernoff, $\mathbf{Pr}[|X - \mu| \ge \delta\mu] \le 2e^{-\mu\delta^2/3} = 2e^{-np\delta^2/3}$.
Using Hoeffding (here $\sum_i (b_i - a_i)^2 = n$), $\mathbf{Pr}[|X - \mu| \ge t] \le 2e^{-2t^2/n} = 2e^{-2np^2\delta^2}$.
So for small $p$ (the exponent $\frac{np\delta^2}{3}$ beats $2np^2\delta^2$ whenever $p < \frac{1}{6}$), Chernoff gives a strictly stronger bound than Hoeffding.
Example 2 (Meal Delivery) Let $X_1,\ \dots,\ X_n$ be i.i.d. r.v. representing the weight of each meal. For any $i$, $\mu = \mathbf{E}[X_i] = 300$, and $X_i \in [250, 350]$. Then we can give an estimate of the total number $n$ by $\hat{n} = \frac{X}{\mu} = \frac{\sum_{i=1}^n X_i}{\mu}$, and Hoeffding bounds its deviation ($n > 200$).
Concentration on Martingales
Example 3 (Fair games) In each round the gambler wins $X_t$ money (possibly negative). Because the game is fair, $\mathbf{E}[X_t] = 0$. Let $Z_t = \sum_{i=1}^t X_i$ be the amount of money he has won after the $t$-th round. Then it is clear that $\mathbf{E}[Z_{t+1} \mid X_1,\ \dots,\ X_t] = Z_t$.
Proof: $\mathbf{E}[Z_{t+1} \mid X_1, \dots, X_t] = Z_t + \mathbf{E}[X_{t+1} \mid X_1, \dots, X_t] = Z_t$, using that each round is fair given the past.
($Z_t$ is $\sigma(X_0,\ \dots,\ X_t)$-measurable.)
Definition 5 In a probability space $(\Omega, \mathscr{F}, \mathbf{Pr})$, a sequence of random variables $\{Z_n\}_{n \ge 0}$ with finite expectations is a martingale if $\mathbf{E}[Z_{n+1} \mid Z_0, \dots, Z_n] = Z_n$ for all $n \ge 0$.
Sometimes we say $\{Z_n\}$ is a martingale w.r.t. another sequence $\{X_n\}$ if $\mathbf{E}[|Z_n|] < \infty$ and $\mathbf{E}[Z_{n+1} \mid X_0, \dots, X_n] = Z_n$ for all $n \ge 0$ (each $Z_n$ being a function of $X_0, \dots, X_n$).
More formally, for a filtration $\mathscr{F}_1 \subseteq \mathscr{F}_2 \subseteq \dots \subseteq \mathscr{F}$ with each $Z_i$ being $\mathscr{F}_i$-measurable, we call $\{Z_n\}$ a martingale if $\mathbf{E}[Z_{n+1} \mid \mathscr{F}_n] = Z_n$.
Supermartingale: $\mathbf{E}[Z_{n+1} \mid \mathscr{F}_n] \le Z_n$.
Submartingale: $\mathbf{E}[Z_{n+1} \mid \mathscr{F}_n] \ge Z_n$.
If $\{Z_n\}$ is a martingale then the following property is immediate
Proposition 6 For any $n \ge 1$, $\mathbf{E}[Z_n] = \mathbf{E}[Z_0]$.
Example 4 (1-dim random walk) $X_t \in \{-1, +1\}$, uniformly and independently. $Z_t = \sum_{i=1}^t X_i$. Then $\{Z_t\}$ is a martingale w.r.t. $\{X_t\}$.
Example 5 (The product of independent random variables) $X_1,\ \dots,\ X_n$ are independent r.v. with $\mathbf{E}[X_i] = 1$. Let $P_k = \prod_{i=1}^k X_i$. Then $\{P_k\}$ is a martingale w.r.t. $\{X_k\}$.
Example 6 (Polya’s urn) Initially the urn contains two balls, one black and one white. In each round we draw a ball uniformly at random, duplicate it, and put both copies back. Let $X_n$ be the number of black balls after round $n$, and let $Z_n = \frac{X_n}{n}$. Then $\{Z_n\}_{n \ge 2}$ is a martingale w.r.t. $\{X_n\}_{n \ge 2}$. Note that we start counting from round 2, so after round $n$ the urn contains exactly $n$ balls.
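A short simulation of Polya’s urn, checking that the expected fraction of black balls stays at $\mathbf{E}[Z_2] = \frac{1}{2}$; the trial count is arbitrary and only serves to estimate the expectation.

```python
import random

def average_black_fraction(rounds, trials=20000):
    """Estimate E[Z_n] = E[X_n / n] for Polya's urn (round counter starts at 2)."""
    total = 0.0
    for _ in range(trials):
        black, size = 1, 2              # one black and one white ball initially
        for _ in range(rounds - 2):     # each round copies a uniformly drawn ball
            if random.random() < black / size:
                black += 1
            size += 1
        total += black / size
    return total / trials

if __name__ == "__main__":
    for n in (2, 10, 50, 200):
        print(n, round(average_black_fraction(n), 3))   # all close to E[Z_2] = 0.5
```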
Example 7 (Doob’s martingale) Let $X_1,\ \dots,\ X_n$ be a sequence of r.v. (not necessarily independent). Let $f(\overline{X_n}) = f(X_1,\ \dots,\ X_n) \in \mathbb{R}$. For $i \ge 0$, we define $Z_i = \mathbf{E}[f(\overline{X_n}) \mid X_1, \dots, X_i]$.
In particular, we have $Z_0 = \mathbf{E}[f(\overline{X_n})]$ and $Z_n = f(\overline{X_n})$. $\{Z_n\}$ can be viewed as a process in which, as information accumulates, our estimate of the value of $f$ becomes more and more accurate.
Proposition 7 $\{Z_n\}$ is a martingale w.r.t. $\{X_n\}$.
Proof: $Z_{i+1}$ is $\sigma(X_1, \dots, X_{i+1})$-measurable, and
$$
\mathbf{E}[Z_{i+1} \mid \overline{X_i}] = \mathbf{E}\big[\mathbf{E}[f(\overline{X_n}) \mid \overline{X_{i+1}}] \,\big\vert\, \overline{X_i}\big] = \mathbf{E}[f(\overline{X_n}) \mid \overline{X_i}] = Z_i.
$$
Here we used the tower property of conditional expectation: when two conditional expectations are nested, the coarser (smaller) $\sigma$-algebra wins, i.e. $\mathbf{E}\big[\mathbf{E}[X \mid \mathscr{G}] \,\big\vert\, \mathscr{H}\big] = \mathbf{E}[X \mid \mathscr{H}]$ for $\mathscr{H} \subseteq \mathscr{G}$.
Azuma-Hoeffding’s Inequality
A generalization of Hoeffding’s inequality (to martingales).
Suppose we have a sequence of random variables $\{X_n\}_{n \ge 1}$ with $X_i \in [a_i,\ b_i]$. WLOG assume $\mathbf{E}[X_i] = 0$ (otherwise replace $X_i$ by $X_i - \mathbf{E}[X_i]$). Let $S_k = \sum_{i=1}^k X_i$. If $\{S_n\}$ is a martingale w.r.t. $\{X_n\}$, then
$$
\mathbf{Pr}[|S_n| \ge t] \le 2\exp\left(-\frac{2t^2}{\sum_{i=1}^n (b_i - a_i)^2}\right).
$$
Proof: Omitted.
Lec 3
Martingale (cont’d)
Example 1 (Balls-in-a-bag) A bag contains $g$ green balls and $r$ red balls. We want to estimate the fraction of red balls $\frac{r}{r+g}$ by drawing $n$ balls. There are two schemes:
Draw with replacement
Let $X_i := 1[\text{the } i\text{-th ball drawn is red}]$ and $X = \sum_{i=1}^n X_i$. Then the $X_i \sim \text{Ber}(\frac{r}{r+g})$ are independent, and by Hoeffding, $\mathbf{Pr}[|X - \mathbf{E}[X]| \ge t] \le 2e^{-2t^2/n}$.
$\mathbf{E}[X] = n\frac{r}{r+g}$.
Draw without replacement
Also $\mathbf{E}[X] = n\frac{r}{r+g}$: drawing without replacement can be viewed as first fixing a uniformly random permutation of all the balls and then drawing them one by one in that order; among all permutations, the fraction in which position $i$ is red is $\frac{r}{r+g}$, hence $\mathbf{E}[X_i] = \frac{r}{r+g}$.
Now consider the function $f(X_1, X_2, \dots, X_n) = \sum_{i=1}^n X_i$ and its Doob sequence $Z_i = \mathbf{E}[f(\overline{X_n})\ \vert\ \overline{X_i}]$.
To match the form of Azuma-Hoeffding we take differences: let $Y_i = Z_i - Z_{i-1}$ for $i \ge 1$. Therefore $\sum_{i=1}^n Y_i = Z_n - Z_0 = f(\overline{X_n}) - \mathbf{E}[f(\overline{X_n})]$.
Note that Azuma-Hoeffding requires upper and lower bounds on each $Y_i$. Let $S_i = \sum_{j=1}^i X_j$; then $Z_i = S_i + (n-i)\frac{r - S_i}{r + g - i}$, from which one can check that, conditioned on the history, each $Y_i$ lies in an interval of length at most $1$.
Therefore, by Azuma-Hoeffding, $\mathbf{Pr}[|X - \mathbf{E}[X]| \ge t] \le 2e^{-2t^2/n}$: sampling without replacement concentrates at least as well as sampling with replacement.
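A small experiment comparing the two sampling schemes; the numbers of red/green balls, the sample size and the trial count are arbitrary, and the function name is my own.

```python
import random

def mean_abs_deviation(red, green, n, replace, trials=5000):
    """Average |(#red in a sample of size n)/n - r/(r+g)| over many samples."""
    balls = [1] * red + [0] * green      # 1 = red, 0 = green
    truth = red / (red + green)
    total = 0.0
    for _ in range(trials):
        if replace:
            sample = [random.choice(balls) for _ in range(n)]
        else:
            sample = random.sample(balls, n)
        total += abs(sum(sample) / n - truth)
    return total / trials

if __name__ == "__main__":
    print("with replacement   :", mean_abs_deviation(300, 700, 100, replace=True))
    print("without replacement:", mean_abs_deviation(300, 700, 100, replace=False))
```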
McDiarmid’s Inequality
The Doob sequence used above is very powerful, and the same idea generalizes as follows.
Definition 1 (c-Lipschitz function) A function $f(x_1,\ \dots,\ x_n)$ satisfies the $c$-Lipschitz condition if changing any single coordinate changes its value by at most $c$: for every $i$ and every $x_i'$, $|f(x_1, \dots, x_i, \dots, x_n) - f(x_1, \dots, x_i', \dots, x_n)| \le c$.
Theorem 2 (McDiarmid’s Inequality) If $f$ is a $c$-Lipschitz function and $X_1, \dots, X_n$ are independent random variables, then
$$
\mathbf{Pr}\big[|f(X_1, \dots, X_n) - \mathbf{E}[f(X_1, \dots, X_n)]| \ge t\big] \le 2\exp\left(-\frac{2t^2}{nc^2}\right).
$$
Proof: Omitted; it is similar to the argument above.
Below are two applications of McDiarmid’s Inequality.
Example 2 (Pattern Matching) Let $B \in \{0, 1\}^k$ be a fixed string. For a uniformly random string $X \in \{0, 1\}^n$, what is the expected number of occurrences of $B$ in $X$?
Write $X_i$ for the $i$-th bit of $X$ and, for $1 \le i \le n-k+1$, let $Y_i = 1[X_i X_{i+1} \dots X_{i+k-1} = B]$. Then the desired quantity is
$$
\mathbf{E}[Y] = \sum_{i=1}^{n-k+1} \mathbf{E}[Y_i] = (n-k+1)\,2^{-k}, \qquad \text{where } Y = \sum_{i=1}^{n-k+1} Y_i.
$$
Write $f(X_1, \dots, X_n) = Y$; clearly $f$ is $k$-Lipschitz (flipping one bit affects at most $k$ windows). By McDiarmid,
$$
\mathbf{Pr}[|Y - \mathbf{E}[Y]| \ge t] \le 2\exp\left(-\frac{2t^2}{nk^2}\right).
$$
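A quick check of this expectation by simulation; the choices of $n$, $B$ and the trial count are mine.

```python
import random

def count_occurrences(x, b):
    """Number of (possibly overlapping) occurrences of pattern b in string x."""
    k = len(b)
    return sum(x[i:i + k] == b for i in range(len(x) - k + 1))

if __name__ == "__main__":
    n, b, trials = 200, "10110", 5000
    k = len(b)
    counts = [count_occurrences("".join(random.choice("01") for _ in range(n)), b)
              for _ in range(trials)]
    print("empirical mean :", sum(counts) / trials)
    print("(n-k+1) * 2^-k :", (n - k + 1) / 2 ** k)
```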
Example 3 (Chromatic Number of $G(n, p)$)
The most natural way: introduce a variable $X_e = 1[\text{the edge } e \text{ exists}]$ for every pair of vertices. Then $\chi(G)$, viewed as a function of these $\binom{n}{2}$ variables, is 1-Lipschitz, and McDiarmid gives $\mathbf{Pr}[|\chi(G) - \mathbf{E}[\chi(G)]| \ge t] \le 2e^{-2t^2/\binom{n}{2}}$.
This bound is not very good: to make the right-hand side a constant we need $t = \Omega(n)$, which is useless.
Instead, observe that the edges incident to a single vertex can change the chromatic number by at most 1 (we can always give that vertex a fresh colour). So list the vertices as $\{v_1, \dots, v_n\}$ and, for $2 \le i \le n$, let $Y_i$ encode the edges between $v_i$ and $\{v_1, \dots, v_{i-1}\}$ (vertex exposure). View the chromatic number as $f(Y_2, \dots, Y_n)$; $f$ is again 1-Lipschitz, and McDiarmid now gives $\mathbf{Pr}[|\chi(G) - \mathbf{E}[\chi(G)]| \ge t] \le 2e^{-2t^2/(n-1)}$.
Stopping Time
Question: if the index is a random variable $\tau$, does $\mathbf{E}[Z_{\tau}] = \mathbf{E}[Z_0]$ still hold? First we must define what it means for the index to be a random variable.
Definition 3 (Stopping Time) Let $\tau \in \mathbb{N} \cup \{\infty\}$ be a random variable. We call it a stopping time if, for every $t \ge 0$, the event “$\tau \le t$” is $\mathscr{F}_t$-measurable.
In plain words: whether we have stopped by time $t$ can be decided from the information available up to time $t$. For example, if a gambler quits the first time he wins five games in a row, that time $\tau$ is a stopping time (whether he has just won five in a row can be read off the last five rounds); but the time of his last streak of five wins is not a stopping time, because looking only at past rounds one cannot tell whether a streak is the “last” one.
Optional Stopping Theorem (OST)
This theorem gives three sufficient conditions for $\mathbf{E}[Z_\tau] = \mathbf{E}[Z_0]$ to hold.
Theorem 4 (Optional Stopping Theorem) Let $\{X_t\}_{t \ge 0}$ be a martingale and $\tau$ a stopping time w.r.t. $\{\mathscr{F}_t\}_{t \ge 0}$. Then $\mathbf{E}[X_\tau] = \mathbf{E}[X_0]$ if one of the following three conditions holds:
- $\tau$ is a.s. bounded, i.e. $\exists n \in \mathbb{N}$ s.t. $\mathbf{Pr}[\tau \le n] = 1$;
- $\mathbf{Pr}[\tau < \infty] = 1$ and there exists a finite $M$ such that $|X_t| \le M$ for all $t < \tau$;
- $\mathbf{E}[\tau] < \infty$ and there exists a constant $c$ such that $\mathbf{E}[|X_{t+1}-X_t|\ \vert\ \mathscr{F}_t] \le c$ for all $t < \tau$.
Example 4 (Boy or Girl) In each of the following three scenarios, is the expected sex ratio 1:1?
- keep having children until a boy is born;
- keep having children until #boys > #girls;
- keep having children until #boys > #girls or the total number of children reaches 10.
We can view this directly as a 1-dim random walk. Let $X_t$ be the position after $t$ steps (i.e. #boys − #girls).
- First scenario: OST condition 3 applies directly. $\mathbf{E}[\tau]$ is clearly finite (geometric), and since each step moves by one, $c = 1$;
- Second scenario: none of the three conditions holds. In fact the ratio is certainly not 1:1;
- Third scenario: condition 1 applies directly (condition 2 also works).
Whenever OST applies, $\mathbf{E}[X_\tau] = \mathbf{E}[X_0] = 0$, i.e. the expected ratio is 1:1.
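A simulation of the stopping rules (encoding boys as $+1$ and girls as $-1$ as in the text); rule 2 is deliberately left out of the loop, since its stopping time has infinite expectation and cannot be simulated faithfully.

```python
import random

def family(rule):
    """Simulate one family; return (#boys, #girls) at the stopping time."""
    boys = girls = 0
    while True:
        if random.random() < 0.5:
            boys += 1
        else:
            girls += 1
        if rule == 1 and boys >= 1:
            return boys, girls
        if rule == 3 and (boys > girls or boys + girls >= 10):
            return boys, girls

if __name__ == "__main__":
    trials = 50000
    for rule in (1, 3):
        total_boys = total_girls = 0
        for _ in range(trials):
            b, g = family(rule)
            total_boys += b
            total_girls += g
        # Pooled sex ratio over many families; OST predicts 1:1, i.e. fraction 0.5.
        print(f"rule {rule}: fraction of boys = {total_boys / (total_boys + total_girls):.3f}")
    # Rule 2 (stop only when #boys > #girls) is omitted: its stopping time has
    # infinite expectation, so it cannot be simulated faithfully.
```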
Lec 4
Proof of OST
First we have $\mathbf{E}[X_{\tau \wedge n}] = \mathbf{E}[X_0]$ for every fixed $n$, since $\tau \wedge n$ is a bounded stopping time.
So naturally we can decompose $X_\tau$ into two terms: $X_\tau = X_{\tau \wedge n} + 1[\tau > n](X_\tau - X_n)$.
So we only need to verify that $\mathbf{E}\big[1[\tau > n](X_\tau - X_n)\big] \rightarrow 0$ as $n \rightarrow \infty$, under each of the three conditions.
Under the first condition: obviously $1[\tau > n] = 0$ a.s. once $n$ is large enough, so the error term vanishes.
Under the second condition: on the event $\{\tau > n\}$ we have $|X_\tau - X_n| \le 2M$, so $\mathbf{E}\big[1[\tau > n]\,|X_\tau - X_n|\big] \le 2M\,\mathbf{Pr}[\tau > n] \rightarrow 0$,
because $\mathbf{Pr}[\tau < \infty] = 1$.
Under the third condition: on $\{\tau > n\}$, $|X_\tau - X_n| \le \sum_{t \ge n} |X_{t+1} - X_t|\,1[\tau > t]$, so $\mathbf{E}\big[1[\tau > n]\,|X_\tau - X_n|\big] \le c \sum_{t \ge n} \mathbf{Pr}[\tau > t]$.
Because $\mathbf{E}[\tau] = \sum_{t=0}^\infty \mathbf{Pr}[\tau > t] < \infty$, the tail $\sum_{t \ge n} \mathbf{Pr}[\tau > t] \rightarrow 0$ when $n \rightarrow \infty$.
Doob’s Martingale Inequality
Using OST we can derive a concentration property for the maximum of a sequence.
Claim 2 Let $\{X_t\}_{t \ge 0}$ be a martingale w.r.t. itself where $X_t \ge 0$ for every $t$. Then for any $n$ and any $\alpha > 0$,
$$
\mathbf{Pr}\Big[\max_{0 \le t \le n} X_t \ge \alpha\Big] \le \frac{\mathbf{E}[X_0]}{\alpha}.
$$
Proof: Define $\tau$ as the first time $t$ at which $X_t \ge \alpha$, truncated at $n$ (i.e. $\tau = n$ if the maximum never reaches $\alpha$).
This is a stopping time bounded by $n$, so the first OST condition applies. Then $\mathbf{E}[X_0] = \mathbf{E}[X_\tau] \ge \alpha\,\mathbf{Pr}\big[\max_{0 \le t \le n} X_t \ge \alpha\big]$, since $X_\tau \ge 0$ always and $X_\tau \ge \alpha$ on the event in question.
1-dim Random Walk with Two Absorbing Barriers
There are two absorbing barriers at $x = -a$ and $x = b$, and the walk starts at $0$. Let $\tau$ be the stopping time (the first time a barrier is hit); we want $\mathbf{E}[\tau]$.
Observe that wherever the walker is, walking $a+b$ consecutive steps in one direction (which happens with probability $2^{-(a+b)}$) certainly ends the process, so $\mathbf{E}[\tau] < \infty$. Moreover $\mathbf{E}[|X_{t+1}-X_t|\ \vert\ \mathscr{F}_t] \le 1$, so OST applies and $\mathbf{E}[X_\tau] = \mathbf{E}[X_0] = 0$. Writing $P_a$ for the probability of being absorbed at $-a$, we also have $\mathbf{E}[X_\tau] = P_a(-a) + (1-P_a) b$, hence $P_a = \frac{b}{a+b}$.
To compute $\mathbf{E}[\tau]$ we need a new martingale that involves the time $t$. Take $Y_t = X_t^2 - t$; it is easy to verify that it is a martingale and satisfies the third OST condition. Hence $0 = \mathbf{E}[Y_\tau] = \mathbf{E}[X_\tau^2] - \mathbf{E}[\tau]$, so $\mathbf{E}[\tau] = \mathbf{E}[X_\tau^2] = a^2 \cdot \frac{b}{a+b} + b^2 \cdot \frac{a}{a+b} = ab$.
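A simulation checking both conclusions, $P_a = \frac{b}{a+b}$ and $\mathbf{E}[\tau] = ab$; the values of $a$, $b$ and the trial count are arbitrary.

```python
import random

def walk(a, b):
    """Run a +/-1 random walk from 0 until it hits -a or b; return (hit -a?, steps)."""
    x = steps = 0
    while -a < x < b:
        x += 1 if random.random() < 0.5 else -1
        steps += 1
    return x == -a, steps

if __name__ == "__main__":
    a, b, trials = 3, 7, 20000
    hits = lengths = 0
    for _ in range(trials):
        hit_a, steps = walk(a, b)
        hits += hit_a
        lengths += steps
    print("P_a    empirical:", hits / trials,    "theory:", b / (a + b))
    print("E[tau] empirical:", lengths / trials, "theory:", a * b)
```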
Pattern Matching
Given a pattern $P \in \{0, 1\}^l$, we already know the expected number of occurrences of $P$ in a uniformly random string of length $n$:
$$
\mathbf{E}[\text{number of occurrences of } P] = (n - l + 1)\,2^{-l}.
$$