
CS3958: Advanced Algorithms

These are my personal notes for the ACM Class Advanced Algorithms course (2022 Fall). As usual, I only write them up carefully while reviewing for the final exam. Course outline:

  • Basic Probabilistic Tools
    • Concentration Inequalities
    • Martingale
  • Optimization
    • First-Order Optimization
    • Online Optimization
  • Markov Chain and Sampling
    • An Application of MCMC (Markov Chain Monte Carlo)
    • Perspectives for studying MCMC
    • Stochastic Process
    • Spectrum
    • Operator

Lec 1

Concentration Inequalities

Theorem 2 (Markov's Inequality) For any non-negative r.v. $X$ and $a > 0$,

$$\mathbf{Pr}[X \ge a] \le \frac{\mathbf{E}[X]}{a}.$$

Proof: Since $X \ge a \cdot 1[X \ge a]$, taking expectations gives $\mathbf{E}[X] \ge a\,\mathbf{Pr}[X \ge a]$.

Example 4 (Coupon Collector) There are $n$ types of coupons and each draw yields a uniformly random type. Let $X$ be the number of draws needed to collect all $n$ types; then $\mathbf{E}[X] = n H_n = \Theta(n \log n)$.

Proof: Let $X_i$ be the number of draws made while exactly $i$ distinct cards have been collected.

Alternative proof: let $X_i$ be the number of draws needed to obtain a new coupon once $i$ distinct coupons have been collected. Then $X_i \sim \text{Geom}\big(\frac{n-i}{n}\big)$, so $\mathbf{E}[X] = \sum_{i=0}^{n-1} \frac{n}{n-i} = n H_n$.

Theorem 3 (Chebyshev's Inequality) For any r.v. $X$ with finite variance and $a > 0$,

$$\mathbf{Pr}\big[|X - \mathbf{E}[X]| \ge a\big] \le \frac{\mathbf{Var}[X]}{a^2}.$$

Proof: Apply Markov's inequality to the non-negative r.v. $(X - \mathbf{E}[X])^2$ with threshold $a^2$.

Example 3 (Streaming Model) The input is a sequence $a_1$, $a_2$, $\dots$, $a_m$; the goal is to compute (or approximate) $m$ using as little memory as possible.

Brute force: maintain a counter $k$ using $O(\log m)$ bits of memory. This is the best possible if the exact answer is required.

Morris’ algorithm:

```
X <- 0
On each input: X <- X + 1 with probability 2^(-X)
Return 2^X - 1
```
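
For concreteness, here is a minimal Python sketch of the same procedure (the names `morris_counter` and `stream` are mine, not from the course; the point is that only the small register $X$ is stored, never the count itself):

```python
import random

def morris_counter(stream):
    """Morris' approximate counter: keep only X, increment it with probability 2^(-X)."""
    x = 0
    for _ in stream:
        if random.random() < 2.0 ** (-x):
            x += 1
    return 2 ** x - 1  # the estimate m_hat, with E[m_hat] = m

# e.g. morris_counter(range(100000)) returns a noisy estimate of 100000
# (its standard deviation is roughly m / sqrt(2), see Lemma 4 below)
```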

Theorem 1: $\mathbf{E}[\hat{m}] = m$, where $\hat{m} = 2^X - 1$ is the output of Morris' algorithm and $m$ is the true count.

Proof: By induction on $m$. Let $X_m$ be the value of $X$ after $m$ inputs. Conditioning on $X_m$, $\mathbf{E}[2^{X_{m+1}}\ \vert\ X_m] = 2^{-X_m} \cdot 2^{X_m + 1} + (1 - 2^{-X_m}) \cdot 2^{X_m} = 2^{X_m} + 1$, so $\mathbf{E}[2^{X_m}] = m + 1$ and $\mathbf{E}[\hat{m}] = m$.

The algorithm uses only $O(\log \log m)$ bits of memory. We want an $(\varepsilon, \delta)$-estimate, i.e., $\mathbf{Pr}[|\hat{m} - m| \ge \varepsilon m] \le \delta$.

For fixed $\varepsilon$, smaller $\delta$ is better.

Lemma 4: $\mathbf{E}\big[(2^{X_m})^2\big] = \frac{3}{2}m^2 + \frac{3}{2}m + 1$.

Proof: By induction. For the base case $m = 1$ we have $X_1 = 1$ deterministically, so $\mathbf{E}[(2^{X_1})^2] = 4 = \frac{3}{2} + \frac{3}{2} + 1$.

With Lemma 4, $\mathbf{Var}[\hat{m}] = \mathbf{E}[(2^{X_m})^2] - (m+1)^2 = \frac{m^2 - m}{2} \le \frac{m^2}{2}$.

With Chebyshev, $\mathbf{Pr}[|\hat{m} - m| \ge \varepsilon m] \le \frac{\mathbf{Var}[\hat{m}]}{\varepsilon^2 m^2} \le \frac{1}{2\varepsilon^2}$.

If $\varepsilon < \frac{1}{\sqrt{2}}$, then $\frac{1}{2\varepsilon^2} > 1$ and the bound is vacuous.

The Averaging Trick

We can run Morris' algorithm $t$ times in parallel and output the average $\hat{m}^* = \frac{1}{t}\sum_{i=1}^t \hat{m}_i$ of the individual estimates.

We have $\mathbf{Var}[\hat{m}^*] = \frac{\mathbf{Var}[\hat{m}]}{t} \le \frac{m^2}{2t}$, so by Chebyshev $\mathbf{Pr}[|\hat{m}^* - m| \ge \varepsilon m] \le \frac{1}{2t\varepsilon^2}$.

For a fixed $\delta$, taking $t \ge \frac{1}{2\varepsilon^2 \delta}$ gives $\mathbf{Pr}[|\hat{m}^* - m| \ge \varepsilon m] \le \delta$. The new algorithm uses $O(\frac{\log \log m}{\varepsilon^2 \delta})$ bits of memory.
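
A one-pass sketch of the averaging trick in the same hypothetical Python style: keep $t$ independent registers and average their estimates.

```python
import random

def morris_avg(stream, t):
    """Averaging trick: t independent Morris counters maintained in one pass; output the mean."""
    xs = [0] * t
    for _ in stream:
        for i in range(t):
            if random.random() < 2.0 ** (-xs[i]):
                xs[i] += 1
    return sum(2 ** x - 1 for x in xs) / t
```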

Chernoff Bound

Theorem 5 (Chernoff Bound) Let $X_1, \dots, X_n$ be independent r.v. such that $X_i \sim \text{Ber}(p_i)$. Let $X = \sum_{i=1}^n X_i$ and $\mu = \mathbf{E}[X] = \sum_{i=1}^n p_i$. Then for any $\delta > 0$,

$$\mathbf{Pr}[X \ge (1+\delta)\mu] \le \left(\frac{e^{\delta}}{(1+\delta)^{1+\delta}}\right)^{\mu},$$

and if $0 < \delta < 1$,

$$\mathbf{Pr}[X \le (1-\delta)\mu] \le \left(\frac{e^{-\delta}}{(1-\delta)^{1-\delta}}\right)^{\mu}.$$

Proof: The core idea is to apply Markov's inequality to $f(X) = e^{aX}$ for some $a > 0$, bound the MGF using independence, and optimize over $a$.

Corollary 6 For any $0 < \delta < 1$,

$$\mathbf{Pr}[X \ge (1+\delta)\mu] \le e^{-\frac{\delta^2 \mu}{3}}, \qquad \mathbf{Pr}[X \le (1-\delta)\mu] \le e^{-\frac{\delta^2 \mu}{2}},$$

and hence $\mathbf{Pr}[|X - \mu| \ge \delta\mu] \le 2e^{-\frac{\delta^2 \mu}{3}}$.

Proof:

The Median Trick

We can further boost the performance of Morris' algorithm using the median trick. Choose $t = \frac{3}{2 \varepsilon^2}$ in the averaging trick and run it $s$ times in parallel.

Finally, output the median $\hat{m}^{*}$ of $\hat{m}^*_1, \hat{m}^*_2, \dots, \hat{m}^*_s$.

Let $Y_i$ be the indicator of the (good) event that $|\hat{m}^*_i - m| < \varepsilon m$; by the averaging trick with $t = \frac{3}{2\varepsilon^2}$, $\mathbf{Pr}[Y_i = 1] \ge \frac{2}{3}$.

Then $Y := \sum_{i=1}^s Y_i$ satisfies $\mathbf{E}[Y] \ge \frac{2}{3} s$. If the median is bad, then at least half of the $\hat{m}^*_i$ are bad, i.e., $Y \le \frac{1}{2} s$. By Chernoff, $\mathbf{Pr}\big[Y \le \tfrac{s}{2}\big] \le \mathbf{Pr}\big[Y \le (1 - \tfrac{1}{4})\mathbf{E}[Y]\big] \le e^{-\frac{(1/4)^2 \mathbf{E}[Y]}{2}} \le e^{-\frac{s}{48}}$.

Therefore, for $t = O(\frac{1}{\varepsilon^2})$ and $s = O(\log \frac{1}{\delta})$, we have $\mathbf{Pr}[|\hat{m}^* - m| \ge \varepsilon m] \le \delta$.

This new algorithm uses

$$O\left(\frac{\log \log m}{\varepsilon^2} \cdot \log \frac{1}{\delta}\right)$$

bits of memory. (Note $O(\log \frac{1}{\delta}) \ll O(\frac{1}{\delta})$.)
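
Putting the two tricks together, a sketch of the median-of-means estimator (reusing the hypothetical `morris_avg` above; the constants $\frac{3}{2\varepsilon^2}$ and $48 \ln\frac{1}{\delta}$ come from the analysis in this section):

```python
import math
import statistics

def morris_median_of_means(stream, eps, delta):
    """Median of s independent averaging-trick estimates, each the mean of t counters."""
    t = math.ceil(3 / (2 * eps ** 2))          # per-group failure probability <= 1/3 (Chebyshev)
    s = math.ceil(48 * math.log(1 / delta))    # Pr[median is bad] <= e^(-s/48) <= delta (Chernoff)
    items = list(stream)                       # materialize so every group reads the same input
    return statistics.median(morris_avg(items, t) for _ in range(s))
```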

Lec 2

Threshold Behavior of Random Graphs

Threshold Behavior: For a random graph $G(n, p)$, there is a threshold function $r$ such that:

  • when $p \ll r(n)$, almost no graph satisfies the property;
  • when $p \gg r(n)$, almost every graph satisfies the property.

Formally, (Threshold function) given a graph property $P$, a function $r: \mathbf{N} \rightarrow [0, 1]$ is called a threshold if:

  • (a) if $p(n) \ll r(n)$, then $\mathbf{Pr}[G \text{ satisfies } P] \rightarrow 0$ as $n \rightarrow \infty$;
  • (b) if $p(n) \gg r(n)$, then $\mathbf{Pr}[G \text{ satisfies } P] \rightarrow 1$ as $n \rightarrow \infty$.

Theorem 2 The property “G contains a 4-clique” has a threshold function $n^{-\frac{2}{3}}$.

Proof: For each $S \in \binom{[n]}{4}$, let $X_S = 1[S \text{ induces a clique}]$.

Let $X = \sum_{S \in \binom{[n]}{4}} X_S$; then by linearity of expectation, $\mathbf{E}[X] = \binom{n}{4} p^6 = \Theta(n^4 p^6)$ (a 4-clique contains 6 edges).

If $p \ll n^{-\frac{2}{3}}$, then $\mathbf{E}[X] = o(1)$, so by Markov $\mathbf{Pr}[X \ge 1] \le \mathbf{E}[X] = o(1)$, which proves (a).

For (b), we cannot use Markov because a large expectation does not imply large values with high probability. Therefore, we have to consider the variance (bound $\mathbf{Pr}[X = 0]$ via Chebyshev, i.e., the second-moment method).

Computing the variance by considering pairs $S, T$ with $|S \cap T| = 2, 3$, we get (if $p \gg n^{-\frac{2}{3}}$) $\mathbf{Var}[X] = o(\mathbf{E}[X]^2)$, so by Chebyshev $\mathbf{Pr}[X = 0] \le \frac{\mathbf{Var}[X]}{\mathbf{E}[X]^2} = o(1)$, which proves (b).
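
A small simulation (mine, not from the lecture) that illustrates the threshold behavior; even for modest $n$, the contrast between $p \ll n^{-2/3}$ and $p \gg n^{-2/3}$ is visible:

```python
import itertools
import random

def has_4clique(n, p):
    """Sample G(n, p) and check for a 4-clique by brute force."""
    adj = [[False] * n for _ in range(n)]
    for u, v in itertools.combinations(range(n), 2):
        if random.random() < p:
            adj[u][v] = adj[v][u] = True
    return any(all(adj[u][v] for u, v in itertools.combinations(quad, 2))
               for quad in itertools.combinations(range(n), 4))

n, trials = 30, 30
for c in (0.2, 5.0):                 # p = c * n^(-2/3): below vs. above the threshold
    p = c * n ** (-2 / 3)
    freq = sum(has_4clique(n, p) for _ in range(trials)) / trials
    print(f"c = {c}: empirical Pr[4-clique] ~ {freq:.2f}")
```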

Hoeffding’s Inequality

Example 1 (Tossing coins) How many times should we toss a coin (denote the number of tosses by $T$) so that, with probability $99\%$, the empirical frequency of heads $\hat{p}$ lies in $[(1-\varepsilon)p, (1+\varepsilon)p]$ for a given precision $\varepsilon$? Here $p$ is the probability of heads.

Use Chernoff: the number of heads is $T\hat{p}$ with mean $Tp$, so $\mathbf{Pr}[|\hat{p} - p| \ge \varepsilon p] \le 2e^{-\frac{\varepsilon^2 T p}{3}} \le 0.01$.

So $T \ge \frac{3 \log 200}{\varepsilon^2 p} = O(\frac{1}{\varepsilon^2 p})$.

Chernoff only works for independent Bernoulli r.v.s. Hoeffding's inequality is its generalization to independent r.v.s $X_i$ that are bounded almost surely.

Theorem 3 (Hoeffding's inequality) Let $X_1, \dots , X_n$ be independent r.v.s where $X_i \in [a_i, b_i]$ almost surely ($a_i \le b_i$). Let $X = \sum_{i=1}^n X_i$ and $\mu := \mathbf{E}[X]$. Then

$$\mathbf{Pr}\big[|X - \mu| \ge t\big] \le 2\exp\left(-\frac{2t^2}{\sum_{i=1}^n (b_i - a_i)^2}\right).$$

Proof: The key is a good upper bound on the MGF (moment generating function), as in the Chernoff bound. First, for any $\alpha > 0$, Markov gives $\mathbf{Pr}[X - \mu \ge t] \le e^{-\alpha t}\,\mathbf{E}[e^{\alpha(X - \mu)}] = e^{-\alpha t}\prod_{i=1}^n \mathbf{E}\big[e^{\alpha(X_i - \mathbf{E}[X_i])}\big]$.

Lemma (Hoeffding's lemma) Let $X$ be a r.v. with $\mathbf{E}[X] = 0$ and $X \in [a, b]$. Then $\mathbf{E}[e^{\alpha X}] \le \exp\left(\frac{\alpha^2 (b-a)^2}{8}\right)$

for all $\alpha \in \mathbb{R}$.

We replace $X_i$ by $X_i - \mathbf{E}[X_i]$ here, and by symmetry we only need to prove the one-sided bound $\mathbf{Pr}[X - \mu \ge t] \le \exp\left(-\frac{2t^2}{\sum_{i=1}^n (b_i - a_i)^2}\right)$.

Then we have $\mathbf{Pr}[X - \mu \ge t] \le \exp\left(-\alpha t + \frac{\alpha^2}{8}\sum_{i=1}^n (b_i - a_i)^2\right)$.

Optimizing over $\alpha$ gives $\alpha = \frac{4t}{\sum_{i=1}^n (b_i - a_i)^2}$, which yields the claimed bound. Q.E.D.

Comparing Chernoff Bound and Hoeffding’s Inequality

Let $X_1,\ \dots,\ X_n$ be i.i.d. r.v. where $X_i \sim \text{Ber}(p)$. Let $X = \sum_{i=1}^n X_i$.

Let $t = \delta \mu$. Using Chernoff, $\mathbf{Pr}[X \ge (1+\delta)\mu] \le e^{-\frac{\delta^2 \mu}{3}} = e^{-\frac{\delta^2 np}{3}}$.

Using Hoeffding, $\mathbf{Pr}[X - \mu \ge \delta\mu] \le e^{-\frac{2\delta^2 \mu^2}{n}} = e^{-2\delta^2 np^2}$.

So for small $p$ (an extra factor of $p$ in the exponent), Chernoff is stronger than Hoeffding.
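
A concrete instance (numbers mine) showing the gap:

$$
n = 10^6,\ p = 10^{-3},\ \delta = 0.1:\qquad
e^{-\frac{\delta^2 np}{3}} = e^{-10/3} \approx 0.036,
\qquad
e^{-2\delta^2 np^2} = e^{-0.02} \approx 0.98 .
$$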

Example 2 (Meal Delivery) Let $X_1,\ \dots,\ X_n$ be i.i.d. r.v.s representing the weight of each meal. For any $i$, $\mu = \mathbf{E}[X_i] = 300$, and $X_i \in [250, 350]$. Then we can give an estimate of the total number $n$ of meals from the total weight: $\hat{n} = \frac{X}{\mu} = \frac{\sum_{i=1}^n X_i}{\mu}$, assuming $n > 200$.
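
As a hedged worked example (the exact question asked in lecture is not preserved in these notes), Hoeffding bounds the relative error of $\hat{n}$:

$$
\mathbf{Pr}\big[|\hat{n} - n| \ge \varepsilon n\big]
= \mathbf{Pr}\big[|X - n\mu| \ge \varepsilon n \mu\big]
\le 2\exp\left(-\frac{2(\varepsilon n \mu)^2}{n \cdot (350 - 250)^2}\right)
= 2\exp\left(-\frac{2\varepsilon^2 n \cdot 300^2}{100^2}\right)
= 2e^{-18\varepsilon^2 n},
$$

which for $n > 200$ and, say, $\varepsilon = 0.05$ is already at most $2e^{-9} < 10^{-3}$.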

Concentration on Martingales

Example 3 (Fair games) In each round the gambler wins $X_t$ money. Because the game is fair, $\mathbf{E}[X_t] = 0$. Let $Z_t = \sum_{i=1}^t X_i$ be the amount of money he has won after the $t$-th round. Then it is clear that $\mathbf{E}[Z_{t+1}\ \vert\ X_1, \dots, X_t] = Z_t$.

Proof: $\mathbf{E}[Z_{t+1}\ \vert\ X_1, \dots, X_t] = Z_t + \mathbf{E}[X_{t+1}\ \vert\ X_1, \dots, X_t] = Z_t$, since the game is fair.

($Z_t$ is $\sigma(X_0,\ \dots,\ X_t)$-measurable.)

Definition 5 In a probability space $(\Omega, \mathscr{F}, \mathbf{Pr})$, a sequence of random variables $\{Z_n\}_{n \ge 0}$ with $\mathbf{E}[|Z_n|] < \infty$ is a martingale if $\mathbf{E}[Z_{n+1}\ \vert\ Z_0, \dots, Z_n] = Z_n$ for all $n \ge 0$.

Sometimes we say $\{Z_n\}$ is a martingale w.r.t. another sequence $\{X_n\}$ if $Z_n$ is a function of $X_0, \dots, X_n$ and $\mathbf{E}[Z_{n+1}\ \vert\ X_0, \dots, X_n] = Z_n$.

More formally, for a filtration $\mathscr{F}_0 \subseteq \mathscr{F}_1 \subseteq \dots \subseteq \mathscr{F}$ with each $Z_i$ being $\mathscr{F}_i$-measurable, we call $\{Z_n\}$ a martingale if $\mathbf{E}[Z_{n+1}\ \vert\ \mathscr{F}_n] = Z_n$.

Supermartingale: $\mathbf{E}[Z_{n+1}\ \vert\ \mathscr{F}_n] \le Z_n$.

Submartingale: $\mathbf{E}[Z_{n+1}\ \vert\ \mathscr{F}_n] \ge Z_n$.

If $\{Z_n\}$ is a martingale then the following property is immediate

Proposition 6 For any $n \ge 1$, $\mathbf{E}[Z_n] = \mathbf{E}[Z_0]$.

Example 4 (1-dim random walk) $X_t \in \{-1, +1\}$ uniformly and independently. $Z_t = \sum_{i=1}^t X_i$. Then $\{Z_t\}$ is a martingale w.r.t. $\{X_t\}$.

Example 5 (The product of independent random variables) $X_1,\ \dots,\ X_n$ are independent r.v.s with $\mathbf{E}[X_i] = 1$. Let $P_k = \prod_{i=1}^k X_i$. Then $\{P_k\}$ is a martingale w.r.t. $\{X_n\}$.

Example 6 (Polya's urn) Initially the urn contains one black ball and one white ball. In each round we draw a ball uniformly at random, duplicate it, and put both copies back. Let $X_n$ be the number of black balls after round $n$, and let $Z_n = \frac{X_n}{n}$. Then $\{Z_n\}_{n \ge 2}$ is a martingale w.r.t. $\{X_n\}_{n \ge 2}$. Note that we index the rounds so that after round $n$ the urn contains $n$ balls, so the process starts at $n = 2$.
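
A quick simulation sketch (mine, not from the notes) of Polya's urn; individual runs of $Z_n$ vary wildly, but the average over many runs stays near the starting value $\frac{1}{2}$, as the martingale property predicts:

```python
import random

def polya_fraction(steps):
    """Start with 1 black + 1 white ball; each step duplicate a uniformly drawn ball.
    Return the final fraction of black balls Z_n = X_n / n."""
    black, total = 1, 2
    for _ in range(steps):
        if random.random() < black / total:  # drew a black ball
            black += 1
        total += 1
    return black / total

print(sum(polya_fraction(1000) for _ in range(2000)) / 2000)  # close to 0.5
```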

Example 7 (Doob's martingale) Let $X_1,\ \dots,\ X_n$ be a sequence of r.v.s (not necessarily independent). Let $f(\overline{X_n}) = f(X_1,\ \dots,\ X_n) \in \mathbb{R}$. For $i \ge 0$, we define $Z_i = \mathbf{E}[f(\overline{X_n})\ \vert\ X_1, \dots, X_i]$.

In particular, we have $Z_0 = \mathbf{E}[f(\overline{X_n})]$ and $Z_n = f(\overline{X_n})$. $\{Z_n\}$ can be viewed as a process in which the estimate of $f$ becomes more and more accurate as information is gradually revealed.

Proposition 7 $\{Z_n\}$ is a martingale w.r.t. $\{X_n\}$.

Proof: $\mathbf{E}[Z_{i+1}\ \vert\ \overline{X_i}] = \mathbf{E}\big[\mathbf{E}[f(\overline{X_n})\ \vert\ \overline{X_{i+1}}]\ \big\vert\ \overline{X_i}\big] = \mathbf{E}[f(\overline{X_n})\ \vert\ \overline{X_i}] = Z_i$.

Here we used the tower property of conditional expectation: $\mathbf{E}\big[\mathbf{E}[X\ \vert\ \mathscr{G}]\ \big\vert\ \mathscr{H}\big] = \mathbf{E}[X\ \vert\ \mathscr{H}]$ whenever $\mathscr{H} \subseteq \mathscr{G}$.

In other words, when two conditional expectations are nested, the one with the smaller sigma-algebra prevails.

Azuma-Hoeffding’s Inequality

This is a generalization of Hoeffding's inequality to martingales.

Suppose we have a sequence of r.v.s $\{X_n\}_{n \ge 1}$ with $X_i \in [a_i,\ b_i]$. W.l.o.g. assume $\mathbf{E}[X_i] = 0$ (otherwise replace $X_i$ by $X_i - \mathbf{E}[X_i]$). Let $S_k = \sum_{i=1}^k X_i$. If $\{S_n\}$ is a martingale w.r.t. $\{X_n\}$, then

$$\mathbf{Pr}\big[|S_n| \ge t\big] \le 2\exp\left(-\frac{2t^2}{\sum_{i=1}^n (b_i - a_i)^2}\right).$$

Proof: omitted.

Lec 3

Martingale (cont’d)

Example 1 (Balls-in-a-bag) A bag contains $g$ green balls and $r$ red balls. We want to estimate the fraction of red balls $\frac{r}{r+g}$ by drawing $n$ balls. There are two settings:

  • Draw with replacement

    Let $X_i := 1[\text{the } i\text{-th ball is red}]$. Then $X_i \sim \text{Ber}(\frac{r}{r+g})$ independently, with $\mathbf{E}[X] = n\frac{r}{r+g}$ for $X = \sum_{i=1}^n X_i$. By Hoeffding,

    $$\mathbf{Pr}\big[|X - \mathbf{E}[X]| \ge t\big] \le 2\exp\left(-\frac{2t^2}{n}\right).$$

  • Draw without replacement

    Also $\mathbf{E}[X] = n\frac{r}{r+g}$: drawing without replacement can be viewed as first taking a uniformly random permutation of the balls and then drawing them one by one, and by symmetry the probability that the $i$-th position is red is $\mathbf{E}[X_i] = \frac{r}{r+g}$.

    Next, consider the function $f(X_1, X_2, \dots, X_n) = \sum_{i=1}^n X_i$ and its Doob sequence $Z_i = \mathbf{E}[f(\overline{X_n})\ \vert\ \overline{X_i}]$.

    To match the form of Azuma-Hoeffding we take differences: let $Y_i = Z_i - Z_{i-1}$ for $i \ge 1$, so that $Z_k - Z_0 = \sum_{i=1}^k Y_i$.

    Note that Azuma-Hoeffding requires upper and lower bounds on each $Y_i$. Let $S_i = \sum_{j=1}^i X_j$; then

Therefore

McDiarmid’s Inequality

The Doob sequence used above is very powerful, and the same idea can be generalized as follows.

Definition 1 (c-Lipschitz function) A function $f(x_1,\ \dots,\ x_n)$ satisfies the c-Lipschitz condition if changing any single coordinate changes its value by at most $c$, i.e., $|f(x_1, \dots, x_i, \dots, x_n) - f(x_1, \dots, x_i', \dots, x_n)| \le c$ for every $i$ and every $x_i'$.

Theorem 2 (McDiarmid's Inequality) If $f$ is a c-Lipschitz function and $X_1, \dots, X_n$ are independent random variables, then

$$\mathbf{Pr}\big[|f(X_1, \dots, X_n) - \mathbf{E}[f(X_1, \dots, X_n)]| \ge t\big] \le 2\exp\left(-\frac{2t^2}{nc^2}\right).$$

Proof: omitted; it is similar to the argument above.

Below are two applications of McDiarmid's Inequality.

Example 2 (Pattern Matching) Let $B \in \{0, 1\}^k$ be a fixed string. For a uniformly random string $X \in \{0, 1\}^n$, what is the expected number of occurrences of $B$ in $X$?

Let $X_i$ be the $i$-th bit of $X$ and, for $1 \le i \le n-k+1$, let $Y_i = 1[X_i X_{i+1} \dots X_{i+k-1} = B]$. Then the desired quantity is $\mathbf{E}[Y] = \sum_{i=1}^{n-k+1} \mathbf{E}[Y_i] = (n-k+1) \cdot 2^{-k}$, where $Y = \sum_{i=1}^{n-k+1} Y_i$.

Write $Y = f(X_1, \dots, X_n)$; clearly $f$ is k-Lipschitz (changing one bit affects at most $k$ windows). Then by McDiarmid, $\mathbf{Pr}\big[|Y - \mathbf{E}[Y]| \ge t\big] \le 2\exp\left(-\frac{2t^2}{nk^2}\right)$.
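
A quick empirical check (mine, not from the lecture) of the expectation $(n-k+1) \cdot 2^{-k}$; the sample mean over many random strings should be close to it, and the concentration around it is what McDiarmid quantifies:

```python
import random

def count_occurrences(x, b):
    """Number of (possibly overlapping) occurrences of pattern b in bit string x."""
    k = len(b)
    return sum(x[i:i + k] == b for i in range(len(x) - k + 1))

n, k, trials = 200, 5, 5000
b = [random.randrange(2) for _ in range(k)]
counts = [count_occurrences([random.randrange(2) for _ in range(n)], b)
          for _ in range(trials)]
print(sum(counts) / trials, (n - k + 1) / 2 ** k)  # empirical mean vs. (n-k+1) * 2^(-k)
```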

Example 3 (Chromatic Number of $G(n, p)$)

  • The most natural way: introduce a variable $X_e = 1[\text{the edge } e \text{ exists}]$ for every pair of vertices. Then $\chi(G)$, as a function of these $\binom{n}{2}$ edge indicators, is 1-Lipschitz, and McDiarmid gives $\mathbf{Pr}\big[|\chi(G) - \mathbf{E}[\chi(G)]| \ge t\big] \le 2\exp\big(-\frac{2t^2}{\binom{n}{2}}\big)$.

    This bound is not very useful: to make the right-hand side a small constant we need $t = \Omega(n)$.

  • Instead, note that changing the edges incident to a single vertex changes the chromatic number by at most 1 (that vertex can always be given a fresh color). So list the vertices as $\{v_1, \dots, v_n\}$ and, for $2 \le i \le n$, let $Y_i$ encode the edges between $v_i$ and $\{v_1, \dots, v_{i-1}\}$. Viewing the chromatic number as $f(Y_2, \dots, Y_n)$, $f$ is again 1-Lipschitz, and McDiarmid now gives $\mathbf{Pr}\big[|\chi(G) - \mathbf{E}[\chi(G)]| \ge t\big] \le 2\exp\big(-\frac{2t^2}{n-1}\big)$.

Stopping Time

Question: if the index is a random variable $\tau$, does $\mathbf{E}[Z_{\tau}] = \mathbf{E}[Z_0]$ still hold? First we need to define what it means for "the index to be a random variable".

Definition 3 (Stopping Time) Let $\tau \in \mathbb{N} \cup \{\infty\}$ be a random variable. We call it a stopping time if for every $t \ge 0$, the event "$\tau \le t$" is $\mathscr{F}_t$-measurable.

In plain words: whether we have stopped by time $t$ can be decided from the information available up to time $t$. For example, if a gambler quits the first time he wins five games in a row, that time $\tau$ is a stopping time (whether he has just won five in a row is determined by the last five rounds); but the last time he wins five games in a row is not a stopping time, because one cannot tell from the past rounds alone whether it is the "last" time.

Optional Stopping Theorem (OST)

This theorem gives three sufficient conditions for $\mathbf{E}[Z_\tau] = \mathbf{E}[Z_0]$ to hold.

Theorem 4 (Optional Stopping Theorem) Let $\{X_t\}_{t \ge 0}$ be a martingale and $\tau$ a stopping time w.r.t. $\{\mathscr{F}_t\}_{t \ge 0}$. Then $\mathbf{E}[X_\tau] = \mathbf{E}[X_0]$ if one of the following three conditions holds:

  • $\tau$ is a.s. bounded, i.e., $\exists n \in \mathbb{N}$ s.t. $\mathbf{Pr}[\tau \le n] = 1$;
  • $\mathbf{Pr}[\tau < \infty] = 1$ and there is a finite $M$ such that $|X_t| \le M$ for all $t < \tau$;
  • $\mathbf{E}[\tau] < \infty$ and there is a constant $c$ such that $\mathbf{E}[|X_{t+1} - X_t|\ \vert\ \mathscr{F}_t] \le c$ for all $t < \tau$.

Example 4 (Boy or Girl) In each of the following three scenarios, is the expected sex ratio 1:1?

  • keep having children until a boy is born;
  • keep having children until #boys > #girls;
  • keep having children until #boys > #girls or the total number of children reaches 10.

We can view this directly as a 1-dim random walk. Let $X_t$ be the position after $t$ steps (i.e., #boys - #girls).

  • Scenario 1: OST condition 3 applies directly. The expected stopping time is clearly finite, and since each step moves by one, $c = 1$;
  • Scenario 2: none of the three conditions holds. In fact the ratio is certainly not 1:1: the walk stops exactly when $X_\tau = 1$, so $\mathbf{E}[X_\tau] = 1 \ne 0$;
  • Scenario 3: condition 1 applies directly (condition 2 works as well).

Once an OST condition holds, $\mathbf{E}[X_\tau] = \mathbf{E}[X_0] = 0$, i.e., the expected numbers of boys and girls are equal (a 1:1 ratio in expectation).
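
A quick sanity-check simulation (mine, not from the lecture) for scenarios 1 and 3; scenario 2 is omitted because its expected stopping time is infinite. The aggregate boy fraction comes out near $\frac{1}{2}$ in both cases:

```python
import random

def boy_fraction(stop, families=100000):
    """Simulate many families following a stopping rule `stop(boys, girls)`;
    return the overall fraction of boys among all children."""
    boys = girls = 0
    for _ in range(families):
        b = g = 0
        while not stop(b, g):
            if random.random() < 0.5:
                b += 1
            else:
                g += 1
        boys += b
        girls += g
    return boys / (boys + girls)

print(boy_fraction(lambda b, g: b >= 1))                # scenario 1: stop at the first boy
print(boy_fraction(lambda b, g: b > g or b + g >= 10))  # scenario 3: boys > girls or 10 children
```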

Lec 4

Proof of OST

First we have, for every fixed $n$, $\mathbf{E}[X_{\tau \wedge n}] = \mathbf{E}[X_0]$, since the stopped process $\{X_{\tau \wedge t}\}_{t \ge 0}$ is itself a martingale.

So naturally we can decompose $X_\tau$ into two terms: $X_\tau = X_{\tau \wedge n} + 1[\tau > n]\,(X_\tau - X_n)$.

So we only need to verify that $\mathbf{E}\big[1[\tau > n]\,(X_\tau - X_n)\big] \rightarrow 0$ as $n \rightarrow \infty$ under each of the three conditions.

Under the first condition: obviously $1[\tau > n] = 0$ once $n$ is large enough.

Under the second condition: $\big|\mathbf{E}[1[\tau > n]\,(X_\tau - X_n)]\big| \le 2M \cdot \mathbf{Pr}[\tau > n] \rightarrow 0$,

because $\mathbf{Pr}[\tau < \infty] = 1$.

Under the third condition: $\big|\mathbf{E}[1[\tau > n]\,(X_\tau - X_n)]\big| \le \mathbf{E}\Big[1[\tau > n] \sum_{t=n}^{\tau - 1} |X_{t+1} - X_t|\Big] \le c \sum_{t \ge n} \mathbf{Pr}[\tau > t] \rightarrow 0$.

Because $\mathbf{E}[\tau] = \sum_{t=0}^\infty \mathbf{Pr}[\tau > t] < \infty$. Therefore $\sum_{t \ge n} \mathbf{Pr}[\tau > t] \rightarrow 0$ when $n \rightarrow \infty$.

Doob's Martingale Inequality

Using OST we can derive a concentration property for the maximum of a sequence.

Claim 2 Let $\{X_t\}_{t \ge 0}$ be a martingale w.r.t. itself where $X_t \ge 0$ for every $t$. Then

$$\mathbf{Pr}\Big[\max_{0 \le t \le n} X_t \ge \alpha\Big] \le \frac{\mathbf{E}[X_0]}{\alpha}.$$

Proof: Define $\tau = \min\{t : X_t \ge \alpha\} \wedge n$, i.e., the first time the process reaches $\alpha$, capped at $n$.

It satisfies the first OST condition (it is bounded by $n$). Then $\mathbf{E}[X_0] = \mathbf{E}[X_\tau] \ge \alpha \cdot \mathbf{Pr}\big[\max_{0 \le t \le n} X_t \ge \alpha\big]$, since $X_\tau \ge \alpha$ on that event and $X_\tau \ge 0$ always.

1-dim Random Walk with Two Absorbing Barriers

There are two absorbing barriers at $x = -a$ and $x = b$ (with $a, b > 0$) and the walk starts at $0$. Let $\tau$ be the absorption time; we want $\mathbf{E}[\tau]$.

Observe that wherever the walker is, if it moves in one direction for $a+b$ consecutive steps (which happens with probability $2^{-(a+b)}$), the walk surely ends; hence $\mathbf{E}[\tau] < \infty$. Moreover $\mathbf{E}[|X_{t+1} - X_t|\ \vert\ \mathscr{F}_t] \le 1$, so by OST $\mathbf{E}[X_\tau] = \mathbf{E}[X_0] = 0$. Writing $P_a$ for the probability of being absorbed at $-a$, we also have $\mathbf{E}[X_\tau] = P_a \cdot (-a) + (1 - P_a)\, b$, hence $P_a = \frac{b}{a+b}$.

For $\mathbf{E}[\tau]$ we need to construct a new martingale involving the time $t$: take $Y_t = X_t^2 - t$. It is easy to verify that $\{Y_t\}$ is a martingale and satisfies the third OST condition. Hence $\mathbf{E}[Y_\tau] = \mathbf{E}[Y_0] = 0$, i.e., $\mathbf{E}[\tau] = \mathbf{E}[X_\tau^2] = \frac{b}{a+b}\,a^2 + \frac{a}{a+b}\,b^2 = ab$.
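
A small simulation sketch (mine, not from the lecture) confirming both conclusions, $P_a = \frac{b}{a+b}$ and $\mathbf{E}[\tau] = ab$:

```python
import random

def absorbed_walk(a, b):
    """Symmetric +-1 walk from 0; return (absorbed at -a?, number of steps)."""
    x = steps = 0
    while -a < x < b:
        x += random.choice((-1, 1))
        steps += 1
    return x == -a, steps

a, b, trials = 3, 7, 20000
results = [absorbed_walk(a, b) for _ in range(trials)]
print(sum(hit for hit, _ in results) / trials, b / (a + b))  # empirical P_a vs. b/(a+b)
print(sum(t for _, t in results) / trials, a * b)            # empirical E[tau] vs. a*b
```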

Pattern Matching

Given a pattern $P \in \{0, 1\}^l$, we know the expected number of occurrences of $P$ in a uniformly random string $X \in \{0, 1\}^n$:

$$
\mathbf{E}[\text{number of occurrences of } P \text{ in } X] = (n - l + 1) \cdot 2^{-l}.
$$
