Disclaimer: After reading this article, I recommend reading the Probabilistic Ball post. These short expositions of some underlying concepts of high-dimensional data science from a geometric and probabilistic perspective will (hopefully) give a taste of the curse of dimensionality and concentration of measure (on which I will write more thoroughly later on). This article is, however, self-contained.
Motivation: When working with data, it is very common for that data to live in high dimensions: a single observation may contain a wide variety of values (numerical or not). High-dimensional data has inherent benefits, such as more information (or precision) per observation, but it also comes with drawbacks in terms of computation time and other less obvious issues. Through this article, I hope we will review some basic inequalities which are foundational to the study of high-dimensional data, as they allow us to understand how data behaves asymptotically.
Let's now tackle some inequalities which will help us put tight bounds on probabilities. Let's start with Markov's inequality. Take a non-negative random variable \(X\); then for any \(t > 0\) we have
$$
P(X \geq t) \leq \frac{E[X]}{t}
$$
Note that, writing \(p\) for the density of \(X\) and letting \(A\) denote the event \(\{X < t\}\):
$$
E[X] = \int_{X(\Omega)} x \, p(x) \, dx = \int_{X(A)} x \, p(x) \, dx + \int_{X(A^{c})} x \, p(x) \, dx
$$
Which allows us to say:
$$E[X] = \int_{0}^{t} x \, p(x) \, dx + \int_{t}^{\infty} x \, p(x) \, dx$$ $$\geq \int_{t}^{\infty} x \, p(x) \, dx $$
$$ \geq \int_{t}^{\infty} t \, p(x) \, dx $$
$$ = t \, P(X \geq t)$$
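As a quick numerical sanity check (not part of the derivation above), here is a minimal sketch, assuming NumPy is available, comparing Markov's bound \(E[X]/t\) with the empirical tail probability of an exponential random variable with mean 1; the distribution and thresholds are illustrative choices of mine.

```python
import numpy as np

rng = np.random.default_rng(0)
samples = rng.exponential(scale=1.0, size=1_000_000)  # Exp(1): E[X] = 1

for t in (1, 2, 5, 10):
    empirical = np.mean(samples >= t)   # estimate of P(X >= t); true value is e^{-t}
    markov = samples.mean() / t         # Markov's bound E[X] / t
    print(f"t = {t:>2}: empirical tail = {empirical:.4f}, Markov bound = {markov:.4f}")
```

The bound always holds, but it is loose: the true tail decays exponentially while the bound only decays like \(1/t\).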
This inequality extends naturally to Chebyshev's inequality: for a random variable \(X\) with finite variance and any \(t > 0\),
$$
P(|X - E[X]| \geq t) \leq \frac{Var(X)}{t^2}
$$
by applying Markov's inequality to the non-negative random variable \((X - E[X])^2\) with threshold \(t^2\), and noticing that \(E[(X - E[X])^2] = Var(X)\). We can write the inequality in a (perhaps) more provocative way (at least to me):
$$
P(X \geq E[X] + t \text{ OR } X \leq E[X] -t ) \leq \frac{Var(X)}{t^2}
$$
Applied to a sample mean (as we will see shortly), the above tells us that the sample mean of a random variable with finite variance converges to its true mean. This is a very useful property which gives us more confidence in that most elementary of statistics, the mean. The Cauchy distribution, with its heavy tails, is an example of a distribution to which Chebyshev's inequality does not apply; a useful exercise is to attempt computing its first two moments and see why we get stuck!
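To make the two-sided bound concrete, here is a small sketch (again assuming NumPy; the standard normal is an arbitrary choice of mine) comparing Chebyshev's bound \(Var(X)/t^2\) with the observed deviation probability:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=0.0, scale=1.0, size=1_000_000)  # E[X] = 0, Var(X) = 1

for t in (1, 2, 3):
    empirical = np.mean(np.abs(x - x.mean()) >= t)  # estimate of P(|X - E[X]| >= t)
    chebyshev = x.var() / t**2                      # Chebyshev's bound Var(X) / t^2
    print(f"t = {t}: empirical = {empirical:.4f}, Chebyshev bound = {chebyshev:.4f}")
```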
It won't hurt us to review one way to derive the (weak) Law of Large Numbers, which can be deduced directly from Chebyshev's inequality. Take \(X_1, \dots, X_n\), independent and identically distributed copies of a random variable \(X\) with finite variance; then for any \(\epsilon > 0\) we have:
$$P\left(\left|\frac{1}{n}\sum_{i = 1}^{n}X_i - E[X]\right| \geq \epsilon\right) \leq \frac{Var\left(\frac{1}{n}\sum_{i = 1}^{n}X_i\right)}{\epsilon^2} \space \text{(by Chebyshev)}$$
$$= \frac{Var\left(\sum_{i = 1}^{n}X_i\right)}{n^2 \epsilon^2}$$
$$= \frac{\sum_{i = 1}^{n}Var(X_i)}{n^2 \epsilon^2} \space \text{(by independence)}$$
$$= \frac{Var(X)}{n \epsilon^2}$$
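As a sketch of the rate this gives us (assuming NumPy; the Uniform(0, 1) distribution, \(\epsilon = 0.1\) and the sample sizes are illustrative choices), we can compare the empirical deviation probability of the sample mean against the bound \(Var(X)/(n\epsilon^2)\):

```python
import numpy as np

rng = np.random.default_rng(2)
eps, trials = 0.1, 10_000

for n in (10, 100, 1000):
    # `trials` independent sample means, each built from n Uniform(0, 1) draws
    means = rng.uniform(size=(trials, n)).mean(axis=1)  # E[X] = 1/2, Var(X) = 1/12
    empirical = np.mean(np.abs(means - 0.5) >= eps)     # P(|sample mean - E[X]| >= eps)
    bound = (1 / 12) / (n * eps**2)                     # Chebyshev bound Var(X) / (n eps^2)
    print(f"n = {n:>4}: empirical = {empirical:.4f}, bound = {bound:.4f}")
```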
We now have this fundamental theorem, but let's move on to finding tighter bounds on our probabilities!
Chebyshev's inequality is very useful, especially when working on proofs which require bounds; from a computational standpoint, however, the bound is quite weak, since it does not tighten exponentially with the sample size. A first option for making our bounds more precise is to use moment generating functions. For a random variable \(X\), the moment generating function is \(M_X(t) = E[e^{tX}]\) for real \(t\). The rationale behind this device is the series expansion \(e^{tX} = 1 + tX + \frac{t^2 X^2}{2!} + \frac{t^3 X^3}{3!} + \dots\), so that \(M_X(t)\) encodes all the moments of \(X\) at once. This is indeed a quite appealing property, though we have to remind ourselves that some moments (and the moment generating function itself) might not be defined. Back to our bounds!
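As a small illustration of why the moment generating function is convenient (a sketch using SymPy, with an Exponential(2) variable chosen purely for illustration), the \(k\)-th moment is recovered as the \(k\)-th derivative of \(M_X\) at \(t = 0\):

```python
import sympy as sp

t = sp.symbols("t")
lam = sp.Integer(2)          # rate of an Exponential(2) variable, an arbitrary choice
M = lam / (lam - t)          # its MGF, valid for t < lam

# The k-th moment is the k-th derivative of M evaluated at t = 0.
for k in (1, 2, 3):
    moment = sp.diff(M, t, k).subs(t, 0)
    print(f"E[X^{k}] = {moment}")   # prints 1/2, 1/2, 3/4, i.e. k!/lam^k
```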
Recall that while showing Markov's bound, we established that \(E[X] \geq t P(X \geq t)\). Very naturally, for any \(t > 0\) we find that \(P(X > \epsilon) = P(e^{tX} > e^{t \epsilon}) \leq e^{-t \epsilon}E[e^{tX}]\). Note that \(t\) here is a variable we have full control over; we therefore take the infimum of the right-hand side over \(t > 0\). What we just derived is called Chernoff's bounding method.
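Here is a sketch of the method for a standard normal variable, whose moment generating function is \(E[e^{tX}] = e^{t^2/2}\); taking the infimum over a grid of \(t\) values (the grid and NumPy usage are my own choices) recovers the bound \(e^{-\epsilon^2/2}\) and shows how much tighter it is than Chebyshev's \(1/\epsilon^2\):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=1_000_000)    # standard normal: M_X(t) = exp(t^2 / 2), Var(X) = 1
ts = np.linspace(0.01, 10, 2000)  # grid over which we approximate the infimum

for eps in (1, 2, 3):
    chernoff = np.min(np.exp(-ts * eps + ts**2 / 2))  # inf_t e^{-t eps} M_X(t) = e^{-eps^2 / 2}
    chebyshev = 1.0 / eps**2                           # Chebyshev bound with Var(X) = 1
    empirical = np.mean(x > eps)                       # estimate of P(X > eps)
    print(f"eps = {eps}: empirical = {empirical:.5f}, "
          f"Chernoff = {chernoff:.5f}, Chebyshev = {chebyshev:.5f}")
```

Already for \(\epsilon = 3\), the Chernoff bound is roughly ten times tighter than Chebyshev's.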
It will be interesting from here to move on to Hoeffding's inequality, but that will be in a later exploration!