202405201744
Status: #idea
Tags: Probability
State: #awakened

Probability Measure (According to Kolmogorov)

A probability measure is a special type of measure which maps elements of a $σ$-algebra to the interval $[0, 1]$ and assigns the whole sample space $Ω$ the value $P (Ω) = 1$. That is really it.

It has the same properties as you would expect for a standard measure:

$P (\emptyset) = 0$
If $A_{1}, A_{2}, \dots$ are pairwise disjoint then $P (⋃_{i = 1}^{\infty} A_{i}) = \sum_{i = 1}^{\infty} P (A_{i})$
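
To make the axioms concrete, here is a minimal sketch on a toy example that is not part of the original note: a fair six-sided die with the uniform measure. The names `OMEGA` and `prob` are made up for illustration.

```python
from fractions import Fraction

# Hypothetical toy sample space: the faces of a fair six-sided die.
OMEGA = frozenset(range(1, 7))

def prob(event):
    """Uniform measure: P(event) = |event| / |OMEGA| for a subset of OMEGA."""
    event = frozenset(event)
    if not event <= OMEGA:
        raise ValueError("event must be a subset of the sample space")
    return Fraction(len(event), len(OMEGA))

evens = {2, 4, 6}
low_odds = {1, 3}  # disjoint from evens

print(prob(set()))   # 0
print(prob(OMEGA))   # 1
# Additivity on two disjoint events (finite case of the second axiom).
print(prob(evens | low_odds) == prob(evens) + prob(low_odds))  # True
```

On a finite space every subset is an event, so the $σ$-algebra here is just the power set of `OMEGA`.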

This is the standard definition of probability. Note that there is a different but equivalent definition of probability which can be built not from Measure Theory axioms, but from the concept of Expected Value directly. This is Probability Measure (Based on Expected Value).

Since the two definitions are equivalent and have the same properties, one can go from one to the other with impunity, but it is worth knowing that both exist.

Properties of Probability Measures

Formula for Complement

But from that we can derive everything we come to know about probability.
For example, since we know that $P (Ω)$ (where $Ω$ is the sample space) is $1$, and we know that for any set $A$, $A \cup A^{c} = Ω$ and $A \cap A^{c} = \emptyset$, we can use the second axiom of measures to derive this well-known fact:

$$
\begin{aligned}
Ω & = A \cup A^{c} \\
P (Ω) & = P (A \cup A^{c}) \quad \text{applying } P \text{ to both sides} \\
P (Ω) & = P (A) + P (A^{c}) \quad \text{by second axiom} \\
1 & = P (A) + P (A^{c}) \quad \text{by definition of a probability measure} \\
P (A^{c}) & = 1 - P (A)
\end{aligned}
$$

You can also easily prove that if $A \subseteq B$, then $P (A) \leq P (B)$; we leave that as an exercise to the reader. Lel.
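
For checking your work on that exercise, here is one possible route (a sketch, not necessarily the intended one). It splits $B$ into two disjoint pieces and uses the second axiom plus the fact that $P$ is never negative:

$$
\begin{aligned}
B & = A \cup (B ∖ A) \quad \text{a disjoint union, since } A \subseteq B \\
P (B) & = P (A) + P (B ∖ A) \quad \text{by second axiom} \\
P (B) & \geq P (A) \quad \text{since } P (B ∖ A) \geq 0
\end{aligned}
$$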

Using Measure Theory as the foundation of Probability Theory makes all the other derivations similarly beautiful.

Formula for Unions: The Inclusion-Exclusion Principle

Let's start by proving that $P (A \cup B) = P (A) + P (B) - P (A \cap B)$.
This is my derivation of it.

$$
\begin{aligned}
A \cup B & = (A ∖ B) \cup B \quad \text{rewriting the LHS as a disjoint union} \\
P (A \cup B) & = P ((A ∖ B) \cup B) \\
& = P (A ∖ B) + P (B) \quad \text{by second axiom} \\
& = P (A \cap B^{c}) + P (B) \\
& = P (A) - P (A \cap B) + P (B) \quad \text{since } A = (A \cap B) \cup (A \cap B^{c}) \text{ is a disjoint union} \\
& = P (A) + P (B) - P (A \cap B) \quad \text{simple rearrangement}
\end{aligned}
$$

Boom!
This gives us a rigorous proof of why this equality holds. Now what is the general form of the formula? What if, instead of $A$ and $B$, we have $A_{1}, A_{2}, A_{3}, \dots, A_{n}$, which are all in our $σ$-algebra?

What is the formula for $P (⋃_{i = 1}^{n} A_{i})$ ?
Pasted image 20240520183722.png
screencap from Wikipedia cause I ain't typing allat.
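
In case the image does not load, this is the standard general statement of inclusion-exclusion for $n$ events, typed out. It is the usual textbook form and may not match the screenshot's notation exactly:

$$
P \left( ⋃_{i = 1}^{n} A_{i} \right) = \sum_{k = 1}^{n} (-1)^{k + 1} \sum_{1 \leq i_{1} < \dots < i_{k} \leq n} P (A_{i_{1}} \cap \dots \cap A_{i_{k}})
$$

For $n = 2$ the $k = 1$ terms give $P (A_{1}) + P (A_{2})$ and the single $k = 2$ term gives $- P (A_{1} \cap A_{2})$, which recovers the two-set formula.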

How to Prove it?

Induction
Indicator Variables and Expectation
I did the latter for an assignment, for the former... well good luck with that.
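
Neither proof is reproduced here, but the general formula can at least be sanity-checked numerically. A minimal sketch, assuming a made-up uniform space $\{1, \dots, 12\}$ and three arbitrary events (none of these names come from the lecture):

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical uniform sample space {1, ..., 12}.
OMEGA = frozenset(range(1, 13))

def prob(event):
    """Uniform measure on OMEGA."""
    return Fraction(len(frozenset(event) & OMEGA), len(OMEGA))

def union_by_inclusion_exclusion(events):
    """Alternating sum over all non-empty sub-collections of events."""
    total = Fraction(0)
    for k in range(1, len(events) + 1):
        sign = 1 if k % 2 == 1 else -1  # + for odd k, - for even k
        for group in combinations(events, k):
            total += sign * prob(frozenset.intersection(*group))
    return total

events = [
    frozenset(x for x in OMEGA if x % 2 == 0),  # even numbers
    frozenset(x for x in OMEGA if x % 3 == 0),  # multiples of 3
    frozenset(x for x in OMEGA if x <= 4),      # small numbers
]

direct = prob(frozenset.union(*events))
via_formula = union_by_inclusion_exclusion(events)
print(direct == via_formula)  # True
```

This is a check on one example, not a proof; the loop visits $2^{n} - 1$ sub-collections, so it only makes sense for small $n$.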

Continuity of Probability Measures

If $A_{1}, A_{2}, \dots$ all belong to the $σ$-algebra $Ξ$, then

$$P \left( ⋃_{i = 1}^{\infty} A_{i} \right) = \lim_{m \to \infty} P \left( ⋃_{i = 1}^{m} A_{i} \right)$$

This looks rather obvious (I mean, it looks really similar to how we define infinite summations), but there's actually more to this statement than meets the eye. This is a really important theorem that is used all the time in Probability Theory. Note in particular that we are NOT simply moving the limit inside the brackets; an actual rigorous proof is required to show the equality.

Also, $⋃$ shouldn't be seen as a sequential operator which operates on $A_{1}$ and $A_{2}$, and then on $A_{1} \cup A_{2}$ and $A_{3}$, etc. Instead it takes everything at once. While the following statement is true

$$⋃_{i = 1}^{\infty} A_{i} = \lim_{m \to \infty} \left( ⋃_{i = 1}^{m} A_{i} \right)$$

(this limit refers to increasing inclusions of sets) it is not wise to move the limits in or out.

Indeed, one should be careful when introducing limits in a Measure Theory context, especially when it comes to bringing a limit in and out of a measure. Unless there's a specific argument to support it, or a convergence theorem that says we can, one should not assume it is correct to do so.

A proof of this theorem can be found on YouTube; the second link in the references covers it during the lecture.
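
To see the theorem at work on a concrete case, here is a small sketch with an example of my own choosing: toss a fair coin repeatedly and let $A_{i}$ be the event "the first head shows up on toss $i$", so $P (A_{i}) = 2^{-i}$. The function names are hypothetical.

```python
from fractions import Fraction

def prob_first_head_on_toss(i):
    """P(A_i) for a fair coin: i - 1 tails followed by one head."""
    return Fraction(1, 2 ** i)

def prob_finite_union(m):
    """P(A_1 u ... u A_m); the A_i are disjoint, so the probabilities add."""
    return sum((prob_first_head_on_toss(i) for i in range(1, m + 1)), Fraction(0))

# The finite-union probabilities equal 1 - 2**(-m) and creep up towards 1.
for m in (1, 2, 5, 10, 20):
    print(m, float(prob_finite_union(m)))
```

The infinite union is the event "a head eventually appears". The theorem is what lets us conclude that its probability equals the limit of these finite-union probabilities, which is $1$.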

Corollary

1.

If $A_{1} \subseteq A_{2} \subseteq A_{3} \subseteq \dots$,
then by the previous result:

$$P \left( ⋃_{i = 1}^{\infty} A_{i} \right) = \lim_{m \to \infty} P (A_{m})$$

2.

If $A_{1} \supseteq A_{2} \supseteq A_{3} \supseteq \dots$,
then by the previous result and De Morgan's laws:

$$P \left( ⋂_{i = 1}^{\infty} A_{i} \right) = \lim_{m \to \infty} P (A_{m})$$

This is not a typo; in fact it makes perfect sense. If I keep taking intersections of a sequence of sets which is non-increasing (in the sense that $A_{i} \supseteq A_{j}$ for all $i, j$ where $i < j$), then the intersection of the first $m$ sets is just $A_{m}$, the smallest set so far. So $P (⋂_{i = 1}^{m} A_{i}) = P (A_{m})$ for every $m$, and letting $m \to \infty$ gives the formula.

Observation: if the sequence of sets is non-decreasing or non-increasing, the probability of its limit (the union or the intersection, respectively) is simply $\lim_{m \to \infty} P (A_{m})$.
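
Both cases can be illustrated with a sketch, assuming the uniform (Lebesgue) measure on $[0, 1]$ and two interval sequences picked just for this example: the increasing $A_{m} = [0, 1 - 1/m]$, whose union is $[0, 1)$, and the decreasing $A_{m} = [0, 1/m]$, whose intersection is $\{0\}$.

```python
from fractions import Fraction

def prob_interval(a, b):
    """Uniform probability of the interval [a, b], clipped to [0, 1]."""
    lo = max(a, Fraction(0))
    hi = min(b, Fraction(1))
    return max(Fraction(0), hi - lo)

def increasing_prob(m):
    # A_m = [0, 1 - 1/m]: each set contains the previous one.
    return prob_interval(Fraction(0), 1 - Fraction(1, m))

def decreasing_prob(m):
    # A_m = [0, 1/m]: each set is contained in the previous one.
    return prob_interval(Fraction(0), Fraction(1, m))

for m in (1, 10, 100, 1000):
    print(m, float(increasing_prob(m)), float(decreasing_prob(m)))
```

The first column of probabilities tends to $1 = P ([0, 1))$ and the second to $0 = P (\{0\})$, matching the two parts of the corollary.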

Union-Bound Property

Let $A_{1}, A_{2}, \dots$ all be events, then:

$$P \left( ⋃_{i = 1}^{\infty} A_{i} \right) \leq \sum_{i = 1}^{\infty} P (A_{i})$$

Intuitively, if all the $A_{i}$ are disjoint, then by the second axiom we have equality, but if even one pair $A_{i}$, $A_{j}$ overlaps, the sum on the right counts their intersection twice. The overcounting only grows as more overlapping pairs exist.

This can be proven pretty neatly using Indicator Variables, but it can also be shown directly using the $A_{i}$ to $B_{i}$ transformation that is used to show the continuity of probability measures. The second link in the references is really useful for understanding all of that.
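
Here is a small sketch of that direct route on a finite example. I am assuming the $A_{i}$ to $B_{i}$ transformation is the usual disjointification $B_{i} = A_{i} ∖ (A_{1} \cup \dots \cup A_{i - 1})$; the lecture may set it up differently, and the sample space and events below are made up.

```python
from fractions import Fraction

# Hypothetical uniform sample space {1, ..., 10}.
OMEGA = frozenset(range(1, 11))

def prob(event):
    """Uniform measure on OMEGA."""
    return Fraction(len(frozenset(event) & OMEGA), len(OMEGA))

def disjointify(events):
    """B_i = A_i minus everything already covered by A_1, ..., A_{i-1}."""
    covered = frozenset()
    pieces = []
    for a in events:
        a = frozenset(a)
        pieces.append(a - covered)
        covered = covered | a
    return pieces

events = [{1, 2, 3, 4}, {3, 4, 5, 6}, {6, 7}]  # overlapping on purpose
pieces = disjointify(events)

p_union = prob(frozenset().union(*events))
sum_pieces = sum((prob(b) for b in pieces), Fraction(0))
sum_events = sum((prob(a) for a in events), Fraction(0))

print(p_union == sum_pieces)   # True: the B_i are disjoint with the same union
print(p_union <= sum_events)   # True: the union bound
```

The $B_{i}$ are disjoint and have the same union as the $A_{i}$, so the second axiom gives $P (⋃ A_{i}) = \sum P (B_{i})$, and $B_{i} \subseteq A_{i}$ gives $P (B_{i}) \leq P (A_{i})$ term by term.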

Relevant Links

Probability Spaces
Probability Measure (Based on Expected Value)
Probability Measures Lecture