Lecture Notes: Conditioning and Bayes' Rule


Key Learning 

  • The conditional probability of A given that B has occurred is P(A|B) = \frac{P(A \cap B)}{P(B)}, defined whenever P(B) > 0

Getting into the thick of it

Information is always partial, so how do we actually deal with this? When we learn a little about the outcome of an experiment, but not everything, what should we do with it?

Subtlety in the Additivity Axiom

Let’s assume we’re dealing with the unit square.
The probability law is as follows: if A is a patch of that square, the probability of that patch is its area, P(A) = area(A). The square is the union of all of its points, so if we apply our additivity law to that union, the probabilities of the points should add up to the probability of the square, and because the square is the universal set that probability should be 1.

P(square) = P(\cup_{x,y} \{(x,y)\})

P(square) = \sum_{x,y} P(\{(x,y)\})

but the area of a single point is zero

P(square) = \sum 0 = 0

but P(square) is the probability of all the points in that square, which is 1. So did we just prove that 1 = 0?

Can we really apply our additivity axiom here? Here's the catch. The additivity axiom applies when we have a sequence of disjoint events and we take their union. Is this a sequence of sets? Can you cover the whole unit square with a sequence of single points taken from inside it? If you try, you will find that a sequence of one-element sets can never exhaust the whole unit square.

There's a deeper reason behind that: infinite sets are not all of the same size. The integers are an infinite set, and you can arrange the integers in a sequence. But a continuous set like the unit square is a bigger set. It is what is called uncountable: it has more elements than any sequence could have. So the union over all points of the square is not a union of a sequence of events. It's a different kind of union, one that involves many, many more sets. The countable additivity axiom does not apply in this case, because we're not dealing with a sequence of sets. Replacing the probability of that union with the sum of the point probabilities is the incorrect step.

At some level you might think that this is puzzling and awfully confusing. On the other hand, if you think about areas the way you're used to them from calculus, there's nothing mysterious about it. Every point in the unit square has zero area. When you put all the points together, they make up something that has finite area. So there shouldn't be any mystery behind it.

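  • Zero probability doesn’t mean impossible. In the unit square every single point has probability zero, yet some point always turns out to be the outcome. Zero probability only means the event is extremely unlikely on its own.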
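The area picture can be checked with a few lines of exact arithmetic. The following is a minimal sketch, assuming the probability law P(A) = area(A) on the unit square; the helper names `patch_probability` and `grid_total` are made up for illustration and do not come from the lecture. It shows that any finite grid of disjoint cells adds up to exactly 1, while a patch shrinking towards a single point has probability shrinking towards 0.

```python
from fractions import Fraction


def patch_probability(width, height):
    """P(A) = area(A) for an axis-aligned rectangle inside the unit square."""
    return Fraction(width) * Fraction(height)


def grid_total(n):
    """Add up the probabilities of the n*n disjoint cells that tile the unit square."""
    cell = patch_probability(Fraction(1, n), Fraction(1, n))
    return sum(cell for _ in range(n * n))


# Additivity over finitely many disjoint cells: every grid sums to exactly 1.
totals = [grid_total(n) for n in (1, 2, 10, 100)]

# A square patch of side 10**-k closing in on one point: probability 10**(-2k).
shrinking = [
    patch_probability(Fraction(1, 10**k), Fraction(1, 10**k)) for k in range(1, 5)
]

print(totals)
print(shrinking)
```

No matter how fine the grid gets, it stays a finite collection of cells with positive area, so additivity applies. The single points are what is left in the limit, and there are uncountably many of them, which is exactly where the sum-of-zeros argument breaks down.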

Conditional Probability

The starting point is the following: you know something about the world. Based on what you know, you set up a probability model and you write down probabilities for the different outcomes. Then something happens, and somebody tells you a little more about the world, gives you some new information. This new information, in general, should change your beliefs about what happened or what may happen. So whenever we're given new information, some partial information about the outcome of the experiment, we should revise our beliefs. Conditional probabilities are just the probabilities that apply after the revision of our beliefs, when we're given some information.

P(A|B) = \frac{P(A \cap B)}{P(B)}

Let’s rewrite this by multiplying both sides by P(B):

P(A \cap B) = P(B) P(A|B)
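To see both formulas at work, here is a small sketch on a made-up example that is not part of the lecture: two fair six-sided dice, with A = "the sum is 8" and B = "the first die is even". The function names `prob` and `cond_prob` are illustrative assumptions. The code computes P(A), then P(A|B) from the definition, and checks the rewritten form P(A \cap B) = P(B) P(A|B).

```python
from fractions import Fraction
from itertools import product

# Sample space: ordered results of two fair six-sided dice, all equally likely.
outcomes = list(product(range(1, 7), repeat=2))


def prob(event):
    """Probability of an event, given as a predicate on outcomes."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))


def cond_prob(a, b):
    """P(A|B) = P(A and B) / P(B); only defined when P(B) > 0."""
    p_b = prob(b)
    if p_b == 0:
        raise ValueError("P(B) must be positive to condition on B")
    return prob(lambda o: a(o) and b(o)) / p_b


def sum_is_eight(o):
    return o[0] + o[1] == 8


def first_is_even(o):
    return o[0] % 2 == 0


p_a = prob(sum_is_eight)                            # belief before any information
p_b = prob(first_is_even)
p_a_given_b = cond_prob(sum_is_eight, first_is_even)  # belief after learning B
p_a_and_b = prob(lambda o: sum_is_eight(o) and first_is_even(o))

print(p_a, p_a_given_b, p_a_and_b)
```

Learning that the first die is even moves the probability of a sum of 8 from 5/36 to 1/6, and multiplying P(B) = 1/2 by P(A|B) = 1/6 recovers P(A \cap B) = 1/12.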

Update Notes

Last Updated : 13th January 2020, @meghana b 
Last Updated : 10th January 2020, @meghana b