E[X∣Y] is a function of Y: for any particular value y of Y, E[X∣Y=y] is what we expect X to be, so E[X∣Y]=g(Y) is itself a random variable.
Taking the expectation of this function over Y gives EY[g(Y)]=EY[E[X∣Y]]=E[X].
This is also known as the law of total expectation because EY[g(Y)]=∑yg(y)Pr(Y=y)=∑yE[X∣Y=y]Pr(Y=y).
Taking it a couple steps further,
∑yE[X∣Y=y]Pr(Y=y)
=∑y(∑xxPr(X=x∣Y=y))Pr(Y=y)
=∑x,yxPr(X=x∣Y=y)Pr(Y=y)
=∑x,yxPr(X=x,Y=y)
=∑xx∑yPr(X=x,Y=y)
=∑xxPr(X=x)
=E[X]
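The derivation above can be checked numerically: compute E[X] directly from the joint distribution, and again by conditioning on Y. The joint probabilities below are made-up values for a small hypothetical example.

```python
# Numerically check the law of total expectation E_Y[E[X|Y]] = E[X]
# on a small hypothetical joint distribution Pr(X=x, Y=y).
joint = {  # (x, y) -> probability
    (0, 0): 0.1, (0, 1): 0.2,
    (1, 0): 0.3, (1, 1): 0.4,
}

# Direct computation: E[X] = sum_x x * Pr(X=x)
e_x = sum(x * p for (x, _), p in joint.items())

# Via conditioning: E[X] = sum_y E[X|Y=y] * Pr(Y=y)
ys = {y for (_, y) in joint}
e_x_cond = 0.0
for y0 in ys:
    pr_y = sum(p for (_, y), p in joint.items() if y == y0)
    e_x_given_y = sum(x * p for (x, y), p in joint.items() if y == y0) / pr_y
    e_x_cond += e_x_given_y * pr_y

print(e_x, e_x_cond)  # both equal 0.7
```

Both routes give the same answer, which is exactly the equality the chain of sums above establishes.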
Markov Chains
Premise: represent a sequence of random variables X1,X2,⋯,Xn as a directed graph, where each node is a value the random variables can take on, and each node holds a probability distribution describing the transition to the next node
X: state space. set of possible values of Xi
P: state transition matrix. entry Pi,j at row i and column j is Pi,j=Pr(Xn+1=j∣Xn=i)
π0: initial distribution. π0(i)=Pr(X0=i). in general πn is the distribution of Xn
The sum of each row of P is 1. Because of the convention of defining Pi,j as above, the distributions πn are row vectors, and the update πn+1=πnP is a transposed version of the usual matrix-vector multiplication.
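A minimal sketch of one update step, using a made-up 2-state chain (the entries of P and π0 here are arbitrary, not from the text):

```python
# One step of a Markov chain: the next distribution is the row vector
# pi_n multiplied on the right by the transition matrix P (pi_{n+1} = pi_n P).
P = [[0.9, 0.1],   # row i holds Pr(X_{n+1}=j | X_n=i); each row sums to 1
     [0.5, 0.5]]
pi0 = [1.0, 0.0]   # start deterministically in state 0

def step(pi, P):
    """Row vector times matrix: pi_next[j] = sum_i pi[i] * P[i][j]."""
    return [sum(pi[i] * P[i][j] for i in range(len(pi)))
            for j in range(len(P[0]))]

pi1 = step(pi0, P)
print(pi1)  # [0.9, 0.1]
```

Note the sum runs over the row index i, which is the "transposed" multiplication mentioned above: the distribution multiplies P from the left.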
A stationary distribution π is a distribution on X such that π=πP. Equivalently, it is a left eigenvector of P with eigenvalue 1, normalized so its entries sum to 1.
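One way to find a stationary distribution is power iteration: repeatedly apply π←πP until π stops changing. This is a sketch on the same hypothetical 2-state chain as above (P is made up for illustration; power iteration converges here because the chain has a unique stationary distribution).

```python
# Power iteration toward a stationary distribution pi satisfying pi = pi P.
# Hypothetical 2-state transition matrix (each row sums to 1).
P = [[0.9, 0.1],
     [0.5, 0.5]]

pi = [0.5, 0.5]  # any starting distribution works for this chain
for _ in range(1000):
    pi = [sum(pi[i] * P[i][j] for i in range(2)) for j in range(2)]

print(pi)  # approximately [5/6, 1/6]
```

Solving π=πP by hand for this P confirms the fixed point: 0.1·π(0)=0.5·π(1) with π(0)+π(1)=1 gives π=(5/6, 1/6).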