PGM_Foundations


    Chain Rule and Bayesian Rule

    From the definition of the conditional distribution, we see that

    $$P(\alpha_1 \cap ... \cap \alpha_k)=P(\alpha_1)P(\alpha_2 \lvert \alpha_1)\cdots P(\alpha_k \lvert \alpha_1 \cap ...\cap \alpha_{k-1})~~(Chain~Rule)$$

    $$P(\alpha \lvert \beta)=\frac{P(\beta \lvert \alpha)P(\alpha)}{P(\beta)}~~(Bayesian~Rule)$$

    A more general conditional version of Bayes’ rule, where all our probabilities are conditioned on some background event $\gamma$, also holds: $$P(\alpha \lvert \beta \cap \gamma)=\frac{P(\beta \lvert \alpha \cap \gamma)P(\alpha \lvert \gamma)}{P(\beta \lvert \gamma)}$$
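
    As a quick sanity check, here is a minimal Python sketch (my own toy example: the joint table and the event definitions are made up) that verifies the chain rule and Bayes' rule numerically on a small distribution over three binary events.

```python
# A made-up joint distribution over three binary events; values sum to 1.
joint = {
    (0, 0, 0): 0.10, (0, 0, 1): 0.15, (0, 1, 0): 0.05, (0, 1, 1): 0.20,
    (1, 0, 0): 0.10, (1, 0, 1): 0.10, (1, 1, 0): 0.05, (1, 1, 1): 0.25,
}

def prob(pred):
    """P(event), where the event is given as a predicate over outcomes."""
    return sum(p for outcome, p in joint.items() if pred(outcome))

a1 = lambda o: o[0] == 1                       # alpha_1 holds
a2 = lambda o: o[1] == 1                       # alpha_2 holds
a3 = lambda o: o[2] == 1                       # alpha_3 holds
both = lambda f, g: (lambda o: f(o) and g(o))  # intersection of two events

# Chain rule: P(a1 ∩ a2 ∩ a3) = P(a1) P(a2 | a1) P(a3 | a1 ∩ a2)
lhs = prob(both(both(a1, a2), a3))
rhs = prob(a1) \
    * (prob(both(a1, a2)) / prob(a1)) \
    * (prob(both(both(a1, a2), a3)) / prob(both(a1, a2)))
assert abs(lhs - rhs) < 1e-12

# Bayes' rule: P(a1 | a2) = P(a2 | a1) P(a1) / P(a2)
lhs = prob(both(a1, a2)) / prob(a2)                          # P(a1 | a2) by definition
rhs = (prob(both(a1, a2)) / prob(a1)) * prob(a1) / prob(a2)  # Bayes' rule form
assert abs(lhs - rhs) < 1e-12
print("chain rule and Bayes' rule check out on the toy distribution")
```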

     Independence and Conditional Independence

    Independent Events

    Event $\alpha$ is independent of event $\beta$ in $P$, denoted $P \vDash (\alpha \perp \beta)$. Then $$P \vDash (\alpha \perp \beta)~if~P(\alpha \lvert \beta)=P(\alpha)$$ or $$P \vDash (\alpha \perp \beta)~if~and~only~if~P(\alpha \cap \beta)=P(\alpha)P(\beta)$$ The two versions are equivalent, because $P(\alpha \cap \beta)=P(\alpha \lvert \beta)P(\beta)$. For example, toss a coin twice and let $\alpha$ = "the first toss results in a head" and $\beta$ = "the second toss results in a head"; this is a case where two different physical processes lead to independence. Alternatively, roll a die and let $\alpha$ = "the die outcome is even" and $\beta$ = "the die outcome is 1 or 2"; this is a case where the same process leads to independence (at first sight $\alpha$ and $\beta$ seem dependent, but they are not, since $P(\alpha \cap \beta)=P(\{2\})=1/6=P(\alpha)P(\beta)$).
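
    The die example can be checked directly; the sketch below (my own, using exact fractions) confirms that $P(\alpha \cap \beta)=P(\alpha)P(\beta)$.

```python
from fractions import Fraction

# Fair six-sided die: each outcome has probability 1/6.
die = {k: Fraction(1, 6) for k in range(1, 7)}

alpha = {2, 4, 6}   # the outcome is even
beta = {1, 2}       # the outcome is 1 or 2

p_alpha = sum(die[k] for k in alpha)        # 1/2
p_beta = sum(die[k] for k in beta)          # 1/3
p_both = sum(die[k] for k in alpha & beta)  # P({2}) = 1/6

# Independence: P(alpha ∩ beta) = P(alpha) P(beta)
assert p_both == p_alpha * p_beta
print(p_alpha, p_beta, p_both)              # 1/2 1/3 1/6
```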

    Conditional Independence 

    While independence is a useful property, it is not often that we encounter two independent events. A more common situation is when two events are independent given an additional event. Event $\alpha$ is conditionally independent of event $\beta$ given $\gamma$ in $P$, denoted $P \vDash (\alpha \perp \beta \lvert \gamma)$. Then $$P \vDash (\alpha \perp \beta \lvert \gamma)~if~P(\alpha \lvert \beta \cap \gamma)=P(\alpha \lvert \gamma)$$ or $$P \vDash (\alpha \perp \beta \lvert \gamma)~if~and~only~if~P(\alpha \cap \beta \lvert \gamma)=P(\alpha \lvert \gamma)P(\beta \lvert \gamma)$$

    Independent Variables

    Until now, we have focused on independence between events. Thus, we can say that two events, such as one toss landing heads and a second also landing heads, are independent. However, we would like to say that any pair of outcomes of the coin tosses is independent. To capture such statements, we generalize independence to sets of random variables. Let $X,Y,Z$ be three random variables. We say $X$ is conditionally independent of $Y$ given $Z$ in $P$ if $P$ satisfies $(X=x \perp Y=y \lvert Z=z)$ for all values $x \in Val(X)$, $y \in Val(Y)$, $z \in Val(Z)$. Several properties hold for conditional independence:

    (1) Decomposition: $(X \perp Y,W \lvert Z)\Rightarrow (X \perp Y \lvert Z)$

    PROOF: If $(X \perp Y,W \lvert Z)$, then $P(X,Y,W \lvert Z)=P(X \lvert Z)P(Y,W \lvert Z)$.

    Therefore, $P(X,Y \lvert Z)=\sum_w P(X,Y,w \lvert Z)=\sum_w P(X \lvert Z)P(Y,w \lvert Z)=P(X \lvert Z)\sum_w P(Y,w \lvert Z)=P(X \lvert Z)P(Y \lvert Z)$

    (2) Weak Union: $(X\perp Y,W \lvert Z)\Rightarrow (X\perp Y \lvert Z,W)$

    PROOF: If $(X \perp Y,W \lvert Z)$, then $P(X,Y,W \lvert Z)=P(X \lvert Z)P(Y,W \lvert Z)$. Therefore,

    $P(X,Y \lvert Z,W)=P(X \lvert Y,W,Z)P(Y \lvert Z,W)~(Chain~Rule)$

    $=P(X \lvert Z)P(Y \lvert Z,W)~(X \perp Y,W \lvert Z)$

    $=P(X \lvert Z,W)P(Y \lvert Z,W)~(Decomposition:~X \perp W \lvert Z,~so~P(X \lvert Z)=P(X \lvert Z,W))$

    (3) Contraction: $(X\perp W \lvert Z,Y)~\&~(X \perp Y \lvert Z) \Rightarrow (X \perp Y,W \lvert Z)$

    PROOF: $P(X,Y,W \lvert Z)=P(X \lvert Y,W,Z)P(Y,W \lvert Z)~(Chain~Rule)$

    $=P(X \lvert Y,Z)P(Y,W \lvert Z)~(X\perp W \lvert Z,Y)$

    $=P(X \lvert Z)P(Y,W \lvert Z)~(X \perp Y \lvert Z)$

    (4) Intersection: For positive distributions (all probability values are strictly positive), and for mutually disjoint sets $X,Y,Z,W$: $(X \perp Y \lvert Z,W)~\&~(X \perp W \lvert Z,Y) \Rightarrow (X \perp Y,W \lvert Z)$.

    PROOF: $P(X,Y,W \lvert Z)=P(X \lvert Y,W,Z)P(Y,W \lvert Z)~(Chain~Rule)$

    $=P(X \lvert W,Z)P(Y,W \lvert Z)~(X \perp Y \lvert Z,W)$

    The other premise, $(X \perp W \lvert Z,Y)$, gives $P(X \lvert Y,W,Z)=P(X \lvert Y,Z)$, so $P(X \lvert W,Z)=P(X \lvert Y,Z)$ for all values of $W$ and $Y$ (positivity guarantees that all these conditionals are well defined). In particular, $P(X \lvert W,Z)$ does not depend on the value of $W$, so $P(X \lvert Z)=\sum_w P(X \lvert w,Z)P(w \lvert Z)=P(X \lvert W,Z)\sum_w P(w \lvert Z)=P(X \lvert W,Z)$. Therefore $P(X,Y,W \lvert Z)=P(X \lvert Z)P(Y,W \lvert Z)$, which is exactly $(X \perp Y,W \lvert Z)$.
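
    The four properties can also be verified numerically by brute force. The sketch below is my own construction (the factorized distributions, variable names, and tolerance are assumptions, not anything from the book): it builds two strictly positive joint distributions over four binary variables that satisfy the relevant premises by construction, and then checks that the stated conclusions hold.

```python
import itertools
import random

random.seed(0)
VARS = ["X", "Y", "W", "Z"]   # all binary, values 0/1

def rand_dist(n):
    """A random strictly positive distribution over n outcomes."""
    w = [random.uniform(0.1, 1.0) for _ in range(n)]
    s = sum(w)
    return [v / s for v in w]

def marginal(P, names):
    """Marginal distribution over the given variable names."""
    out = {}
    for assg, p in P.items():
        key = tuple(assg[VARS.index(v)] for v in names)
        out[key] = out.get(key, 0.0) + p
    return out

def is_ci(P, A, B, C, tol=1e-9):
    """Check (A ⊥ B | C) in P: P(a,b,c) P(c) == P(a,c) P(b,c) for all values."""
    pABC = marginal(P, A + B + C)
    pAC, pBC, pC = marginal(P, A + C), marginal(P, B + C), marginal(P, C)
    for vals, p in pABC.items():
        a, b, c = vals[:len(A)], vals[len(A):len(A) + len(B)], vals[len(A) + len(B):]
        if abs(p * pC[c] - pAC[a + c] * pBC[b + c]) > tol:
            return False
    return True

# Distribution 1: P(Z) P(X|Z) P(Y,W|Z) -- satisfies (X ⊥ Y,W | Z) by construction.
pZ = rand_dist(2)
pX_Z = {z: rand_dist(2) for z in (0, 1)}
pYW_Z = {z: rand_dist(4) for z in (0, 1)}
P1 = {}
for x, y, w, z in itertools.product((0, 1), repeat=4):
    P1[(x, y, w, z)] = pZ[z] * pX_Z[z][x] * pYW_Z[z][2 * y + w]

assert is_ci(P1, ["X"], ["Y", "W"], ["Z"])   # premise (X ⊥ Y,W | Z)
assert is_ci(P1, ["X"], ["Y"], ["Z"])        # decomposition conclusion
assert is_ci(P1, ["X"], ["Y"], ["Z", "W"])   # weak union conclusion

# Distribution 2: P(Z) P(X|Z) P(Y|Z) P(W|Y,Z) -- a positive distribution that
# satisfies the premises of contraction and intersection by construction.
pY_Z = {z: rand_dist(2) for z in (0, 1)}
pW_YZ = {(y, z): rand_dist(2) for y in (0, 1) for z in (0, 1)}
P2 = {}
for x, y, w, z in itertools.product((0, 1), repeat=4):
    P2[(x, y, w, z)] = pZ[z] * pX_Z[z][x] * pY_Z[z][y] * pW_YZ[(y, z)][w]

assert is_ci(P2, ["X"], ["W"], ["Z", "Y"])   # premise of contraction and intersection
assert is_ci(P2, ["X"], ["Y"], ["Z"])        # second premise of contraction
assert is_ci(P2, ["X"], ["Y"], ["Z", "W"])   # second premise of intersection
assert is_ci(P2, ["X"], ["Y", "W"], ["Z"])   # common conclusion (X ⊥ Y,W | Z)
print("decomposition, weak union, contraction, intersection all verified")
```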

    Querying a Distribution

    Our focus throughout this book is on using a joint probability distribution over multiple random variables to answer queries of interest.

    Probability Queries

    The evidence: a subset $E$ of random variables in the model, and an instantiation $e$ to these variables;

    the query variables: a subset $Y$ of random variables in the network.

    Our task is to compute $P(Y \lvert E=e)$, the posterior probability distribution over the values $y$ of $Y$, conditioned on the fact that $E=e$.
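
    A probability query can be answered by brute force from an explicit joint table: keep only the entries consistent with $E=e$, sum out everything except $Y$, and renormalize. Below is a minimal sketch with an invented three-variable joint (the names and weights are illustrative only).

```python
import itertools

# Toy joint distribution over three binary variables A, B, C; weights are made up.
VARS = ("A", "B", "C")
weights = dict(zip(itertools.product((0, 1), repeat=3), [1, 3, 2, 2, 4, 1, 5, 2]))
total = sum(weights.values())
joint = {assg: w / total for assg, w in weights.items()}

def posterior(query, evidence):
    """Brute-force P(query | evidence): sum out everything else, then renormalize."""
    scores = {}
    for assg, p in joint.items():
        a = dict(zip(VARS, assg))
        if any(a[v] != val for v, val in evidence.items()):
            continue                                # drop outcomes inconsistent with E = e
        key = tuple(a[v] for v in query)
        scores[key] = scores.get(key, 0.0) + p      # marginalize the remaining variables
    z = sum(scores.values())                        # this is P(E = e)
    return {k: v / z for k, v in scores.items()}

print(posterior(["A"], {"C": 1}))                   # P(A | C = 1)
```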

    MAP Queries

    The aim is to find the MAP assignment (the most likely assignment) to all of the non-evidence variables $W=\mathcal{X}-E$, where $E$ is the set of evidence variables. The task is to find the most likely assignment to the variables in $W$ given the evidence $E = e$: $$MAP(W \lvert e)=\mathop{argmax}\limits_{w} P(w,e)$$

    Marginal MAP Queries

    In the marginal MAP query, we have a subset of variables $Y$ that forms our query. The task is to find the most likely assignment to the variables in $Y$ given the evidence $E = e$: $$MAP(Y \lvert e)=\mathop{argmax}\limits_{y}P(y \lvert e)$$ If we let $Z=\mathcal{X}-Y-E$, the marginal MAP task is to compute: $$MAP(Y \lvert e)=\mathop{argmax}\limits_{y} \sum_{Z}P(Y,Z \lvert e)$$
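
    Both query types can be computed by brute force on a small explicit joint. The sketch below uses an invented joint over three binary variables; the helper names and weights are my own.

```python
import itertools

# Toy joint over three binary variables; weights are made up for illustration.
VARS = ("Y", "Z", "E")
weights = dict(zip(itertools.product((0, 1), repeat=3), [1, 4, 2, 2, 3, 1, 5, 2]))
total = sum(weights.values())
joint = {assg: w / total for assg, w in weights.items()}

def consistent(assg, evidence):
    a = dict(zip(VARS, assg))
    return all(a[v] == val for v, val in evidence.items())

def map_query(evidence):
    """MAP: the most likely full assignment consistent with the evidence."""
    return max((assg for assg in joint if consistent(assg, evidence)),
               key=lambda assg: joint[assg])

def marginal_map_query(query, evidence):
    """Marginal MAP: argmax_y sum_Z P(y, Z, e); P(e) cancels in the argmax."""
    scores = {}
    for assg, p in joint.items():
        if not consistent(assg, evidence):
            continue
        key = tuple(dict(zip(VARS, assg))[v] for v in query)
        scores[key] = scores.get(key, 0.0) + p      # sum out Z
    return max(scores, key=scores.get)

print(map_query({"E": 1}))                 # most likely full assignment given E = 1
print(marginal_map_query(["Y"], {"E": 1})) # most likely value of Y given E = 1
```

    Note that the MAP assignment maximizes the full joint $P(w,e)$, which in general need not agree with maximizing each variable's posterior marginal separately; the marginal MAP query sums out $Z$ before maximizing.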

     Graphs

    Nodes and Edges

    For directed graphs, there are notions such as the children and parents of a node $X$, and the in-degree and out-degree of a node $X$.

    For undirected graphs, there is the notion of the neighbors of $X$.

    For both kinds of graphs, there is the notion of the boundary of a node $X$, which is Pa($X$) for directed graphs and Nb($X$) for undirected graphs, and the degree of a graph, which is the maximal degree of a node in the graph.
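
    With a simple edge-set representation (the example edges below are made up), these notions are one-liners.

```python
# Directed graph as a set of edges (parent, child); the example edges are made up.
edges = {("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")}
nodes = {n for e in edges for n in e}

parents = {n: {p for (p, c) in edges if c == n} for n in nodes}
children = {n: {c for (p, c) in edges if p == n} for n in nodes}
indegree = {n: len(parents[n]) for n in nodes}
outdegree = {n: len(children[n]) for n in nodes}

# For a directed graph the boundary of X is Pa(X); the degree of the graph is
# the maximal degree (here in-degree + out-degree) over all nodes.
boundary = parents
graph_degree = max(indegree[n] + outdegree[n] for n in nodes)

print(parents["C"], children["C"], graph_degree)   # parents of C, children of C, 3
```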

    Subgraphs

    Clique (also called a complete subgraph): a subset of nodes in which every pair of nodes is connected by an edge. A maximal clique is a clique that is not a strict subset of any larger clique.

    Upward Closure: For a subset of nodes $X$, if $\forall x \in X$, $\mathrm{Boundary}_x \subset X$, we say $X$ is upwardly closed in $\mathcal{K}$. We define the upward closure of $X$ to be the minimal upwardly closed subset $Y$ that contains $X$. We define the upwardly closed subgraph of $X$, denoted $\mathcal{K}^{+}[X]$, to be the induced subgraph over $Y$, $\mathcal{K}[Y]$.

    For example, the set {A, B, C, D, E} is the upward closure of the set {C} in $\mathcal{K}$. The upwardly closed subgraph of {C} is shown in figure 2.4b. The upwardly closed subgraph of {C, D, I} is shown in figure 2.4c.
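
    A sketch of computing the upward closure in code (the example graph is made up, not the book's figure 2.3): repeatedly add the boundary, here the parents, of every node already in the set until nothing changes.

```python
# Directed example graph (made up): node -> set of parents.
parents = {"A": set(), "B": set(), "C": {"A", "B"}, "D": {"C"}, "E": {"D"}}

def upward_closure(X):
    """Smallest upwardly closed set containing X: keep adding boundaries (parents)."""
    closed = set(X)
    while True:
        extra = set().union(*(parents[n] for n in closed)) - closed
        if not extra:
            return closed
        closed |= extra

print(upward_closure({"E"}))   # {'A', 'B', 'C', 'D', 'E'} (printed order may vary)
```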

    Paths and Trails

    $X_1,...,X_k$ forms a path in $\mathcal{K}=(\mathcal{X},\varepsilon)$ if, for every $i=1,...,k-1$, we have either $X_i \rightarrow X_{i+1}$ or $X_i - X_{i+1}$. A path is directed if, for at least one $i$, we have $X_i \rightarrow X_{i+1}$. (Note that a directed path does not require every edge to be directed.)

    $X_1,...,X_k$ forms a trail in $\mathcal{K}=(\mathcal{X},\varepsilon)$ if, for every $i=1,...,k-1$, we have $X_i \rightleftharpoons X_{i+1}$.

    In the graph K of figure 2.3, A, C, D, E, I is a path, and hence also a trail. On the other hand, A, C, F, G, D is a trail, which is not a path.

    Connected graph: between every pair of nodes $X_i, X_j$ there is a trail.

    We say that $X$ is an ancestor of $Y$ and $Y$ is a descendant of $X$ if there exists a directed path $X_1,..., X_k$ with $X_1=X$ and $X_k=Y$. In figure 2.3, F, G, I are descendants of C. The ancestors of C are A, via the path A, C, and B, via the path B, E, D, C.
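
    Descendants and ancestors are just directed reachability. The sketch below assumes a purely directed graph stored as a child-set dictionary; the example graph is made up and is not the book's figure 2.3, so its ancestor/descendant sets differ from the ones quoted above.

```python
# Purely directed example graph (made up): node -> set of children.
children = {"A": {"C"}, "B": {"E"}, "C": {"D", "F"}, "D": {"E", "G"},
            "E": set(), "F": {"G"}, "G": {"I"}, "I": set()}

def descendants(x):
    """All nodes reachable from x along directed edges (depth-first search)."""
    seen, stack = set(), [x]
    while stack:
        for c in children[stack.pop()]:
            if c not in seen:
                seen.add(c)
                stack.append(c)
    return seen

def ancestors(y):
    """All nodes x such that y is a descendant of x."""
    return {x for x in children if y in descendants(x)}

print(descendants("C"))   # {'D', 'F', 'E', 'G', 'I'} (printed order may vary)
print(ancestors("C"))     # {'A'}
```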

    Cycles and Loops

    A cycle is a directed path $X_1,...,X_k$ where $X_1=X_k$.  A graph is acyclic if it contains no cycles.

    A directed acyclic graph (DAG) is one of the central concepts in this book. An acyclic graph containing both directed and undirected edges is called a partially directed acyclic graph (PDAG).
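
    Acyclicity of a purely directed graph can be tested, for instance, with Kahn's topological-sort algorithm; the sketch below (example edges are made up) is one way to do it.

```python
def is_acyclic(children):
    """Kahn's algorithm: repeatedly remove nodes with in-degree 0; a DAG empties out."""
    indeg = {n: 0 for n in children}
    for cs in children.values():
        for c in cs:
            indeg[c] += 1
    frontier = [n for n, d in indeg.items() if d == 0]
    removed = 0
    while frontier:
        n = frontier.pop()
        removed += 1
        for c in children[n]:
            indeg[c] -= 1
            if indeg[c] == 0:
                frontier.append(c)
    return removed == len(children)

print(is_acyclic({"A": {"B"}, "B": {"C"}, "C": set()}))  # True: A -> B -> C is a DAG
print(is_acyclic({"A": {"B"}, "B": {"C"}, "C": {"A"}}))  # False: contains a cycle
```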

    Exercises

    exercise 2.2 

    2.2.1. Show that for binary random variables $X, Y$, the event-level independence $(x^0 \perp y^0)$ implies random variable independence $(X\perp Y)$.

    Given: 

    $(x^0 \perp y^0) \Rightarrow P(x^0 \lvert y^0)=P(x^0),~P(y^0 \lvert x^0)=P(y^0)$

    Find:

    $P(x^1 \lvert y^0)=1-P(x^0 \lvert y^0)=1-P(x^0)=P(x^1)$

    $P(y^1 \lvert x^0)=1-P(y^0 \lvert x^0)=1-P(y^0)=P(y^1)$

    $P(x^1 \lvert y^1)=1-P(x^0 \lvert y^1)=1-\frac{P(y^1 \lvert x^0)P(x^0)}{P(y^1)}=1-\frac{P(y^1)P(x^0)}{P(y^1)}=1-P(x^0)=P(x^1)$

    Using Bayes' rule in the same way, we also obtain $P(y^0 \lvert x^1)=P(y^0)$, $P(x^0 \lvert y^1)=P(x^0)$, and $P(y^1 \lvert x^1)=P(y^1)$, so every pair of values is independent and $(X \perp Y)$ holds.

    2.2.2 Show a counterexample for nonbinary variables.

    $P(x^0,y^0)=0.25,~~P(x^1,y^0)=0.10,~~P(x^2,y^0)=0.15$
    $P(x^0,y^1)=0.25,~~P(x^1,y^1)=0.15,~~P(x^2,y^1)=0.10$

    Here $P(x^0)=0.5$, $P(x^1)=P(x^2)=0.25$ and $P(y^0)=P(y^1)=0.5$. The event-level independence $(x^0 \perp y^0)$ holds, since $P(x^0,y^0)=0.25=P(x^0)P(y^0)$, yet $X$ and $Y$ are not independent, since $P(x^1,y^0)=0.10 \neq P(x^1)P(y^0)=0.125$.
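
    A quick check of the joint above in code (exact fractions; the helper names are my own):

```python
from fractions import Fraction as F

# The joint from the counterexample above: joint[(i, j)] = P(x^i, y^j).
joint = {(0, 0): F(1, 4), (1, 0): F(1, 10), (2, 0): F(3, 20),
         (0, 1): F(1, 4), (1, 1): F(3, 20), (2, 1): F(1, 10)}

px = {i: sum(p for (a, b), p in joint.items() if a == i) for i in (0, 1, 2)}
py = {j: sum(p for (a, b), p in joint.items() if b == j) for j in (0, 1)}

# Event-level independence (x^0 ⊥ y^0) holds ...
assert joint[(0, 0)] == px[0] * py[0]
# ... but the variables X and Y are not independent:
assert joint[(1, 0)] != px[1] * py[0]
print("event independence holds, variable independence fails")
```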

    2.2.3 Is it the case that, for a binary-valued variable $Z$, we have that $(X \perp Y \lvert z^0)$ implies $(X \perp Y \lvert Z)$?

    No. The fact that $P(X,Y \lvert z^0)=P(X \lvert z^0)P(Y \lvert z^0)$ says nothing about whether $X$ and $Y$ are independent conditioned on $z^1$, so $P(X,Y \lvert z^1)$ need not equal $P(X \lvert z^1)P(Y \lvert z^1)$, and hence $(X \perp Y \lvert Z)$ need not hold.

    exercise 2.5

    Let $X,Y,Z$ be three disjoint subsets of variables such that $\mathcal{X}=X\cup Y\cup Z$. Prove that $P\models (X \perp Y \lvert Z)$ if and only if we can write $P$ in the form: $P(\mathcal{X})=\phi_1(X,Z)\phi_2(Y,Z)$.
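
    A proof sketch (my own write-up, not the book's official solution): If $P \models (X \perp Y \lvert Z)$, then by the chain rule $P(\mathcal{X})=P(X,Y,Z)=P(X \lvert Z)P(Y \lvert Z)P(Z)$, so we may take $\phi_1(X,Z)=P(X \lvert Z)$ and $\phi_2(Y,Z)=P(Y \lvert Z)P(Z)$ (defining the conditionals arbitrarily where $P(Z)=0$). Conversely, suppose $P(X,Y,Z)=\phi_1(X,Z)\phi_2(Y,Z)$. Marginalizing,

    $$P(X,Z)=\phi_1(X,Z)\sum_{y}\phi_2(y,Z),\quad P(Y,Z)=\phi_2(Y,Z)\sum_{x}\phi_1(x,Z),\quad P(Z)=\Big(\sum_{x}\phi_1(x,Z)\Big)\Big(\sum_{y}\phi_2(y,Z)\Big),$$

    so $P(X,Z)P(Y,Z)=\phi_1(X,Z)\phi_2(Y,Z)P(Z)=P(X,Y,Z)P(Z)$. Dividing by $P(Z)$ (whenever it is positive) gives $P(X,Y \lvert Z)=P(X \lvert Z)P(Y \lvert Z)$, that is, $(X \perp Y \lvert Z)$.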
