Probabilistic Graphical Models
Specifying a joint distribution over many variables is intractable, especially when some random variables have many states or are continuous. We therefore use a probabilistic graphical model (PGM), which decomposes a complex distribution into smaller structures. For example, the following figure shows one possible graph for medical diagnosis.
There are two perspectives for interpreting the graph. In one, the graph is a representation of a set of independencies; in the other, the graph is a skeleton that breaks up the distribution into smaller factors, each of which has a smaller space of possibilities, as shown in the figure above. It turns out that the two perspectives are equivalent. Factoring the joint distribution means that far fewer parameters are needed to specify it. For example, assume each of the variables $F$, $H$, $M$, $C$ takes the states yes/no, and $S$ takes the 4 states spring, summer, fall, winter. The joint distribution then needs $2 \times 2 \times 2 \times 2 \times 4 - 1 = 63$ nonredundant parameters (all entries must sum to 1, so once 63 entries are determined, the remaining one is fixed). After factorization, the required number of parameters is $3+4+4+4+2=17$, for $P(S),~P(F|S),~P(H|S),~P(C|H,F),~P(M|F)$ respectively.
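The parameter counts above can be checked with a short script. This is a minimal sketch: the dictionaries `card` and `parents` and the helper names are illustrative choices, not part of the original notes. The parent sets are read off the factorization $P(S)\,P(F|S)\,P(H|S)\,P(C|H,F)\,P(M|F)$.

```python
from math import prod

# Number of states per variable, as given in the text.
card = {"S": 4, "F": 2, "H": 2, "C": 2, "M": 2}

# Parents of each variable, read off the factorization
# P(S) P(F|S) P(H|S) P(C|H,F) P(M|F). The names are illustrative.
parents = {"S": [], "F": ["S"], "H": ["S"], "C": ["H", "F"], "M": ["F"]}


def full_joint_params(card):
    """Free parameters of the full joint table: all entries minus one,
    because the entries must sum to 1."""
    return prod(card.values()) - 1


def cpd_params(var, card, parents):
    """Free parameters of P(var | parents): one distribution per parent
    configuration, each with (number of states - 1) free entries."""
    return (card[var] - 1) * prod(card[p] for p in parents[var])


def factored_params(card, parents):
    """Total free parameters across all conditional distributions."""
    return sum(cpd_params(v, card, parents) for v in card)


joint_count = full_joint_params(card)
factored_count = factored_params(card, parents)
print(joint_count, factored_count)  # 63 17
```

The gap grows quickly: the full joint scales with the product of all cardinalities, while each factor scales only with the cardinalities of a variable and its parents.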
Three components (representation, inference, and learning) are critical in constructing an intelligent system, and a PGM addresses all three. It provides a graph-based representation that encodes our knowledge of the world. It uses that representation to answer queries such as $P(F|S,M)$ (inference). It can be learned by combining expert knowledge (such as some main dependencies) with accumulated data.
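To make the inference step concrete, here is a minimal sketch of answering the query $P(F|S,M)$ by brute-force enumeration over the factored joint. All probability values below are made-up placeholders chosen only for illustration; they do not come from the notes or from any data. The table names and helper functions are likewise hypothetical.

```python
from itertools import product

# HYPOTHETICAL conditional probability tables (illustrative numbers only).
# Binary variables are encoded as True (yes) / False (no); each table
# stores the probability of the "yes" state.
P_S = {"spring": 0.25, "summer": 0.25, "fall": 0.25, "winter": 0.25}
P_F_given_S = {"spring": 0.10, "summer": 0.05, "fall": 0.10, "winter": 0.30}
P_H_given_S = {"spring": 0.40, "summer": 0.20, "fall": 0.10, "winter": 0.05}
P_C_given_HF = {(True, True): 0.95, (True, False): 0.80,
                (False, True): 0.70, (False, False): 0.05}
P_M_given_F = {True: 0.70, False: 0.10}


def bern(p_yes, value):
    """Probability of a binary outcome given P(yes)."""
    return p_yes if value else 1.0 - p_yes


def joint(s, f, h, c, m):
    """P(S,F,H,C,M) = P(S) P(F|S) P(H|S) P(C|H,F) P(M|F)."""
    return (P_S[s]
            * bern(P_F_given_S[s], f)
            * bern(P_H_given_S[s], h)
            * bern(P_C_given_HF[(h, f)], c)
            * bern(P_M_given_F[f], m))


def posterior_F(s, m):
    """P(F | S=s, M=m): sum out the unobserved H and C, then normalize."""
    unnorm = {
        f: sum(joint(s, f, h, c, m)
               for h, c in product([True, False], repeat=2))
        for f in (True, False)
    }
    z = sum(unnorm.values())
    return {f: p / z for f, p in unnorm.items()}


post = posterior_F("winter", True)
print(post)
```

With these placeholder numbers, `post[True]` comes out at about 0.75. Enumeration touches every joint assignment of the unobserved variables, so it is only practical for tiny networks like this one.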
Overview and Roadmap
Chapter 3 | Bayesian Network Representation |
Chapter 4 | Markov Network and its unification with Bayesian Network, Conditional Random Fields |
Chapter 5 | Deeper into the representation of the parameters in PGM |
Chapter 6 | PGM evolving with time |
Chapter 7 | Look into models that have continuous variables |
Chapter 8 | Exponential Family |
Chapter 9 | Exact Inference (Computationally Intractable) |
Chapter 10 | Alternative view of Exact Inference |
Chapter 11 | Approximate Inference (Lower cost than Exact Inference) |
Chapter 12 | A very different approximate inference method: Particle-based method |
Chapter 13 | |
Chapter 14 | Inference in continuous and hybrid (continuous/discrete) networks |
Chapter 15 | Special-purpose methods for the particular settings of networks that model dynamical systems. |
Chapter 16 | Fundamental concepts underlying the general task of learning models from data |
Chapter 17 | Learning parameters for a Bayesian network with a given structure, from fully observable data |
Chapter 18 | The harder problem of learning both Bayesian network structure and the parameters, still from fully observed data |
Chapter 19 | Bayesian network learning task in a setting where we have access only to partial observations of the relevant variables |
Chapter 20 | Learning Markov networks from data, which is significantly harder than the corresponding problem for Bayesian networks |
Chapter 21 | Causal model |
Chapter 22 | Utility functions |
Chapter 23 | Influence diagrams which extend Bayesian networks by introducing actions and utilities |
A reader's guide