• Likelihood and Log-Likelihood


    One of the most fundamental concepts of modern statistics is that of likelihood. For each of the discrete random variables we have considered thus far, the distribution depends on one or more parameters that are, in most statistical applications, unknown. In the Poisson distribution, the parameter is λ. In the binomial, the parameter of interest is π (since n is typically fixed and known).

    Likelihood is a tool for summarizing the data’s evidence about unknown parameters. Let us denote the unknown parameter(s) of a distribution generically by θ. Since the probability distribution depends on θ, we can make this dependence explicit by writing f(x) as f(x; θ). For example, in the Bernoulli distribution the parameter is θ = π, and the distribution is

     f(x; π) = π^x (1 − π)^{1−x},   x = 0, 1    (2)

    Once a value of X has been observed, we can plug this observed value x into f(x ; π ) and obtain a function of π only. For example, if we observe X = 1, then plugging x = 1 into (2) gives the function π . If we observe X = 0, the function becomes 1 − π .

    Whatever function of the parameter we get when we plug the observed data x into f(x; θ), we call that function the likelihood function.

    We write the likelihood function as L(θ; x) = ∏_{i=1}^{n} f(X_i; θ), or sometimes just L(θ). Algebraically, the likelihood L(θ; x) is just the same as the distribution f(x; θ), but its meaning is quite different because it is regarded as a function of θ rather than a function of x. Consequently, a graph of the likelihood usually looks very different from a graph of the probability distribution.

    For example, suppose that X has a Bernoulli distribution with unknown parameter π . We can graph the probability distribution for any fixed value of π  . For example, if π = .5 we get this:

    [Figure: probability distribution of the Bernoulli(π = 0.5) random variable]

    Now suppose that we observe a value of X, say X = 1. Plugging x = 1 into the distribution π^x (1 − π)^{1−x} gives the likelihood function L(π; x) = π, which looks like this:

    [Figure: likelihood function L(π; x = 1) = π, a straight line rising over the parameter space from 0 to 1]

    For discrete random variables, a graph of the probability distribution f(x ; θ) has spikes at specific values of x, whereas a graph of the likelihood L(θ ; x) is a continuous curve (e.g. a line) over the parameter space, the domain of possible values for θ.
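    To make this concrete, here is a minimal R sketch (my own code, in the spirit of the figures above, not the course's lec1fig.R) that draws the Bernoulli(π = 0.5) probability distribution and the likelihood L(π; x = 1) = π:

        # Probability distribution of X ~ Bernoulli(pi = 0.5): spikes at x = 0 and x = 1
        pi0 <- 0.5
        plot(c(0, 1), dbinom(c(0, 1), size = 1, prob = pi0), type = "h", lwd = 3,
             xlab = "x", ylab = "f(x; pi = 0.5)", ylim = c(0, 1))

        # Likelihood after observing x = 1: L(pi; x = 1) = pi, a line over the parameter space
        pi.grid <- seq(0, 1, by = 0.01)
        plot(pi.grid, dbinom(1, size = 1, prob = pi.grid), type = "l",
             xlab = "pi", ylab = "L(pi; x = 1)")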

    L(θ; x) summarizes the evidence about θ contained in the event X = x. L(θ; x) is high for values of θ that make X = x more likely, and small for values of θ that make X = x unlikely. In the Bernoulli example, observing X = 1 gives some (albeit weak) evidence that π is nearer to 1 than to 0, so the likelihood for x = 1 rises as π moves from 0 to 1.

    For example, if we observe x from Bin(n, π), the likelihood function is

    L(π | x) = [n! / ((n − x)! x!)] π^x (1 − π)^{n−x}.


    Any multiplicative constant which does not depend on θ is irrelevant and may be discarded; thus,

    L(π | x) ∝ π^x (1 − π)^{n−x}.
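    As a small numerical check (my own R illustration, not part of the original notes), with n = 10 and x = 3 the full likelihood and the kernel π^x (1 − π)^{n−x} differ only by the constant factor n!/((n − x)! x!) at every value of π, so they carry the same information about π:

        n <- 10; x <- 3
        pi.grid <- seq(0.01, 0.99, by = 0.01)

        full   <- dbinom(x, size = n, prob = pi.grid)   # includes the constant n!/((n - x)! x!)
        kernel <- pi.grid^x * (1 - pi.grid)^(n - x)     # constant dropped

        range(full / kernel)   # the ratio equals choose(10, 3) = 120 for every value of pi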

    Loglikelihood

    In most cases, for various reasons (often computational convenience), we work with the loglikelihood

    l(θ | x) = log L(θ | x)



    which is defined up to an arbitrary additive constant. 

    For example, the binomial loglikelihood is 

    l(π | x) = x log π + (n − x) log(1 − π).

    In many problems of interest, we will derive our loglikelihood from a sample rather than from a single observation. If we observe an independent sample x_1, x_2, ..., x_n from a distribution f(x | θ), then the overall likelihood is the product of the individual likelihoods:

    L(θ | x) = ∏_{i=1}^{n} f(x_i | θ) = ∏_{i=1}^{n} L(θ | x_i)



    and the loglikelihood is: 

    l(θ | x) = log ∏_{i=1}^{n} f(x_i | θ) = ∑_{i=1}^{n} log f(x_i | θ) = ∑_{i=1}^{n} l(θ | x_i).
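    A quick numerical check (an illustrative R sketch with simulated data, not from the course materials): for an independent Bernoulli sample, the log of the product of the individual likelihoods equals the sum of the individual loglikelihoods at any trial value of π. Summing the logs is also numerically safer, since the raw product underflows for large n:

        set.seed(1)
        x   <- rbinom(20, size = 1, prob = 0.4)   # simulated independent Bernoulli sample
        pi0 <- 0.3                                # any trial value of the parameter

        l.from.product <- log(prod(dbinom(x, size = 1, prob = pi0)))       # log of the product
        l.from.sum     <- sum(dbinom(x, size = 1, prob = pi0, log = TRUE)) # sum of the logs

        all.equal(l.from.product, l.from.sum)   # TRUE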



    Binomial loglikelihood examples:
    Plots of the binomial loglikelihood function for n = 5 when we observe x = 0, x = 1, and x = 2 (see the lec1fig.R code on ANGEL for how to produce these figures).
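    The lec1fig.R file itself is not reproduced here, but a minimal sketch along the same lines (my own code, with my own variable names) plots l(π | x) = x log π + (n − x) log(1 − π) for n = 5 and x = 0, 1, 2:

        n <- 5
        pi.grid <- seq(0.001, 0.999, length.out = 500)
        loglik  <- function(p, x, n) x * log(p) + (n - x) * log(1 - p)

        # One curve per observed value of x; ylim zooms in near the maxima
        plot(pi.grid, loglik(pi.grid, x = 0, n), type = "l", lty = 1,
             xlab = "pi", ylab = "loglikelihood", ylim = c(-10, 0))
        lines(pi.grid, loglik(pi.grid, x = 1, n), lty = 2)
        lines(pi.grid, loglik(pi.grid, x = 2, n), lty = 3)
        legend("bottomleft", legend = c("x = 0", "x = 1", "x = 2"), lty = 1:3)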

    In regular problems, as the total sample size n grows, the loglikelihood function does two things:

    • it becomes more sharply peaked around its maximum, and
    • its shape becomes nearly quadratic (i.e. a parabola, if there is a single parameter).

    This is important since tests such as the Wald test, which is based on z = statistic / (SE of statistic), only work if the loglikelihood is well approximated by a quadratic. For example, the loglikelihood for a normal-mean problem is exactly quadratic. As the sample size grows, the inference comes to resemble the normal-mean problem. This is true even for discrete data. The extent to which normal-theory approximations work for discrete data does not depend on how closely the distribution of responses resembles a normal curve, but on how closely the loglikelihood resembles a quadratic function.
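    To see this numerically, here is a hedged R sketch (my own illustration, not from the notes): fix the sample proportion at x/n = 0.3 and compare the binomial loglikelihood, centered at its maximum, with the quadratic approximation l(π̂) − n(π − π̂)²/(2π̂(1 − π̂)). With n = 10 the curve is visibly asymmetric; with n = 100 the two are nearly indistinguishable:

        loglik <- function(p, x, n) x * log(p) + (n - x) * log(1 - p)

        compare <- function(n, phat = 0.3) {
          x  <- n * phat                                 # observed count giving sample proportion phat
          se <- sqrt(phat * (1 - phat) / n)
          p  <- seq(max(0.01, phat - 3 * se), min(0.99, phat + 3 * se), length.out = 400)
          l    <- loglik(p, x, n) - loglik(phat, x, n)   # loglikelihood, shifted so its maximum is 0
          quad <- -n * (p - phat)^2 / (2 * phat * (1 - phat))  # quadratic approximation at the MLE
          plot(p, l, type = "l", xlab = "pi", ylab = "l(pi) - l(max)", main = paste("n =", n))
          lines(p, quad, lty = 2)
        }

        op <- par(mfrow = c(1, 2))
        compare(10)    # visibly asymmetric: the quadratic approximation is rough
        compare(100)   # nearly quadratic: Wald-type (z = statistic / SE) inference is reliable
        par(op)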

    Transformations may help us to improve the shape of the loglikelihood. More on this in Section 1.6 on Alternative Parametrizations. Next we will see how we use the likelihood, that is, the corresponding loglikelihood, to estimate the most likely value of the unknown parameter of interest.

    from: https://onlinecourses.science.psu.edu/stat504/node/27

  • Original post: https://www.cnblogs.com/GarfieldEr007/p/5232140.html