Sufficient statistic - Wikipedia
Sufficient statistic - arizona
Definition
A statistic is a function of a random sample \(X_1, X_2, \cdots, X_n\):
\[T = r(X_1, X_2, \cdots, X_n).
\]
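The distribution of the sample \(X\), \(f_{\theta}(X) = f(X; \theta)\), is determined by the unknown parameter \(\theta\). We usually estimate \(\theta\) by maximum likelihood:
\[\max_{\theta} \quad P(X_1, X_2, \cdots, X_n; \theta) = \prod_{i=1}^n P(X_i; \theta) = \prod_{i=1}^n f_{\theta}(X_i).
\]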
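A sufficient statistic is a statistic \(T\) that satisfies
\[P(\{X_i\} | T = t; \theta) = P(\{X_i\} | T = t),
\]
that is, given \(T(X) = t\), the conditional joint distribution of \(\{X_i\}\) does not depend on the unknown parameter \(\theta\).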
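Example: consider the Bernoulli distribution with success probability \(p\) and failure probability \(1-p\). Given \(n\) i.i.d. samples \(X_1, X_2, \cdots, X_n\), we have
\[P(\{X_i\}; p) = p^{\sum_i X_i} (1-p)^{n - \sum_i X_i},
\]
and (as shown later via the factorization theorem) \(T = \sum_{i=1}^n X_i\) is a sufficient statistic. Indeed,
\[P(\{X_i\} | T = t; p) = \frac{P(\{X_i\}, T = t; p)}{P(T = t; p)} = \frac{\mathbb{I}[\sum_{i=1}^n X_i = t] \cdot p^t (1-p)^{n-t}}{C_n^t p^t (1-p)^{n-t}} = \frac{\mathbb{I}[\sum_{i=1}^n X_i = t]}{C_n^t}.
\]
This clearly does not depend on the unknown parameter \(p\).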
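The Bernoulli computation above can be checked by brute force. The following is a minimal sketch, not part of the original notes: the helper name `conditional_given_sum` and the values of `n`, `t`, and `p` are chosen purely for illustration. It enumerates every binary sequence with sum `t` and normalizes the joint probabilities.

```python
from itertools import product
from math import comb

def conditional_given_sum(n, t, p):
    """Return P(X = x | sum(X) = t; p) for every binary sequence x with sum t."""
    joint = {}
    for x in product([0, 1], repeat=n):
        if sum(x) == t:
            # Joint probability of the sequence x under Bernoulli(p).
            joint[x] = p ** t * (1 - p) ** (n - t)
    total = sum(joint.values())  # this is P(T = t; p)
    return {x: v / total for x, v in joint.items()}

n, t = 4, 2  # illustrative sizes
for p in (0.2, 0.5, 0.9):
    cond = conditional_given_sum(n, t, p)
    # Compare the largest and smallest conditional probability with 1 / C(n, t).
    print(p, min(cond.values()), max(cond.values()), 1 / comb(n, t))
```

For each value of `p`, the printed conditional probabilities should agree with `1 / comb(n, t)` up to floating-point error.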
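Sufficient statistics are especially useful, for example in the maximum likelihood estimation mentioned above. Because \(T\) is a function of \(\{X_i\}\),
\[P(\{X_i\}; \theta) = P(\{X_i\}, T; \theta) = P(\{X_i\} | T; \theta) \: P(T; \theta) = P(\{X_i\} | T) \: P(T; \theta),
\]
and because \(P(\{X_i\} | T)\) does not depend on \(\theta\), maximizing the expression above is equivalent to
\[\max_{\theta} \quad P(T; \theta) = P(r(X_1, X_2, \cdots, X_n); \theta).
\]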
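As a rough numerical illustration of this equivalence, the sketch below reuses the Bernoulli example, where \(T = \sum_i X_i\) follows a binomial distribution. The data, the grid of candidate values, and the function names are assumptions made for this example only. It maximizes the full-sample likelihood and the likelihood of \(T\) over the same grid.

```python
import math

def full_log_likelihood(xs, p):
    """log P(X_1, ..., X_n; p) for i.i.d. Bernoulli(p) samples."""
    return sum(math.log(p) if x == 1 else math.log(1 - p) for x in xs)

def statistic_log_likelihood(t, n, p):
    """log P(T = t; p), where T = sum(X_i) ~ Binomial(n, p)."""
    return math.log(math.comb(n, t)) + t * math.log(p) + (n - t) * math.log(1 - p)

xs = [1, 0, 1, 1, 0, 1, 0, 1]               # made-up observations
n, t = len(xs), sum(xs)
grid = [i / 1000 for i in range(1, 1000)]   # candidate values of p in (0, 1)

# Grid search over the same candidates with both objectives.
p_full = max(grid, key=lambda p: full_log_likelihood(xs, p))
p_stat = max(grid, key=lambda p: statistic_log_likelihood(t, n, p))
print(p_full, p_stat, t / n)
```

The two objectives differ only by the constant `log(comb(n, t))`, so both searches should select the same grid point, which here coincides with the sample mean `t / n`.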
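In particular, a scalar \(T\) is sometimes not sufficient, and a vector \(T = (T_1, T_2, \cdots, T_k)\) is needed as a whole. For example, when both \(\mu\) and \(\sigma\) of a normal distribution are unknown, \(T = (\frac{1}{n}\sum_i X_i, \frac{1}{n-1}\sum_i (X_i - \bar{X})^2)\). The properties are the same as above, so the vector case is not treated separately below.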
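In a Bayesian framework, where \(\theta\) has a prior \(P(\theta)\), we find:
\[P(\theta | \{X_i\}) = \frac{P(\{X_i\}, \theta)}{P(\{X_i\})}
= \frac{P(\{X_i\}, T, \theta)}{P(\{X_i\}, T)}
= \frac{P(\{X_i\} | T, \theta) P(T | \theta) P(\theta)}{P(\{X_i\} | T) P(T)}
= \frac{P(\{X_i\} | T) P(T | \theta) P(\theta)}{P(\{X_i\} | T) P(T)}
= \frac{P(T | \theta) P(\theta)}{P(T)}
= P(\theta | T).
\]
That is, the conditional (posterior) distribution of \(\theta\) is the same whether we condition on \(\{X_i\}\) or on \(T\).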
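The posterior identity can be sanity-checked on a small discrete example. In the sketch below, the grid of parameter values, the uniform prior, and the observations are all assumptions made for illustration. The posterior is computed once from the full Bernoulli sequence and once from \(T = \sum_i X_i\) alone.

```python
from math import comb

def normalize(weights):
    """Scale non-negative weights so that they sum to one."""
    total = sum(weights)
    return [w / total for w in weights]

def sequence_likelihood(xs, theta):
    """P(X_1, ..., X_n | theta) for i.i.d. Bernoulli(theta) samples."""
    out = 1.0
    for x in xs:
        out *= theta if x == 1 else 1 - theta
    return out

thetas = [0.1, 0.3, 0.5, 0.7, 0.9]   # assumed discrete support of theta
prior = [0.2] * 5                    # assumed uniform prior
xs = [1, 1, 0, 1, 0, 1]              # made-up observations
n, t = len(xs), sum(xs)

# Posterior given the whole sample.
post_x = normalize([pr * sequence_likelihood(xs, th) for pr, th in zip(prior, thetas)])
# Posterior given only T = t, using the Binomial(n, theta) likelihood of T.
post_t = normalize([pr * comb(n, t) * th ** t * (1 - th) ** (n - t)
                    for pr, th in zip(prior, thetas)])
print(post_x)
print(post_t)
```

The binomial coefficient cancels in the normalization, which is why the two posteriors should match.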
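In particular, sufficiency can be characterized through mutual information (treating \(\theta\) as a random variable): \(T\) is a sufficient statistic if and only if
\[I(\theta; X) = I(\theta; T(X)).
\]
Note: in general, by the data processing inequality, \(I(\theta; X) \ge I(\theta; T(X))\).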
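A small discrete experiment can make the mutual-information statement concrete. The sketch below assumes a two-point prior on \(\theta\) and three Bernoulli samples, both chosen arbitrarily for illustration. It compares the full sample, the sum (sufficient), and the first observation alone (not sufficient when \(n > 1\)).

```python
from collections import defaultdict
from itertools import product
from math import log

def mutual_information(joint):
    """I(A; B) in nats from a dict {(a, b): probability}."""
    pa, pb = defaultdict(float), defaultdict(float)
    for (a, b), p in joint.items():
        pa[a] += p
        pb[b] += p
    return sum(p * log(p / (pa[a] * pb[b])) for (a, b), p in joint.items() if p > 0)

def joint_with_statistic(stat, thetas, prior, n):
    """Joint distribution of (theta, stat(X)) for X i.i.d. Bernoulli(theta)."""
    joint = defaultdict(float)
    for th, pr in zip(thetas, prior):
        for x in product([0, 1], repeat=n):
            k = sum(x)
            joint[(th, stat(x))] += pr * th ** k * (1 - th) ** (n - k)
    return joint

thetas, prior, n = [0.2, 0.8], [0.5, 0.5], 3   # assumed prior and sample size
i_full = mutual_information(joint_with_statistic(lambda x: x, thetas, prior, n))
i_sum = mutual_information(joint_with_statistic(sum, thetas, prior, n))
i_first = mutual_information(joint_with_statistic(lambda x: x[0], thetas, prior, n))
print(i_full, i_sum, i_first)
```

The first two values should coincide up to rounding, while the third should be strictly smaller, in line with the inequality in the note.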
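Identifying sufficient statistics
Checking sufficiency directly from the definition above is very difficult. Fortunately, there is the Fisher-Neyman factorization theorem:
Factorization Theorem: let the joint density of \(\{X_i\}\) be \(f_{\theta}(X)\). Then \(T\) is a sufficient statistic for \(\theta\) if and only if there exist non-negative functions \(g, h\) such that
\[f(X_1, X_2, \cdots, X_n; \theta) = h(X_1, X_2, \cdots, X_n) \, g(T; \theta).
\]
Note: \(T\) may be a vector \(T = (T_1, T_2, \cdots, T_k)\).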
proof:
\(\Rightarrow\)
\[p(X_1, X_2, \cdots, X_n; \theta) = p(\{X_i\}, T; \theta) = p(\{X_i\} | T; \theta) p(T; \theta) = p(\{X_i\} | T) p(T; \theta)
\]
In this case,
\[g(T; \theta) = p(T; \theta), \\
h(X_1, X_2, \cdots, X_n) = p(\{X_i\} | T).
\]
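\(\Leftarrow\)
For brevity, write \(X = \{X_1, X_2, \cdots, X_n\}\).
\[\begin{array}{ll}
p(T=t; \theta)
&= \int_{T(X)=t} p(X, T=t; \theta) \mathrm{d}X \\
&= \int_{T(X)=t} f(X; \theta) \mathrm{d}X \\
&= \int_{T(X)=t} h(X) g(T=t; \theta) \mathrm{d}X \\
&= \int_{T(X)=t} h(X) \mathrm{d}X \cdot g(T=t; \theta)
\end{array}
\]
Then, for any \(X\) with \(T(X) = t\),
\[\begin{array}{ll}
p(X | T=t; \theta)
&= \frac{p(X, T=t; \theta)}{p(T=t; \theta)} \\
&= \frac{p(X; \theta)}{p(T=t; \theta)} \\
&= \frac{h(X) g(T=t; \theta)}{\int_{T(X')=t} h(X') \mathrm{d}X' \cdot g(T=t; \theta)} \\
&= \frac{h(X)}{\int_{T(X')=t} h(X') \mathrm{d}X'},
\end{array}
\]
which does not depend on \(\theta\).
Note: the proof above is heuristic and its rigor is questionable. In the continuous case, conditioning on \(T = t\) and integrating over the level set \(\{X : T(X) = t\}\) require a more careful treatment.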
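Minimal sufficient statistic
A statistic \(S\) is a minimal sufficient statistic if
- \(S\) is a sufficient statistic;
- for every sufficient statistic \(T\), there exists a function \(f\) such that \(S = f(T)\).
Note: if \(T\) is a sufficient statistic, then \(f(T)\) is also a sufficient statistic for any invertible function \(f\).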
Examples
\(U[0, \theta]\)
For the uniform distribution,
\[p(X_1, X_2, \cdots, X_n; \theta) = \frac{1}{\theta^n} \mathbb{I}[0 \le \min \{X_i\}] \cdot \mathbb{I}[\max \{X_i\} \le \theta],
\]
so
\[T = \max \{X_i\}, \: g(T; \theta) = \mathbb{I}[T \le \theta] \cdot \frac{1}{\theta^n}, \: h(X) = \mathbb{I}[0 \le \min \{X_i\}].
\]
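One way to see what this factorization means in practice: two samples of the same size with the same maximum should have identical likelihood functions in \(\theta\). The sketch below uses made-up samples `a` and `b` and a hypothetical helper `uniform_likelihood`.

```python
def uniform_likelihood(xs, theta):
    """Joint density of i.i.d. U[0, theta] samples evaluated at xs."""
    if min(xs) < 0 or max(xs) > theta:
        return 0.0  # at least one indicator is zero
    return theta ** (-len(xs))

a = [0.3, 1.2, 2.5, 0.7]   # made-up sample
b = [2.5, 2.4, 0.1, 1.9]   # different values, same size and same maximum
for theta in (2.0, 2.5, 3.0, 4.0):
    print(theta, uniform_likelihood(a, theta), uniform_likelihood(b, theta))
```

For every `theta`, the two columns should be equal: zero while `theta` is below the shared maximum 2.5, and `theta ** -4` afterwards.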
\(U[\alpha, \beta]\)
\[p(X_1, X_2, \cdots, X_n; \alpha, \beta) = \frac{1}{(\beta - \alpha)^n} \mathbb{I}[\alpha \le \min \{X_i\}] \cdot \mathbb{I}[\max \{X_i\} \le \beta],
\]
\[T = (\min \{X_i\}, \max \{X_i\}), \\
g(T; \alpha, \beta) = \frac{1}{(\beta - \alpha)^n} \mathbb{I}[\alpha \le \min \{X_i\}] \cdot \mathbb{I}[\max \{X_i\} \le \beta], \\
h(X) = 1.
\]
Poisson
\[P(X; \lambda) = \frac{\lambda^X e^{-\lambda}}{X!}.
\]
\[p(X_1, X_2, \cdots, X_n; \lambda) = e^{-n\lambda} \lambda^{\sum_{i} X_i} \cdot \frac{1}{\prod_i X_i!}.
\]
\[T = \sum_i X_i, \\
g(T; \lambda) = e^{-n\lambda} \cdot \lambda^T, \\
h(X) = \frac{1}{\prod_{i} X_i!}.
\]
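The Poisson factorization can be verified numerically. The following sketch uses made-up counts and hypothetical function names `poisson_joint`, `g`, and `h` that mirror the symbols above. It compares the term-by-term joint probability with the product \(h(X) \, g(T; \lambda)\).

```python
import math

def poisson_joint(xs, lam):
    """Joint pmf of i.i.d. Poisson(lam) samples, computed term by term."""
    return math.prod(lam ** x * math.exp(-lam) / math.factorial(x) for x in xs)

def g(t, n, lam):
    """Factor that depends on the data only through T = sum(xs)."""
    return math.exp(-n * lam) * lam ** t

def h(xs):
    """Factor that does not involve lam."""
    return 1 / math.prod(math.factorial(x) for x in xs)

xs = [2, 0, 3, 1, 4]  # made-up counts
for lam in (0.5, 1.7, 3.0):
    lhs = poisson_joint(xs, lam)
    rhs = h(xs) * g(sum(xs), len(xs), lam)
    print(lam, lhs, rhs)
```

Both columns should agree up to floating-point error for every value of `lam`.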
Normal
\[P(X; \mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp(-\frac{(X-\mu)^2}{2\sigma^2}).
\]
\[p(X_1, X_2, \cdots, X_n; \mu, \sigma) = (2\pi\sigma^2)^{-\frac{n}{2}} \exp(-\frac{1}{2\sigma^2}\sum_{i=1}^n (X_i - \bar{X})^2) \exp(-\frac{n}{2\sigma^2}(\mu - \bar{X})^2).
\]
If \(\sigma\) is known:
\[T = \frac{1}{n}\sum X_i = \bar{X}, \\
g(T; \mu) = (2\pi\sigma^2)^{-\frac{n}{2}} \exp(-\frac{n}{2\sigma^2}(\mu - T)^2), \\
h(X) = \exp(-\frac{1}{2\sigma^2}\sum_{i=1}^n (X_i - \bar{X})^2).
\]
If \(\sigma\) is unknown:
\[T = (\bar{X}, s^2), \quad s^2 = \frac{\sum_{i=1}^n (X_i - \bar{X})^2}{n-1}, \\
g(T; \mu, \sigma) = (2\pi\sigma^2)^{-\frac{n}{2}} \exp(-\frac{n-1}{2\sigma^2} s^2) \exp(-\frac{n}{2\sigma^2}(\mu - \bar{X})^2), \\
h(X) = 1.
\]
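The decomposition \(\sum_i (X_i - \mu)^2 = (n-1)s^2 + n(\bar{X} - \mu)^2\) behind this factorization can be checked numerically. The sketch below draws synthetic data with a fixed seed (the sample size, seed, and parameter values are arbitrary choices for illustration). It evaluates the log-likelihood once from the raw sample and once from \((\bar{X}, s^2)\) only.

```python
import math
import random

def normal_loglik_full(xs, mu, sigma):
    """Log-likelihood of i.i.d. N(mu, sigma^2) samples from the raw data."""
    n = len(xs)
    return (-0.5 * n * math.log(2 * math.pi * sigma ** 2)
            - sum((x - mu) ** 2 for x in xs) / (2 * sigma ** 2))

def normal_loglik_from_stats(n, xbar, s2, mu, sigma):
    """Same log-likelihood, using only n and T = (xbar, s2)."""
    return (-0.5 * n * math.log(2 * math.pi * sigma ** 2)
            - (n - 1) * s2 / (2 * sigma ** 2)
            - n * (mu - xbar) ** 2 / (2 * sigma ** 2))

random.seed(0)
xs = [random.gauss(1.0, 2.0) for _ in range(50)]  # synthetic data
n = len(xs)
xbar = sum(xs) / n
s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)
for mu, sigma in [(0.0, 1.0), (1.0, 2.0), (-3.0, 0.5)]:
    print(normal_loglik_full(xs, mu, sigma),
          normal_loglik_from_stats(n, xbar, s2, mu, sigma))
```

Once `xbar` and `s2` are stored, the raw sample is no longer needed to evaluate the likelihood at any `(mu, sigma)`.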
Exponential
\[p(X; \lambda) = \frac{1}{\lambda} e^{-\frac{X}{\lambda}}, \quad X \ge 0.
\]
\[p(X_1, X_2, \cdots, X_n; \lambda) = \frac{1}{\lambda^n} e^{-\frac{\sum_{i=1}^n X_i}{\lambda}}.
\]
\[T = \sum_{i=1}^n X_i, \\
g(T; \lambda) = \frac{1}{\lambda^n} e^{-\frac{T}{\lambda}}, \\
h(X) = 1.
\]
Gamma
\[p(X; \alpha, \beta) = \frac{1}{\Gamma(\alpha) \beta^{\alpha}} X^{\alpha-1} e^{-\frac{X}{\beta}}.
\]
\[p(X_1, X_2, \cdots, X_n; \alpha, \beta) = \frac{1}{(\Gamma(\alpha) \beta^{\alpha})^n} (\prod_{i} X_i)^{\alpha-1} e^{-\frac{\sum_i X_i}{\beta}}.
\]
\[T = (\prod_i X_i, \sum_i X_i), \\
g(T; \alpha, \beta) = \frac{1}{(\Gamma(\alpha) \beta^{\alpha})^n} (\prod_{i} X_i)^{\alpha-1} e^{-\frac{\sum_i X_i}{\beta}}, \\
h(X) = 1.
\]
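A numerical check of the Gamma factorization is sketched below, with made-up positive data and illustrative function names. For numerical stability it stores \(\log \prod_i X_i = \sum_i \log X_i\) instead of the product itself. This is an invertible transformation of the first component of \(T\), so by the note in the section on minimal sufficient statistics it is still sufficient.

```python
import math

def gamma_loglik_full(xs, alpha, beta):
    """Log-likelihood of i.i.d. Gamma(shape=alpha, scale=beta) samples from raw data."""
    return sum(-math.lgamma(alpha) - alpha * math.log(beta)
               + (alpha - 1) * math.log(x) - x / beta for x in xs)

def gamma_loglik_from_stats(n, log_prod, total, alpha, beta):
    """Same log-likelihood, using only n, log(prod X_i) and sum X_i."""
    return (-n * (math.lgamma(alpha) + alpha * math.log(beta))
            + (alpha - 1) * log_prod - total / beta)

xs = [0.8, 2.3, 1.1, 4.0, 0.5]           # made-up positive observations
n = len(xs)
log_prod = sum(math.log(x) for x in xs)  # log of the product of the X_i
total = sum(xs)
for alpha, beta in [(1.0, 1.0), (2.5, 0.7), (0.5, 3.0)]:
    print(gamma_loglik_full(xs, alpha, beta),
          gamma_loglik_from_stats(n, log_prod, total, alpha, beta))
```

The two values should match for each `(alpha, beta)`, which reflects that the likelihood depends on the data only through the two components of \(T\).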