PMF
Suppose there is a sequence of independent Bernoulli trials, each trial having two potential outcomes called "success" and "failure". In each trial the probability of success is $p$ and the probability of failure is $(1-p)$. We observe this sequence until a predefined number $r$ of failures has occurred. Then the random number of successes we have seen, $X$, has the negative binomial (or Pascal) distribution: $$f(x; r, p) = \Pr(X=x) = {x + r-1\choose x}p^{x}(1-p)^{r}$$ for $x = 0, 1, 2, \cdots$.
Proof:
$$
\begin{align*}
\sum_{x=0}^{\infty}P(X = x) &= \sum_{x=0}^{\infty} {x + r-1\choose x}p^{x}(1-p)^{r}\\
&= (1-p)^{r}\sum_{x=0}^{\infty} (-1)^{x}{-r\choose x}p^{x}\quad\quad (\mbox{identity } (-1)^{x}{-r\choose x} = {x+r-1\choose x})\\
&= (1-p)^{r}(1-p)^{-r}\quad\quad (\mbox{binomial theorem})\\
&= 1
\end{align*}
$$
Using the identity $(-1)^{x}{-r\choose x} = {x+r-1\choose x}$, where ${-r\choose x}$ is the generalized binomial coefficient ${-r\choose x} = {(-r)(-r-1)\cdots(-r-x+1)\over x!}$:
$$
\begin{align*}
{x+r-1\choose x} &= {(x+r-1)!\over x!(r-1)!}\\
&= {(x+r-1)(x+r-2)\cdots r\over x!}\\
&= (-1)^{x}{(-r-(x-1))(-r-(x-2))\cdots(-r)\over x!}\\
&= (-1)^{x}{(-r)(-r-1)\cdots(-r-(x-1))\over x!}\\
&= (-1)^{x}{-r\choose x}
\end{align*}
$$
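The normalization above can be checked numerically. The sketch below (an illustration, not part of the proof) truncates the infinite sum of the NB$(r; p)$ PMF at a large cutoff and confirms it is close to 1; the values $r = 4$, $p = 0.3$ are arbitrary choices for the check.

```python
# Numerical sanity check: the NB(r; p) PMF sums to 1 over x = 0, 1, 2, ...
from math import comb

def nb_pmf(x, r, p):
    """P(X = x): probability of x successes before the r-th failure."""
    return comb(x + r - 1, x) * p**x * (1 - p)**r

r, p = 4, 0.3  # arbitrary example parameters
total = sum(nb_pmf(x, r, p) for x in range(1000))  # tail beyond 1000 is negligible
print(total)  # close to 1
```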
Mean
The expected value is $$\mu = E[X] = {rp\over 1-p}$$
Proof:
$$
\begin{align*}
E[X] &= \sum_{x=0}^{\infty}x f(x; r, p)\\
&= \sum_{x=0}^{\infty}x{x + r-1\choose x}p^{x}(1-p)^{r}\\
&= \sum_{x=1}^{\infty}{(x+r-1)!\over(r-1)!(x-1)!}p^{x}(1-p)^{r}\\
&= \sum_{x=1}^{\infty}r{(x+r-1)!\over r(r-1)!(x-1)!}p^{x}(1-p)^{r}\\
&= {rp\over 1-p}\sum_{x=1}^{\infty}{x + r-1\choose x-1}p^{x-1}(1-p)^{r+1}\\
&= {rp\over 1-p}\sum_{y=0}^{\infty}{y+(r+1)-1\choose y}p^{y}(1-p)^{r+1}\quad\quad (\mbox{setting } y = x-1)\\
&= {rp\over 1-p}
\end{align*}
$$
where the last summation equals 1, since it sums the PMF of $Y\sim\mbox{NB}(r+1; p)$ over its whole support.
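The formula $E[X] = rp/(1-p)$ can also be checked empirically by simulating the underlying trial process. The sketch below (a minimal simulation, with arbitrarily chosen $r = 4$, $p = 0.5$) runs Bernoulli trials until the $r$-th failure and compares the average number of successes with the formula.

```python
# Monte Carlo check of E[X] = rp / (1 - p) for the negative binomial.
import random

def sample_nb(r, p, rng):
    """Run Bernoulli(p) trials until the r-th failure; return the success count."""
    successes = failures = 0
    while failures < r:
        if rng.random() < p:
            successes += 1
        else:
            failures += 1
    return successes

rng = random.Random(0)         # fixed seed for reproducibility
r, p = 4, 0.5                  # arbitrary example parameters
n = 100_000
mean_hat = sum(sample_nb(r, p, rng) for _ in range(n)) / n
print(mean_hat, r * p / (1 - p))  # sample mean vs theoretical mean 4.0
```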
Variance
The variance is $$\sigma^2 = \mbox{Var}(X) = {rp\over(1-p)^2}$$
Proof:
$$
\begin{align*}
E\left[X^2\right] &= \sum_{x=0}^{\infty}x^2 f(x; r, p)\\
&= \sum_{x=0}^{\infty}x^2{x + r-1\choose x}p^{x}(1-p)^{r}\\
&= \sum_{x=1}^{\infty}x{(x+r-1)!\over(r-1)!(x-1)!}p^{x}(1-p)^{r}\\
&= \sum_{x=1}^{\infty}rx{(x+r-1)!\over r(r-1)!(x-1)!}p^{x}(1-p)^{r}\\
&= {rp\over 1-p}\sum_{x=1}^{\infty}x{x + r-1\choose x-1}p^{x-1}(1-p)^{r+1}\\
&= {rp\over 1-p}\sum_{y=0}^{\infty}(y+1){y+(r+1)-1\choose y}p^{y}(1-p)^{r+1}\quad\quad (\mbox{setting } y = x-1)\\
&= {rp\over 1-p}\left(\sum_{y=0}^{\infty}y{y+(r+1)-1\choose y}p^{y}(1-p)^{r+1}+\sum_{y=0}^{\infty}{y+(r+1)-1\choose y}p^{y}(1-p)^{r+1}\right)\\
&= {rp\over 1-p}\left({(r+1)p\over 1-p} + 1\right)\quad\quad (Y\sim\mbox{NB}(r+1; p),\ E[Y] = {(r+1)p\over 1-p})\\
&= {rp\over 1-p}\cdot{rp+1\over 1-p}
\end{align*}
$$
Thus the variance is
$$
\begin{align*}
\mbox{Var}(X) &= E\left[X^2\right] - E[X]^2\\
&= {rp\over 1-p}\cdot{rp+1\over 1-p} - \left({rp\over 1-p}\right)^2\\
&= {rp\over 1-p}\left({rp+1\over 1-p} - {rp\over 1-p}\right)\\
&= {rp\over(1-p)^2}
\end{align*}
$$
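Both moments can be verified directly from the PMF. The sketch below (with arbitrarily chosen $r = 5$, $p = 0.4$) computes $E[X]$ and $E[X^2]$ by truncated summation and checks $\mbox{Var}(X) = E[X^2] - E[X]^2$ against $rp/(1-p)^2$.

```python
# Numeric check of Var(X) = E[X^2] - E[X]^2 = rp / (1 - p)^2.
from math import comb

def nb_pmf(x, r, p):
    """P(X = x) for the negative binomial distribution NB(r; p)."""
    return comb(x + r - 1, x) * p**x * (1 - p)**r

r, p = 5, 0.4                  # arbitrary example parameters
xs = range(500)                # truncation; the tail beyond 500 is negligible
ex = sum(x * nb_pmf(x, r, p) for x in xs)        # E[X]
ex2 = sum(x**2 * nb_pmf(x, r, p) for x in xs)    # E[X^2]
var = ex2 - ex**2
print(var, r * p / (1 - p)**2)
```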
Examples
1. Find the expected value and the variance of the number of times one must throw a die until the outcome 1 has occurred 4 times.
Solution:
Let $X$ be the number of throws and $Y$ be the number of successes in the trials. Call a throw a "success" if its outcome is not 1 and a "failure" if it is 1, so the success probability is $p = {5\over 6}$ and we stop once $r = 4$ failures (outcomes of 1) have occurred. Then $Y\sim\mbox{NB}(r; p)$, and since every throw is either a success or one of the 4 failures, $X = Y + 4$. Thus $$E[X] = E[Y+4] = E[Y] + 4 = {rp\over 1-p} + 4 = 24$$ $$\mbox{Var}(X) = \mbox{Var}(Y+4) = \mbox{Var}(Y) = {rp\over(1-p)^2} = 120$$
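The die example can be sanity-checked by direct simulation. The sketch below repeatedly throws a fair die until the outcome 1 has occurred 4 times and compares the sample mean and variance of the throw count with $E[X] = 24$ and $\mbox{Var}(X) = 120$.

```python
# Simulation of the die example: throw until the outcome 1 occurs 4 times.
import random

def throws_until_four_ones(rng):
    """Count die throws until the outcome 1 has appeared 4 times."""
    throws = ones = 0
    while ones < 4:
        throws += 1
        if rng.randint(1, 6) == 1:
            ones += 1
    return throws

rng = random.Random(42)        # fixed seed for reproducibility
n = 200_000
samples = [throws_until_four_ones(rng) for _ in range(n)]
mean_hat = sum(samples) / n
var_hat = sum((s - mean_hat)**2 for s in samples) / n
print(mean_hat, var_hat)       # should be near 24 and 120
```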
References
- Ross, S. (2010). A First Course in Probability (8th Edition). Chapter 4. Pearson. ISBN: 978-0-13-603313-4.
- Chen, H. Advanced Statistical Inference. Class Notes (PDF).