PMF
Suppose that independent trials, each with probability $p$, $0 < p < 1$, of being a success, are performed until a success occurs. If we let $X$ equal the number of failures before the first success, then the probability mass function of the geometric distribution is $$f(x; p) = \Pr(X=x) = (1-p)^{x}p$$ for $x=0, 1, 2, \cdots$.
Proof (that the probabilities sum to 1):
$$ \begin{align*} \sum_{x=0}^{\infty}f(x; p) &= \sum_{x=0}^{\infty}(1-p)^{x}p\\ &= p\sum_{x=0}^{\infty}(1-p)^{x}\\ &= p\cdot {1\over 1-(1-p)}\\ &= 1 \end{align*} $$
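As a quick numerical sanity check, the PMF can be coded directly and compared with R's built-in `dgeom`, which uses the same failures-before-first-success convention. This is only a sketch: the helper name `geom_pmf`, the value `p = 0.3`, and the truncation point 200 are illustrative assumptions.

```r
# Hypothetical helper: PMF of the number of failures before the first success
geom_pmf <- function(x, p) (1 - p)^x * p

p <- 0.3      # illustrative value; any 0 < p < 1 should behave the same way
x <- 0:200    # truncate the infinite sum; the omitted tail (1 - p)^201 is negligible

# Does the hand-written PMF agree with R's dgeom?
pmf_matches_dgeom <- isTRUE(all.equal(geom_pmf(x, p), dgeom(x, p)))

# Truncated version of the sum in the proof above
total_mass <- sum(geom_pmf(x, p))

pmf_matches_dgeom
total_mass
```

The truncated sum should agree with 1 up to floating-point error, mirroring the geometric-series argument in the proof.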
Mean
The expected value is $$\mu = E[X] = {1-p\over p}$$
Proof:
Firstly, we know that $$\sum_{x=0}^{\infty}p^x = {1\over 1-p}$$ where $0 < p < 1$. Differentiating both sides with respect to $p$ gives $$ \begin{align*} {d\over dp}\sum_{x=0}^{\infty}p^x &= \sum_{x=1}^{\infty}xp^{x-1}\\ &= {1\over(1-p)^2} \end{align*} $$ Applying this identity with $1-p$ in place of $p$, the expected value is $$ \begin{align*} E[X] &= \sum_{x=0}^{\infty}x(1-p)^{x}p\\ &= p(1-p)\sum_{x=1}^{\infty}x(1-p)^{x-1}\\ &= p(1-p){1\over(1-(1-p))^2}\\ &= {1-p\over p} \end{align*} $$
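The closed form for the mean can be checked numerically by summing $x f(x; p)$ over a long but finite range. A minimal sketch, assuming the illustrative value `p = 0.3` and a cutoff of 1000 terms (both arbitrary choices, not part of the derivation):

```r
p <- 0.3       # illustrative value
x <- 0:1000    # finite cutoff; terms beyond this are numerically zero

# E[X] as a (truncated) sum of x * f(x; p), using R's dgeom for f
mean_by_sum <- sum(x * dgeom(x, p))

# E[X] from the closed form derived above
mean_closed_form <- (1 - p) / p

c(by_sum = mean_by_sum, closed_form = mean_closed_form)
```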
Variance
The variance is $$\sigma^2 = \mbox{Var}(X) = {1-p\over p^2}$$
Proof:
$$ \begin{align*} E\left[X^2\right] &= \sum_{x=0}^{\infty}x^2(1-p)^{x}p\\ &= (1-p)\sum_{x=1}^{\infty}x^2(1-p)^{x-1}p \end{align*} $$ Rewrite the right-hand summation as $$ \begin{align*} \sum_{x=1}^{\infty} x^2(1-p)^{x-1}p &= \sum_{x=1}^{\infty} (x-1+1)^2(1-p)^{x-1}p\\ &= \sum_{x=1}^{\infty} (x-1)^2(1-p)^{x-1}p + \sum_{x=1}^{\infty} 2(x-1)(1-p)^{x-1}p + \sum_{x=1}^{\infty} (1-p)^{x-1}p\\ &= E\left[X^2\right] + 2E[X] + 1\\ &= E\left[X^2\right] + {2-p\over p} \end{align*} $$ where the third line follows from re-indexing each sum with $y = x-1$. Thus $$E\left[X^2\right] = (1-p)E\left[X^2\right] + {(1-p)(2-p) \over p}$$ That is, $$E\left[X^2\right] = {(1-p)(2-p)\over p^2}$$ So the variance is $$ \begin{align*} \mbox{Var}(X) &= E\left[X^2\right] - E[X]^2\\ &= {(1-p)(2-p)\over p^2} - {(1-p)^2\over p^2}\\ &= {1-p\over p^2} \end{align*} $$
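The second moment and the variance can be verified the same way, by truncated summation against `dgeom`. This is a sketch under assumed values (`p = 0.3`, cutoff 1000), not part of the proof:

```r
p <- 0.3       # illustrative value
x <- 0:1000    # finite cutoff; the omitted tail is numerically zero
pmf <- dgeom(x, p)

# Moments by direct (truncated) summation
ex  <- sum(x * pmf)
ex2 <- sum(x^2 * pmf)
var_by_sum <- ex2 - ex^2

# Closed forms from the derivation above
ex2_closed_form <- (1 - p) * (2 - p) / p^2
var_closed_form <- (1 - p) / p^2

c(ex2 = ex2, ex2_closed_form = ex2_closed_form)
c(var = var_by_sum, var_closed_form = var_closed_form)
```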
Examples
1. Let $X$ be geometrically distributed with probability parameter $p={1\over2}$. Determine the expected value $\mu$, the standard deviation $\sigma$, and the probability $P\left(|X-\mu| \geq 2\sigma\right)$. Compare with Chebyshev's Inequality.
Solution:
The probability mass function of the geometric distribution is $$f(x; p) = (1-p)^{x}p, \quad x=0, 1, 2, \cdots$$ The expected value is $$\mu = {1-p\over p} = 1$$ The standard deviation is $$\sigma = \sqrt{1-p\over p^2} = \sqrt{2} \approx 1.414214$$ Since $X \geq 0$, the event $|X-1| \geq 2\sigma$ can only occur in the upper tail, so the probability that $X$ lies at least two standard deviations from $\mu$ is $$P\left(|X-1| \geq 2.828427\right) = P(X\geq 4) = \left({1\over2}\right)^4 = 0.0625$$ R code:
1 - sum(dgeom(0:3, 1/2))  # P(X >= 4) = 1 - P(X <= 3); prints [1] 0.0625
Chebyshev's Inequality gives the weaker bound $$P\left(|X - \mu| \geq 2\sigma\right) \leq {1\over4} = 0.25$$
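The comparison in Example 1 can also be written so that both tails are handled explicitly, which makes it easy to try other multiples of $\sigma$. A minimal sketch: the variable names and the choice `k = 2` are illustrative, and `pgeom` is R's geometric CDF with the same failures-count convention as above.

```r
p     <- 1/2
mu    <- (1 - p) / p
sigma <- sqrt(1 - p) / p
k     <- 2    # number of standard deviations

# Exact P(|X - mu| >= k * sigma) = P(X <= mu - k*sigma) + P(X >= mu + k*sigma)
lower <- pgeom(floor(mu - k * sigma), p)    # 0 here, since mu - k*sigma < 0
upper <- pgeom(ceiling(mu + k * sigma) - 1, p, lower.tail = FALSE)
exact <- lower + upper

# Chebyshev's bound for the same event
chebyshev_bound <- 1 / k^2

c(exact = exact, chebyshev_bound = chebyshev_bound)
```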
2. A die is thrown until one gets a 6. Let $V$ be the number of throws used. What is the expected value of $V$? What is the variance of $V$?
Solution:
The PMF of the geometric distribution is $$f(x; p) = (1-p)^x p, \quad x = 0, 1, 2, \cdots$$ where $p = {1\over 6}$. Let $X = V-1$ be the number of throws before the first 6, so that $X$ has this PMF. The expected value of $V$ is $$ \begin{align*} E[V] &= E[X+1]\\ &= E[X] + 1\\ &= {1-p\over p} + 1\\ &= {1-{1\over6} \over {1\over6}} + 1\\ &= 6 \end{align*} $$ The variance of $V$ is $$ \begin{align*} \mbox{Var}(V) &= \mbox{Var}(X+1)\\ &= \mbox{Var}(X)\\ &= {1-p\over p^2}\\ &= {1-{1\over 6} \over \left({1\over6}\right)^2}\\ &= 30 \end{align*} $$ Note that $V$ follows another form of the geometric distribution, the so-called shifted geometric distribution, which counts the number of trials required (including the final success) rather than the number of failures. By the above process we can see that the expected value of the shifted geometric distribution is $$\mu = {1\over p}$$ and the variance of the shifted geometric distribution is $$\sigma^2 = {1-p\over p^2}$$
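The shift $V = X + 1$ can be checked numerically by summing over the failure count $x$ and evaluating the moments of $v = x + 1$. This is a sketch only; the cutoff of 2000 terms is an arbitrary assumption chosen so that the omitted tail is negligible.

```r
p <- 1/6
x <- 0:2000     # failures before the first 6 (finite cutoff)
v <- x + 1      # throws used, including the one showing the 6
pmf <- dgeom(x, p)

# Moments of V by direct (truncated) summation
mean_v <- sum(v * pmf)
var_v  <- sum(v^2 * pmf) - mean_v^2

c(mean_v = mean_v, closed_form = 1 / p)
c(var_v = var_v, closed_form = (1 - p) / p^2)
```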
3. Assume $W$ is geometrically distributed with probability parameter $p$. What is $P(W < n)$?
Solution:
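Here $W$ counts failures, as in the PMF above, and $n$ is a non-negative integer. $$ \begin{align*} P(W < n) &= 1 - P(W \geq n)\\ &= 1 - \sum_{x=n}^{\infty}(1-p)^{x}p\\ &= 1-(1-p)^n \end{align*} $$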
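For integer $n$, $P(W < n) = P(W \leq n-1)$, so the result can be compared with R's cumulative distribution function `pgeom`. A minimal sketch with assumed illustrative values `p = 0.2` and `n = 0, ..., 10`:

```r
p <- 0.2     # illustrative value
n <- 0:10    # illustrative range of thresholds

closed_form <- 1 - (1 - p)^n
via_pgeom   <- pgeom(n - 1, p)    # P(W <= n - 1), which equals P(W < n) for integer n

cdf_matches <- isTRUE(all.equal(closed_form, via_pgeom))
cdf_matches
```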
4. In order to test whether a given die is fair, it is thrown until a 6 appears, and the number $n$ of throws is counted. How large must $n$ be before we can reject the null hypothesis $$H_0: \mbox{the die is fair}$$ against the alternative hypothesis $$H_1: \mbox{the probability of having a 6 is less than 1/6}$$ at significance level $5\%$?
Solution:
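Taking $n$ to be the number of throws that fail to show a 6 before the first 6 appears (the convention of the PMF above), the probability under $H_0$ of observing at least $n$ such throws (i.e. the significance probability) is $$P = \left(1 - {1\over 6}\right)^n$$ We will reject $H_0$ if $P < 0.05$. (If $n$ instead counts every throw, including the one showing the 6, the corresponding probability is $\left({5\over6}\right)^{n-1}$ and the threshold shifts up by one.) R code: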
n <- 1
while (TRUE) {
  p <- (5/6)^n           # significance probability for this n under H0
  if (p < 0.05) break    # stop at the first n that is significant at 5%
  n <- n + 1
}
n  # [1] 17
That is, we reject $H_0$ if $n$ is at least 17.
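The loop above searches for the threshold step by step; since $(5/6)^n < 0.05$ is equivalent to $n > \log(0.05)/\log(5/6)$, the same value can also be obtained in closed form. A minimal sketch, with the variable names `alpha`, `q`, and `n_min` chosen here for illustration:

```r
alpha <- 0.05    # significance level
q     <- 5/6     # probability of not throwing a 6 under H0

# Smallest integer n with q^n < alpha, i.e. n > log(alpha) / log(q)
n_min <- floor(log(alpha) / log(q)) + 1

# Significance probabilities just below and at the threshold
c(n_min = n_min, p_before = q^(n_min - 1), p_at = q^n_min)
```

Comparing `p_before` and `p_at` against `alpha` shows that the threshold is tight: one fewer throw would not be significant at this level.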