PMF
Suppose that a sample of size $n$ is chosen at random (without replacement) from an urn containing $N$ balls, of which $m$ are white and $N-m$ are black. If we let $X$ denote the number of white balls selected, then $$f(x; N, m, n) = \Pr(X = x) = {{m\choose x}{N-m\choose n-x}\over {N\choose n}}$$ for $x = 0, 1, 2, \cdots, n$.
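As a quick numerical illustration, the formula can be evaluated directly with `choose` and compared with R's built-in `dhyper`. This is only a sketch: the helper name `hyper_pmf` and the values $N = 20$, $m = 8$, $n = 5$ are assumptions chosen for illustration. Note that `dhyper` takes the number of black balls, $N-m$, rather than $N$.

```r
# Hypothetical helper: the pmf f(x; N, m, n) written out with choose()
hyper_pmf <- function(x, N, m, n) {
  choose(m, x) * choose(N - m, n - x) / choose(N, n)
}

# Illustrative values (assumptions, not from the text)
N <- 20  # balls in the urn
m <- 8   # white balls
n <- 5   # balls drawn
x <- 0:n

p_formula <- hyper_pmf(x, N, m, n)
p_builtin <- dhyper(x, m, N - m, n)  # arguments: x, white, black, draws

all.equal(p_formula, p_builtin)  # TRUE if both agree up to floating-point error
sum(p_formula)                   # total probability over x = 0, ..., n
```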
Proof:
This is essentially Vandermonde's identity: $${m+n\choose r} = \sum_{k=0}^{r}{m\choose k}{n\choose r-k}$$ where $m$, $n$, $k$, $r\in \mathbb{N}_0$. To see this, expand $(1+x)^{m+n}$ in two ways:
$$
\begin{align*}
\sum_{r=0}^{m+n}{m+n\choose r}x^r &= (1+x)^{m+n} \quad\quad\quad\quad\quad\quad\quad\quad \mbox{(binomial theorem)}\\
&= (1+x)^m(1+x)^n\\
&= \left(\sum_{i=0}^{m}{m\choose i}x^{i}\right)\left(\sum_{j=0}^{n}{n\choose j}x^{j}\right)\\
&= \sum_{r=0}^{m+n}\left(\sum_{k=0}^{r}{m\choose k}{n\choose r-k}\right)x^r \quad\quad\mbox{(product of two polynomials)}
\end{align*}
$$
The last step uses the product of two polynomials, with the convention that $a_k = 0$ for $k > m$ and $b_j = 0$ for $j > n$:
$$
\begin{eqnarray*}
\left(\sum_{i=0}^{m}a_i x^i\right)\left(\sum_{j=0}^{n}b_j x^j\right) &=& \left(a_0+a_1x+\cdots + a_mx^m\right)\left(b_0+b_1x+\cdots + b_nx^n\right)\\
&=& a_0b_0 + a_0b_1x +a_1b_0x + a_0b_2x^2 + a_1b_1x^2 + a_2b_0x^2 +\\
& &\cdots + a_mb_nx^{m+n}\\
&=& \sum_{r=0}^{m+n}\left(\sum_{k=0}^{r}a_{k}b_{r-k}\right)x^{r}
\end{eqnarray*}
$$
Comparing the coefficients of $x^r$ on both sides gives
$$
\begin{eqnarray*}
& &\sum_{r=0}^{m+n}{m+n\choose r}x^r = \sum_{r=0}^{m+n}\left(\sum_{k=0}^{r}{m\choose k}{n\choose r-k}\right)x^r\\
&\implies& {m+n\choose r} = \sum_{k=0}^{r}{m\choose k}{n\choose r-k}\\
&\implies& \sum_{k=0}^{r}{{m\choose k}{n\choose r-k}\over {m+n\choose r}} = 1
\end{eqnarray*}
$$
Applying the last line with $N-m$ in place of $n$ and the sample size in place of $r$ (so that $m+n$ becomes $N$) shows that the probabilities $f(x; N, m, n)$ sum to 1 over $x = 0, 1, \cdots, n$.
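The identity and the coefficient comparison behind it can be checked numerically. The sketch below uses the illustrative values $m = 6$, $n = 9$, $r = 7$ (assumptions, not from the text) and multiplies the coefficient vectors of $(1+x)^m$ and $(1+x)^n$ with an explicit double loop.

```r
# Illustrative values (assumptions): m = 6, n = 9, r = 7
m <- 6; n <- 9; r <- 7

# Vandermonde's identity for a single r
k <- 0:r
lhs <- choose(m + n, r)
rhs <- sum(choose(m, k) * choose(n, r - k))  # choose(m, k) is 0 when k > m

# Coefficients of (1+x)^m and (1+x)^n
a <- choose(m, 0:m)
b <- choose(n, 0:n)

# Product of the two polynomials: the coefficient of x^(i+j) collects a_i * b_j
coef <- numeric(m + n + 1)
for (i in 0:m) {
  for (j in 0:n) {
    coef[i + j + 1] <- coef[i + j + 1] + a[i + 1] * b[j + 1]
  }
}

all.equal(lhs, rhs)                          # the identity for this r
all.equal(coef, choose(m + n, 0:(m + n)))    # the identity for every r at once
```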
Mean
The expected value is $$\mu = E[X] = {nm\over N}$$
Proof:
For $k \geq 1$, the $x = 0$ term of the sum vanishes, so
$$
\begin{eqnarray*}
E[X^k] &=& \sum_{x=0}^{n}x^kf(x; N, m, n)\\
&=& \sum_{x=0}^{n}x^k{{m\choose x}{N-m\choose n-x}\over {N\choose n}}\\
&=& {nm\over N}\sum_{x=1}^{n} x^{k-1} {{m-1 \choose x-1}{N-m\choose n-x}\over {N-1 \choose n-1}}\\
& & (\mbox{identities: } x{m\choose x} = m{m-1\choose x-1},\ n{N\choose n} = N{N-1\choose n-1})\\
&=& {nm\over N}\sum_{y=0}^{n-1} (y+1)^{k-1} {{m-1 \choose y}{(N-1) - (m - 1)\choose (n-1)-y}\over {N-1 \choose n-1}}\quad\quad(\mbox{setting } y=x-1)\\
&=& {nm\over N}E\left[(Y+1)^{k-1}\right] \quad\quad\quad\quad (\mbox{since } Y\sim f(y; N-1, m-1, n-1))
\end{eqnarray*}
$$
Here $Y$ is hypergeometric with $N-1$ balls, $m-1$ white balls and sample size $n-1$. Hence, setting $k=1$, we have $$E[X] = {nm\over N}$$ Note that this matches the mean of the binomial distribution, $\mu = np$, where $p = {m\over N}$.
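The mean formula and the moment recursion can be checked numerically. This is a minimal sketch with illustrative values $N = 20$, $m = 8$, $n = 5$ (assumptions, not from the text); the recursion is tested for $k = 3$, where $Y$ is hypergeometric with $N-1$ balls, $m-1$ white balls and $n-1$ draws.

```r
# Illustrative values (assumptions): N = 20 balls, m = 8 white, n = 5 drawn
N <- 20; m <- 8; n <- 5

x  <- 0:n
px <- dhyper(x, m, N - m, n)           # pmf of X

mean_direct  <- sum(x * px)            # E[X] from the definition
mean_formula <- n * m / N              # E[X] = nm/N

# Recursion for k = 3: E[X^3] = (nm/N) * E[(Y+1)^2]
y  <- 0:(n - 1)
py <- dhyper(y, m - 1, N - m, n - 1)   # Y: m-1 white, N-m black, n-1 draws
lhs <- sum(x^3 * px)
rhs <- n * m / N * sum((y + 1)^2 * py)

all.equal(mean_direct, mean_formula)
all.equal(lhs, rhs)
```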
Variance
The variance is $$\sigma^2 = \mbox{Var}(X) = np(1-p)\left(1 - {n-1 \over N-1}\right)$$ where $p = {m\over N}$.
Proof:
Setting $k=2$ in the expression for $E[X^k]$ from the proof of the mean, and applying the mean formula to $Y$, gives
$$
\begin{align*}
E[X^2] &= {nm\over N}E[Y+1] \quad\quad\quad\quad (\mbox{setting } k=2)\\
&= {nm\over N}\left(E[Y] + 1\right)\\
&= {nm\over N}\left[{(n-1) (m-1) \over N-1}+1\right]
\end{align*}
$$
Hence the variance is
$$
\begin{align*}
\mbox{Var}(X) &= E\left[X^2\right] - E[X]^2\\
&= {mn\over N}\left[{(n-1) (m-1) \over N-1}+1 - {nm\over N}\right]\\
&= np \left[ (n-1) \cdot {pN-1\over N-1}+1-np\right] \quad\quad\quad\quad (\mbox{setting } p={m\over N})\\
&= np\left[(n-1)\cdot {p(N-1) + p -1 \over N-1} + 1 -np\right]\\
&= np\left[(n-1)p + (n-1)\cdot{p-1 \over N-1} + 1-np\right]\\
&= np\left[1-p - (1-p)\cdot {n-1\over N-1}\right] \\
&= np(1-p)\left(1 - {n-1 \over N-1}\right)
\end{align*}
$$
Note that the factor $1 - {n-1\over N-1}$ is approximately equal to 1 when $N$ is sufficiently large compared with $n$ (i.e. ${n-1\over N-1} \rightarrow 0$ as $N \rightarrow +\infty$ with $n$ fixed). The variance is then approximately that of the binomial distribution, $\sigma^2 = np(1-p)$, where $p = {m\over N}$.
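A short numerical check of the variance formula, together with the behaviour of the factor $1 - {n-1\over N-1}$ as $N$ grows, can be sketched as follows. The values $N = 20$, $m = 8$, $n = 5$ and the grid `N_big` are assumptions chosen for illustration.

```r
# Illustrative values (assumptions): N = 20 balls, m = 8 white, n = 5 drawn
N <- 20; m <- 8; n <- 5
p <- m / N

x  <- 0:n
px <- dhyper(x, m, N - m, n)

var_direct  <- sum(x^2 * px) - sum(x * px)^2            # E[X^2] - E[X]^2
var_formula <- n * p * (1 - p) * (1 - (n - 1) / (N - 1))

all.equal(var_direct, var_formula)

# The correction factor approaches 1 as N grows with n fixed
N_big <- c(20, 200, 2000, 20000)
fpc   <- 1 - (n - 1) / (N_big - 1)
fpc
```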
Examples
1. In a lotto game, seven balls are drawn randomly from an urn containing 37 balls numbered from 0 to 36. Calculate the probability $P$ of having exactly $k$ balls with an even number for $k=0, 1, \cdots, 7$.
Solution:
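Of the 37 numbers, 19 are even ($0, 2, \cdots, 36$) and 18 are odd, so with $N = 37$, $m = 19$ and $n = 7$ we have
$$P(X = k) = {{19\choose k}{18\choose 7-k}\over {37 \choose 7}}$$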
    # Exact probabilities of k = 0, ..., 7 even-numbered balls
    k <- 0:7
    p <- round(choose(19, k) * choose(18, 7 - k) / choose(37, 7), 3)
    p
    # [1] 0.003 0.034 0.142 0.288 0.307 0.173 0.047 0.005
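The same probabilities can also be obtained from R's built-in `dhyper`, which takes the number of white (here even) balls, the number of black (here odd) balls, and the number of draws. The sketch below compares it with the `choose` formula for this example; the variable names are chosen for illustration.

```r
# Lotto example: 19 even balls, 18 odd balls, 7 draws
k <- 0:7

p_choose <- choose(19, k) * choose(18, 7 - k) / choose(37, 7)
p_dhyper <- dhyper(k, 19, 18, 7)   # arguments: x, even, odd, draws

all.equal(p_choose, p_dhyper)      # both routes give the same probabilities
sum(p_dhyper)                      # total probability
sum(k * p_dhyper)                  # mean, to compare with 7 * 19 / 37
```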
2. Determine the same probabilities as in the previous problem, this time using the normal approximation.
Solution:
The mean is $$\mu = {nm\over N} = {7 \times 19\over 37} = 3.594595$$ and the standard deviation is $$\sigma = \sqrt{{nm\over N}\left(1-{m\over N}\right)\left(1 - {n-1\over N-1}\right)} = \sqrt{{7 \times 19\over 37}\left(1 - {19\over 37}\right) \left(1 - {7-1\over 37-1}\right)} = 1.207174$$ Evaluating the normal density with these parameters at $k = 0, 1, \cdots, 7$ gives the approximate probabilities:
    # Normal approximation: density of N(mu, s^2) evaluated at k = 0, ..., 7
    k <- 0:7
    mu <- 7 * 19 / 37
    s <- sqrt(7 * 19 / 37 * (1 - 19/37) * (1 - 6/36))
    p <- round(dnorm(k, mu, s), 3)
    p
    # [1] 0.004 0.033 0.138 0.293 0.312 0.168 0.045 0.006
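To see how close the approximation is, the exact and approximate values can be compared side by side. The sketch below also includes a continuity-corrected variant based on `pnorm`; that variant is not part of the solution above and is shown only as a commonly used alternative.

```r
# Lotto example: exact hypergeometric vs. normal approximation
k  <- 0:7
mu <- 7 * 19 / 37
s  <- sqrt(7 * 19 / 37 * (1 - 19/37) * (1 - 6/36))

p_exact   <- dhyper(k, 19, 18, 7)   # exact probabilities
p_density <- dnorm(k, mu, s)        # density-based approximation used above

# Assumption (not in the original solution): continuity correction,
# approximating P(X = k) by the normal probability of (k - 0.5, k + 0.5)
p_cc <- pnorm(k + 0.5, mu, s) - pnorm(k - 0.5, mu, s)

err_density <- max(abs(p_density - p_exact))
err_cc      <- max(abs(p_cc - p_exact))

round(cbind(k, p_exact, p_density, p_cc), 3)
c(err_density = err_density, err_cc = err_cc)
```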
Reference
- Ross, S. (2010). A First Course in Probability (8th Edition). Chapter 4. Pearson. ISBN: 978-0-13-603313-4.
- Brink, D. (2010). Essentials of Statistics: Exercises. Chapter 11. ISBN: 978-87-7681-409-0.