Discrete inference
The Binomial distribution
Hypothesis tests
If we observe \(r\) successes in \(n\) Bernoulli trials, we typically wish to test the hypotheses \(H_0: p=p_0\) vs \(H_1: p>p_0\) for some specified probability of success \(p_0\).
Assuming \(H_0\) is true, we calculate \(p_0^+=P(R\geq r)\). Then if \(p_0^+<\alpha\), we reject \(H_0\) at the \(100\alpha\) percent level.
If our alternative hypothesis is instead \(H_1: p < p_0\), calculate \(p_0^-=P(R\leq r)\). For \(H_1: p \neq p_0\), calculate \(p_0=2\min(p_0^+,p_0^-)\) and interpret in the usual fashion.
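A minimal sketch of these calculations in Python (the values \(r=8\), \(n=10\), \(p_0=0.5\) are made-up for illustration):

```python
from math import comb

def binom_pmf(k, n, p):
    """P(R = k) for R ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_pvalues(r, n, p0):
    """Return (p0_plus, p0_minus, p0_two_sided) for r observed successes."""
    p_plus = sum(binom_pmf(k, n, p0) for k in range(r, n + 1))   # P(R >= r)
    p_minus = sum(binom_pmf(k, n, p0) for k in range(0, r + 1))  # P(R <= r)
    return p_plus, p_minus, min(1.0, 2 * min(p_plus, p_minus))

# e.g. 8 successes in 10 trials, testing p0 = 0.5:
p_plus, p_minus, p_two = binom_pvalues(8, 10, 0.5)
# p_plus = P(R >= 8) = (45 + 10 + 1)/1024, about 0.055, so H0 is not
# rejected against H1: p > 0.5 at the 5 percent level.
```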
Approximations
Let \(R\sim Bin(n,p)\).
For large \(n\) and \(p\) not too close to 0 or 1, \[R\approx N(np,\,np(1-p)).\] For large \(n\) and small \(p\) (with \(np\) moderate), \[R\approx Po(np).\]
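A quick numerical check of both approximations (the parameter values here are chosen purely for illustration; the Normal comparison uses a continuity correction):

```python
from math import comb, erf, exp, factorial, sqrt

def binom_cdf(r, n, p):
    """Exact P(R <= r) for R ~ Bin(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(r + 1))

def normal_cdf(x, mu, var):
    """P(X <= x) for X ~ N(mu, var)."""
    return 0.5 * (1 + erf((x - mu) / sqrt(2 * var)))

def poisson_cdf(r, mu):
    """P(X <= r) for X ~ Po(mu)."""
    return exp(-mu) * sum(mu**k / factorial(k) for k in range(r + 1))

# Normal approximation: n large, p moderate (continuity-corrected at 45.5).
n, p = 100, 0.4
print(binom_cdf(45, n, p), normal_cdf(45.5, n * p, n * p * (1 - p)))

# Poisson approximation: n large, p small, np = 3.
print(binom_cdf(5, 1000, 0.003), poisson_cdf(5, 3.0))
```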
Confidence intervals for \(p\) (large \(n\))
The sample proportion \(\hat{p}=R/n\) is an unbiased estimator of \(p\), and has estimated standard error \(ESE(\hat{p})=\sqrt{\hat{p}(1-\hat{p})/n}.\)
It follows that a \(100(1-2\alpha)\) percent confidence interval for \(p\) is given by \(\hat{p}\pm z_\alpha ESE(\hat{p})\), where \(z_\alpha\) is the appropriate percentage point from the standard Normal tables.
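A sketch of the interval, taking \(z_{0.025}=1.96\) for a 95 percent interval (the counts are invented):

```python
from math import sqrt

def binom_ci(r, n, z=1.96):
    """Approximate 100(1 - 2*alpha) percent CI for p; z = 1.96 gives 95%."""
    p_hat = r / n                          # sample proportion
    ese = sqrt(p_hat * (1 - p_hat) / n)    # estimated standard error
    return p_hat - z * ese, p_hat + z * ese

lo, hi = binom_ci(40, 100)  # p_hat = 0.4, ESE about 0.049 -> roughly (0.30, 0.50)
```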
The Poisson distribution
Hypothesis tests
If events occur randomly at a constant rate \(\lambda\), then the number of events in time \(t\), denoted \(N(t)\), is distributed as \(Po(\lambda t)\).
Suppose we observe \(n\) events in time \(t\) and want to test \(H_0:\lambda=\lambda_0\) vs \(H_1:\lambda>\lambda_0\) for some specified value \(\lambda_0\).
Assuming \(H_0\) is true, we calculate \(p_0^+=P(N(t)\geq n)\). Then if \(p_0^+<\alpha\), we reject \(H_0\) at the \(100\alpha\) percent level.
For other alternative hypotheses, modify the \(p\)-value in a way analogous to the Binomial case above.
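A sketch of the upper-tail calculation (the values \(n=10\) events in \(t=2\) time units with \(\lambda_0=3\) are illustrative):

```python
from math import exp, factorial

def poisson_upper_pvalue(n, t, lam0):
    """p0_plus = P(N(t) >= n) under H0, where N(t) ~ Po(lam0 * t)."""
    mu = lam0 * t
    return 1 - exp(-mu) * sum(mu**k / factorial(k) for k in range(n))

p_plus = poisson_upper_pvalue(10, 2, 3.0)  # mu = 6; P(N(t) >= 10) is about 0.084
# p_plus > 0.05, so H0: lambda = 3 is not rejected at the 5 percent level.
```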
Approximations
For large \(m=\mathbb{E}[N(t)]=\lambda t\), \(N(t)\) is approximately \(N(m,m)\).
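Checking this approximation numerically for \(m=50\) (an illustrative value, with a continuity correction):

```python
from math import erf, exp, factorial, sqrt

def poisson_cdf(r, m):
    """Exact P(N <= r) for N ~ Po(m)."""
    return exp(-m) * sum(m**k / factorial(k) for k in range(r + 1))

def normal_cdf(x, mu, var):
    """P(X <= x) for X ~ N(mu, var)."""
    return 0.5 * (1 + erf((x - mu) / sqrt(2 * var)))

m = 50
print(poisson_cdf(55, m), normal_cdf(55.5, m, m))  # close for large m
```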
Confidence intervals for \(\lambda\) (large \(n\))
\(\hat{\lambda}=n/t\) is an unbiased estimator of \(\lambda\), and has estimated standard error \(ESE(\hat{\lambda})=\sqrt{\hat{\lambda}/t}.\)
It follows that a \(100(1-2\alpha)\) percent confidence interval for \(\lambda\) is given by \(\hat{\lambda}\pm z_\alpha ESE(\hat{\lambda})\), where \(z_\alpha\) is the appropriate percentage point from the standard Normal tables.
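A sketch with \(z_{0.025}=1.96\) (the counts are made up):

```python
from math import sqrt

def poisson_ci(n, t, z=1.96):
    """Approximate 100(1 - 2*alpha) percent CI for lambda; z = 1.96 gives 95%."""
    lam_hat = n / t              # unbiased estimator of lambda
    ese = sqrt(lam_hat / t)      # estimated standard error
    return lam_hat - z * ese, lam_hat + z * ese

lo, hi = poisson_ci(30, 10)  # lam_hat = 3, ESE about 0.55 -> roughly (1.93, 4.07)
```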
The \(\chi^2\) test
The chi-squared test allows us to test a theory by comparing observed numbers with expected numbers: could any observed discrepancies from the theory reasonably be put down to chance?

From a random sample, observe the number \(O_i\) falling into the \(i\)-th of \(k\) categories, and calculate \(E_i\), the expected number in category \(i\), assuming the null hypothesis. Then calculate the Pearson \(\chi^2\) statistic: \[\chi^2=\sum\limits_{i=1}^k \frac{(O_i-E_i)^2}{E_i}.\]

For a large sample size, and assuming the null hypothesis is true, the \(\chi^2\) statistic has an approximate chi-squared distribution with \(k-1-p\) degrees of freedom, where \(p\) is the number of parameters estimated from the data, i.e. \(\chi^2\sim\chi^2_{k-1-p}\) approximately.
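A minimal sketch of the statistic for a fair-die hypothesis (the 60 rolls are invented data; no parameters are estimated here, so there are \(k-1=5\) degrees of freedom):

```python
def chi2_stat(observed, expected):
    """Pearson chi-squared statistic: sum of (O_i - E_i)^2 / E_i."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

obs = [8, 12, 9, 11, 13, 7]        # counts of faces 1..6 in 60 rolls
exp_counts = [10] * 6              # E_i = 60 * (1/6) under H0: die is fair
stat = chi2_stat(obs, exp_counts)  # = 2.8, well below chi^2_5(0.05) = 11.07,
                                   # so H0 is not rejected at the 5 percent level
```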