Analysis of variance

One-way ANOVA

One-way ANOVA deals with situations where the observations can be partitioned into a number of groups. In this module, we’ll only consider the case where the \(k\) groups are of equal size, with \(m\) observations per group.

Letting \(Y_{ij}\) denote the \(j\)-th observation in group \(i\), the one-way ANOVA model asserts that

\[\mathbb{E}[Y_{ij}]=\mu_i,\qquad \mathrm{Var}(Y_{ij})=\sigma^2,\qquad i=1,\ldots,k, \quad j=1,\ldots,m, \qquad Y_{ij}\; \mathrm{uncorrelated}.\]

The usual one-way ANOVA test is of \(H_0:\mu_1=\mu_2=\ldots=\mu_k\) against \(H_1:\) not all the means are equal.

This is done by constructing an ANOVA table:

Source	SS	DF	MS	F-ratio
Between groups	\(m\sum\limits_{i=1}^k (Y_{i\bullet}-Y_{\bullet\bullet})^2\)	\(k-1\)	\(MS_{GROUPS}=\frac{BETWEEN\;SS}{k-1}\)	\(MS_{GROUPS}/MS_{ERROR}\)
Within groups OR Error OR Residual	\(\sum\limits_{i=1}^k\sum\limits_{j=1}^m (Y_{ij}-Y_{i\bullet})^2\)	\(k(m-1)\)	\(MS_{ERROR}=\frac{WITHIN\;SS}{k(m-1)}\)
Total (corr)	\(\sum\limits_{i=1}^k\sum\limits_{j=1}^m (Y_{ij}-Y_{\bullet\bullet})^2\)	\(mk-1\)

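Recall that a dot denotes averaging over the corresponding subscript. If, in addition to the model assumptions above, the observations are normally distributed, then under \(H_0\) the \(F\)-ratio has an \(F_{k-1,k(m-1)}\) distribution. To test \(H_0\) at a given significance level, reject \(H_0\) if the \(F\)-ratio is greater than the corresponding upper percentage point of the \(F_{k-1,k(m-1)}\) distribution.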
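The table entries can be computed directly from their definitions. The following is a minimal sketch in plain Python, assuming equal group sizes as above; the function name `one_way_anova` and the data are made up purely for illustration.

```python
def one_way_anova(groups):
    """One-way ANOVA table for k groups of equal size m (illustrative helper)."""
    k = len(groups)
    m = len(groups[0])
    if any(len(g) != m for g in groups):
        raise ValueError("this sketch assumes equal group sizes")

    group_means = [sum(g) / m for g in groups]   # Y_{i.}
    grand_mean = sum(group_means) / k            # Y_{..}; valid because sizes are equal

    ss_between = m * sum((gm - grand_mean) ** 2 for gm in group_means)
    ss_within = sum((y - gm) ** 2
                    for g, gm in zip(groups, group_means) for y in g)
    ss_total = sum((y - grand_mean) ** 2 for g in groups for y in g)

    df_between, df_within = k - 1, k * (m - 1)
    ms_groups = ss_between / df_between
    ms_error = ss_within / df_within
    return {
        "ss_between": ss_between, "ss_within": ss_within, "ss_total": ss_total,
        "df_between": df_between, "df_within": df_within,
        "ms_groups": ms_groups, "ms_error": ms_error,
        "F": ms_groups / ms_error,
    }


# Made-up data: k = 3 groups, m = 3 observations each.
groups = [[4, 5, 6], [6, 7, 8], [8, 9, 10]]
table = one_way_anova(groups)
print(table["ss_between"], table["ss_within"], table["ss_total"])  # 24.0 6.0 30.0
print(table["F"])                                                  # 12.0
```

Here the between and within sums of squares add up to the total (corrected) sum of squares, \(24+6=30\). The \(F\)-ratio of 12 would be compared with the upper percentage point of \(F_{2,6}\); the tabulated upper 5% point is about 5.14, so \(H_0\) would be rejected at that level for this toy data set.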

Contrasts

A contrast between group means is defined as a quantity \(\lambda=\sum\limits_{i=1}^k c_i\mu_i\), where \(\sum\limits_{i=1}^k c_i=0\).

Two contrasts with coefficients \(c_1,\ldots,c_k\) and \(d_1,\ldots,d_k\) are orthogonal if the scalar product of their coefficients is zero, i.e. \(\sum\limits_{i=1}^k c_id_i=0\). Tests concerning orthogonal contrasts can be carried out independently of each other.

\(\lambda\) can be estimated as \(\hat\lambda=\sum\limits_{i=1}^k c_iY_{i\bullet}\) since the observed group means are unbiased estimators of the population means \(\mu_i\).

We can define the sum of squares associated with a contrast as

\[L(\hat\lambda)=\frac{m\hat\lambda^2}{\sum\limits_{i=1}^k c_i^2}.\]
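As a sketch of why this quantity is useful: the group means \(Y_{i\bullet}\) are uncorrelated, each with variance \(\sigma^2/m\), so

\[\mathrm{Var}(\hat\lambda)=\frac{\sigma^2}{m}\sum\limits_{i=1}^k c_i^2,\qquad \mathbb{E}[L(\hat\lambda)]=\frac{m\left(\mathrm{Var}(\hat\lambda)+\lambda^2\right)}{\sum\limits_{i=1}^k c_i^2}=\sigma^2+\frac{m\lambda^2}{\sum\limits_{i=1}^k c_i^2}.\]

Hence \(L(\hat\lambda)\) estimates \(\sigma^2\) when \(\lambda=0\) and tends to be larger otherwise. Under the additional assumption of normally distributed observations (not part of the model as stated above), \(H_0:\lambda=0\) can be tested by comparing \(L(\hat\lambda)/MS_{ERROR}\) with the upper percentage point of the \(F_{1,k(m-1)}\) distribution.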

If we can find \(k-1\) mutually orthogonal contrasts, then

\[L(\hat\lambda_1)+L(\hat\lambda_2)+\ldots+L(\hat\lambda_{k-1})=\text{BETWEEN GROUPS SS}.\]
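This decomposition can be checked numerically. The sketch below uses made-up group means for \(k=3\) groups of size \(m=3\) and two orthogonal contrasts; the helper names `contrast_ss` and `are_orthogonal` are hypothetical and chosen only for this example.

```python
def contrast_ss(coeffs, group_means, m):
    """Sum of squares L(lambda_hat) for a contrast, equal group sizes m."""
    if abs(sum(coeffs)) > 1e-12:
        raise ValueError("contrast coefficients must sum to zero")
    lam_hat = sum(c * ybar for c, ybar in zip(coeffs, group_means))
    return m * lam_hat ** 2 / sum(c * c for c in coeffs)


def are_orthogonal(c, d):
    """True if the scalar product of the two coefficient vectors is zero."""
    return abs(sum(ci * di for ci, di in zip(c, d))) < 1e-12


# Made-up group means Y_{i.} for k = 3 groups with m = 3 observations each.
group_means = [5.0, 7.0, 9.0]
m = 3

c1 = [1, -1, 0]   # group 1 versus group 2
c2 = [1, 1, -2]   # groups 1 and 2 together versus group 3

grand_mean = sum(group_means) / len(group_means)
ss_between = m * sum((ybar - grand_mean) ** 2 for ybar in group_means)

print(are_orthogonal(c1, c2))                   # True
print(contrast_ss(c1, group_means, m),
      contrast_ss(c2, group_means, m),
      ss_between)                               # 6.0 18.0 24.0
```

With \(k-1=2\) mutually orthogonal contrasts, the two contrast sums of squares, 6 and 18, add up to the between groups SS of 24.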

Two-way ANOVA

The natural extension of the one-way ANOVA situation is to classify observations into groups in two different ways: rows and columns. We’ll only consider the case where group sizes are equal.

Letting \(Y_{ijk}\) denote the \(k\)-th observation in row \(i\), column \(j\), the two-way ANOVA model asserts that

\[\mathbb{E}[Y_{ijk}]=\mu+\alpha_i+\beta_j+\gamma_{ij},\qquad\mathrm{Var}(Y_{ijk})=\sigma^2,\]

\[i=1,\ldots,r, \quad j=1,\ldots,c,\quad k=1,\ldots,m, \qquad Y_{ijk}\; \mathrm{uncorrelated}.\]

Here, \(\mu\) denotes a grand mean, \(\alpha_i\) the effect of the \(i\)-th level of the row factor, \(\beta_j\) the effect of the \(j\)-th level of the column factor, and \(\gamma_{ij}\) the interaction effect of combining the \(i\)-th level of the row factor with the \(j\)-th level of the column factor.

In order for the coefficients to be uniquely determinable, we impose extra conditions:

\[\sum\limits_{i=1}^r\alpha_i=0,\quad \sum\limits_{j=1}^c\beta_j=0,\]

\[\gamma_{i1}+\gamma_{i2}+\ldots+\gamma_{ic}=0,\quad \gamma_{1j}+\gamma_{2j}+\ldots+\gamma_{rj}=0.\]

It can be shown that unbiased estimators of \(\mu,\alpha_i,\beta_j,\gamma_{ij}\) are, respectively, \(Y_{\bullet\bullet\bullet},\;Y_{i\bullet\bullet}-Y_{\bullet\bullet\bullet},\;Y_{\bullet j\bullet}-Y_{\bullet\bullet\bullet}\) and \(Y_{ij\bullet}-Y_{i\bullet\bullet}-Y_{\bullet j \bullet}+Y_{\bullet\bullet\bullet}\).
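These estimators are straightforward to compute from the cell, row, column and grand means. The following is a minimal sketch, assuming the data are stored as a nested list `data[i][j]` holding the \(m\) observations in row \(i\), column \(j\); the layout, the function name `two_way_estimates` and the numbers are all made up for illustration.

```python
def two_way_estimates(data):
    """Estimates of mu, alpha_i, beta_j, gamma_ij for a balanced two-way layout."""
    r, c, m = len(data), len(data[0]), len(data[0][0])

    cell = [[sum(data[i][j]) / m for j in range(c)] for i in range(r)]  # Y_{ij.}
    row = [sum(cell[i]) / c for i in range(r)]                          # Y_{i..}
    col = [sum(cell[i][j] for i in range(r)) / r for j in range(c)]     # Y_{.j.}
    grand = sum(row) / r                                                # Y_{...}

    alpha = [row[i] - grand for i in range(r)]
    beta = [col[j] - grand for j in range(c)]
    gamma = [[cell[i][j] - row[i] - col[j] + grand for j in range(c)]
             for i in range(r)]
    return grand, alpha, beta, gamma


# Made-up data: r = 2 rows, c = 2 columns, m = 2 observations per cell.
data = [[[1, 3], [5, 7]],
        [[2, 4], [10, 12]]]

mu_hat, alpha_hat, beta_hat, gamma_hat = two_way_estimates(data)
print(mu_hat)      # 5.5
print(alpha_hat)   # [-1.5, 1.5]
print(beta_hat)    # [-3.0, 3.0]
print(gamma_hat)   # [[1.0, -1.0], [-1.0, 1.0]]
```

Note that the estimates satisfy the same constraints as the parameters: the \(\hat\alpha_i\) sum to zero, the \(\hat\beta_j\) sum to zero, and every row and column of the \(\hat\gamma_{ij}\) sums to zero.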

We can test whether the row effects, column effects or interaction effects are zero by constructing a two-way ANOVA table:

Source	SS	DF	MS	F-ratio
Row factor	\(mc\sum\limits_{i=1}^r (Y_{i\bullet\bullet}-Y_{\bullet\bullet\bullet})^2\)	\(r-1\)	\(MS_{ROW}=\frac{ROW\;SS}{r-1}\)	\(MS_{ROW}/MS_{ERROR}\)
Column factor	\(mr\sum\limits_{j=1}^c (Y_{\bullet j \bullet}-Y_{\bullet\bullet\bullet})^2\)	\(c-1\)	\(MS_{COL}=\frac{COL\;SS}{c-1}\)	\(MS_{COL}/MS_{ERROR}\)
Interaction (Row \(\times\) Column)	\(m\sum\limits_{i=1}^r\sum\limits_{j=1}^c (Y_{i j \bullet}-Y_{i\bullet\bullet}-Y_{\bullet j\bullet}+Y_{\bullet\bullet\bullet})^2\)	\((r-1)(c-1)\)	\(MS_{R\times C}=\frac{R\times C\;SS}{(r-1)(c-1)}\)	\(MS_{R\times C}/MS_{ERROR}\)
Within groups OR Error OR Residual	\(\sum\limits_{i=1}^r\sum\limits_{j=1}^c\sum\limits_{k=1}^m (Y_{ijk}-Y_{ij\bullet})^2\)	\(rc(m-1)\)	\(MS_{ERROR}=\frac{ERROR\;SS}{rc(m-1)}\)
Total (corr)	\(\sum\limits_{i=1}^r\sum\limits_{j=1}^c\sum\limits_{k=1}^m (Y_{ijk}-Y_{\bullet\bullet\bullet})^2\)	\(mcr-1\)

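Each \(F\)-statistic tests the hypothesis that the corresponding effects are all zero (all \(\alpha_i=0\), all \(\beta_j=0\), or all \(\gamma_{ij}=0\)). Under the relevant null hypothesis, and assuming normally distributed observations, each statistic follows an \(F\) distribution. The numerator degrees of freedom are those of the effect being tested. The denominator degrees of freedom are those associated with the Error/Residual/Within Groups SS, namely \(rc(m-1)\).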
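The two-way table can be sketched in the same way. The code below is an illustrative implementation for the balanced case only; the function name `two_way_anova`, the nested-list layout `data[i][j]` and the data are assumptions made for this example.

```python
def two_way_anova(data):
    """SS, DF and F-ratios for a balanced two-way layout with interaction."""
    r, c, m = len(data), len(data[0]), len(data[0][0])

    cell = [[sum(data[i][j]) / m for j in range(c)] for i in range(r)]  # Y_{ij.}
    row = [sum(cell[i]) / c for i in range(r)]                          # Y_{i..}
    col = [sum(cell[i][j] for i in range(r)) / r for j in range(c)]     # Y_{.j.}
    grand = sum(row) / r                                                # Y_{...}

    ss = {
        "row": m * c * sum((row[i] - grand) ** 2 for i in range(r)),
        "col": m * r * sum((col[j] - grand) ** 2 for j in range(c)),
        "interaction": m * sum((cell[i][j] - row[i] - col[j] + grand) ** 2
                               for i in range(r) for j in range(c)),
        "error": sum((y - cell[i][j]) ** 2
                     for i in range(r) for j in range(c) for y in data[i][j]),
    }
    df = {
        "row": r - 1,
        "col": c - 1,
        "interaction": (r - 1) * (c - 1),
        "error": r * c * (m - 1),
    }
    ms_error = ss["error"] / df["error"]
    # Each effect's mean square is divided by the error mean square.
    f_ratio = {name: (ss[name] / df[name]) / ms_error
               for name in ("row", "col", "interaction")}
    return ss, df, f_ratio


# Made-up data: r = 2 rows, c = 2 columns, m = 2 observations per cell.
data = [[[1, 3], [5, 7]],
        [[2, 4], [10, 12]]]

ss, df, f_ratio = two_way_anova(data)
print(ss)       # {'row': 18.0, 'col': 72.0, 'interaction': 8.0, 'error': 8.0}
print(df)       # {'row': 1, 'col': 1, 'interaction': 1, 'error': 4}
print(f_ratio)  # {'row': 9.0, 'col': 36.0, 'interaction': 4.0}
```

The four sums of squares add up to the total (corrected) SS of 106. Each \(F\)-ratio here would be compared with the upper percentage point of \(F_{1,4}\), whose tabulated upper 5% point is about 7.71: for this toy data set the row and column effects would be judged significant at that level, and the interaction would not.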

Slides

The slides (.ppt) used in the lecture include an example of one-way ANOVA.