Analysis of variance

One-way ANOVA

This deals with situations where observations can be partitioned into a number of groups. In this module, we’ll only consider the case where \(k\) groups are of equal size, i.e. \(m\) observations per group.

Letting \(Y_{ij}\) denote the \(j\)-th observation in group \(i\), the one-way ANOVA model asserts that

\[\mathbb{E}[Y_{ij}]=\mu_i,\qquad \mathrm{Var}(Y_{ij})=\sigma^2,\qquad i=1,\ldots,k, \quad j=1,\ldots,m, \qquad Y_{ij}\; \mathrm{uncorrelated}.\]

The usual one-way ANOVA test is of \(H_0:\mu_1=\mu_2=\ldots=\mu_k\) against \(H_1\): not all means are equal.

This is done by constructing an ANOVA table:

\[
\begin{array}{lllll}
\text{Source} & \text{SS} & \text{DF} & \text{MS} & F\text{-ratio}\\
\hline
\text{Between groups} & m\sum\limits_{i=1}^k (Y_{i\bullet}-Y_{\bullet\bullet})^2 & k-1 & MS_{GROUPS}=\frac{BETWEEN\;SS}{k-1} & MS_{GROUPS}/MS_{ERROR}\\
\text{Within groups (or Error, or Residual)} & \sum\limits_{i=1}^k\sum\limits_{j=1}^m (Y_{ij}-Y_{i\bullet})^2 & k(m-1) & MS_{ERROR}=\frac{WITHIN\;SS}{k(m-1)} & \\
\text{Total (corr)} & \sum\limits_{i=1}^k\sum\limits_{j=1}^m (Y_{ij}-Y_{\bullet\bullet})^2 & mk-1 & &
\end{array}
\]

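Recall that a dot denotes averaging over the subscript it replaces, e.g. \(Y_{i\bullet}=\frac{1}{m}\sum\limits_{j=1}^m Y_{ij}\). To test \(H_0\) as stated above, reject \(H_0\) if the \(F\)-ratio is greater than the appropriate upper percentage point of the \(F_{k-1,k(m-1)}\) distribution. The \(F\) reference distribution additionally assumes that the observations are normally distributed.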
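As a rough illustration of how the table entries are computed, here is a minimal sketch in plain Python. The function name `one_way_anova` and the small data set are hypothetical, chosen only for this example; the critical value of the \(F\) distribution is not computed here and would have to be looked up separately.

```python
def one_way_anova(groups):
    """Compute one-way ANOVA table entries for k groups of equal size m.

    `groups` is a list of k lists, each holding m observations.
    (Hypothetical helper written for illustration only.)
    """
    k = len(groups)
    m = len(groups[0])
    if any(len(g) != m for g in groups):
        raise ValueError("all groups must have the same size m")

    # Y_{i.}: group means; Y_{..}: grand mean (mean of group means,
    # which equals the overall mean because group sizes are equal).
    group_means = [sum(g) / m for g in groups]
    grand_mean = sum(group_means) / k

    ss_between = m * sum((gm - grand_mean) ** 2 for gm in group_means)
    ss_within = sum(
        (y - gm) ** 2 for g, gm in zip(groups, group_means) for y in g
    )
    ss_total = sum((y - grand_mean) ** 2 for g in groups for y in g)

    df_between, df_within = k - 1, k * (m - 1)
    ms_groups = ss_between / df_between
    ms_error = ss_within / df_within

    return {
        "ss_between": ss_between,
        "ss_within": ss_within,
        "ss_total": ss_total,
        "df_between": df_between,
        "df_within": df_within,
        "ms_groups": ms_groups,
        "ms_error": ms_error,
        "F": ms_groups / ms_error,
    }


# Made-up data: k = 3 groups, m = 3 observations each.
data = [[4.0, 5.0, 6.0], [6.0, 7.0, 8.0], [9.0, 10.0, 11.0]]
table = one_way_anova(data)
print(table["ss_between"], table["ss_within"], table["F"])
```

For this data the between-groups and within-groups sums of squares add up to the total sum of squares, and the resulting \(F\)-ratio would be compared with the upper percentage point of the \(F_{2,6}\) distribution.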

Contrasts

A contrast between group means is defined as a quantity \(\lambda=\sum\limits_{i=1}^k c_i\mu_i\), where \(\sum\limits_{i=1}^k c_i=0\).

Two contrasts with coefficients \(c_1,\ldots,c_k\) and \(d_1,\ldots,d_k\) are orthogonal if the scalar product of their coefficients is zero, i.e. \(\sum\limits_{i=1}^k c_id_i=0\). Tests concerning orthogonal contrasts can be carried out independently of each other.

\(\lambda\) can be estimated as \(\hat\lambda=\sum\limits_{i=1}^k c_iY_{i\bullet}\) since the observed group means are unbiased estimators of the population means \(\mu_i\).
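As a short check of the unbiasedness claim, and of how precise \(\hat\lambda\) is, the model assumptions can be applied directly. Since the observations are uncorrelated with common variance \(\sigma^2\), each group mean has \(\mathrm{Var}(Y_{i\bullet})=\sigma^2/m\) and different group means are uncorrelated, which gives:

\[\mathbb{E}[\hat\lambda]=\sum\limits_{i=1}^k c_i\,\mathbb{E}[Y_{i\bullet}]=\sum\limits_{i=1}^k c_i\mu_i=\lambda,\qquad \mathrm{Var}(\hat\lambda)=\sum\limits_{i=1}^k c_i^2\,\mathrm{Var}(Y_{i\bullet})=\frac{\sigma^2}{m}\sum\limits_{i=1}^k c_i^2.\]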

We can define the sum of squares associated with a contrast as

\[L(\hat\lambda)=\frac{m\hat\lambda^2}{\sum\limits_{i=1}^k c_i^2}.\]

If we can find \(k-1\) mutually orthogonal contrasts, then

\[L(\hat\lambda_1)+L(\hat\lambda_2)+\ldots+L(\hat\lambda_{k-1})=\text{BETWEEN GROUPS SS}.\]
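The decomposition above can be checked numerically. The following is a minimal sketch in plain Python; the helper name `contrast_ss`, the group means and the two contrasts are all hypothetical choices made for this example.

```python
def contrast_ss(coeffs, group_means, m):
    """Sum of squares L(lambda_hat) for one contrast, with m observations per group.

    (Hypothetical helper written for illustration only.)
    """
    if abs(sum(coeffs)) > 1e-12:
        raise ValueError("contrast coefficients must sum to zero")
    lam_hat = sum(c * gm for c, gm in zip(coeffs, group_means))
    return m * lam_hat ** 2 / sum(c * c for c in coeffs)


# Made-up example: k = 3 groups with m = 3 observations each.
m = 3
group_means = [5.0, 7.0, 10.0]

# Two contrasts: group 1 vs group 2, and groups 1 and 2 vs group 3.
c1 = [1, -1, 0]
c2 = [1, 1, -2]

# Orthogonality: the scalar product of the coefficient vectors is zero.
dot = sum(a * b for a, b in zip(c1, c2))

# Between-groups SS computed directly from the group means.
grand_mean = sum(group_means) / len(group_means)
ss_between = m * sum((gm - grand_mean) ** 2 for gm in group_means)

l1 = contrast_ss(c1, group_means, m)
l2 = contrast_ss(c2, group_means, m)
print(dot, l1, l2, l1 + l2, ss_between)
```

Here \(k-1=2\), so the two orthogonal contrasts together account for the whole between-groups sum of squares.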

Two-way ANOVA

The natural extension of the one-way ANOVA situation is to classify observations into groups in two different ways: rows and columns. We’ll only consider the case where every row-column combination contains the same number of observations.

Letting \(Y_{ijk}\) denote the \(k\)-th observation in row \(i\), column \(j\), the two-way ANOVA model asserts that

\[\mathbb{E}[Y_{ijk}]=\mu+\alpha_i+\beta_j+\gamma_{ij},\qquad\mathrm{Var}(Y_{ijk})=\sigma^2,\]
\[i=1,\ldots,r, \quad j=1,\ldots,c,\quad k=1,\ldots,m, \qquad Y_{ijk}\; \mathrm{uncorrelated}.\]

Here, \(\mu\) denotes a grand mean, \(\alpha_i\) the effect of the \(i\)-th level of the row factor, \(\beta_j\) the effect of the \(j\)-th level of the column factor, and \(\gamma_{ij}\) the interaction effect of combining the \(i\)-th level of the row factor with the \(j\)-th level of the column factor.

In order for the coefficients to be uniquely determinable, we impose extra conditions:

\[\sum\limits_{i=1}^r\alpha_i=0,\quad \sum\limits_{j=1}^c\beta_j=0,\]
\[\gamma_{i1}+\gamma_{i2}+\ldots+\gamma_{ic}=0,\quad \gamma_{1j}+\gamma_{2j}+\ldots+\gamma_{rj}=0.\]

We can derive that the unbiased estimators of \(\mu,\alpha_i,\beta_j,\gamma_{ij}\) are respectively \(Y_{\bullet\bullet\bullet},\;Y_{i\bullet\bullet}-Y_{\bullet\bullet\bullet},\;Y_{\bullet j\bullet}-Y_{\bullet\bullet\bullet}\) and \(Y_{ij\bullet}-Y_{i\bullet\bullet}-Y_{\bullet j \bullet}+Y_{\bullet\bullet\bullet}\).
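These estimators are straightforward to compute from the cell, row, column and grand means. Below is a minimal sketch in plain Python; the function name `two_way_estimates` and the data are hypothetical, used only to illustrate the formulas and to check that the estimates satisfy the sum-to-zero conditions.

```python
def two_way_estimates(y):
    """Estimate mu, alpha_i, beta_j, gamma_ij from a balanced two-way layout.

    y[i][j] is the list of m observations in row i, column j.
    (Hypothetical helper written for illustration only.)
    """
    r, c, m = len(y), len(y[0]), len(y[0][0])

    # Cell means Y_{ij.}, row means Y_{i..}, column means Y_{.j.}, grand mean Y_{...}
    cell = [[sum(y[i][j]) / m for j in range(c)] for i in range(r)]
    row = [sum(cell[i]) / c for i in range(r)]
    col = [sum(cell[i][j] for i in range(r)) / r for j in range(c)]
    grand = sum(row) / r

    alpha = [row[i] - grand for i in range(r)]
    beta = [col[j] - grand for j in range(c)]
    gamma = [
        [cell[i][j] - row[i] - col[j] + grand for j in range(c)]
        for i in range(r)
    ]
    return grand, alpha, beta, gamma


# Made-up data: r = 2 rows, c = 2 columns, m = 2 observations per cell.
y = [
    [[1.0, 3.0], [4.0, 6.0]],
    [[5.0, 7.0], [12.0, 14.0]],
]
mu_hat, alpha_hat, beta_hat, gamma_hat = two_way_estimates(y)
print(mu_hat, alpha_hat, beta_hat, gamma_hat)
```

By construction the estimated row effects, the estimated column effects, and each row and column of the estimated interactions sum to zero, mirroring the conditions imposed on the parameters.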

We can test whether the row effects, column effects or interaction effects are zero by constructing a two-way ANOVA table:

\[
\begin{array}{lllll}
\text{Source} & \text{SS} & \text{DF} & \text{MS} & F\text{-ratio}\\
\hline
\text{Row factor} & mc\sum\limits_{i=1}^r (Y_{i\bullet\bullet}-Y_{\bullet\bullet\bullet})^2 & r-1 & MS_{ROW}=\frac{ROW\;SS}{r-1} & MS_{ROW}/MS_{ERROR}\\
\text{Column factor} & mr\sum\limits_{j=1}^c (Y_{\bullet j \bullet}-Y_{\bullet\bullet\bullet})^2 & c-1 & MS_{COL}=\frac{COL\;SS}{c-1} & MS_{COL}/MS_{ERROR}\\
\text{Interaction (Row} \times \text{Column)} & m\sum\limits_{i=1}^r\sum\limits_{j=1}^c (Y_{i j \bullet}-Y_{i\bullet\bullet}-Y_{\bullet j\bullet}+Y_{\bullet\bullet\bullet})^2 & (r-1)(c-1) & MS_{R\times C}=\frac{INTERACTION\;SS}{(r-1)(c-1)} & MS_{R\times C}/MS_{ERROR}\\
\text{Within groups (or Error, or Residual)} & \sum\limits_{i=1}^r\sum\limits_{j=1}^c\sum\limits_{k=1}^m (Y_{ijk}-Y_{ij\bullet})^2 & rc(m-1) & MS_{ERROR}=\frac{ERROR\;SS}{rc(m-1)} & \\
\text{Total (corr)} & \sum\limits_{i=1}^r\sum\limits_{j=1}^c\sum\limits_{k=1}^m (Y_{ijk}-Y_{\bullet\bullet\bullet})^2 & mcr-1 & &
\end{array}
\]

The \(F\)-statistics test the corresponding hypotheses (are the \(\alpha_i\), the \(\beta_j\), or the \(\gamma_{ij}\) all zero?) and follow \(F\) distributions under those hypotheses. The numerator degrees of freedom are those of the effect being tested. The denominator degrees of freedom are those of the Error/Residual/Within Groups SS.
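To make the two-way table concrete, here is a minimal sketch in plain Python that computes each sum of squares and the three \(F\)-ratios for a balanced layout. The function name `two_way_anova` and the data are hypothetical. The interaction sum of squares is weighted by \(m\), the number of observations behind each cell mean, which is what makes the four components add up to the total sum of squares.

```python
def two_way_anova(y):
    """Two-way ANOVA table entries for a balanced layout with m >= 2 per cell.

    y[i][j] is the list of m observations in row i, column j.
    (Hypothetical helper written for illustration only.)
    """
    r, c, m = len(y), len(y[0]), len(y[0][0])

    cell = [[sum(y[i][j]) / m for j in range(c)] for i in range(r)]
    row = [sum(cell[i]) / c for i in range(r)]
    col = [sum(cell[i][j] for i in range(r)) / r for j in range(c)]
    grand = sum(row) / r

    ss = {
        "row": m * c * sum((row[i] - grand) ** 2 for i in range(r)),
        "col": m * r * sum((col[j] - grand) ** 2 for j in range(c)),
        "interaction": m * sum(
            (cell[i][j] - row[i] - col[j] + grand) ** 2
            for i in range(r) for j in range(c)
        ),
        "error": sum(
            (obs - cell[i][j]) ** 2
            for i in range(r) for j in range(c) for obs in y[i][j]
        ),
        "total": sum(
            (obs - grand) ** 2
            for i in range(r) for j in range(c) for obs in y[i][j]
        ),
    }
    df = {
        "row": r - 1,
        "col": c - 1,
        "interaction": (r - 1) * (c - 1),
        "error": r * c * (m - 1),  # needs m >= 2, otherwise there is no error df
        "total": m * c * r - 1,
    }
    ms_error = ss["error"] / df["error"]
    f_ratio = {
        name: (ss[name] / df[name]) / ms_error
        for name in ("row", "col", "interaction")
    }
    return ss, df, f_ratio


# Made-up data: r = 2 rows, c = 2 columns, m = 2 observations per cell.
y = [
    [[1.0, 3.0], [4.0, 6.0]],
    [[5.0, 7.0], [12.0, 14.0]],
]
ss, df, f_ratio = two_way_anova(y)
print(ss)
print(df)
print(f_ratio)
```

Each \(F\)-ratio here would be compared with an upper percentage point of the \(F\) distribution whose numerator degrees of freedom are `df["row"]`, `df["col"]` or `df["interaction"]` and whose denominator degrees of freedom are `df["error"]`.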

Slides

The slides (.ppt) used in the lecture include an example of one-way ANOVA.