Adapted from here.

One-way ANOVA

Intro

One-Way Analysis of Variance (ANOVA) is a method for comparing the means of \(a\) populations. This kind of problem arises in two different settings.

  • When \(a\) independent random samples are drawn from \(a\) populations.
  • When the effects of \(a\) different treatments on a homogeneous group of experimental units are studied, the group of experimental units is subdivided into \(a\) subgroups and one treatment is applied to each subgroup. The \(a\) subgroups are then viewed as independent random samples from \(a\) populations.

Assumptions required for One-Way ANOVA:

  1. Random samples are independently selected from each of the \(a\) (treatment) populations.
  2. The \(a\) populations are approximately normally distributed.
  3. All \(a\) population variances are equal.

Model

The assumptions are conveniently summarized in the following statistical model:

\[ X_{ij} = \mu_i + e_{ij} \]

where \(e_{ij}\) are i.i.d. \(N(0,\sigma^2)\), \(\;i=1,2,\ldots,a\), \(\;j=1,2,\ldots,n_i\).
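
To make the model concrete, here is a minimal sketch simulating one data set from it in R; the group means mu and the error standard deviation sigma are hypothetical values chosen only for illustration.

# simulate one draw from the one-way ANOVA model X_ij = mu_i + e_ij
set.seed(1)
a     <- 3                      # number of treatments / populations
ni    <- c(5, 5, 5)             # sample size in each group
mu    <- c(250, 262, 270)       # hypothetical group means (illustration only)
sigma <- 5                      # common error standard deviation

x     <- rep(mu, times = ni) + rnorm(sum(ni), mean = 0, sd = sigma)
group <- factor(rep(1:a, times = ni))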

Example: Tests were conducted to compare three top brands of golf balls for mean distance traveled when struck by a driver. A robotic golfer was employed with a driver to hit a random sample of 5 golf balls of each brand in a random sequence. Distance traveled, in yards, for each hit is shown in the table below.

Brand A Brand B Brand C
251.2 263.2 269.7
245.1 262.9 263.2
248.0 265.0 277.5
251.1 254.5 267.4
260.5 264.3 270.5

Suppose we want to compare the mean distance traveled by the three brands of golf balls based on the three samples. One-Way ANOVA provides a method to accomplish this.

Hypotheses

The hypotheses of interest in One-Way ANOVA are:

\[ \begin{aligned} H_0&:\mu_1=\mu_2=\ldots=\mu_a\\ H_A&:\mu_i\neq\mu_j\text{ for some $i$, $j$} \end{aligned} \]

  • In the above example, \(a=3\). So, according to \(H_0\), the mean distances traveled by the three brands of golf balls are all equal.
  • According to \(H_A\), at least one mean is not equal to the others.

Overview

Sums of squares

The total variability in the response \(X_{ij}\) is partitioned into between-treatment and within-treatment (error) components. When these component values are squared and summed over all the observations, terms called sums of squares are produced. There is an additive relation: the total sum of squares equals the treatment sum of squares plus the error sum of squares.

\[ SS_{Total}=SST+SSE \]

The notations \(SST\), \(SSTr\), \(SSTrt\), \(SS_{Tr}\), \(SStreatment\), and \(SS(Between)\) are all synonyms for “treatment sum of squares”. The abbreviations \(SSE\), \(SSerror\), and \(SS(Within)\) are all synonyms for “error sum of squares”. Associated with each sum of squares is its degrees of freedom: the total degrees of freedom is \(n-1\), the treatment degrees of freedom is \(a-1\), and the error degrees of freedom is \(n-a\). The degrees of freedom satisfy an additive relationship, just as the sums of squares do.

\[ n-1=(a-1)+(n-a) \]

Mean squares

Scaled versions of the treatment and error sums of squares (the sums of squares divided by their associated degrees of freedom) are known as mean squares:

\[ \begin{aligned} MST&=SST/(a-1)\\ MSE&=SSE/(n-a) \end{aligned} \]

\(MST\) and \(MSE\) are both estimates of the error variance, \(\sigma^2\). \(MSE\) is always unbiased (its mean equals \(\sigma^2\)), while \(MST\) is unbiased only when the null hypothesis is true. When the alternative \(H_A\) is true, \(MST\) will tend to be larger than \(MSE\). The ratio of the mean squares is \(F=MST/MSE\). This should be close to \(1\) when \(H_0\) is true, while large values of \(F\) provide evidence against \(H_0\). The null hypothesis \(H_0\) is rejected for large values of the observed test statistic \(F_{obs}\).

ANOVA Table

ANOVA calculations are conveniently displayed in the tabular form shown below, which is known as an ANOVA table.

Source \(SS\) \(df\) \(MS\) \(F_{obs}\) \(p\)-value
Treatments \(SST\) \(a\!-\!1\) \(MST\) \(MST/MSE\) \(P[F\geq F_{obs}]\)
Error \(SSE\) \(n\!-\!a\) \(MSE\)
Total \(SS_{Total}\) \(n\!-\!1\)

Details

  • \(a\) is the number of factor levels (treatments) or populations
  • \(x_{ij}\) is the \(j\)-th observation in the \(i\)-th sample, \(j=1,2,\ldots,n_i\)
  • \(n_i\) is the sample size for the \(i\)-th sample
  • \(n=\sum_{i=1}^an_i\) is the total number of observations
  • \(\bar{x}_{i.}=\frac1{n_i}\sum_{j=1}^{n_i}x_{ij}\) is the \(i\)-th sample mean
  • \(s_i^2=\frac1{n_i-1}\sum_{j=1}^{n_i}(x_{ij}-\bar{x}_{i.})^2\) is the \(i\)-th sample variance
  • \(\bar{x}_{..}=\frac1n\sum_{i=1}^a\sum_{j=1}^{n_i}x_{ij}=\frac1n\sum_{i=1}^an_i\bar{x}_{i.}\) is the grand mean of all observations

Here are the formulas for the sums of squares. We will see that there are simpler formulas when we know the sample means and sample variances for each of the \(a\) groups.

\[ \begin{aligned} SS_{Total}&=\sum_{i=1}^a\sum_{j=1}^{n_i}(x_{ij}-\bar{x}_{..})^2\\ SST&=\sum_{i=1}^a\sum_{j=1}^{n_i}(\bar{x}_{i.}-\bar{x}_{..})^2=\sum_{i=1}^an_i(\bar{x}_{i.}-\bar{x}_{..})^2\\ SSE&=\sum_{i=1}^a\sum_{j=1}^{n_i}(x_{ij}-\bar{x}_{i.})^2=\sum_{i=1}^a(n_i-1)s_i^2 \end{aligned} \]

The test statistic is

\[ F_{obs}=\frac{SST/(a-1)}{SSE/(n-a)} \]

and the p-value is \(P(F\geq F_{obs})\).

Note:

  • \(F_{obs}\) is the observed value of the test statistic
  • Under the null hypothesis \(F\) has an \(F\) distribution with \(a-1\) numerator and \(n-a\) denominator degrees of freedom
  • the \(p\)-value is \(P(F_{a-1,\,n-a}\geq F_{obs})\)
  • reject \(H_0\) at level \(\alpha\) when the \(p\text{-value}<\alpha\)
  • equivalently, when \(F_{obs}\geq F_{\alpha,\,a-1,\,n-a}\), which is the upper \(\alpha\)-th percentile of the \(F\) distribution with \(a-1\) numerator and \(n-a\) denominator degrees of freedom
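
To make the formulas concrete, here is a sketch of the whole computation in R, using the golf-ball data from the example below (entered directly as vectors); the example then works through the same quantities by hand.

# one-way ANOVA "by hand": sums of squares, F statistic, and p-value
A <- c(251.2, 245.1, 248.0, 251.1, 260.5)
B <- c(263.2, 262.9, 265.0, 254.5, 264.3)
C <- c(269.7, 263.2, 277.5, 267.4, 270.5)
x <- list(A, B, C)

a      <- length(x)                   # number of groups
ni     <- sapply(x, length)           # group sample sizes
n      <- sum(ni)                     # total sample size
xbar_i <- sapply(x, mean)             # group means
s2_i   <- sapply(x, var)              # group variances
xbar   <- sum(ni * xbar_i) / n        # grand mean

SST  <- sum(ni * (xbar_i - xbar)^2)   # treatment sum of squares
SSE  <- sum((ni - 1) * s2_i)          # error sum of squares
Fobs <- (SST / (a - 1)) / (SSE / (n - a))
pval <- 1 - pf(Fobs, a - 1, n - a)    # upper-tail F probability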

Example

Consider again the tests conducted to compare three brands of golf balls for mean distance traveled when struck by a driver. Again, the distances traveled, in yards, for each hit are shown below.

Brand A Brand B Brand C
251.2 263.2 269.7
245.1 262.9 263.2
248.0 265.0 277.5
251.1 254.5 267.4
260.5 264.3 270.5

Here we’ll compare the mean distance traveled by the different brands of golf balls using ANOVA.

\(i\) \(\bar{x}_{i.}\) \(s_i^2\) \(n_i\)
1 251.18 33.487 5
2 261.98 18.197 5
3 269.66 27.253 5
  1. Total sample size \(n=\sum_{i=1}^an_i=5+5+5=15\)
  2. \(a=3\) groups
  3. The treatment degrees of freedom is \(a-1=2\). The error degrees of freedom is \(n-a=15-3=12\)
  4. The grand mean is \[\bar{x}_{..}=\frac1n\sum_{i=1}^an_i\bar{x}_{i.}=(5\times251.18+5\times261.98+5\times269.66)/15=260.94\]
  5. The treatment sum of squares is \[SST=\sum_{i=1}^an_i(\bar{x}_{i.}-\bar{x}_{..})^2=5\Big[(251.18-260.94)^2+(261.98-260.94)^2+(269.66-260.94)^2\Big]=861.89\]
  6. The error sum of squares is \[SSE=\sum_{i=1}^a(n_i-1)s_i^2=4\Big[33.487+18.197+27.253\Big]=315.75\]
  7. The quantities can be summarized in an ANOVA table
    Source \(SS\) \(df\) \(MS\) \(F_{obs}\) \(p\)-value
    Treatments \(861.89\) \(2\) \(430.94\) \(16.378\) \(0.0003715\)
    Error \(315.75\) \(12\) \(26.31\)
    Total \(1177.64\) \(14\)
  8. The observed test statistic is \(F_{obs}=16.37\) with 2 numerator and 12 denominator degrees of freedom
  9. The p-value is \(P(F_{a-1,\,n-a}\geq F_{obs})=\) 1-pf(16.37,2,12) \(\approx3.723\times 10^{-4}\)
  10. Since the \(p\text{-value}<.05\), reject \(H_0\) at \(\alpha=0.05\) and conclude that the mean travel distances for the three brands of golf balls are not all the same

Check with R

library(tidyverse)

# create data frame
golf = data.frame(A = c(251.2,245.1,248.0,251.1,260.5),
                  B = c(263.2,262.9,265.0,254.5,264.3),
                  C = c(269.7,263.2,277.5,267.4,270.5))
print(golf)
##       A     B     C
## 1 251.2 263.2 269.7
## 2 245.1 262.9 263.2
## 3 248.0 265.0 277.5
## 4 251.1 254.5 267.4
## 5 260.5 264.3 270.5
# convert to long format
golf.long = golf %>% pivot_longer(1:3,"brand",values_to="distance") %>% arrange(brand)
print(golf.long,n=Inf)
## # A tibble: 15 x 2
##    brand distance
##    <chr>    <dbl>
##  1 A         251.
##  2 A         245.
##  3 A         248 
##  4 A         251.
##  5 A         260.
##  6 B         263.
##  7 B         263.
##  8 B         265 
##  9 B         254.
## 10 B         264.
## 11 C         270.
## 12 C         263.
## 13 C         278.
## 14 C         267.
## 15 C         270.
# make ANOVA table
aov.golf = aov(distance ~ brand, data=golf.long)
summary(aov.golf)
##             Df Sum Sq Mean Sq F value   Pr(>F)    
## brand        2  861.9   430.9   16.38 0.000372 ***
## Residuals   12  315.7    26.3                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Check if assumptions are violated:

par(mfrow=c(1,2))
plot(aov.golf,which=1:2,ask=F)
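
The plots above give an informal check. For a more formal (though still rough) check, the Shapiro-Wilk and Bartlett tests from base R can be applied to the fitted model; here is a sketch (these tests are an addition, not part of the original analysis):

shapiro.test(residuals(aov.golf))                  # normality of residuals
bartlett.test(distance ~ brand, data = golf.long)  # equal variances across brands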

Post-hoc analysis

Tukey HSD

Best for all pairwise comparisons. Easy to apply by using the TukeyHSD() function on the result of an aov() model.

TukeyHSD(aov.golf, conf.level=.95)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = distance ~ brand, data = golf.long)
## 
## $brand
##      diff        lwr      upr     p adj
## B-A 10.80  2.1448757 19.45512 0.0153590
## C-A 18.48  9.8248757 27.13512 0.0002718
## C-B  7.68 -0.9751243 16.33512 0.0841836
plot(TukeyHSD(aov.golf, conf.level=.95))

Bonferroni

Best for making a smaller set of planned comparisons. For example, suppose Brand A is a control while Brands B and C are new prototype ball designs, and we are only interested in comparing A with B and A with C. (This isn’t the best example, since with only 3 groups there aren’t many possible comparisons; in general, the Bonferroni correction is most useful when you plan to test only a small fraction of the possible comparisons.)

Probably the easiest way is to run pairwise.t.test() with p.adjust.method="none" and just manually compare the \(p\)-values to \(\alpha/m\) (where \(m\) is the number of tests you planned to run).

pairwise.t.test(golf.long$distance, golf.long$brand, p.adjust.method="none")
## 
##  Pairwise comparisons using t tests with pooled SD 
## 
## data:  golf.long$distance and golf.long$brand 
## 
##   A     B    
## B 0.006 -    
## C 1e-04 0.036
## 
## P value adjustment method: none
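
For instance, with the two planned comparisons A vs. B and A vs. C (so \(m=2\)), the manual Bonferroni check might look like the following sketch, using the unadjusted \(p\)-values printed above:

m     <- 2                                    # number of planned comparisons (A-B and A-C)
alpha <- 0.05
p_planned <- c("B-A" = 0.006, "C-A" = 1e-04)  # unadjusted p-values from pairwise.t.test() above
p_planned < alpha / m                         # TRUE where the planned comparison is significant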


Kruskal-Wallis test (nonparametric alternative)

The assumptions of the usual one way ANOVA are:

\[ X_{ij} = \mu_i + e_{ij} \]

where \(e_{ij}\) are i.i.d. \(N(0,\sigma^2)\), \(\;i=1,2,\ldots,a\), \(\;j=1,2,\ldots,n_i\).

If there is strong evidence of non-normality in the residuals, then one way ANOVA may not give a valid \(p\)-value. In such cases, the Kruskal-Wallis test is useful. This is the analogue of the Wilcoxon test for more than two populations.

The process is rather complicated, and you will NOT be expected to remember this, but in case you are curious, here is the equation:

\[ H=(n-1)\frac{\sum_{i=1}^a n_i(\bar{r}_{i\cdot}-\bar{r})^2}{\sum_{i=1}^a\sum_{j=1}^{n_i}(r_{ij}-\bar{r})^2} \]

where \(r_{ij}\) is the rank of \(x_{ij}\) in the entire sample (across all groups), \(\bar{r}_{i.}\) is the mean rank of all observations in group \(i\), and \(\bar{r}\) is the mean of all the \(r_{ij}\). Under the null hypothesis, \(H\) has approximately a \(\chi^2\) distribution with \(a-1\) degrees of freedom.

In R, this is done with:

kruskal.test(distance~brand, data=golf.long)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  distance by brand
## Kruskal-Wallis chi-squared = 10.864, df = 2, p-value = 0.004373
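
For the curious, here is a sketch computing \(H\) directly from the ranks of the golf data; rank() assigns average ranks to ties (the two distances of 263.2), and the result should match the chi-squared statistic above.

r    <- rank(golf.long$distance)            # ranks across the whole sample (average ranks for ties)
rbar <- mean(r)                             # overall mean rank
ri   <- tapply(r, golf.long$brand, mean)    # mean rank within each brand
ni   <- tapply(r, golf.long$brand, length)  # group sizes
n    <- length(r)

H <- (n - 1) * sum(ni * (ri - rbar)^2) / sum((r - rbar)^2)
H  # approximately 10.864, matching kruskal.test() above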

As with other nonparametric methods, the lack of a normality assumption makes the Kruskal-Wallis test more widely applicable than ANOVA, but at the cost of a bit of power. This is a common theme in statistics: when you can make assumptions about the distribution of the data, you have more information about it and can therefore detect smaller effect sizes, giving you more power. This is why parametric methods like ANOVA are so popular. However, when those assumptions are found to be violated, you settle for a nonparametric method; even though it gives slightly less power, it’s the best you have.


Two-way ANOVA

Intro

Example: Tests were conducted to assess the effects of two factors, engine type and propellant type, on propellant burn rate in fired missiles. Three engine types and four propellant types were tested.

Twenty-four missiles were selected from a large production batch. The missiles were randomly split into three groups of size eight. The first group of eight had engine type 1 installed, the second group had engine type 2, and the third group received engine type 3.

Each group of eight was randomly divided into four groups of two. The first such group was assigned propellant type 1, the second group was assigned propellant type 2, and so on. Data on burn rate were collected, as follows:

\[ \begin{array}{r|cccc} & \multicolumn{4}{c}{\text{Propellant type}}\\ \text{Engine type}&1&2&3&4\\ \hline 1&34.0&30.1&29.8&29.0\\ &32.7&32.8&26.7&28.9\\ \hline 2&32.0&30.2&28.7&27.6\\ &33.2&29.8&28.1&27.8\\ \hline 3&28.4&27.3&29.7&28.8\\ &29.3&28.9&27.3&29.1\\ \end{array} \]

We want to determine whether either factor, engine type (factor A) or propellant type (factor B), has a significant effect on burn rate.

Model

Let \(Y_{ijk}\) denote the \(k\)-th observation at the \(i\)-th level of factor A and the \(j\)-th level of factor B. The two-factor model (with interaction) is:

\[ Y_{ijk}=\mu+\alpha_i+\beta_j+\gamma_{ij}+\epsilon_{ijk} \]

where

  • \(\;i=1,2,\ldots,I\), \(\;j=1,2,\ldots,J\), \(\;k=1,2,\ldots,K\)
  • \(\epsilon_{ijk}\) are i.i.d. \(N(0,\sigma^2)\)
  • \(\mu\) is the overall mean
  • \(\sum_{i=1}^I\alpha_i=0\)
  • \(\sum_{j=1}^J\beta_j=0\)
  • \(\sum_{i=1}^I\gamma_{ij}=0\) for each \(\;j=1,2,\ldots,J\)
  • \(\sum_{j=1}^J\gamma_{ij}=0\) for each \(\;i=1,2,\ldots,I\)

Here we have included an interaction term, which is common in this setting. The interaction term allows the effect of one factor to depend on the level of the other. Without the interaction, the level of factor A (1, 2, or 3) would not be allowed to affect in any way how factor B influences the outcome; in other words, their effects on the outcome would be purely additive.

The mean of \(Y_{ijk}\) is

\[ E[Y_{ijk}]=\mu+\alpha_i+\beta_j+\gamma_{ij} \]

Note that the sum constraints ensure there is a unique correspondence between the parameters (the \(\alpha\)’s, \(\beta\)’s, \(\gamma\)’s, and \(\mu\)) and the cell means (the \(E[Y_{ijk}]\)’s).

Procedure

In the two-way ANOVA model, we compute a sum square for each factor, as well as the interaction. The formulae are slightly more complicated than one-way, but the intuition is basically the same. We want to compare the variation within to the variation between.

Again, we are only considering the balanced case. Below are the sum square formulae for main effect A, main effect B, interaction effect AB, error, and total:

\[ \begin{aligned} SSA&=JK\sum_i(\bar{y}_{i..}-\bar{y}_{...})^2&&\text{df: }I-1\\ SSB&=IK\sum_j(\bar{y}_{.j.}-\bar{y}_{...})^2&&\text{df: }J-1\\ SSAB&=K\sum_{i,j}(\bar{y}_{ij.}-\bar{y}_{i..}-\bar{y}_{.j.}+\bar{y}_{...})^2&&\text{df: }(I-1)(J-1)\\ SSE&=\sum_{i,j,k}(y_{ijk}-\bar{y}_{ij.})^2&&\text{df: }IJ(K-1)\\ SS_{Total}&=\sum_{i,j,k}(y_{ijk}-\bar{y}_{...})^2&&\text{df: }IJK-1\\ \end{aligned} \]

where a dot subscript has the usual meaning of averaging over that index. These formulae are significantly more complex, so we do not expect you to compute them by hand, but you should understand their definitions and be able to recognize them.

Each sum of squares is converted to a mean square by dividing by its degrees of freedom, and each effect (A, B, and AB) then gets its own \(F\)-statistic by dividing its mean square by the \(MSE\). To save time, we are not going to show all this manually; we’re just going to make the ANOVA table with R.

missiles.long = read.csv(text="
34.0,30.1,29.8,29.0
32.7,32.8,26.7,28.9
32.0,30.2,28.7,27.6
33.2,29.8,28.1,27.8
28.4,27.3,29.7,28.8
29.3,28.9,27.3,29.1", header=F) %>% 
   pivot_longer(1:4,names_to=NULL,values_to="rate") %>% 
   mutate(engine     = as.character(rep(1:3,each=8)),
          propellant = as.character(rep(1:4,length.out=24)))

head(missiles.long,12)
## # A tibble: 12 x 3
##     rate engine propellant
##    <dbl> <chr>  <chr>     
##  1  34   1      1         
##  2  30.1 1      2         
##  3  29.8 1      3         
##  4  29   1      4         
##  5  32.7 1      1         
##  6  32.8 1      2         
##  7  26.7 1      3         
##  8  28.9 1      4         
##  9  32   2      1         
## 10  30.2 2      2         
## 11  28.7 2      3         
## 12  27.6 2      4
aov.missiles = aov(rate~engine*propellant, data=missiles.long)
summary(aov.missiles)
##                   Df Sum Sq Mean Sq F value  Pr(>F)   
## engine             2  14.52   7.262   5.844 0.01690 * 
## propellant         3  40.08  13.361  10.753 0.00102 **
## engine:propellant  6  22.16   3.694   2.973 0.05117 . 
## Residuals         12  14.91   1.243                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
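
As a check on the definitions above, here is a sketch that reproduces the Sum Sq column from the grand, marginal, and cell means using the balanced-case formulas (not something you would normally do by hand):

I <- 3; J <- 4; K <- 2
grand   <- mean(missiles.long$rate)
ybar_i  <- with(missiles.long, tapply(rate, engine, mean))                    # engine means
ybar_j  <- with(missiles.long, tapply(rate, propellant, mean))                # propellant means
ybar_ij <- with(missiles.long, tapply(rate, list(engine, propellant), mean))  # cell means

SSA     <- J * K * sum((ybar_i - grand)^2)
SSB     <- I * K * sum((ybar_j - grand)^2)
SSAB    <- K * sum((ybar_ij - outer(ybar_i, ybar_j, "+") + grand)^2)
SSTotal <- sum((missiles.long$rate - grand)^2)
SSE     <- SSTotal - SSA - SSB - SSAB       # by the additive decomposition
c(SSA = SSA, SSB = SSB, SSAB = SSAB, SSE = SSE)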

Observations:

  • The \(p\)-value for the test of no interaction between propellant type and engine type is \(0.05117\), so we did not find a significant interaction between the two factors at \(\alpha=0.05\) (though only barely). In other words, their effects appear to act approximately independently of each other.
  • Note: It only makes sense to test for the main effects of factors A and B if there is no interaction between the factors (since if interaction is significant, then clearly some combination of A and B is important for predicting the outcome).
  • Here, the \(p\)-values for A and B main effects are \(0.017\) and \(0.0010\), so both effects are significant. We conclude engine and propellant types both significantly affect missile burn rate.
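
Although the interaction term is not significant here at the 5% level, an interaction plot is a common way to inspect it visually (roughly parallel profiles indicate little interaction). A sketch using base R:

with(missiles.long,
     interaction.plot(propellant, engine, rate,
                      xlab = "propellant type", trace.label = "engine",
                      ylab = "mean burn rate"))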

Check residuals just in case.

par(mfrow=c(1,2),cex=.6)
plot(aov.missiles,which=1:2,ask=F)

N-way (factorial) ANOVA

This can easily be extended to \(N\) factors. Here we show one final example (using R only) of a 3-way factorial ANOVA analysis.

The data represent crop yields for a number of plots on a research farm. Yields are measured at different irrigation levels (2 levels: irrigated or not), sowing densities (3 levels: low, medium, high), and fertilizer treatments (3 levels: N, P, NP).

Below is the data:

crops = read_table("https://raw.githubusercontent.com/shifteight/R-lang/master/TRB/data/splityield.txt")
str(crops)
## spec_tbl_df [72 x 5] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ yield     : num [1:72] 90 95 107 92 89 92 81 92 93 80 ...
##  $ block     : chr [1:72] "A" "A" "A" "A" ...
##  $ irrigation: chr [1:72] "control" "control" "control" "control" ...
##  $ density   : chr [1:72] "low" "low" "low" "medium" ...
##  $ fertilizer: chr [1:72] "N" "P" "NP" "N" ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   yield = col_double(),
##   ..   block = col_character(),
##   ..   irrigation = col_character(),
##   ..   density = col_character(),
##   ..   fertilizer = col_character()
##   .. )
head(crops,18)
## # A tibble: 18 x 5
##    yield block irrigation density fertilizer
##    <dbl> <chr> <chr>      <chr>   <chr>     
##  1    90 A     control    low     N         
##  2    95 A     control    low     P         
##  3   107 A     control    low     NP        
##  4    92 A     control    medium  N         
##  5    89 A     control    medium  P         
##  6    92 A     control    medium  NP        
##  7    81 A     control    high    N         
##  8    92 A     control    high    P         
##  9    93 A     control    high    NP        
## 10    80 A     irrigated  low     N         
## 11    87 A     irrigated  low     P         
## 12   100 A     irrigated  low     NP        
## 13   121 A     irrigated  medium  N         
## 14   110 A     irrigated  medium  P         
## 15   119 A     irrigated  medium  NP        
## 16    78 A     irrigated  high    N         
## 17    98 A     irrigated  high    P         
## 18   122 A     irrigated  high    NP

Running 3-way ANOVA with all interaction effects:

aov.crops = aov(yield~irrigation*density*fertilizer, data=crops)
summary(aov.crops)
##                               Df Sum Sq Mean Sq F value   Pr(>F)    
## irrigation                     1   8278    8278  59.575 2.81e-10 ***
## density                        2   1758     879   6.328  0.00340 ** 
## fertilizer                     2   1977     989   7.116  0.00181 ** 
## irrigation:density             2   2747    1374   9.885  0.00022 ***
## irrigation:fertilizer          2    953     477   3.431  0.03956 *  
## density:fertilizer             4    305      76   0.549  0.70082    
## irrigation:density:fertilizer  4    235      59   0.422  0.79183    
## Residuals                     54   7503     139                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Check residuals just in case.

par(mfrow=c(1,2),cex=.6)
plot(aov.crops,which=1:2,ask=F)