STA 610L: Module 2.12

class: center, middle, inverse, title-slide

# STA 610L: Module 2.12
## Random effects ANCOVA (introduction)
### Dr. Olanrewaju Michael Akande

---

## Introduction

- ANOVA, ANCOVA, MANOVA: what is the difference?

- .hlight[ANOVA (Analysis of Variance)]: continuous outcome, categorical predictor(s)
    - one-way ANOVA: one categorical predictor
    - two-way ANOVA: two categorical predictors
    - two-way ANOVA with interaction: you get the picture!
  
- .hlight[ANCOVA (Analysis of Covariance)]: continuous outcome, categorical predictor(s), at least one continuous predictor that is generally considered a "nuisance".

- .hlight[MANOVA (Multivariate ANOVA)]: multiple continuous outcomes, categorical predictor(s).

Historically these names had implications regarding the estimation methods used, but that is no longer always the case.

---
## Motivating example: National Educational Longitudinal Study of Education (NELS)

Hoff considers a subset of the NELS data that contains information on math scores of a random sample of 10th graders selected from a national sample of 685 large urban public schools.

We plot the math scores `$y_{ij}$` of the `$n_j$` students in each school `$j$`, ranked by the average score.

---
## NELS example

```r
load('data/nels.Rdata')
head(nels)
```

```
##   school enroll flp public urbanicity hwh   ses mscore
## 1   1011      5   3      1      urban   2 -0.23  52.11
## 2   1011      5   3      1      urban   0  0.69  57.65
## 3   1011      5   3      1      urban   4 -0.68  66.44
## 4   1011      5   3      1      urban   5 -0.89  44.68
## 5   1011      5   3      1      urban   3 -1.28  40.57
## 6   1011      5   3      1      urban   5 -0.93  35.04
```

```r
str(nels)
```

```
## 'data.frame':	12974 obs. of  8 variables:
##  $ school    : Factor w/ 684 levels "1011","1012",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ enroll    : num  5 5 5 5 5 5 5 5 5 5 ...
##  $ flp       : num  3 3 3 3 3 3 3 3 3 3 ...
##  $ public    : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ urbanicity: Factor w/ 3 levels "rural","suburban",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ hwh       : num  2 0 4 5 3 5 1 4 8 2 ...
##  $ ses       : num  -0.23 0.69 -0.68 -0.89 -1.28 -0.93 0.36 -0.24 -1.07 -0.1 ...
##  $ mscore    : num  52.1 57.6 66.4 44.7 40.6 ...
```

---
## NELS example

```r
avmscore.schools<-tapply(nels$mscore,nels$school,mean,na.rm=TRUE)
id.schools<-names(avmscore.schools)
m<-length(id.schools)

plot(c(1,m),range(nels$mscore), type="n",ylab="math score", xlab="rank of  school-specific math score  average",cex=.7)

for(school in id.schools[order( avmscore.schools )[seq(1,length(avmscore.schools),by=1)]])
{
  y<-nels$mscore[nels$school==school]
  x<-rank(avmscore.schools)[ id.schools==school]
  points( rep(x,length(y)), y,pch=16,cex=.6 ) 
  points(x, mean(y),col="blue",pch=16,cex=.8) 
  segments( x,min(y),x,max(y))
}

abline(h=mean(avmscore.schools))
```

---
## NELS example

---
## NELS example

- The school-specific averages range from 24 to 69, with 51 the average of all 685 school averages (weighting each school equally).

- The school-specific variances range from 3 to 187 -- quite a wide range!

- The school with the highest average only contains 4 observations.

---
## Which school is best?

The school with the highest average has a very small sample size `$(n_j=4)$`. Do we have strong evidence that the true mean in this school is substantially larger than that in other schools in the sample?

---
## ANOVA

One approach would be to fit a "standard" ANOVA model:

```r
m1 <- lm(mscore~school,data=nels)
anova(m1)
```

```
## Analysis of Variance Table
## 
## Response: mscore
##              Df Sum Sq Mean Sq F value    Pr(>F)
## school      683 342385  501.30  6.8118 < 2.2e-16
## Residuals 12290 904450   73.59
```

```r
summary(aov(mscore~school,data=nels))
```

```
##                Df Sum Sq Mean Sq F value Pr(>F)
## school        683 342385   501.3   6.812 <2e-16
## Residuals   12290 904450    73.6
```

Here we see clear evidence of heterogeneity in math scores across schools.

---
## ANOVA results

```r
library(sjPlot)
plot_model(m1,sort.est=TRUE)
```

---
## ANOVA results

Based on these estimates, we might conclude that the school has higher performance than some, but not all, schools.

---
## Random effects ANOVA

We may then wish to use shrinkage estimation in order to stabilize that and other estimates for schools in which few students provide data, as we have done a few times now.

A random effects ANOVA model is given by `$$y_{ij}=\mu+\alpha_j+\varepsilon_{ij},$$` where `$\varepsilon_{ij} \sim N(0,\sigma^2)$` and `$\alpha_j \sim N(0,\tau^2)$`.

```r
library(lme4)
m2 <- lmer(mscore~(1|school),data=nels)
summary(m2)
library(sjstats)
icc(m2)
```

---
## Random effects ANOVA

```
## Linear mixed model fit by REML ['lmerMod']
## Formula: mscore ~ (1 | school)
##    Data: nels
## 
## REML criterion at convergence: 93914.6
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.8113 -0.6534  0.0094  0.6732  4.7000 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  school   (Intercept) 23.68    4.866   
##  Residual             73.71    8.585   
## Number of obs: 12974, groups:  school, 684
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  50.9390     0.2028   251.2
```

```
## # Intraclass Correlation Coefficient
## 
##      Adjusted ICC: 0.243
##   Conditional ICC: 0.243
```

---
## Random effects ANOVA
Here we examine the distribution of random effects.

```r
library(lattice)
dotplot(ranef(m2, condVar=TRUE))
#OR
library(merTools)
plotREsim(REsim(m2,n.sims=100),stat='median',sd=TRUE)
```

---
## Random effects ANOVA

---
## Random effects ANOVA

---
## Random effects ANOVA

How do we conduct a formal test of heterogeneity in this random effects setting? Well, this is a bit more complicated than in the .hlight[fixed effects setting].

In particular, no heterogeneity corresponds to the case in which `$\tau^2=0 \iff \alpha_1=\ldots=\alpha_J=0$`, so saying something about the single parameter `$\tau^2$` has implications about the J parameters `$\alpha_j$`.

A second problem is that `$\tau^2$` cannot be `$<0$`, and we wish to test `$H_0: \tau^2=0$`, so we're conducting a hypothesis test at the boundary of the parameter space instead of in the interior (which would be the case for `$H_0: \mu=0$`).

---
## Random effects ANOVA

As shown in Stram and Lee (1994), the approximate asymptotic null distribution for `$H_0: \tau^2=0$` using a likelihood ratio test comparing our model to a model without random effects `$(y_{ij}=\mu+\varepsilon_{ij})$` in this case is a 50-50 mixture of a `$\chi^2_0$` (point mass on 0) and a `$\chi_1^2$` distribution.

In general, if we wish to compare a model with `$q+1$` random effects (calculated as terms that have a random effect, not the number of groups) to a nested model with `$q$` random effects, the asymptotic null distribution is a 50-50 mixture of `$\chi^2_{q+1}$` and `$\chi^2_q$` distributions.

---
## Random effects ANOVA

Letting LR denote twice the difference in maximized log-likelihoods in the model with and without a single random effect, you can obtain the null density in R using `$$0.5*(\text{dchisq}(x,q+1)+\text{dchisq}(x,q))$$` and the p-value via `$$0.5*(1-\text{pchisq(LR,q+1)}+1-\text{pchisq}(LR,q)).$$`

---
## Random effects ANOVA

For the NELS data fit using a frequentist random effects model, we would calculate this as follows.

```r
m3 <- lmer(mscore~(1|school),data=nels,REML=FALSE) #ML estimation
m4 <- lm(mscore~1,data=nels)
LR <- 2*(logLik(m3)-logLik(m4))
LR
```

```
## 'log Lik.' 2137.067 (df=3)
```

```r
0.5*(1-pchisq(LR[1],1)+1-pchisq(LR[1],0))
```

```
## [1] 0
```

```r
anova(m3,m4)
```

```
## Data: nels
## Models:
## m4: mscore ~ 1
## m3: mscore ~ (1 | school)
##    npar   AIC   BIC logLik deviance  Chisq Df Pr(>Chisq)
## m4    2 96054 96069 -48025    96050                     
## m3    3 93919 93942 -46957    93913 2137.1  1  < 2.2e-16
```

We conclude that there is significant heterogeneity across schools in the mean math scores.

---
## Bringing SES into the mix

NELS contains a measure of socioeconomic status (SES) for each student.

We generally expect some degree of correlation between SES and math score (people who are good at math often can get good jobs, and then have kids who may inherit math talents; rich parents may have more time and resources to devote to their kids).

Of course the relationship is not deterministic (there are plenty of math whizzes who did not have rich parents -- Gauss!, and there are plenty of rich parents who have kids who do not make good math scores -- college admissions scandal!).

---
## Bringing SES into the mix

Let's look overall at the association between SES and math score in NELS.

---
## Big picture

Consider "simulated" data on schools, which we represent using red, green, and blue points on the plot on the next slide, respectively.

The schools we illustrate include one low SES school, one middle SES school, and one high SES school.

Let's consider multiple ways in which we could obtain the marginal association between SES and math score on the previous slide.

---
## Big picture

---
## Random effects ANCOVA

We want our model to be able to help us understand how SES `$(x_{ij})$` and math scores are related in schools.

In the framework of the model
`$$y_{ij} = \beta_{0,j} + \beta_{1,j} x_{ij} + \varepsilon_{ij},$$`

what values of `$\beta_{j}$` are consistent with these figures?

One way to assess how SES is related to math score is to start with an ANCOVA model, allowing school-specific intercepts while including SES as a covariate `$x_{ij}$`:
`$$y_{ij}=\beta_{0,j}+\beta_1x_{ij} + \varepsilon_{ij}.$$`

In this model, we estimate the same effect of SES for each school.

---
## Random effects ANCOVA

.large[

```r
m5 <- lmer(mscore~ses+(1|school),data=nels)
summary(m5)
```

```
## Linear mixed model fit by REML ['lmerMod']
## Formula: mscore ~ ses + (1 | school)
##    Data: nels
## 
## REML criterion at convergence: 92558.1
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.8753 -0.6428  0.0165  0.6693  4.4322 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  school   (Intercept) 12.22    3.495   
##  Residual             68.03    8.248   
## Number of obs: 12974, groups:  school, 684
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  50.7175     0.1542  328.99
## ses           4.3766     0.1123   38.98
## 
## Correlation of Fixed Effects:
##     (Intr)
## ses -0.042
```
]

The SES score itself is a summary score and not particularly interesting to interpret as is.

---
## Random effects ANCOVA

We can standardize the variable to get a different kind of interpretation.

.large[

```r
nels$sesstd <- nels$ses/sd(nels$ses)
m5 <- lmer(mscore~sesstd+(1|school),data=nels)
summary(m5)
```

```
## Linear mixed model fit by REML ['lmerMod']
## Formula: mscore ~ sesstd + (1 | school)
##    Data: nels
## 
## REML criterion at convergence: 92558.7
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.8753 -0.6428  0.0165  0.6693  4.4322 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  school   (Intercept) 12.22    3.495   
##  Residual             68.03    8.248   
## Number of obs: 12974, groups:  school, 684
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  50.7175     0.1542  328.99
## sesstd        3.2900     0.0844   38.98
## 
## Correlation of Fixed Effects:
##        (Intr)
## sesstd -0.042
```
]

Pretty big effect of SES -- a 1 SD increase in SES is associated with a 3.3 point increase in math score on average.

---
## Random effects ANCOVA

---
## Random effects ANCOVA

```r
plot_model(m5,type='re')
```

---
## Random effects ANCOVA

Let's plot the estimated school-specific lines from the random intercept model.

```r
xplot=seq(-2.9,2.3,by=.1)
yplot=rep(60,length(xplot))
plot(xplot,yplot,type="n",ylim=c(30,70),xlab="Standardized SES",ylab="Math Score")
for(school in 1:length(id.schools))
{
  yplot=coef(m5)$school[school,1]+coef(m5)$school[school,2]*xplot
  lines(xplot,yplot)
}
```

---
## Random effects ANCOVA

---
## Random effects ANCOVA

This model allows separate intercepts for each school but assumes a common slope.

One concern is whether SES has the same relationship with math scores in all schools.

For example, some schools may have less of a disparity in scores across levels of SES than others.

We can try random slopes of SES as discussed earlier and test the two nested models.

We will revisit this in the next module.

---

class: center, middle

# What's next?

### Move on to the readings for the next module!