class: center, middle, inverse, title-slide # STA 610L: Module 2.12 ## Random effects ANCOVA (introduction) ### Dr. Olanrewaju Michael Akande --- ## Introduction - ANOVA, ANCOVA, MANOVA: what is the difference? - .hlight[ANOVA (Analysis of Variance)]: continuous outcome, categorical predictor(s) - one-way ANOVA: one categorical predictor - two-way ANOVA: two categorical predictors - two-way ANOVA with interaction: you get the picture! - .hlight[ANCOVA (Analysis of Covariance)]: continuous outcome, categorical predictor(s), at least one continuous predictor that is generally considered a "nuisance". - .hlight[MANOVA (Multivariate ANOVA)]: multiple continuous outcomes, categorical predictor(s). Historically these names had implications regarding the estimation methods used, but that is no longer always the case. --- ## Motivating example: National Educational Longitudinal Study of Education (NELS) Hoff considers a subset of the NELS data that contains information on math scores of a random sample of 10th graders selected from a national sample of 685 large urban public schools. We plot the math scores `\(y_{ij}\)` of the `\(n_j\)` students in each school `\(j\)`, ranked by the average score. --- ## NELS example ```r load('data/nels.Rdata') head(nels) ``` ``` ## school enroll flp public urbanicity hwh ses mscore ## 1 1011 5 3 1 urban 2 -0.23 52.11 ## 2 1011 5 3 1 urban 0 0.69 57.65 ## 3 1011 5 3 1 urban 4 -0.68 66.44 ## 4 1011 5 3 1 urban 5 -0.89 44.68 ## 5 1011 5 3 1 urban 3 -1.28 40.57 ## 6 1011 5 3 1 urban 5 -0.93 35.04 ``` ```r str(nels) ``` ``` ## 'data.frame': 12974 obs. of 8 variables: ## $ school : Factor w/ 684 levels "1011","1012",..: 1 1 1 1 1 1 1 1 1 1 ... ## $ enroll : num 5 5 5 5 5 5 5 5 5 5 ... ## $ flp : num 3 3 3 3 3 3 3 3 3 3 ... ## $ public : num 1 1 1 1 1 1 1 1 1 1 ... ## $ urbanicity: Factor w/ 3 levels "rural","suburban",..: 3 3 3 3 3 3 3 3 3 3 ... ## $ hwh : num 2 0 4 5 3 5 1 4 8 2 ... ## $ ses : num -0.23 0.69 -0.68 -0.89 -1.28 -0.93 0.36 -0.24 -1.07 -0.1 ... ## $ mscore : num 52.1 57.6 66.4 44.7 40.6 ... ``` --- ## NELS example ```r avmscore.schools<-tapply(nels$mscore,nels$school,mean,na.rm=TRUE) id.schools<-names(avmscore.schools) m<-length(id.schools) plot(c(1,m),range(nels$mscore), type="n",ylab="math score", xlab="rank of school-specific math score average",cex=.7) for(school in id.schools[order( avmscore.schools )[seq(1,length(avmscore.schools),by=1)]]) { y<-nels$mscore[nels$school==school] x<-rank(avmscore.schools)[ id.schools==school] points( rep(x,length(y)), y,pch=16,cex=.6 ) points(x, mean(y),col="blue",pch=16,cex=.8) segments( x,min(y),x,max(y)) } abline(h=mean(avmscore.schools)) ``` --- ## NELS example <img src="2-12-random-effects-ancova-intro_files/figure-html/nelsplot1b-1.png" style="display: block; margin: auto;" /> --- ## NELS example - The school-specific averages range from 24 to 69, with 51 the average of all 685 school averages (weighting each school equally). - The school-specific variances range from 3 to 187 -- quite a wide range! - The school with the highest average only contains 4 observations. --- ## Which school is best? <img src="2-12-random-effects-ancova-intro_files/figure-html/nelsplot2-1.png" style="display: block; margin: auto;" /> The school with the highest average has a very small sample size `\((n_j=4)\)`. Do we have strong evidence that the true mean in this school is substantially larger than that in other schools in the sample? --- ## ANOVA One approach would be to fit a "standard" ANOVA model: ```r m1 <- lm(mscore~school,data=nels) anova(m1) ``` ``` ## Analysis of Variance Table ## ## Response: mscore ## Df Sum Sq Mean Sq F value Pr(>F) ## school 683 342385 501.30 6.8118 < 2.2e-16 ## Residuals 12290 904450 73.59 ``` ```r summary(aov(mscore~school,data=nels)) ``` ``` ## Df Sum Sq Mean Sq F value Pr(>F) ## school 683 342385 501.3 6.812 <2e-16 ## Residuals 12290 904450 73.6 ``` Here we see clear evidence of heterogeneity in math scores across schools. --- ## ANOVA results ```r library(sjPlot) plot_model(m1,sort.est=TRUE) ``` --- ## ANOVA results <img src="2-12-random-effects-ancova-intro_files/figure-html/catplot1-1.png" style="display: block; margin: auto;" /> Based on these estimates, we might conclude that the school has higher performance than some, but not all, schools. --- ## Random effects ANOVA We may then wish to use shrinkage estimation in order to stabilize that and other estimates for schools in which few students provide data, as we have done a few times now. A random effects ANOVA model is given by `$$y_{ij}=\mu+\alpha_j+\varepsilon_{ij},$$` where `\(\varepsilon_{ij} \sim N(0,\sigma^2)\)` and `\(\alpha_j \sim N(0,\tau^2)\)`. ```r library(lme4) m2 <- lmer(mscore~(1|school),data=nels) summary(m2) library(sjstats) icc(m2) ``` --- ## Random effects ANOVA ``` ## Linear mixed model fit by REML ['lmerMod'] ## Formula: mscore ~ (1 | school) ## Data: nels ## ## REML criterion at convergence: 93914.6 ## ## Scaled residuals: ## Min 1Q Median 3Q Max ## -3.8113 -0.6534 0.0094 0.6732 4.7000 ## ## Random effects: ## Groups Name Variance Std.Dev. ## school (Intercept) 23.68 4.866 ## Residual 73.71 8.585 ## Number of obs: 12974, groups: school, 684 ## ## Fixed effects: ## Estimate Std. Error t value ## (Intercept) 50.9390 0.2028 251.2 ``` ``` ## # Intraclass Correlation Coefficient ## ## Adjusted ICC: 0.243 ## Conditional ICC: 0.243 ``` --- ## Random effects ANOVA Here we examine the distribution of random effects. ```r library(lattice) dotplot(ranef(m2, condVar=TRUE)) #OR library(merTools) plotREsim(REsim(m2,n.sims=100),stat='median',sd=TRUE) ``` --- ## Random effects ANOVA <img src="2-12-random-effects-ancova-intro_files/figure-html/plotre2-1.png" style="display: block; margin: auto;" /> --- ## Random effects ANOVA <img src="2-12-random-effects-ancova-intro_files/figure-html/unnamed-chunk-3-1.png" style="display: block; margin: auto;" /> --- ## Random effects ANOVA How do we conduct a formal test of heterogeneity in this random effects setting? Well, this is a bit more complicated than in the .hlight[fixed effects setting]. In particular, no heterogeneity corresponds to the case in which `\(\tau^2=0 \iff \alpha_1=\ldots=\alpha_J=0\)`, so saying something about the single parameter `\(\tau^2\)` has implications about the J parameters `\(\alpha_j\)`. A second problem is that `\(\tau^2\)` cannot be `\(<0\)`, and we wish to test `\(H_0: \tau^2=0\)`, so we're conducting a hypothesis test at the boundary of the parameter space instead of in the interior (which would be the case for `\(H_0: \mu=0\)`). --- ## Random effects ANOVA As shown in Stram and Lee (1994), the approximate asymptotic null distribution for `\(H_0: \tau^2=0\)` using a likelihood ratio test comparing our model to a model without random effects `\((y_{ij}=\mu+\varepsilon_{ij})\)` in this case is a 50-50 mixture of a `\(\chi^2_0\)` (point mass on 0) and a `\(\chi_1^2\)` distribution. In general, if we wish to compare a model with `\(q+1\)` random effects (calculated as terms that have a random effect, not the number of groups) to a nested model with `\(q\)` random effects, the asymptotic null distribution is a 50-50 mixture of `\(\chi^2_{q+1}\)` and `\(\chi^2_q\)` distributions. --- ## Random effects ANOVA Letting LR denote twice the difference in maximized log-likelihoods in the model with and without a single random effect, you can obtain the null density in R using `$$0.5*(\text{dchisq}(x,q+1)+\text{dchisq}(x,q))$$` and the p-value via `$$0.5*(1-\text{pchisq(LR,q+1)}+1-\text{pchisq}(LR,q)).$$` --- ## Random effects ANOVA For the NELS data fit using a frequentist random effects model, we would calculate this as follows. ```r m3 <- lmer(mscore~(1|school),data=nels,REML=FALSE) #ML estimation m4 <- lm(mscore~1,data=nels) LR <- 2*(logLik(m3)-logLik(m4)) LR ``` ``` ## 'log Lik.' 2137.067 (df=3) ``` ```r 0.5*(1-pchisq(LR[1],1)+1-pchisq(LR[1],0)) ``` ``` ## [1] 0 ``` ```r anova(m3,m4) ``` ``` ## Data: nels ## Models: ## m4: mscore ~ 1 ## m3: mscore ~ (1 | school) ## npar AIC BIC logLik deviance Chisq Df Pr(>Chisq) ## m4 2 96054 96069 -48025 96050 ## m3 3 93919 93942 -46957 93913 2137.1 1 < 2.2e-16 ``` We conclude that there is significant heterogeneity across schools in the mean math scores. --- ## Bringing SES into the mix NELS contains a measure of socioeconomic status (SES) for each student. We generally expect some degree of correlation between SES and math score (people who are good at math often can get good jobs, and then have kids who may inherit math talents; rich parents may have more time and resources to devote to their kids). Of course the relationship is not deterministic (there are plenty of math whizzes who did not have rich parents -- Gauss!, and there are plenty of rich parents who have kids who do not make good math scores -- college admissions scandal!). --- ## Bringing SES into the mix Let's look overall at the association between SES and math score in NELS. <img src="2-12-random-effects-ancova-intro_files/figure-html/scatter-1.png" style="display: block; margin: auto;" /> --- ## Big picture Consider "simulated" data on schools, which we represent using red, green, and blue points on the plot on the next slide, respectively. The schools we illustrate include one low SES school, one middle SES school, and one high SES school. Let's consider multiple ways in which we could obtain the marginal association between SES and math score on the previous slide. --- ## Big picture <img src="2-12-random-effects-ancova-intro_files/figure-html/illustrateplot-1.png" style="display: block; margin: auto;" /> --- ## Random effects ANCOVA We want our model to be able to help us understand how SES `\((x_{ij})\)` and math scores are related in schools. In the framework of the model `$$y_{ij} = \beta_{0,j} + \beta_{1,j} x_{ij} + \varepsilon_{ij},$$` what values of `\(\beta_{j}\)` are consistent with these figures? One way to assess how SES is related to math score is to start with an ANCOVA model, allowing school-specific intercepts while including SES as a covariate `\(x_{ij}\)`: `$$y_{ij}=\beta_{0,j}+\beta_1x_{ij} + \varepsilon_{ij}.$$` In this model, we estimate the same effect of SES for each school. --- ## Random effects ANCOVA .large[ ```r m5 <- lmer(mscore~ses+(1|school),data=nels) summary(m5) ``` ``` ## Linear mixed model fit by REML ['lmerMod'] ## Formula: mscore ~ ses + (1 | school) ## Data: nels ## ## REML criterion at convergence: 92558.1 ## ## Scaled residuals: ## Min 1Q Median 3Q Max ## -3.8753 -0.6428 0.0165 0.6693 4.4322 ## ## Random effects: ## Groups Name Variance Std.Dev. ## school (Intercept) 12.22 3.495 ## Residual 68.03 8.248 ## Number of obs: 12974, groups: school, 684 ## ## Fixed effects: ## Estimate Std. Error t value ## (Intercept) 50.7175 0.1542 328.99 ## ses 4.3766 0.1123 38.98 ## ## Correlation of Fixed Effects: ## (Intr) ## ses -0.042 ``` ] The SES score itself is a summary score and not particularly interesting to interpret as is. --- ## Random effects ANCOVA We can standardize the variable to get a different kind of interpretation. .large[ ```r nels$sesstd <- nels$ses/sd(nels$ses) m5 <- lmer(mscore~sesstd+(1|school),data=nels) summary(m5) ``` ``` ## Linear mixed model fit by REML ['lmerMod'] ## Formula: mscore ~ sesstd + (1 | school) ## Data: nels ## ## REML criterion at convergence: 92558.7 ## ## Scaled residuals: ## Min 1Q Median 3Q Max ## -3.8753 -0.6428 0.0165 0.6693 4.4322 ## ## Random effects: ## Groups Name Variance Std.Dev. ## school (Intercept) 12.22 3.495 ## Residual 68.03 8.248 ## Number of obs: 12974, groups: school, 684 ## ## Fixed effects: ## Estimate Std. Error t value ## (Intercept) 50.7175 0.1542 328.99 ## sesstd 3.2900 0.0844 38.98 ## ## Correlation of Fixed Effects: ## (Intr) ## sesstd -0.042 ``` ] Pretty big effect of SES -- a 1 SD increase in SES is associated with a 3.3 point increase in math score on average. --- ## Random effects ANCOVA <img src="2-12-random-effects-ancova-intro_files/figure-html/unnamed-chunk-4-1.png" style="display: block; margin: auto;" /> --- ## Random effects ANCOVA ```r plot_model(m5,type='re') ``` <img src="2-12-random-effects-ancova-intro_files/figure-html/plotre3-1.png" style="display: block; margin: auto;" /> --- ## Random effects ANCOVA Let's plot the estimated school-specific lines from the random intercept model. ```r xplot=seq(-2.9,2.3,by=.1) yplot=rep(60,length(xplot)) plot(xplot,yplot,type="n",ylim=c(30,70),xlab="Standardized SES",ylab="Math Score") for(school in 1:length(id.schools)) { yplot=coef(m5)$school[school,1]+coef(m5)$school[school,2]*xplot lines(xplot,yplot) } ``` --- ## Random effects ANCOVA <img src="2-12-random-effects-ancova-intro_files/figure-html/schoolspecific1b-1.png" style="display: block; margin: auto;" /> --- ## Random effects ANCOVA This model allows separate intercepts for each school but assumes a common slope. One concern is whether SES has the same relationship with math scores in all schools. For example, some schools may have less of a disparity in scores across levels of SES than others. We can try random slopes of SES as discussed earlier and test the two nested models. We will revisit this in the next module. --- class: center, middle # What's next? ### Move on to the readings for the next module!