class: center, middle, inverse, title-slide

# STA 610L: Module 4.7

## Missing data

### Dr. Olanrewaju Michael Akande

---

## Introduction to missing data

Missing data/nonresponse is fairly common in real data and applications. For example,

+ Failure to respond to a survey question.

+ Subject misses some clinic visits out of all possible.

+ Only a subset of subjects are asked certain questions.

The most common software packages often throw away all subjects with incomplete data (which can lead to bias and precision loss).

---

## Introduction to missing data

Ideally, analysts should first decide on how to deal with missing data before moving on to analysis.

One needs to make assumptions and ask tons of questions, for example,

+ why are the values missing?

+ what is the pattern of missingness?

+ what is the proportion of missing values in the data?

As a Bayesian, one could treat the missing values as parameters and estimate them simultaneously with the analysis, but even in that case, one must still ask the same questions.

Ask as many questions as possible to help you figure out the most plausible assumptions!

---

## Introduction to missing data

Simplest approach: complete/available case analyses -- delete cases with missing data.

Often problematic because:

+ it is just not feasible sometimes (small `\(n\)`, large `\(p\)` problem) -- when we have a small number of observations but a large number of variables, we simply cannot afford to throw away data, even when the proportion of missing data is small.

+ information loss -- even when we do not have the small `\(n\)`, large `\(p\)` problem, we still lose information when we delete cases.

+ biased results -- because the missing data mechanism is rarely random, features of the observed data can be completely different from the missing data.

More principled approach: impute the missing data (in a statistically proper fashion) and analyze the imputed data.

---

## Why should we care?

- <font color="red">Loss of power</font> due to the smaller sample size
  + can't regain lost power.

- Any analysis must make an <font color="red">untestable assumption</font> about the missing data
  + wrong assumption `\(\Rightarrow\)` <font color="red">biased estimates</font>.

- Some popular analyses with missing data get <font color="red">biased standard errors</font>
  + resulting in wrong p-values and confidence intervals.

- Some popular analyses with missing data are <font color="red">inefficient</font>
  + so that confidence intervals are wider than they need be.

---

## What to do: loss of power

Approach by design:

- minimize amount of missing data
  + good communication with participants, for example, patients in a clinical trial, participants in surveys and censuses, etc.
  + follow up as much as possible; make repeated attempts using different methods

- reduce the impact of missing data
  + collect reasons for missing data
  + collect information predictive of missing values

---

## What to do: analysis

A suitable method of analysis would:

+ make the correct (or plausible) assumption about the missing data

+ give an unbiased estimate (under that assumption)

+ give an unbiased standard error (so that p-values and confidence intervals are correct)

+ be efficient (make best use of the available data)

However, we can never be sure about what the correct assumption is `\(\Rightarrow\)` sensitivity analyses are essential!

---

## How to approach the analysis?

Start by knowing:

+ extent of missing data

+ pattern of missing data (e.g.
is `\(X_1\)` always missing whenever `\(X_2\)` is also missing?) + predictors of missing data and of outcome Principled approach to missing data: + identify a plausible assumption (through discussions between you as a data scientist and your clients) + choose an analysis method that's valid under that assumption Just because a method is simple to use does not make it plausible; some analysis methods are simple to describe but have complex and/or implausible assumptions. --- ## Handling missing data Many analysts still impute missing values with a mean or some other fixed (single) value (ignores uncertainty). However, it is generally better to rely on methods that can incorporate the uncertainty around imputed values (see Little and Rubin (2019)). A common approach for doing this is .hlight[multiple imputation] (see .hlight[mice] package in R). Again, imputing missing data is quite natural in the Bayesian context since each missing value is simply treated as an additional parameter. In fact, multiple imputation basically relies on Bayesian ideas. Thus, we will focus on handling missing data in Bayesian models. If you would like to learn about multiple imputation, see the slides [here](https://ids-702-f20.github.io/Course-Website/slides/5-3-imputation-methods-II.html#1) and [here](https://ids-702-f20.github.io/Course-Website/slides/5-4-imputation-methods-III.html#1). --- ## Pattern of missing data Missing data patterns may be .hlight[monotone] or .hlight[nonmonotone]. In a .hlight[monotone missing data pattern], observations missing on one variable are a subset of those missing on another variable. That is, missingness is nested. One example of monotone missing data is study *dropout*. If a subject drops out of a study at time `\(t\)`, then their observations will also be missing at times `\(t+1\)`, `\(t+2\)`, and so forth. When missing data follow such a pattern, the group of responses is never larger at a later follow-up time than it is at an earlier time. Missing data are .hlight[nonmonotone] when missingness is not nested in this manner, or is intermittent. --- ## Types of nonresponse (missing data) .hlight[Unit nonresponse]: the individual has no values recorded for any of the variables. .hlight[Item nonresponse]: the individual has values recorded for at least one variable, but not all variables. 
<table> <caption>Unit nonresponse vs item nonresponse</caption> <tr> <th> </th> <th height="30px" colspan="3">Variables</th> </tr> <tr> <th> </th> <td height="30px" style="text-align:center" width="100px"> X<sub>1</sub> </td> <td style="text-align:center" width="100px"> X<sub>2</sub> </td> <td style="text-align:center" width="100px"> Y </td> </tr> <tr> <td height="30px" style="text-align:left"> Complete cases </td> <td style="text-align:center"> ✓ </td> <td style="text-align:center"> ✓ </td> <td style="text-align:center"> ✓ </td> </tr> <tr> <td rowspan="3"> Item nonresponse </td> <td rowspan="3" style="text-align:center"> ✓ </td> <td height="30px" style="text-align:center"> ✓ </td> <td style="text-align:center"> ❓ </td> </tr> <tr> <td height="30px" style="text-align:center"> ❓ </td> <td style="text-align:center"> ❓ </td> </tr> <tr> <td height="30px" style="text-align:center"> ❓ </td> <td style="text-align:center"> ✓ </td> </tr> <tr> <td height="30px" style="text-align:left"> Unit nonresponse </td> <td style="text-align:center"> ❓ </td> <td style="text-align:center"> ❓ </td> <td style="text-align:center"> ❓ </td> </tr> </table> --- ## Strategies for handling missing data Item nonresponse: + use complete/available cases analyses + single imputation methods + multiple imputation + model-based methods Unit nonresponse: + weighting adjustments + model-based methods (identifiability issues!). .hlight[We focus primarily on item nonresponse. Discussions on unit nonresponse are beyond the scope of this course]. If you are very interested in building models for both unit and item nonresponse, here is the arXiv link to a paper on some of the research I have done on the topic: https://arxiv.org/pdf/1907.06145.pdf. --- ## Missing data mechanisms Data are said to be .hlight[missing completely at random (MCAR)] if the reason for missingness does not depend on the values of the observed data or missing data. For example, suppose - you handed out a double-sided survey questionnaire of 20 questions to a sample of participants; - questions 1-15 were on the first page but questions 16-20 were at the back; and - some of the participants did not respond to questions 16-20. Then, the values for questions 16-20 for those people who did not respond would be .hlight[MCAR] if they simply did not realize the pages were double-sided; they had no reason to ignore those questions. **This is rarely plausible in practice!** --- ## Missing data mechanisms Data are said to be .hlight[missing at random (MAR)] if, conditional on the values of the observed data, the reason for missingness does not depend on the missing data. Using our previous example, suppose - questions 1-15 include demographic information such as age and education; - questions 16-20 include income related questions; and - once again, some participants did not respond to questions 16-20. Then, the values for questions 16-20 for those people who did not respond would be .hlight[MAR] if younger people are more likely not to respond to those income related questions than old people, where age is observed for all participants. **This is the most commonly assumed mechanism in practice!** --- ## Missing data mechanisms Data are said to be .hlight[missing not at random (MNAR or NMAR)] if the reason for missingness depends on the actual values of the missing (unobserved) data. 
Continuing with our previous example, suppose again that

- questions 1-15 include demographic information such as age and education;

- questions 16-20 include income related questions; and

- once again, some of the participants did not respond to questions 16-20.

Then, the values for questions 16-20 for those people who did not respond would be .hlight[MNAR] if people who earn more money are less likely to respond to those income related questions than people who earn less, since income itself is unobserved for the nonrespondents.

**This is usually the case in real data, but analysis can be complex!**

---

## Mathematical formulation

Consider the multivariate data `\(\boldsymbol{Y} = (\boldsymbol{Y}_1,\ldots,\boldsymbol{Y}_n)^T\)`, where `\(\boldsymbol{Y}_i = (Y_{i1},\ldots,Y_{ip})^T\)`, for `\(i = 1,\ldots, n\)`.

For now, we will assume the multivariate normal model as the sampling model, so that each `\(\boldsymbol{Y}_i = (Y_{i1},\ldots,Y_{ip})^T \sim \mathcal{N}_p(\boldsymbol{\theta}, \Sigma)\)`.

It is easy to extend the formulation to allow for predictors, and also within the context of hierarchical modeling.

Suppose now that `\(\boldsymbol{Y}\)` contains missing values.

We can separate `\(\boldsymbol{Y}\)` into the observed and missing parts, that is, `\(\boldsymbol{Y} = (\boldsymbol{Y}_{obs},\boldsymbol{Y}_{mis})\)`.

Then for each individual, `\(\boldsymbol{Y}_i = (\boldsymbol{Y}_{i,obs},\boldsymbol{Y}_{i,mis})\)`.

---

## Mathematical formulation

Let

+ `\(j\)` index variables (where `\(i\)` already indexes individuals),

+ `\(r_{ij} = 1\)` when `\(y_{ij}\)` is missing,

+ `\(r_{ij} = 0\)` when `\(y_{ij}\)` is observed.

Here, `\(r_{ij}\)` is known as the missingness indicator of variable `\(j\)` for person `\(i\)`.

Also, let

+ `\(\boldsymbol{R}_i = (r_{i1},\ldots,r_{ip})^T\)` be the vector of missingness indicators for person `\(i\)`,

+ `\(\boldsymbol{R} = (\boldsymbol{R}_1,\ldots,\boldsymbol{R}_n)\)` be the matrix of missingness indicators for everyone, and

+ `\(\boldsymbol{\psi}\)` be the set of parameters associated with `\(\boldsymbol{R}\)`.

Assume `\(\boldsymbol{\psi}\)` and `\((\boldsymbol{\theta}, \Sigma)\)` are distinct.

---

## Mathematical formulation

MCAR:
.block[
`$$p(\boldsymbol{R} | \boldsymbol{Y},\boldsymbol{\theta}, \Sigma, \boldsymbol{\psi}) = p(\boldsymbol{R} | \boldsymbol{\psi})$$`
]

MAR:
.block[
`$$p(\boldsymbol{R} | \boldsymbol{Y},\boldsymbol{\theta}, \Sigma, \boldsymbol{\psi}) = p(\boldsymbol{R} | \boldsymbol{Y}_{obs},\boldsymbol{\psi})$$`
]

MNAR:
.block[
`$$p(\boldsymbol{R} | \boldsymbol{Y},\boldsymbol{\theta}, \Sigma, \boldsymbol{\psi}) = p(\boldsymbol{R} | \boldsymbol{Y}_{obs},\boldsymbol{Y}_{mis},\boldsymbol{\psi})$$`
]

---

## Implications for likelihood function

Each type of mechanism has a different implication for the likelihood of the observed data `\(\boldsymbol{Y}_{obs}\)` and the missing data indicators `\(\boldsymbol{R}\)`.

Without missingness in `\(\boldsymbol{Y}\)`, the likelihood of the observed data is
.block[
`$$p(\boldsymbol{Y}_{obs} | \boldsymbol{\theta}, \Sigma)$$`
]

With missingness in `\(\boldsymbol{Y}\)`, the likelihood of the observed data is instead
.block[
$$
`\begin{split}
p(\boldsymbol{Y}_{obs}, \boldsymbol{R} |\boldsymbol{\theta}, \Sigma, \boldsymbol{\psi}) & = \int p(\boldsymbol{R} | \boldsymbol{Y}_{obs},\boldsymbol{Y}_{mis},\boldsymbol{\psi}) \cdot p(\boldsymbol{Y}_{obs},\boldsymbol{Y}_{mis} | \boldsymbol{\theta}, \Sigma) \textrm{d}\boldsymbol{Y}_{mis} \\
\end{split}`
$$
]

Since we do not actually observe `\(\boldsymbol{Y}_{mis}\)`, we would like to be able to integrate it out so we don't have to deal with it.
That is, we would like to infer `\((\boldsymbol{\theta}, \Sigma)\)` (and sometimes, `\(\boldsymbol{\psi}\)`) using only the observed data. --- ## Likelihood function: MCAR For MCAR, we have: .block[ $$ `\begin{split} p(\boldsymbol{Y}_{obs}, \boldsymbol{R} |\boldsymbol{\theta}, \Sigma, \boldsymbol{\psi}) & = \int p(\boldsymbol{R} | \boldsymbol{Y}_{obs},\boldsymbol{Y}_{mis},\boldsymbol{\psi}) \cdot p(\boldsymbol{Y}_{obs},\boldsymbol{Y}_{mis} | \boldsymbol{\theta}, \Sigma) \textrm{d}\boldsymbol{Y}_{mis} \\ & = \int p(\boldsymbol{R} | \boldsymbol{\psi}) \cdot p(\boldsymbol{Y}_{obs},\boldsymbol{Y}_{mis} | \boldsymbol{\theta}, \Sigma) \textrm{d}\boldsymbol{Y}_{mis} \\ & = p(\boldsymbol{R} | \boldsymbol{\psi}) \cdot \int p(\boldsymbol{Y}_{obs},\boldsymbol{Y}_{mis} | \boldsymbol{\theta}, \Sigma) \textrm{d}\boldsymbol{Y}_{mis} \\ & = p(\boldsymbol{R} | \boldsymbol{\psi}) \cdot p(\boldsymbol{Y}_{obs} | \boldsymbol{\theta}, \Sigma). \\ \end{split}` $$ ] For inference on `\((\boldsymbol{\theta}, \Sigma)\)`, we can simply focus on `\(p(\boldsymbol{Y}_{obs} | \boldsymbol{\theta}, \Sigma)\)` in the likelihood function, since `\((\boldsymbol{R} | \boldsymbol{\psi})\)` does not include any `\(\boldsymbol{Y}\)`. That is, the missing-data mechanism here is .hlight[ignorable] for likelihood-based inference. --- ## Likelihood function: MAR For MAR, we have: .block[ $$ `\begin{split} p(\boldsymbol{Y}_{obs}, \boldsymbol{R} |\boldsymbol{\theta}, \Sigma, \boldsymbol{\psi}) & = \int p(\boldsymbol{R} | \boldsymbol{Y}_{obs},\boldsymbol{Y}_{mis},\boldsymbol{\psi}) \cdot p(\boldsymbol{Y}_{obs},\boldsymbol{Y}_{mis} | \boldsymbol{\theta}, \Sigma) \textrm{d}\boldsymbol{Y}_{mis} \\ & = \int p(\boldsymbol{R} | \boldsymbol{Y}_{obs}, \boldsymbol{\psi}) \cdot p(\boldsymbol{Y}_{obs},\boldsymbol{Y}_{mis} | \boldsymbol{\theta}, \Sigma) \textrm{d}\boldsymbol{Y}_{mis} \\ & = p(\boldsymbol{R} | \boldsymbol{Y}_{obs},\boldsymbol{\psi}) \cdot \int p(\boldsymbol{Y}_{obs},\boldsymbol{Y}_{mis} | \boldsymbol{\theta}, \Sigma) \textrm{d}\boldsymbol{Y}_{mis} \\ & = p(\boldsymbol{R} | \boldsymbol{Y}_{obs},\boldsymbol{\psi}) \cdot p(\boldsymbol{Y}_{obs} | \boldsymbol{\theta}, \Sigma). \\ \end{split}` $$ ] For inference on `\((\boldsymbol{\theta}, \Sigma)\)`, we can once again focus on `\(p(\boldsymbol{Y}_{obs} | \boldsymbol{\theta}, \Sigma)\)` in the likelihood function. Again, the missing-data mechanism is .hlight[ignorable]. However, there can be some bias if we do not account for `\(p(\boldsymbol{R} | \boldsymbol{Y}_{obs},\boldsymbol{\psi})\)`, especially if `\(\boldsymbol{\psi}\)` and `\((\boldsymbol{\theta}, \Sigma)\)` are not distinct. Also, if we want to infer the missingness mechanism through `\(\boldsymbol{\psi}\)`, we would need to deal with `\(p(\boldsymbol{R} | \boldsymbol{Y}_{obs},\boldsymbol{\psi})\)` anyway. --- ## Likelihood function: MNAR For MNAR, we have: .block[ $$ `\begin{split} p(\boldsymbol{Y}_{obs}, \boldsymbol{R} |\boldsymbol{\theta}, \Sigma, \boldsymbol{\psi}) & = \int p(\boldsymbol{R} | \boldsymbol{Y}_{obs},\boldsymbol{Y}_{mis},\boldsymbol{\psi}) \cdot p(\boldsymbol{Y}_{obs},\boldsymbol{Y}_{mis} | \boldsymbol{\theta}, \Sigma) \textrm{d}\boldsymbol{Y}_{mis} \\ \end{split}` $$ ] The likelihood under MNAR cannot simplify any further. In this case, we cannot ignore the missing data when making inferences about `\((\boldsymbol{\theta}, \Sigma)\)` (.hlight[nonignorable missing-data mechanism]). We must include the model for `\(\boldsymbol{R}\)` and also infer the missing data `\(\boldsymbol{Y}_{mis}\)`. 
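
---

## Aside: the mechanisms in a small simulation

The definitions above can feel abstract. Below is a minimal simulated example (hypothetical data, not part of the case studies that follow) showing how the complete-case mean of an outcome `y` behaves when missingness is generated by chance alone (MCAR), by a fully observed covariate `x` (MAR), or by `y` itself (MNAR).

```r
set.seed(610)
n <- 5000
x <- rnorm(n)                                 # fully observed covariate
y <- 2 + 1.5 * x + rnorm(n)                   # outcome, to be partially observed

# missingness indicators (1 = missing), one per mechanism
r_mcar <- rbinom(n, 1, 0.3)                   # ignores both x and y
r_mar  <- rbinom(n, 1, plogis(-1 + 1.2 * x))  # depends only on the observed x
r_mnar <- rbinom(n, 1, plogis(-1 + 1.2 * y))  # depends on the (possibly missing) y

mean(y)               # full-data mean
mean(y[r_mcar == 0])  # MCAR: close to the full-data mean
mean(y[r_mar == 0])   # MAR: biased marginally, but fine once we condition on x
mean(y[r_mnar == 0])  # MNAR: biased even after conditioning on x
```

Regressing `y` on `x` using only the complete cases recovers the true coefficients under MCAR and MAR here, but not under MNAR -- consistent with the likelihood results above.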
--- ## Missing data mechanisms: how to tell in practice? So how can we tell the type of mechanism we are dealing with? In general, we don't know!!! So, when conducting studies, it is very important to do everything possible to collect data on the reasons for missing values or dropouts, so that the investigator can determine the missing data mechanism. That way, the decision can be made regarding the missing mechanism, and analysis can properly account for the missing data mechanism if necessary. --- ## Missing data mechanisms: how to tell in practice? Rare (very!) that data are MCAR (unless planned beforehand) Possible that data are MNAR **Compromise**: assume data are MAR if we include enough variables in model for the missing data indicator `\(\boldsymbol{R}\)`. Again, we will mostly focus on talking about missing data in the context of MCAR and MAR. --- ## Covariate-dependent missingness What happens when there are also covariates to consider? Well, in general, missingness in the outcomes that depends on covariates is not a problem, as long as you condition on the covariates. As a very simple example, let `\(X_i\)` be a treatment group indicator, with `\(Y_i \sim N(\mu_0,\sigma^2)\)` if `\(X_i=0\)` and `\(Y_i \sim N(\mu_1,\sigma^2)\)` if `\(X_i=1\)`. Suppose that `\(X_i\)` is always observed but that some `\(Y_i\)` are missing. Write `$$Pr(R_i=1 | X_i=0)=\pi_0 ~~~~~~~~Pr(R_i=1 | X_i=1)=\pi_1$$` Conditional on treatment group, the observed `\(Y_i\)`'s are a random subgroup of all responses within a treatment group. --- ## Covariate-dependent missingness Then we can show that `$$E(Y_i \mid R_i=1,X_i)=E(Y_i \mid X_i)$$` and `$$f(Y_i \mid R_i=1,X_i)=f(Y_i \mid X_i)$$` but `$$E(Y_i \mid R_i=1) \neq E(Y_i).$$` However, because we are not interested in `\(E(Y_i)\)` averaged over treatment groups, this is not a concern. Conditional on `\(X_i\)`, our missingness is MCAR, so inferences based on complete data will be valid. If we do not condition on `\(X_i\)`, and `\(X_i\)` and `\(Y_i\)` are related, then lack of conditioning on `\(X_i\)` may introduce bias into the analysis. --- ## Illustration Simple example using data that come with the .hlight[MICE] package in R. Dataset from NHANES includes 25 cases measured on 4 variables. Only 13 cases with complete data. The four variables are 1. age (age group: 20-39, 40-59, 60+) 2. bmi (body mass index, in `\(kg/m^2\)`) 3. hyp (hypertension status: no, yes) 4. chl (total cholesterol, in `\(mg/dL\)`) Suppose the goal is to predict `bmi` by `age`, and `chl`. --- ## Illustration ```r library(mice) data(nhanes2) dim(nhanes2) ``` ``` ## [1] 25 4 ``` ```r summary(nhanes2) ``` ``` ## age bmi hyp chl ## 20-39:12 Min. :20.40 no :13 Min. :113.0 ## 40-59: 7 1st Qu.:22.65 yes : 4 1st Qu.:185.0 ## 60-99: 6 Median :26.75 NA's: 8 Median :187.0 ## Mean :26.56 Mean :191.4 ## 3rd Qu.:28.93 3rd Qu.:212.0 ## Max. :35.30 Max. :284.0 ## NA's :9 NA's :10 ``` ```r str(nhanes2) ``` ``` ## 'data.frame': 25 obs. of 4 variables: ## $ age: Factor w/ 3 levels "20-39","40-59",..: 1 2 1 3 1 3 1 1 2 2 ... ## $ bmi: num NA 22.7 NA NA 20.4 NA 22.5 30.1 22 NA ... ## $ hyp: Factor w/ 2 levels "no","yes": NA 1 1 NA 1 NA 1 1 1 NA ... ## $ chl: num NA 187 187 NA 113 184 118 187 238 NA ... 
``` --- ## Patterns of missing data ```r md.pattern(nhanes2) ``` <img src="img/miss-data-pattern.png" width="450px" height="400px" style="display: block; margin: auto;" /> 5 patterns observed from `\(2^3=8\)` possible patterns --- ## Patterns of missing data <img src="img/miss-data-pattern.png" width="450px" height="320px" style="display: block; margin: auto;" /> .hlight[At the bottom]: total number of missing values by variables. .hlight[On the right]: number of variables missing in each pattern. .hlight[On the left]: number of cases for each pattern. --- ## Visualizing patterns of missing data ```r library(VIM); library(lattice) aggr(nhanes2,col=c("lightblue3","darkred"),numbers=TRUE,sortVars=TRUE, labels=names(nhanes2),cex.axis=.7,gap=3, ylab=c("Proportion missing","Missingness pattern")) ``` ``` ## ## Variables sorted by number of missings: ## Variable Count ## chl 0.40 ## bmi 0.36 ## hyp 0.32 ## age 0.00 ``` <img src="4-7-missing-data_files/figure-html/unnamed-chunk-6-1.png" style="display: block; margin: auto;" /> --- ## Visualizing patterns of missing data The .hlight[marginplot] function can be used to understand how missingness affects the distribution of values on other variables. .hlight[Blue box plots] summarize the distribution of .hlight[observed data given the other variable is observed]. .hlight[Red box plots] summarize the distribution of .hlight[observed data given the other variable is missing]. If data are MCAR, you expect the boxplots to be the same (hard to evaluate in this small sample) Let's look at the margin plot for the two continuous variables `bmi` and `chl`. --- ## Visualizing patterns of missing data ```r marginplot(nhanes2[,c("chl","bmi")],col=c("lightblue3","darkred"),cex.numbers=1.2,pch=19) ``` <img src="4-7-missing-data_files/figure-html/unnamed-chunk-7-1.png" style="display: block; margin: auto;" /> --- ## Visualizing patterns of missing data Interpretation of the numbers in red. + 9 = number of observations with missingness in `bmi` + 10 = number of observations with missingness in `chl` + 7 = number of observations with missingness in both `bmi` and `chl`. The scatterplot of blue points display the relationship between `bmi` and `chl` when they are both observed (13 cases). The red points indicate the amount of data used to generate the red boxplots. --- ## Imputation during model fitting We will do this process a bit more carefully using the next example. For now, the code for fitting the model to predict `bmi` by `age`, and `chl` using the .hlight[brms] package and imputing the missing values within the sampler is as follows. 
```r bform <- bf(bmi | mi() ~ age * mi(chl)) + bf(chl | mi() ~ age) + set_rescor(FALSE) fit_imp <- brm(bform, data = nhanes2, iter = 1e4, chains = 2, cores = 2, seed = 14, control=list(adapt_delta=0.99)) summary(fit_imp) ``` --- ## Results .small[ ```r bform <- bf(bmi | mi() ~ age * mi(chl)) + bf(chl | mi() ~ age) + set_rescor(FALSE) fit_imp <- brm(bform, data = nhanes2, iter = 1e4, chains = 2, cores = 2, seed = 14, control=list(adapt_delta=0.99)) summary(fit_imp) ``` ``` ## Family: MV(gaussian, gaussian) ## Links: mu = identity; sigma = identity ## mu = identity; sigma = identity ## Formula: bmi | mi() ~ age * mi(chl) ## chl | mi() ~ age ## Data: nhanes2 (Number of observations: 25) ## Samples: 2 chains, each with iter = 10000; warmup = 5000; thin = 1; ## total post-warmup samples = 10000 ## ## Population-Level Effects: ## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS ## bmi_Intercept 12.16 4.08 4.15 20.42 1.00 5993 5601 ## chl_Intercept 168.88 14.73 139.36 197.59 1.00 7511 6812 ## bmi_age40M59 32.78 13.09 7.48 59.41 1.00 4673 4770 ## bmi_age60M99 4.89 11.12 -18.26 26.40 1.00 4294 5049 ## chl_age40M59 33.19 24.37 -15.56 82.17 1.00 8227 7232 ## chl_age60M99 57.99 28.04 2.43 113.03 1.00 7499 7295 ## bmi_michl 0.10 0.02 0.05 0.14 1.00 5993 5667 ## bmi_michl:age40M59 -0.19 0.07 -0.33 -0.06 1.00 4671 4783 ## bmi_michl:age60M99 -0.07 0.05 -0.16 0.04 1.00 4305 5191 ## ## Family Specific Parameters: ## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS ## sigma_bmi 2.73 0.70 1.72 4.41 1.00 2838 5261 ## sigma_chl 41.19 7.96 28.97 59.91 1.00 4639 5713 ## ## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS ## and Tail_ESS are effective sample size measures, and Rhat is the potential ## scale reduction factor on split chains (at convergence, Rhat = 1). ``` ] --- ## A slightly more interesting example Researchers are interested in the hypothesis that primates with larger brains produce milk with higher energy content so that brains can grow more quickly. We consider the .hlight[outcome of energy content in milk] (kcal of energy per g of milk) and predictors including the average female body mass (kg) and the percent of total brain mass that is neocortex mass. The neocortex is the grey, outer part of the brain that is particularly developed in mammals, especially primates. --- ## Missing data in milk study ```r library(rethinking) library(tidyverse) data(milk) d <- milk library(VIM) milk_aggr <- aggr(d,numbers=TRUE,sortVars=TRUE, labels=names(d), cex.axis=.7, gap=3, ylab=c("Proportion missing","Missingness Pattern")) ``` --- ## Missing data in milk study .small[ ``` ## ## Variables sorted by number of missings: ## Variable Count ## neocortex.perc 0.4137931 ## clade 0.0000000 ## species 0.0000000 ## kcal.per.g 0.0000000 ## perc.fat 0.0000000 ## perc.protein 0.0000000 ## perc.lactose 0.0000000 ## mass 0.0000000 ``` <img src="4-7-missing-data_files/figure-html/mdata2-1.png" style="display: block; margin: auto;" /> ] --- ## Missing data in milk study Here we see that only one variable, the percent neocortex, is subject to missingness, and it is missing 41% of the time (12 of 29 observations are NA). This substantial fraction of missing data could lead to significant bias in association estimates of interest. --- ## Missing data in milk study Again, we can easily impute missing values of neocortex in our Bayesian framework. 
The only change to our model (assuming data are MCAR or MAR) is that we will specify a distribution for percent neocortex (a covariate -- usually we do not specify covariate distributions, though we did last time in the presence of measurement error). How do the observed values of percent neocortex look? ```r library(ggplot2) ggplot(d, aes(x=neocortex.perc)) + geom_histogram(binwidth=6) ``` --- ## Missing data in milk study <img src="4-7-missing-data_files/figure-html/hist2-1.png" style="display: block; margin: auto;" /> Ahh, histogram...looks normal-ish if you pick the right number of bins and has an icky left tail if you don't. --- ## Data model Let's think about the data model we would fit in the absence of missing data. For today we're going to think about regression. First, let's normalize both predictors and the outcome to obtain new variables `\(M_i\)`, `\(N_i\)`, and `\(K_i\)` in order to put them on the same scale (a SD scale). Note we also logged mass (it was highly skewed) -- not to make it normal, but just to pull in the tail values a bit. ```r d$M <- (log(d$mass)-mean(log(d$mass)))/sqrt(var(log(d$mass))) d$N <- (d$neocortex.perc- mean(d$neocortex.perc,na.rm=T))/sqrt(var(d$neocortex.perc,na.rm=T)) d$K <- (d$kcal.per.g-mean(d$kcal.per.g))/sqrt(var(d$kcal.per.g)) ``` --- ## Data model A reasonable data model is `$$K_i \sim N(\mu_i,\sigma^2) ~~~~~ \mu_i=\beta_0+\beta_1N_i+\beta_2M_i.$$` Because we do not observe all values of `\(N_i\)`, we declare a model for it, e.g. under MCAR we might specify `$$N_i \sim N(\nu, \sigma^2_\nu).$$` Now all that remains is specifying prior distributions. We can be simple and specify that `\(\beta_j, \nu \sim N(0,1)\)` and `\(\sigma, \sigma_\nu \sim HalfCauchy(0,1)\)`. --- ## Model ```r detach(package:rethinking, unload = T) #library(brms) data_list <- list(kcal = d$K,neocortex = d$N, logmass = d$M) #prep data #specify model in advance just to simplify code later b_model <- # here's the primary `kcal` model bf(kcal ~ 1 + mi(neocortex) + logmass) + # here's the model for the missing `neocortex` data bf(neocortex | mi() ~ 1) + # here we set the residual correlations for the two models to zero set_rescor(FALSE) m1 <- brm(data = data_list, family = gaussian, b_model, #insert model here prior = c(prior(normal(0, 1), class = Intercept, resp = kcal), prior(normal(0, 1), class = Intercept, resp = neocortex), prior(normal(0, 1), class = b, resp = kcal), prior(cauchy(0, 1), class = sigma, resp = kcal), prior(cauchy(0, 1), class = sigma, resp = neocortex)), iter = 1e4, chains = 2, cores = 2, seed = 14, control=list(adapt_delta=0.99)) ``` --- ## Results Examine all the parameter estimates. 
.small[
```r
summary(m1)
```

```
##  Family: MV(gaussian, gaussian) 
##   Links: mu = identity; sigma = identity
##          mu = identity; sigma = identity 
## Formula: kcal ~ 1 + mi(neocortex) + logmass 
##          neocortex | mi() ~ 1 
##    Data: data_list (Number of observations: 29) 
## Samples: 2 chains, each with iter = 10000; warmup = 5000; thin = 1;
##          total post-warmup samples = 10000
## 
## Population-Level Effects: 
##                     Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## kcal_Intercept          0.04      0.17    -0.28     0.37 1.00     8274     7209
## neocortex_Intercept    -0.06      0.22    -0.50     0.38 1.00     7414     6272
## kcal_logmass           -0.68      0.22    -1.12    -0.23 1.00     4034     6210
## kcal_mineocortex        0.65      0.26     0.13     1.15 1.00     3139     4927
## 
## Family Specific Parameters: 
##                 Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma_kcal          0.81      0.14     0.58     1.12 1.00     4796     7019
## sigma_neocortex     1.00      0.17     0.73     1.40 1.00     6234     5734
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
```
]

You can also extract posterior draws of the missing values.

---

## Results

Our results of primary interest are those from the energy (kcal) model.

Here we see that a one sd greater female body mass (on the log scale) is associated with an expected 0.68 (with `\(95\% \text{CI} = (0.23, 1.12)\)`) standard deviation decrease in energy content of milk.

A one sd greater percent neocortex is associated with an expected 0.65 (with `\(95\% \text{CI} = (0.13, 1.15)\)`) standard deviation increase in energy content of milk.

Although there is a lot of uncertainty associated with our imputed neocortex values, note that at least we're accounting for it properly in the modeling by treating this as a quantity to be estimated (rather than an *ad hoc* solution with poor properties, like simple mean imputation).

---

## Results

What if we had instead done a complete case analysis?

If the data are MCAR, the complete case analysis would be unbiased, though inefficient.

```r
b_model_cc <- bf(kcal ~ 1 + neocortex + logmass)

m.cc <- brm(data = data_list, family = gaussian, b_model_cc,
      prior = c(prior(normal(0, 1), class = Intercept),
                prior(normal(0, 1), class = b),
                prior(cauchy(0, 1), class = sigma)),
      iter = 1e4, chains = 2, cores = 2, seed = 14,
      control=list(adapt_delta=0.99))
```

---

## Results

.small[
```r
summary(m.cc)
```

```
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: kcal ~ 1 + neocortex + logmass 
##    Data: data_list (Number of observations: 17) 
## Samples: 2 chains, each with iter = 10000; warmup = 5000; thin = 1;
##          total post-warmup samples = 10000
## 
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept     0.12      0.21    -0.29     0.54 1.00     5201     4513
## neocortex     0.87      0.31     0.24     1.46 1.00     3858     3981
## logmass      -0.88      0.28    -1.40    -0.30 1.00     3793     4142
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     0.84      0.17     0.58     1.25 1.00     4424     3833
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
```
]

Our story here is broadly similar to the one from the imputation model; of course, the actual estimates differ somewhat.
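
---

## Results: extracting the imputed values

As noted earlier, the posterior draws of the imputed `neocortex` values are available alongside the regression parameters. A minimal sketch of pulling them out of `m1` (this assumes the usual .hlight[brms] convention of naming missing-value parameters `Ymi_<response>[i]`; the exact names may differ across versions):

```r
# posterior summaries for every parameter in the fitted model,
# including one row per imputed (missing) neocortex value
post <- posterior_summary(m1)

# keep only the imputed-value parameters, e.g. Ymi_neocortex[2], Ymi_neocortex[3], ...
post[grepl("^Ymi_neocortex", rownames(post)), ]
```

Each row gives the posterior mean, posterior SD, and 95% interval for one missing `neocortex` value, so the imputation uncertainty is carried through automatically.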
---

## Adjustments

We can perhaps improve the missing data model by expanding our model for the neocortex percentage to include predictors in the mean. For example, we could let

`$$N_i \sim N(\nu_i, \sigma^2_\nu) ~~~ \nu_i=\beta_\nu+\beta_{1,\nu}M_i.$$`

Let's fit this model and see if results change.

```r
b_model <- bf(kcal ~ 1 + mi(neocortex) + logmass) + 
  bf(neocortex | mi() ~ 1 + logmass) + # here's the big difference
  set_rescor(FALSE)

# fit the model
m2 <- brm(data = data_list, family = gaussian, b_model,
      prior = c(prior(normal(0, 1), class = Intercept, resp = kcal),
                prior(normal(0, 1), class = Intercept, resp = neocortex),
                prior(normal(0, 1), class = b, resp = kcal),
                prior(normal(0, 1), class = b, resp = neocortex),
                prior(cauchy(0, 1), class = sigma, resp = kcal),
                prior(cauchy(0, 1), class = sigma, resp = neocortex)),
      iter = 1e4, chains = 2, cores = 2, seed = 14,
      control=list(adapt_delta=0.99))
```

---

## New results

```r
summary(m2)
```

```
##  Family: MV(gaussian, gaussian) 
##   Links: mu = identity; sigma = identity
##          mu = identity; sigma = identity 
## Formula: kcal ~ 1 + mi(neocortex) + logmass 
##          neocortex | mi() ~ 1 + logmass 
##    Data: data_list (Number of observations: 29) 
## Samples: 2 chains, each with iter = 10000; warmup = 5000; thin = 1;
##          total post-warmup samples = 10000
## 
## Population-Level Effects: 
##                     Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## kcal_Intercept          0.05      0.16    -0.26     0.37 1.00     8467     6243
## neocortex_Intercept    -0.06      0.16    -0.38     0.25 1.00     8649     7053
## kcal_logmass           -0.85      0.24    -1.30    -0.36 1.00     4527     4862
## neocortex_logmass       0.64      0.15     0.35     0.93 1.00     9655     7172
## kcal_mineocortex        0.81      0.27     0.25     1.32 1.00     3831     4068
## 
## Family Specific Parameters: 
##                 Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma_kcal          0.79      0.13     0.57     1.09 1.00     5273     6229
## sigma_neocortex     0.70      0.12     0.50     0.99 1.00     5536     6131
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
```

---

## New results

Here we see that mass is indeed predictive of the neocortex percentage.

Our results of primary interest are similar.

Here we see that a one sd greater female body mass (on the log scale) is associated with an expected 0.85 (with `\(95\% \text{CI} = (0.36, 1.30)\)`) standard deviation decrease in energy content of milk.

A one sd greater percent neocortex is associated with an expected 0.81 (with `\(95\% \text{CI} = (0.25, 1.32)\)`) standard deviation increase in energy content of milk.

---

## Some notes on MNAR

As mentioned before, MNAR is actually very common.

This can be problematic because it is often hard to estimate a missing data mechanism that depends on values that are not even observed!

Results in this case often depend strongly on the assumed model, and sensitivity analyses are a useful tool for determining the consequences if your assumed model is not correct.

---

## Some notes on MNAR

Consider a longitudinal clinical trial with interest in modeling health-related quality of life, which is measured every three months by self-report on a detailed multiple-item questionnaire (items might include ability to carry out everyday activities, outlook, daily pain, etc.).

There may be a lot of missing data, even on subjects who do not drop out.

For example, if subjects who are sicker, or who are in more pain, do not respond, then we may have nonignorable nonresponse.
In particular, nonresponse at time `\(j\)` is likely to be related to quality of life at time `\(j\)`, even conditional on quality of life at times `\(1, \cdots, j-1\)`.

---

## Selection models

A popular choice for handling data missing not at random in a Bayesian framework is a .hlight[selection model]. Another is the .hlight[pattern mixture model].

Selection models factor the joint distribution of the outcomes and nonresponse pattern as

`$$f({\bf Y}_i, {\bf R}_i \mid {\bf X}_i, \boldsymbol{\beta}, \boldsymbol{\psi})=f({\bf R}_i \mid {\bf Y}_i, {\bf X}_i, \boldsymbol{\psi})f({\bf Y}_i \mid {\bf X}_i, \boldsymbol{\beta}).$$`

We specify both of these components completely and then base our inferences on

`$$L(\boldsymbol{\beta},\boldsymbol{\psi} \mid {\bf Y}_{i,obs},{\bf R}_i),$$`

integrating out the missing values.

---

## Selection models

Selection models use a *complete data* model for `\({\bf Y}\)` and then model the probability of nonresponse conditional on the observed and unobserved outcomes.

Selection models are nice because they directly model `\(f({\bf Y}_i \mid {\bf X}_i, \boldsymbol{\beta})\)`, the target of our inference.

However, they can be computationally intractable in frequentist settings (often involving difficult integrals and complex versions of EM), results may depend heavily on modeling assumptions, and identifiability can again be difficult to characterize.

**NOTE:** complete case analysis assumptions are also usually unverifiable!

In a Bayesian framework, it is usually straightforward to add a model for `\({\bf R}_i\)`; a small concrete example is sketched on the appendix slide at the end of this deck.

Finally, again, it is easy to extend the same ideas to hierarchical models, especially using .hlight[brms].

---

class: center, middle

# What's next? 

### Move on to the readings for the next module!
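
---

## Appendix: a simple selection model

As a concrete (hypothetical) illustration of the selection-model factorization, with a single continuous outcome `\(Y_i\)`, fully observed covariates `\({\bf X}_i\)`, and `\(R_i = 1\)` indicating that `\(Y_i\)` is missing, one might specify

.block[
`$$Y_i \mid {\bf X}_i \sim N({\bf X}_i^T\boldsymbol{\beta}, \sigma^2), ~~~~~ R_i \mid Y_i, {\bf X}_i \sim \textrm{Bernoulli}\left(\textrm{logit}^{-1}\left(\psi_0 + {\bf X}_i^T\boldsymbol{\psi}_x + \psi_1 Y_i\right)\right).$$`
]

Here `\(\psi_1\)` controls how strongly missingness depends on the (possibly unobserved) outcome itself: `\(\psi_1 = 0\)` reduces to MAR given `\({\bf X}_i\)`, while `\(\psi_1 \neq 0\)` gives MNAR.

Because the observed data carry very little information about `\(\psi_1\)` (identification rests almost entirely on the parametric assumptions), a common strategy is to fix it at several plausible values, or place an informative prior on it, and examine how conclusions change -- exactly the kind of sensitivity analysis emphasized earlier.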