$\begingroup$

I have a dataset with repeated measures: different individuals each have six appointments in total. The outcome variable is continuous. I want to know whether I should use a GLMM or an LMM to see what the effect of time is on the outcome. My first thought was an LMM, because I have a continuous outcome, but the residuals of my model are not linear. Does this mean I then have to use a GLMM? And when is it better to use glmmTMB or glmer?

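library(lme4)     # provides lmer()
library(glmmTMB)  # provides glmmTMB()

# Linear mixed model: Gaussian errors, random intercept per individual
model_LMM <- lmer(levels ~ appointment + (1 | Individual), data = combined_global_df)

# Gamma GLMM with a log link and the same random-intercept structure
model_GLMM <- glmmTMB(levels ~ appointment + (1 | Individual),
                      data = combined_global_df,
                      family = Gamma(link = "log"))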
$\endgroup$

1 Answer

$\begingroup$

Variance-mean relationship

A difference between the use of a gamma family and a normal family will be the relationship between mean and variance.

  • For a normal family, the variance of the outcome variable is considered constant and doesn't change when the mean is different.

    This occurs in many situations. It happens when the distribution is due to additive errors, which don't change for a different mean of the distribution.

  • For a gamma family, the variance is considered to scale with the square of the mean. E.g. for conditions where the mean is $q$ times larger, the variance will be $q^2$ times larger.

    This is as if the mean is a scaling parameter: if you scale a distribution, the mean increases linearly with the scaling factor and the variance increases with the square of the scaling factor.
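
The two variance-mean relationships above can be checked with a small simulation. This is only a sketch in base R: the means, the standard deviation and the gamma shape are arbitrary values chosen for illustration, not taken from the question's data.

```r
set.seed(1)
n <- 1e5
means <- c(2, 4, 8)  # illustrative means; each is q = 2 times the previous one

# Normal family: the standard deviation is fixed, so the variance ignores the mean
var_normal <- sapply(means, function(m) var(rnorm(n, mean = m, sd = 1)))

# Gamma family with a fixed shape: mean = shape * scale, variance = mean^2 / shape
shape <- 5  # illustrative shape parameter
var_gamma <- sapply(means, function(m) var(rgamma(n, shape = shape, scale = m / shape)))

# Ratio of each variance to the variance at the previous (half as large) mean
ratio_normal <- var_normal[-1] / var_normal[-length(var_normal)]
ratio_gamma <- var_gamma[-1] / var_gamma[-length(var_gamma)]
print(round(rbind(normal = ratio_normal, gamma = ratio_gamma), 2))
```

Up to simulation noise, the normal ratios should be close to 1 and the gamma ratios close to $q^2 = 4$.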


Link function

In addition, the gamma family will use an inverse link function as default. That will change the relationship between your parameters and the mean to

$$E[Z]_{\text{given $x$}} = \frac{1}{a+bx}$$

instead of

$$E[Z]_{\text{given $x$}} = {a+bx}$$

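The link function is not essential to a family and can be changed. The code in the question does this with `link = "log"`, which gives $E[Z]_{\text{given $x$}} = e^{a+bx}$. The default is the canonical link function of the family.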
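
As a quick check of how the link changes the mean, the family objects in base R expose their inverse link function. The sketch below assumes made-up coefficients `a` and `b` and a made-up predictor grid `x`; none of these come from the question.

```r
gamma_default <- Gamma()          # gamma family with R's default link
gamma_log <- Gamma(link = "log")  # the link chosen in the question's glmmTMB call

print(gamma_default$link)  # name of the default link
print(gamma_log$link)

a <- 0.5; b <- 0.1  # hypothetical coefficients, for illustration only
x <- 0:5            # hypothetical predictor values
eta <- a + b * x    # linear predictor

# linkinv() maps the linear predictor back to the mean of the outcome
mean_inverse <- gamma_default$linkinv(eta)  # 1 / (a + b * x)
mean_log <- gamma_log$linkinv(eta)          # exp(a + b * x)
print(round(cbind(x, eta, mean_inverse, mean_log), 3))
```

Note that with the inverse link a positive `b` makes the mean decrease in `x`, while with the log link the same `b` makes it increase, so coefficients from the two fits are not directly comparable.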

Often the canonical link function is also a practical transformation, i.e. the relationship between a parameter and the mean naturally has the form of the link function. Logistic regression is an example: besides being the canonical link function, the logit also arises in many different ways as a practical function or from some underlying principle. (For the inverse link function, I believe there might also be something practical or logical behind it, but it is not on the tip of my tongue.)


Which is best

Choose the model that is suitable. The model that you believe describes the data the best.

The GLMM might have one disadvantage, which is that it is more computationally heavy. The random effects (Gaussian distributed) need to be combined with the errors (gamma distributed), and that combination doesn't have a nice analytical solution as it does in an LMM.
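
To make the computational point concrete, here is a sketch of the likelihood contribution of one individual $i$ with a random intercept $u_i$. The notation is introduced here for illustration and is not from the question: $y_{ij}$ is the outcome at appointment $j$, $f$ is the conditional density of the outcome, and $\sigma_u^2$ is the random-intercept variance.

$$L_i = \int_{-\infty}^{\infty} \left[\prod_{j=1}^{6} f(y_{ij} \mid u_i)\right] \frac{1}{\sqrt{2\pi\sigma_u^2}} \exp\left(-\frac{u_i^2}{2\sigma_u^2}\right) du_i$$

When $f$ is a normal density with an identity link, this integral has a closed form (the six outcomes are jointly multivariate normal). When $f$ is a gamma density, it has to be approximated numerically. As far as I know, glmmTMB uses a Laplace approximation, and glmer uses Laplace by default with adaptive Gauss-Hermite quadrature as an option.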

The effect of using a wrong model is not always harmful. The advantage of a better model is more accuracy (often the fit will be more efficient and closer to reality), but a less accurate model can still be useful.

Also, be aware that if the model has a low goodness of fit (if the assumptions about the distribution are bad), then inference with hypothesis testing will be incorrect. A hypothesis about some effect implicitly includes assumptions about the statistical model. So if we reject a hypothesis, then it is a rejection of the stated hypothesis about the effect and the assumptions. In practice, if the assumptions are reasonable and trusted to be accurate, then they are left out in the conclusions.

$\endgroup$
  • $\begingroup$ One major difference not discussed here is that in the context of a mixed model, the identity link will result in the marginal model equating the conditional one. That's no longer the case under a non-identity link, so you are not getting estimates of the same quantity! $\endgroup$
    – PBulls
    Commented yesterday
  • $\begingroup$ @PBulls I feel that you are onto something with "the identity link will result in the marginal model equating the conditional one." But, I currently don't get it directly. $\endgroup$ Commented yesterday
  • $\begingroup$ Have a look at e.g. this and this -- most discussion is about the binomial case but it holds for any non-linear link. Highly recommend Dimitris' course material in the answer behind that first link. $\endgroup$
    – PBulls
    Commented yesterday
  • $\begingroup$ @PBulls to be honest I start to not understand your point. Obviously different link functions will make differences, but the question is about distribution families. Do you consider the link function to be part of the question? $\endgroup$ Commented yesterday
  • $\begingroup$ No, my point is that the LMM will give the marginal model -- which is often the one of interest for population-level inference -- out of the box, while any other non-identity linked mixed model won't. You need extra integration steps to get the marginalized coefficients. If you don't take those then the LMM and GLMM may well give you different estimands if it isn't the conditional model you're interested in. To rephrase: do you need a GLMM? Perhaps, but keep in mind that you can't take its coefficients/inference at face value if you want the population-level ones! $\endgroup$
    – PBulls
    Commented yesterday
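
The marginal-versus-conditional point raised in these comments can be illustrated for the log link used in the question with a short base R simulation. This is a sketch under assumed values: the coefficients `a` and `b`, the random-intercept standard deviation `sigma_u`, and the number of simulated individuals are all hypothetical.

```r
set.seed(42)
n_id <- 2e5         # hypothetical number of simulated individuals
a <- 1; b <- 0.2    # hypothetical conditional (subject-specific) coefficients
sigma_u <- 0.8      # hypothetical random-intercept standard deviation
u <- rnorm(n_id, mean = 0, sd = sigma_u)

# Population-averaged mean at x: average the conditional means over individuals
marginal_mean <- function(x) mean(exp(a + b * x + u))

x <- 0:5
simulated <- sapply(x, marginal_mean)
conditional <- exp(a + b * x)               # mean for an individual with u = 0
analytic <- exp(a + sigma_u^2 / 2 + b * x)  # uses E[exp(u)] = exp(sigma_u^2 / 2)

print(round(cbind(x, conditional, simulated, analytic), 3))
```

With a log link and only a random intercept, the population-averaged curve differs from the conditional one by a shift of $\sigma_u^2/2$ in the intercept, while the slope stays the same. For other non-identity links, or with random slopes, the slopes generally differ as well, which is why the extra integration step mentioned above is needed.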
