I want to test whether daily Stress measures over IVF predict Pregnancy, while also including demographic and clinical variables. I was also interested in whether Stress levels changed over IVF, and whether variability in stress was related to pregnancy. As an inexperienced user of Mplus I wanted to check I am doing what I think I'm doing - and interpreting appropriately.
USEVARIABLES ARE ID Yrs_Inf Attempts time Pregnant Stress10 Age10;
WITHIN = time;
BETWEEN = Yrs_Inf Age10 Attempts Pregnant;
CATEGORICAL = Pregnant;
CLUSTER = ID;
LAGGED = Stress10(1);

MODEL:
%WITHIN%
auto | Stress10 ON Stress10&1;
IIV | Stress10;
STGrwth | Stress10 ON time;

%BETWEEN%
Pregnant ON STGrwth Stress10 auto IIV Yrs_Inf Age10 Attempts;
Stress10 STGrwth auto IIV ON Yrs_Inf Age10 Attempts;
Stress10 STGrwth auto IIV WITH Stress10 auto IIV STGrwth;
Among other things I find a significant PREGNANT ON STRESS10 result, indicating that more stress predicts poorer pregnancy outcomes, and that those with more prior IVF attempts have more variable stress profiles, as indicated by a significant IIV ON ATTEMPTS result. Am I getting this right? Any help or advice most appreciated.
Fantastic thanks, Bengt. It's great to be able to look at data this way with these new Mplus additions!
In this case, T & N are not ideal. N is approx 200, T = 21, as it was daily diaries rather than true EMA. I know from reading the slides that there may be issues with estimates at smaller T's (e.g., <50) - so I'd be interested in your perspective on what this means for an analysis like this. Does it make it unsuitable, or just slightly less precise? Are there any ways to mitigate such issues while still retaining this functionality?
A second query I have is that some of my clusters have no variance (i.e., they responded the same at each time point). Does including these cause issues, and should they be removed? Easily done, but this raises the question of whether it's theoretically sound to remove the 'most stable' participants from the analyses.
Regarding the sample size, we performed a simulation study (available on statmodel.com under Special topics: DSEM: Schultzberg & Muthén (2017)). This showed that large N is more important than large T for the performance of these models: T=21 can be fine with N=200. The study also showed that the random residual variance is demanding in terms of data, especially when it is used as a predictor on the between level, as in your model. I would consider fixing the residual variance across subjects, or keeping it random but excluding it from the between-level regressions.
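For the model above, the first option simply drops the random residual variance. A sketch of what that simplified input might look like (IIV is removed from both levels, so the within-level residual variance is held equal across subjects; the rest is unchanged from the original specification):

```
MODEL:
%WITHIN%
auto | Stress10 ON Stress10&1;
STGrwth | Stress10 ON time;
! no "IIV | Stress10;" line: the residual variance is no longer random

%BETWEEN%
Pregnant ON STGrwth Stress10 auto Yrs_Inf Age10 Attempts;
Stress10 STGrwth auto ON Yrs_Inf Age10 Attempts;
Stress10 STGrwth auto WITH Stress10 auto STGrwth;
```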
Your second query is an interesting one. This is still a research question at this point, which I know people are working on. If the model converges well, I would think it is alright to keep them. If there are not that many such subjects, you can try excluding them and see how it affects the estimation.
Thank you - but presumably if I fix IIV I am not able to look at how between-person differences in IIV predict pregnancy outcomes in that model.
So, presuming I am interested in that question: am I best to look at it separately in another model (so I am not trying to estimate too many things at once)? Or is it just a case of leaving the model as currently specified, accepting that the estimates may have some issues, because allowing IIV to be random and then using it in between-level regressions is always going to be demanding and so ideally needs large N (and ideally large T)?
Apologies if the answers to these questions are obvious. These analyses are taking me beyond my comfort zone, but I am fascinated by these analysis possibilities.
"accepting the estimates may potentially have some issues because allowing IIV to be random and then using it in between level regressions is always going to be demanding so ideally needs large N (and ideally T)"
DSEM is all based on Bayes, for which Mplus uses a probit link, so odds ratios are not directly obtained as exponentiated logit slopes as they would be with a logit link. You can, however, compute estimated probabilities for the distal outcome as a function of different values of its predictors. From those probabilities it is then also possible to compute odds ratios, although they won't be constant with respect to the other predictors as they are with a logit link.
OK thanks Bengt, this is definitely getting beyond the realms of my understanding.
I suppose I'm thinking about the best way to report this kind of analysis: if these were simple logistic regressions, say of pregnancy ON stress, I'd report odds ratios, beta & p, for example. Are there any examples of how you would report these kinds of analyses? Should the two-level diagram be manually drawn with beta weights added (standardized or unstandardized)?
Here is an article that pioneered this type of analysis:
Wang, L. P., Hamaker, E., & Bergeman, C. S. (2012). Investigating inter-individual differences in short-term intra-individual variability. Psychological Methods, 17(4), 567-581.
Maybe that can give you ideas for how to report results.
I would focus my reporting on the probability of pregnancy for different values of its predictors (for instance, varying the values of one predictor from -1 SD to +1 SD around its mean, holding the other predictors at their means). With the probit link for pregnancy that you are using, the probability is simply (with the example of 2 predictors):

P(Pregnant = 1 | x1, x2) = Phi(-tau + b1*x1 + b2*x2),

where tau is the threshold for Pregnant, b1 and b2 are the probit slopes, and Phi is the standard normal distribution function (the one that you can compute z-scores from) which you find in intro stat books. Or, use Model Constraint to compute this probability expression (see, e.g., pages 232-233 of our RMA book). It isn't complicated.
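Concretely, the Model Constraint approach might look something like the sketch below for this model. The labels b1, b2, and tau, and the illustrative predictor values (+/-1, assuming Stress10 is on a standardized scale), are my own placeholders, not from the original input:

```
MODEL:
%BETWEEN%
Pregnant ON Stress10 (b1)
            STGrwth (b2);
[Pregnant$1] (tau);

MODEL CONSTRAINT:
NEW(p_lo p_hi);
! probability of pregnancy at -1 SD and +1 SD of Stress10,
! holding STGrwth at its mean (0)
p_lo = PHI(-tau + b1*(-1) + b2*0);
p_hi = PHI(-tau + b1*(1) + b2*0);
```

The NEW parameters p_lo and p_hi then appear in the output with credibility intervals, and an odds ratio at those predictor values can be formed as (p_hi/(1-p_hi)) / (p_lo/(1-p_lo)).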