What would you consider to be the "best" estimator for a 2-level CFA with categorical observations?
I'd like to do model evaluation and comparisons as well. So could you please give me a short briefing on which of the three available estimators is the appropriate one with respect to ordinal data, model comparison and, of course, speed?
I would recommend MLR which is the default. Speed would be approximately the same because all available estimators are maximum likelihood. You would have to use the scaling factor which is provided in the output to do model comparisons. How to do this is described on the website.
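A minimal sketch of how the scaling factor enters a model comparison with MLR, assuming the loglikelihood-based difference test described on the Mplus website. The function name and the numeric inputs are illustrative assumptions, not values from this thread.

```python
def mlr_loglik_difference(l0, c0, p0, l1, c1, p1):
    """Scaled loglikelihood difference test for two nested MLR models.

    l0, c0, p0: loglikelihood, scaling correction factor and number of
                free parameters of the nested (more restricted) model.
    l1, c1, p1: the same quantities for the comparison (less restricted) model.
    Returns (cd, trd, df): the difference-test scaling correction, the
    scaled test statistic, and its degrees of freedom.
    """
    if p1 <= p0:
        raise ValueError("the comparison model must have more free parameters")
    cd = (p0 * c0 - p1 * c1) / (p0 - p1)   # difference-test scaling correction
    trd = -2.0 * (l0 - l1) / cd            # scaled difference statistic
    return cd, trd, p1 - p0


# Illustrative inputs only (hypothetical output values).
cd, trd, df = mlr_loglik_difference(-2606.0, 1.450, 39, -2583.0, 1.546, 47)
print(round(cd, 3), round(trd, 2), df)
```

The resulting statistic is referred to a chi-square distribution with df equal to the difference in the number of free parameters.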
Regarding the scaling factor of MLR: McDonald & Ho (2002) wrote that it's useful to obtain a separate chi² statistic for the structural part (path model) of a full SEM by subtracting the maximum likelihood chi²/df of the measurement model from the chi²/df of the full SEM. Is this also reasonable with MLR and type=complex? Many thanks for your thoughts!
McDonald, R.P. & Ho, M.-H.R. (2002). Principles and Practice in Reporting Structural Equation Analyses. Psychological Methods, 7(1), 64-82.
OK, thanks a lot. Yet another question: I would like to estimate a LV-interaction in combination with TYPE=COMPLEX. Because of the XWITH command, there is no chi² and no scaling correction factor, but the SEs are different from the ones I obtain from an ML analysis with TYPE=GENERAL. Does that mean that MLR still adjusts the SEs for the degree of non-independence? In addition, does MLR also adjust for non-normality in this case, since the analysis of the indicator distribution seems to be part of the LMS approach (Klein & Moosbrugger, 2000) for LV-interaction itself?
OK, maybe I don't understand this correctly, but is there not a conflict between MLR (adjusting for nonindependence and non-normality) and XWITH (analyzing the non-normality and taking it explicitly into account)?
It is an interesting question. Consider a case without type = complex (no complex survey data) - if non-normality is only due to the latent variable interaction handled by XWITH, I don't think the sandwich estimator used to compute SEs by MLR would necessarily do any better than ML. But MLR would probably not do worse. But if other parts of the model have non-normal outcomes, MLR might do better than ML. With type=complex there is no choice but to use MLR given the need for the sandwich estimator to take care of the non-independence.
So you are saying that MLR would probably do no harm in combination with XWITH. There is one more question: Does adding a LV-interaction via XWITH change the identification status of the model, since one more parameter has to be estimated? As far as I remember, the Klein & Moosbrugger (2000) article doesn't mention this topic.
Many thanks for your insights, they are really invaluable.
I have a related concern, as my reviewers question my identification status given the complexity of a model with an interaction and some constructs with fewer than 3 items.
The model converges fine and does meet the "t-rule" suggested by Bollen (1989). However, how can I rule out empirical under-identification? I found additional rules for establishing model identification for models with fewer than three indicators (O'Brien, 1994), but they do not discuss interaction models specifically.
So the question is: Does a model with interaction change the identification requirements in Mplus?
This has not been studied as far as I know. With latent variable interactions, not only is the regular information from means, variances, and covariances used, but also higher-order moments. My conjecture is that (1) a model that is identified without the interaction is typically identified also with the interaction, whereas (2) a model that is not identified without the interaction cannot be identified when adding the interaction. For (1), there might still be cases of non-identification, but hopefully the Mplus non-identification check using the singularity check of the sums of squares and cross-products of first-order derivatives (the "MLF check") will flag such a model as non-identified. A good empirical way to study identifiability is to do a Monte Carlo study and see if parameter estimates can be recovered well and if SEs are estimated well. For more information on this topic, you may want to contact Andreas Klein.
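The Monte Carlo logic mentioned above can be sketched as follows. This is only a toy analogue under stated assumptions: it uses a one-parameter regression through the origin estimated by least squares, not a latent variable interaction model and not the Mplus MONTECARLO facility, and all names and settings are hypothetical. The point is the three summaries one would inspect: bias of the estimates, agreement between the average SE and the empirical SD of the estimates, and 95% coverage.

```python
import math
import random
import statistics


def simulate_once(rng, n, beta):
    """Generate one data set from y = beta * x + e and estimate beta by OLS."""
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [beta * xi + rng.gauss(0.0, 1.0) for xi in x]
    sxx = sum(xi * xi for xi in x)
    b_hat = sum(xi * yi for xi, yi in zip(x, y)) / sxx
    resid_ss = sum((yi - b_hat * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(resid_ss / (n - 1) / sxx)  # model-based standard error
    return b_hat, se


def monte_carlo(reps=500, n=200, beta=0.5, seed=1):
    """Summarise parameter and SE recovery over many replications."""
    rng = random.Random(seed)
    results = [simulate_once(rng, n, beta) for _ in range(reps)]
    estimates = [b for b, _ in results]
    covered = sum(1 for b, se in results if abs(b - beta) <= 1.96 * se)
    return {
        "mean_est": statistics.mean(estimates),      # compare with true beta
        "sd_est": statistics.stdev(estimates),       # empirical sampling SD
        "mean_se": statistics.mean(se for _, se in results),
        "coverage": covered / reps,                  # should be near 0.95
    }


summary = monte_carlo()
print(summary)
```

If a parameter were not identified, one would expect poor recovery: estimates far from the generating value, or an average SE that does not track the empirical SD of the estimates.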
Thank you so much - this is very useful. I am, however, running a single-level model with one interaction, and thus it seems that the MLF estimator is not applicable here, or is it?
I did run a Monte Carlo study and it seems that the results are robust.
I guess my main question now is: Is there another command/test I can use for a single-level study with interaction to test for model identification? The model is identified without the interaction and I would like to be able to say that Mplus did not flag the model as non-identified.
The MLF check is done irrespective of which ML estimator you use: ML, MLR, MLF.
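To illustrate the idea behind such a check (not the Mplus implementation, which is not shown in this thread), here is a small sketch assuming a normal model with unit variance. It builds the sum of outer products of per-observation first-order derivatives and looks at whether that matrix is singular. The data values, parameter values, and function names are hypothetical.

```python
def opg_matrix(scores):
    """Sum of outer products of per-observation score vectors (2 parameters)."""
    m = [[0.0, 0.0], [0.0, 0.0]]
    for s in scores:
        for i in range(2):
            for j in range(2):
                m[i][j] += s[i] * s[j]
    return m


def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [0.1, 1.2, 1.9, 3.2, 3.8]

# Model A: y ~ N(a + b, 1). Only the sum a + b is identified, so the
# derivatives with respect to a and b are identical for every observation.
a, b = 0.5, 0.5
scores_a = [(yi - a - b, yi - a - b) for yi in y]

# Model B: y ~ N(a + b * x, 1). Both parameters are identified.
a, b = 0.0, 1.0
scores_b = [(yi - a - b * xi, (yi - a - b * xi) * xi) for xi, yi in zip(x, y)]

det_a = det2(opg_matrix(scores_a))
det_b = det2(opg_matrix(scores_b))
print(det_a, det_b)
```

A determinant of zero for Model A signals that the two parameters cannot be separated, while Model B gives a clearly non-singular matrix.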
RDU posted on Wednesday, December 03, 2008 - 10:30 am
Hello. I am trying to perform a series of CFA models with ordinal indicators. The data are nested (e.g., students within schools). The sample size is around 600.
Based on the articles I've read and from what I've seen on the Mplus discussion board, I was wondering if it is appropriate to use TYPE=COMPLEX in conjunction with the MLR Estimator, since Dr. Muthen stated earlier that MLR adjusts for non-independence. Thus, if you have TYPE=COMPLEX, MLR adjusts for non-independence and non-normality.
Here is a copy of my Mplus code. Thanks loads.
TITLE:
DATA: FILE IS G:\FactorAnalysis\ascii.dat;
VARIABLE: NAMES school var1 var2 var3 var4;
USEVARIABLES= school var1 var2 var3 var4;
MISSING ARE ALL .;
CATEGORICAL=var1 var2 var3 var4;
CLUSTER = school;
ANALYSIS: TYPE = COMPLEX;
MODEL: F BY VAR1 VAR2 VAR3 VAR4;
A factor model is not an aggregatable model so I would use TYPE=TWOLEVEL to account for clustering not TYPE=COMPLEX.
RDU posted on Wednesday, December 03, 2008 - 12:03 pm
To make sure that I understand, you are saying that for nested data with latent continuous variables, one must use a multilevel model where the within and between-level variance are disaggregated. In other words a sandwich estimator cannot be used in this case, and only a multilevel model or random effects model using the command "TYPE=TWOLEVEL" can be used (as opposed to looking at an aggregated model using TYPE=COMPLEX). Is this correct?
The topic of aggregatability was discussed for factor analysis in Muthen & Satorra (1995), Sociological Methodology.
RDU posted on Thursday, December 04, 2008 - 8:25 am
I'm sorry to keep at this, but I am still a bit confused. Muthen and Satorra (1995) state that the 2 methods for dealing with SEM/CFA models with complex sample data are: 1.) aggregated analysis and 2.) disaggregated analysis (i.e., multilevel cfa/sem).
Furthermore, Ch. 9 of the user's guide states that Muthen and Satorra (1995) discuss these 2 approaches, where the first aggregated approach corresponds to using the TYPE=COMPLEX command. The second approach (disaggregated approach) is said to use the TYPE=TWOLEVEL.
Since my aim is to correct the standard errors for my categorical CFA models and not to look at models for both the student and the school-levels of my data, I do not understand why the TYPE=COMPLEX command was not recommended earlier. Perhaps I am not understanding everything, so could you please clarify this for me?
Also, if it is all right to use the TYPE=COMPLEX command, then I was also wondering whether it was advisable to use MLR estimation for a categorical CFA model in conjunction with the TYPE=COMPLEX command. I believe the default estimation for this is WLS.
Using Type=Complex is better than ignoring the nested nature of the data, giving better SEs. It is, however, taking an "aggregated" approach to the modeling, which implies that the parameter estimates may be a bit distorted relative to those of a "disaggregated" approach using Type=Twolevel. This is discussed a bit in Muthen-Satorra (1995) on pages 290-291. The discussion says that if a twolevel model with equal factor loading matrices on the within and between levels holds, then the aggregated approach is correct. But if within and between have different numbers of factors - which is often the case - the aggregated approach is distorted to some degree. Say that a simple structure 2-factor model holds for Sigma_W and a 1-factor model holds for Sigma_B. This does not result in a simple structure factor model for Sigma_T, Sigma_T being the covariance matrix in the aggregated approach. Often, however, the distortion is not large. And again, the aggregated approach of Type=Complex is better than ignoring the nesting.
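The Sigma_W / Sigma_B example can be made concrete with a small numeric sketch, assuming Sigma_T = Sigma_W + Sigma_B. All loadings and factor (co)variances below are hypothetical. Under a simple structure 2-factor model, the ratio cov(y1, yk) / cov(y2, yk) is the same for every other indicator yk (it equals the ratio of the two loadings); the sketch checks whether that still holds for Sigma_T. Residual variances are left out because only off-diagonal elements are used.

```python
def matmul(a, b):
    """Plain-Python matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def common_part(loadings, psi):
    """Lambda * Psi * Lambda' (covariance implied by the factors only)."""
    return matmul(matmul(loadings, psi), transpose(loadings))


# Within level: simple structure, y1-y3 load on f1 and y4-y6 on f2.
lam_w = [[1.0, 0.0], [0.8, 0.0], [0.6, 0.0],
         [0.0, 1.0], [0.0, 0.7], [0.0, 0.9]]
psi_w = [[1.0, 0.3], [0.3, 1.0]]

# Between level: a single factor with its own loadings.
lam_b = [[1.0], [0.5], [0.9], [0.4], [1.0], [0.6]]
psi_b = [[0.5]]

sigma_w = common_part(lam_w, psi_w)
sigma_b = common_part(lam_b, psi_b)
sigma_t = [[w + b for w, b in zip(rw, rb)] for rw, rb in zip(sigma_w, sigma_b)]


def ratio(sigma, k):
    """cov(y1, yk) / cov(y2, yk), using 0-based index k."""
    return sigma[0][k] / sigma[1][k]


within_same = (ratio(sigma_w, 2), ratio(sigma_w, 3))   # y3 and y4
total_diff = (ratio(sigma_t, 2), ratio(sigma_t, 3))
print(within_same, total_diff)
```

With these values the two ratios agree for Sigma_W but differ slightly for Sigma_T, which is the kind of modest distortion the aggregated approach can introduce.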
I recently ran a very simple path analysis model using complex survey data, just to test it out in Mplus, as I will then have a much more complicated model.
I have covariates (linked to both of my predictor variables), 2 predictor variables, one mediating variable, and 1 outcome variable. My predictor, mediating, and outcome variables are all continuous variables.
My model ran well, and actually had a good fit. However, I wasn't sure what the difference is between specifying the estimator as "MLR" and not specifying it. Given that my next step will be to test this same model as a multiple group path analysis model (males and females), I'm wondering how specifying particular estimators will work in terms of the chi-square difference test.
Each analysis situation has a default estimator. If you specify ESTIMATOR=MLR when it is not the default, it overrides the default.
For ML and WLS, regular difference testing is used. For estimators ending in MV, the DIFFTEST option is used. For estimators ending in M and for MLR, a scaling correction factor is used in difference testing. This is described on the website.
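A minimal sketch of the scaling-correction version of the chi-square difference test, assuming the formula given on the Mplus website for scaled chi-square values. The function name and all input values are hypothetical.

```python
def scaled_chisq_difference(t0, d0, c0, t1, d1, c1):
    """Scaled chi-square difference test for two nested models.

    t0, d0, c0: reported (scaled) chi-square, degrees of freedom and
                scaling correction factor of the nested model.
    t1, d1, c1: the same quantities for the comparison model.
    Returns (cd, trd, df).
    """
    if d0 <= d1:
        raise ValueError("the nested model must have more degrees of freedom")
    cd = (d0 * c0 - d1 * c1) / (d0 - d1)   # difference-test scaling correction
    trd = (t0 * c0 - t1 * c1) / cd         # scaled difference statistic
    return cd, trd, d0 - d1


# Hypothetical output values for a constrained and an unconstrained model.
cd, trd, df = scaled_chisq_difference(50.0, 20, 1.2, 30.0, 15, 1.1)
print(round(cd, 3), round(trd, 3), df)
```

Multiplying each reported chi-square by its scaling correction factor recovers the uncorrected value, which is why the simple difference of two scaled chi-squares cannot be used directly.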
Thank you so much for your response. I guess I'm confused then as to whether I should leave out the "ESTIMATOR=MLR" after:
STRATIFICATION IS SESTRAT; CLUSTER IS NSECLUTR; WEIGHT IS NEWWEIGHT; ANALYSIS: TYPE=COMPLEX;
I'm guessing that because I am running multiple group analysis with complex data, that I would have to use "MLR"?
Would I still be able to do the regular difference testing using the default estimator (if I leave MLR out) or is MLR the default for this type of analysis?
From what I read in the User's Guide, it says "for all types of outcomes, robust estimation of standard errors and robust chi-square tests of model fit are provided. These procedures take into account non-normality of outcomes and non-independence of observations due to clustering sampling."
So I'm just not sure what the default estimator for Type=Complex is and whether I have to specify a specific one when using complex samples with multiple-group analysis.
You never need to specify an estimator. Just leave the option out and the default will be used. The defaults are shown on pages 482-483. You don't give enough information for me to know what the default would be for you.
I am trying to compare two groups on measurement invariance. I am using an MLM estimator as some of my variables are not normally distributed. But since the unconstrained base model has chi-square = 0 and df = 0, I am not sure how to apply the scaling factor to the data.
The data for my constrained model are as follows: Chi-Square = 11.82 Df = 6 p-value = .07 Scaling correction = 1.05
Please send the two outputs showing the constrained and unconstrained models and your license number to email@example.com.
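As a hedged worked example with the numbers quoted above, and assuming the usual scaled difference formula is applied with a just-identified unconstrained model (T_1 = 0, d_1 = 0, so its scaling factor drops out), the difference test reduces to the constrained model's own scaled chi-square:

```latex
\[
c_d = \frac{d_0 c_0 - d_1 c_1}{d_0 - d_1}
    = \frac{6 \times 1.05 - 0}{6 - 0} = 1.05,
\qquad
TR_d = \frac{T_0 c_0 - T_1 c_1}{c_d}
     = \frac{11.82 \times 1.05 - 0}{1.05} = 11.82
\]
```

Here T_0 = 11.82, d_0 = 6 and c_0 = 1.05 are the constrained model's values, and TR_d has d_0 - d_1 = 6 degrees of freedom.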
Hans Leto posted on Friday, April 13, 2012 - 9:45 am
I am testing several interaction effects. My data are not multivariate normal, so I am using a robust estimator (MLM).
When I test the model, Mplus gives me the following error "THE MODEL ESTIMATION DID NOT TERMINATE NORMALLY DUE TO A NON-POSITIVE DEFINITE FISHER INFORMATION MATRIX. CHANGE YOUR MODEL AND/OR STARTING VALUES. THE MODEL ESTIMATION HAS REACHED A SADDLE POINT OR A POINT WHERE THE OBSERVED AND THE EXPECTED INFORMATION MATRICES DO NOT MATCH..." suggesting that the results are presented for the MLF estimator.
I used "estimator=MLF" in the model command and I had nice results without error. My question is: can I use the MLF estimator in this analysis (given the non-normality)?
My research team and I are considering using MLR vs. WLSMV. After reading some articles and going through your discussion board posts, we think that WLSMV would be sufficiently robust, but we are having a hard time understanding why and if there are any citations we can use.
Our SEM analysis is multi-level with both categorical and continuous variables and data is assumed MAR and nonnormal. Our DV is a latent construct made up of 9 variables that are scaled on 1-3 based on frequency (not at all, occasionally, frequently).
If MCAR does not hold, but MAR does, WLSMV estimates will be biased and MLR estimates ok. If it is computationally not too heavy (not too many latent variables), MLR is better than WLSMV due to using full information. Another alternative is Bayes, which is as good as MLR, but can handle more latent variables.
Just to be sure since you use the word "robust", when you say MLR I hope that you mean treating the variables as categorical just like WLSMV does. Sometimes people say ML (or MLR) when they really mean treating the variables as continuous.
Our DV is being treated as a continuous latent variable. Actually, all of our latent variables (four total) are treated as continuous. Then we have both continuous and categorical observed variables (about 20 total), and the categorical variables are identified as such in our syntax.
When you say that MLR is "better", would it still be possible to justify the use of WLSMV over MLR?
One hesitation that we have with MLR is how long it's taking to run given our large sample, and also we understand that using MLR precludes us from deriving the indirect effect estimates.
It sounds like you are getting chi-square values for the frequency table for your categorical outcomes. This is not the chi-square that compares the unrestricted H1 and the H0 models. This chi-square and related fit statistics are not available unless means, variances, and covariances are sufficient statistics for model estimation which is not the case with maximum likelihood and categorical outcomes.
Are there any recommendations concerning the use of FIML / ML versus the default MLR as estimator in a "two-level random" analysis? I used ML as estimator as I gathered from the literature that ML is the preferred practice in my field and the required estimation method for models with random slopes and for comparing models with log likelihood difference tests. It also seems to be assumed that ML tolerates slight violations of the normality requirement.
But how does ML perform compared to MLR? Did I make a grave mistake in choosing ML over MLR? My sample size is 165 on L2 and about 700 on L1; the cluster size varies between 3-5. All my variables are continuous. The independent variables all seem to be approximately normally distributed or only very slightly skewed but some of the independent variables are clearly not normally distributed.
Sadly, the SEs seem to differ quite a bit so that some formerly significant effects are no longer significant. Am I correct in concluding that I cannot trust the results obtained with FIML as estimator and thus have to rerun the analyses with MLR? And it is also possible to do the log likelihood tests with MLR estimates - I just have to use the correction factor as it is explained on your homepage, right? (I am sorry if these are stupid questions; I just want to make sure I do nothing wrong again.)