I am estimating several standard CFA models with dichotomous data and have been using the robust WLS estimator. I have been requesting modification indices and have noticed that none of the MIs suggest freeing any error covariances in any of my models (in fact, the MIs do not mention error terms at all). I have run similar models (with these variables) in other SEM programs, and their MIs have suggested allowing errors to correlate on several occasions. So I'm wondering whether I need to do anything beyond requesting modification indices in the OUTPUT command to get information on error terms in Mplus.
I have a related question to the one I asked earlier. I am estimating CFA models with correlated errors using the MLR estimator and am running into program errors, for example:
*** ERROR in Model command
Covariances for categorical, censored, count or nominal variables with
other observed variables are not defined.
Problem with the statement: LINTRST WITH DECAPP

*** ERROR
The following MODEL statements are ignored:
* Statements in the GENERAL group:
LINTRST WITH DECAPP
Previously, when I was estimating the CFA models with the WLSMV estimator, I didn't receive these errors. In the above example, "lintrst" and "decapp" are observed binary variables. It appears that my syntax (in the MLR case) is commanding the program to correlate the observed variables, whereas the same syntax (in the WLSMV case) is commanding the program to estimate their residual (i.e., error) covariance.
So, my question is, why is this happening? (My other question/concern is when I specify in the model command "lintrst WITH decapp" using the WLSMV estimator, am I commanding the observed variables to be correlated or am I commanding their residual errors to be correlated?)
Do I need to change my syntax to estimate the model using MLR, or is there a more fundamental modeling issue that I am not picking up on?
For the previous example, the full model syntax is as follows:

INPUT INSTRUCTIONS

TITLE: NCS depression EFA

DATA: FILE IS "C:\Documents and Settings\Jim\Desktop\EFAnosad2.dat";

VARIABLE: NAMES ARE lintrst decapp decwate incapp incwate earlins
          midins latins hypins fatigue divar retard agit pleasur lsex
          worthls sinful guilty inferio conf conc thgt indeci death
          wdie commit attempt;
          USEVARIABLES ARE lintrst decapp decwate incapp incwate
          earlins midins latins hypins fatigue divar retard agit
          pleasur lsex worthls sinful guilty inferio conf conc thgt
          indeci death wdie commit attempt;
          CATEGORICAL ARE lintrst-attempt;

ANALYSIS: TYPE IS GENERAL;
          ESTIMATOR IS MLR;
          INTEGRATION = 10;
          ITERATIONS = 10000;
          CONVERGENCE = 0.00005;

OUTPUT: SAMPSTAT RESIDUAL STANDARDIZED TECH3;

MODEL: f1 BY lintrst decapp decwate incapp incwate earlins midins
       latins hypins fatigue divar retard agit pleasur lsex conc
       thgt indeci;
       f2 BY worthls sinful guilty conf inferio death wdie commit
       attempt;
       lintrst WITH decapp;
Maximum likelihood estimation with non-normal outcomes relies on conditional independence because the computations involve heavy numerical integration.
In your model, you would be estimating residual covariances because lintrst and decapp are dependent variables in the regression of them on the factor f1.
There is a way to specify residual covariances in this situation, although it leads to heavier computations. See Example 7.16. If you have a lot of residual covariances in your model, WLSMV is a better choice.
I understand the reasoning behind why I cannot specify the correlated errors with MLR as I do for WLSMV. However, after reading example 7.16 I am only loosely understanding how this example would apply to my model.
I think what is confusing me is the latent class aspect to the example. Should I be specifying my factors as classes and then specifying factors to be causes of the residual covariance between items after accounting for the influence of the classes on the items? Would this be "equivalent" to the model I have listed previously or is it something very different?
Also, could you write out the model portion of the syntax for what you are proposing in your response? In other words, could you modify the model portion of the syntax I have provided earlier to reflect how my model could be estimated according to example 7.16?
One final question. When you say the computations will be heavier: does this mean that each estimated "error covariance" places the same computational burden on the model as adding an additional latent factor would? If so, is it reasonable to dramatically reduce the number of integration points used in the analysis? At what number of integration points (on the lower end) would the analysis become untenable? Is there any literature you are aware of regarding varying the number of integration points and the effect it has on results?
I pointed you to Example 7.16 to show you the trick of estimating a residual covariance by using a factor. That's the part you should be looking at. I am not suggesting that you do a mixture model. So if you want to estimate the residual covariance between y1 and y2, for example, you would add the following to the model:
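The code snippet referred to here appears to be missing from the archived post. Based on the description that follows (the residual covariance is picked up from the factor loading of y2, and f must not covary with other factors), the standard Example 7.16-style specification is presumably along these lines, with y1 and y2 standing in for the two items:

```
f BY y1@1    ! loading of y1 fixed at 1
     y2;     ! loading of y2 free; with the factor variance at 1,
             ! this loading equals the residual covariance of y1, y2
f@1;         ! factor variance fixed at 1
f WITH f1@0; ! f must not covary with the substantive factors
             ! (f1 is a placeholder for any other factor in the model)
```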
The residual covariance can be picked up from the factor loading of y2. Also, be certain that f does not covary with any other factors.
By computations being heavier, I mean that each residual covariance adds one dimension of integration. In our experience, you can safely go down to INTEGRATION = 5. You can also use INTEGRATION = MONTECARLO. You could try 3, 5, and 7 and see how the results change. I know of no literature on this; perhaps Bengt can add something if he knows of any.
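As a sketch, only the INTEGRATION line would change relative to the ANALYSIS command posted earlier:

```
ANALYSIS: TYPE IS GENERAL;
          ESTIMATOR IS MLR;
          INTEGRATION = 5;  ! try 3, 5, and 7 to check sensitivity
! or, when there are many dimensions of integration:
!         INTEGRATION = MONTECARLO;
```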
bmuthen posted on Saturday, May 21, 2005 - 12:46 pm
Integration = 5 often gives a good, rough approximation to the full ML solution (ML for normal latent variables). And, it is a model in itself in the sense that it is the ML solution for a 5-point discretized normal latent variable distribution for each latent variable. There is literature on number of integration points that one needs to get (almost) exactly the ML solution corresponding to underlying normal variables. It is clear, however, that the number is application-specific and depends strongly on the model-data features. I will post some references shortly.
Here are some references related to this. Lesaffre & Spiessens (2001) in Applied Statistics discuss the need for many integration points in some applications. Bock & Aitkin (1981) in Psychometrika discuss using only 3 and 5 points in IRT; Anderson & Aitkin (1983) also used 3, 5, 7, and 9. Hedeker & Gibbons (1994) say that as few as 3 points is enough.
Does using the method you have provided above (i.e., the method Linda suggested) take care of the problems that item dependence would cause in IRT parameter estimation? More generally, does using this method affect any model parameter estimates that I should be aware of (other than the estimated error covariance parameter itself)?
Additionally, are there any resources that you could point me to that I could use to defend (and understand) this analytic choice (i.e., in a journal article write-up)?
Allowing the conditional dependence (residual correlation) by adding an extra factor behind two items does affect the parameter estimates for each of the two items, as it should; for example, the loadings probably go down a bit since some of the correlation between the items is channeled directly through the residual correlation.
Regarding literature on this, I don't know off hand - there has to be writings on this in IRT contexts, but I can't put my finger on them. Perhaps other Mplus Discussion readers would know; or, search IRT literature for violation of conditional independence. But the approach isn't anything revolutionary; having a residual correlation is simply adding a minor factor for two items.
I am experiencing similar problems to those discussed in this thread. I am attempting to conduct a CFA with two factors. I want to introduce correlated errors between four pairs of variables. I am using complex survey data and the MLR estimator.
Initially, I got an error similar to that of Jim Prisciandaro (May 21, 2005, 9:45 am) and subsequently followed Linda's advice to use Example 7.16 in the manual.
My problem is this: with the two factors and four correlated errors, Mplus tells me that the model requires 6 dimensions of integration. I tried reducing the number of integration points to 3 and 5. This did not work (an error message said there wasn't enough memory). I specified INTEGRATION = 2 and the program ran, but I didn't receive any standard errors and got the following error message:
THE MODEL ESTIMATION DID NOT TERMINATE NORMALLY DUE TO A NON-ZERO DERIVATIVE OF THE OBSERVED-DATA LOGLIKELIHOOD.
THE MCONVERGENCE CRITERION OF THE EM ALGORITHM IS NOT FULFILLED. CHECK YOUR STARTING VALUES OR INCREASE THE NUMBER OF MITERATIONS. ESTIMATES CANNOT BE TRUSTED. THE LOGLIKELIHOOD DERIVATIVE FOR PARAMETER 11 IS -0.77208232D+01.
If I increase the number of MITERATIONS, should this help the problem? I am concerned, however, that I should be working with at least 5 integration points as per Bengt's post of May 21, 2005, 12:46 pm. Any advice on how I might solve this problem would be greatly welcomed.
Your model has too many dimensions of integration. If you want to include residual covariances in your model, you should use a weighted least squares estimator like WLSMV. Numerical integration is not required for this.
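Under WLSMV the residual covariances can be specified directly with WITH statements and no numerical integration is needed. A minimal sketch, reusing the variable names from the input posted earlier in this thread (indicator lists abbreviated):

```
ANALYSIS: ESTIMATOR IS WLSMV;

MODEL: f1 BY lintrst decapp decwate;  ! abbreviated indicator list
       lintrst WITH decapp;           ! residual covariance, allowed
                                      ! directly under WLSMV
```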
Drs. Muthen, I am a new Mplus user and I am generally familiar with latent class analysis. I am currently applying this statistical method to a dataset in which I suspect that several of my items are correlated. I understand how to handle lack of conditional independence using Mplus, but I do not understand how I can determine for which items the assumption of conditional independence is violated. I could not find this information in my manual and I was hoping you could provide some guidance on how I can use Mplus to first examine for this assumption so that I may account for violations in the model.
Hello, I want to ask a question related to the previous question in this thread. I have a latent class model using 7 observed variables (parcels/testlets/standardized summed composites). Regardless of the number of classes I extract (including the optimal number according to BIC and interpretability), I have significant bivariate residual correlations. My question is simply whether freeing those residual correlations between the observed variables effectively "controls" for the lack of conditional independence. In other words, if they are freed, can I trust my LCA solution? I am also not clear on exactly what the consequences are of violating conditional independence in the LCA context. Any insight you could offer would be tremendous, and any related citations for any of this would be great. Thanks!!!
If you have only very few residual correlations, I think you can trust the results and claim partial conditional independence. This may point to the need for a factor mixture model. See the following paper which is available on the website:
Muthén, B. (2008). Latent variable hybrids: Overview of old and new models. In Hancock, G. R., & Samuelsen, K. M. (Eds.), Advances in latent variable mixture models, pp. 1-24. Charlotte, NC: Information Age Publishing, Inc.
Related to the issue that Linda posted on Saturday, May 21, 2005: what if I have, e.g., three count DVs and I want to estimate the correlations among their residuals using a factor (or factors)? Which code should I use:
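The two code alternatives were apparently lost from the archive. Judging from the reply further down in the thread (a 1-factor, 2-parameter specification versus a 3-parameter specification), they were presumably of roughly this form, with y1-y3 standing in for the three count DVs:

```
! Approach 1 (2 free parameters): first loading fixed at 1
f BY y1@1 y2 y3;
f@1;

! Approach 2 (3 free parameters): all three loadings free
f BY y1* y2 y3;
f@1;
```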
Thanks, Bengt. But does that code also estimate a correlation between y2 and y3? (This code runs the model fast without errors.) If I use the second approach in my post (and specify INTEGRATION = STANDARD(15)), the code runs a long time and ends with the error message below:
THE ESTIMATED COVARIANCE MATRIX COULD NOT BE INVERTED. COMPUTATION COULD NOT BE COMPLETED IN ITERATION 188. CHANGE YOUR MODEL AND/OR STARTING VALUES.
THE MODEL ESTIMATION DID NOT TERMINATE NORMALLY DUE TO AN ERROR IN THE COMPUTATION. CHANGE YOUR MODEL AND/OR STARTING VALUES.
Any suggestions for solutions? Should I just try to change starting values first?
Yes, my code suggestion (1 factor) also estimates a correlation between y2 and y3 (a factor makes all of its indicators correlate). It is just that I use a 2-parameter model versus a 3-parameter model, using a factor to help obtain a positive definite covariance matrix for these residuals. The fact that the 3-parameter model gets into trouble may be due to model misfit. I don't think starting values matter here. Perhaps try more integration points.
Samuli Helle posted on Tuesday, February 05, 2013 - 12:03 pm
Ok. I must be missing something: with that 1-factor code, where can I find the residual correlation between y2 and y3? Aren't the factor loadings for y2 and y3 the residual correlations with y1? Thanks!
Yes. You give labels to the loadings in the MODEL command and then you use the MODEL CONSTRAINT command to define NEW parameters that you express as products of the loadings. This gives you their estimates and SEs.
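Concretely, this might look as follows (a sketch; the labels L1-L3 and the NEW parameter names are placeholders). With the factor variance fixed at 1, each residual covariance is the product of the two corresponding loadings:

```
MODEL: f BY y1* (L1)
            y2  (L2)
            y3  (L3);
       f@1;

MODEL CONSTRAINT:
       NEW(cov12 cov13 cov23);
       cov12 = L1*L2;   ! residual covariance of y1 and y2
       cov13 = L1*L3;   ! residual covariance of y1 and y3
       cov23 = L2*L3;   ! residual covariance of y2 and y3
```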
Samuli Helle posted on Wednesday, February 06, 2013 - 1:11 pm