Anonymous posted on Saturday, December 11, 2004 - 11:29 am
Hello, I am running a simple SEM model in which I ask for modification indices in the output. I receive "BY" statements, "ON" statements, and "WITH" statements that are all easy to interpret. However, I also get what are called "ON/BY" statements. What are these statements, and what are they suggesting I do to improve model fit? Thanks!
BY is a type of ON. In BY, the dependent variables are on the right-hand side of BY and the independent variables are on the left-hand side of BY. The modification indices listed with ON/BY are parameters that can potentially be freed either using ON or BY.
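As a minimal sketch of that correspondence (the factor F and the continuous indicators Y1-Y4 are hypothetical names, not taken from the question), the two specifications below free the same parameter, the slope of Y4 on F. They are alternatives, not statements to use together:

```mplus
MODEL:
  ! Alternative A: Y4 on the right-hand side of BY, so Y4 is the dependent variable
  F BY Y1-Y4;

  ! Alternative B: the same loading written as a regression
  ! (uncomment these two lines and delete Alternative A to use it instead)
  ! F BY Y1-Y3;
  ! Y4 ON F;
```

Because the parameter could be freed with either statement, it can be listed under ON/BY.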
Sanjoy posted on Tuesday, July 26, 2005 - 12:11 pm
Dear Professor/s ... one very quick question
This is my model (R1-R3 and B1-B3 are all ordered categorical with five categories; U is binary 0/1), estimated with WLSMV:
R BY R1-R3;
B BY B1-B3;
U ON R B X1;
R ON B X2;
B ON R X3;
The modification indices (with the minimum printing value set at 3.84) suggest a cross-loading, i.e., R BY R1-R3 B3. I made that change and found that overall fit improves significantly.
Now my question is
I need to do some difference testing (using DIFFTEST) on some otherwise freely estimable parameters of our model. Should I do it after following the MI suggestion or before? I would sincerely appreciate it if you could also explain the answer.
thanks and regards
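For readers following along, a complete input for the model described in this post might look roughly like the sketch below. The data file name and the order of variables in NAMES are assumptions. The 3.84 cutoff in MODINDICES is the .05 critical value of a chi-square with 1 degree of freedom (1.96 squared is about 3.84), which fits because each modification index refers to freeing a single parameter.

```mplus
TITLE:    Sketch of the model with the MI-suggested cross-loading
DATA:     FILE = mydata.dat;               ! hypothetical file name
VARIABLE: NAMES = R1-R3 B1-B3 U X1-X3;     ! assumed variable order
          CATEGORICAL = R1-R3 B1-B3 U;
ANALYSIS: ESTIMATOR = WLSMV;
MODEL:    R BY R1-R3 B3;                   ! B3 added as a cross-loading on R
          B BY B1-B3;
          U ON R B X1;
          R ON B X2;
          B ON R X3;
OUTPUT:   MODINDICES(3.84);                ! print only MIs of 3.84 or more
```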
bmuthen posted on Tuesday, July 26, 2005 - 5:42 pm
Yes, I would first make sure that the model fits the data well and the MI is a useful tool here. If you want to be precise, there are some issues with the p value for your difftest not being exactly right given the modifications in these multiple analysis steps. The best thing would be to have first explored the R and B factor structure by itself (using only the R1-R3, B1-B3 variables) using EFA and CFA on a separate (independent) sample.
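The two-step DIFFTEST procedure for WLSMV can be sketched as follows. The restriction tested here (U ON X1 fixed at zero) is only a hypothetical example of an "otherwise freely estimable parameter", and the file names are assumptions. The less restrictive model is run first and its derivatives are saved; the more restrictive model is then run as a separate analysis that reads them.

```mplus
! Run 1: less restrictive (H1) model, parameter of interest free
DATA:     FILE = mydata.dat;               ! hypothetical file name
VARIABLE: NAMES = R1-R3 B1-B3 U X1-X3;
          CATEGORICAL = R1-R3 B1-B3 U;
ANALYSIS: ESTIMATOR = WLSMV;
MODEL:    R BY R1-R3 B3;
          B BY B1-B3;
          U ON R B X1;
          R ON B X2;
          B ON R X3;
SAVEDATA: DIFFTEST = deriv.dat;            ! save derivatives for the test

! Run 2 (separate input file): more restrictive (H0) model
! DATA and VARIABLE commands are the same as in Run 1
ANALYSIS: ESTIMATOR = WLSMV;
          DIFFTEST = deriv.dat;            ! read derivatives saved in Run 1
MODEL:    R BY R1-R3 B3;
          B BY B1-B3;
          U ON R B X1@0;                   ! the restriction being tested
          R ON B X2;
          B ON R X3;
```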
Thank you, Professor. As you suggested, I ran a separate CFA (using only R1 ... B3 as USEVARIABLES, so that the other variables in our model would have no effect on it).
I start with this MODEL:R BY R1 R2 R3; B by B1 B2 B3;
Kindly tell me if I'm wrong:
1. We should add parameters one at a time and re-evaluate the model, and
2. We should pick the "suggested path" based on the MI, the EPC, and whether it makes sense for the model.
finally after three stages of inclusion I ended up with this MODEL:R BY R1 R2 R3 B3; B by B1 B2 B3; B1 WITH R2; B2 WITH R1;
CFI/TLI at each stage:
1st stage (initial model): CFI 0.802, TLI 0.688
2nd stage (B3 added to "R BY"): CFI 0.877, TLI 0.775
3rd stage (B2 WITH R1 added): CFI 0.962, TLI 0.916
Last stage (B1 WITH R2 added): CFI 0.994, TLI 0.986
Thanks and regards
bmuthen posted on Wednesday, July 27, 2005 - 6:46 pm
You can do it that way. Note that your model improvements may not all be necessary in the following sense. Once you have gotten to the last stage, you can check whether some parameters, although they improve model fit, (1) are really not substantively significant and (2) do not change the key parameters of the model from the values these key parameters had in earlier stages. Key parameters in your case might be the factor correlation and the loadings of the initial factor indicators. If (1) and (2) hold, then you may not want to include such "waste-basket" parameters (to quote Michael Browne), which may only be a function of (too) high power. Note also that a safer way to proceed might be exploratory factor analysis (EFA) and "EFA within a CFA framework", which we teach at our Nov course.
Sanjoy posted on Wednesday, July 27, 2005 - 10:25 pm
Thank you, Professor, I think I am starting to get it. Could you kindly name the article by Prof. Browne? I think I should read more on MIs.
On page 24 of the Technical Appendix (and in Sorbom's article), you present the theory of MIs in the context of continuous Y variables. However, in Mplus 3.12 we can compute MIs even when the Ys are categorical. Could you kindly tell us the reference behind this? I need to cite it in my thesis.
Browne didn't mention this in the MI connection, but more generally - I am not sure which of his many articles it appeared in but perhaps in one of his British Journal of Math'l and Stat'l Psychology articles. Does anyone else know?
MI for categorical uses the same principle as Sorbom's article, the only difference is that the MIs are multiplied by the WLSMV correction factor so that they are on the chi-square scale. No reference yet, except the program itself.
thank you Professor ... Your words are reference to me ... with regards
Peter Croy posted on Wednesday, July 26, 2006 - 12:37 pm
Hello, I'm in the early stages of using Mplus but have reached the point of assessing the fit of my basic SEM model (N = 2300). The only fit statistic that looks okay is the SRMR, which is 0.068. The chi-square value is very high and the RMSEA is 0.094. I have asked for modification indices and have obtained MIs for correlations (the WITHs) between my factor indicator variables. Can you tell me what this means? Does Mplus not correlate factor indicators by default?
Mplus does not correlate the residuals of the factor indicators by default because that would not be an identified model. I would look at the CFI and if it isn't close to or above 0.96, I would go back and do an EFA.
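A minimal EFA input for this step might look like the following; the indicator names y1-y12 and the file name are hypothetical. TYPE = EFA followed by two numbers requests solutions for that range of numbers of factors.

```mplus
DATA:     FILE = mydata.dat;     ! hypothetical file name
VARIABLE: NAMES = y1-y12;        ! hypothetical indicator names
ANALYSIS: TYPE = EFA 1 4;        ! one- through four-factor solutions
```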
Peter Croy posted on Wednesday, July 26, 2006 - 10:21 pm
thanks for that. I have done an EFA which shows all indicators loading onto their expected factors. Various other indicators loaded onto sibling factors but to a lesser degree than expected indicators. However, I guess this still may be the genesis of my misfit problem. I also calculated Cronbach's alpha for the indicators of each factor ... alphas were > 0.7. Assuming I can muster an argument for correlating some of the residuals of the factor indicators, can I go ahead and do this yet avoid identification problems?
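For reference, Cronbach's alpha for a scale of k items, with item variances σ²_i and total-score variance σ²_X, is computed as below. It summarizes the internal consistency of the summed score; it is not a test of the fit of the factor model.

```latex
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} \sigma^2_{i}}{\sigma^2_{X}}\right),
\qquad X = \sum_{i=1}^{k} y_i
```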
Yes, you can add some residual covariances to the model.
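A minimal sketch of what adding a residual covariance looks like in the MODEL command; the factor and indicator names are hypothetical. Because y2 and y3 are factor indicators (dependent variables), the WITH statement frees the covariance of their residuals, not of the observed variables themselves.

```mplus
MODEL:  f1 BY y1-y4;      ! hypothetical indicators
        f2 BY y5-y8;
        y2 WITH y3;       ! residual covariance between two indicators of f1
OUTPUT: MODINDICES(3.84);
```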
Peter Croy posted on Saturday, July 29, 2006 - 1:08 am
Hello again. I have added some residual covariances to the model and have obtained acceptable model fit indices. I would appreciate a little guidance on justifying the inclusion of the residual covariances. The theoretical model I'm using is the Theory of Planned Behaviour, a model shown to be robust across many studies that often use regression analyses, where residual covariances are not at issue. These studies have averaged the respective indicator scores to obtain single measures of the IVs/predictors. They report regression coefficients and R-square results. My study obtains regression/R-square results comparable to other studies. However, I want to stay with the Mplus SEM model for later stages of my analysis, which deal with a categorical DV (mediated by the latent DV in my present model). Thus, I need to justify the inclusion of residual covariances in my present model. In addition to any guidance you may have on justifying residual covariances: is it common practice?
Residual covariances are not unusual parameters to have in a model. They should be substantively motivated, however, and not used solely to improve model fit.
Eda Aksoy posted on Tuesday, August 01, 2006 - 9:08 am
I am trying to conduct a confirmatory factor analysis for two scales. When I define the model as "x by x1 x2" and "y by y1 y2", the resulting model turns out to be a misfit, despite the fact that all the indicators load significantly. The modification indices recommend a rather long list of WITH connections among indicators. When I include a few of these, the overall model fit improves drastically. My question is this: under what conditions would I justifiably be able to include such WITH statements in my model? Would this be statistically questionable conduct?
You have two factors with two indicators each. These factors are identified only because they borrow information from each other. This is not a strong model. Adding residual covariances should be done only if they are substantively defensible not just to improve model fit.
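The identification point can be made concrete by counting, for the covariance structure only (means ignored) and with one loading per factor fixed at 1:

```latex
\begin{aligned}
\text{one factor, 2 indicators:}\quad & \tfrac{2\cdot 3}{2} = 3 \text{ moments},\;\;
  1\,(\lambda) + 2\,(\theta) + 1\,(\psi) = 4 \text{ parameters} \;\Rightarrow\; df = -1\\
\text{two factors, 2 indicators each:}\quad & \tfrac{4\cdot 5}{2} = 10 \text{ moments},\;\;
  2\,(\lambda) + 4\,(\theta) + 2\,(\psi_{11},\psi_{22}) + 1\,(\psi_{21}) = 9 \text{ parameters} \;\Rightarrow\; df = 1
\end{aligned}
```

Each factor on its own is underidentified; the covariances between indicators of different factors (which involve ψ21, assumed nonzero) supply the extra information, which is what borrowing information from each other means here.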
Eda Aksoy posted on Wednesday, August 02, 2006 - 3:29 am
I am sorry, I gave those model statements to indicate how I conducted the analysis. They are not meant to represent the number of items. My mistake. Actually the first scale has 8, and the second scale has 5 items. Therefore they have that many indicators.
What would you call a "substantively defensible" argument? Could you give an example?
A residual covariance could represent a minor substantive factor. You would need to determine that by seeing which two items are involved. It could represent a methods factor due to common item wording or such. Or it could simply represent sampling variability, in which case you would not want to include it because you would not be able to replicate it in a future analysis.
I have a quick question. I am using Mplus to run a CFA with WLSMV (ordinal data). When I ask for modification indices I get BY statements, but not WITH statements referring to residual correlations. Why do I not get MIs for the residual correlations? Is that to do with WLSMV?
No, residual covariances can be part of the weighted least squares model. It is simply a matter of the matrix not being opened, so no modification indices are shown. If you have four factor indicators, for example, u1 through u4, add u1-u4 WITH u1-u4@0; to the MODEL command and you should obtain the modification indices that you want.
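A fuller version of that suggestion, for four ordinal indicators u1-u4 of one factor analyzed with WLSMV, might look like this (the factor name f and the file name are hypothetical). Fixing the residual covariances at zero leaves the model itself unchanged but opens that part of the parameter matrix, so modification indices can be printed for it.

```mplus
DATA:     FILE = mydata.dat;         ! hypothetical file name
VARIABLE: NAMES = u1-u4;
          CATEGORICAL = u1-u4;
ANALYSIS: ESTIMATOR = WLSMV;
MODEL:    f BY u1-u4;
          u1-u4 WITH u1-u4@0;        ! all residual covariances, fixed at zero
OUTPUT:   MODINDICES(3.84);
```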
I ran a mediation model with one mediator variable. The modification indices output shows that the mediator and one of the predictor variables have an MI of 999.000, with all of the other listed values equal to 0. The other MIs seem reasonable.
I'm not sure how to interpret this - could this be a program issue?
The value 999 is printed when the modification index cannot be computed most often due to a negative variance or residual variance. You would need to send the output and your license number to firstname.lastname@example.org for me to comment on the zero modification indices.
I have a path analysis model with a path between my exogenous variable A and the predicted variable B (the entire model is more complicated than this). The overall fit of the model is poor (chi-square = 151 with 53 df, CFI = .88). The modification index suggests that the fit of the model would be improved if I allowed A and B to correlate. I am not sure how to interpret this, as there is already a path between A and B. I have run the model in Mplus versions 3, 4, and 5 and get the same modification index. Thanks for your help.
I would need to see the full output to say. From what you say it does not sound like the WITH statement would be identified so no modification index would be given. Please send the output and your license number to email@example.com.
Hello, I have a factor F1 predicting observed variables Y1 Y2 Y3 and Y4. The factor and the outcomes have been measured at the same time, but for conceptual reasons, I wish to use the "on" statements, so F1 is a “predictor”: Y1 on F1; Y2 on F1; Y3 on F1; Y4 on F1;
Now I am running a multiple group analysis, to see if F1 predicts Y1-Y4 equally well across 2 different ethnic groups, so I constrained all of these to be equal across groups: Y1 on F1 (1); Y2 on F1 (2); Y3 on F1 (3); Y4 on F1 (4);
In the modification indices, I get "F1 on Y1" for one of my groups, which I believe means that I should release the equality constraint on the relationship between F1 and Y1 across the two groups. The order of the variables, however, does not match the one I wish to have (i.e., I want F1 to be a predictor and Y1 to be an outcome). I wonder if I can use this MI to justify releasing the "Y1 on F1" constraint in my model. Thanks for your help!
Hi, I am running a CFA on a scale with ten items, 1 factor, ordinal response, using WLSMV. My CFI and TLI are good, but RMSEA is 0.117. Asking for Modification Indices produced a whole heap of 'with' statements. Does this mean that I should allow the residual covariances of some items be free? If so, is the correct syntax:
K1 WITH K2* (ECOV12);
which would be allowing item 1 and 2 residuals to covary.
Conceptually, what does this mean when interpreting my model? It makes sense for these two variables to be more correlated, as they are of the nature: 'How often do you feel restless?' (1) 'How often do you feel so restless that nothing could calm you down?' (2)
Does the residual covariance mean that the two items are correlated above and beyond their relationship to the latent factor? Thanks for all your help so far!
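In the usual factor model (intercepts omitted), the implied covariance of the two items decomposes as below, with λ1 and λ2 the loadings, ψ the factor variance, and θ12 the residual covariance. Freeing K1 WITH K2 adds θ12, the part of the items' covariance not accounted for by the factor. With ordinal items and WLSMV, the same decomposition applies to the continuous latent response variables underlying K1 and K2.

```latex
K_1 = \lambda_1 \eta + \varepsilon_1,\qquad K_2 = \lambda_2 \eta + \varepsilon_2,\qquad
\operatorname{Cov}(K_1, K_2) = \lambda_1 \lambda_2 \psi + \theta_{12},\quad
\theta_{12} = \operatorname{Cov}(\varepsilon_1, \varepsilon_2)
```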
I have already done this and a unidimensional structure was confirmed (scree plot and eigenvalues). Taking this into consideration, is my syntax for the covariance correct and conceptually what does this mean. Thanks!
In addition to my last post, I tried to do a multigroup model with the covariances between the correlated items specified (i.e. K1 with K2* (ECOV12) as above), but the model wasn't identified. I think this is because the thresholds and loadings were also free. How do I include the covariances in the model and have it identified? I need the thresholds and loadings free so that I can test the various levels of invariance.
I'm a bit stuck as to what to do with my model now. When I use a one factor model, as you have pointed out, it does not fit the data well. The Modification Indices options suggests a LONG list of correlations to add that would improve it. But then when I go back to do the EFA and specify 2 factors, I have fairly strong cross-loading on 4/10 items. Does this mean that some of the items are redundant?
Dear Drs. Muthen & Muthen. I am fitting a model with Mplus 4.2 that has 4 independent observed variables and 18 dependent observed variables (some categorical), using the WLS estimator. I got chi-square = 201.454, df = 121, p = 0.000; thus the model does not fit, although CFI = 0.977, TLI = 0.956, and RMSEA = 0.016. The theory on my topic is unclear, so I want to do (post hoc) specification searches on the model, but I got both high (and logical) modification indices (e.g., MI = 23.33) and non-significant (and possibly logical) path parameters (i.e., Est./S.E. < 1.960). Please, I need a suggestion. What is the first step in comparing alternative models? A. Remove the non-significant paths (all of them at once, or step by step)? Or B. Add the parameters suggested by the MIs to the model (step by step)? Thank you.
Dear Drs. Muthen & Muthen. I am fitting a model with Mplus 4.2, which includes dependent observed variables (some categorical), using the WLS estimator.
I have two endogenous categorical variables, A2 and B2, measured at the same time; thus I have proposed a correlation between them: "A2 WITH B2". When I did the analysis; a Modification Index is suggesting the following: "B2 ON A2". It could be a logical regression parameter.
What should be my next step? 1. Should I add the parameter "B2 ON A2" to my model, keeping "A2 WITH B2"? 2. Also, should I remove "A2 WITH B2" from the model? (are correlation and regression redundant?). 3. Can I add "A2 ON B2" to the model? Can regressions be bi-directional?
You should only add a suggested parameter if it makes sense to your theory. I would not add it. I also would not covary observed exogenous variables. The model is estimated conditioned on these variables. Their means, variances, and covariances are not model parameters.
I have read that some researchers caution against "over-fitting" the model, such that it becomes too specific to the sample data and thus loses generalizability. I am wondering at what point, if any, using modification indices is considered over-fitting. Any help on this issue would be appreciated.
Yes, I understand that in order to add parameters suggested via modification indices, the parameters should make theoretical sense. However, if your initial theoretical model has acceptable fit, and there are modification indices that could be theoretically justified, is there still a chance of over-fitting the model through these post hoc additions? Or is it commonplace to always add parameters indicated by the MIs as long as they make theoretical sense? In other words, is it acceptable to stick with your initial hypothesized model and not make use of MIs, as long as there is acceptable model fit?
I sent it from the following email firstname.lastname@example.org in a few separate instances (Feb. 20th, Feb 23rd, Mar 15). As the subject line, I had "Modification indices" and "Reply to forum comment re:Mod indices"
I tried repeatedly to email you and all messages were undeliverable. We are having the same problem with another concordia.ca email address. I need you to email me one free parameter that there is a modification index for. You should check with your IT person why you can't receive emails from us.
Is there a way to include a direct path from the residual of a manifest indicator to an outcome variable? I would like to examine whether the residual of an indicator associates with the outcome variable above and beyond the influence of the latent factor the indicator loads on. I see in a paper the authors used the modindices output to examine this:
1) Is this correct? 2) Is there any other way to do this? 3) Does the StdYX EPC represent the standardized path beta from the residual to the outcome variable (taking into account all other paths in the model)?
I have a quick question. I understand that we should not use MIs just to improve model fit. In my situation, I get a high MI for a path, e.g., Variable 1 WITH Variable 2. In previous analyses (Pearson correlation), those variables have a high correlation of .68. Would that explain the high MI? In that case, could I add that path to my model? I should also note that some studies indicate a high correlation between those two variables.
You should not mention means, variances, or covariances of observed exogenous predictors in the MODEL command. In regression, the model is estimated conditioned on these variables. Their means, variances, and covariances are not model parameters. They are assumed to be correlated.
Factor indicators are not IVs, but DVs - because they are influenced by the factor(s); they "depend" on the factor. In this case, V1 WITH V2 refers to their residual covariance, not their covariance. The factor model assumes that such residual covariances are zero, but modification indices might suggest that some should be free and that can be done by saying V1 WITH V2.
Factor indicators are endogenous variables not exogenous variables. They are not predictors. The only exogenous variable in your model is A. In the factor model, the factor indicators are regressed on the factor.
X1 WITH Y   999.000   0.000   0.000   0.000
X2 WITH Y   999.000   0.000   0.000   0.000
X3 WITH Y   999.000   0.000   0.000   0.000
X4 WITH Y   999.000   0.000   0.000   0.000
X5 WITH Y   999.000   0.000   0.000   0.000
X6 WITH Y   999.000   0.000   0.000   0.000
X7 WITH Y   999.000   0.000   0.000   0.000
X8 WITH Y   999.000   0.000   0.000   0.000
We are working to improve the fit indices of a two-level H1 path model without any latent variables, and we requested modification indices.
Mplus gives us some MIs that we do not understand. What does the MI mean if it says "variable 1 ON variable 1 191.000" or "variable 2 ON variable 2 24.000"? Please find our MIs from the output pasted below. Also, can the MIs help to further increase the CFI of 0.794 and decrease the RMSEA of 0.117? Thank you very much for your help.
MODEL MODIFICATION INDICES

Minimum M.I. value for printing the modification index 3.840

                            M.I.     E.P.C.   Std E.P.C.  StdYX E.P.C.
EURO_2  ON   EURO_2       191.000    -0.500     -0.500      -0.500
STUDIE  ON   STUDIE       191.000    -0.500     -0.500      -0.500
NOR_1   WITH ALTER_66     999.000     0.000      0.000       0.000
RR_60   WITH HKT_27         9.271   205.686    205.686       0.549
CARD_25 WITH TEMP_35        6.835    86.772     86.772       0.128
It doesn't look like the MIs are helpful here (some MIs are non-sensical from a practical point of view). Perhaps instead you can saturate your model (don't have any left-out arrows) to get a model with zero df and then see which coefficients are insignificant.
My question is regarding model specification. All variables are observed and continuous. I have 2 IVs and 6 DVs. I am using path modeling to see how the IVs predict to the DVs. 1. If simple bi-variate correlations shows small non-significant correlation, between DV and IVs, should I leave that DV out of the model? 2. I've entered the IVs to predict each DV giving me a just-identified model, 0 df. In order to get fit statistics I know to constrain non-significant paths to 0, which will increase df and give me fit statistics. However, do I constrain non-significant paths one by one monitoring changes in model... or? Furthermore, how do I know how many non-significant paths to constrain to 0? When I constrain only one path, the fit statistics are consistent with an excellent fit. Thank you.
I would not go about modeling that way. You don't test a model by deleting non-significant paths in a just-identified model. The test should be of a model specified according to theory before looking at the data.
You may want to get more input on general analysis strategies on SEMNET.
Waqar Nadeem posted on Wednesday, December 16, 2015 - 1:17 am
I am a doctoral student and wanted to ask about a small point related to SEM analysis.
When we test a structural model, is it OK to keep the non-significant paths, given that they contribute to the model fit indices?
For instance, I have eight independent variables and one dependent variable. When I run the structural model, three paths come out as non-significant based on their t-values (values below the threshold level).
The fit indices I report from the output file are also affected by these non-significant paths.
Should I keep those non-significant paths, or remove them entirely, keeping only the significant paths and reporting the fit indices for that model?
Kindly advise, and please share some articles if possible.
I would not remove hypothesized paths that turn out to be non-significant. This type of model trimming can lead to difficulty in replicating results. Others may not share my opinion.
Waqar Nadeem posted on Thursday, December 17, 2015 - 2:00 am
Thank you for the response. If some researchers say those non-significant paths have to be removed, how can I justify that not removing them is preferable? Keeping them in the structural model is a common practice as well. What is the underlying rationale for keeping them?
This type of model trimming can lead to difficulty in replicating results.
Waqar Nadeem posted on Thursday, December 17, 2015 - 6:02 am
Thank you Prof. Linda..
Daniel Lee posted on Friday, February 19, 2016 - 7:02 pm
Dear Dr. Muthen:
I had a question about using modification indices to determine whether I remove/keep items on a scale.
I had a scale of 18 items and conducted a CFA (theoretically justified). The model fit indicators indicated poor fit, so I looked at the mod indices and noticed that covarying the residuals of the first and second item would reduce a lot of the model misfit. However, instead of covarying the two, would it make more sense to actually delete one of the items (e.g., item 1)? I deleted the first item, and the model fit improved substantially. I'm at this decision point (let the residuals covary OR delete one of the items) and would appreciate any guidance in moving forward.
After fitting a second-order factor model, the modification indices in the output suggest some cross-loadings and also correlations among the first-order latent variables. Weren't these factors supposed to "lose" their covariances/correlations to the second-order factor? For example:
ON/BY statements              M.I.     E.P.C.
strutsup on autsup          30.690     0.432
autsup on strutsup
involv on strutsup          74.050     0.626
strutsup on involv

WITH statements
strutsup with autsup        30.687     0.432
involv with autsup          39.541     0.492
involv with strutsup        74.047     0.626
If you have more than 3 first-order factors measuring a second-order factor, there may be misfit such that some residual covariances among the first-order factors need to be freed. It is the same as in regular factor analysis, where some residuals may need to be correlated.
You may want to ask these general modeling questions on SEMNET.
With 3 first-order factors, that part of the model is just-identified, so none of the modification indices for that part differ from zero.
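The counting behind this, for the covariance matrix of m first-order factors with one second-order loading fixed at 1, is sketched below:

```latex
\begin{aligned}
m = 3:\quad & \tfrac{3\cdot 4}{2} = 6 \text{ factor variances/covariances},\;\;
  2\,(\text{loadings}) + 1\,(\text{2nd-order variance}) + 3\,(\text{residual variances}) = 6 \;\Rightarrow\; df = 0\\
m = 4:\quad & \tfrac{4\cdot 5}{2} = 10,\;\;
  3 + 1 + 4 = 8 \;\Rightarrow\; df = 2
\end{aligned}
```

With three first-order factors the second-order part reproduces the factor covariances exactly, so there is nothing left for an MI to detect there; with four or more there are testable restrictions and misfit can appear.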
QianLi Xue posted on Wednesday, November 23, 2016 - 11:19 am
The modification index is NOT the same as the Lagrange multiplier test (or Score test) statistic, right? Because the former looks at the drop in chi-square, which is equivalent to a likelihood ratio test.
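For reference, the general single-parameter score (Lagrange multiplier) statistic and the likelihood-ratio statistic it is usually compared with are shown below. Here θ̃ are the estimates of the restricted model with θ_j fixed at zero, θ̂ those of the model with θ_j free, ℓ is the log-likelihood, s_j = ∂ℓ/∂θ_j, ℐ is the information matrix evaluated at θ̃, and f indexes the free parameters of the restricted model. The score statistic needs only the restricted fit, while the likelihood-ratio statistic (the actual chi-square drop) needs both fits; under the null hypothesis both are asymptotically chi-square with 1 df. Sorbom's modification index is a statistic of the first kind, computed from the fitted restricted model.

```latex
\mathrm{LM}_j = s_j(\tilde\theta)^2 \left[\mathcal{I}(\tilde\theta)^{-1}\right]_{jj}
  = \frac{s_j(\tilde\theta)^2}{\mathcal{I}_{jj} - \mathcal{I}_{jf}\,\mathcal{I}_{ff}^{-1}\,\mathcal{I}_{fj}},
\qquad
\mathrm{LR} = 2\left[\ell(\hat\theta) - \ell(\tilde\theta)\right]
```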
Hi, I am running a two factor CFA using WLSMV. All variables are binary. When I run the same exact model but indicate in the output statement modindices(ALL) vs. modindices(3.84), I get different fit statistics and factor loadings. Do you know why this may be the case? Two variables are highly intercorrelated, and modification indices indicate correlated residual variance between them.
Eliminating one of these items, the loadings and fit statistics are the same whether I request modindices(ALL) or modindices(3.84). Thank you for your assistance.