
Anonymous posted on Saturday, December 11, 2004  11:29 am



Hello, I am running a simple SEM model in which I ask for Modification Indices in the output. I receive "BY" statements, "ON" statements, and "WITH" statements that are all easy to interpret; however, I also get what are called "ON/BY" statements. What are these statements, and what are they suggesting that I do to improve model fit? Thanks!


BY is a type of ON. In BY, the dependent variables are on the right-hand side of BY and the independent variables are on the left-hand side of BY. The modification indices listed with ON/BY are parameters that can potentially be freed using either ON or BY.
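
As an illustration of the reply above, here is a minimal sketch of an input in which an ON/BY modification index could show up. The file name, the variable names, and the three-factor structure are assumptions made for this example only; they are not taken from the original poster's model.

```mplus
TITLE:    Sketch: a model where an ON/BY modification index may appear
DATA:     FILE IS example.dat;      ! hypothetical file name
VARIABLE: NAMES ARE y1-y9;          ! hypothetical variable names
MODEL:
  f1 BY y1-y3;
  f2 BY y4-y6;
  f3 BY y7-y9;
  f2 ON f1;
  f3 ON f2;       ! the direct path from f1 to f3 is left out
OUTPUT:   MODINDICES (3.84);
```

If the omitted direct path matters, the output might list it under the ON/BY heading in a form such as F3 ON F1 / F1 BY F3. Both spellings refer to the same regression of f3 on f1, so adding `f3 ON f1;` to the MODEL command would free it. The ON form is used here because a BY statement carries its own default of fixing the first loading at one.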

Sanjoy posted on Tuesday, July 26, 2005  12:11 pm



Dear Professor/s ... one very quick question. This is my model (R1 ... B3 are all 5-point ordered categorical, U is 0/1), estimated using WLSMV: R by R1-R3; B by B1-B3; U on R B X1; R on B X2; B on R X3; The MIs (I set the minimum at 3.84) suggest a complex factor, i.e. R by R1-R3 B3; ... I did so and found that overall fit improves significantly. Now my question: I need to do some difference testing (using DIFFTEST) on some otherwise freely estimable parameters of our model ... SHOULD I do it after following the MI suggestion or before? ... I would sincerely appreciate it if you could kindly explain the answer as well. Thanks and regards

bmuthen posted on Tuesday, July 26, 2005  5:42 pm



Yes, I would first make sure that the model fits the data well, and the MI is a useful tool here. If you want to be precise, there are some issues with the p value for your difftest not being exactly right given the modifications in these multiple analysis steps. The best thing would be to have first explored the R and B factor structure by itself (using only the R1-R3, B1-B3 variables) using EFA and CFA on a separate (independent) sample.
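
The order recommended above (settle on a well-fitting model first, then run the difference test) can be sketched with the two-run DIFFTEST setup for WLSMV. This is only a sketch: the file names are hypothetical, and the restriction tested in the second run (fixing the effect of X1 on U at zero) is an assumption chosen purely for illustration.

```mplus
! ---- Run 1: less restrictive model (H1); saves the derivatives ----
TITLE:    H1 model for DIFFTEST
DATA:     FILE IS example.dat;          ! hypothetical file name
VARIABLE: NAMES ARE r1-r3 b1-b3 u x1-x3;
          CATEGORICAL ARE r1-r3 b1-b3 u;
ANALYSIS: ESTIMATOR = WLSMV;
MODEL:
  R BY r1-r3 b3;      ! cross-loading added after inspecting the MIs
  B BY b1-b3;
  u ON R B x1;
  R ON B x2;
  B ON R x3;
SAVEDATA: DIFFTEST IS deriv.dat;        ! hypothetical file name

! ---- Run 2 (separate input file): nested model (H0) ----
! Same DATA and VARIABLE commands as in run 1.
ANALYSIS: ESTIMATOR = WLSMV;
          DIFFTEST IS deriv.dat;
MODEL:
  R BY r1-r3 b3;
  B BY b1-b3;
  u ON R B;
  u ON x1@0;          ! illustrative restriction under test (an assumption)
  R ON B x2;
  B ON R x3;
```

The second run reports the chi-square difference test for the restricted model against the model from the first run.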

Sanjoy posted on Tuesday, July 26, 2005  8:43 pm



Thank you, Professor ... as you suggested, I ran a separate CFA (using only R1 ... B3 as "usevariable" so that we should not have effects of any other uncorrelated variables in our model). I started with this
MODEL: R BY R1 R2 R3; B BY B1 B2 B3;
Kindly tell me if I'm wrong: 1. we should keep including parameters one at a time and re-evaluate the model ... and 2. we should pick the "suggested path" with respect to MI, EPC, and modeling sense. Finally, after three stages of inclusion, I ended up with this
MODEL: R BY R1 R2 R3 B3; B BY B1 B2 B3; B1 WITH R2; B2 WITH R1;
1st stage: CFI 0.802, TLI 0.688
2nd stage (I add B3 in "R by"): CFI 0.877, TLI 0.775
3rd stage (I add B2 WITH R1): CFI 0.962, TLI 0.916
Last stage (I add B1 WITH R2): CFI 0.994, TLI 0.986
Thanks and regards

bmuthen posted on Wednesday, July 27, 2005  6:46 pm



You can do it that way. Note that your model improvements may not all be necessary in the following sense. Once you have gotten to the last stage, you can see if some parameters, although improving model fit, are (1) really not substantively significant and (2) don't change the key parameters of the model from the values these key parameters had in earlier stages. Key parameters in your case might be the factor correlation and the loadings of the initial factor indicators. If (1) and (2) hold, then you may not want to include such "wastebasket" parameters (to quote Michael Browne), which may only be a function of (too) high power. Note also that a safer way to proceed might be exploratory factor analysis (EFA) and "EFA within a CFA framework", which we teach at our Nov course.

Sanjoy posted on Wednesday, July 27, 2005  10:25 pm



Thank you Professor, I think I start getting it ... could you kindly name the article by Prof. Browne, I think I should read more on MI from page 24 of Technical appendix (plus Sorbom's article )... you mention the theory of MI in the context of continuous Y variables ... however, in Mplus 3.12 we can calculate MI even when we have categorical Ys ... could you kindly tell us the reference behind ... I need to cite them in my thesis thanks and regards 

bmuthen posted on Friday, July 29, 2005  8:54 am



Browne didn't mention this in the MI connection, but more generally. I am not sure which of his many articles it appeared in, but perhaps in one of his British Journal of Math'l and Stat'l Psychology articles. Does anyone else know? MI for categorical uses the same principle as Sorbom's article; the only difference is that the MIs are multiplied by the WLSMV correction factor so that they are on the chi-square scale. No reference yet, except the program itself.

Sanjoy posted on Friday, July 29, 2005  10:09 am



thank you Professor ... Your words are reference to me ... with regards 

Peter Croy posted on Wednesday, July 26, 2006  12:37 pm



Hello, I'm in the early stages of Mplus use, but have reached the stage of assessing the fit of my basic SEM model (N=2300). The only fit statistic that looks okay is SRMR, which is 0.068. The Chi Square value is very high and RMSEA is 0.094. I have asked for modindices and have obtained MIs for correlations (the WITHs) between my factor indicator variables. Can you tell me what this means ... does Mplus not correlate factor indicators as the default?


Mplus does not correlate the residuals of the factor indicators by default because that would not be an identified model. I would look at the CFI and if it isn't close to or above 0.96, I would go back and do an EFA. 

Peter Croy posted on Wednesday, July 26, 2006  10:21 pm



thanks for that. I have done an EFA which shows all indicators loading onto their expected factors. Various other indicators loaded onto sibling factors but to a lesser degree than expected indicators. However, I guess this still may be the genesis of my misfit problem. I also calculated Cronbach's alpha for the indicators of each factor ... alphas were > 0.7. Assuming I can muster an argument for correlating some of the residuals of the factor indicators, can I go ahead and do this yet avoid identification problems? 


Yes, you can add some residual covariances to the model. 

Peter Croy posted on Saturday, July 29, 2006  1:08 am



Hello again. I have added some residual covariances to the model and have obtained acceptable model fit indices. I would appreciate a little guidance on the justification for inclusion of the residual covariances. The theoretical model I'm using is the Theory of Planned Behaviour ... a model shown to be robust across many studies that often use regression analyses where residual covariances are not at issue. These studies have averaged respective indicator scores to obtain single measures of IVs/predictors. They report regression coefficients and R square results. My study obtains regression/R sq results comparable to other studies. However, I want to stay with the Mplus SEM model for later stages of my analysis, which deal with a categorical DV (which is mediated by the latent DV in my present model). Thus, I need to justify the inclusion of residual covariances in my present model. In addition to any guidance you may have on justification for inclusion of residual covariances ... is it common practice?


Residual covariances are not unusual parameters to have in a model. They should be substantively motivated, however, and not used solely to improve model fit. 

Eda Aksoy posted on Tuesday, August 01, 2006  9:08 am



Hi, I am trying to conduct a confirmatory factor analysis for two scales. When I define the model as "x by x1 x2" and "y by y1 y2", the resulting model turns out to be a misfit, despite the fact that all the indicators load significantly. The modification indices recommend a rather long list of WITH connections among indicators. When I do include a few of these, the overall model fit improves drastically. My question is this: Under what conditions would I justifiably be able to include such WITH statements in my model? Would this be statistically questionable conduct? Thank you.


You have two factors with two indicators each. These factors are identified only because they borrow information from each other. This is not a strong model. Adding residual covariances should be done only if they are substantively defensible, not just to improve model fit.

Eda Aksoy posted on Wednesday, August 02, 2006  3:29 am



I am sorry, I gave those model statements to indicate how I conducted the analysis. They are not meant to represent the number of items. My mistake. Actually the first scale has 8, and the second scale has 5 items. Therefore they have that many indicators. What would you call a "substantively defensible" argument? Could you give an example? Thanks a lot. 


A residual covariance could represent a minor substantive factor. You would need to determine that by seeing which two items are involved. It could represent a methods factor due to common item wording or such. Or it could simply represent sampling variability, in which case you would not want to include it because you would not be able to replicate this in a future analysis.


Hello, I have a quick question. I am using Mplus to run a CFA with WLSMV (ordinal data). When I ask for modification indices I get BY statements, but no WITH statements referring to residual correlations. Why do I not get MI for the residual correlations? Is that to do with WLSMV? Thank you very much in advance, Julia Diemer


No, residual covariances can be part of the weighted least squares model. It is simply a matter of the matrix not being opened, so no modification indices are shown. If you have four factor indicators, for example, u1 through u4, add u1-u4 WITH u1-u4@0; to the MODEL command and you should obtain the modification indices that you want.
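
A minimal sketch of the suggestion above, placed in a complete input. The file name and the one-factor structure are assumptions for illustration; only the WITH statement comes from the reply.

```mplus
TITLE:    Sketch: requesting MIs for residual covariances under WLSMV
DATA:     FILE IS example.dat;      ! hypothetical file name
VARIABLE: NAMES ARE u1-u4;
          CATEGORICAL ARE u1-u4;    ! ordinal indicators, as in the question
ANALYSIS: ESTIMATOR = WLSMV;
MODEL:
  f BY u1-u4;
  u1-u4 WITH u1-u4@0;   ! residual covariances mentioned but fixed at zero
OUTPUT:   MODINDICES (3.84);
```

Because the residual covariances stay fixed at zero, the model itself is unchanged; the statement only makes those parameters available for modification indices.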


Dear Professors, I ran a mediation model with one mediator variable. The modification indices output shows that the mediator and one of the predictor variables have an MI=999.000, and all of the other listed estimates=0. The other MIs seem reasonable. I'm not sure how to interpret this. Could this be a program issue? Thank you in advance for your reply. Kimberley Freire


The value 999 is printed when the modification index cannot be computed, most often due to a negative variance or residual variance. You would need to send the output and your license number to support@statmodel.com for me to comment on the zero modification indices.


Hello, I have a path analysis model with a path between my exogenous variable A and the predicted variable B (the entire model is more complicated than this). The overall fit of the model is poor (chi-square = 151 (53 df), CFI = .88). The modification index suggests that the fit of the model would be improved if I allowed A and B to correlate. I am not sure how to interpret this, as there is already a path between A and B. I have run the model in Mplus versions 3, 4 and 5 and get the same modification index. Thanks for your help.


I would need to see the full output to say. From what you say it does not sound like the WITH statement would be identified so no modification index would be given. Please send the output and your license number to support@statmodel.com. 


Hello, I have a factor F1 predicting observed variables Y1 Y2 Y3 and Y4. The factor and the outcomes have been measured at the same time, but for conceptual reasons, I wish to use the "on" statements, so F1 is a “predictor”: Y1 on F1; Y2 on F1; Y3 on F1; Y4 on F1; Now I am running a multiple group analysis to see if F1 predicts Y1-Y4 equally well across 2 different ethnic groups, so I constrained all of these to be equal across groups: Y1 on F1 (1); Y2 on F1 (2); Y3 on F1 (3); Y4 on F1 (4); In the modification indices, I get "F1 on Y1" for one of my groups, which I believe means that I should release the equality constraint for the relationship between F1 and Y1 across the two groups. The order of the variables, however, does not respect the one I wish to have (i.e., I want F1 to be a predictor and Y1 to be an outcome). I wonder if I can use this MI to justify the release of the "Y1 on F1" constraint in my model. Thanks for your help!


The modification index for f1 ON y1 is not the same as for y1 ON f1 so it should not be used for any decisions about y1 ON f1. 


Hi, I am running a CFA on a scale with ten items, 1 factor, ordinal response, using WLSMV. My CFI and TLI are good, but RMSEA is 0.117. Asking for Modification Indices produced a whole heap of 'with' statements. Does this mean that I should allow the residual covariances of some items be free? If so, is the correct syntax: K1 WITH K2* (ECOV12); which would be allowing item 1 and 2 residuals to covary. Conceptually, what does this mean when interpreting my model? It makes sense for these two variables to be more correlated, as they are of the nature: 'How often do you feel restless?' (1) 'How often do you feel so restless that nothing could calm you down?' (2) Does the residual covariance mean that the two items are correlated above and beyond their relationship to the latent factor? Thanks for all your help so far! 


I would start with an EFA to see if unidimensionality is correct for these variables. 


I have already done this and a unidimensional structure was confirmed (scree plot and eigenvalues). Taking this into consideration, is my syntax for the covariance correct and conceptually what does this mean. Thanks! 


In addition to my last post, I tried to do a multigroup model with the covariances between the correlated items specified (i.e. K1 with K2* (ECOV12) as above), but the model wasn't identified. I think this is because the thresholds and loadings were also free. How do I include the covariances in the model and have it identified? I need the thresholds and loadings free so that I can test the various levels of invariance. Thanks so much, Lucy 


A one-factor EFA would fit the same as a one-factor CFA, so although you established unidimensionality, the model does not fit the data well. See the Topic 2 course handout under multiple group regression to see how to specify the models to test measurement invariance.
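
The point that a one-factor EFA and a one-factor CFA fit the same can be checked by running both, for example along these lines. This is a sketch under assumptions: the file name is hypothetical, and the item names k1-k10 are inferred from the K1 and K2 labels and the ten items mentioned in the question.

```mplus
! ---- Run 1: exploratory factor analysis with 1 and 2 factors ----
TITLE:    EFA of the ten items
DATA:     FILE IS example.dat;      ! hypothetical file name
VARIABLE: NAMES ARE k1-k10;
          CATEGORICAL ARE k1-k10;   ! ordinal responses
ANALYSIS: TYPE = EFA 1 2;
          ESTIMATOR = WLSMV;

! ---- Run 2 (separate input file): one-factor CFA ----
! Same DATA and VARIABLE commands as in run 1.
ANALYSIS: ESTIMATOR = WLSMV;
MODEL:    f BY k1-k10;
OUTPUT:   MODINDICES (3.84);
```

Comparing the fit of the one-factor EFA solution with the CFA, and then looking at how the two-factor EFA solution fits, gives a more direct view of dimensionality than a scree plot alone.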


Thanks for your help, Linda. I'm a bit stuck as to what to do with my model now. When I use a one-factor model, as you have pointed out, it does not fit the data well. The Modification Indices option suggests a LONG list of correlations to add that would improve it. But then when I go back to do the EFA and specify 2 factors, I have fairly strong cross-loadings on 4/10 items. Does this mean that some of the items are redundant?


You may find listening to the Topic 1 course video on the website useful. We go over EFA in detail and discuss cross-loadings.


Dear Drs. Muthen & Muthen. I am fitting a model with Mplus 4.2, which has 4 independent observed variables and 18 dependent observed variables (some categorical), using the WLS estimator. I got Chi-square = 201.454, DF = 121, P-value = 0.000; thus the model does not fit, although CFI=0.977, TLI=0.956 and RMSEA=0.016. Really, the theory about my issue is blurred; thus, I want to do specification searches (post hoc) on the model, but I got both high (and logical) modification indices (e.g. MI=23.33) and non-significant (and maybe logical) path parameters (i.e. Est./S.E. < 1.960). Please, I need a suggestion. Which is the first step for comparing alternative models? A. Remove non-significant paths? All of them or step by step? Or B. Include the parameters suggested by the MI (step by step) in the model? Thank you.


I would start with my theoretical model and use modification indices to add theoretically meaningful parameters. 


Dear Drs. Muthen & Muthen. I am fitting a model with Mplus 4.2, which includes dependent observed variables (some categorical), using the WLS estimator. I have two endogenous categorical variables, A2 and B2, measured at the same time; thus I have proposed a correlation between them: "A2 WITH B2". When I did the analysis; a Modification Index is suggesting the following: "B2 ON A2". It could be a logical regression parameter. What should be my next step? 1. Should I add the parameter "B2 ON A2" to my model, keeping "A2 WITH B2"? 2. Also, should I remove "A2 WITH B2" from the model? (are correlation and regression redundant?). 3. Can I add "A2 ON B2" to the model? Can regressions be bidirectional? Thank you very much. 


You should only add a suggested parameter if it makes sense to your theory. I would not add it. I also would not covary observed exogenous variables. The model is estimated conditioned on these variables. Their means, variances, and covariances are not model parameters. 


Hi! For an SEM I got an E.P.C. and Std E.P.C. that appear inflated: AGE WITH CFIT 108.726. As the StdYX E.P.C. is 2.629, I'm thinking the large EPC above may just be due to the large range in values for the two observed variables (age, cfit). Would 108.726 be 'out of range' for E.P.C. and Std E.P.C.? The syntax was:
MODEL: sr BY selfr@0.78; selfr@.15; soc BY social@.72; social@.17; sr ON edu cfit; soc ON sr pop cfit age; pa ON pop soc age sex; cfit ON edu age; pop ON edu sex;
OUTPUT: SAMPSTAT STDYX RES MODINDICES (3.84) PATTERN
Look forward to your reply.


Please send your output and license number to support@statmodel.com. 


Dr. Muthen, When conducting a path analysis, is it always necessary to consider modification indices? My hypothetical model has the following fit indices: Chi-Square = 42.358, df = 13, p = 0.0001; CFI = 0.968; TLI = 0.931; RMSEA = 0.066; SRMR = 0.031. I have read that some researchers caution against "overfitting" the model, such that it becomes too specific to the sample data and thus loses generalizability. I am wondering at what point, if any, making use of modification indices is considered overfitting. Any help on this issue would be appreciated. Thank you, Kat BrackenMinor


You would not want to add parameters based on a modification index that is not justified by theory. You would not want to add a parameter just to improve model fit.


Thank you for your response. Yes, I understand that in order to add parameters suggested via modification indices, the parameters should make theoretical sense. However, if your initial theoretical model has acceptable fit, and there are modification indices that could be theoretically justified, is there still a chance of overfitting the model through use of these post hoc additions? Or is it commonplace to always add parameters indicated by the MIs as long as they make theoretical sense? In other words, is it acceptable to stick with your initial hypothesized model and not make use of MIs, as long as there is acceptable model fit? Thank you for your time, Kat


I would think it would be preferable to stick with your initial hypothesized model. This question might get a better response from a general discussion forum like SEMNET. 


Dr. Muthen, I am testing a double mediation model, and getting modification indices for paths that are already specified in the model. Is my syntax wrong or could this be a bug? Thank you and looking forward to your reply 


You should not get a modification index for a free parameter unless it is constrained to be equal to another free parameter. I would have to see the output to comment further. 


Hi Dr. Muthen, I sent an email with the question and output to support@statmodel.com but have not received a reply yet. Should I be emailing the output to a different email address? Thanks 


When and under what name did you send it? 


Hi Dr. Muthen, I sent it from the following email alcstudy@alcor.concordia.ca in a few separate instances (Feb. 20th, Feb 23rd, Mar 15). As the subject line, I had "Modification indices" and "Reply to forum comment re:Mod indices" Did you receive it? 


I tried repeatedly to email you and all messages were undeliverable. We are having the same problem with another concordia.ca email address. I need you to email me one free parameter that there is a modification index for. You should check with your IT person why you can't receive emails from us. 

kja posted on Friday, May 03, 2013  5:28 pm



Hello Dr. Muthen, Is there a way to include a direct path from the residual of a manifest indicator to an outcome variable? I would like to examine whether the residual of an indicator associates with the outcome variable above and beyond the influence of the latent factor the indicator loads on. I see in a paper the authors used the modindices output to examine this: 1) Is this correct? 2) Is there any other way to do this? 3) Does the StdYX EPC represent the standardized path beta from the residual to the outcome variable (taking into account all other paths in the model)? Thank you in advance for your time. 


See the FAQ called "Regressing on a residual" on our website. 

kja posted on Saturday, May 04, 2013  9:05 am



Oh, perfect. Thank you. 


Dear Dr. Muthen, I have a quick question. I understand that we should not use the MI just to improve model fit. In my situation, I get high MI for a path, e.g. Variable 1 WITH Variable 2. In previous analyses (Pearson correlation) those variables have a high correlation of .68. Would that explain the high value of MI? In that case, could I add that path to my model? I should also note that there are some studies indicating a high correlation between those two variables. Thank you in advance, Margarita 


I would say Yes if it makes substantive sense. But you don't say if these variables are DVs or IVs. If they are DVs, note that correlation is not the same as residual correlation/covariance. 


Dear Dr. Muthen, Thank you for your reply. The two variables are actually IVs (predictors).


You should not mention means, variances, or covariances of observed exogenous predictors in the MODEL command. In regression, the model is estimated conditioned on these variables. Their means, variances, and covariances are not model parameters. They are assumed to be correlated. 


Thank you for your prompt answer! I see what you mean. I have a last quick question. Let's say that 5 IVs (V1-V5) serve as indicators for a factor (A), and then that factor serves as a predictor for my DV (B), e.g. MODEL: A BY V1 V2 V3 V4 V5; B ON A; I am assuming that V1 and V2 are not assumed to be correlated like in the previous example. So once the indicators are loaded onto a factor, they are not supposed to be correlated anymore, right? In that case, could I add their covariances in the MODEL command? Again, thank you for your comments!


Factor indicators are not IVs, but DVs, because they are influenced by the factor(s); they "depend" on the factor. In this case, V1 WITH V2 refers to their residual covariance, not their covariance. The factor model assumes that such residual covariances are zero, but modification indices might suggest that some should be free and that can be done by saying V1 WITH V2.
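
For concreteness, the model from the question with one residual covariance freed might look as follows. The file name is hypothetical, and whether freeing this particular covariance is warranted is a substantive question that the syntax cannot answer.

```mplus
TITLE:    Sketch: freeing one residual covariance between factor indicators
DATA:     FILE IS example.dat;      ! hypothetical file name
VARIABLE: NAMES ARE v1-v5 b;
MODEL:
  A BY v1-v5;
  b ON A;
  v1 WITH v2;     ! residual covariance of v1 and v2, free instead of zero
OUTPUT:   MODINDICES (3.84);
```

Here `v1 WITH v2;` captures association between the two indicators beyond what the factor A accounts for; it is not the total covariance of v1 and v2.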


Factor indicators are endogenous variables not exogenous variables. They are not predictors. The only exogenous variable in your model is A. In the factor model, the factor indicators are regressed on the factor. 


That's very clear. Thank you for your comments. 

Anonymous posted on Tuesday, December 10, 2013  8:25 am



Dr. Muthen, I ran several two-level/two-level random models and requested the modification indices. Mplus always gives me 999.000 as M.I.; you can see an example output below. You previously suggested the 999.000 values came from negative variances/residual variances and could be ignored. My questions: (1) Can I even ignore them if there are exclusively 999 M.I.s? (2) How can I interpret that? (3) Can I see the negative variances/residual variances in the output? I cannot find any. Thank you very much for your help!

Within Level
                   M.I.     E.P.C.   Std E.P.C.   StdYX E.P.C.
WITH Statements
X1 WITH Y       999.000      0.000        0.000          0.000
X2 WITH Y       999.000      0.000        0.000          0.000
X3 WITH Y       999.000      0.000        0.000          0.000
X4 WITH Y       999.000      0.000        0.000          0.000
X5 WITH Y       999.000      0.000        0.000          0.000
X6 WITH Y       999.000      0.000        0.000          0.000
X7 WITH Y       999.000      0.000        0.000          0.000
X8 WITH Y       999.000      0.000        0.000          0.000
Between Level


Please send the output and your license number to support@statmodel.com. 

Sarah posted on Thursday, August 28, 2014  6:57 am



Dear Drs. Muthen I wonder if you can help me with a query regarding modification indices? I have run an SEM and the values for all the modification indices provided were 999.000. Does this indicate a problem with the model? Many thanks for your help. 


Please send the output and your license number to support@statmodel.com. 


Dear Dr. Muthen, we are working to improve the model fit indices of a two-level H1 path model without any latent variable, and we requested modification indices. Mplus gives us some MIs that we do not understand. What does the MI mean if it says variable 1 on variable 1 191.000 or variable 2 on variable 2 24.000? Please find our MIs from the output pasted below. And can the MIs help to further increase the CFI of 0.794 and decrease the RMSEA of 0.117? Thank you very much for your help.

MODEL MODIFICATION INDICES
Minimum M.I. value for printing the modification index 3.840
                              M.I.     E.P.C.   Std E.P.C.   StdYX E.P.C.
Within Level
ON Statements
EURO_2 ON EURO_2           191.000      0.500        0.500          0.500
STUDIE ON STUDIE           191.000      0.500        0.500          0.500
WITH Statements
NOR_1 WITH ALTER_66        999.000      0.000        0.000          0.000
RR_60 WITH HKT_27            9.271    205.686      205.686          0.549
CARD_25 WITH TEMP_35         6.835     86.772       86.772          0.128
Between Level
ON Statements
SEVOB_30 ON SEVOB_30        24.000      0.500        0.500          0.500


It doesn't look like the MIs are helpful here (some MIs are nonsensical from a practical point of view). Perhaps instead you can saturate your model (don't have any left-out arrows) to get a model with zero df and then see which coefficients are insignificant.


Hello. My question is regarding model specification. All variables are observed and continuous. I have 2 IVs and 6 DVs. I am using path modeling to see how the IVs predict the DVs. 1. If simple bivariate correlations show a small, non-significant correlation between a DV and the IVs, should I leave that DV out of the model? 2. I've entered the IVs to predict each DV, giving me a just-identified model, 0 df. In order to get fit statistics I know to constrain non-significant paths to 0, which will increase df and give me fit statistics. However, do I constrain non-significant paths one by one, monitoring changes in the model ... or? Furthermore, how do I know how many non-significant paths to constrain to 0? When I constrain only one path, the fit statistics are consistent with an excellent fit. Thank you.


I would not go about modeling that way. You don't test a model by deleting non-significant paths in a just-identified model. The test should be of a model specified according to theory before looking at the data. You may want to get more input on general analysis strategies on SEMNET.
