
davood posted on Friday, October 01, 2004  7:32 pm



Dear Dr. Muthen, I am using a large database with cluster and weight variables. My purpose is to perform a multiple-group comparison between 4 groups. It's a fairly complex model in which latent variables are regressed on some background variables. My data are ordinal, but I treat them as continuous. I also have missing values. I used TYPE=COMPLEX. I have several questions for you: 1) It takes Mplus 23 minutes to run the analysis. Would you please let me know why it takes that long? (Since the data are confidential, I would only be able to provide the input syntax and partial output.) 2) I have heard you say that as long as the observed dependent variables are not very badly skewed, the analysis should be OK. Could you please elaborate on this issue? 2) Mplus uses the MLR estimator as the default for this TYPE=COMPLEX analysis. I am interested in the chi-square difference test. I have already consulted the documents on the website regarding the chi-square difference test, but just to clarify: to perform the chi-square difference test, how would I obtain the "regular" chi-square values? 3) In case I should treat my data as categorical, which estimator would you recommend for chi-square difference testing with missing values? Sincerely, Davood 

bmuthen posted on Friday, October 01, 2004  8:37 pm



1) Perhaps this example has many outcomes, which, in addition to having a large sample and 4 groups, could be time-consuming. Do you have Type = Missing as well? Do you ask for Modification Indices? Is a run without Type=Complex equally time-consuming? And, hopefully, you are using version 3.11. 2) With very skewed items, the regular linear models used in standard continuous-variable analysis aren't realistic or dependable. For instance, a Pearson correlation is attenuated. 2 again) Multiply by the printed correction factor and also see http://statmodel.com/chidiff.html 3) Use WLSMV and the new DIFFTEST option. 
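
The "multiply by the printed correction factor" step can be sketched in Python. This is a minimal illustration of the scaled chi-square difference computation described on the chidiff page for MLM/MLR-type chi-squares (not for WLSMV, which uses DIFFTEST). The function name and the numbers in the example are hypothetical, not taken from any output in this thread.

```python
def scaled_chi2_difference(t0, c0, d0, t1, c1, d1):
    """Scaled chi-square difference test for two nested models.

    t0, c0, d0: printed (scaled) chi-square, scaling correction factor and
                degrees of freedom of the nested (more restrictive) model.
    t1, c1, d1: the same three values for the less restrictive model.
    Returns (test statistic, df of the difference, difference scaling factor).
    """
    if d0 <= d1:
        raise ValueError("the nested model must have more degrees of freedom")
    # "Regular" chi-square = printed chi-square times its correction factor.
    regular0 = t0 * c0
    regular1 = t1 * c1
    # Scaling correction that applies to the difference itself.
    cd = (d0 * c0 - d1 * c1) / (d0 - d1)
    # In finite samples cd can be negative, which gives a negative statistic.
    trd = (regular0 - regular1) / cd
    return trd, d0 - d1, cd


# Hypothetical values for illustration only.
trd, df_diff, cd = scaled_chi2_difference(120.0, 1.2, 50, 100.0, 1.25, 45)
print(round(trd, 3), df_diff, cd)
```

The resulting statistic is referred to a chi-square distribution with `df_diff` degrees of freedom.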

davood posted on Tuesday, October 05, 2004  5:05 pm



Dear Dr. Muthen, Thanks for your time and your valuable comments. I would just like to let you know what I used in my analysis: 1) I do not use TYPE=MISSING. When I don't ask for MIs, the run time decreases by just a few minutes. And I upgraded Mplus to version 3.11. The running time decreased by 1 or 2 minutes. Another question: I reran the same model (for simplicity, this time for just 2 groups), specifying the observed dependent variables as categorical and using the WLSMV estimator (TYPE=COMPLEX, cluster and weight variables). This time, however, I got this error message: *** ERROR Cluster ID cannot appear in more than one group. Problem with cluster ID: 50 I did not get this error when I treated my observed dependent variables as continuous, however. Sincerely, Davood p.s. If you need the syntax, please let me know. 


You should send the output where you treat the variables as continuous and also the output where you treat the variables as categorical. 

davood posted on Thursday, December 09, 2004  3:27 pm



Dear Dr. Muthen, I'm doing a multiple-group CFA with one factor and categorical indicators, and Mplus generated this warning: SERIOUS COMPUTATIONAL PROBLEMS OCCURRED IN THE BIVARIATE ESTIMATION OF THE CORRELATION FOR VARIABLES MOMRELS1 AND MOMCOMM1. CHECK YOUR DATA. IF THE PROGRAM RECOVERS FOR THIS PAIR OF VARIABLES (SEE TECHNICAL 6 OUTPUT), THE ESTIMATES ARE VALID. THE PROBLEM OCCURRED FOR THE FOLLOWING OBSERVATION(S): OBSERVATION(S) WITH MOMRELS1=4 AND MOMCOMM1=4. I checked the TECH6 output. The correlation between MOMCOMM1 and MOMRELS1 was estimated only after many iterations. However, at the end, the message was NORMAL TERMINATION FOR ESTIMATING THE CORRELATION FOR MOMRELS1 AND MOMCOMM1 I did not have this problem when I ran my model for each group separately. Are my results valid then? Thank you, Davood 


As long as you get the message "Normal termination etc.", your results are valid. 


Dear Dr. Muthen, I have encountered an error while doing a two-group CFA with categorical outcomes. Here is the error message: *** ERROR Group 4 does not contain all values of categorical variable: GOOD1 Would you please let me know how to overcome this error? Thank you! 


Each group must have the same values on categorical observed variables. So if males have values of 1, 2, and 3 on a variable, females cannot have only 1 and 2. This sometimes comes about with listwise deletion. You need to collapse categories until you have the same values in each group. 
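
Before collapsing categories, it can help to see which group lacks which value. The following Python sketch does that check outside Mplus; the function name, the record layout, and the toy data are assumptions made for illustration.

```python
from collections import defaultdict


def missing_categories(rows, group_key, item_key):
    """Return {group: sorted categories that the group never uses}.

    rows is a list of dicts; a value of None is treated as missing.
    Groups that use every observed category are left out of the result.
    """
    seen = defaultdict(set)
    for row in rows:
        value = row[item_key]
        if value is None:  # skip missing responses
            continue
        seen[row[group_key]].add(value)
    all_values = set().union(*seen.values())
    return {
        group: sorted(all_values - values)
        for group, values in seen.items()
        if all_values - values
    }


# Toy data mirroring the example above: females never use category 3.
rows = [
    {"sex": "male", "good1": 1},
    {"sex": "male", "good1": 2},
    {"sex": "male", "good1": 3},
    {"sex": "female", "good1": 1},
    {"sex": "female", "good1": 2},
    {"sex": "female", "good1": None},
]
print(missing_categories(rows, "sex", "good1"))
```

Any category reported here is a candidate for merging with a neighboring category in all groups.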


Dear Dr. Muthen, I'm doing a multiple-group analysis with categorical indicators and cluster and weight variables, using the WLSM estimator. I am testing for partial measurement invariance. In doing so, I free each factor loading and its respective thresholds, and fix the scale factor to one, for one indicator at a time in the nested model. I use the scaled chi-square difference test described on the statmodel homepage to compare the nested model with the less restricted baseline model. The problem is that, after freeing the parameters of three indicators, I obtained a negative scaled chi-square difference test value. I would appreciate your comments. Regards, Davood 

BMuthen posted on Friday, January 21, 2005  2:11 am



This can happen. There have been writings by Bentler and Satorra on this for continuous outcomes. The difference testing is an asymptotic procedure and can break down like this. 


In this regard, can I conclude that the difference between the two models (nested and less restricted) is not significant? 


I don't think it is possible to make a conclusion in this case. The results point to a breakdown of the method. 


Dear Dr. Muthén, I am doing a two-group SEM with continuous indicators using TYPE=COMPLEX (with weight and cluster variables). I use MLM as an indicator. The model is a hybrid SEM model: 4 observed background variables predict two latent variables; these, in turn, are used to predict depression, which is a two-level latent variable (20 indicators are used to form 4 latent variables, which in turn form the depression variable). In addition, direct paths from two control variables and from the 4 background variables to the depression variable are included. Here is the summary of the model that Mplus prints:
Number of dependent variables: 32
Number of independent variables: 6
Number of continuous latent variables: 7
Degrees of freedom: 1280
Number of free parameters: 268
To calculate the degrees of freedom, I used the formula given in Bollen (1989, p. 361):
df = ½ * G * (p+q)(p+q+1) - t
where p: number of background variables (6), q: number of independent variable (32), G: number of groups (2), t: number of free (estimated) parameters (268). After plugging the numbers into Bollen's formula, I get a different number from the Mplus output (df = ½ * 2 * 38 * 39 - 268 = 1214). So the calculated df is less than the df in the Mplus printout. Would you please help us resolve this discrepancy? 


I would like to correct one mistake in the previous post. I used MLM as the estimator(not indicator). Sorry for the inconvenience. 


Please send your complete output to support@statmodel.com and I will look at it when I return to LA. The formula that you are using does not take into account that df's for covariates are treated differently than for outcomes. 
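
For reference, the hand calculation in the question can be reproduced with a few lines of Python. This sketch only repeats the poster's arithmetic: it counts variances and covariances for all 38 observed variables in each group, which is exactly the treatment of covariates that the reply says does not match how Mplus counts. It makes no attempt to reproduce the Mplus value of 1280, and the function name is made up for illustration.

```python
def covariance_structure_df(n_groups, n_observed, n_free_parameters):
    """Degrees of freedom as G * v(v+1)/2 - t.

    Counts only the distinct variances and covariances of the v observed
    variables per group, treating covariates exactly like outcomes.
    """
    moments_per_group = n_observed * (n_observed + 1) // 2
    return n_groups * moments_per_group - n_free_parameters


# Numbers from the post: 32 dependent + 6 independent variables, 2 groups.
df = covariance_structure_df(n_groups=2, n_observed=32 + 6, n_free_parameters=268)
print(df)  # 1214, versus 1280 reported by Mplus
```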

Sadhana posted on Monday, February 21, 2005  8:56 am



For hierarchical cluster analysis of continuous data, which distance and linkage method is appropriate, given that there is a variety of methods available in the literature? 

bmuthen posted on Saturday, February 26, 2005  11:05 pm



For cluster analysis related to Mplus, please see the mixture modeling (categorical latent variables) references on our web site. For instance, the Applied Latent Class Analysis book shown there has a chapter by Vermunt and Magidson that touches on relationships between classic "k-means" clustering and latent class analysis of continuous outcomes. I would say that latent class analysis and related techniques such as factor mixture modeling could be preferable to traditional clustering techniques. 


Dear Dr. Muthen, I have some questions regarding using both categorical and continuous indicators in a SEM model with both cluster and weight variables. 1) Can we use both categorical and continuous indicators in the above analysis? 2) How does Mplus treat continuous indicators when some others are categorical? 3) In this case, what type of estimator is recommended? 4) Are there any complications involved when one uses both categorical and continuous indicators? 5) The ultimate goal is to test for factorial invariance between two groups. Is there any other issue that we should be careful about when using both categorical and continuous indicators to test for measurement invariance? Regards, 


1) Can we use both categorical and continuous indicators in the above analysis? Yes. 2) How does Mplus treat continuous indicators when some others are categorical? They are treated as continuous variables, if that is what you mean. 3) In this case, what type of estimator is recommended? Two estimators are available in this situation: weighted least squares or maximum likelihood. 4) Are there any complications involved when one uses both categorical and continuous indicators? No. 5) The ultimate goal is to test for factorial invariance between two groups. Is there any other issue that we should be careful about when using both categorical and continuous indicators to test for measurement invariance? For categorical indicators, thresholds and factor loadings must be freed together. You cannot free just a threshold or just a factor loading. And when they are free, factor means need to be fixed at zero in all groups, and scale factors or residual variances need to be fixed to one in all groups, depending on whether the delta or theta parameterization is used. See Examples 5.16 and 5.17 in the Mplus User's Guide and Web Note 4. 


Dear Dr. Muthen, 1) In TYPE=COMPLEX MISSING H1 with categorical observed variables, how are the missing values treated? 2) What is the best estimator for this analysis? 3) In the analysis, I got WRMR > 2, CFI > .95 and RMSEA < .05. Does this indicate any serious misfit? 4) In the case of categorical observed variables, what is the interpretation of the factor loadings? Is it the same as for continuous observed variables? Regards, Davood 

bmuthen posted on Friday, June 17, 2005  2:14 pm



1. With categorical outcomes, WLSMV is the default, for which a pairwise-present approach is taken. 2. WLSMV is good, unless the MCAR flavor of the pairwise approach needs to be replaced by MAR, in which case ML should be used if possible from a computational point of view. 3. CFI is rather reliable and it sounds like you are in the ballpark. 4. The loadings are probit slopes when WLSMV is used (logit with ML). But if you only check significance, sign and size of standardized loadings, then they can be looked at analogously to continuous indicators. 
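
To make "probit slope" and "logit slope" concrete, here is a small Python sketch of the response probability for a binary indicator given a factor score. It assumes the usual threshold form, P(u = 1 | factor) = F(-threshold + loading * factor), with a unit residual variance in the probit case; the parameter values are hypothetical and the function names are illustrative.

```python
import math


def probit_probability(loading, threshold, factor_score):
    """P(u = 1 | factor) under a probit link: Phi(-threshold + loading * factor)."""
    z = -threshold + loading * factor_score
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def logit_probability(loading, threshold, factor_score):
    """P(u = 1 | factor) under a logit link: 1 / (1 + exp(threshold - loading * factor))."""
    z = -threshold + loading * factor_score
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical item: loading 0.8, threshold 0.5, factor scores from -1 to 1.
for eta in (-1.0, 0.0, 1.0):
    print(eta, round(probit_probability(0.8, 0.5, eta), 3),
          round(logit_probability(0.8, 0.5, eta), 3))
```

In both cases a larger loading means the probability rises more steeply with the factor, which is why sign and significance can be read much as for continuous indicators.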

Frank Davis posted on Sunday, November 06, 2005  6:09 am



Dr. Muthen, I plan to use either cluster analysis or LCA for my dissertation. I will have a relatively large data set (N=3,000) with approximately 8 continuous variables and 4 categorical variables. Do you foresee any complications? I guess the answer depends on what kind of data I have. Based upon my limited knowledge, cluster analysis doesn't handle categorical variables; however, LCA handles both categorical and continuous variables. Could you suggest a few good texts or references that would point a novice in the right direction? Thanks so much. Frank 


This analysis should work fine. You may be interested in the following reference: Hagenaars, J.A. & McCutcheon, A.L. (2002). Applied latent class analysis. Cambridge, UK: Cambridge University Press. 


Dr. Muthen, I am looking into using Mplus for testing measurement invariance across groups (e.g. gender, age, race etc.). I have a complex survey design and would like to use a split-sample design for my analyses. Based on the user manual, I noticed that the “subpopulation” option is not available for multiple-group comparison, meaning that it is not possible to use this option to select half of my sample (based on a dummy differentiating between the two randomly selected subsamples). Is there any way I can solve this problem given (a) the nature of the sample design, (b) wanting to do a cross-validation and (c) wanting to test for group differences in measurement invariance? Thank you! 


I can think of 2 approaches. I think you could do the multiple-group analysis using mixtures with the "knownclass" option in combination with the subpopulation option. But note also that the subpopulation option often makes very little difference. 


Dr. Muthen, thank you for your quick reply. Just a short follow-up: 1. When you say that "the subpopulation option often makes very little difference", do you mean that it should not be problematic to use the "useobservations" option instead, even if the analysis involves a complex survey design? I am just concerned that this would be viewed as a source of bias in my analysis. 2. Based on your suggestion, I assume that Mplus will not "complain" that I am doing multiple-group analysis with the subpopulation command if groups are defined by the "knownclass" option rather than the "grouping" option in a "regular" CFA, correct? 3. Last (naive) question: what would be the counterpart of comparing factor means between groups in CFA when switching to LCA? Thanks again for all your help! 


1. You can check out whether with your data the SUBPOPULATION option makes a difference by analyzing each group separately using the SUBPOPULATION command. 2. You would have to try this to know for sure. I think it would work. 3. The counterpart to comparing the means of a continuous latent variable would be comparing the means of the categorical latent variable. 


Hello Dr. Muthen, I am analyzing data from a national data set. The full data set contains information from youth between the ages of 12 and 20. I wish to analyze data from youth between the ages of 15 and 18 only. Thus, I believe that I should use the subpopulation command (selecting data from those between the ages of 15 and 18) to make sure the results are representative. However, I wish to analyze my data as a multigroup model by age. The manual indicates that the subpopulation command cannot be used in conjunction with multigroup analysis. I thought I could make a dataset which removes data from individuals younger than 15 and older than 18. I would then NOT use the subpopulation command, but still include the weight, stratification, and cluster commands. 1) Would the weighting be wrong if I used this technique? I am asking because I read somewhere that subsetting concerns regarding the variance computation are only relevant to survey software using the Taylor series. I believe that Mplus uses robust variance estimation. If what I read is true, I am wondering if it is even necessary to use a subpopulation command when analyzing complex data using Mplus (regardless of whether one is doing a multigroup analysis)? 2) If the weighting would be incorrect using the strategy described in #1 above, then how would I conduct a multigroup analysis using only a subsample of the full data? Thank you very much! 


I would do two single group analyses, one using the SUBPOPULATION option and one not using it to see if there is a big difference in the results. If not, I would just do the multiple group analysis without the SUBPOPULATION option. 


Hello Dr. Muthen, I am using clustered data (twins) and I would like to calculate means for 3 groups of twins and decide if these means are significantly different from each other. My questions are: 1) As a first step (saturated model), can I just run the data and look at the sample statistics for the means? Or do I need to look at the model results for the means? These means are not the same. 2) Is it okay not to specify any model when I am only interested in mean scores, or is it necessary to specify the correlations like this: y1 with y2; y3 with y4; I tried both approaches, and I get different mean scores in the model results. 3) To compare the group means, can I just fix the means like this: [y1] (1); in the separate group models and then compare the BICs of these models with the BIC of the saturated model? Can I use BIC for these purposes? 4) When I want to report group means of y1 that are corrected for covariate x (y1 ON x), should I report the means (i.e. intercepts, using centering) in the model results? I hope my questions are clear. Thank you very much in advance for your time! Sincerely, S. Robbers 


If you want to test that the means are equal to each other and equal across groups, do the following:
MODEL: y1 WITH y2;
MODEL g1: [y1-y2] (p1);
MODEL g2: [y1-y2] (p2);
MODEL g3: [y1-y2] (p3);
MODEL TEST: p2 = p1; p3 = p1; p3 = p2;


Thank you very much for your help. I tried your approach, but I get the following warning: WALD'S TEST COULD NOT BE COMPUTED BECAUSE OF A SINGULAR COVARIANCE MATRIX. Do you have any idea how I can solve this problem? I tried to run the model with 2 and with 4 dependent variables, but I get the message every time. Also, I would like to know whether or not I can use BIC for these purposes, as described in my previous post. Thank you very much in advance for your time. Sylvana 


Please send your input, data, output, and license number to support@statmodel.com. I am not aware that BIC can be used as a test of two nested models. 
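
One possible source of a singular covariance matrix in a Wald test, offered here as an assumption rather than a diagnosis of this particular run, is a redundant restriction: three pairwise equalities among three labeled parameters carry only two independent constraints, so the covariance matrix of the three tested differences cannot have full rank. The Python sketch below checks the rank of the constraint matrix; the function name and the matrix layout are illustrative.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a small matrix by exact Gauss-Jordan elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        # Find a row at or below the current rank with a nonzero entry.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


# Columns are p1, p2, p3; each row is one tested difference.
three_tests = [[-1, 1, 0],   # p2 - p1 = 0
               [-1, 0, 1],   # p3 - p1 = 0
               [0, -1, 1]]   # p3 - p2 = 0 (implied by the first two)
two_tests = three_tests[:2]
print(matrix_rank(three_tests), matrix_rank(two_tests))
```

If the rank is smaller than the number of rows, dropping the implied restriction leaves the same hypothesis with a full-rank set of constraints.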


I am using complex survey data to examine a fairly straightforward path model. All variables are dichotomous. What I'd like to do is an "overall" test of men vs. women to (hopefully) find that the path models are significantly different. That is to say, that at least some of the path coefficients are significantly different between the two groups. I envision first doing an "overall" test of the men's model vs. the women's model and then, having established that they are different, comparing individual path coefficients to find which are different. My question is this: much of what I've been reading says that you must have measurement invariance. But, as I understand it, this would only apply to a model with a latent variable, right? Would you be able to point me in the right direction for some assistance as to how to do these 2 comparisons? I'm stuck as to what commands to use.... Further, if we did (in the future) add a latent variable to the model, how would I need to change the procedure? I anticipate that, at that point, we would test for measurement invariance across gender first.... Thanks in advance! 


You can do an analysis with all paths held equal across the groups using the parameter equality constraint feature described in the UG. Then you can see which paths are not invariant by looking at the Modification indices. Yes, measurement invariance can typically be studied only with multiple indicators, in which case the measurement part should be invariance tested first. 


Hi, I am testing for mean differences between 6 groups using clustered data (twins) while correcting for SES:
USEVARIABLES= ntrid ses3 intmo3s extmo3s intfa3s extfa3s;
GROUPING= grsexe (1=mboys 2=mgirls 3=lboys 4=lgirls 5=dboys 6=dgirls);
MISSING= ALL (1.00);
CENTERING = GRANDMEAN (ses3);
CLUSTER IS ntrid;
ANALYSIS: TYPE= complex;
MODEL: intmo3s extmo3s intfa3s extfa3s ON ses3;
!model mboys:
![intmo3s] (1);
!model dboys:
![intmo3s] (1);
I do chi-square difference testing while fixing the means of one variable in two groups every time. I then get an MLR chi-square with 1 df, which I divide by the scaling correction factor to get the right chi-square. If this chi-square is significantly different from zero, I conclude that the means were different. FYI, the variables are not normally distributed. My first question is: is this approach correct? When I use square-root transformed data in order to obtain normality, I get somewhat different results. How is that possible? I thought normality was not needed when using MLR. Which results should I trust? Thanks, Sylvana 


Yes, you can test mean differences in the way you describe. When you transform a variable by taking the square root of the variable, you change its relationships with other variables. This is why you get different results. For example, if linearity was approximately correct for your regression, it will not be after transformation. It is not the same as dividing a variable by a constant where all relationships remain the same. 
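
The contrast between a square-root transformation and division by a constant can be checked numerically. In this Python sketch the toy data are exactly linear, so the Pearson correlation is 1 before transformation; rescaling leaves it untouched, while the square root lowers it. The data and the helper function are made up for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


x = [1.0, 4.0, 9.0, 16.0, 25.0]
y = [2.0 * v for v in x]  # exactly linear in x

r_raw = pearson(x, y)
r_scaled = pearson(x, [v / 10.0 for v in y])      # divide by a constant
r_sqrt = pearson(x, [math.sqrt(v) for v in y])    # square-root transform
print(round(r_raw, 4), round(r_scaled, 4), round(r_sqrt, 4))
```

The drop in `r_sqrt` reflects that the transformed outcome is no longer linearly related to `x`, so a linear model fitted to it estimates something different.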


Thank you, this is very helpful! 


By the way, is there a paper that describes this procedure (chi-square difference testing with Mplus using MLR and the scaling correction factor) that I can use as a reference? Thanks. 


I am sorry not to have put my concerns into one message, but I have a follow-up question related to my first post yesterday 6.22am: Since I perform multiple comparisons (24 in total), do I have to control for type I error? If yes, how can I do that? Thanks, Sylvana 


See the following for a discussion of MLR difference testing: http://www.fcsm.gov/events/papers05.html With many tests, you should do some type of Bonferroni correction. I don't know of any rule, but I would just use smaller p-values. 
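
As a concrete version of "use smaller p-values", the plain Bonferroni rule divides the overall significance level by the number of tests. A minimal Python sketch for the 24 comparisons mentioned above follows; the overall level of 0.05 and the example p-values are assumptions for illustration.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the familywise level at alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def bonferroni_reject(p_values, alpha=0.05):
    """Flag each p-value that stays significant after Bonferroni correction."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p < threshold for p in p_values]


per_test = bonferroni_threshold(0.05, 24)
print(round(per_test, 5))

# Hypothetical p-values padded to 24 tests.
example_p = [0.0004, 0.003, 0.04] + [0.5] * 21
print(bonferroni_reject(example_p)[:3])
```

With 24 tests, only p-values below roughly 0.002 would be declared significant under this rule.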


Hi, How can I obtain Wald test statistics for nested model with survey data from the Mplus output? 


You get that if you use the MLR estimator and Model Test (see User's Guide). 


Dr Muthen, I have a multiple group GMM (cg=2, c=3). Can I use LL difference testing to test if my 3 intercepts are equal between the 2 cg's? (test with 3 df) I suppose in that case I need to use the scaling correction factor? Or do I need to compare the nested models with BIC? Thanks. 


You can use LL difference testing with the scaling correction factors because you are not testing at the border of the admissible parameter space. An easier way is to use the Wald test of MODEL TEST. 
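
The loglikelihood version of the scaled difference test can be sketched as follows. It follows the computation described on the statmodel chidiff page for MLR loglikelihoods; the function name and the numbers are hypothetical and are not taken from the model discussed here.

```python
def scaled_loglik_difference(l0, c0, p0, l1, c1, p1):
    """Scaled loglikelihood difference test for two nested models.

    l0, c0, p0: loglikelihood, scaling correction factor and number of free
                parameters of the nested (more restrictive) model.
    l1, c1, p1: the same three values for the less restrictive model.
    Returns (test statistic, df of the difference, difference scaling factor).
    """
    if p1 <= p0:
        raise ValueError("the nested model must have fewer free parameters")
    # Scaling correction for the difference test.
    cd = (p0 * c0 - p1 * c1) / (p0 - p1)
    trd = -2.0 * (l0 - l1) / cd
    return trd, p1 - p0, cd


# Hypothetical values: three equality constraints give a 3-df test.
trd, df_diff, cd = scaled_loglik_difference(-1010.0, 1.2, 10, -1000.0, 1.3, 13)
print(round(trd, 3), df_diff, round(cd, 4))
```

As with the chi-square version, the statistic is referred to a chi-square distribution with `df_diff` degrees of freedom.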


I am trying to run a two-level multiple-group path model (similar to Example 9.11 in the Mplus handbook, but a path model rather than a CFA). The cluster is countries, and the grouping variable is religious affiliation. I'd like to test whether the hypothesized direct and indirect effects apply to different religious groups, when taking into account the variation at the country level. I keep getting the error message: Cluster ID cannot appear in more than one group. Problem with cluster ID: 2504 What do you think could be the problem here? 


Multiple group analysis assumes each group contains independent observations. In multilevel modeling, this means the grouping variable should be a between-level variable. If a within-level variable is used as a grouping variable, observations from the same cluster may be in both groups, violating this assumption. 
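
Whether a grouping variable varies within clusters can be checked in the data before running the model. The Python sketch below lists every cluster whose members fall into more than one group; the function name and the toy records are assumptions for illustration.

```python
from collections import defaultdict


def clusters_spanning_groups(records):
    """Return {cluster_id: sorted groups} for clusters found in 2+ groups.

    records is an iterable of (cluster_id, group) pairs, one per observation.
    """
    groups_by_cluster = defaultdict(set)
    for cluster_id, group in records:
        groups_by_cluster[cluster_id].add(group)
    return {
        cluster_id: sorted(groups)
        for cluster_id, groups in groups_by_cluster.items()
        if len(groups) > 1
    }


# Toy data: cluster 2504 has members in two groups, cluster 2505 in one.
records = [
    (2504, "group_a"),
    (2504, "group_b"),
    (2504, "group_a"),
    (2505, "group_b"),
    (2505, "group_b"),
]
print(clusters_spanning_groups(records))
```

An empty result means the grouping variable is constant within clusters, that is, a between-level variable.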


Hi, I'm testing a relatively simple model for 2 groups: males and females. So I use gender as a grouping variable. I use TYPE=COMPLEX to specify that they're clustered in schools. The dependent variable is continuous. The hypothesis is that there's no difference in coefficients between males and females. So I compared a model with no constraints and a model with only a different intercept for each group, everything else constrained. 1. Am I correct to think that the chi-square test is a formal test for all constraints at once? 2. Can I test every constraint separately? I would like to know for every coefficient if it's significantly different between boys and girls. I could do this by comparing the full (no constraints) model with all possible models with one constraint. But could I also do this with a single command in Mplus? Thanks a lot, Ruben. 


1. Yes. 2. You can do one at a time. There is no automatic way to do this. You can ask for modification indices when you test all at the same time to see which coefficients have the largest modification indices. 

sam posted on Wednesday, September 07, 2011  9:32 pm



Hi, I have two models from two different scenarios (i.e., one scenario induces good intention and another scenario induces negative intention). Each model has five variables. The variables are the same, except for one variable. That is, the type of consequences is different for each of the two models. The paths of these two models are exactly the same. My questions are: (1) Can I compare these models? I am interested in testing the effect of intention on the responses. (2) Do I need to transform one of the variables in one model that has problems with skewness and kurtosis? If I transform this variable, will I still be able to compare the models? Thank you. 


1. No, not statistically unless you have exactly the same set of variables, but perhaps substantively. 2. No, use MLR which is robust to nonnormality. 

sam posted on Thursday, September 08, 2011  6:58 pm



Thank you very much for the quick response. For the second question, all of the variables have skewness between 0.1 and 1.5 and kurtosis between 0.5 and 1.5, except for one latent variable whose skewness is above 1.5 and whose kurtosis is about 2. When I ran the model, the fit indices were inconclusive. SRMR is 0.058 and RMSEA is 0.051, which suggest that the model has a good fit. However, CFI is only 0.923, below the conventional cutoff value of 0.95. Can I conclude that the mixed results are because one variable is too skewed and high in kurtosis? Is it appropriate to transform that one variable? Thank you very much. 


An important consideration is what the MLR chi-square test says. 


Dear Dr. Muthén, I am running a multiple-group analysis (grouping is male vs. female) and was using the TYPE=COMPLEX command to take into account that we have MZ and DZ twin pairs in our data. I got an error message (see below) and I have two thoughts about it. We also have single twins (only one member of a pair) in our dataset, so there are some clusters with n=1 observations. Is that a problem? And second: is there a problem for the correction algorithm when there are a lot of clusters (177) with only a few (in our case 2) observations? THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES MAY NOT BE TRUSTWORTHY FOR SOME PARAMETERS DUE TO A NON-POSITIVE DEFINITE FIRST-ORDER DERIVATIVE PRODUCT MATRIX. THIS MAY BE DUE TO THE STARTING VALUES BUT MAY ALSO BE AN INDICATION OF MODEL NONIDENTIFICATION. THE CONDITION NUMBER IS 0.422D-17. PROBLEM INVOLVING PARAMETER 107. THIS IS MOST LIKELY DUE TO HAVING MORE PARAMETERS THAN THE NUMBER OF CLUSTERS MINUS THE NUMBER OF STRATA WITH MORE THAN ONE CLUSTER. I really appreciate your thoughts! Thanks, Marion 


Having one or two members of a cluster is not a problem. The message says you have more parameters than independent pieces of information in your data. That is you have more parameters than the number of clusters minus the number of strata with more than one cluster. The impact of this on your results is not known. You would need to do a Monte Carlo study for your situation to see this. 
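
The comparison named in the message, number of free parameters against number of clusters minus number of strata with more than one cluster, is simple to compute ahead of time. The Python sketch below does only that arithmetic; the function name and the counts in the example are hypothetical, and how Mplus counts strata in an unstratified design is not addressed here.

```python
def cluster_information_margin(n_parameters, n_clusters, n_multi_cluster_strata):
    """Clusters minus multi-cluster strata minus free parameters.

    A negative value means the model has more free parameters than the
    quantity the warning message compares them with.
    """
    return (n_clusters - n_multi_cluster_strata) - n_parameters


# Hypothetical counts for illustration.
margin = cluster_information_margin(
    n_parameters=107, n_clusters=100, n_multi_cluster_strata=1
)
print(margin)  # negative: more parameters than clusters minus strata
```

A negative margin corresponds to the situation the warning describes; as noted above, its effect on the results would need a Monte Carlo study to assess.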


Thanks for the quick answer, Linda! 

Stata posted on Tuesday, September 18, 2012  8:37 pm



Dr. Muthen, I only incorporated weight variable for MCFA without stratification and cluster commands. Would it be alright if I use MLM instead of MLR (with type=general) if the data is not normally distributed? Thank you. 


MLR is also robust to nonnormality. I think MLR is preferable to MLM. MLM does not allow missing data. 


Dr. Muthén, I'm running a measurement model invariance analysis with categorical data obtained from players clustered within teams. I want to test invariance across two age groups (younger/older than 12) and I'm not sure how to handle the fact that some teams have only younger players, other teams have only older players, and other teams have both younger and older players. Could you suggest a proper way to deal with this data? Or maybe recommend something similar to your Web Note 16 addressing suitable models for measurement model invariance in this case? Thank you. 


I think you would take the approach described in Web Note 16. 


Thanks for your answer. When I run the models described in Web Note 16, I'm not able to see how to conclude that item factor loadings/thresholds are invariant or not invariant between age groups. Could you elaborate a bit more on this point or suggest a more detailed reading, please? 


See either the results or TECH1 to know what is held equal and what is not held equal across groups. 


Hello! I have tested measurement invariance between groups (5 ability groups). The model is a second-order CFA. Because students are nested in classes, I controlled for the cluster information (class) with TYPE=COMPLEX. I got the following error message: THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES MAY NOT BE TRUSTWORTHY FOR SOME PARAMETERS DUE TO A NON-POSITIVE DEFINITE FIRST-ORDER DERIVATIVE PRODUCT MATRIX. THIS MAY BE DUE TO THE STARTING VALUES BUT MAY ALSO BE AN INDICATION OF MODEL NONIDENTIFICATION. THE CONDITION NUMBER IS 0.303D-15. PROBLEM INVOLVING THE FOLLOWING PARAMETER: Parameter 49, Group 1: K WITH M THIS IS MOST LIKELY DUE TO HAVING MORE PARAMETERS THAN THE NUMBER OF CLUSTERS MINUS THE NUMBER OF STRATA WITH MORE THAN ONE CLUSTER. The problem is that I couldn't discover any problem with parameter 49. I did the same analysis with TYPE=GENERAL and no error message appeared. Can I trust the results with TYPE=COMPLEX? Thanks! 


With clustered data, independence of observations is at the cluster level. You must have 49 clusters. The message tells you that your model has more parameters than it has clusters. The effect of this on model results has not been well-studied. This is a warning. 
