
Anonymous posted on Friday, August 19, 2005  10:56 am



I think my question may be too simple and that's why I can't find an example. Chapter 9 of the user's manual describes two ways of dealing with clustered sampling: type = complex or type = twolevel. Since I am only worried about properly taking into account nonindependence of observations and calculating correct standard errors, type = complex seems most appropriate, but there are only examples for the twolevel input. I have a model relating variables collected on individuals (plants) in three separate plots. There may be nonindependence between individuals within plots because of spatial proximity. I have two questions: 1) Is type = complex appropriate for only three clusters (cluster = plot)? 2) Is there such an example in the manual or on the website? I just want to confirm the syntax to be used. Would I just include a cluster variable in the NAMES ARE list, define it as the cluster variable (e.g., CLUSTER = plot), and let Mplus do the rest? Since there is no between-level covariate (w), it seems I would not include "within" and "between" parts in the model input. Your help is much appreciated. Also, incidentally, the website is fantastic and the online lectures are a great resource.

bmuthen posted on Friday, August 19, 2005  4:40 pm



1) No, Type = Complex SEs only work well with at least 20 clusters. Twolevel modeling also requires at least that many clusters. Maybe there are other approaches in the literature of your area (randomization approaches?). 2) Two things are needed: Cluster = ...; in the VARIABLE command and Type = Complex; in the ANALYSIS command. I am glad the web site materials are of use.
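A minimal sketch of those two statements in context follows. The file name and variable names (y, x1, x2, plot) are hypothetical, and per the reply above this setup is only advisable with at least 20 clusters, not the three plots described in the question.

```
TITLE:    Single-level regression with cluster-robust SEs (sketch)
DATA:     FILE = plants.dat;        ! hypothetical file name
VARIABLE: NAMES = y x1 x2 plot;     ! hypothetical variable names
          USEVARIABLES = y x1 x2;   ! the cluster variable is not listed here
          CLUSTER = plot;           ! identifies which cluster each case belongs to
ANALYSIS: TYPE = COMPLEX;           ! MLR is the default estimator with TYPE = COMPLEX
MODEL:    y ON x1 x2;               ! no %WITHIN% / %BETWEEN% sections are needed
```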

HeeJin Jun posted on Friday, February 02, 2007  12:27 pm



I have a question somewhat related to the one above. We have data on kids; some of them have siblings, but most of them don't. In this case, can we use type=complex with cluster=idm (mother's id)? Other commands are: ANALYSIS: TYPE = MIXTURE MISSING COMPLEX; STARTS = 1000 500; STITERATIONS = 20; ESTIMATOR = MLR; MODEL: %OVERALL% i s | cwkc97@0 cwkc98@1 cwkc99@2 cwkc01@4 cwkc03@6; We tried this and got error messages. (to be continued...)

HeeJin Jun posted on Friday, February 02, 2007  12:28 pm



(continued from the previous post) We tried this and got the following error messages: WARNING: WHEN ESTIMATING A MODEL WITH MORE THAN TWO CLASSES, IT MAY BE NECESSARY TO INCREASE THE NUMBER OF RANDOM STARTS USING THE STARTS OPTION TO AVOID LOCAL MAXIMA. THE MODEL ESTIMATION DID NOT TERMINATE NORMALLY DUE TO A NON-POSITIVE DEFINITE FISHER INFORMATION MATRIX. CHANGE YOUR MODEL AND/OR STARTING VALUES. THE MODEL ESTIMATION HAS REACHED A SADDLE POINT OR A POINT WHERE THE OBSERVED AND THE EXPECTED INFORMATION MATRICES DO NOT MATCH. THE CONDITION NUMBER IS 0.105D+03. THE PROBLEM MAY ALSO BE RESOLVED BY DECREASING THE VALUE OF THE MCONVERGENCE OPTION. THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES COULD NOT BE COMPUTED. THIS IS OFTEN DUE TO THE STARTING VALUES BUT MAY ALSO BE AN INDICATION OF MODEL NONIDENTIFICATION. CHANGE YOUR MODEL AND/OR STARTING VALUES. PROBLEM INVOLVING PARAMETER 1. Should I increase the number of random starts even more, or does the problem lie somewhere else? Thanks a lot for your help.


Please send your input, data, output, and license number to support@statmodel.com. 

jks posted on Monday, February 22, 2010  3:42 pm



I have two questions about the complex survey data analysis. The questions are: 1) WHAT IS THE EXACT STATISTIC USED WHEN ESTIMATOR=MLR IS USED IN MPLUS 5.1? 2) WHERE CAN I GET THIS STATISTIC? Another related question is WHAT IS THE DEFAULT ESTIMATOR WHEN TYPE=COMPLEX IS USED IN MPLUS 5.1? Thanks in advance. 


Do you mean the chi-square test of model fit, or do you mean the standard errors? MLR is the default with Type=Complex.

jks posted on Monday, February 22, 2010  11:38 pm



Thanks for your reply. I mean the chi-square test of model fit for the questions: 1) WHAT IS THE EXACT STATISTIC USED WHEN ESTIMATOR=MLR IS USED IN MPLUS 5.1? 2) WHERE CAN I GET THIS STATISTIC? 3) AM I CORRECT TO SAY THAT TYPE=COMPLEX; AND ESTIMATOR=MLR; USE THE SAME STATISTICS FOR CHI-SQUARE TESTS OF MODEL FIT? I THINK THE DIFFERENCE IS THAT TYPE=COMPLEX; TAKES INTO ACCOUNT THE EFFECTS OF STRATIFICATION AND CLUSTERING, BUT ESTIMATOR=MLR DOES NOT. ALSO, BOTH CAN TAKE WEIGHTS.


1) See http://www.fcsm.gov/05papers/Asparouhov_Muthen_IIA.pdf 2) Mplus gives it for mean and covariance structure models, but not for models where those are not sufficient statistics. 3) Yes, Type=complex also takes into account stratification, weights, and clustering. MLR can be used also without all that. 

jks posted on Wednesday, February 24, 2010  2:49 pm



Thanks. These answers are very helpful. I am running two models and conducting chi-square difference tests to establish measurement invariance for two groups in each model. Model 1 uses: weight=weight; cluster=cluster; Type=complex; Model 2 uses: weight=weight; Estimator=MLR; Questions: 1) How can I conduct the chi-square difference tests for Model 1? 2) For Model 2, the procedure for conducting chi-square difference tests is outlined on the statmodel.com website as: "Compute the Satorra-Bentler scaled chi-square difference test (TRd) as follows: TRd = (T0 - T1)/cd, where T0 and T1 are the regular chi-square values." My question is: are T0 and T1 obtained using the maximum likelihood (ML) method? 3) I ran Model 1 and Model 2 on the same data. The parameter estimates (e.g., factor loadings) are the same, but the chi-square, RMSEA, and CFI values differ between the two models. Are the results correct?


Both models use the same sampling weight, and it is the weight that affects the parameter estimates. That is why the parameter estimates are the same. Clustering affects standard errors and fit statistics, and only Model 1 takes clustering into account. That is why the standard errors and fit statistics are different. I think you are using MLR in both analyses. Therefore, you would use the same test as for Model 2. Yes, T0 and T1 refer to ML.

jks posted on Friday, February 26, 2010  10:07 pm



Yes, I am using MLR in both Model 1 and Model 2: Model 1 uses: weight=weight; cluster=cluster; Type=complex; Model 2 uses: weight=weight; Estimator=MLR; Question: 1) Can I use the same Satorra-Bentler scaled chi-square difference test (below) for both models? "Satorra-Bentler scaled chi-square difference test (TRd) as follows: TRd = (T0 - T1)/cd, where T0 and T1 are the regular chi-square values."


Yes. The test depends on the estimator and the estimator is MLR in both cases. 
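For reference, the full computation behind the TRd formula quoted above, as given on the statmodel.com difference-testing page, is written out below. Here d0 and c0 are the degrees of freedom and scaling correction factor of the nested model, d1 and c1 those of the comparison model, T0 and T1 the regular (ML) chi-square values, and T0* and T1* the MLR chi-square values printed by Mplus; the two forms agree because the ML value equals the MLR value times its scaling correction factor. The numbers in the worked example are hypothetical.

```latex
\begin{aligned}
c_d &= \frac{d_0 c_0 - d_1 c_1}{d_0 - d_1}, \qquad
TR_d = \frac{T_0 - T_1}{c_d} = \frac{T_0^{*} c_0 - T_1^{*} c_1}{c_d}, \qquad
\mathrm{df} = d_0 - d_1 \\[4pt]
&\text{Example: } d_0 = 10,\ c_0 = 1.20,\ d_1 = 8,\ c_1 = 1.10,\ T_0 = 30.0,\ T_1 = 20.0 \\
c_d &= \frac{12.0 - 8.8}{2} = 1.60, \qquad
TR_d = \frac{30.0 - 20.0}{1.60} = 6.25 \ \text{on } 2 \ \mathrm{df}
\end{aligned}
```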


Hi, when I run a multiple-group CFA with type = complex, using weight, cluster, and stratum, the TECH1 output shows lambda being estimated in the beta matrix rather than the lambda matrix. On another computer, with a similar model, the lambdas are in the lambda matrix. Both have Mplus v6.12. What could be the issue? Or what more information do you need us to send? Thank you.


If you do the exact same model on both computers, the lambda will be in the same matrix. I would need to see both outputs and your license number at support@statmodel.com to say why the matrix shift occurred. This is, however, not something to be concerned about. 


I am doing an MGCFA to test for configural invariance across five different racial/ethnic groups. Since I am using Type Complex in the analysis, and the estimator Mplus uses is MLR, I don't understand the instructions on this page http://www.statmodel.com/chidiff.shtml for calculating the Satorra-Bentler chi-square from the output I get. I typically see the Satorra-Bentler reported. When using MLR (Type Complex), the User's Guide says it produces a chi-square test statistic that is asymptotically equivalent to the Yuan-Bentler T2* statistic. Is this the statistic I should report in the table instead of the Satorra-Bentler? Thanks for your help.


Dear Dr. Linda Muthen, I am doing multilevel modeling with clusters (school IDs) using Type=Complex, and I am wondering how Mplus calculates the variance estimates. To my knowledge, Mplus aggregates the data to calculate the variance, but I don't know how Mplus does it.


Geehong: We use a sandwich estimator. See the following paper, which is available on the website: Muthén, B. & Satorra, A. (1995). Complex sample data in structural equation modeling. Sociological Methodology, 25, 267-316.
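To illustrate what "aggregating" means here, a generic cluster-robust sandwich estimator is sketched below, assuming J clusters, no strata, and no weights. The individual score vectors are summed within each cluster before their outer products are taken, so the cluster, not the individual, is the unit of independent information. The J/(J-1) finite-sample factor and the use of the observed information for A are illustrative assumptions; how Mplus handles weights, strata, and small-sample corrections is described in the cited paper and related technical papers.

```latex
\begin{aligned}
\widehat{\operatorname{Var}}(\hat{\theta}) &= A^{-1} B A^{-1}, \qquad
A = -\sum_{j=1}^{J} \sum_{i \in j}
      \frac{\partial^{2} \log L_{ij}(\hat{\theta})}{\partial \theta \, \partial \theta^{\top}}, \\
B &= \frac{J}{J-1} \sum_{j=1}^{J} s_j s_j^{\top}, \qquad
s_j = \sum_{i \in j} \frac{\partial \log L_{ij}(\hat{\theta})}{\partial \theta}
\end{aligned}
```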


Yes. You should see the Mplus Version 7.1 Language Addendum on the website with the user's guide. We have automated testing for measurement invariance. 


Dear Dr. Linda Muthen, I have read the paper "Complex sample data in structural equation modeling" (Sociological Methodology) as you suggested, but I am still wondering how the inverse selection probabilities W_ijkl in eq. (18) are calculated. I don't think the paper mentions it. Thanks in advance.


Take a look at Asparouhov, T. (2005). Sampling weights in latent variable modeling. Structural Equation Modeling, 12, 411-434, and other complex survey papers on our website.
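As a generic illustration (an assumption about the design, not a reconstruction of eq. (18) itself): if a unit is reached through four selection stages indexed i, j, k, l, its overall inclusion probability is the product of the stage-wise conditional inclusion probabilities, and the sampling weight is the inverse of that product. A stage that is a stratum included with certainty contributes a factor of 1. The numbers in the two-stage example are hypothetical.

```latex
\begin{aligned}
\pi_{ijkl} &= \pi_{i}\,\pi_{j \mid i}\,\pi_{k \mid ij}\,\pi_{l \mid ijk}, \qquad
W_{ijkl} = \frac{1}{\pi_{ijkl}} \\[4pt]
&\text{Two-stage example: school drawn with probability } 0.1,\ \text{student within school with } 0.2: \\
\pi &= 0.1 \times 0.2 = 0.02, \qquad W = 1/0.02 = 50
\end{aligned}
```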

Cecily Na posted on Monday, September 23, 2013  8:33 am



Dear Professor, I have a data set with strata (cities), and within each stratum (city) simple random sampling is used. Is this stratified sampling, or can it be considered cluster sampling? Can I still use type=complex and MLR? If not, what type and estimator should I use to take into account the nonindependence? Thanks a lot!

Cecily Na posted on Monday, September 23, 2013  10:08 am



Hello Professors, I want to follow up with my last question. I would like to know if type=complex and MLR can also apply to stratified sampling design. thanks. 


You can use type = complex and MLR for a stratified sampling design, that is, when this is the only complex survey feature. How many cities do you have? 

Cecily Na posted on Monday, September 23, 2013  4:15 pm



Hi Bengt, about 30 cities. Actually, these cities are all the primary units of the population. Should I use the commands stratification = city, type=complex, and Estimator = MLR? Is that right? Thank you very much.


Yes. 

Cecily Na posted on Monday, September 23, 2013  5:52 pm



Hi Bengt, Thank you. In the above example, in theory, how would the results differ if cluster=city were specified instead of stratification = city? I think type=complex and estimator = MLR together address the nonindependence of sampling; how do they treat strata differently from clusters? Thanks a lot!


Cluster=city increases the standard errors and stratification = city decreases the standard errors. If the population you are studying has 30 cities you have to use stratification = city. If the population has many more cities and only 30 were sampled (at random or another method) then you have to use cluster=city. 
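The direction of the cluster effect can be seen from Kish's design effect for a sample mean, sketched here under the simplifying assumption of equal cluster sizes m and intraclass correlation rho (the numbers are hypothetical). Stratification works the other way: variation between strata is removed from the sampling variance, so its design effect is typically at or below 1.

```latex
\begin{aligned}
\operatorname{deff} &= 1 + (m - 1)\,\rho, \qquad
\operatorname{SE}_{\text{cluster}} \approx \sqrt{\operatorname{deff}}\;\operatorname{SE}_{\text{SRS}} \\[4pt]
m = 20,\ \rho = 0.05: \quad \operatorname{deff} &= 1 + 19 \times 0.05 = 1.95, \qquad \sqrt{1.95} \approx 1.40
\end{aligned}
```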

Cecily Na posted on Wednesday, September 25, 2013  6:30 pm



Thank you, Tihomir, for your clarification above. Suppose that the whole population has 30 cities, and they're all included in my sample (so they're strata), how can I use the multilevel modeling approach when Type = twolevel can be only used with cluster = city, not with stratification = city? Thank you! 


You should use TYPE = COMPLEX; STRATIFICATION = city; 

Cecily Na posted on Thursday, September 26, 2013  1:21 pm



Hi Linda, I understand the above syntax. I guess my question is: if I need to incorporate city-level (cities are strata) variables and individual-level variables, how can I distinguish city-level effects from individual-level effects in a model without using type=twolevel? Thank you very much!


You treat the city variables as individual variables. Your standard errors will be correct. 
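Putting the answers in this exchange together, a sketch might look like the following. The variable names (y, x, citysize, city) are hypothetical, and citysize stands in for any city-level variable entered as an ordinary covariate.

```
VARIABLE: NAMES = y x citysize city;    ! hypothetical variable names
          USEVARIABLES = y x citysize;
          STRATIFICATION = city;        ! all cities in the population are sampled
ANALYSIS: TYPE = COMPLEX;
          ESTIMATOR = MLR;
MODEL:    y ON x citysize;              ! city-level variable treated as an individual variable
```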

una posted on Wednesday, November 06, 2013  2:15 am



Dear Prof. Muthen, Participants in my study were recruited from 12 schools across five provinces. My analysis is cross-lagged modeling using the MLR estimator. 1. Can I handle the correlation of responses from participants nested within the same schools and provinces by clustering on school ID (type is complex; cluster = schoolID) and using dummy variables to represent provincial effects as control variables in the analysis? 2. Or do I have too few clusters to estimate this, as written in the first answer above ("Type = Complex SEs only work well with at least 20 clusters.")? Thank you very much in advance.


You have too few units for TYPE=COMPLEX. I would create 11 dummy variables and use them as covariates to control for nonindependence of observations due to nesting in schools. I would treat the five provinces as fixed effects and use them as multiple groups. 

una posted on Thursday, November 07, 2013  6:23 am



Dear Prof. Muthen, Thank you for this reply. 1. Do you mean the statement "grouping = provinces" for treating the provinces? Should I use this in combination with "type is twolevel"? 2. Regarding the fact that I have too few units for type = complex, do you have an article to refer to? Kind regards, Caroline


You will have a single-level model with dummy variables as covariates. See work by Joop Hox.
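A sketch of the dummy-covariate part of this advice for a two-wave cross-lagged model follows. It assumes the 11 school dummies s2-s12 (school 1 as reference) are already in the data file; the wave-1/wave-2 variables x1, y1, x2, y2 are hypothetical, and the province grouping is not shown.

```
VARIABLE: NAMES = x1 y1 x2 y2 s2-s12;      ! s2-s12: precomputed school dummies
          USEVARIABLES = x1 y1 x2 y2 s2-s12;
ANALYSIS: ESTIMATOR = MLR;                 ! single-level model: no TYPE = COMPLEX, no CLUSTER
MODEL:    y2 ON y1 x1 s2-s12;              ! cross-lagged paths plus school controls
          x2 ON x1 y1 s2-s12;
          y2 WITH x2;                      ! wave-2 residual covariance
```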


Dear Drs. Muthen, I would love to get your advice on the appropriateness of my statistical approach. I have a dataset with 835 participants; each participant provided data on 45 (continuous) variables. The 835 participants actually come from 7 different studies, which range in sample size from 40 to 346 participants. Research Question: I am interested in exploring the factor structure of the 45 continuous indicators. I would like to take into account the nonindependence due to the fact that participants come from 7 different samples. Thus, I am using "type = complex efa" in my analysis. 1) Would you agree that "type = complex efa" approach is correct, given my question and data? Specifically, I am concerned that 7 clusters would not be sufficient for this type of analysis, and that the number of indicators (i.e., 45) is too high. 2) If the approach I am currently taking is problematic, do you have any thoughts on another approach that might be more appropriate? Sincerely, Natalia 


7 studies is not enough for Type=Complex. You can instead use 6 dummy covariates to represent study differences. EFA with covariates can be done using ESEM; see Special Mplus Topics in the left margin of our website for writings on this.
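A sketch of EFA with study covariates in ESEM form follows. It assumes the 6 study dummies d2-d7 (study 1 as reference) are in the data file; the choice of 4 factors is purely hypothetical.

```
VARIABLE: NAMES = y1-y45 d2-d7;        ! d2-d7: precomputed study dummies
          USEVARIABLES = y1-y45 d2-d7;
ANALYSIS: ESTIMATOR = MLR;             ! geomin rotation is the ESEM default
MODEL:    f1-f4 BY y1-y45 (*1);        ! one block of 4 EFA factors
          f1-f4 ON d2-d7;              ! factor means may differ across studies
```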


Thank you! 


Using COMPLEX data, I am running a structural model with a latent variable predicting a categorical (binary) outcome. I have a few questions: 1. What is the default estimator used by Mplus 7 for type=complex data and a categorical dependent variable? 2. I am indicating in the input that my dependent variable is categorical, so I assumed that the program would run a logistic regression, but it won't give me odds ratios, which makes me think it's running a different regression. Is that the case? 3. After doing some reading, I realized I should probably be using the ESTIMATOR = MLR command. I ran the model with this command and it gives me different estimates and odds ratios, but it does not give me fit indices. What is the best estimator to use in this case?


The default estimator is WLSMV. This gives probit regression, so no odds ratios. MLR gives logistic regression. However, with MLR and categorical outcomes, means, variances, and covariances are not sufficient statistics for model estimation, so chi-square and related fit statistics are not available. You can use either estimator.
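The two options above differ only in the ESTIMATOR line, as in this sketch. The variable names (y1-y3, u, clus) and the one-factor measurement model are hypothetical.

```
VARIABLE: NAMES = y1-y3 u clus;      ! hypothetical variable names
          USEVARIABLES = y1-y3 u;
          CATEGORICAL = u;           ! binary outcome
          CLUSTER = clus;
ANALYSIS: TYPE = COMPLEX;
          ESTIMATOR = MLR;           ! logistic regression with odds ratios, no chi-square;
                                     ! delete this line to get the WLSMV (probit) default
MODEL:    f BY y1-y3;
          u ON f;
```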


Thank you very much for your fast response. 


Hello, Sorry for asking a rookie question, but I want to make sure I have this right. If I use TYPE = COMPLEX and do not specify any weights, then is it the case that all clusters have an equal impact on the analysis, even if there are more observations from some clusters than others? If not, then what do I need to specify in order to make it so? Many thanks in advance! Viv. 


Q1. No Q2. I don't see why you would want to. 


Dear Bengt, Thanks for your quick response. I am conducting an RIEFA on data from many cultural samples. In principle, I would weight all of the samples equally, but some are larger than others by accidents of the data collection. Some samples are larger than others because our collaborators in those places happened to collect more data, but the size of the samples has no theoretical meaning, and I would not want those samples to disproportionately drive the results of the EFA. Does this make sense and, if so, what commands do I need to put it into practice? Many thanks! Viv. 


I have two-level data and used type = complex to address clustering. It is a simple regression that includes both level-1 and level-2 predictors in the model. One reviewer has questioned the appropriateness of using type = complex when there are level-2 variables in the model, stating that we may be losing some important information by collapsing both level-1 and level-2 predictors into one model and just using a correction for the SE. The specific comment written is that "Typically when you are including level 2 predictors in a multilevel model, you partition the variance in the outcome (between/within variability). Then you examine how much your Level 2 predictor is explaining the between (or Level 2 variance) in this outcome. When you use Type = complex, you are not partitioning variance (level 1/ 2) and so you cannot model the extent to which the Level 2 predictor is accounting for the classroom-level variance in children’s outcomes." In our first resubmission I attempted to explain the adjustment; he/she has now asked for a citation indicating that you can include a level-2 predictor (e.g., Classroom Quality) in the regression models. Could you point me to a specific citation that would address this reviewer's concern? I am not sure that Muthén & Satorra (1995) does so adequately for this example. And/or could you guide me in how to respond, as my first attempt has proven unsuccessful?


Muthen-Satorra talk about aggregatable and non-aggregatable models, which is to some extent applicable (you can use aggregatable models with Complex but need Twolevel for non-aggregatable models). I think this is easy for you to handle: just switch to Twolevel. The reviewer has a point that if you want to fully use the level-2 predictor, it should be modeled on Between and predict the level-2 portion of the level-1 variable (namely its random intercept). It's simple to reanalyze; just follow UG ex 9.1.
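A sketch in the style of UG ex 9.1 for this situation follows. The names are hypothetical: y is the child outcome, x a child-level predictor, w a classroom-level predictor such as classroom quality, and classid the classroom identifier.

```
VARIABLE: NAMES = y x w classid;     ! hypothetical variable names
          WITHIN = x;                ! measured on children only
          BETWEEN = w;               ! measured on classrooms only
          CLUSTER = classid;
ANALYSIS: TYPE = TWOLEVEL;
MODEL:    %WITHIN%
          y ON x;
          %BETWEEN%
          y ON w;                    ! w predicts the random intercept of y
```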


I reanalyzed the data using type = twolevel. The results are largely consistent in terms of sig/nonsig predictors. My concern is that fit (RMSEA, CFI, TLI) is now terrible whereas it was acceptable before when using type= complex. Could you comment on why these fit statistics would change so much when resulting model estimates are fairly similar? 


With COMPLEX, you do not model between-level parameters, and with TWOLEVEL you do. Your between-level model may have misspecifications.


Hello, To follow up on my rookie question a few months ago, what do I need to do in order to weight different sized clusters equally in a Type = complex analysis? In other words, I want to create a situation where each cluster has an equal impact on the analysis rather than each case having an equal impact on the analysis. Thanks! 


If you use type=twolevel, that is already done: the mean of a variable is the mean of the cluster means, not the mean of the observations. If you are using type=complex, you can use weights to adjust for this effect.
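One way to construct such weights is sketched below, assuming "equal impact" means equal total weight per cluster. This is an illustrative assumption, not necessarily what the reply had in mind. Each of the n_j observations in cluster j (J clusters, N observations in total) gets the weight shown, which can then be supplied through the WEIGHT option. The numbers in the example are hypothetical.

```latex
\begin{aligned}
w_{ij} &= \frac{N}{J\, n_j}
\;\Longrightarrow\;
\sum_{i \in j} w_{ij} = n_j \cdot \frac{N}{J\, n_j} = \frac{N}{J} \ \text{for every cluster } j,
\qquad \sum_{j=1}^{J} \sum_{i \in j} w_{ij} = N \\[4pt]
J = 3,\ (n_1, n_2, n_3) &= (10, 20, 70),\ N = 100: \quad
w \approx (3.333,\ 1.667,\ 0.476), \ \text{each cluster totals } 100/3 \approx 33.33
\end{aligned}
```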


Thanks  that's extremely useful! Viv. 

Okyoung, JI posted on Wednesday, January 25, 2017  1:35 am



Hello, I have a question about the difference between type=twolevel and type=twolevel random. Does type=twolevel mean a fixed intercept and a fixed slope? I know that type=twolevel random can have a random intercept and fixed slope, or a random intercept and random slope. Please let me know the difference between them.


TWOLEVEL has a random intercept. See Example 9.1. If you want a random slope and a random intercept, you must use TWOLEVEL RANDOM and the model specification shown in Example 9.2.
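The random-slope specification referred to above follows the pattern of UG ex 9.2, sketched here with hypothetical variable names (y, x, w, clus). The "|" statement is what defines s as a random slope.

```
VARIABLE: NAMES = y x w clus;        ! hypothetical variable names
          WITHIN = x;
          BETWEEN = w;
          CLUSTER = clus;
ANALYSIS: TYPE = TWOLEVEL RANDOM;
MODEL:    %WITHIN%
          s | y ON x;                ! s is the random slope of y on x
          %BETWEEN%
          y s ON w;                  ! w predicts the random intercept and slope
          y WITH s;
```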


Hi Drs. Muthen, I am working on a GMM, using data from 7 schools. Per other comments on the discussion board, I am not using a cluster variable, but instead have created school level dummy variables. I have included these dummy variables in the GMM model to identify classes, by regressing I & S on dummy vars. After finding best fitting number of classes, I have exported a class membership variable to include in multinomial logistic regression models to examine predictors of class membership. Should I also include school dummy variables in these logistic regressions or should I consider school level differences already controlled for in the class outcome? Thanks for your help! 


That's a tricky question. It depends on whether you think the school dummies influence only the class membership (1) or also i and s directly (2). In case (1), you wouldn't include the school dummies in your Step 1. In case (2), I don't think the model is identified in Step 2 if you have the school dummies influence both the latent class variable and i and s directly. I would either do a 1step analysis or take approach (1). 


I'm interested in looking at the interaction between an L1 and an L2 predictor. My understanding is that to do this, a random slope must first be specified for the L1 predictor. However, I've recently read that with only 2 individuals per group, there aren't enough lower-level data points to allow the slopes to vary from cluster to cluster. I'm also not particularly interested in separating within-group effects from between-group effects; I only want to account for nonindependence (e.g., my L1 predictor is a personality trait, and I'm interested in the predictive power of differences in the trait across the sample as a whole, as opposed to within or between the clusters). Given these two conditions, would it make more sense to use type = complex? MLM seems like the more popular approach in the literature, but how would I set up a model like this without the random slope?


I think you can use Type=complex and create L1*L2 in DEFINE.
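A sketch of that suggestion follows. The names are hypothetical: trait is the L1 predictor, quality an L2 predictor, and dyad the cluster identifier. Variables created in DEFINE go at the end of the USEVARIABLES list.

```
VARIABLE: NAMES = y trait quality dyad;           ! hypothetical variable names
          USEVARIABLES = y trait quality inter;   ! DEFINE variables listed last
          CLUSTER = dyad;
DEFINE:   inter = trait*quality;                  ! L1 x L2 product term
ANALYSIS: TYPE = COMPLEX;
MODEL:    y ON trait quality inter;
```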


Hi Drs Muthen, I am currently testing the validity of an emotional competence questionnaire using a sample of athletes. The athletes are nested within 19 teams, which have anywhere from 4 to 73 athletes each. I am using both CFA and ESEM approaches in Mplus to examine the psychometric properties of the questionnaire, which has 50 items and 10 expected factors. Given that we have nested data, I have been researching the TYPE=COMPLEX option of the ANALYSIS command. I only want to account for the fact that the athletes are nested within teams; I am not interested in evaluating the data at the team level. I was hoping you might be able to answer a few questions. 1. I have read on this and other Mplus discussion threads that 20 clusters is a rule of thumb for being able to use TYPE=COMPLEX. Do you think it would be acceptable to run my analyses using TYPE=COMPLEX with 19 teams? 2. If using TYPE=COMPLEX is acceptable, I understand that I would include TYPE=COMPLEX under the ANALYSIS command and CLUSTER=teams under the VARIABLE command. Is this correct, given that I only want to account for the fact that the athletes are nested in teams and do not want to analyze the data at the team level? 3. Do I also need to include the WEIGHT option of the VARIABLE command? Thank you for your time and assistance!


1. Barely acceptable, but probably better than ignoring the clustering. 2. Yes. 3. No. Unless weights are available and deemed important. 
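A sketch of the setup confirmed in answers 2 and 3 follows. The item names ec1-ec50 and the assignment of five consecutive items to each of the 10 factors are hypothetical.

```
VARIABLE: NAMES = ec1-ec50 teams;
          USEVARIABLES = ec1-ec50;
          CLUSTER = teams;           ! athletes nested in 19 teams; no WEIGHT statement
ANALYSIS: TYPE = COMPLEX;            ! MLR by default
MODEL:    f1 BY ec1-ec5;
          f2 BY ec6-ec10;
          f3 BY ec11-ec15;
          f4 BY ec16-ec20;
          f5 BY ec21-ec25;
          f6 BY ec26-ec30;
          f7 BY ec31-ec35;
          f8 BY ec36-ec40;
          f9 BY ec41-ec45;
          f10 BY ec46-ec50;
```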


Hi Dr. Muthen, Thank you for your prompt response, I sincerely appreciate you taking the time to respond to my enquiries. Ashley 


Dr. Muthen, In relation to my post and your response above (dated March 16), I ran the CFA model with TYPE=COMPLEX (and CLUSTER=teams) and without this syntax. When I ran the CFA with TYPE=COMPLEX I received the following warning (in part): THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES MAY NOT BE TRUSTWORTHY FOR SOME PARAMETERS DUE TO A NON-POSITIVE DEFINITE FIRST-ORDER DERIVATIVE PRODUCT MATRIX. In addition, my output read: THIS IS MOST LIKELY DUE TO HAVING MORE PARAMETERS THAN THE NUMBER OF CLUSTERS MINUS THE NUMBER OF STRATA WITH MORE THAN ONE CLUSTER. I did some digging and came across a post on the Non-positive definite matrix discussion thread from Elise on June 24, 2011, who encountered a similar issue. The response from Dr. Linda Muthen was: "The number of clusters is the number of independent observations in your data set. The warning is telling you that you have more parameters than you have independent observations. The impact of this on the results has not been studied. This is simply a warning." Question: As the impact has not been studied, I am wondering if my data would be OK to run without TYPE=COMPLEX. There appear to be only minor changes in fit indices:

With TYPE=COMPLEX:    RMSEA .056; CFI .766; TLI .746; SRMR .084
Without TYPE=COMPLEX: RMSEA .054; CFI .770; TLI .750; SRMR .084

Thank you for your time!


The question is if the SEs are similar. Make sure you have at least 20 clusters. The number of clusters should be larger than the number of parameters specific to the cluster level. 
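Why the warning appears can be seen from a generic counting argument, sketched under the assumptions of the sandwich form B = sum of cluster score outer products, no weights, and a single stratum. At the ML estimate the cluster score vectors sum to zero, so B has rank at most J - 1 (J - H with H strata, after centering within strata) and must be singular once the number of parameters p reaches J. The parameter count below assumes each of the 50 items loads on exactly one of the 10 factors, with free intercepts, residual variances, and factor variances and covariances.

```latex
\begin{aligned}
B &= \sum_{j=1}^{J} s_j s_j^{\top}, \qquad \sum_{j=1}^{J} s_j = 0
\;\Longrightarrow\; \operatorname{rank}(B) \le J - 1 \\[4pt]
J = 19: \quad \operatorname{rank}(B) &\le 18, \qquad
p = \underbrace{40}_{\text{loadings}} + \underbrace{50}_{\text{intercepts}}
  + \underbrace{50}_{\text{residuals}} + \underbrace{10 + 45}_{\text{factor (co)variances}} = 195
\end{aligned}
```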


Hi Dr. Muthen, Thank you for your prompt response. I hope you don't mind me asking a few more follow-up questions. I have 19 clusters (i.e., teams). You previously noted that this is "barely acceptable, but probably better than ignoring the clustering." I went back to check the SEs, and there is a bit more variation than with the fit indices. For example, for F1 BY:

Item   Estimate   S.E. with TYPE=COMPLEX   S.E. without TYPE=COMPLEX
EC1    1.000      0.000                    0.000
EC2    1.349      0.372                    0.335
EC3    1.778      0.678                    0.804
EC4    1.626      0.389                    0.343
EC5    1.767      0.519                    0.635

Do you know of any "standards" to go by to decide whether or not those changes in SEs warrant the use of TYPE=COMPLEX? What is meant by "The number of clusters should be larger than the number of parameters specific to the cluster level"? Since I am receiving the warning in Mplus that the standard errors of the model parameter estimates may not be trustworthy, is it advisable, if the SEs don't change drastically, to run the analyses without TYPE=COMPLEX?


Q1: No. Perhaps you can see if it makes a difference in terms of significance. Q2: See the Between-level part of TECH1: which free parameters appear only there (check the numbers printed). Q3: Perhaps; there are no established rules for this type of case.


Thank you! 


Hi, http://www.fcsm.gov/05papers/Asparouhov_Muthen_IIA.pdf This link no longer works. I'd like to access this paper to learn about comparing models when type=complex. thank you 


Hi! I have administered 6 surveys to 6 groups of respondents. Sometimes, however, the same respondent participated in 2-3 surveys, creating nonindependence. I'm not sure how many such respondents there are, but it would be fewer than 20 (for clustering purposes) and likely fewer than 10 (these were small surveys, so dropping dependent responses is also not desirable). If I'm reading this thread correctly, would MLR with dummy codes for survey (5 dummy codes) be the appropriate way to handle the nonindependence? Can this be done with TYPE=COMPLEX without specifying a weight, cluster, or sampling variable? Or should TYPE=COMPLEX just be omitted?


Type= Complex requires at least 20 clusters. Otherwise, use dummy covariates. 

shonnslc posted on Thursday, September 05, 2019  1:58 pm



Hi, By reading previous comments, it seems that TYPE=COMPLEX is not feasible when the number of clusters is too low. And it seems that one of the solutions would be to bring dummies into the model. I am doing CFA with nested data structure (n of clusters = 8). Does that mean I can regress my latent factors on those dummies to account for nested data structure? Isn't that very similar to the MIMIC model? Thanks. 


Yes and yes. 
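In Mplus terms, the MIMIC approach confirmed above might look like this sketch. It assumes the 8 clusters are coded as 7 precomputed dummies d2-d8 (cluster 1 as reference); the items y1-y6 and the two-factor structure are hypothetical.

```
VARIABLE: NAMES = y1-y6 d2-d8;       ! d2-d8: precomputed cluster dummies
          USEVARIABLES = y1-y6 d2-d8;
ANALYSIS: ESTIMATOR = MLR;           ! no TYPE = COMPLEX with only 8 clusters
MODEL:    f1 BY y1-y3;
          f2 BY y4-y6;
          f1 f2 ON d2-d8;            ! MIMIC: factor means differ by cluster
```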
