Anonymous posted on Friday, August 19, 2005 - 10:56 am
I think my question may be too simple, and that's why I can't find an example. Chapter 9 of the user's guide describes two ways of dealing with clustered sampling: type = complex and type = twolevel. Since I am only worried about properly taking into account the non-independence of samples and calculating correct standard errors, it seems that type = complex is most appropriate, but there are only examples for the twolevel input.
I have a model relating variables collected on individuals (plants) in three separate plots. There may be non-independence between individuals within plots because of spatial proximity. I have two questions:
1) is type = complex appropriate for only three clusters (cluster = plot)?
2) Is there such an example in the manual or on the website? I just want to confirm the language to be used. Would I just include a cluster variable in the NAMES ARE list and then define which one is the cluster variable (e.g., CLUSTER = plot), and Mplus does the rest? Since there is no covariate (w), it seems I would not include "within" and "between" model input.
Your help is much appreciated. Also, incidentally, the website is fantastic and the online lectures are a great resource.
bmuthen posted on Friday, August 19, 2005 - 4:40 pm
1). No, the Type = Complex SEs only work well with at least 20 clusters. Twolevel modeling also requires at least that many clusters. Maybe there are other approaches in the literature of your area (randomization approaches?).
2) Yes:
Cluster = ...;
Type = Complex;
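Putting those two commands into a minimal input sketch for the plant example above (file and variable names here are hypothetical, for illustration only):

```
TITLE:    Plant model with cluster-robust standard errors;
DATA:     FILE = plants.dat;            ! hypothetical data file
VARIABLE: NAMES = y x plot;             ! hypothetical variable names
          CLUSTER = plot;               ! plot is the clustering unit
ANALYSIS: TYPE = COMPLEX;               ! adjust SEs for non-independence
MODEL:    y ON x;
```

No %WITHIN% or %BETWEEN% sections are needed; with TYPE=COMPLEX the model is specified as usual and only the standard errors (and fit statistics) are adjusted for the clustering.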
I am glad the web site materials are of use.
Hee-Jin Jun posted on Friday, February 02, 2007 - 12:27 pm
I have a somewhat related question to the one above. We have data on kids; some of them have siblings but most of them don't. In this case, can we use
We tried this and got the error messages below. (to be continued in the next post...)
Hee-Jin Jun posted on Friday, February 02, 2007 - 12:28 pm
(continued from the previous)
WARNING: WHEN ESTIMATING A MODEL WITH MORE THAN TWO CLASSES, IT MAY BE NECESSARY TO INCREASE THE NUMBER OF RANDOM STARTS USING THE STARTS OPTION TO AVOID LOCAL MAXIMA.
THE MODEL ESTIMATION DID NOT TERMINATE NORMALLY DUE TO A NON-POSITIVE DEFINITE FISHER INFORMATION MATRIX. CHANGE YOUR MODEL AND/OR STARTING VALUES.
THE MODEL ESTIMATION HAS REACHED A SADDLE POINT OR A POINT WHERE THE OBSERVED AND THE EXPECTED INFORMATION MATRICES DO NOT MATCH. THE CONDITION NUMBER IS -0.105D+03. THE PROBLEM MAY ALSO BE RESOLVED BY DECREASING THE VALUE OF THE MCONVERGENCE OPTION.
THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES COULD NOT BE COMPUTED. THIS IS OFTEN DUE TO THE STARTING VALUES BUT MAY ALSO BE AN INDICATION OF MODEL NONIDENTIFICATION. CHANGE YOUR MODEL AND/OR STARTING VALUES. PROBLEM INVOLVING PARAMETER 1.
Should I increase the number of random starts even more, or does the problem lie somewhere else?
Do you mean the chi-square test of model fit, or do you mean the standard errors?
MLR is the default with Type=Complex.
jks posted on Monday, February 22, 2010 - 11:38 pm
Thanks for your reply. I mean the chi-square test of model fit. My questions:
1) What is the exact statistic used when ESTIMATOR=MLR is used in Mplus 5.1?
2) Where can I get this statistic?
3) Am I correct to say that TYPE=COMPLEX; and ESTIMATOR=MLR; use the same statistic for the chi-square test of model fit? I think the difference is that TYPE=COMPLEX takes into account the effects of stratification and clustering, but ESTIMATOR=MLR does not. Also, both can take weights.
2) Mplus gives it for mean and covariance structure models, but not for models where those are not sufficient statistics.
3) Yes, Type=complex also takes into account stratification, weights, and clustering. MLR can also be used without all that.
jks posted on Wednesday, February 24, 2010 - 2:49 pm
Thanks. These answers are very helpful. I am running two models and conducting chi-square difference tests to establish measurement invariance for two groups in each model. Model 1 uses: weight=weight; cluster=cluster; Type=complex;. Model 2 uses: weight=weight; Estimator=MLR;. Questions: 1) How can I conduct the chi-square difference tests for Model 1? 2) For Model 2, the procedure for conducting chi-square difference tests is outlined on the statmodel.com website as:
*Compute the Satorra-Bentler scaled chi-square difference test (TRd) as follows: TRd = (T0 - T1)/cd, where T0 and T1 are the regular chi-square values. My question is: are T0 and T1 obtained using the maximum likelihood (ML) method? 3) I ran Model 1 and Model 2 on the same data. The parameter estimates (e.g., factor loadings) are the same, but the chi-square, RMSEA, and CFI values differ between the two models. Are the results correct?
The sample weight affects the parameter estimates, and both models use the same weight. That is why the parameter estimates are the same. Clustering affects standard errors and fit statistics, and only Model 1 accounts for clustering. That is why the standard errors and fit statistics are different.
I think you are using MLR in both analyses. Therefore, you would use the same test as for Model 2. Yes, T0 and T1 refer to ML.
jks posted on Friday, February 26, 2010 - 10:07 pm
Yes, I am using MLR in both Model 1 and Model 2. Model 1 uses: weight=weight; cluster=cluster; Type=complex;. Model 2 uses: weight=weight; Estimator=MLR;. Question: 1) Can I use the same Satorra-Bentler scaled chi-square difference test (below) for both models?
*Satorra-Bentler scaled chi-square difference test (TRd): TRd = (T0 - T1)/cd, where T0 and T1 are the regular chi-square values.
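For readers following along, the full recipe from the statmodel.com difference-testing page can be sketched numerically. The scaled chi-square values, scaling correction factors, and degrees of freedom below are made-up illustration numbers, not output from the models discussed in this thread:

```python
# Satorra-Bentler scaled chi-square difference test (TRd), following the
# steps described at statmodel.com/chidiff.shtml for MLM/MLR estimation.
# All input numbers are hypothetical, for illustration only.

def scaled_chisq_diff(T0_scaled, c0, d0, T1_scaled, c1, d1):
    """T0_scaled/c0/d0: nested (more restrictive) model;
    T1_scaled/c1/d1: comparison (less restrictive) model.
    T*_scaled are the scaled chi-square values printed in the output,
    c* are the scaling correction factors, d* are degrees of freedom."""
    # Recover the regular (unscaled ML) chi-square values: T = c * T_scaled
    T0 = c0 * T0_scaled
    T1 = c1 * T1_scaled
    # Difference-test scaling correction
    cd = (d0 * c0 - d1 * c1) / (d0 - d1)
    # Scaled difference statistic, referred to chi-square with d0 - d1 df
    TRd = (T0 - T1) / cd
    return TRd, d0 - d1

TRd, df = scaled_chisq_diff(T0_scaled=120.0, c0=1.20, d0=50,
                            T1_scaled=100.0, c1=1.25, d1=45)
print(round(TRd, 2), df)  # prints: 25.33 5
```

The resulting TRd is compared against a chi-square distribution with d0 - d1 degrees of freedom, just as an ordinary ML difference test would be.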
I am doing a multiple-group CFA with type = complex, using weight, cluster, and stratification. The TECH1 output shows lambda being estimated in the beta matrix rather than the lambda matrix, while on another computer, with a similar model, the lambdas are in the lambda matrix. Both have Mplus v6.12. What could be the issue? Or what more information do you need us to send?
If you run the exact same model on both computers, the lambdas will be in the same matrix. I would need to see both outputs and your license number at email@example.com to say why the matrix shift occurred. This is, however, not something to be concerned about.
I am doing an MG-CFA to test for configural invariance across five different racial/ethnic groups. Since I am using Type = Complex in the analysis, and the estimator Mplus uses is MLR, I don't understand the instructions on http://www.statmodel.com/chidiff.shtml for calculating the Satorra-Bentler chi-square from the output I get. I typically see the Satorra-Bentler statistic reported. For MLR (Type = Complex), the User's Guide says Mplus produces a chi-square test statistic that is asymptotically equivalent to the Yuan-Bentler T2* statistic. Is this the statistic I should report in the table instead of the Satorra-Bentler? Thanks for your help.
I am doing multilevel modeling with clusters (school IDs) using Type=Complex, but I am wondering how Mplus calculates the variance estimate. To my knowledge, Mplus aggregates the data to calculate the variance, but I don't know how it does this.
I have read the paper "Complex sample data in structural equation modeling" (Sociological Methodology) as you suggested, but I am still wondering how the inverse selection probabilities W_ijkl in eq. (18) are calculated. I don't think the paper mentions this. Thanks in advance.
Asparouhov, T. (2005). Sampling weights in latent variable modeling. Structural Equation Modeling, 12, 411-434.
and other complex survey papers on our website.
Cecily Na posted on Monday, September 23, 2013 - 8:33 am
Dear Professor, I have a data set with strata (cities), and within each stratum (city) simple random sampling is used. Is this stratified sampling, or can it be considered cluster sampling? Can I still use type=complex and MLR? If not, which type and estimator should I use to take the non-independence into account? Thanks a lot!
Cecily Na posted on Monday, September 23, 2013 - 10:08 am
Hello Professors, I want to follow up on my last question. I would like to know if type=complex and MLR can also apply to a stratified sampling design. Thanks.
Cecily Na posted on Monday, September 23, 2013 - 5:52 pm
Hi Bengt, thank you. In the above example, in theory, how would the results differ if cluster=city is specified instead of stratification=city? I think type=complex and estimator=MLR together address the non-independence of sampling; how do they handle strata vs. clusters differently? Thanks a lot!
Cluster=city increases the standard errors and stratification = city decreases the standard errors.
If the population you are studying has 30 cities you have to use stratification = city. If the population has many more cities and only 30 were sampled (at random or another method) then you have to use cluster=city.
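In input terms, the two designs differ only in which VARIABLE option names the city variable. A hedged sketch of the two alternatives (variable name hypothetical):

```
! Case 1: all 30 cities in the population are in the sample; city is a stratum
VARIABLE: STRATIFICATION = city;
ANALYSIS: TYPE = COMPLEX;

! Case 2: only a sample of cities from a larger population; city is a cluster
VARIABLE: CLUSTER = city;
ANALYSIS: TYPE = COMPLEX;
```

As noted above, the stratification specification tends to decrease the standard errors while the cluster specification tends to increase them, so the choice is not cosmetic.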
Cecily Na posted on Wednesday, September 25, 2013 - 6:30 pm
Thank you, Tihomir, for your clarification above. Suppose the whole population has 30 cities and they're all included in my sample (so they're strata). How can I use the multilevel modeling approach when Type = twolevel can only be used with cluster = city, not with stratification = city? Thank you!
Cecily Na posted on Thursday, September 26, 2013 - 1:21 pm
Hi Linda, I understand the above syntax. I guess my question is: if I need to incorporate city-level (cities are strata) variables and individual level variables, how can I distinguish city level effects from individual level effects in a model without using type=twolevel? Thank you very much!
You treat the city variables as individual variables. Your standard errors will be correct.
una posted on Wednesday, November 06, 2013 - 2:15 am
Dear Prof. Muthen, participants in my study were recruited from 12 schools across five provinces. My analysis is cross-lagged modeling using the MLR estimator.
1. Can I handle the correlation of responses from participants nested within the same schools and provinces by clustering on school ID (type is complex; cluster = schoolID) and using dummy variables to represent provincial effects as control variables in the analysis?
2. Or do I have too few clusters for this, as noted in the first answer above ("the Type = Complex SEs only work well with at least 20 clusters")?
You have too few units for TYPE=COMPLEX. I would create 11 dummy variables and use them as covariates to control for non-independence of observations due to nesting in schools. I would treat the five provinces as fixed effects and use them as multiple groups.
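A sketch of the suggested setup, assuming 12 schools coded 1-12 in a variable named school and five provinces in a variable named prov (all names and codes hypothetical); the dummies could equally well be created in the data file beforehand:

```
VARIABLE: NAMES = y1 y2 school prov;
          USEVARIABLES = y1 y2 d1-d11;     ! DEFINEd variables go last
          GROUPING = prov (1=p1 2=p2 3=p3 4=p4 5=p5);  ! provinces as groups
DEFINE:   d1 = 0; IF (school EQ 1) THEN d1 = 1;  ! one dummy per school,
          d2 = 0; IF (school EQ 2) THEN d2 = 1;  ! omitting the 12th school
          ! ... and so on through d11 ...        ! as the reference category
MODEL:    y2 ON y1 d1-d11;    ! one lag shown; the full cross-lagged model
                              ! would include the remaining paths
```

The 11 school dummies soak up mean differences between schools, while the grouping specification treats the five provinces as fixed effects via multiple-group analysis.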
una posted on Thursday, November 07, 2013 - 6:23 am
Dear Prof. Muthen,
Thank you for this reply.
1. Do you mean the statement "grouping = provinces" for treating the provinces? Should I use this in combination with "type is twolevel"?
2. With regard to having too few units for type = complex, do you have an article I can refer to?
I would love to get your advice on the appropriateness of my statistical approach.
I have a dataset with 835 participants; each participant provided data on 45 (continuous) variables. The 835 participants actually come from 7 different studies, which range in sample size from 40 to 346 participants.
Research Question: I am interested in exploring the factor structure of the 45 continuous indicators. I would like to take into account the non-independence due to the fact that participants come from 7 different samples. Thus, I am using "type = complex efa" in my analysis.
1) Would you agree that "type = complex efa" approach is correct, given my question and data? Specifically, I am concerned that 7 clusters would not be sufficient for this type of analysis, and that the number of indicators (i.e., 45) is too high.
2) If the approach I am currently taking is problematic, do you have any thoughts on another approach that might be more appropriate?
7 studies is not enough for Type=Complex. You can instead use 6 dummy covariates to represent study differences. EFA with covariates can be done using ESEM; see the writings under Special Mplus Topics in the left margin of our website.
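A sketch of EFA with covariates via ESEM, under the assumption of a two-factor solution for indicators y1-y45 and study dummies d1-d6 (the variable names and the number of factors are hypothetical):

```
VARIABLE: NAMES = y1-y45 d1-d6;
ANALYSIS: ESTIMATOR = MLR;
MODEL:    f1-f2 BY y1-y45 (*1);   ! ESEM measurement block with rotation
          f1-f2 ON d1-d6;         ! study dummies as covariates
```

In practice one would compare solutions with different numbers of factors in the (*1) block, just as in an ordinary EFA.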
Using COMPLEX data, I am running a structural model with a latent variable predicting a categorical outcome (binary). I have a few questions: 1. What is the default estimator used by Mplus7 for type=complex data and a categorical dependent variable? 2. I am indicating in the input that my dependent variable is categorical, so I imagined that the program would be running a logistic regression but it won't give me odds ratios which makes me think it's running a different regression. Is that the case? 3. After doing some reading, I realized I should probably be using the Estimator is MLR command. I ran the model with this command and it gives me different estimates and ODDS ratio, but it does not give me fit indexes. What is the best estimator to be used in this case?
The default estimator is WLSMV. This gives probit regression, so no odds ratios. MLR gives logistic regression. However, with MLR and categorical outcomes, means, variances, and covariances are not sufficient statistics for model estimation, so chi-square and related fit statistics are not available. You can use either estimator.
I am conducting an RI-EFA on data from many cultural samples. In principle, I would weight all of the samples equally, but some are larger than others by accidents of the data collection.
Some samples are larger than others because our collaborators in those places happened to collect more data, but the size of the samples has no theoretical meaning, and I would not want those samples to disproportionately drive the results of the EFA.
Does this make sense and, if so, what commands do I need to put it into practice?
I have two-level data and used type = complex to address clustering. It is a simple regression that includes both level 1 and 2 predictors in the model.
One reviewer has questioned the appropriateness of using type = complex when there are level 2 variables in the model stating that we may be losing some important information by collapsing both level 1 and 2 predictors into 1 model and just using a correction for the SE.
The specific comment written is that "Typically when you are including level 2 predictors in a multilevel model, you partition the variance in the outcome (between/within variability). Then you examine how much your Level 2 predictor is explaining the between (or Level 2 variance) in this outcome. When you use Type = complex, you are not partitioning variance (level 1/ 2) and so you cannot model the extent to which the Level 2 predictor is accounting for the classroom-level variance in children’s outcomes."
In our first resubmission I attempted to explain the adjustment; he/she has asked for a citation indicating that you can include a level 2 predictor (e.g., classroom quality) in the regression models. Could you point me to a specific citation that would address this reviewer's concern? I am not sure that Muthén & Satorra (1995) does so adequately for this example. And/or could you guide me in how to respond, as my first attempt has proven unsuccessful?
Muthén & Satorra talk about aggregatable and non-aggregatable models, and it is to some extent applicable (you can use aggregatable models with Complex but need Twolevel for non-aggregatable models).
I think this is easy for you to handle. Just switch to Twolevel. The reviewer has a point that if you want to fully use the level 2 predictor it should be modeled on Between and predict the level-2 portion of the level-1 variable (namely its random intercept). It's simple to re-analyze - just follow UG ex 9.1.
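The suggested re-analysis, sketched in the style of UG ex 9.1 (variable names hypothetical: y the child outcome, x a level-1 covariate, w the classroom-level predictor, clus the classroom ID):

```
VARIABLE: NAMES = y x w clus;
          CLUSTER = clus;
          WITHIN = x;              ! measured at level 1 only
          BETWEEN = w;             ! measured at level 2 only
ANALYSIS: TYPE = TWOLEVEL;
MODEL:    %WITHIN%
          y ON x;                  ! within-classroom regression
          %BETWEEN%
          y ON w;                  ! w predicts the random intercept of y
```

This is the partitioning the reviewer is asking for: the between-level regression shows how much of the classroom-level variance in y the level-2 predictor accounts for.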
I re-analyzed the data using type = twolevel. The results are largely consistent in terms of significant/non-significant predictors. My concern is that fit (RMSEA, CFI, TLI) is now terrible, whereas it was acceptable before with type = complex. Could you comment on why these fit statistics would change so much when the resulting model estimates are fairly similar?
Hi Drs. Muthen, I am working on a GMM, using data from 7 schools. Per other comments on the discussion board, I am not using a cluster variable, but instead have created school level dummy variables. I have included these dummy variables in the GMM model to identify classes, by regressing I & S on dummy vars. After finding best fitting number of classes, I have exported a class membership variable to include in multinomial logistic regression models to examine predictors of class membership. Should I also include school dummy variables in these logistic regressions or should I consider school level differences already controlled for in the class outcome? Thanks for your help!
That's a tricky question. It depends on whether you think the school dummies influence only the class membership (1) or also i and s directly (2).
In case (1), you wouldn't include the school dummies in your Step 1. In case (2), I don't think the model is identified in Step 2 if you have the school dummies influence both the latent class variable and i and s directly.
I would either do a 1-step analysis or take approach (1).
I'm interested in looking at the interaction between an L1 and L2 predictor. My understanding is that to do this, a random slope must first be specified for the L1 predictor. However, I've recently read that with only 2 individuals per group, there aren't enough lower level data points to allow the slopes to vary from cluster to cluster.
I'm also not particularly interested in separating within-group effects from between-group effects, only in accounting for non-independence. (E.g., my L1 predictor is a personality trait, and I'm interested in the predictive power of differences in the trait across the sample as a whole, as opposed to within or between the clusters.)
Given these two conditions, would it make more sense to use type = complex? MLM seems like the more popular approach in the literature, but how would I set up a model like this without the random slope?
I am currently testing the validity of an emotional competence questionnaire using a sample of athletes. The athletes are nested within 19 teams. The teams have anywhere from 4 to 73 athletes. I am using both CFA and ESEM approaches in Mplus to examine the psychometric properties of the questionnaire, which has 50 items and 10 expected factors. Given that we have nested data, I have been researching the TYPE=COMPLEX option of the ANALYSIS command. I only want to account for the fact that the athletes are nested within teams; I am not interested in evaluating the data at the team level. I was hoping you might be able to answer a few questions.
1. I have read on this and other Mplus discussion threads that having 20 clusters is a rule of thumb for using TYPE=COMPLEX. Do you think it would be acceptable to run my analyses using TYPE=COMPLEX with 19 teams?
2. If using TYPE=COMPLEX is acceptable, I understand that I would include TYPE=COMPLEX under the ANALYSIS command and CLUSTER=teams under the VARIABLE command. Is this correct given that I only want to account for the fact that the athletes are nested in teams and do not want to analyze the data at the team level?
3. Do I also need to include the WEIGHT option of the VARIABLE command?
In relation to my post and your response above (dated March 16), I ran the CFA model with TYPE=COMPLEX (and CLUSTER=teams) and without this syntax. When I ran the CFA with TYPE=COMPLEX I received the following warning (in part): THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES MAY NOT BE TRUSTWORTHY FOR SOME PARAMETERS DUE TO A NON-POSITIVE DEFINITE FIRST-ORDER DERIVATIVE PRODUCT MATRIX.
In addition my output read: THIS IS MOST LIKELY DUE TO HAVING MORE PARAMETERS THAN THE NUMBER OF CLUSTERS MINUS THE NUMBER OF STRATA WITH MORE THAN ONE CLUSTER.
I did some digging and came across a post on the "Non-positive definite matrix" discussion thread from Elise on June 24, 2011, who encountered a similar issue. The response from Dr. Linda Muthen was: "The number of clusters is the number of independent observations in your data set. The warning is telling you that you have more parameters than you have independent observations. The impact of this on the results has not been studied. This is simply a warning."
Question: As the impact has not been studied, I am wondering if my data would be OK to run without TYPE=COMPLEX. There appear to be only minor changes in fit indices:
fit indices with TYPE=COMPLEX: RMSEA .056; CFI .766; TLI .746; SRMR .084
fit indices without TYPE=COMPLEX: RMSEA .054; CFI .770; TLI .750; SRMR .084
Thank you for your prompt response. I hope you don't mind me asking a few more follow-up questions. I have 19 clusters (i.e., teams). You previously noted that this is "barely acceptable, but probably better than ignoring the clustering." I went back to check the SEs and there is a bit more variation than with the fit indices. For example:
Do you know of any "standards" to go by to decide whether or not those changes in SEs warrant the use of TYPE=COMPLEX?
What is meant by "The number of clusters should be larger than the number of parameters specific to the cluster level"?
Since I am receiving the warning in Mplus that the standard errors of the model parameter estimates may not be trustworthy, is it advisable, if the SEs don't drastically change, to run the analyses without TYPE=COMPLEX?