Example for type=complex
 Anonymous posted on Friday, August 19, 2005 - 10:56 am
I think my question may be too simple and that's why I can't find an example. In chapter 9 of the user's manual, it describes two ways of dealing with clustered sampling - type = complex or type= twolevel. Since I am only worried about properly taking into account non-independence of samples and calculating correct standard errors, it seems that type = complex is most appropriate, but there are only examples for the twolevel input.

I have a model relating variables collected on individuals (plants) in three separate plots. There may be non-independence between individuals within plots because of spatial proximity. I have two questions:

1) is type = complex appropriate for only three clusters (cluster = plot)?

2) Is there such an example in the manual or on the website? I just wanted to confirm the language to be used. Would I just include a cluster variable in the NAMES ARE list and then define which one is the cluster variable (e.g., CLUSTER = plot), and Mplus does the rest? Since there is no covariate (w), it seems I would not include "within" and "between" model input.

Your help is much appreciated. Also, incidentally, the website is fantastic and the online lectures are a great resource.
 bmuthen posted on Friday, August 19, 2005 - 4:40 pm
1) No, the Type = Complex SEs only work well with at least 20 clusters. Twolevel modeling also requires at least that many clusters. Maybe there are other approaches in the literature of your area (randomization approaches?).

2) Two things:

- Cluster = ...;
- Type = Complex;

I am glad the web site materials are of use.
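
For readers looking for a full input, a minimal sketch of a single-level analysis with cluster-robust standard errors might look like the following. The file name and the variable names (plants.dat, plot, y, x1, x2) are hypothetical and not from the posts above; only the CLUSTER and TYPE = COMPLEX settings come from the answer.

```mplus
TITLE:    Sketch: single-level regression with cluster-robust SEs
DATA:     FILE = plants.dat;        ! hypothetical file name
VARIABLE: NAMES = plot y x1 x2;     ! hypothetical variable names
          USEVARIABLES = y x1 x2;   ! the cluster variable is not listed here
          CLUSTER = plot;
ANALYSIS: TYPE = COMPLEX;
MODEL:    y ON x1 x2;
```

No %WITHIN% or %BETWEEN% sections appear, since the model is specified at a single level and only the standard errors and fit statistics are adjusted for clustering.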
 Hee-Jin Jun posted on Friday, February 02, 2007 - 12:27 pm
I have a question somewhat related to the one above. We have data on kids; some of them have siblings but most of them don't. In this case, can we use

type=complex
cluster=idm (mother's id)

Other commands are:
ANALYSIS: TYPE = MIXTURE MISSING COMPLEX;
STARTS = 1000 500;
STITERATIONS = 20;
ESTIMATOR = MLR;
MODEL:
%OVERALL%
i s | cwkc97@0 cwkc98@1 cwkc99@2 cwkc01@4 cwkc03@6;

We tried this and got the error messages.
(to be continued...)
 Hee-Jin Jun posted on Friday, February 02, 2007 - 12:28 pm
(continued from the previous)

We tried this and got the error messages.

WARNING: WHEN ESTIMATING A MODEL WITH MORE THAN TWO CLASSES, IT MAY BE NECESSARY TO INCREASE THE NUMBER OF RANDOM STARTS USING THE STARTS OPTION TO AVOID LOCAL MAXIMA.

THE MODEL ESTIMATION DID NOT TERMINATE NORMALLY DUE TO A NON-POSITIVE
DEFINITE FISHER INFORMATION MATRIX. CHANGE YOUR MODEL AND/OR STARTING
VALUES.

THE MODEL ESTIMATION HAS REACHED A SADDLE POINT OR A POINT WHERE THE
OBSERVED AND THE EXPECTED INFORMATION MATRICES DO NOT MATCH.
THE CONDITION NUMBER IS -0.105D+03. THE PROBLEM MAY ALSO BE RESOLVED BY DECREASING THE VALUE OF THE MCONVERGENCE OPTION.

THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES COULD NOT BE
COMPUTED. THIS IS OFTEN DUE TO THE STARTING VALUES BUT MAY ALSO BE
AN INDICATION OF MODEL NONIDENTIFICATION. CHANGE YOUR MODEL AND/OR
STARTING VALUES. PROBLEM INVOLVING PARAMETER 1.

Should I increase the starting value even more, or does the problem lie somewhere else?

Thanks a lot for your help.
 Linda K. Muthen posted on Friday, February 02, 2007 - 2:22 pm
 jks posted on Monday, February 22, 2010 - 3:42 pm
I have two questions about the complex survey data analysis. The questions are:
1) WHAT IS THE EXACT STATISTIC USED WHEN
ESTIMATOR=MLR
IS USED IN MPLUS 5.1?

2) WHERE CAN I GET THIS STATISTIC?

Another related question is
WHAT IS THE DEFAULT ESTIMATOR WHEN
TYPE=COMPLEX
IS USED IN MPLUS 5.1?
 Bengt O. Muthen posted on Monday, February 22, 2010 - 8:29 pm
Do you mean the chi-square test of model fit, or do you mean the standard errors?

MLR is the default with Type=Complex.
 jks posted on Monday, February 22, 2010 - 11:38 pm
Thanks for your reply. I mean the chi-square test of model fit for the questions:
1) WHAT IS THE EXACT STATISTIC USED WHEN
ESTIMATOR=MLR
IS USED IN MPLUS 5.1?

2) WHERE CAN I GET THIS STATISTIC?

3) AM I CORRECT TO SAY:
TYPE=COMPLEX; AND
ESTIMATOR=MLR;
USE THE SAME STATISTICS FOR CHISQUARE TESTS OF MODEL FIT?
I THINK THE DIFFERENCE IS THAT TYPE=COMPLEX; TAKES INTO ACCOUNT THE EFFECTS OF STRATIFICATION, CLUSTERING BUT ESTIMATOR=MLR DOES NOT DO THAT. AS WELL BOTH CAN TAKE THE WEIGHTS.
 Bengt O. Muthen posted on Tuesday, February 23, 2010 - 9:27 am
1) See

http://www.fcsm.gov/05papers/Asparouhov_Muthen_IIA.pdf

2) Mplus gives it for mean and covariance structure models, but not for models where those are not sufficient statistics.

3) Yes, Type=complex also takes into account stratification, weights, and clustering. MLR can be used also without all that.
 jks posted on Wednesday, February 24, 2010 - 2:49 pm
I am running two models and conducting chi-square difference tests to establish measurement invariance for two groups in each model.
Model 1, using:
weight=weight;
cluster=cluster;
Type=complex;
Model 2, using:
weight=weight;
Estimator=MLR;
Questions:
1) How can I conduct the chi-square difference tests for Model 1?
2) For Model 2, the procedure for conducting chi-square difference tests is outlined in the statmodel.com website as

*Compute the Satorra-Bentler scaled chi-square difference test (TRd) as follows:
TRd = (T0 - T1)/cd
where, T0 and T1 are the regular chi-square values.
My question is: Are T0 and T1 obtained using maximum likelihood (ML) method?
3) I ran Model 1 and Model 2 on the same data. Parameter estimates (e.g., factor loadings) are the same, but the chi-square, RMSEA, and CFI values differ between the two models. Are the results correct?
 Linda K. Muthen posted on Wednesday, February 24, 2010 - 5:50 pm
The sample weight affects the parameter estimates, and both models use the same weight. That is why the parameter estimates are the same. Clustering affects standard errors and fit statistics. That is why the standard errors and fit statistics are different.

I think you are using MLR in both analyses. Therefore, you would use the same test as for Model 2. Yes, T0 and T1 refer to ML.
 jks posted on Friday, February 26, 2010 - 10:07 pm
Yes, I am using MLR in both model 1 and Model 2:
Model 1, using:
weight=weight;
cluster=cluster;
Type=complex;
Model 2, using:
weight=weight;
Estimator=MLR;
Question:
1) Can I use the same Satorra-Bentler scaled chi-square difference test (below) for both models?

*Satorra-Bentler scaled chi-square difference test (TRd) as follows:
TRd = (T0 - T1)/cd
where, T0 and T1 are the regular chi-square values.
 Linda K. Muthen posted on Saturday, February 27, 2010 - 10:03 am
Yes. The test depends on the estimator and the estimator is MLR in both cases.
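
For reference, the quantities in the difference test discussed above can be written out as follows. This is a sketch based on the procedure described on the statmodel.com chi-square difference testing page; the starred symbols and the subscripted names are notation introduced here for illustration. Index 0 denotes the nested (more restrictive) model and index 1 the comparison model.

```latex
% T_0^{*}, T_1^{*}: MLR (scaled) chi-square values printed in the output
% c_0, c_1: scaling correction factors printed with them
% d_0, d_1: degrees of freedom of the two models
\[
T_0 = c_0 \, T_0^{*}, \qquad T_1 = c_1 \, T_1^{*}
\quad \text{(regular ML chi-square values)}
\]
\[
cd = \frac{d_0 \, c_0 - d_1 \, c_1}{d_0 - d_1}
\]
\[
TRd = \frac{T_0 - T_1}{cd}
\]
```

TRd is then referred to a chi-square distribution with d0 - d1 degrees of freedom.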
 Mauricio Garnier-Villarreal posted on Wednesday, August 08, 2012 - 1:58 pm
Hi

I am doing a multiple-group CFA with type = complex, using weight, cluster, and stratum. The TECH1 output shows the lambdas being estimated in the beta matrix rather than the lambda matrix. On another computer with a similar model, the lambdas are in the lambda matrix. Both have Mplus v 6.12. What could be the issue? Or what more information do you need us to send?

Thank you.
 Linda K. Muthen posted on Thursday, August 09, 2012 - 5:49 am
If you do the exact same model on both computers, the lambda will be in the same matrix. I would need to see both outputs and your license number at support@statmodel.com to say why the matrix shift occurred. This is, however, not something to be concerned about.
 Jessika Bottiani posted on Tuesday, June 11, 2013 - 8:29 am
I am doing a MG-CFA to test for configural invariance across five different racial/ethnic groups. Since I am using Type Complex in the analysis, and the estimator Mplus uses is MLR, I don't understand the instructions on this page http://www.statmodel.com/chidiff.shtml to calculate the Satorra-Bentler chi square from the output I get. I typically see the Satorra Bentler reported. When using MLR (Type Complex) the User's Guide says it produces a chi-square test statistic that is asymptotically equivalent to a Yuan Bentler T2* statistic. Is this the statistic I should report in the table instead of the Satorra-Bentler? Thanks for your help.
 Geehong Hyun posted on Tuesday, June 11, 2013 - 9:31 am
Dear Dr. Linda Muthen,

I am doing multilevel modeling with clusters (school IDs) using Type=Complex, and I am wondering how Mplus calculates the estimate of the variance. To my knowledge, Mplus aggregates the data to calculate the variance, but I don't know how Mplus does it.
 Linda K. Muthen posted on Tuesday, June 11, 2013 - 11:36 am
Geehong:

We use a sandwich estimator. See the following paper which is available on the website:

Muthén, B. & Satorra, A. (1995). Complex sample data in structural equation modeling. Sociological Methodology, 25, 267-316.
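
For orientation, a cluster-robust sandwich estimator generally has the form sketched below. This is the generic textbook form, not a transcription of the expressions in the cited paper, and the symbols (H, s_c, C, L_i) are notation assumed here; the paper gives the exact estimator.

```latex
\[
\widehat{V}(\hat\theta)
  = H^{-1} \Big( \sum_{c=1}^{C} s_c \, s_c^{\top} \Big) H^{-1},
\qquad
s_c = \sum_{i \in c} \frac{\partial \log L_i(\hat\theta)}{\partial \theta}
\]
```

Here H is the information matrix summed over all observations, L_i is the likelihood contribution of observation i, and s_c is the sum of the score contributions within cluster c. The scores are summed within clusters before the outer product is taken, which is how within-cluster dependence enters the standard errors.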
 Linda K. Muthen posted on Tuesday, June 11, 2013 - 11:39 am
Yes. You should see the Mplus Version 7.1 Language Addendum on the website with the user's guide. We have automated testing for measurement invariance.
 Geehong Hyun posted on Wednesday, June 19, 2013 - 7:50 am
Dear Dr. Linda Muthen,

I have read the paper "Complex sample data in structural equation modeling" (Sociological Methodology) as you suggested, but I am still wondering how the inverse selection probabilities W_ijkl in eq. (18) are calculated. I think the paper does not mention it.
 Bengt O. Muthen posted on Wednesday, June 19, 2013 - 2:28 pm
Take a look at

Asparouhov, T. (2005). Sampling weights in latent variable modeling. Structural Equation Modeling, 12, 411-434.

and other complex survey papers on our website.
 Cecily Na posted on Monday, September 23, 2013 - 8:33 am
Dear Professor,
I have a data set with strata (cities), and within each stratum (city) simple random sampling is used. Is this stratified sampling or can it be considered cluster sampling? Can I still use type=complex and MLR? If not, what type and estimator should I use to take into account the non-independence?
Thanks a lot!
 Cecily Na posted on Monday, September 23, 2013 - 10:08 am
Hello Professors,
I want to follow up with my last question. I would like to know if type=complex and MLR can also apply to stratified sampling design.
thanks.
 Bengt O. Muthen posted on Monday, September 23, 2013 - 3:46 pm
You can use type = complex and MLR for a stratified sampling design, that is, when this is the only complex survey feature. How many cities do you have?
 Cecily Na posted on Monday, September 23, 2013 - 4:15 pm
Hi Bengt
About 30 cities. Actually, these cities are all the primary units of the population.
Should I use the commands
stratification = city,
type=complex, and
Estimator = MLR
Is it right?

Thank you very much.
 Bengt O. Muthen posted on Monday, September 23, 2013 - 5:30 pm
Yes.
 Cecily Na posted on Monday, September 23, 2013 - 5:52 pm
Hi Bengt,
Thank you. In the above example, by theory, how would the results be different if cluster=city is specified, instead of stratification = city?
I think type=complex and estimator = MLR together address the non-independence of sampling. How do they work differently for strata vs. clusters?
Thanks a lot!
 Tihomir Asparouhov posted on Tuesday, September 24, 2013 - 9:07 am
Cluster=city increases the standard errors and stratification = city decreases the standard errors.

If the population you are studying has 30 cities you have to use stratification = city. If the population has many more cities and only 30 were sampled (at random or another method) then you have to use cluster=city.
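
The two cases described above can be sketched in one input, with the alternative commented out. The file name and the variable names other than city are hypothetical.

```mplus
TITLE:    Sketch: all 30 cities in the population are in the sample
DATA:     FILE = cities.dat;        ! hypothetical file name
VARIABLE: NAMES = city y x1 x2;     ! hypothetical variable names
          USEVARIABLES = y x1 x2;
          STRATIFICATION = city;    ! cities are strata
!         CLUSTER = city;           ! use instead if the 30 cities were
!                                   ! sampled from a larger set of cities
ANALYSIS: TYPE = COMPLEX;
          ESTIMATOR = MLR;          ! default with continuous outcomes
MODEL:    y ON x1 x2;
```

Only one of the two design statements should be active for the city variable, depending on how the cities entered the sample.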
 Cecily Na posted on Wednesday, September 25, 2013 - 6:30 pm
Thank you, Tihomir, for your clarification above. Suppose that the whole population has 30 cities, and they're all included in my sample (so they're strata), how can I use the multilevel modeling approach when Type = twolevel can be only used with cluster = city, not with stratification = city?
Thank you!
 Linda K. Muthen posted on Thursday, September 26, 2013 - 10:16 am
You should use

TYPE = COMPLEX;
STRATIFICATION = city;
 Cecily Na posted on Thursday, September 26, 2013 - 1:21 pm
Hi Linda,
I understand the above syntax. I guess my question is: if I need to incorporate city-level (cities are strata) variables and individual level variables, how can I distinguish city level effects from individual level effects in a model without using type=twolevel?
Thank you very much!
 Linda K. Muthen posted on Friday, September 27, 2013 - 11:07 am
You treat the city variables as individual variables. Your standard errors will be correct.
 una posted on Wednesday, November 06, 2013 - 2:15 am
Dear Prof. Muthen,
Participants in my study were recruited from 12 schools across five provinces. My analysis is cross-lagged modeling using MLR estimator.

1. Can I handle correlation of responses from participants who are nested within the same schools and provinces by clustering on school ID (type is complex; cluster = schoolID) and use dummy variables to represent provincial effects as control variables in the analysis?

2. Or do I have too few clusters to estimate this, as stated in the first answer above ("Type = Complex SEs only work well with at least 20 clusters")?

Thank you very much in advance
 Linda K. Muthen posted on Wednesday, November 06, 2013 - 9:59 am
You have too few units for TYPE=COMPLEX. I would create 11 dummy variables and use them as covariates to control for non-independence of observations due to nesting in schools. I would treat the five provinces as fixed effects and use them as multiple groups.
 una posted on Thursday, November 07, 2013 - 6:23 am
Dear Prof. Muthen,

1. Do you refer to the statement "grouping = provinces" for treating the provinces? Should I use this in combination with "type is twolevel"?

2. With regard to the fact that I have too few units for type = complex, do you have an article to refer to?

Kind regards,
Caroline
 Linda K. Muthen posted on Thursday, November 07, 2013 - 6:30 am
You will have a single-level model with dummy variables as covariates. See work by Joop Hox.
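
A sketch of the dummy-variable approach suggested above, scaled down to a hypothetical 4 schools (so 3 dummies) to keep it short; with 12 schools the same pattern would give 11 dummies. The file name and variable names are assumptions, and the model is a bare two-wave cross-lagged setup for illustration only.

```mplus
DATA:     FILE = schools.dat;             ! hypothetical file name
VARIABLE: NAMES = school x1 y1 x2 y2;     ! hypothetical variable names
          USEVARIABLES = x1 y1 x2 y2 d1 d2 d3;
                                          ! DEFINE variables go last
DEFINE:   d1 = 0;  d2 = 0;  d3 = 0;       ! school 4 is the reference
          IF (school EQ 1) THEN d1 = 1;
          IF (school EQ 2) THEN d2 = 1;
          IF (school EQ 3) THEN d3 = 1;
ANALYSIS: ESTIMATOR = MLR;                ! single-level model
MODEL:    x2 ON x1 y1 d1-d3;
          y2 ON y1 x1 d1-d3;
          x2 WITH y2;
```

No CLUSTER statement and no TYPE = COMPLEX or TWOLEVEL setting is used; school differences are absorbed by the dummy covariates.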
 Natalia Dmitrieva posted on Tuesday, March 10, 2015 - 10:26 am
Dear Drs. Muthen,

I would love to get your advice on the appropriateness of my statistical approach.

I have a dataset with 835 participants; each participant provided data on 45 (continuous) variables. The 835 participants actually come from 7 different studies, which range in sample size from 40 to 346 participants.

Research Question: I am interested in exploring the factor structure of the 45 continuous indicators. I would like to take into account the non-independence due to the fact that participants come from 7 different samples. Thus, I am using "type = complex efa" in my analysis.

1) Would you agree that "type = complex efa" approach is correct, given my question and data? Specifically, I am concerned that 7 clusters would not be sufficient for this type of analysis, and that the number of indicators (i.e., 45) is too high.

2) If the approach I am currently taking is problematic, do you have any thoughts on another approach that might be more appropriate?

Sincerely,
Natalia
 Bengt O. Muthen posted on Tuesday, March 10, 2015 - 12:31 pm
7 studies is not enough for Type=Complex. You can instead use 6 dummy covariates to represent study differences. EFA with covariates can be done using "ESEM" - see our website's left margin under Special Mplus Topics for writings on this.
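
As a rough sketch of what an ESEM with study dummies could look like: the file name, the variable names, the presence of the dummies d1-d6 in the data file, and the choice of 3 factors are all assumptions made for illustration.

```mplus
DATA:     FILE = pooled.dat;        ! hypothetical file name
VARIABLE: NAMES = y1-y45 d1-d6;     ! d1-d6: study dummies, assumed
                                    ! to be in the data file already
ANALYSIS: ESTIMATOR = MLR;
          ROTATION = GEOMIN;
MODEL:    f1-f3 BY y1-y45 (*1);     ! 3 factors: arbitrary choice
          f1-f3 ON d1-d6;           ! study differences in the factors
```

The (*1) label marks f1-f3 as one set of exploratory factors; the number of factors would in practice be varied and compared.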
 Natalia Dmitrieva posted on Tuesday, March 10, 2015 - 2:05 pm
Thank you!
 Diana Chirinos posted on Friday, March 13, 2015 - 10:49 am
Using COMPLEX data, I am running a structural model with a latent variable predicting a categorical outcome (binary).
I have a few questions:
1. What is the default estimator used by Mplus7 for type=complex data and a categorical dependent variable?
2. I am indicating in the input that my dependent variable is categorical, so I imagined that the program would be running a logistic regression but it won't give me odds ratios which makes me think it's running a different regression. Is that the case?
3. After doing some reading, I realized I should probably be using the ESTIMATOR = MLR command. I ran the model with this command and it gives me different estimates and odds ratios, but it does not give me fit indexes.
What is the best estimator to be used in this case?
 Linda K. Muthen posted on Friday, March 13, 2015 - 12:04 pm
The default estimator is WLSMV. This gives probit regression so no odds ratios. MLR gives logistic regression. However, with MLR and categorical outcomes, means, variances, and covariances are not sufficient statistics for model estimation so chi-square and related fit statistics are not available. You can use either estimator.
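
The two choices can be sketched in a single input. All variable names (psu, u, y1-y4) are hypothetical, and the ALGORITHM line is included on the assumption that a factor predicting a categorical outcome under maximum likelihood calls for numerical integration.

```mplus
VARIABLE: NAMES = psu u y1-y4;      ! hypothetical variable names
          USEVARIABLES = u y1-y4;
          CATEGORICAL = u;          ! binary outcome
          CLUSTER = psu;
ANALYSIS: TYPE = COMPLEX;
          ESTIMATOR = MLR;          ! logistic regression
          ALGORITHM = INTEGRATION;  ! remove both lines to get the
                                    ! default WLSMV (probit)
MODEL:    f BY y1-y4;
          u ON f;
```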
 Diana Chirinos posted on Friday, March 13, 2015 - 12:11 pm
Thank you very much for your fast response.
 Vivian Vignoles posted on Wednesday, April 29, 2015 - 10:16 am
Hello,

Sorry for asking a rookie question, but I want to make sure I have this right.

If I use TYPE = COMPLEX and do not specify any weights, then is it the case that all clusters have an equal impact on the analysis, even if there are more observations from some clusters than others?

If not, then what do I need to specify in order to make it so?

Viv.
 Bengt O. Muthen posted on Wednesday, April 29, 2015 - 5:54 pm
Q1. No

Q2. I don't see why you would want to.
 Vivian Vignoles posted on Wednesday, April 29, 2015 - 6:03 pm
Dear Bengt,

I am conducting an RI-EFA on data from many cultural samples. In principle, I would weight all of the samples equally, but some are larger than others by accidents of the data collection.

Some samples are larger than others because our collaborators in those places happened to collect more data, but the size of the samples has no theoretical meaning, and I would not want those samples to disproportionately drive the results of the EFA.

Does this make sense and, if so, what commands do I need to put it into practice?

Many thanks!

Viv.
 Natalie Bohlmann posted on Wednesday, January 13, 2016 - 3:42 pm
I have two-level data and used type = complex to address clustering. It is a simple regression that includes both level 1 and 2 predictors in the model.

One reviewer has questioned the appropriateness of using type = complex when there are level 2 variables in the model stating that we may be losing some important information by collapsing both level 1 and 2 predictors into 1 model and just using a correction for the SE.

The specific comment written is that "Typically when you are including level 2 predictors in a multilevel model, you partition the variance in the outcome (between/within variability). Then you examine how much your Level 2 predictor is explaining the between (or Level 2 variance) in this outcome. When you use Type = complex, you are not partitioning variance (level 1/ 2) and so you cannot model the extent to which the Level 2 predictor is accounting for the classroom-level variance in children’s outcomes."

In our first resubmission I attempted to explain the adjustment. He/she has asked for a citation indicating that you can include a level 2 predictor (e.g. Classroom Quality) in the regression models. Could you point me to a specific citation that would address this reviewer's concern? I am not sure that Muthén & Satorra, 1995 does so adequately for this example.
And/or guide me in how to respond as my first attempt has proven unsuccessful?
 Bengt O. Muthen posted on Wednesday, January 13, 2016 - 6:38 pm
Muthen-Satorra talk about aggregatable and non-aggregatable models, and the paper is to some extent applicable (you can use aggregatable models with Complex but need Twolevel for non-aggregatable models).

I think this is easy for you to handle. Just switch to Twolevel. The reviewer has a point that if you want to fully use the level 2 predictor it should be modeled on Between and predict the level-2 portion of the level-1 variable (namely its random intercept). It's simple to re-analyze - just follow UG ex 9.1.
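
A sketch of the re-analysis along the lines of UG ex 9.1, with hypothetical variable names: y is the child outcome, x a child-level predictor, w the classroom-level predictor, and classrm the classroom identifier.

```mplus
VARIABLE: NAMES = classrm y x w;    ! hypothetical variable names
          USEVARIABLES = y x w;
          WITHIN = x;               ! level-1 predictor
          BETWEEN = w;              ! level-2 predictor
          CLUSTER = classrm;
ANALYSIS: TYPE = TWOLEVEL;
MODEL:    %WITHIN%
          y ON x;
          %BETWEEN%
          y ON w;                   ! w predicts the random intercept of y
```

On %BETWEEN%, y refers to the random intercept, so the slope for w describes how much of the classroom-level variation in the outcome the level-2 predictor accounts for.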
 Natalie Bohlmann posted on Monday, January 25, 2016 - 10:48 am
I re-analyzed the data using type = twolevel. The results are largely consistent in terms of sig/non-sig predictors. My concern is that fit (RMSEA, CFI, TLI) is now terrible whereas it was acceptable before when using type= complex.
Could you comment on why these fit statistics would change so much when resulting model estimates are fairly similar?
 Linda K. Muthen posted on Monday, January 25, 2016 - 2:58 pm
With COMPLEX, you do not model between-level parameters and with TWOLEVEL you do. Your between model may have misspecifications.
 Vivian Vignoles posted on Friday, February 12, 2016 - 4:31 am
Hello,

To follow up on my rookie question a few months ago, what do I need to do in order to weight different sized clusters equally in a Type = complex analysis?

In other words, I want to create a situation where each cluster has an equal impact on the analysis rather than each case having an equal impact on the analysis.

Thanks!
 Tihomir Asparouhov posted on Friday, February 12, 2016 - 8:10 am
If you use type=twolevel that is already done. The mean of a variable is the mean of the cluster means, not the mean of the observations.

If you are using type=complex you can use weights to adjust this effect.
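
One way to read the suggestion above, sketched under assumptions: give every observation a weight inversely proportional to its cluster size, so that the weights within each cluster sum to the same total. The weight variable w is assumed to have been computed outside Mplus as 1 / (number of observations in the cluster) and stored in the data file; the file name, variable names, and EFA factor range are hypothetical.

```mplus
DATA:     FILE = cultures.dat;      ! hypothetical file name
VARIABLE: NAMES = culture y1-y10 w; ! w = 1 / cluster size, precomputed
          USEVARIABLES = y1-y10;
          CLUSTER = culture;
          WEIGHT = w;
ANALYSIS: TYPE = COMPLEX EFA 1 3;   ! 1 to 3 factors, arbitrary range
```

Whether equal cluster weighting is appropriate is a substantive choice; the sketch only shows where such a weight would be declared.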
 Vivian Vignoles posted on Friday, February 12, 2016 - 8:44 am
Thanks - that's extremely useful!

Viv.
 Ok-young, JI posted on Wednesday, January 25, 2017 - 1:35 am
Hello,
I have a question about the difference between type=twolevel and type=twolevel random.

Does type=twolevel mean fixed intercept, fixed slope?
I know that type=twolevel random can have a random intercept and fixed slope, or a random intercept and random slope.

Please let me know the difference between them.
 Linda K. Muthen posted on Wednesday, January 25, 2017 - 6:31 am
TWOLEVEL has a random intercept. See Example 9.1. If you want a random slope and a random intercept, you must use TWOLEVEL RANDOM and the model specification shown in Example 9.2.
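
A sketch of the random-slope specification in the style of Example 9.2, with hypothetical variable names (clus, y, x, w):

```mplus
VARIABLE: NAMES = clus y x w;       ! hypothetical variable names
          USEVARIABLES = y x w;
          WITHIN = x;
          BETWEEN = w;
          CLUSTER = clus;
ANALYSIS: TYPE = TWOLEVEL RANDOM;
MODEL:    %WITHIN%
          s | y ON x;               ! s is the random slope of y on x
          %BETWEEN%
          y s ON w;                 ! random intercept and slope on w
          y WITH s;
```

Replacing the `s | y ON x;` line with `y ON x;`, dropping s from %BETWEEN%, and using TYPE = TWOLEVEL gives the random-intercept, fixed-slope model.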
 Katie Gelman posted on Saturday, April 01, 2017 - 2:32 pm
Hi Drs. Muthen,
I am working on a GMM, using data from 7 schools. Per other comments on the discussion board, I am not using a cluster variable, but instead have created school level dummy variables. I have included these dummy variables in the GMM model to identify classes, by regressing I & S on dummy vars. After finding best fitting number of classes, I have exported a class membership variable to include in multinomial logistic regression models to examine predictors of class membership. Should I also include school dummy variables in these logistic regressions or should I consider school level differences already controlled for in the class outcome? Thanks for your help!
 Bengt O. Muthen posted on Saturday, April 01, 2017 - 4:51 pm
That's a tricky question. It depends on whether you think the school dummies influence only the class membership (1) or also i and s directly (2).

In case (1), you wouldn't include the school dummies in your Step 1. In case (2), I don't think the model is identified in Step 2 if you have the school dummies influence both the latent class variable and i and s directly.

I would either do a 1-step analysis or take approach (1).