How many clusters?

Mplus Discussion > Multilevel Data/Complex Sample >

Anonymous posted on Thursday, July 21, 2005 - 2:58 pm

Related to the number of variables and factors, how many clusters do I need for a 2-level model to be identified? The output says that I probably have more free parameters than clusters. I thought that only some of the parameters belong to the cluster level. The problem is that with ordinal variables there are quite a lot of thresholds to be estimated, which rapidly increases the number of parameters to be estimated ...

To be more explicit: how many clusters do I need to do a 2-level CFA with 32 ordinal variables (4 categories each), 4 within- and 2 between-factors (factors correlated within each level)?

bmuthen posted on Thursday, July 21, 2005 - 3:11 pm

This is an interesting question and I don't know of a paper that addresses it. I can imagine that the estimates and SEs can be decent even when the number of clusters falls below the number of parameters. A Monte Carlo study of this is needed, and you can do it in Mplus. In our simulation studies not focusing on the relation to the number of parameters, the minimum number of clusters seems to be 20 for the ML asymptotics to have a chance to "kick in".
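
A Monte Carlo study of the kind suggested above could be set up along the following lines. This is only a minimal sketch under assumed values: four continuous indicators with one factor per level stand in for the 32 ordinal items of the original question, and all population values, the seed, and the cluster layout (15 clusters of 40 observations) are made up for illustration. The model has 20 free parameters (8 within, 12 between), so the number of clusters falls below the number of parameters.

```mplus
TITLE:
  Monte Carlo sketch: two-level CFA with fewer clusters than parameters
  (all values below are illustrative assumptions)
MONTECARLO:
  NAMES = y1-y4;
  NOBSERVATIONS = 600;     ! must equal 15 clusters x 40 observations
  NREPS = 500;
  SEED = 4533;
  NCSIZES = 1;             ! one unique cluster size
  CSIZES = 15 (40);        ! 15 clusters of size 40
ANALYSIS:
  TYPE = TWOLEVEL;
MODEL POPULATION:
  %WITHIN%
  fw BY y1@1 y2-y4*1;
  fw*1;
  y1-y4*1;                 ! within-level residual variances
  %BETWEEN%
  fb BY y1@1 y2-y4*1;
  fb*0.5;
  y1-y4*0.2;               ! between-level residual variances
  [y1-y4*0];               ! intercepts
MODEL:
  %WITHIN%
  fw BY y1@1 y2-y4*1;
  fw*1;
  y1-y4*1;
  %BETWEEN%
  fb BY y1@1 y2-y4*1;
  fb*0.5;
  y1-y4*0.2;
  [y1-y4*0];
OUTPUT:
  TECH9;                   ! error messages for individual replications
```

For each parameter the Monte Carlo output reports the average estimate, the average standard error, and the 95% coverage across replications. Changing the CSIZES line (with NOBSERVATIONS adjusted to match) varies the number of clusters.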

Anonymous posted on Friday, July 22, 2005 - 12:54 am

The problem is that N=7900 on the individual level but G=89 on the cluster level. So I tend to estimate quite a complex factor model on the within level and a simpler one on the between level. Which parameters would you consider to be "within parameters" and which "between parameters"? Then I could argue that estimating only, say, k parameters on the between level is possible with 89>k clusters ... Another question in this context is why the thresholds for ordinal variables are specified on the between level, since I would expect them to be part of the measurement model and thus "more" part of the within model. At the moment it looks to me as if constraining thresholds reduces the number of parameters on the between level. So, is there a clear distinction between "within parameters" and "between parameters"? I can draw it only between "between factor loadings" and "within factor loadings", but I have difficulties with the threshold parameters mentioned above.

bmuthen posted on Friday, July 22, 2005 - 7:01 am

If you check TECH1, you will find which parameters are listed on the between level. Thresholds are listed on between to be in line with having intercepts and means on the between level (level 2) in multilevel modeling. But such "location" parameters are seldom problematic to estimate and get SEs for. More critical parameters are level 2 (between) variances. But only a Monte Carlo simulation study could really answer this well. How many parameters do you have on within and on between?
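
For reference, TECH1 is requested in the OUTPUT command. A minimal fragment, to be added to an existing input file:

```mplus
OUTPUT:
  TECH1;   ! parameter specification and starting values
```

In the parameter specification part of TECH1, every free parameter carries a number, so the parameters can be counted per level.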

Anonymous posted on Monday, July 25, 2005 - 9:47 am

There are 35 parameters on the within level and 134 parameters in total. I already reduced the number of clusters down to 3, to have only one threshold parameter being estimated for each variable by constraining t2x=t1x+(t21-t11) for all items x. But unfortunately I still get error messages like these:

THE ESTIMATED WITHIN COVARIANCE MATRIX COULD NOT BE INVERTED.
COMPUTATION COULD NOT BE COMPLETED IN ITERATION 1.
CHANGE YOUR MODEL AND/OR STARTING VALUES.

THE ESTIMATED BETWEEN COVARIANCE MATRIX IS NOT POSITIVE DEFINITE.
COMPUTATION COULD NOT BE COMPLETED IN ITERATION 1.

THE ESTIMATED WITHIN COVARIANCE MATRIX COULD NOT BE INVERTED.
COMPUTATION COULD NOT BE COMPLETED IN ITERATION 1.
CHANGE YOUR MODEL AND/OR STARTING VALUES.

SERIOUS PROBLEM IN THE OPTIMIZATION WHEN COMPUTING THE POSTERIOR DISTRIBUTION.
CHANGE YOUR MODEL AND/OR STARTING VALUES.

THE MODEL ESTIMATION DID NOT TERMINATE NORMALLY DUE TO AN ERROR IN THE COMPUTATION.
CHANGE YOUR MODEL AND/OR STARTING VALUES.

I already used starting values for the factor correlations and I'll add starting values for the factor loadings. I hope this helps.

Thank you for your comments.

Anonymous posted on Monday, July 25, 2005 - 9:49 am

"clusters" in the previous posting should be replaced with "categories", so "I reduced the number of categories down to 3".
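
The threshold constraint t2x=t1x+(t21-t11) described in the 9:47 am posting could be written with parameter labels and MODEL CONSTRAINT roughly as follows. This is a sketch only: the poster's actual input is not shown in the thread, and the item names u1-u3 and the labels are hypothetical. It assumes three-category items declared as CATEGORICAL, so that each item has two thresholds.

```mplus
MODEL:
  %BETWEEN%
  [u1$1] (t11);    ! first threshold of item 1
  [u1$2] (t21);    ! second threshold of item 1
  [u2$1] (t12);
  [u2$2] (t22);
  [u3$1] (t13);
  [u3$2] (t23);
MODEL CONSTRAINT:
  ! the distance between the two thresholds is the same for every item
  t22 = t12 + (t21 - t11);
  t23 = t13 + (t21 - t11);
```

Under this setup the second threshold of every item after the first is a function of other parameters, so each of those items contributes only one free threshold.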

Linda K. Muthen posted on Monday, July 25, 2005 - 2:34 pm

Please send your input, data, output and license number to support@statmodel.com. This is not enough information to answer your question.

Marco Haferburg posted on Sunday, November 27, 2005 - 8:43 am

Dear Drs. Muthén,

Mplus recommends reducing the number of parameters when the first-order derivative product matrix is not positive definite, because there are probably more parameters than clusters. Does the relevant number of parameters refer to the total number of parameters (between and within) or just to the parameters on the between level?

Thanks for the clarification.

Linda K. Muthen posted on Sunday, November 27, 2005 - 2:43 pm

The total number of parameters.

Marc posted on Thursday, December 15, 2005 - 10:40 am

Is the recommendation for clusters >= free model parameters also valid for type=complex or only with type=twolevel?

Linda K. Muthen posted on Thursday, December 15, 2005 - 11:13 am

It is valid for both.

Marc posted on Friday, December 16, 2005 - 2:48 pm

Linda, many thanks. Would you please say something about the nature of this rule: is it a necessary condition for model identification (like df >= 0) or is it something else (e.g. avoidance of empirical underidentification, experience, ...)?

Linda K. Muthen posted on Friday, December 16, 2005 - 4:39 pm

It is the same principle as not having more parameters than observations. The number of clusters is the between-level sample size. Violating this may cause problems with your standard errors.

Wu wenfeng posted on Wednesday, December 22, 2010 - 6:55 pm

Dear Mplus team, I just used a syntax that contains the line "xw BY x@1; x@0;". I can't understand what this means. I have checked the Mplus User's Guide but didn't find the answer. Could you please explain it? Thank you!

Linda K. Muthen posted on Thursday, December 23, 2010 - 6:22 am

That syntax is creating a latent variable xw which is equivalent to the observed variable x.
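
Spelled out with comments, the two statements read as follows. The fragment only annotates the line quoted in the question; where it sits in the poster's MODEL command is not known from the thread.

```mplus
MODEL:
  xw BY x@1;   ! x is the only indicator of xw, loading fixed at 1
  x@0;         ! residual variance of x fixed at 0
```

With the loading fixed at 1 and the residual variance fixed at 0, x = xw, which is why the latent variable is equivalent to the observed one.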

Lois Downey posted on Monday, April 15, 2013 - 7:47 am

For complex samples, when the entire sample is being used for an analysis, Mplus indicates the number of clusters. However, when one selects a subpopulation, I don't find that information in the output. Is there some reason for this omission, or is the information really there, and I'm just not seeing it?

Thanks!

Linda K. Muthen posted on Monday, April 15, 2013 - 8:37 am

I am not sure why we don't give this. Send the output and your license number to support@statmodel.com so I can look into it.

Lois Downey posted on Wednesday, October 08, 2014 - 1:36 pm

After sending my output and license number in response to the above request, I received an email indicating that outputting the cluster-size information for a subpopulation analysis had simply been overlooked, but would be added to the next version. However, I just downloaded the new version (7.3), and the information still seems to be missing from the output. Is there a chance that you're still planning to add it to a future version?

Thanks!

Linda K. Muthen posted on Thursday, October 09, 2014 - 2:27 pm

Sorry for taking so long to answer you. I had to see what happened. This change did not get made. I apologize. We will add it in the next version.