davood posted on Friday, October 01, 2004 - 7:32 pm
Dear Dr. Muthen, I am using a large database with cluster and weight variables. My purpose is to perform a multiple group comparison between 4 groups. It's a fairly complex model in which latent variables are regressed on some background variables. My data are ordinal; I treat them as continuous. I also have missing values. I used TYPE=COMPLEX. I have several questions for you: 1- It takes Mplus 23 minutes to run the analysis. Would you please let me know why it takes that long? (Since the data are confidential, I would only be able to provide the input syntax and partial output results.) 2- I have heard you say that as long as the observed dependent variables are not very badly skewed, the analysis should be okay. Could you please elaborate on this issue? 3- Mplus uses the MLR estimator as the default for this TYPE=COMPLEX analysis. I am interested in the chi-square difference test. I have already consulted the documents on the website regarding the chi-square difference test; but, just to clarify, how would I obtain the "regular" chi-square values needed to perform it? 4- In case I should treat my data as categorical, which estimator would you recommend for chi-square difference testing with missing values?
bmuthen posted on Friday, October 01, 2004 - 8:37 pm
1) Perhaps this example has many outcomes which in addition to having a large sample and 4 groups could be time-consuming. Do you have Type = Missing as well? Do you ask for Modification Indices? Is a run without Type=Complex equally time-consuming? And, hopefully you are using version 3.11.
2) With very skewed items, the regular linear models used in standard continuous-variable analysis aren't realistic or dependable. For instance, a Pearson correlation is attenuated.
davood posted on Tuesday, October 05, 2004 - 5:05 pm
Dear Dr. Muthen, Thanks for your time and your valuable comments. Just to let you know what I used in my analysis: 1) I do not use TYPE=MISSING. When I don't ask for modification indices, the run time decreases by just a few minutes. And I upgraded Mplus to version 3.11; the running time decreased by 1 or 2 minutes.
Another question: I re-ran the same model (for simplicity, this time just for 2 groups), specifying the observed dependent variables as categorical and using the WLSMV estimator (TYPE=COMPLEX, with cluster and weight variables). This time, however, I got this error message:
*** ERROR Cluster ID cannot appear in more than one group. Problem with cluster ID: 50
I did not have this error when I treated my observed dependent variables as continuous, however.
Sincerely, Davood p.s. If you need the syntax, please let me know.
You should send the output where you treat the variables as continuous and also the output where you treat the variables as categorical.
davood posted on Thursday, December 09, 2004 - 3:27 pm
Dear Dr. Muthen, I'm doing a multiple group CFA with one factor and categorical indicators, and Mplus generated this warning:
SERIOUS COMPUTATIONAL PROBLEMS OCCURRED IN THE BIVARIATE ESTIMATION OF THE CORRELATION FOR VARIABLES MOMRELS1 AND MOMCOMM1. CHECK YOUR DATA. IF THE PROGRAM RECOVERS FOR THIS PAIR OF VARIABLES (SEE TECHNICAL 6 OUTPUT), THE ESTIMATES ARE VALID. THE PROBLEM OCCURRED FOR THE FOLLOWING OBSERVATION(S): OBSERVATION(S) WITH MOMRELS1=4 AND MOMCOMM1=4.
I checked the TECH6 output. The correlation between MOMCOMM1 and MOMRELS1 took many iterations to estimate. At the end, however, the message was NORMAL TERMINATION FOR ESTIMATING THE CORRELATION FOR MOMRELS1 AND MOMCOMM1.
I did not have this problem when I ran my model for each group separately. Are my results valid then? Thank you, Davood
Each group must have the same values on the categorical observed variables. So if males have values of 1, 2, and 3 on a variable, females cannot have only 1 and 2. This sometimes comes about with listwise deletion. You need to collapse categories until you have the same values in each group.
Dear Dr. Muthen, I'm doing a multiple group analysis with categorical indicators, cluster and weight variables, using the WLSM estimator. I have encountered partial measurement invariance. To investigate it, I free each factor loading and its respective thresholds, while fixing the scale factor to one for one indicator at a time in the nested model. I use the scaled chi-square difference test described on the statmodel homepage to compare the nested model with the less restricted baseline model. The problem is that, after freeing the parameters of three indicators, I obtained a negative scaled chi-square difference value. I appreciate your comments. Regards, Davood
BMuthen posted on Friday, January 21, 2005 - 2:11 am
This can happen. There have been writings by Satorra and Bentler on this for continuous outcomes. The difference testing is an asymptotic procedure and can break down like this.
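For readers following along, the scaled difference test mentioned above can be sketched in a few lines. The chi-square values, scaling correction factors, and dfs below are hypothetical numbers chosen to show how the statistic can come out negative:

```python
def scaled_chisq_diff(t0, c0, d0, t1, c1, d1):
    """Satorra-Bentler scaled chi-square difference test.

    t0, c0, d0: chi-square value, scaling correction factor, and df
                of the nested (more restricted) model
    t1, c1, d1: the same quantities for the less restricted model
    """
    # Scaling correction for the difference test
    cd = (d0 * c0 - d1 * c1) / (d0 - d1)
    # Scaled difference statistic, referred to a chi-square with d0 - d1 df
    trd = (t0 * c0 - t1 * c1) / cd
    return trd, cd

# Hypothetical values where the scaled difference comes out negative,
# as described in the post above
trd, cd = scaled_chisq_diff(t0=126.0, c0=1.00, d0=55,
                            t1=125.0, c1=1.05, d1=50)
print(trd)  # -10.5: the asymptotic procedure has broken down here
```

A negative value arises when the numerator and the scaling correction cd have opposite signs, which the asymptotic theory does not rule out in finite samples.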
I am doing a two-group SEM with continuous indicators using TYPE=COMPLEX (with weight and cluster variables). I use MLM as the estimator. The model is a hybrid SEM: 4 observed background variables predict two latent variables; these, in turn, are used to predict depression, which is a second-order latent variable (20 indicators are used to form 4 latent variables, which in turn form the depression variable). In addition, direct paths from two control variables and from the 4 background variables to the depression variable are included.
Here is the summary of the model that Mplus prints: Number of dependent variables: 32; Number of independent variables: 6; Number of continuous latent variables: 7; Degrees of freedom: 1280; Number of free parameters: 268.
To calculate degrees of freedom, I used the formula from Bollen (1989, p. 361): ½*G*(p+q)(p+q+1) - t, where p: number of dependent variables (32); q: number of independent variables (6); G: number of groups (2); t: number of free (estimated) parameters (268).
After plugging the numbers into Bollen's formula, I get a different number from the Mplus output: df = ½*2*38*39 - 268 = 1214. So the calculated df is less than the df in the Mplus printout. Would you please help us resolve this discrepancy?
Please send your complete output to firstname.lastname@example.org and I will look at it when I return to LA. The formula that you are using does not take into account that df's for covariates are treated differently than for outcomes.
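Bollen's formula itself can be checked in a few lines; this reproduces only the formula from the post above, not the different treatment of covariates that Mplus applies:

```python
def bollen_df(p, q, G, t):
    """Degrees of freedom from Bollen (1989, p. 361):
    (1/2) * G * (p + q) * (p + q + 1) - t
    p: dependent variables, q: independent variables,
    G: groups, t: free parameters."""
    return G * (p + q) * (p + q + 1) // 2 - t

# Values from the post above
print(bollen_df(p=32, q=6, G=2, t=268))  # 1214, versus 1280 reported by Mplus
```

The gap between 1214 and 1280 is exactly what the reply attributes to covariates being handled differently than outcomes in the Mplus df count.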
Sadhana posted on Monday, February 21, 2005 - 8:56 am
For hierarchical cluster analysis of continuous data, which distance and linkage method is appropriate, given the variety of methods available in the literature?
bmuthen posted on Saturday, February 26, 2005 - 11:05 pm
For cluster analysis related to Mplus, please see the mixture modeling (categorical latent variables) references on our web site. For instance, the Applied Latent Class Analysis book shown there has a chapter by Vermunt and Magidson that touches on relationships between classic "k-means" clustering and latent class analysis of continuous outcomes. I would say that latent class analysis and related techniques such as factor mixture modeling could be preferable to traditional clustering techniques.
I have some questions regarding using both categorical and continuous indicators in an SEM model with both cluster and weight variables. 1) Can we use both categorical and continuous indicators in the above analysis? 2) How does Mplus treat continuous indicators when some others are categorical? 3) In this case, what type of estimator is recommended? 4) Are there any complications involved when one uses both categorical and continuous indicators? 5) The ultimate goal is to test for factorial invariance between two groups. Is there any other issue that we should be careful about when using both categorical and continuous indicators to test for measurement invariance?
1) Can we use both categorical and continuous indicators in the above analysis?
2) How does Mplus treat continuous indicators when some others are categorical? 3) In this case, what type of estimator is recommended?
Two estimators are available in this situation -- weighted least squares or maximum likelihood. The continuous indicators are treated as continuous variables, if that is what you mean.
4) Are there any complications involved when one uses both categorical and continuous indicators?
5) The ultimate goal is to test for factorial invariance between two groups. Is there any other issue that we should be careful about when using both categorical and continuous indicators to test for measurement invariance?
For categorical indicators, thresholds and factor loadings must be freed together. You cannot free just a threshold or just a factor loading. And when they are freed, factor means need to be fixed at zero in all groups, and scale factors or residual variances need to be fixed to one in all groups, depending on whether the Delta or Theta parameterization is used. See Examples 5.16 and 5.17 in the Mplus User's Guide and Web Note 4.
1- In TYPE=COMPLEX MISSING H1 with categorical observed variables, how are the missing values treated? 2- What is the best estimator for this analysis? 3- In the analysis, I got WRMR > 2, CFI > .95, and RMSEA < .05. Does this indicate any serious misfit? 4- In the case of categorical observed variables, what is the interpretation of factor loadings? Is it the same as for continuous observed variables?
1. With categorical outcomes, WLSMV is the default, for which a pairwise-present approach is taken.
2. WLSMV is good, unless the MCAR flavor of the pairwise approach needs to be replaced by MAR in which case ML should be used if possible from a computational point of view.
3. CFI is rather reliable and sounds like you are in the ballpark.
4. The loadings are probit slopes when WLSMV is used (logit with ML). But if you only check the significance, sign, and size of standardized loadings, then they can be viewed analogously to continuous indicators.
Frank Davis posted on Sunday, November 06, 2005 - 6:09 am
I plan to use either cluster analysis or LCA for my dissertation. I will have a relatively large data set (N=3,000) with approximately 8 continuous variables and 4 categorical variables. Do you foresee any complications? I guess this answer depends on what kind of data I have. Based upon my limited knowledge, cluster analysis doesn't handle categorical variables; however, LCA handles both categorical and continuous variables. Could you provide me a few good texts or references that would point a novice in the right direction? Thanks so much.
Dr. Muthen, I am looking into using Mplus for testing measurement invariance across groups (e.g. gender, age, race etc.). I have a complex survey design and would like to use a split-sample design for my analyses. Based on the user manual I noticed that the “subpopulation” option was not available for multiple group comparison, meaning that it is not possible to use this option to select half of my sample (based on a dummy differentiating between the two randomly selected sub-samples). Is there any way I can solve this problem given (a) the nature of the sample design, (b) wanting to do a cross-validation and (c) wanting to test for group differences in measurement invariance? Thank you!
I can think of 2 approaches. I think you could do the multiple-group analysis using mixtures with the "knownclass" option in combination with the subpopulation option. But note also that the subpopulation option often makes very little difference.
Dr. Muthen, thank you for your quick reply. Just a short follow-up: 1. When you say that "the subpopulation option often makes very little difference," do you mean that it should not be problematic to use the "useobservations" option instead, even if the analysis involves a complex survey design? I am just concerned that this would be viewed as a source of bias in my analysis. 2. Based on your suggestion, I assume that Mplus will not "complain" that I am doing multiple group analysis with the subpopulation command if groups are defined by the "knownclass" option rather than the "grouping" option in a "regular" CFA, correct? 3. Last (naive) question: what would be the equivalent of comparing factor means between groups in CFA when switching to LCA? Thanks again for all your help!
I am analyzing data from a national data set. The full data set contains information from youth between the ages of 12 and 20. I wish to analyze data from youth between the ages of 15-18 only. Thus, I believe that I use the subpopulation (selecting data from those between the ages of 15 and 18) command to make sure the results are representative. However, I wish to analyze my data as a multigroup by age. The manual indicates that the subpopulation command cannot be used in conjunction with the multigroup. I thought I could make a dataset which removes data from individuals younger than 15 and older than 18. I would then NOT use the subpopulation command, but still include the weight, stratification, and cluster commands. 1) Would the weighting be wrong if I used this technique? I am asking because I read somewhere that subsetting concerns regarding the variance computation are only relevant to survey software using the Taylor Series. I believe that Mplus uses robust variance estimation. If what I read is true I am wondering if it is even necessary to use a subpopulation command when analyzing complex data using Mplus (regardless if doing a multi-group)? 2) If the weighting would be incorrect using the strategy described above (see above #1) then how would I conduct a multigroup only using a subsample of the full data?
I would do two single group analyses, one using the SUBPOPULATION option and one not using it to see if there is a big difference in the results. If not, I would just do the multiple group analysis without the SUBPOPULATION option.
I am using clustered data (twins) and I would like to calculate means for 3 groups of twins and determine whether these means are significantly different from each other. My questions are:
1) As a first step (saturated model) can I just run the data and look at the sample statistics for the means? Or do I need to look at the model results for the means? These means are not the same.
2) Is it okay not to specify any model when I am only interested in mean scores, or is it necessary to specify the correlations like this: y1 with y2 y3 with y4 I tried both approaches, and I get different mean scores in the model results.
3) To compare the group means, can I just constrain the means to equality like this: [y1] (1) in the separate group models, and then compare the BICs of these models with the BIC of the saturated model? Can I use BIC for these purposes?
4) When I want to report group means of y1 that are corrected for covariate x (y1 ON x), should I report the means (i.e. intercepts, using centering) in the model results?
I hope my questions are clear. Thank you very much in advance for your time!
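Regarding question 3, a BIC comparison of the constrained and saturated models works as sketched below. The log-likelihoods, parameter counts, and sample size are made-up illustrative numbers, not from any real run:

```python
import math

def bic(log_likelihood, n_free_params, n):
    """BIC = -2*logL + k*ln(n); the model with the lower value is preferred."""
    return -2.0 * log_likelihood + n_free_params * math.log(n)

# Hypothetical log-likelihoods: saturated model vs. a model with
# means constrained equal across the three twin groups
bic_saturated   = bic(log_likelihood=-1520.4, n_free_params=27, n=300)
bic_constrained = bic(log_likelihood=-1524.1, n_free_params=21, n=300)

# The constrained model loses a little log-likelihood but saves
# 6 parameters; here BIC favors the constrained (equal-means) model
print(bic_constrained < bic_saturated)  # True for these numbers
```

The design choice BIC encodes is a penalty of ln(n) per extra parameter, so equality constraints are favored unless they cost substantial log-likelihood.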
I am using complex survey data to examine a fairly straightforward path model. All variables are dichotomous.
What I'd like to do is an "overall" test of men vs. women to (hopefully) find that the path models are significantly different; that is, that at least some of the path coefficients are significantly different between the two groups. I envision first doing an "overall" test of the men's model vs. the women's model and then, having established that they are different, comparing individual path coefficients to find which are different.
My question is this: much of what I've been reading says that you must have measurement invariance. But, as I understand it, this would only apply to a model with a latent variable, right?
Would you be able to point me in the right direction for some assistance as to how to do these 2 comparisons? I'm stuck as to what commands to use....
Further, if we did (in the future) add a latent variable to the model, how would I need to change the procedure? I anticipate that, at this point, we would test for measurement invariance across gender first....
You can do an analysis with all paths held equal across the groups using the parameter equality constraint feature described in the UG. Then you can see which paths are not invariant by looking at the modification indices. Yes, measurement invariance can typically be studied only with multiple indicators, in which case the measurement part should be tested for invariance first.
I do chi-square difference testing while fixing the means of one variable to equality across the two groups each time. I then get an MLR chi-square with 1 df, which I divide by the scaling correction factor to get the right chi-square. If this chi-square is significantly different from zero, I conclude that the means were different. FYI, the variables are not normally distributed. My first question is: is this approach correct?
When I use square-root-transformed data in order to obtain normality, I get somewhat different results. How is that possible? I thought normality was not needed when using MLR. Which results should I trust?
Yes, you can test mean differences in the way you describe.
When you transform a variable by taking the square root of the variable, you change its relationships with other variables. This is why you get different results. For example, if linearity was approximately correct for your regression, it will not be after transformation. It is not the same as dividing a variable by a constant where all relationships remain the same.
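The point about transformations can be demonstrated numerically. The toy data below are hypothetical: y is an exact linear function of x, so dividing y by a constant leaves the correlation untouched, while taking the square root bends the relationship and attenuates it:

```python
def pearson(xs, ys):
    """Plain Pearson correlation, no external libraries."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Toy data with an exactly linear relationship: y = 2 + 3x
xs = [i / 10 for i in range(1, 101)]
ys = [2 + 3 * x for x in xs]

r_raw    = pearson(xs, ys)                      # ~1: perfectly linear
r_scaled = pearson(xs, [y / 5 for y in ys])     # unchanged: dividing by a
                                                # constant preserves relations
r_sqrt   = pearson(xs, [y ** 0.5 for y in ys])  # strictly smaller: the square
                                                # root makes the relation concave
print(r_raw, r_scaled, r_sqrt)
```

This is exactly the distinction in the reply above: a linear rescaling is harmless, a nonlinear transformation changes the relationships being modeled.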
I am sorry not to put my concerns into one message, but I have a follow-up question related to my first post yesterday at 6.22am: since I perform multiple comparisons (24 in total), do I have to control for Type I error? If yes, how can I do that? Thanks, Sylvana
Dr Muthen, I have a multiple group GMM (cg=2, c=3). Can I use LL difference testing to test if my 3 intercepts are equal between the 2 cg's? (test with 3 df) I suppose in that case I need to use the scaling correction factor?
Or do I need to compare the nested models with BIC?
I am trying to run a two-level multiple group path model (similar to Example 9.11 in the Mplus User's Guide, but a path model rather than a CFA). The cluster is countries, and the grouping variable is religious affiliation. I'd like to test whether the hypothesized direct and indirect effects apply to different religious groups when taking into account the variation at the country level. I keep getting the error message: Cluster ID cannot appear in more than one group. Problem with cluster ID: 2504. What do you think could be the problem here?
Multiple group analysis assumes each group contains independent observations. In multilevel modeling, this means the grouping variable should be a between-level variable. If a within-level variable is used as a grouping variable, observations from the same cluster may be in both groups, violating this assumption.
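Before fitting such a model, the offending clusters can be located with a short check. The function and records below are illustrative, not Mplus output; the group labels are hypothetical:

```python
from collections import defaultdict

def clusters_in_multiple_groups(cluster_ids, group_ids):
    """Return cluster IDs that appear in more than one group --
    exactly the condition behind the Mplus error above."""
    groups_per_cluster = defaultdict(set)
    for cid, gid in zip(cluster_ids, group_ids):
        groups_per_cluster[cid].add(gid)
    return sorted(c for c, g in groups_per_cluster.items() if len(g) > 1)

# Hypothetical records: cluster (country) 2504 contains respondents
# from two different religious groups, so it would trigger the error
clusters = [2504, 2504, 2504, 3101, 3101]
religion = ["a",  "b",  "a",  "a",  "a"]
print(clusters_in_multiple_groups(clusters, religion))  # [2504]
```

Any cluster ID returned by this check is one whose members span groups, i.e. the grouping variable varies within the cluster.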
I'm testing a relatively simple model for 2 groups: males and females. So I use gender as a grouping variable. I use TYPE=COMPLEX to specify that they're clustered in schools. The dependent variable is continuous.
The hypothesis is that there's no difference in coefficients between males and females. So I compared a model with no constraints and a model with only a different intercept for each group, everything else constrained. 1. Am I correct to think that the chi-square test is a formal test for all constraints at once? 2. Can I test every constraint separately? I would like to know, for every coefficient, whether it's significantly different between boys and girls. I could do this by comparing the full (no constraints) model with all possible models with one constraint. But could I also do this with a single command in Mplus?
1. Yes. 2. You can do one at a time. There is no automatic way to do this. You can ask for modification indices when you test all at the same time to see which coefficients have the largest modification indices.
sam posted on Wednesday, September 07, 2011 - 9:32 pm
I have two models from two different scenarios (i.e., one scenario induces good intention and another scenario induces negative intention). Each model has five variables. The variables are the same, except for one variable. That is, the type of consequences is different for each of the two models. The paths of these two models are exactly the same. My questions are:
(1) Can I compare these models? I am interested in testing the effect of intention on the responses.
(2) Do I need to transform one of the variables in one model that has problems with skewness and kurtosis? If I transform this variable, will I still be able to compare the models?
1. No, not statistically unless you have exactly the same set of variables, but perhaps substantively.
2. No, use MLR which is robust to non-normality.
sam posted on Thursday, September 08, 2011 - 6:58 pm
Thank you very much for the quick response. For the second question, all of the variables have skewness between 0.1 and 1.5 and kurtosis between 0.5 and 1.5, except for one latent variable whose skewness is above 1.5 and kurtosis is about 2. When I ran the model, the fit indices were inconclusive. SRMR is 0.058 and RMSEA is 0.051, which suggest that the model has a good fit. However, CFI is only 0.923, below the conventional cutoff value of 0.95. Can I conclude that the mixed results are because one variable is too skewed and too high in kurtosis? Is it appropriate to transform that one variable? Thank you very much.
Dear Dr. Muthén, I am running a multiple group analysis (grouping is male vs. female) and was using the TYPE=COMPLEX command to take into account that we have MZ and DZ twin pairs in our data. I got an error message (see below), and I have two thoughts about it. We also have single twins (only one member of a pair) in our dataset, so there are some clusters with n=1 observations. Is that a problem? And second: is there a problem for the correction algorithm when there are a lot of clusters (177) with only a few (in our case 2) observations?
THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES MAY NOT BE TRUSTWORTHY FOR SOME PARAMETERS DUE TO A NON-POSITIVE DEFINITE FIRST-ORDER DERIVATIVE PRODUCT MATRIX. THIS MAY BE DUE TO THE STARTING VALUES BUT MAY ALSO BE AN INDICATION OF MODEL NONIDENTIFICATION. THE CONDITION NUMBER IS -0.422D-17. PROBLEM INVOLVING PARAMETER 107.
THIS IS MOST LIKELY DUE TO HAVING MORE PARAMETERS THAN THE NUMBER OF CLUSTERS MINUS THE NUMBER OF STRATA WITH MORE THAN ONE CLUSTER.
Having one or two members of a cluster is not a problem.
The message says you have more parameters than independent pieces of information in your data. That is, you have more parameters than the number of clusters minus the number of strata with more than one cluster. The impact of this on your results is not known. You would need to do a Monte Carlo study for your situation to see this.
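As a rough pre-check, the condition stated in the warning can be sketched as follows. The cluster, strata, and parameter counts below are hypothetical, chosen to mirror the twin-data situation in the question above:

```python
def enough_clusters(n_params, n_clusters, n_strata_with_multiple_clusters):
    """Check the condition cited in the Mplus warning: the number of free
    parameters should not exceed the number of clusters minus the number
    of strata containing more than one cluster."""
    independent_pieces = n_clusters - n_strata_with_multiple_clusters
    return n_params <= independent_pieces

# Hypothetical design: 177 twin-pair clusters, no stratification,
# and a multiple-group model with 220 free parameters
print(enough_clusters(n_params=220, n_clusters=177,
                      n_strata_with_multiple_clusters=0))  # False: expect the warning
```

When this check fails, the warning above is expected; as the reply notes, its practical impact on the estimates would need to be assessed with a Monte Carlo study.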
I'm running a measurement invariance analysis with categorical data obtained from players clustered within teams. I want to test invariance across two age groups (younger/older than 12), and I'm not sure how to handle the fact that some teams have only younger players, other teams have only older players, and other teams have both. Could you suggest a proper way to deal with these data? Or maybe recommend something similar to your Web Note 16 addressing suitable models for measurement invariance in this case?
Thanks for your answer. When I run the models described in Web Note 16, I'm not able to see how to conclude whether item factor loadings/thresholds are invariant between age groups. Could you elaborate a bit more on this point or suggest a more detailed reading, please?
I have tested measurement invariance between groups (5 ability groups). The model is a second-order CFA. Because students are nested in classes, I controlled for the cluster information (class) with TYPE=COMPLEX. I got the following error message:
THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES MAY NOT BE TRUSTWORTHY FOR SOME PARAMETERS DUE TO A NON-POSITIVE DEFINITE FIRST-ORDER DERIVATIVE PRODUCT MATRIX. THIS MAY BE DUE TO THE STARTING VALUES BUT MAY ALSO BE AN INDICATION OF MODEL NONIDENTIFICATION. THE CONDITION NUMBER IS -0.303D-15. PROBLEM INVOLVING THE FOLLOWING PARAMETER: Parameter 49, Group 1: K WITH M
THIS IS MOST LIKELY DUE TO HAVING MORE PARAMETERS THAN THE NUMBER OF CLUSTERS MINUS THE NUMBER OF STRATA WITH MORE THAN ONE CLUSTER.
The problem is that I couldn't discover any problem with parameter 49. I did the same analysis with TYPE=GENERAL and no error message appeared. Can I trust the results with TYPE=COMPLEX?
With clustered data, independence of observations is at the cluster level. You must have 49 clusters. The message tells you that your model has more parameters than it has clusters. The effect of this on model results has not been well-studied. This is a warning.