I'm currently doing a psychometric study that includes measures at the classroom level nested within schools. Across the 3 years of data, the multilevel CFAs all converged(!!!) and fit well. I'm now working on establishing invariance, following the Vandenberg and Lance (2000) review and a few other papers. My problem concerns the first step: testing whether the covariance matrices in the three years are different. It is not clear to me how to set up a model to test this; I have tried a few ideas, which didn't seem to work. It seems like this should be very easy, so I'm hoping you can enlighten me. Thank you very much.
No, TYPE=TWOLEVEL RANDOM MISSING; with the GROUPING option.
Anonymous posted on Friday, January 23, 2004 - 11:42 am
Is there a substantive meaning to restricting the Level-2 endogenous variable covariances to be equal across groups in a multilevel, multigroup SEM ?
In my particular application, I find that several of the Level-1 and Level-2 covariances are nearly identical across groups, and I'm hoping to restrict them so as to regain df's.
bmuthen posted on Saturday, January 24, 2004 - 9:27 am
Although all parameters carry substantive meaning, endogenous covariances refer to covariances between residuals in the regression equations, and typically one would not have hypotheses about these unexplained parts of the variation in the dependent variables. So my own feeling is that gaining df's this way is not recommended.
Anonymous posted on Sunday, January 25, 2004 - 10:19 am
As a follow-up: is it the case that testing restricted Level-2 covariances would ***NOT NECESSARILY*** be a test of whether both groups' Level-2 s are missing the same Level-2 predictor variables ?
bmuthen posted on Sunday, January 25, 2004 - 1:44 pm
Anonymous posted on Monday, February 16, 2004 - 2:27 pm
I have a couple of questions regarding performing multigroup multilevel models in Mplus.
1. Is there a certain estimated reliability value below which you recommend not using multilevel modeling (for example, .5, .4, etc.) ?
2. I seem to recall reading in Muthén 1997 that ML estimation may be complicated or untrustworthy in situations where the number of Level-2 units is small. Is estimation problematic when the number of Level-2 units is sizeable, but the number of cases per unit is small (<5, say)?
3. When Level-2 units contribute cases to members of all groups in a multigroup, multilevel model, does Mplus assume the error variances are homogenous within the original Level-2 units, or does it treat each Level-2 unit / group combination as a separate population ?
bmuthen posted on Monday, February 16, 2004 - 4:29 pm
1. No. Lower reliabilities simply give less power to detect relationships.
2. This does not necessarily lead to problems. The number of level 1 units required is related to the number of within-level parameters. In particular, you need several units per cluster in order to estimate within-level variance parameters. A Monte Carlo study in Mplus can tell you more specifics.
3. Within each group, the level 2 variances are assumed equal across level 2 units.
I was hoping someone could help me. I have a paper due in my social research class, and I have to make up a survey (without actually doing it) using regression models etc. Does anyone know what I'm talking about? Remember the framework for your 3rd part: 1. Quantify your variables (really well) 2. Path analysis and direction of causality 3. Elaboration model and ideas for regression 4. Control issues
I have a problem concerning doing multigroup multilevel analysis. I want to specify some paths (between level) specific only to boys or girls. But I get an error message.
TITLE: PROOVIME;
DATA: FILE IS nagudega.dat;
VARIABLE: NAMES ARE ID GENDER SELF PEER VICT REJ EXTERN INTERN ADAPT VAEN SUMMA HOSTIL FRE ENE NEU;
  USEVARIABLES ARE PEER SELF HOSTIL ENE EXTERN INTERN;
  GROUPING = GENDER (1=MALE 2=FEMALE);
  CLUSTER IS ID;
  BETWEEN ARE PEER SELF EXTERN INTERN;
  WITHIN IS ENE;
ANALYSIS: TYPE = TWOLEVEL;
MODEL:
  %BETWEEN%
  HOSTIL ON EXTERN INTERN;
  %WITHIN%
  HOSTIL ON ENE;
MODEL MALE:
  %BETWEEN%
  HOSTIL ON SELF;
MODEL FEMALE:
  %BETWEEN%
  HOSTIL ON PEER;
OUTPUT: SAMPSTAT STANDARDIZED RES MOD (0.00);
*** WARNING in Model command
  Variable is uncorrelated with all other variables: PEER
*** WARNING in Model command
  Variable is uncorrelated with all other variables: SELF
*** WARNING in Model command
  At least one variable is uncorrelated with all other variables in the model.
  Check that this is what is intended.
*** ERROR
  The following MODEL statements are ignored:
  * Statements in the BETWEEN level of Group MALE:
    HOSTIL ON SELF
  * Statements in the BETWEEN level of Group FEMALE:
    HOSTIL ON PEER
In multiple group analysis, the overall MODEL command must contain the most general model. The group-specific MODEL commands show differences between the overall model and the model for each group. You would need to include all ON statements in the overall MODEL command and fix the ones you don't want in each group to zero.
Anonymous posted on Monday, June 07, 2004 - 12:28 pm
Assuming that I don't have the original data, how can I tell Mplus to read one or two covariance matrices for a TWOLEVEL model?
Dr. Muthen, I want to do a multiple group (male and female) multilevel path analysis. What is wrong with the following command? I want to compare the path coefficients between the male and female groups. I do not care whether or not the two groups' coefficients are equal; I just want to make a comparison. Thanks, boliang
following is the syntax:
USEVARIABLES ARE bul1 newac ctad class sex;
MISSING IS *;
BETWEEN = ctad;
Grouping = sex (0 = g1 1 = g2);
CLUSTER IS class;
ANALYSIS: TYPE = TWOLEVEL RANDOM Missing H1;
MODEL:
  %WITHIN%
  s | newac ON bul1;
  bul1 ON newac@0;
You cannot define a random effect in the group-specific MODEL commands. Use the syntax below, which will give each group the model you want. The overall MODEL command is the starting point for each group's model.
I am going to be attempting a multigroup multilevel path analysis for the first time, so I have read this discussion with interest. I have one specific question, however. Is it possible to use a grouping variable that is a level-2 variable instead of a level-1 variable? My data include 400+ children clustered in ~90 census block groups. I'd like to use neighborhood impoverishment (high vs. low/moderate) as the grouping variable. Is it okay to do this?
In multiple group analysis, a group should contain independent observations. So people from a cluster should be in the same group. With continuous outcomes, Mplus adjusts for this. With categorical outcomes, it does not.
Anonymous posted on Monday, June 13, 2005 - 9:01 am
I have multiple cohorts. Does it matter whether I specify them using the GROUPING or the COHORT option? I guess I should use the COHORT option, but what would be a scenario in which the GROUPING option would be used instead?
The COHORT option is used in conjunction with the TIMEMEASURES option. Together, the two options create new variables (based on the dates in the COHORT option and the years specified in TIMEMEASURES) that are used in the analysis. This creates a pattern of missingness for the observations of each cohort.
The GROUPING option is used when you want to analyze models where observations with a certain group membership are to be kept together.
Anonymous posted on Monday, June 13, 2005 - 10:52 am
So, if I want to conduct and compare analysis on two cohorts, I should use the grouping option?
Not if your data are strung out. Just treat it as regular data then. If your data are not strung out, then you use the GROUPING option. The setups for this are shown in the Day 2 short course handout.
Anonymous posted on Tuesday, June 14, 2005 - 9:34 am
I'm sorry. I see where to order the five day handout on the website but not the two day. Can you direct me? Thank you.
bmuthen posted on Wednesday, June 15, 2005 - 7:45 am
"Day 2" is the second day of the 5-day short course.
Anonymous posted on Wednesday, June 22, 2005 - 9:23 am
I am a novice, attempting to run a multilevel model with random slopes. I have students clustered in classrooms. I also have two groups (3rd&4th graders combined because some of them had the same teacher although they were in different grades, and 6th graders). It’s a simple path model, but I have some incomplete data. The way I have started is with a multilevel multigroup analysis.
Here are the input instructions:
USEVARIABLES ARE catchma holdma tcsex w4mint tid w4tmaint grade;
MISSING = all (-99);
WITHIN = tcsex w4mint;
BETWEEN = catchma holdma;
CLUSTER = tid;
GROUPING = grade (0=young 1=old);
CENTERING = GRANDMEAN (ALL);
ANALYSIS: TYPE = TWOLEVEL RANDOM missing;
MODEL:
  %WITHIN%
  s1 | w4tmaint on tcsex*0;
  s2 | w4tmaint on w4mint*.5;
  %BETWEEN%
  w4tmaint s2 ON catchma holdma*0;
When I ran this model, the estimated between covariance matrix for the younger group was not positive definite. Then I fixed each of the slope variances to 0, and it ran ok.
I have a few questions:
1. Having fixed the slope variances to zero, it is unclear to me why I was still able to use between-level variables to predict variability in slopes (i.e., what variability was there to predict)? Is it that, by setting the slope variances to zero, I have only set the unaccounted-for (residual) variance to zero? If this is true (only residual variance was set to zero), then does setting the residual variance to zero affect the standard errors?
2. Second, my intraclass correlation for the 3rd&4th graders is low (.04). Could this have caused the problem?
Also, I’m wondering how it is best to write up results from this type of analysis, and it appears that there is a manuscript written by Muthén and others that has a similar analysis. Would it be possible to be sent a copy of the paper below?
Anonymous posted on Monday, June 27, 2005 - 7:38 am
In an attempt to compare path-coefficients of two groups I conducted a multigroup analysis. The sample sizes of the two groups are unequal (Na = 3.853 vs Nb = 440). The output says: NO CONVERGENCE. NUMBER OF ITERATIONS EXCEEDED. Could this be due to the unequal sample sizes? And if so, what would be a wise strategy?
It should not be due to that. See the user's guide for suggestions about non-convergence. If that does not help, send the output, data, and your license number to firstname.lastname@example.org. I would also get the model to converge for each group separately as a first step.
Anonymous posted on Tuesday, June 28, 2005 - 12:31 pm
I am conducting a multi-group cfa with ordered binary, complex sample data. Each group is contained in a separate data file. I have set up the default invariance model (which constrains thresholds and loadings equal across groups), and am getting the error:
"CLUSTER option cannot be used with multiple data files"
Does this mean that I cannot take into account the complex nature of data and test invariance at the same time? Is there a way around this (e.g., merging the two data files and distinguishing groups with a grouping variable)?
I am running a multi-group model with TYPE = COMPLEX. I have three groups 0, 1, and 2...the model has latent variables.
X ON Y; M1 M2 ON Y; M1 M2 ON X;
When I run the model I get the results I expect for the relationship X ON Y, in that the relationship is not significant in group 0 and is very significant in group 2. To test for structural invariance, I restrict the X ON Y path to be equal for groups 0 and 2.
Such that I have:
X ON Y (1); M1 M2 ON Y; M1 M2 ON X;
MODEL 1: X ON Y;
However the chi-square diff. at 1 d.f. is not even remotely significant. (I used the scale correction factor as it is MLR)
That seems a bit odd to me, as in one group X ON Y is not significant versus very significant in the other. This leads me to believe that maybe I am modeling it incorrectly. Any thoughts?
Looks like you set it up right. Sounds like you have a case where the group 0 estimate (a, say) is not necessarily close to 0 but has a large enough SE to make it insignificant, while the group 2 estimate (b, say) is a bit further away from zero and significantly so, but b-a is not that large relative to its SE.
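To put numbers on the point above, here is a rough sketch. The estimates and standard errors are made up (they are not from the poster's output), and the calculation assumes the two group estimates are independent, so the variance of b-a is simply the sum of the two squared SEs. It only illustrates the logic; it does not replace the chi-square difference test or MODEL TEST.

```python
import math


def two_sided_p(z):
    """Two-sided p-value for a standard normal z statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def wald_z_difference(est_a, se_a, est_b, se_b):
    """z statistic for (b - a), assuming the two estimates are independent."""
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    return (est_b - est_a) / se_diff


# Hypothetical values for illustration only.
a, se_a = 0.20, 0.15   # group 0: not close to zero, but a large SE
b, se_b = 0.45, 0.10   # group 2: further from zero, smaller SE

p_a = two_sided_p(a / se_a)
p_b = two_sided_p(b / se_b)
p_diff = two_sided_p(wald_z_difference(a, se_a, b, se_b))

print(f"group 0 p = {p_a:.3f}, group 2 p = {p_b:.6f}, difference p = {p_diff:.3f}")
```

With these made-up values, the group 0 path is not significant at the .05 level, the group 2 path is, and the difference between them is not.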
You can also try doing it via Model Test, which uses the Wald test and therefore automatically takes non-normality into account (see UG on how to do this).
Whoops, sorry, that was kind of obvious, wasn't it? Just to clarify: on page 488 of the User's Guide, in the sentence starting "In the MODEL CONSTRAINT command, the factor loading for y3 is constrained to be....", should that be y4, as p4=2*p2?
That being the case, if I want to constrain the structural path Y on X to be equal across two groups, do I use:
MODEL: Y on X (P1);
MODEL TEST; P1 = 1; (or should this be a different number)
And when I have three groups and only want to constrain Y ON X for group 1 and 3 do I use:
MODEL: Y on X (P1);
MODEL TEST; P1 = 1;
MODEL 2: Y ON X; !allowing this parameter to vary for group 2?
If you have three groups and you want to use the Wald test to test a model where a regression coefficient is free across three groups versus a model where the parameters are constrained to be equal across two groups, one way to specify it is the following:
MODEL: y ON x; MODEL g1: y ON x (p1); MODEL g3: y ON x (p3);
MODEL TEST: p1 = p3;
Note that MODEL TEST has the most restrictive model.
I am having some difficulty adding a multiple group analysis to a TWOLEVEL latent curve model that I am working on. I am very interested in how a between-level variable (whether the husband in a couple is an alcoholic) may or may not modify the effect of divorce on drinking behavior in both the husband and his wife. I don't think I can use the between-level variable (alcdad) to examine this within-level association (drink on divorce). When I try to use alcdad as a grouping variable instead, I get this error message: "ALGORITHM = INTEGRATION is not available for multiple group analysis. Try using the KNOWNCLASS option for TYPE = MIXTURE."
Here is my syntax
TITLE: rsa model using alcdad as grouping var; DATA: FILE IS e:/marriage/noalign.RAW; TYPE is INDIVIDUAL;
You need to use the KNOWNCLASS option instead of the GROUPING option. See Example 7.21 in the Mplus User's Guide to see how this is used. In your case, you would have only the KNOWNCLASS variable in the CLASSES statement. If you have further questions about this, please send them along with your license number to email@example.com.
I am doing multilevel modeling and am looking whether some associations are moderated by gender. I find that some of the variables do not have a between-level variance for girls. So, some of the between-level associations are set to be 0 for girls...Should I still compare the models where I constrain all the paths to be equal across the genders to the model where the paths are freely estimated (even when I know that for boys, there is a significant variance for some of the variables, while for girls, there is not)?
One more question: I am comparing nested models by doing chi-square difference tests. I am comparing the model where I constrain all the paths to be equal across genders and the model where I free one of the paths at a time. If the difference test is significant, am I correct to conclude that I should retain the freely estimated path? And, when I am using the MLR estimator, can I just multiply the chi-square value by the scaling correction factor and then conduct the chi-square difference test?
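For readers following along, here is a minimal sketch of the scaled chi-square difference calculation as I understand it from the Mplus website's how-to on difference testing with MLR. The function name and all the numbers are hypothetical. Note that the difference between two scaled chi-squares is not itself used directly; a difference-test scaling correction is computed first.

```python
def scaled_chisq_difference(t0, c0, d0, t1, c1, d1):
    """Scaled chi-square difference test for nested models.

    t0, c0, d0: scaled chi-square, scaling correction factor, and df
                of the nested (more restrictive) model.
    t1, c1, d1: the same quantities for the comparison (less restrictive) model.
    Returns (difference test statistic, df of the difference test).
    """
    df_diff = d0 - d1
    # Difference-test scaling correction.
    cd = (d0 * c0 - d1 * c1) / df_diff
    # Rescale each chi-square by its own factor before differencing.
    trd = (t0 * c0 - t1 * c1) / cd
    return trd, df_diff


# Hypothetical output values: constrained model vs. model with one path freed.
trd, df_diff = scaled_chisq_difference(t0=50.0, c0=1.20, d0=26,
                                       t1=40.0, c1=1.15, d1=25)
print(f"TRd = {trd:.3f} on {df_diff} df")

# For 1 df, the .05 critical value of the chi-square distribution is 3.841.
print("significant at .05" if trd > 3.841 else "not significant at .05")
```

If the test is significant, the constrained model fits worse, which would argue for keeping the path free across groups.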
Is it possible to do a Wald test of whether or not a coefficient is lower than or equal to (LE) zero in one group? Or whether one coefficient is LE the same coefficient in another group? (I have unsuccessfully tried different combinations of MODEL CONSTRAINT/MODEL TEST.) Thanks.
Logical operators cannot be used in MODEL CONSTRAINT or MODEL TEST. Only arithmetic operators can be used. Please send your input, data, output, and license number to firstname.lastname@example.org so I can see what you are doing that is not working.
On page 486 under MODEL CONSTRAINT, it says: "Linear and non-linear constraints can be defined using the equal sign (=), the greater than sign (>), the less than sign (<), and all arithmetic operators and all functions that are available in the DEFINE command with the exception of the absolute value function."
I don't see where it says that logical operators can be used. If you look at DEFINE starting on page 409, you will see that there is a distinction made between logical operators, arithmetic operators, and functions.
Well, I tried with the < sign, by doing: Model Constraint: New (c); 0<exp(c)+p3-p2; Now, I hoped that making another run without the constraint, 2*difference Loglikelihood I could get a chi2 with 1 degree of freedom, testing p3<p2. Thanks.
I'm investigating measurement invariance across two groups in a multi-level model with a complex sample (TYPE = COMPLEX TWOLEVEL).
When I use the scaling correction factor with the likelihood difference test - invariance is rejected (the metric invariant model fits significantly worse than a configural invariance model), but when I use the scaling correction factor with the chi-square difference test - invariance holds (the metric invariant model does not fit significantly worse than the configural invariance model).
My question is whether one of these difference tests is preferred in this modeling situation? Also, I'm curious as to why they might be different.
I just looked at a COMPLEX TWOLEVEL output and I don't see that you get a chi-square fit test. Please send your output and license number to email@example.com so I can see what you are referring to.
I am doing invariance testing for a multiple groups latent growth model with a quadratic term estimated using MLR. I am interested in whether there are significant group differences in covariances, intercepts, and residual variances. In addition to my baseline model...
cluster IS momid; grouping IS female (0 = male 1 = female);
Thank you very much for your reply. A follow up question, if you don't mind. I am running a series of constrained models so that I may determine whether or not males and females differ significantly on their slopes, variances, effects of covariates, etc. Three of the 21 constrained models that I have run do not converge.
first one -
i s q | y1@0 y2@2 y3@4 y4@6 y5@8;
i ON x1 x2 x3 x4;
s ON x1 x2 x3 x4;
q ON x1 x2 x3 x4;
s ON x1 (1);
second one -
i s q | y1@0 y2@2 y3@4 y4@6 y5@8;
i ON x1 x2 x3 x4;
s ON x1 x2 x3 x4;
q ON x1 x2 x3 x4;
s WITH i (1);
third one -
i s q | y1@0 y2@2 y3@4 y4@6 y5@8;
i ON x1 x2 x3 x4;
s ON x1 x2 x3 x4;
q ON x1 x2 x3 x4;
[i] (1);
All of the other constrained models seem to run with no problems. Any idea what could be wrong? Thank you very much.
It has been sent. Thank you very much for your time.
Kätlin Peets posted on Tuesday, November 27, 2007 - 11:11 am
I have a question concerning conducting multigroup modeling (e.g.,examining associations by gender) vs. conducting analyses separately for each group. Have I understood correctly that in the case of multigroup modeling all observations are used to estimate the effects? And, if I run the analyses separately for each group, the number of observations is smaller.
I just noticed in the multigroup example on the website that it uses TYPE = MGROUP;
I can not find this in the user's manual. What is the difference if you are running a multigroup analysis and you don't use the TYPE = MGROUP command but do indicate that there are groups through Grouping is...say for example if you are running a TYPE = GENERAL?
I am trying to compare the mean of the slope growth factor across the two groups. The model that I have run is a multi-level GMM with KNOWN class option. The dataset is a repeated measures, clustered data. Here is the Mplus code:
MODEL:
  %WITHIN%
  %OVERALL%
  iw sw | BP1@0 BP2@1 BP3@2 BP4@3 BP5@4;
How can I use log-likelihood difference test to see which class slope growth factor mean is higher / lower when compared to the other class in the model? The Mplus output contains *only* one log-likelihood.
Thanks, Linda. One last question. Since I have already run the Multilevel Growth Mixture Model with Known class option, how can I use MODEL TEST option? Do I have to re-run the model with MODEL TEST option? or, I can write a small, new code and test the slope differences using MODEL TEST option?
I ran the model, the one I described above in my question posted on January 05, with MODEL TEST option. I, however, cannot find the results from this test in the Mplus output. Where can I find in the output whether the overall mean of the slope factor in class#1 is statistically different from the overall mean of the slope in class#2?
Thank you. I have one last question on this thread. The main purpose of my analysis is to compare the overall mean of the slope growth factor of one particular group / class (i.e., the reference group) with the overall mean of the slope growth factor of 5 other groups in the dataset, respectively. In total, I have 5 pairs of multi-level, multi-group comparisons. I have successfully run all the 5 pairs of comparisons, however, I noticed that the coefficients for the reference group slightly change from one pair of comparison to the other. Can you please help me understand why the coefficients for the reference group slightly vary from one comparison to the other? Thank you.
Nidhi Kohli posted on Wednesday, February 22, 2012 - 10:32 am
My question is in the context of a multi-level growth mixture model with known class membership. I was wondering if you can tell me why Mplus, by default, fixes the mean of the intercept growth factor to zero in one class and allows it to be free in the other class? Secondly, since the mean of the intercept growth factor is fixed to zero in one class, how can one then test whether the means of the intercept growth factors in class1 and class2 are different? Thank you.
It sounds like you have a categorical outcome. In this case, the growth model parameterization is to hold thresholds equal and fix the intercept mean to zero in one class. The test of mean differences would be means zero in all classes versus mean zero in one class.
Nidhi Kohli posted on Wednesday, February 22, 2012 - 1:29 pm
Thank you. Just to make sure that I understand you correctly: to test the mean differences, I should run the model in two ways. In the first run, I should fix the means of the intercept growth factors in both class1 and class2 to zero. In the second run, the intercept mean in class1 is fixed to zero and in class2 it is free to be estimated. Right? Should I then use DIFFTEST to see if the two nested models are different? Thank you.
See the website where difference testing with MLR is described under How-To.
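As a rough sketch of what that how-to describes for the loglikelihood case (useful when the output gives a loglikelihood but no chi-square, as in mixture runs): the difference test uses each model's loglikelihood, MLR scaling correction factor, and number of free parameters. The function name and the numbers below are hypothetical.

```python
def scaled_loglik_difference(l0, c0, p0, l1, c1, p1):
    """Scaled loglikelihood difference test for nested models under MLR.

    l0, c0, p0: loglikelihood, scaling correction factor, and number of
                free parameters of the nested (more restrictive) model.
    l1, c1, p1: the same quantities for the comparison model.
    Returns (difference test statistic, df of the difference test).
    """
    df_diff = p1 - p0
    # Difference-test scaling correction (both numerator and denominator
    # are normally negative here, so cd comes out positive).
    cd = (p0 * c0 - p1 * c1) / (p0 - p1)
    trd = -2.0 * (l0 - l1) / cd
    return trd, df_diff


# Hypothetical output values: means fixed at zero in all classes (nested)
# versus means free in all but one class (comparison).
trd, df_diff = scaled_loglik_difference(l0=-1000.0, c0=1.20, p0=10,
                                        l1=-995.0, c1=1.25, p1=12)
print(f"TRd = {trd:.3f} on {df_diff} df")
```

The statistic is referred to a chi-square distribution with df equal to the difference in the number of free parameters.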
Sergio Ruiz posted on Saturday, February 25, 2012 - 12:16 pm
I am testing a multigroup regression model where I have latent and observed predictors and interaction terms, including latent x observed interactions. I centered all the observed predictors around the mean, but I am not sure what to do with the latent means for both groups. Should I center them? If yes, how is it done in MG analysis? Can you please give me some advice?
I have one last question on my post dated February 22, 2012. How can I compute the 95% confidence interval around the mean of the slope growth factor in a Multi-level GMM with known class membership? Generally, the equation for computing confidence interval is the following:
sample statistic +/- (Critical value x Standard error of the statistic).
I know the sample statistics (i.e., the mean estimate of slope growth factor) and I also know the S.E. of the sample statistics. How can I find the critical value? In other words, what is the distribution of the sample statistic in this case? Thank you.
Since the slope growth factor is continuous variable which is assumed to be normally distributed, I can use the critical value taken from a z-table to create confidence interval around the mean of the slope growth factor, right? Thanks.
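The calculation being asked about can be sketched as follows, assuming the usual large-sample normal approximation for the estimate. The estimate and standard error here are made-up values, not numbers from any output in this thread.

```python
from statistics import NormalDist


def normal_ci(estimate, se, level=0.95):
    """Confidence interval: estimate +/- (z critical value * standard error)."""
    # Two-sided critical value, e.g. about 1.96 for a 95% interval.
    z_crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    return estimate - z_crit * se, estimate + z_crit * se


# Hypothetical slope mean and its standard error.
lower, upper = normal_ci(estimate=0.50, se=0.10)
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
```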
I am conducting a multilevel multigroup analysis in MPLUS with observed variables and have tried to find the answers to the following questions without success:
1. I have adolescents nested in schools (cluster) and am grouping by ethnicity. Hence, I get the error message “Cluster ID cannot appear in more than one group.” On a message board related to MPLUS, it suggests that to remedy this problem I should define new cluster values that are unique for each group. So for example, in school 101, I would recode data so that Asian students were in school 1011, Black Students in school 1012, Latino Students in school 1013, White student in school 1014, etc. Is this my best option? It is reported that there are no unintended consequences of this method. Is this correct? If so, how do I report that I did this in a publication?
2. A number of researchers have reported that it is not possible to obtain a true R2 value for a multilevel model in Mplus. Is this accurate? Would you recommend that I calculate pseudo-R2 values if I would like to report the variance explained by predictors? If so, which method would you recommend?
1. It sounds like you are using a within-level variable as a grouping variable. When you do this, the groups are not independent because members of the same cluster appear in different groups. This violates the assumption that the groups contain independent observations.
2. Mplus computes R-square for multilevel models. For continuous outcomes, it is a regular R-square. For categorical outcomes, it is a pseudo R-square. I wonder where this misinformation comes from.
Thank you for clarifying. I have one more question.
If I instead run a multigroup model and cluster by school, would this procedure adequately reduce the Type I error that I would risk by not conducting multilevel analysis? The ICC = .02, but there are citations explaining that even very low ICCs carry a risk of Type I error (e.g., Barcikowski 1981).
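One common rule of thumb for judging how much a small ICC matters is the design effect, 1 + (m - 1) * ICC, where m is the average cluster size. The sketch below uses the ICC of .02 from the post above; the cluster size of 50 and the total sample size of 2000 are made up for illustration.

```python
def design_effect(icc, avg_cluster_size):
    """Design effect: variance inflation due to clustering."""
    return 1.0 + (avg_cluster_size - 1.0) * icc


def effective_sample_size(n, icc, avg_cluster_size):
    """Approximate number of independent observations the sample is worth."""
    return n / design_effect(icc, avg_cluster_size)


icc = 0.02     # from the post
m = 50         # hypothetical average number of students per school
n = 2000       # hypothetical total sample size

deff = design_effect(icc, m)
n_eff = effective_sample_size(n, icc, m)
print(f"design effect = {deff:.2f}, effective n = {n_eff:.0f}")
```

With these values, ignoring the clustering would treat the sample as roughly twice as informative as it is, which is why a small ICC can still matter when clusters are large.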
I am working on a 3-level latent growth model that is also a multiple group analysis across 5 cohorts. Based on this thread and Example 7.21 in the U.G. I have the code below, which does not run because "THERE IS NOT ENOUGH MEMORY SPACE TO RUN THE PROGRAM ON THE CURRENT INPUT FILE...." I've also tried including INTEGRATION=MONTECARLO, with the same error.
Analysis: Type = TWOLEVEL MIXTURE;
  ESTIMATOR = MUML;
MODEL:
  %WITHIN%
  %OVERALL%
  iw BY wiscraw-wiscraw3@1;
  sw BY wiscraw@-1 wiscraw2@0 wiscraw3@1;
  iw ON ...;
  sw ON ...;
  iw sw;
  %cohorts#1%
  iw sw;
  iw WITH sw;
  %cohorts#2%
  ...
  %cohorts#3%
  ...
  %cohorts#4%
  ...
  %cohorts#5%
  ...
Weber Seaman posted on Tuesday, September 11, 2012 - 7:00 am
Dear Dr. Muthen:
I understand that in order to conduct a multiple-group multilevel modeling, the grouping variable has to be a level-2 (or higher) variable.
However, if I want to do the multigroup based on RACE, a level-1 variable, is there any way to bypass the restriction?
Some people suggest the method of re-assigning cluster membership. For example , in school 101, Asian students are re-assigned cluster ID to 1011, Black Students to 1012, Latino Students to 1013, White student to 1014, etc.
Is this a safe way to do so? Dr. Linda Muthen suggested another alternative, using KNOWNCLASS. How different are these two approaches?
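For concreteness, the recoding described above (school 101 becomes 1011, 1012, and so on) can be sketched like this. The function name and the numeric group codes are hypothetical, and the sketch assumes single-digit group codes so that the new IDs stay unique. Note the reply earlier in this thread: grouping on a within-level variable leaves the groups non-independent, and recoding the IDs does not change that.

```python
def recode_cluster_id(school_id, group_code):
    """Append a one-digit group code to the school ID, e.g. (101, 1) -> 1011."""
    if not 1 <= group_code <= 9:
        raise ValueError("group_code must be a single digit from 1 to 9")
    return school_id * 10 + group_code


# Hypothetical coding of the grouping variable.
GROUP_CODES = {"Asian": 1, "Black": 2, "Latino": 3, "White": 4}

students = [(101, "Asian"), (101, "White"), (102, "Black")]
new_ids = [recode_cluster_id(school, GROUP_CODES[group]) for school, group in students]
print(new_ids)
```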
We will shortly have a web note posted that describes how to do this. Hopefully within a week.
Weber Seaman posted on Tuesday, September 11, 2012 - 7:24 am
Thank you. Please post a link to the web note here when it is online. I think it will benefit many readers with the same question as me. Thank you again.
Sarah posted on Wednesday, September 04, 2013 - 4:58 am
I was wondering if it is possible to investigate group differences in path coefficients in a SEM model (as opposed to the model as a whole). I have tested a model and want to see if certain paths are significantly different between groups.
Group #1: F1 to F2 = .45 Group #2: F1 to F2 = .30 Group #3: F1 to F2 = .10
Is there a way to know if .45, .30, and .10 are significantly different?
You can do this using chi-square difference testing or MODEL TEST.
Nate Breznau posted on Wednesday, January 08, 2014 - 2:17 pm
Long time lurker, first time poster!
I am working with a moderation analysis.
I have a latent variable PFB measured from 1 individual, 1 regional and 1 country-level variable. PFB predicts Y (an individual attitude). ID is a categorical moderator (3 identities) also measured at the individual level.
Question: When I run this as a multilevel linear model (i.e. in stata) it includes dummies for region and country plus accounts for the clustered standard errors at each level. This gives me effect sizes of the moderation of PFB (interacted with ID) that range from about .03 to .06 standardized at the individual level. When I run it in MPLus I get standardized coefficients in the three groups that range from .27 to .30. As much as I love these massive effect sizes I am instinctively thinking they are wrong because I don't know how to account for the nested data structure. The best I have is the following for syntax.
[..abridged...]
CLUSTER = country;
GROUPING = id (1=A 3=B 5=C);
Analysis: Type = COMPLEX;
Model:
  Y BY w1 w2 w3 w4 w5 w6 w7;
  pfb BY pfb1 pfb2 pfb3;
  Y ON pfb;
How can I deal with the fact that much of the unobserved variance in Y occurs at the regional or country level? Or that pfb2 is regional and pfb3 is country level.
When you say that you do multilevel analysis with "dummies for region and country plus accounts for the clustered standard errors at each level.", I wonder what the levels are for the multilevel analysis. It seems that you either use region as level-2 (and country as level-3) and do a multilevel analysis, or have them as dummies and do a single-level analysis.
Using them as dummies affects the results of the regression of Y ON pfb and could explain the discrepancy you see.
One approach would be to use country as a grouping variable, region as the cluster variable, and id and its interaction effects as level-1 dummies (14 countries is too few for taking a random mode approach with respect to country). The question then is what you do with your pfb measurement model.
Nate Breznau posted on Thursday, January 09, 2014 - 10:51 am
Thank you so much for your reply.
I have used the term 'dummies' in my post a bit falsely based on old fashioned approaches. I use a multilevel approach with individuals, regions and countries (xtmixed Stata operations, in case you know them).
If I use country as a grouping variable I think MPlus will then analyze the effect separately for each group/country. Or can I get around this by fixing the effects of independent variables to be identical for all groups? I ask because I still want the variance in Y (the DV) to be partitioned out of the variation I seek to explain at the individual level (like with the multilevel model where the unexplained variation at the country and region level is kept out of the estimation of the individual level variance of Y).
I am unsure about the pfb measurement model. This is similar to the question above, but if I use the grouping of country and clustering of region will MPLus account for the nested structure of the latent variable pfb (measured at each of the 3 levels)?
Finally, if I use no second- or third-level predictor variables (other than for pfb) and I include dummies for region and/or country in MPlus, instead of using clustering, am I missing something important that will go wrong that a multilevel model would otherwise correct for?
I am running MPlus 6 by the way in case it matters.
With country as a grouping variable you can use equality constraint across countries for any parameter in the model.
Am I understanding you correctly that the pfb factor is measured by the 3 indicators pfb1, pfb2, and pfb3, where the 3 are the same thing but on the individual, region, and country levels?
I don't think you should use dummies for 112 regions.
Nate Breznau posted on Wednesday, January 15, 2014 - 2:03 pm
It is unclear to me how grouping by country and then using equality constraints will help me accurately remove unobserved heterogeneity at the country-level that can account for country-level variation in the DV (there is a lot of it, roughly 30%). I don't have enough countries (14) to use more than one parameter at that level, and this is reserved for pfb3.
pfb1 is an individual level subjective evaluation of the number of foreign-born persons in the country, pfb2 is a census measure of regional-level percent foreign-born, and pfb3 is census country-level percent foreign born.
Yes... MPlus does not like 112 dummies. I can't get it to converge.
You have a latent variable Y measured by several indicators, so the mean of Y and the measurement parameters for the Y indicators can be different in the 14 groups, beyond the regional differences. The multiple-group approach takes care of the heterogeneity in a fixed-mode fashion whereas region does it in a random-mode fashion (assuming you use Cluster=region). For fixed- versus random-mode modeling of measurement models, see also Muthén and Asparouhov (2013), New methods for the study of measurement invariance with many groups, which is on our website together with the Mplus scripts.
As for the pfb factor, you can define it on the regional level using the region-part of the pfb1 variable and the pfb2 region-level variable. I don't know how to get pfb3 in there.
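A rough sketch of what defining the pfb factor on the regional level could look like is given below. It rests on several assumptions that are not in the thread: the outcome y is treated as a single observed variable (the poster's Y is latent), the file name is hypothetical, and the grouping by country is left out to keep the fragment short. pfb1 is deliberately not put on the WITHIN or BETWEEN list, so that its region-level part is available as a factor indicator.

```mplus
TITLE:    Sketch only: region-level pfb factor
DATA:     FILE = mydata.dat;             ! hypothetical file name
VARIABLE: NAMES = region y pfb1 pfb2;    ! y is a hypothetical observed outcome
          USEVARIABLES = y pfb1 pfb2;
          CLUSTER = region;
          BETWEEN = pfb2;                ! census measure varies only across regions
ANALYSIS: TYPE = TWOLEVEL;
MODEL:    %WITHIN%
          y ON pfb1;                     ! individual-level perception
          %BETWEEN%
          pfb BY pfb1 pfb2;              ! region part of pfb1 plus regional census measure
          y ON pfb;
```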
Tom Aquin posted on Wednesday, January 15, 2014 - 11:12 pm
I have just a quick clarification question regarding a multigroup multilevel path analysis. As far as I understand, when doing multigroup path analysis, there is no measurement invariance analysis since there are only observed variables.
But what about a multigroup multilevel path analysis, where you have, for example, an observed variable specified on both levels? The observed variable at the within level has a latent counterpart on the between level. Are there any necessary steps of measurement invariance analysis? If so, which? Do you know any references applying a multigroup multilevel path analysis?
There are no measurement invariance issues related to the situations you describe. Regarding papers on multiple group multilevel models, try searching for papers by Preacher and Zyphur.
Nate Breznau posted on Thursday, February 20, 2014 - 12:23 pm
I am following up on my two previous posts, but have now abandoned the 3-level factor "pfb", and instead want to estimate each component of this factor independently; one effect at each of 3-levels.
So I have pfb_c at the country level, pfb_ctx at the regional level, and pfb_s at the individual level. pfb_c and pfb_ctx are manifest observations of the percentage of foreign-born persons in countries and in regions within countries (14 countries; 113 regions). This is like a standard within and between setup, except that the within effects are at level 2 within each level-3 group, and the between effects are at level 3. I have no idea how to program this.
1st Question: Can MPlus handle this three-level model? I have version 6.
2nd Question: Can MPlus do this three-level model for a 3-group multigroup model?
Any references to literature and/or code are sought.
Thank you for making the applied work of a (not-so-mathematical) sociologist possible.
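For orientation, a three-level regression with one predictor per level could be sketched as follows. This assumes Mplus 7 or later, since as far as I know TYPE = THREELEVEL is not available in version 6; the outcome y and the file name are hypothetical, and the sketch says nothing about whether a multiple-group version of the model can be estimated.

```mplus
TITLE:    Sketch only: three-level regression (assumes Mplus 7 or later)
DATA:     FILE = mydata.dat;                      ! hypothetical file name
VARIABLE: NAMES = country region y pfb_s pfb_ctx pfb_c;
          USEVARIABLES = y pfb_s pfb_ctx pfb_c;
          CLUSTER = country region;               ! highest level listed first
          WITHIN = pfb_s;
          BETWEEN = (region) pfb_ctx (country) pfb_c;
ANALYSIS: TYPE = THREELEVEL;
MODEL:    %WITHIN%
          y ON pfb_s;                             ! individual-level effect
          %BETWEEN region%
          y ON pfb_ctx;                           ! regional-level effect
          %BETWEEN country%
          y ON pfb_c;                             ! country-level effect
```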
sailor cai posted on Monday, February 09, 2015 - 8:09 pm
Dear Drs Muthen,
I have a question on unequal sample sizes when running multiple-group SEM in Mplus. I want to compare a structural model with three exogenous variables and one endogenous variable across two groups (say, n1=14,000; n2=40,000). Do I need to weight the item parameters after modeling? If so, how? It would be appreciated if you could recommend some literature!
I think you want to use weighting only if you have oversampled in one of the groups and you want to estimate a quantity like the mean overall. In multi-group settings the situation is typically different in that you either estimate group-specific parameters or a parameter that is the same across groups and in neither case you want to weight.
Sonja Nonte posted on Thursday, February 12, 2015 - 4:12 am
I conducted a multigroup model for complex survey data (three independent cohorts) to check for measurement invariance and to identify "true" differences in reading motivation (for the latent mean scores). Strong invariance holds, so I decided to analyze the effect of different variables (level 1 and level 2) on reading motivation in a multilevel random slope model. I am also using the grouping variable (to hold factor loadings and intercepts equal across groups). Now I have two questions:
1. How can I interpret the unstandardized regression weights in the output for the different cohorts? Are the betas the effects on reading motivation for each cohort, or the differences compared to cohort 1 (the reference group)?
2. I don't get any latent means for the cohorts, and accordingly no indicator of mean differences in reading motivation in the random slope model. Can I interpret the intercept as a difference? The intercept for cohort 1 is zero, so do cohorts 2 and 3 differ from cohort 1 by the amount given in the intercept of the latent factor for reading motivation (within)? Am I right to conclude that, if the intercepts for cohorts 2 and 3 are not significant, the cohorts don't differ from cohort 1 after controlling for influencing factors? (In the multigroup model they differ significantly.)
Sorry for my questions.
Steven John posted on Thursday, April 02, 2015 - 4:06 am
I ran two identical 2-level SEM models for two different samples and obtained different coefficients for the relationship of interest. I would like to examine whether the estimates differ statistically. Is it correct to estimate two 2-level multiple-group models, where 1) all parameters are constrained to be equal across groups and 2) the constraints are relaxed only for my relationship of interest, and thereafter calculate the chi-square difference test, as proposed by Satorra & Bentler?
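For reference, the Satorra-Bentler scaled difference test is commonly computed as below when both models are estimated with a robust estimator that reports a scaling correction factor. This is a sketch of the standard formula, not a result from the poster's models. The notation is an assumption of this note: subscript 0 is the nested (more constrained) model and subscript 1 the less constrained model, T is the reported scaled chi-square, c the scaling correction factor, and d the degrees of freedom.

```latex
c_d = \frac{d_0\, c_0 - d_1\, c_1}{d_0 - d_1},
\qquad
T_{R_d} = \frac{T_0\, c_0 - T_1\, c_1}{c_d}
```

The statistic $T_{R_d}$ is then referred to a chi-square distribution with $d_0 - d_1$ degrees of freedom.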
1. Is it possible to fit a multiple group three-level CFA model with ordered categorical variables in Mplus 7?
2. I have seen in the UG that with Bayes option, multiple group analysis is done with MIXTURE and KNOWNCLASS option since GROUPING option is not available with ESTIMATOR=BAYES option. How would one specify such a model?
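A minimal single-level sketch of how KNOWNCLASS can stand in for GROUPING with the Bayes estimator is shown below. It does not address the three-level part of the question. The indicators u1-u4, the observed grouping variable g with values 0 and 1, the class variable name cg, and the file name are all hypothetical.

```mplus
TITLE:    Sketch only: known-class "multiple group" CFA with Bayes
DATA:     FILE = mydata.dat;                ! hypothetical file name
VARIABLE: NAMES = u1-u4 g;                  ! hypothetical variable names
          USEVARIABLES = u1-u4;
          CATEGORICAL = u1-u4;
          CLASSES = cg (2);
          KNOWNCLASS = cg (g = 0  g = 1);   ! class membership is observed, not estimated
ANALYSIS: TYPE = MIXTURE;
          ESTIMATOR = BAYES;
MODEL:    %OVERALL%
          f BY u1-u4;
          %cg#2%
          f;                                ! example of a group-specific statement
```

Statements under %OVERALL% describe the model common to the groups, while statements under a class label such as %cg#2% play the role that group-specific MODEL commands play with the GROUPING option.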
I have received the following error when conducting a multilevel mixture model:
*** ERROR in MODEL command
Unrestricted x-variables in TWOLEVEL MIXTURE analysis must be specified as either a WITHIN or BETWEEN variable. The following variable cannot exist on both levels: F_1
*** ERROR in MODEL command
Unrestricted x-variables in TWOLEVEL MIXTURE analysis must be specified as either a WITHIN or BETWEEN variable. The following variable cannot exist on both levels: DTX
This is part of a mediation model where DTX is the predictor and F_1 is the mediator. Both are BETWEEN level variables but since I am estimating different WITHIN-LEVEL effects for individuals within these clusters I have not specified them as BETWEEN or WITHIN.
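Going only by the wording of the error message, one thing to try is declaring the two cluster-level variables on the BETWEEN list. The fragment below is a sketch under that assumption; the rest of the poster's input is not shown in the thread and is left out here.

```mplus
VARIABLE: ! other VARIABLE options stay as they are
          BETWEEN = DTX F_1;    ! both variables vary only across clusters
```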
Dear Mplus-Team, I'm analysing a two-level multigroup model for a categorical dependent variable using the KNOWNCLASS option (intercepts only, no independent variables). I have 120 groups. Is there a way to free all variances on L2 (something like an allfree command)? Or is it necessary to use...
%c#1% var; %c#2% var; .... %c#120% var;
There is no such option for maximum likelihood estimation.
Will Thomas posted on Thursday, February 04, 2016 - 9:55 am
Dear Bengt/Linda, not sure if this is the right place to post this but here goes anyway.
I have run a multilevel cross lag model with data from 2 countries. I want to test whether there are country differences but I don't have a large enough sample size to run the necessary multigroup models. I was wondering if there is a way around this? Maybe constraining parameters? Or testing parts of the model separately (I have 2 different DVs)?
I'm relatively new to Mplus and this form of analyses, so slightly unsure how best to proceed.
You can let country be represented by dummy covariates.
Will Thomas posted on Monday, February 08, 2016 - 3:14 am
Thank you Bengt.
I'm sure this is simple to do but I'm not sure of the code needed. I looked at the examples but got a bit lost, is there one you could recommend? Alternatively, my current code is below (only within part shown due to space).
Your help is most appreciated!
NAMES = Team
        England          ! coded 0 = Italy 1 = England
        IDENT0-IDENT3 T_Perf0-T_Perf3 A_Perf0-A_Perf3;
cluster is team;
within ;
between A_perf0-A_perf3;
ANALYSIS: type is twolevel;
! Autoregressive paths
IDENT1 ON IDENT0 (1);
IDENT2 ON IDENT1 (1);
IDENT3 ON IDENT2 (1);
T_Perf1 ON T_Perf0 (2);
T_Perf2 ON T_Perf1 (2);
T_Perf3 ON T_Perf2 (2);
! Cross Lag
IDENT1 ON T_Perf0 (3);
IDENT2 ON T_Perf1 (3);
IDENT3 ON T_Perf2 (3);
T_Perf1 ON IDENT0 (4);
T_Perf2 ON IDENT1 (4);
T_Perf3 ON IDENT2 (4);
IDENT1 WITH T_Perf1 (5);
IDENT2 WITH T_Perf2 (5);
IDENT3 WITH T_Perf3 (5);
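One way the dummy-covariate suggestion could be sketched, using the first two waves only, is shown below. It assumes that every team belongs to exactly one country, so that England is constant within Team and can be declared a BETWEEN variable. As a simplification, the time-0 predictors are put on the WITHIN list, and one cross-lag path is given a random slope so that it can be regressed on the dummy. The slope name s and the file name are hypothetical.

```mplus
TITLE:    Sketch only: country as a dummy covariate
DATA:     FILE = mydata.dat;                ! hypothetical file name
VARIABLE: NAMES = Team England IDENT0-IDENT3 T_Perf0-T_Perf3 A_Perf0-A_Perf3;
          USEVARIABLES = IDENT0 IDENT1 T_Perf0 T_Perf1 England;
          CLUSTER = Team;
          WITHIN = IDENT0 T_Perf0;          ! simplification made for this sketch
          BETWEEN = England;                ! assumes one country per team
ANALYSIS: TYPE = TWOLEVEL RANDOM;
MODEL:    %WITHIN%
          IDENT1 ON IDENT0;
          T_Perf1 ON T_Perf0 IDENT0;
          s | IDENT1 ON T_Perf0;            ! cross-lag path as a random slope
          IDENT1 WITH T_Perf1;
          %BETWEEN%
          IDENT1 T_Perf1 ON England;        ! country difference in team-level means
          s ON England;                     ! country difference in the cross-lag path
```

The coefficient of s ON England would then estimate how much the within-level cross-lag path differs between the two countries.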
Anne Black posted on Tuesday, April 25, 2017 - 6:42 am
Dear Drs. Muthen and Muthen, I have complex survey data (Add Health data set) with weight, cluster and stratification variables. I need to regress a count outcome on four latent variables, allowing estimates to vary by a grouping variable (teen vs. older mothers). For dichotomous outcomes I used multiple group analysis, and type=complex, but can't use for the count outcome. I'm trying to translate what I've specified for the multiple groups, type=complex analysis to a known groups, type=mixture model. The first error I'm getting is that I can't use the stratification variable. Can you direct me to an example of modeling a count outcome as a function of latent variables using multiple groups and complex survey data? Can I get around this by specifying the count outcome as ordered categorical? Thank you for your guidance!
Asparouhov, T. & Muthén, B. (2012). Multiple group multilevel analysis. Mplus Web Notes: No. 16. November 15, 2012.
Marja Holm posted on Tuesday, November 21, 2017 - 2:36 pm
I want to investigate differences in three latent constructs between three groups at the within (student) and between (class) levels. I have tried to do this two-level multiple-group analysis with continuous factor indicators, using the following code:
USEVARIABLES ARE class SEMS JO PR AG AX SH HL BO;
grouping = SEMS (0=full 1=part 2=without);
cluster = class;
ANALYSIS: TYPE = twolevel;
MODEL:
  %WITHIN%
  Fw1 by JO PR;
  Fw2 by AG BO;
  Fw3 by AX SH HL;
  %BETWEEN%
  Fb1 by JO PR;
  Fb2 by AG BO;
  Fb3 by AX SH HL;
Model part:
  %WITHIN%
  Fw1 by JO PR;
  Fw2 by AG BO;
  Fw3 by AX SH HL;
Model without:
  %WITHIN%
  Fw1 by JO PR;
  Fw2 by AG BO;
  Fw3 by AX SH HL;
OUTPUT: sampstat stdyx;
I got this error message:
*** ERROR
Cluster ID cannot appear in more than one group. Problem with cluster ID: 1
Students in the same group can be in different classes (classrooms).
Marja Holm posted on Tuesday, November 21, 2017 - 10:27 pm
Marja Holm posted on Wednesday, November 22, 2017 - 3:28 am
Now I have used two multilevel logistic regressions (cluster=class) to compare the three groups (two dummy variables and one reference group).
However, I also want to report latent means across groups. So I saved the Mplus factor scores (SAVE IS fscores) at the between (classroom) and within (student) levels, standardized them, and calculated the mean, SD, and standard error for each group. The problem is that I don't understand the metric of these factor scores. The observed variables used (loading on the factors) are means of items measured on a 5-point scale, but the latent values are very different and the SD is quite high (e.g., for one of the groups at the within level: M = .016 and SD = .54; standardized: M = .03 and SD = .98). I think that the differences between these latent means seem reasonable.
Should I use them or is there any other way to get these latent scores?
Marja Holm posted on Saturday, November 25, 2017 - 2:40 pm
Maybe I'm just doing something wrong. Here are some questions.
I wanted to fit a multilevel model in which a dummy variable simultaneously predicts latent variables at the individual and class levels (I am interested in the group (dummy) differences).
1. Do I have to group-mean (class-mean) center this dummy variable at the individual level?
2. Does Mplus automatically calculate the group proportion at the class level by dividing the number of group members by the total number of students in the class?
3. In the second analysis, I use the test result (continuous) as a covariate to predict these latent variables on both levels (in addition to the dummy variables). Do I also need to class-mean center this variable at the individual level? Does Mplus calculate the class mean of this variable at the between level?
4. Do I need to standardize all predictors, including these dummy variables, or only the test result? Only the individual-level predictors, or both?
5. I am also interested in the compositional effects. Do I just replace group-mean centering with grand-mean centering in that model?
6. Example 9.1 (guide) shows two versions of the same variable: x (grand-mean centered) and xm (the mean of x for each cluster). Do I need to list the same variable twice on the USEVARIABLES list (one for the within level and the other for the class level)?
I will just answer the Mplus-related questions. For the general analysis strategy questions, try SEMNET.
3) Read the UG ex 9.1, 9.2 and the references therein.
6) Yes, as shown in the ex 9.1 input.
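For readers without the UG at hand, the setup referred to in answer 6 looks roughly like the sketch below, written from memory in the pattern of ex 9.1, so check the exact input in the UG for your Mplus version. Here xm is assumed to be already in the data file as the cluster mean of x, the file name is hypothetical, and the CENTER option in DEFINE is the form used in recent versions (older versions use a different centering option).

```mplus
TITLE:    Sketch only: same covariate on both levels, in the pattern of ex 9.1
DATA:     FILE = mydata.dat;         ! hypothetical file name
VARIABLE: NAMES = y x w xm clus;     ! xm = cluster mean of x, already in the data
          WITHIN = x;
          BETWEEN = w xm;
          CLUSTER = clus;
DEFINE:   CENTER x (GRANDMEAN);      ! recent-version syntax for centering
ANALYSIS: TYPE = TWOLEVEL;
MODEL:    %WITHIN%
          y ON x;
          %BETWEEN%
          y ON w xm;
```

The covariate therefore appears twice in the variable list, once as the individual-level x and once as the cluster-level xm.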
Marja Holm posted on Tuesday, November 28, 2017 - 11:18 am
Noah Emery posted on Monday, March 19, 2018 - 9:09 am
Below is my code. Pt1
CLUSTER IS ID;
CATEGORICAL are like;
Missing are all (-9999);
classes = C(2);
knownclass = C (useplan3 = 0 1);
BETWEEN are sex;
WITHIN are weekend;
DEFINE:
DATA TWOPART:
  NAMES = totgrams;
  CUTPOINT = 0;
  BINARY = like;
  CONTINUOUS = amount;
ANALYSIS:
  TYPE = twolevel mixture;
  ALGORITHM=INTEGRATION;
  INTEGRATION=montecarlo(2000);
MODEL:
  %WITHIN%
  %overall%
  like ON daynegaff dayposaff daycraving pfriends pplaces intox;
  amount ON daynegaff dayposaff daycraving pfriends pplaces intox;
  like amount ON weekend;
  daynegaff dayposaff daycraving pfriends pplaces intox WITH daynegaff dayposaff daycraving pfriends pplaces intox;
  daynegaff dayposaff daycraving pfriends pplaces intox ON weekend;
  %BETWEEN%
  %overall%
  DNA BY daynegaff@1; daynegaff@0;
  DPA BY dayposaff@1; dayposaff@0;
  DCR BY daycraving@1; daycraving@0;
  DFR BY pfriends@1; pfriends@0;
  DPL BY pplaces@1; pplaces@0;
  DI BY intox@1; intox@0;
  like amount ON DNA DPA DCR DFR DPL DI sex;
  like WITH amount;
Noah Emery posted on Monday, March 19, 2018 - 9:10 am
I am trying to run a multilevel multigroup SEM analysis to test for structural invariance across days using EMA data. My outcome has a floor effect with a preponderance of zeros; thus, I am using the DATA TWOPART command to specify it as semi-continuous.
This added complexity is making it difficult to follow any examples I have come across, and I was hoping to get some guidance on how best to estimate this model. My attempts thus far have led me to use the KNOWNCLASS option, but I am getting a fatal error and I am just too close to it to see where my error lies.
Any help would be greatly appreciated.
Noah Emery posted on Monday, March 19, 2018 - 9:12 am
Pt 2
  %WITHIN%
  %C#1%
  like ON daynegaff dayposaff daycraving pfriends pplaces intox;
  amount ON daynegaff dayposaff daycraving pfriends pplaces intox;
  like amount ON weekend;
  daynegaff dayposaff daycraving pfriends pplaces intox WITH daynegaff dayposaff daycraving pfriends pplaces intox;
  daynegaff dayposaff daycraving pfriends pplaces intox ON weekend;
  %BETWEEN%
  %overall%
  DNA BY daynegaff@1; daynegaff@0;
  DPA BY dayposaff@1; dayposaff@0;
  DCR BY daycraving@1; daycraving@0;
  DFR BY pfriends@1; pfriends@0;
  DPL BY pplaces@1; pplaces@0;
  DI BY intox@1; intox@0;
  like amount ON DNA DPA DCR DFR DPL DI sex;
  like WITH amount;