Anonymous posted on Monday, March 21, 2005 - 2:28 pm
Hi. I was hoping to get your perspective on the following analysis scenario.
I have three-level data: 400 customers (level 1) that belong to 30 different customer groups (level 2). The 30 different customer groups, in turn, share 1 of 12 providers (level 3).
I have independent variables at the provider level (level 3) which I want to use to predict customer level perceptions (level 1).
I am NOT interested in modeling the multilevel nature of the data. Rather, I want to test a model using SEM that accounts for the complex design and provides unbiased parameter estimates and chi-square values.
If the provider (n=12) is used as the clustering variable, design effects fall below 2. However, if the customer group (n=30) is used as the clustering variable, design effects are substantially greater than 2.
So, if I specify a SEM model (using the type=complex specification) where the provider-level variables (level 3) are utilized to predict the customer-level variables (level 1) and use the customer group (n=30) as the clustering variable (to account for design effects), do I get unbiased parameter estimates and fit indexes? In other words, does the type=complex estimation procedure generate unbiased estimates even if the clustering variable (level 2) and predictor variables (level 3) are not at the same level?
Your comments/suggestions would be greatly appreciated!
bmuthen posted on Monday, March 21, 2005 - 11:46 pm
If the design effects for the provider clustering variable are small, I would not worry if you use type=complex with cluster = customergroup and level-3 predictors (including provider dummies). I think that is the way to go here.
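For readers unfamiliar with the "design effect greater than 2" rule of thumb used above, here is a minimal sketch of the usual approximation, design effect = 1 + (average cluster size - 1) * ICC. The cluster counts come from the question; the intraclass correlation value is purely hypothetical, since the thread does not report one.

```python
def design_effect(avg_cluster_size: float, icc: float) -> float:
    """Approximate design effect: 1 + (average cluster size - 1) * ICC."""
    return 1.0 + (avg_cluster_size - 1.0) * icc


# From the question: 400 customers spread over 30 customer groups.
avg_size = 400 / 30

# Hypothetical intraclass correlation, NOT a value reported in the thread.
icc = 0.10

deff = design_effect(avg_size, icc)
print(f"average cluster size = {avg_size:.2f}, design effect = {deff:.2f}")
```

With the same ICC, larger clusters inflate the design effect more, which is why the choice of clustering variable matters in the scenario described.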
I have two-level data: employees (n = 1200) and work locations (outlets, n = 140). I adjusted the standard errors for clustering by using the STRATIFICATION command and analysis TYPE = COMPLEX. The results are broadly the same before and after applying this adjustment: the significance levels of some path estimates change, but the nature of the results does not. However, I found that the p value of RMSEA changed substantially. Here are the results.
Model fit before applying this control: RMSEA = .052 (CI = .051-.054), p = .003.
Model fit after applying this control: RMSEA = .044 (CI = .043-.046), p = 1.000.
Could you please suggest why this is the case and whether the results obtained are reliable. Thanks.
I would like to estimate cell means following a multilevel analysis with complex sample weights between and within. I have a cross-level interaction of a 2-category variable at level 1 and a 2-cat var at L2, so I'm using TYPE = COMPLEX TWOLEVEL RANDOM, and estimating the association between my response and the L1 predictor with a S_L1x | y ON x statement, and the cross-level interaction with S_L1x ON L2x at the BETWEEN level. I'm saving FACTOR SCORES to get the EB estimates. My question is, when I create a (say) Stata file with the factor scores and compute the estimated means for the four cells for the L1 by L2 interaction, are sampling weights already taken into account or should I use sampling weights again to estimate the means? The difference in estimates the two ways is not trivial in my data. Thanks! Bruce
Thanks Linda - Now I'm stuck again. Is there a way to get Mplus to give me estimated cell means for the cross-level interaction of two binary predictors (one at L1 and one at L2) from a multilevel model using TYPE = TWOLEVEL COMPLEX RANDOM? My other current software won't give me estimates with sampling weights at both the between and within levels. I can't figure out how to use TYPE = TWOLEVEL COMPLEX BASIC to give me cell means in Mplus. Best, Bruce
I'm a bit confused. I had the impression that you were interested in the random slope values. That you wanted factor scores = eb values for the random slope. When you say "cell means" I would think you refer to the cells of the 2 x 2 table of the L1 and L2 binary variables. But "means" of what? The random slope changes values only as a function of the L2 binary variable.
Thanks Bengt - Perhaps I'm the one who is confused, but I have the idea that we should be able to get EB estimates (predicted scores) for each observation, allowing the computation of group means for the 2x2 cross-level interaction -- in a way similar to estimated means across time for a multilevel longitudinal analysis. So, if the L1 variable is L1Group, and the L2 variable is L2Group, I should be able to compute the means for obs in L1Group1 (of course, averaged up to L2) and L2Group1, L1G2 and L2G1, etc. The parameter estimates for the L2 intercept and the slopes in the output table should -- as a regression equation -- allow the estimation of these "cell means" at Level 2, but the EB estimates should also allow computing them for the L1Group by L2Group variables. I'd like to get the EB estimates to do that, because using the parameter estimates to compute them gave impossible values (negative) for two cell means in an analysis. Am I confused? - Bruce
So are you talking about estimated cell means for the DV, not for the random slope? The random slope varies only over the cluster (Between) units. Maybe it helps us use the same language if we talk in terms of slide 45 of the Topic 7 most recent handout of 3/29/11 on our web site. There we have the L1 covariate x_ij and the L2 covariate w_j and we can choose to consider y_ij, beta_1j, or gamma_11 as our quantity of interest.
Thanks Bengt! Looking at Topic 7 slides 44 and 45, you have a random intercepts only model in 44 and a random slopes only model in 45.
Following Raudenbush & Bryk, what's missing is the combined model:
MODEL:
%WITHIN%;
beta1 | y ON x; ! This gives us both beta0 and beta1
! beta0 = random intercept, in Mplus, latent y at Level 2
! beta1 = random slope
%BETWEEN%;
y ON w; ! y is the same as beta0j
! estimates gamma01
beta1 ON w; ! estimates gamma11
The combined model allows computation of the estimated means for all four cells. It should be:
y_ij = gamma00 + gamma01*w_j + gamma10*x_ij + gamma11*w_j*x_ij
The mean for x=0 and w=0 is the intercept: gamma00
The mean for x=0 and w=1 adds gamma00 and gamma01
The mean for x=1 and w=0 adds gamma00 and gamma10
The mean for x=1 and w=1 adds gamma00, gamma01, gamma10, and gamma11
I'd like to compute (what I think are) the marginal means for the four cells, where x is binary at L1, and w is binary at L2. I'm hoping that the EB estimates (factor scores) will help me check the computation. - Best, Bruce
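As a quick check of the arithmetic in this post, the four cell means can be sketched directly from the combined equation. The gamma values below are hypothetical placeholders, not estimates from any model in this thread, and the function name is only for illustration.

```python
def cell_means(g00: float, g01: float, g10: float, g11: float) -> dict:
    """Expected y for each (x, w) cell of a binary L1 predictor x and binary L2 predictor w.

    Uses y = g00 + g01*w + g10*x + g11*w*x, i.e. the fixed part of the combined model.
    """
    return {
        (x, w): g00 + g01 * w + g10 * x + g11 * w * x
        for x in (0, 1)
        for w in (0, 1)
    }


# Hypothetical gammas for illustration only.
means = cell_means(g00=2.0, g01=0.5, g10=-0.3, g11=0.2)
for (x, w), m in sorted(means.items()):
    print(f"x={x}, w={w}: mean = {m:.2f}")
```

Nothing in this linear form restricts the results to a valid range, so negative cell means are possible for an outcome that cannot be negative.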
That clarifies what you want for me. So you want to compute those 4 means which are functions of the 4 gammas. That doesn't involve the EB for the random slope beta_1j. Slide 45 shows that the "reduced-form" line starting with "i.e." has expectation zero for the last two terms (conditional on x). So the 4 means are obtained simply by labeling the 4 gammas in the Model statement and using Model Constraint to express the 4 means as you did. That gives the means and their SEs. I hope I am understanding you correctly.
My gosh - I think I got it! I think I should have taken Mplus 101. (Would that I could.) I hope that B. Byrne's new book will help me learn these basics that I've not yet learned! Could you check this example syntax to see if it is giving what I'm after? (I think it is, but it would be reassuring if you could confirm it.)
MODEL:
%WITHIN%
s_mhsa | brdt60 ON mhsapt ;
%BETWEEN%
brdt60 ON msa (p2) ;
s_mhsa ON msa (p3) ;
[s_mhsa] (p1) ;
s_mhsa@0 ;
[brdt60] (p0) ;

MODEL CONSTRAINT:
NEW (m0ms0);
NEW (m0ms1);
NEW (m1ms0);
NEW (m1ms1);
m0ms0 = p0 ;
m0ms1 = p0 + p2 ;
m1ms0 = p0 + p1 ;
m1ms1 = p0 + p1 + p2 + p3 ;
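Each of the four new parameters is a linear combination of p0 to p3, so its standard error follows from the usual rule for a linear combination, SE = sqrt(a' V a), where V is the covariance matrix of the parameter estimates. The sketch below illustrates that rule only. The covariance matrix is hypothetical (and unrealistically diagonal); in practice it would come from the fitted model, and this is not a reproduction of how Mplus computes its output.

```python
import math


def linear_combo_se(weights, cov):
    """Standard error of sum(weights[i] * p[i]) given the covariance matrix of p."""
    n = len(weights)
    variance = sum(
        weights[i] * cov[i][j] * weights[j] for i in range(n) for j in range(n)
    )
    return math.sqrt(variance)


# Parameter order: p0, p1, p2, p3 (labels as in the syntax above).
# Hypothetical covariance matrix of the estimates, for illustration only.
cov = [
    [0.04, 0.00, 0.00, 0.00],
    [0.00, 0.01, 0.00, 0.00],
    [0.00, 0.00, 0.09, 0.00],
    [0.00, 0.00, 0.00, 0.02],
]

# Weight vectors matching the MODEL CONSTRAINT definitions.
combos = {
    "m0ms0": [1, 0, 0, 0],  # p0
    "m0ms1": [1, 0, 1, 0],  # p0 + p2
    "m1ms0": [1, 1, 0, 0],  # p0 + p1
    "m1ms1": [1, 1, 1, 1],  # p0 + p1 + p2 + p3
}

ses = {name: linear_combo_se(w, cov) for name, w in combos.items()}
for name, se in ses.items():
    print(f"{name}: SE = {se:.3f}")
```

With real estimates the off-diagonal covariances are generally nonzero and can raise or lower these standard errors noticeably.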
Thank you so much, Bengt! Thanks for your suggestion about the random slopes, also. I specified random-intercepts-only models because some of the variables examined at L1, and their cross-level interactions, were based on variables that are very unbalanced across the level 2 units. Some models took much longer (relatively speaking) to converge, and I attributed that to the random slopes estimation. Fixing the random slope variance to zero resulted in models that converged more quickly. Do you in principle think that is unwise? Best, Bruce
As a follow-up re random intercepts compared to random coefficients -- the estimated means from some RC models differ by a meaningful amount when random slopes are also estimated, compared to the random-intercept-only models. That also makes me worry about the effect of unbalanced variables across level 2 units. Any thoughts? - bac