Hierarchical regression
 Tom Dietz posted on Sunday, December 03, 2000 - 6:10 am
I recently purchased Mplus and so am quite naive about its use.

In a pilot study, I have about 100 subjects. Each was asked to make 14 decisions, say d. There are two characteristics that vary across the 14 decisions, say X1 and X2. These take on the same values for a particular decision across all subjects.

I'd like to model the decision with a logit with X1 and X2 predicting the probability of choosing d=1 rather than d=0 for each subject. Then given logit coefficients B1 and B2 for each subject, I'd like to use B1 and B2 as dependent variables that are a function of individual characteristics W1, W2, W3, etc. using the individuals as the between level. Of course, this is a pilot so the numbers are small for justifying estimation, but I'm trying to get a sense of the procedures.

Is this possible in Mplus? From the manual it wasn't clear how to model the within-level coefficients as dependent variables at the between level.

Thanks,
Tom Dietz
George Mason University
 M. Lee Van Horn posted on Sunday, December 03, 2000 - 7:19 pm
Multilevel SEM currently does not allow for cross-level interactions. Further, you cannot have binary dependent variables in the two-level Mplus models. I think your best solution in a multilevel framework is to use HLM.

You could also, I think, do this in a one-level model in which there are 14 binary outcomes with X1 and X2 as predictors, including interactions between the X and W variables. The advantage of this is that it allows the B's to vary across the different decisions, which may be a more realistic model. The disadvantage is that the interactions are not as clean and you have 14 dependent variables.

Bengt or Linda can correct me if I'm wrong. 8-) Best of luck.

Lee
 bmuthen posted on Monday, December 04, 2000 - 10:09 am
I think Van Horn's answer is to the point. Mplus currently does not offer multilevel modeling with categorical outcomes or random slopes. But, Van Horn's suggestion of a multivariate approach (14 binary outcomes) is a possibility because of the fact that your decision characteristics x1 x2 do not vary across individuals. The multivariate approach is analogous to how growth modeling of repeated measures is done in a latent variable framework. Here you have in essence 14 "time points" and 2 "growth factors" for your 100 individuals, so a 14-variable, 2-factor, single-level categorical outcomes problem. The growth factors are the x1 and x2 slopes (probit instead of logit in Mplus), which vary across individuals. The growth factor time scores (i.e. loadings) are your x1 and x2 values, which do not vary across individuals (fixed parameters). The growth factors can be regressed on the w variables. A reference to this type of analysis for continuous outcomes is paper #79 on the Mplus web site under growth modeling. Paper #64 discusses growth modeling of categorical outcomes.
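To make the multivariate translation concrete, here is a minimal Mplus sketch of the kind of single-level setup described above, shown for only 4 of the 14 decisions to keep it short. The file name, the variable names (d1-d4, w1-w3), and the fixed loading values standing in for the x1 and x2 characteristics are all hypothetical placeholders.

TITLE:    decisions as a multivariate categorical model (sketch);
DATA:     FILE = decisions.dat;        ! hypothetical data file
VARIABLE: NAMES = id d1-d4 w1-w3;
          USEVARIABLES = d1-d4 w1-w3;
          CATEGORICAL = d1-d4;
MODEL:    ! b1 and b2 play the role of the person-specific x1 and x2 slopes;
          ! their loadings are fixed at the known x1 and x2 values for each
          ! decision (the numbers below are placeholders)
          b1 BY d1@0.2 d2@0.5 d3@0.8 d4@1.1;
          b2 BY d1@1 d2@0 d3@1 d4@0;
          b1 b2 ON w1 w2 w3;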
 Anonymous posted on Saturday, January 06, 2001 - 11:30 am
If I have found a two-factor model with an EFA and have found it not to converge as a two-factor model in a multilevel framework, are there any cites that I could use to argue that the two-factor model found in the EFA is an artifact of the nested nature of the data?
 Anonymous posted on Tuesday, May 18, 2004 - 11:51 pm
I would like help with a problem involving partial overlap between a level-2 predictor and the outcome when analyzing a moderating effect.
I intend to consider the model:
Level 1: Yij = b0j + b1j(Xij) + eij
Level 2: b0j = r00
b1j = r10 + r11(Wj) + u1j
The outcome variable Yij is an individual characteristic, such as social competence, and the level-2 variable Wj is a composite group-level variable created from several individual variables (such as academic performance, leadership, peer acceptance, and social competence), which also include social competence. A multilevel confirmatory factor analysis indicated that this way of forming the level-2 variable is reasonable.
Now I want to know:

(1) If I only consider the effect of the level-2 variable on the level-1 random slope, is the overlap between predictor and outcome a serious problem?
My thinking is that I am looking at the level-2 influence on the slopes but not the intercept, and the slope is the association between two variables, which is a concept distinct from the level-2 variable itself. Am I right?

(2) If I also consider the effect of the level-2 variable on the random intercept, what should I do?

Thank you very much for any comment.
 bmuthen posted on Wednesday, May 19, 2004 - 8:01 am
I am not sure about this one. On the one hand, one can argue that w is a cluster-level variable that reflects the environment, and even if it is created via aggregation using the individual-level y variable, the aggregated variable means something different. That view says one could allow w to predict both b0j and b1j. On the other hand, one can argue that this way of creating w introduces a spurious correlation between w and y, which to me might then call for allowing both b0j and b1j to be predicted by w so that the w-y correlation can be represented as freely as possible. I am inclined to favor the first view, but let's hear what other readers think.
 Anonymous posted on Friday, April 08, 2005 - 11:05 am
Is there no convenient means of generating Empirical Bayes (EB) residuals for models with random slopes and intercepts using Mplus 3.12 output?

I'm interested in the EB residuals because I wish to: (1) test for omitted variables using frequency distributions and scatter plots (etc.) of the random coefficient-specific residuals; and (2) calculate a Mahalanobis Distance for each Level-2 unit so that I can get a sense of outliers, problem cases, overall model fit (in the manner suggested by Bryk and Raudenbush, Chapter 9).

As I understand it -- to calculate the EB residuals I would need the Level-2 unit-specific EB slopes and intercepts (which Mplus 3.12 does not provide), or the L-2 unit-specific reliabilities and the L-2 unit-specific OLS coefficients for each of the random coefficients (which Mplus 3.12 does not provide).

Are you aware of any way to calculate these quantities using Mplus ?
 BMuthen posted on Saturday, April 09, 2005 - 3:48 am
The level-2 unit specific EB slopes and intercepts are available in Mplus using the FSCORES option of the SAVEDATA command.
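For reference, a minimal sketch of that request (the output file name is arbitrary):

SAVEDATA: FILE = fscores.dat;
          SAVE = FSCORES;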
 Anonymous posted on Saturday, April 09, 2005 - 6:06 am
Hi Prof. Muthen,
I want to specify a multilevel multiple regression model of the form
Y ON X1 X2 X3,
with each slope random. How do I write the Mplus program?
I do not know how to define each random slope using the s | notation.
Thanks for your instruction.
 Linda K. Muthen posted on Sunday, April 10, 2005 - 2:43 am
See the examples in Chapter 9 of the Mplus User's Guide and the description of the | command.
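For example, a minimal sketch along the lines of UG Example 9.2 for the three-predictor case above (the cluster and variable names are placeholders):

VARIABLE: CLUSTER = clus;
          WITHIN = x1 x2 x3;
ANALYSIS: TYPE = TWOLEVEL RANDOM;
MODEL:
%WITHIN%
s1 | y ON x1;
s2 | y ON x2;
s3 | y ON x3;
%BETWEEN%
y s1 s2 s3;              ! intercept and slope variances
s1 WITH s2 s3 y;         ! let the random effects covary
s2 WITH s3 y;
s3 WITH y;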
 Anonymous posted on Friday, June 24, 2005 - 3:39 pm
I’m looking for a sanity check on a multilevel model I’ve built in Mplus.

The model has two outcome variables: a categorical variable (CAT) with three response levels, and a continuous variable (CONT), both of which I posit as having BETWEEN- and WITHIN-cluster variation.

For a WITHIN predictor variable X and a BETWEEN predictor variable Z, I specify the following model in Mplus:


%WITHIN%
CONT on X;
CAT on X;


%BETWEEN%
CONT on Z;

I by CAT@1;
CAT@0;
I ON Z;


I obtain from the model a set of WITHIN and BETWEEN parameter estimates ("Beta"s), plus BETWEEN-level Tau-1 and Tau-2 estimates. The residual variance of I is non-zero. I'm interested in using the Mplus parameter estimates to predict values of CAT (CAT-HAT) and seeing how well they match with the "true" values of CAT.

I use the probit conversions:


p(CAT=1| X,Z)=cdf.normal(Tau-1 - X*Beta - Z*Beta),

p(CAT=2| X,Z)=cdf.normal(Tau-2 - X*Beta - Z*Beta)
- cdf.normal(Tau-1 - X*Beta - Z*Beta),

p(CAT=3| X,Z)=1-cdf.normal(Tau-2 - X*Beta-Z*Beta);


My results are surprising: I substantially under-predict the number of respondents that fall in response category CAT=2. It's possible that my data do not fit the model assumptions, but I also wonder if I'm thinking about the Mplus output / model parameterization the right way.

1. Am I neglecting a “cluster-level” mean term in the P(CAT | X,Z) formulas (i.e., a mean cluster-level p(CAT)) ?

2. If the factor scores for I obtained from the above model are like empirical Bayes residuals, can they be used to get a mean, cluster-level p(CAT)? Would this be obtained via:

CLUST_MEAN_CAT = EB_RESID - Z*Beta ?

Thanks.
 BMuthen posted on Sunday, June 26, 2005 - 2:15 am
In your probability calculations, you need to take into account that the residual variances also include the between-level variation. Also, check the output to see that you have the weighted least squares estimator and not maximum likelihood. If you have maximum likelihood, you have logistic regressions not probit regressions.
 Anonymous posted on Sunday, June 26, 2005 - 4:55 pm
Regarding your first point: are you saying that the BETWEEN level factor scores need to be included in the probability calculations, i.e.:

p(CAT=1| X,Z)=cdf.normal(Tau-1 - XBeta - ZBeta - EBRESID) ?


Regarding your second point: the Mplus 3.0 User Guide suggests that TYPE=TWOLEVEL {ETC.} and TYPE=RANDOM {ETC.} can only be used with ML, MLR, or MLF when one of the dependent variables is categorical (and when I try to run the two level model in question with WLS or WLSMV Mplus 3.12 reverts to ML). Am I misunderstanding your response ? I don’t see how the WLS estimators can be used to construct a multilevel SEM with concurrent categorical and ordinal outcomes, both displaying WITHIN and BETWEEN variation.

Thanks again.
 BMuthen posted on Tuesday, June 28, 2005 - 8:18 am
No, I meant that the argument for the normal cdf function needs to be divided by a standard deviation that includes the between-level residual variance.

WLSMV is never available for TWOLEVEL. With CATEGORICAL outcomes and TWOLEVEL and RANDOM, you only have the estimators you mention. There is a table in Chapter 15 under the ESTIMATOR command that summarizes the estimators available.
 Anonymous posted on Thursday, June 30, 2005 - 4:08 pm
Thanks for your response.


1. I am aware of the table you note from Chapter 15 of the Mplus 3.0 User's Guide. From your first response, I got the impression that you were suggesting I use an Mplus WLS estimator, and that without doing so (i.e., using an ML estimator) Mplus would treat the categorical outcome as a dichotomous outcome. I'm not sure I understand your original message from June 26 on this point.


2. I don't understand why the cdf.normal argument needs to be divided by an SD. Could you provide a citation or further guidance?

In a garden-variety HLM, each Level-2 (BETWEEN) unit is assumed to have a different intercept and the output from typical HLM software provides the overall mean BETWEEN intercept. The EB residuals can be used to get the BETWEEN units’ actual intercepts.

In the Mplus multilevel model for ordered categorical variables does Mplus assume all BETWEEN level units have the same Tau’s but different underlying (latent) means ? If so, shouldn’t the “latent mean” be included in the p(CAT=1|X,Z)=cdf.normal (…) calculations ?

Perhaps the difficulty comes in because I use the work-around:

%BETWEEN%
CONT on Z;

I by CAT@1;
CAT@0;
I ON Z;


Thanks very much.
 bmuthen posted on Sunday, July 03, 2005 - 5:56 pm
1. Because you were using the normal distribution function ("cdf normal"), I assumed you were using WLSMV since that is where probit is used (I overlooked that you had twolevel modeling). With ML, the logistic function is used so the probabilities have to be computed using that function instead.

2. You can read about related matters in Technical Appendix 1 on our web site - see the "latent response variable formulation" - although that is for single-level modeling. 2-level logit is a bit more complex since you have logit link combined with normally distributed coefficients (varying across the level-2 units). Say that you have a two-level logistic regression with a random intercept. This can be written in terms of a continuous latent response variable u* as

(1) u*_ij = alpha_j + beta*x_ij + e_ij

where j is the level-2 subscript and e is a residual with a logistic density and so has variance V(e) = pi^2/3 (see e.g. Maddala's book). For simplicity, no second-level predictor of alpha_j is included. Now, the intercept alpha has a mean, say a, and a variance, say v. In Mplus, a threshold (or several thresholds if a polytomous outcome) is estimated instead of the intercept, where the threshold is the negative of the intercept. So when Mplus prints out the threshold that should be taken as -a. We are interested in the probability of u and therefore have to relate u* to u. This is done by postulating that u=1 when u* GT 0, or equivalently (in Mplus style) when

(2) beta*x_ij + e_ij

exceeds the threshold (-a). To compute this we need the mean and the SD of u* given x. The mean given x is beta*x and the SD given x is sqrt(v+V(e)).

Now we have to revisit single-level logistic regression,

u*_i = alpha + beta*x_i + e_i

so that

(3) P(u* GT 0 | x) = P(e LT (alpha + beta*x))

which, because e has a logistic density, is

P(u = 1 | x) = 1/(1+exp(-alpha-beta*x))

I hope I got that right. The last step is because e is logistic and therefore has variance pi^2/3. But in (1), if we don't condition on alpha_j but only on x, we have further variance in u* given x due to the random intercept. I think this leads to the need to integrate numerically to get the probability you want by considering the integral over alpha_j of the expression

P(u* GT 0 | alpha_j) * [alpha_j]

where the first part is a logistic function and the second part is the normal density for the random intercept. There must be literature on this in the first publications on 2-level logistic regression with a random intercept. Do we have anyone who can point us to that?
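For concreteness, the integral described above can be written out as follows (a sketch in the notation of this post, with $\alpha_j = a + \tilde{\alpha}_j$ and $\tilde{\alpha}_j \sim N(0, v)$):

$$
P(u_{ij} = 1 \mid x_{ij}) \;=\; \int_{-\infty}^{\infty}
\frac{1}{1 + \exp\{-(a + \tilde{\alpha} + \beta x_{ij})\}}\;
\frac{1}{\sqrt{2\pi v}} \exp\!\left(-\frac{\tilde{\alpha}^{2}}{2v}\right)
d\tilde{\alpha} .
$$

This has no closed form and is typically evaluated by (adaptive) Gauss-Hermite quadrature.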
 Anonymous posted on Tuesday, July 05, 2005 - 8:46 am
Thanks very much for your detailed response.

I now understand your comments on WLSMV vs. ML -- I was not aware that Mplus automatically used the Logit parameterization when NUMERICAL INTEGRATION is used in MLSEMs with ordinal categorical outcome variables (at L-1 and L-2).

It occurs to me after reading the second portion of your response that there may be one or two recent pieces in JEBS which address these matters. I'll be away from my office for a few days, but will be able to look into this further when I return.
 bmuthen posted on Tuesday, July 05, 2005 - 5:03 pm
Yes, JEBS sounds like a likely outlet. Please let me know if you find something relevant there.
 Anonymous posted on Tuesday, July 19, 2005 - 3:29 pm
I am back in the office and responding to your post of July 3. Below, I provide a handful of references which may be useful to Mplus users interested in the issues alluded to in the latter portion of this thread. As you suggest, there is a rich literature on multilevel models for nominal outcomes, but as best as I can determine at present, little on using the estimated coefficients from these models to verify the model (or predict the outcome variable values for future cases using parameter estimates).

You may recall my interest is in fitting an MLSEM which features a categorical and a continuous variable (CAT and CONT, respectively), both of which have WITHIN and BETWEEN variance components. I then wish to use the Mplus-generated coefficients and my original covariates / data to determine the extent to which my Mplus model replicates my original data. Afshartous and de Leeuw (2005) suggest that this is a worthwhile exercise (although my application differs slightly from theirs).

Mindful of your comments from July 3 as well as the references cited below, the remaining questions I have regarding the Mplus parameterization of my model and associated output are as follows:


1. I still do not understand how the Mplus model allows for Level-2 unit-specific effects for the model. According to the Rabe-Hesketh and Skrondal pieces, each Level-2 unit in such models should have unique Level-2 intercepts, but all Level-2 units share the same fixed thresholds (Tau) parameters regardless of the link function / parameterization used (logit / probit). It appears that these Level-2 unit unique effects are included in the standard calculations to find p(CAT=1 | X, Z), etc. (where X are a series of Level-1, and Z are a series of Level-2 variables) via a CDF.NORMAL or CDF.LOGISTIC distribution, depending on the link function.

My understanding of Rabe-Hesketh and Skrondal, and Hedeker and Gibbons is that provided Mplus can generate the EB residuals for the intercepts Beta-0j, such calculations can be readily performed after the user obtains each Level-2 units’ specific Beta-0j (where j indexes Level-2 units in the sample, and the Beta-0j are assumed to be drawn from a common normal distribution as in a conventional HLM). Your comments from July 3 seem to imply that there is no intercept in the Mplus MLLOGISTIC model. If this is the case, I fail to see where the Level-2, unit-specific contribution to the Level-1 CAT probabilities enters into the Mplus parameterization.

The only other way I can think of to include Level-2 unit variation in the calculation of the CAT probabilities is to assume each Level-2 unit has a different mean in the CDF. LOGISTIC calculations (but the same Tau’s, X*Beta, and CDF scales); or that each Level-2 unit has its own Tau’s; neither of which I’ve seen discussed anywhere (or appear to make sense). Thus it seems more likely that each Level-1 (i.e., WITHIN) unit has a Level-2 (i.e., BETWEEN) intercept (but not a Level-1 intercept since, as you note, threshold parameters are used); and the Level-2 intercept must be included in generating the probabilities in “verification” models of the sort I’m interested in.


2. My MLSEM features two sets of variables with BETWEEN and WITHIN sources of variation; yet Mplus 3.12 only provides one set of factor scores (EB residuals). My sense is these are the factor scores pertaining to the (latent) variable I. Note that the above "I parameterization" is the current Mplus-recommended workaround for including categorical variables at BETWEEN and WITHIN in an MLSEM (i.e., I by CAT@1; CAT@0; I on Z). Can I be sure that the factor scores pertain to the categorical outcome? How does one obtain the factor scores / EB residuals for the continuous outcome (CONT) also included in the above model? The latter are important for verifying the quality of the model for both outcomes.


3. You noted on July 3 that whenever numerical integration is used in Mplus, the link is logistic, a point I missed initially. Thus in trying to replicate my own data I use CDF.LOGISTIC with the CDF mean set to zero and scale set to 1 (corresponding to a variance of (pi^2)/3). After experimenting with various interpretations of the Mplus-provided coefficients for the CAT outcome portion of the model, the best I do is the following (rounded %’s):

CAT category    "True" coding    Mplus predicted value
1               38%              52%
2               36%              40%
3               21%               8%

The values obtained in the third column are via probability calculations of the general form CDF.LOGISTIC(Tau - Beta-0j - X*Beta). Despite various attempts to reconstruct my CAT variable using the Mplus output, my estimates still appear to be quite a bit off. Does this degree of mismatch appear reasonable?

Provided that my calculations are correct, the only other source of error I can think of is that the continuous variable, CONT, is actually Poisson-distributed, something I do not take account of in my MLSEM. Could this small omission be affecting my results so much ?


4. It's possible I'm misunderstanding your last point from July 3rd regarding "integrating over alpha_j" to get p(u* > 0 | X); but since my interest is in replicating my data rather than inference, and given that the WITHIN and BETWEEN level errors are independent, can't one simply use the estimated BETWEEN intercepts Beta-0j obtained from the EB residuals to verify the MLSEM? Skrondal and Rabe-Hesketh (2003) perform an integration of the type you appear to be mentioning in one of their examples, but in the service of obtaining a population average. Afshartous and de Leeuw (2005) make predictive inferences using garden-variety HLMs (continuous outcomes) without performing any such integration.

If numerical integration has to be used to average over alpha_j, it would seem that multilevel SEMs with ordinal outcome variables cannot be readily used to make predictive inferences. Is this your sense ?


Thank you.


REFERENCES; MATERIAL OF POTENTIAL INTEREST

Afshartous, David and Jan de Leeuw. 2005. "Prediction in Multilevel Models." Journal of Educational and Behavioral Statistics 30: 109-140.

Gibbons, Robert D. and Donald Hedeker. 1997. "Random Effects Probit and Logistic Regression Models for Three-Level Data." Biometrics 53: 1527-1537.

Gibbons, Robert D., Donald Hedeker, Sara C. Charles, and Paul Frisch. 1994. "A Random-Effects Probit Model for Predicting Medical Malpractice Claims." Journal of the American Statistical Association 89: 760-767.

Rabe-Hesketh, S. and Skrondal, A. 2001. "Parameterization of Multivariate Random Effects Models for Categorical Data." Biometrics 57: 1256-1264.

Skrondal, Anders and Sophia Rabe-Hesketh. 2003. "Some Applications of Generalized Linear Latent and Mixed Models in Epidemiology." Norsk Epidemiologi 13: 265-278.

Hedeker, Donald and Robert D. Gibbons. 1994. "A Random-Effects Ordinal Regression Model for Multilevel Analysis." Biometrics 50: 933-944.

Wong, George Y. and William M. Mason. 1985. "The Hierarchical Logistic Regression Model for Multilevel Analysis." Journal of the American Statistical Association 80: 513-524.

Also of potential interest (refer also to Skrondal and Rabe-Hesketh, 2003):

Agresti, Booth, Hobert, and Caffo. 2000. "Random Effects Modeling of Categorical Data." Sociological Methodology: 27-80 (Mark P. Becker, Ed.).

Wong, George Y. and William M. Mason. 1991. "Contextually Specific Effects and Other Generalizations of the Hierarchical Linear Model for Comparative Analysis." Journal of the American Statistical Association 86: 487-503.

Rabe-Hesketh and Skrondal have also recently authored a Chapman & Hall text on multilevel models that may be worth a look (although I have not had a chance to look at it myself).
 bmuthen posted on Wednesday, July 20, 2005 - 9:45 am
Let me first answer your point 1. Say that you have 2-level logistic regression with a random intercept and a random slope just like in UG ex 9.2. Say that the dependent variable is ordered polytomous. This model then has a set of thresholds to be estimated (which are not varying across level-2 units) and there is a random intercept which is taken to be normal with mean zero (the mean cannot be identified separately from the set of thresholds) and variance to be estimated. So with logit (unlike the fixed-effects-only probit of Mplus) you can have an intercept that varies across level-2 units. You can compute estimated scores on this intercept "factor" by requesting factor scores. Same for the random slope.
 bmuthen posted on Wednesday, July 20, 2005 - 6:48 pm
Here is more on point 1., also responding to point 2. The most straightforward Mplus setup I think would be:

%WITHIN%
CONT on X;
CAT on X;

%BETWEEN%
CONT on Z;
CAT ON Z;

ML estimation uses a logit link for CAT (assumed ordered polytomous) in line with my answer above (July 20, 09:45). This modeling gives a random intercept for CONT and a random intercept for CAT (both of which are modeled on the Between level, i.e. on level 2). Estimates of the level-2 values of these random intercepts are obtained when requesting factor scores.

From this model one can compute estimates of the marginal probability

(1) P(CAT | x, z)

or the conditional probability

(2) P(CAT | x, z, a_j),

where a_j is the estimated random intercept varying across the level-2 units j.
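Concretely, assuming the ordered-logit (proportional-odds) form with thresholds $\tau_1 < \tau_2$, (2) and (1) would look roughly like (a sketch, with $\psi_B$ the between-level residual variance and $\phi$ the normal density):

$$
P(\mathrm{CAT} \le k \mid x, a_j) \;=\; \frac{1}{1 + \exp\{-(\tau_k - a_j - \beta_W x)\}},
\qquad a_j = \beta_B z_j + \delta_j,\;\; \delta_j \sim N(0, \psi_B),
$$

with category probabilities obtained by differencing the cumulative probabilities, and

$$
P(\mathrm{CAT} \le k \mid x, z) \;=\; \int P(\mathrm{CAT} \le k \mid x, a)\,\phi(a;\, \beta_B z,\, \psi_B)\, da .
$$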

Because CAT is categorical, has a logit link and has a normally distributed intercept, (1) has to be obtained by numerical integration. This is quite feasible, although not included in Mplus. In contrast, (2) conditions on the intercept estimate for a specific level-2 unit j and can therefore be computed using a standard logistic regression expression. So the question is which probability is most useful for your purposes. Hope this is of some help. I will also take a look at the JEBS article you mention since I haven't read this.
 bmuthen posted on Monday, July 25, 2005 - 6:12 pm
I realize that the current Mplus version does not *directly* give between-level factor scores for random intercepts when integration is required (as with categorical outcomes). But the workaround that was suggested above will give it:

%Between%
cont on z;
i by cat@1;
cat@0;
i on z;

So here i is a factor and it is the random intercept for which you get scores for each between unit when requesting factor score computations. Now if you want only the residual of i in the regression on z, then you have to subtract the beta*z term for each between unit outside Mplus.

Mplus just created a little Excel routine to compute probabilities for cat in these models - that is P(cat | x, z). It involves numerical integration but is perfectly doable.
 anonymous posted on Wednesday, November 02, 2005 - 6:32 am
Hello Linda, hello Bengt,

I would like to detect multivariate outliers within non-independent data. Is there a way to compute Mahalanobis distances with Mplus when using "Type=Complex" for clustered data?

Many thanks for the support here in the Mplus discussion, it is invaluable!!
 Linda K. Muthen posted on Wednesday, November 02, 2005 - 6:52 am
Several outlier detection measures including Mahalanobis distance will be available in Version 4.
 Scott R. Colwell posted on Wednesday, March 15, 2006 - 8:10 am
Regarding detection measures for outliers in version 4. When you use:

PLOT: TYPE IS PLOT3;
PLOT: OUTLIERS ARE MAHALANOBIS;

The graphs available that I get under the "V" button are:

Histograms (sample values, outliers, estimated values)

Scatterplots (sample values, outliers, estimated values)

When I view these graphs, are they only showing me the outliers? I wonder this since it only shows me a portion of the total sample.
 Linda K. Muthen posted on Wednesday, March 15, 2006 - 9:14 am
When you ask for histograms or scatterplots, the outliers are available as variable choices. When you choose these variables, you should see the entire sample. If you do not, you should send your input, data, output, and license number to support@statmodel.com. Note that it is not necessary to have PLOT twice.
 Liu Xiao posted on Monday, April 23, 2007 - 5:10 am
Hi, Dr. Muthen, I am doing a hierarchical model.
analysis: type=twolevel random missing;
model:
%within%
s | y on x;
%between%
s on z;
y on z;
Because TYPE=RANDOM does not allow STANDARDIZED in the OUTPUT command, is it possible to get the standardized values of s and of y's intercept? Thank you very much.
 Linda K. Muthen posted on Monday, April 23, 2007 - 8:03 am
Standardized parameters are not given in this situation because the variance of y varies for each value of x. This makes it unclear what variance to use for standardization. This is a research topic.
 c parker posted on Friday, July 13, 2007 - 11:40 am
I have fit a multilevel model with individuals nested within neighborhoods. At level 1 I have a predictor of family type (6 categories, so I am working with 5 dummy variables at level 1). At level 2 I have a predictor of neighborhood type (3 categories, so I am working with 2 dummy variables at level 2). I have performed analyses using the raw metric, grand-mean centering, and group-mean centering, and am looking at a random intercept model.

I have run several analyses and have changed the reference categories to verify that my results are consistent (e.g., a. using family type 1 and neighborhood type 3 as the reference groups (i.e., the omitted dummy codes) in one analysis, b. using family type 2 and neighborhood type 3 as the reference groups in another analysis). It is my understanding that the coefficient associated with family type 2 from analysis a should be equal but of opposite sign to the coefficient associated with family type 1 from analysis b.

I've obtained parameter estimates consistent with this when using group-mean-centered dummy variables and dummy variables in their raw metric. However, when using grand-mean-centered dummy variables, I am not obtaining parameter estimates consistent with this. Since grand-mean centering is simply a rescaling of the variables, shouldn't I obtain results that follow the equal-but-opposite-sign pattern described above? Any feedback would be greatly appreciated.
 Linda K. Muthen posted on Monday, July 16, 2007 - 9:40 am
I am not familiar with centering dummy variables because the numbers represent categories.
 Joy Oliver posted on Monday, August 13, 2007 - 12:40 pm
Hi, I am very new to MPlus, and I am getting an error statement from what I thought was a relatively simple multilevel regression. It is posted below. Could you help me understand what this means and how I can fix it? I have increased the iterations and nothing seems to work.
Thanks!

THE ESTIMATED BETWEEN COVARIANCE MATRIX IS NOT POSITIVE DEFINITE AS IT SHOULD BE. COMPUTATION COULD NOT BE COMPLETED. PROBLEM INVOLVING VARIABLE VI [Within-Level Y]

THE MODEL ESTIMATION DID NOT TERMINATE NORMALLY DUE TO AN ERROR IN THE COMPUTATION. CHANGE YOUR MODEL AND/OR STARTING VALUES. THE H1 MODEL ESTIMATION DID NOT CONVERGE. SAMPLE STATISTICS COULD NOT BE COMPUTED. INCREASE THE NUMBER OF H1ITERATIONS.
 Linda K. Muthen posted on Tuesday, August 14, 2007 - 2:03 pm
Please send your input, data, output, and license number to support@statmodel.com.
 Kajsa Yang-Hansen posted on Friday, February 01, 2008 - 5:49 pm
Dear Bengt and Linda,
I plan to do some multilevel analysis with IEA PIRLS data. The International Association for the Evaluation of Educational Achievement (IEA) conducts the Progress in International Reading Literacy Study (PIRLS) to examine trends in the reading achievement of 9/10-year-olds around the world. Data collection is done every 5 years, and the data are obtained by a stratified two-stage cluster sampling scheme. The first PIRLS was in 2001, and now the PIRLS 2006 data are available. 28 countries took part in both studies, and the two studies used the same questionnaire instruments. So we can say that at the country level it is a longitudinal design, but the lower levels (student, classroom, and school) are not (we have the same variables but different individuals).

With that background, my first question is:

I want to look at the effects of changes in some reading-related variables, such as SES, early home reading activities, school resources, etc., on changes in reading achievement (by change I mean the differences in the scores of the reading-related variables and of reading achievement between 2001 and 2006). Is it possible to do any multilevel analysis of change in Mplus?
(continue in the next post)
 Kajsa Yang-Hansen posted on Friday, February 01, 2008 - 5:51 pm
If not, my second question:
I want to do a multilevel analysis with the reading-related variables at the student, teacher, and school levels to predict reading achievement, and I want to compare the 28 countries simultaneously in the model. Should I create a set of dummy variables, one for each country, and bring them into the model, or are there other solutions in Mplus?

I am looking forward to your suggestions. Thanks in advance.

Best regards,
Kajsa.
 Bengt O. Muthen posted on Friday, February 01, 2008 - 6:31 pm
I don't think you can do much of a change analysis here. If you aggregate your variables to the country level - i.e. use country-level variables - you are right that you have longitudinal data for the 28 countries, but only 2 timepoints. That only affords say a random intercept, fixed slope model.

As for the second question, I would use 27 dummy variables for country.
 Xiaorui  posted on Friday, April 17, 2009 - 9:00 am
I am using multilevel regression analysis; the model runs normally except for a warning. Can I fix it, and can these results be used?

WARNING: THE MLR STANDARD ERRORS COULD NOT BE COMPUTED. THE MLF STANDARD ERRORS WERE COMPUTED INSTEAD. THE MLR CONDITION NUMBER IS -0.128D-03.
PROBLEM INVOLVING PARAMETER 35 {which is psi (c1r3, between-level)}. THIS MAY BE DUE TO NEAR SINGULARITY OF THE RANDOM EFFECT VARIANCE/COVARIANCE OR INCOMPLETE CONVERGENCE.


The syntax is:
ANALYSIS:
LOGCRITERION=.002;
type=twolevel;
SDITERATIONS=200;
H1iterations=10000;
MITERATIONS=50000;

MODEL:
%within%
C1R3 on ms g;
C2R3 ON ms g C1R3;
C4R3 ON ms g C2R3;
C5R3 ON ms g C4R3;
C6R3 ON ms g C5R3;

%between%
C1R3 on MS ZMC1;
C2R3 ON ms ZMC2;
C4R3 ON ms ZMC4;
C5R3 ON ms ZMC5;
C6R3 ON ms ZMC6;

Thank you very much.

Xiaorui
 Bengt O. Muthen posted on Friday, April 17, 2009 - 1:38 pm
The results are probably ok here, but if you want to make sure you need to send your input, output, data and license number to support@statmodel.com.
 Calvin D. Croy posted on Wednesday, May 06, 2009 - 11:46 am
We surveyed people with stratified random sampling, where the strata were combinations of gender and age. Then we constructed variables from census data that show the proportion of each blockgroup population that has different characteristics. We appended the census blockgroup proportions to the survey participants by matching on blockgroup (var BlockgID). For some blockgroups we have just one observation; for others we have many.

I want to fit a two-level random intercept, random slope model. I want to see how individual characteristics (e.g., age) influence a continuous dependent variable (level 1), and I also want to see how the level-1 intercept and slopes depend on the census proportions (the level-2 models).

Our obs are correlated because some occur in the same blockgroup. However, cluster sampling was not used -- our sampling made no use of blockgroups.

1. Do I need to indicate that the obs are grouped within blockgroups by specifying Cluster = BlockgID?

2. Do I need to somehow indicate that the Level 2 proportion variables represent blockgroups?

3. Or can I just skip mentioning anything about blockgroups and specify the stratification and weight variables?

4. Should I specify Type = Complex or Type = twolevel random? If it makes a difference, why is one of these types preferred for this analysis?

Thanks so very much for your assistance!
 Linda K. Muthen posted on Thursday, May 07, 2009 - 10:28 am
Even though your sample consists of individuals from different blockgroups, if they were not actually sampled from blockgroups, the CLUSTER option is not appropriate. It sounds like you would need to use TYPE=COMPLEX with the STRATIFICATION and WEIGHT options.
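In input terms that would be something like the following minimal sketch (the variable names are placeholders for your stratum, weight, outcome, and covariates):

VARIABLE: STRATIFICATION = str;
          WEIGHT = sitewt;
ANALYSIS: TYPE = COMPLEX;
MODEL:    y ON age pctrent;     ! individual and appended census covariates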
 Calvin D. Croy posted on Friday, August 07, 2009 - 4:05 pm
Linda, thanks for your reply. I hope my May 6 post above showed that we want to examine random effects representing variation in the intercepts and coefficients across the block groups, even though we did not sample using the block groups.

1. If I just specify type = complex rather than type = twolevel random, how do I tell Mplus that the grouping variable for the random effects is the blockgroup ID variable (BlockgID) without listing it in the cluster statement (which you said I don't need)? I couldn't find any examples in Chpt 9 of the user's manual about how to run this type of analysis using type = complex. All the multilevel analyses shown there (that I could understand) used type = twolevel.

2. Please consider the following multilevel model where depression and drink are binary variables:

Level 1: depression = b1*Drink
Level 2: b1 = b2*(Pctrenters in blockgrp) + u

With type = twolevel random I'd list Drink on the WITHIN statement and Pctrenters on the BETWEEN statement. How do I do that using just type = Complex as you've said I should do?

3. Do I need to do anything special so that with FIML no observations will be omitted because of missing values for depression, Drink, or Pctrenters?


Thanks for your help!
 Calvin D. Croy posted on Friday, August 07, 2009 - 4:15 pm
P.S. Could you show me the syntax I should use in my input file for the simple two level model in my above post that predicts the occurrence of depression?

Thanks again so much. Much appreciated!
 Linda K. Muthen posted on Friday, August 07, 2009 - 6:13 pm
If you want to estimate random intercepts and random slopes, you can use TYPE=TWOLEVEL with blockgroups as a cluster variable. See Example 9.2.
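Along those lines, a minimal sketch for the depression/Drink/Pctrenters model described above (names shortened to Mplus-legal placeholders; treat this as a sketch rather than verified syntax):

VARIABLE: CLUSTER = blockgid;
          CATEGORICAL = depress;
          WITHIN = drink;
          BETWEEN = pctrent;
ANALYSIS: TYPE = TWOLEVEL RANDOM;
MODEL:
%WITHIN%
b1 | depress ON drink;        ! level-1 slope, random over blockgroups
%BETWEEN%
depress ON pctrent;           ! level-2 model for the random intercept
b1 ON pctrent;                ! level-2 model for the slope: b1 = b2*Pctrenters + u
depress WITH b1;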
 Calvin D. Croy posted on Monday, August 10, 2009 - 9:02 am
Thanks for the clarification, Linda. I was just confused by your earlier reply saying to avoid specifying a cluster variable.

Could you also answer my question #3 above? It appears that if I just wanted to run a one-level logistic regression under FIML, to avoid having observations omitted because of missing values on the predictors, I would have to bring the predictors into the model by making assumptions about their distributions (e.g., normality) by mentioning their variances in the model statement.

Will I have to do this "bringing the predictors into the model" in my twolevel random intercepts and random slopes logistic regression model?

If yes, what syntax should I use in order to tell Mplus that the dichotomous predictors should follow a binomial distribution and not a normal one?

Thanks again for your help!
 Linda K. Muthen posted on Monday, August 10, 2009 - 10:05 am
In all models except for all continuous outcomes and TYPE=GENERAL, if you don't want observations with missing values on covariates eliminated, you need to mention their variances in the MODEL command.
 Calvin D. Croy posted on Monday, August 10, 2009 - 12:03 pm
Linda, thanks for the confirmation.

I'm sorry, but I need some more hand holding. It would appear that if I only mentioned the variances of the dichotomous predictors in the Model command that Mplus would assume they're normally distributed. Right?

Rather than assuming they're normally distributed, it appears I need to tell Mplus that the dichotomous predictors follow a Bernoulli distribution with variance = p(1-p)? How do I specify the value of p (the proportion of successes) or their actual variance p*(1-p) in my Mplus syntax?

I know in multiple imputation that dichotomous variables are sometimes treated as normally distributed, but if I don't want to do that, what syntax should I use in my random intercepts and random slopes model so that observations with missing values for dichotomous predictors (e.g. gender) aren't omitted?

I appreciate your continuing guidance on this issue!
 Linda K. Muthen posted on Tuesday, August 11, 2009 - 10:57 am
In regression, all covariates are treated as continuous whether they are binary or continuous. You should not specify anything about their scale. You have two choices with covariates that have missing data. You can estimate the model conditioned on the covariates, in which case all observations with missing values on one or more covariates will be eliminated from the analysis, or you can bring the covariates into the model and make distributional assumptions about them.
 Calvin D. Croy posted on Wednesday, August 12, 2009 - 2:21 pm
Linda, I need help understanding your answer. Does "and make distributional assumptions about them" mean binary predictors must be treated as normally distributed in an Mplus FIML logistic reg to avoid listwise deletion?

Did Bengt suggest below that normality was just one possible solution under FIML for dichotomous predictors? Or is normality for all vars assumed with FIML because it's the basis of the likelihood function for each observation? Could you or Bengt please clarify?

How egregious is it to avoid omitting obs in logistic regression by assuming normality for dichotomous predictors? Would reviewers say we made an absurd distribution assumption in our multilevel model?

As Bengt alluded, people often impute binary variables via Markov chain Monte Carlo though it assumes multivariate normal data. Imputation with regression switching (chained equations, MICE, ICE) was invented to address the normality assumption. Thus maybe some people wouldn't accept treating binary covariates as normally distributed. Your thoughts?

Thanks for your attention!

---------------
Bengt’s 1-17-06 partial reply to a previous post of mine:


Missingness in covariates can be handled by adding to the original logistic regression model an assumption of (for example) normality for the covariates. Imputation techniques often use this assumption as a proxy even when some covariates are dichotomous.
 Bengt O. Muthen posted on Wednesday, August 12, 2009 - 4:42 pm
Linda and I are saying the same thing here - by bringing the covariates into the model mentioning their means or variances, Mplus treats the covariates as continuous-normal.

There are imputation programs such as Schafer's that acknowledge that not all variables to be imputed are continuous-normal but that some variables are categorical. I don't know how wrong you will be ignoring the categorical nature or how sensitive reviewers are to ignoring it. Perhaps others have experience?

Related to this, you don't want to put categorical covariates on the CATEGORICAL= list because then you change the model, no longer conditioning on the covariates.
 Calvin D. Croy posted on Thursday, August 13, 2009 - 11:40 am
Bengt, thank you very much for the explanation.

When I attempted to run my multilevel logistic regression, I got this error message:

"Clusters are not nested within strata. Each stratum must contain unique cluster IDs. Cluster ID 1 appears in more than one stratum."

In this analysis I listed the variable STR on the Stratification = statement, SITEWT on the Weight = statement, and BGID on the Cluster = statement (based on Linda's Aug 7 6:13PM reply above). The values of STR identify strata formed from combinations of gender and age ranges. The values of BGID identify blockgroups. Our data were collected by interviewing people using stratified sampling within the gender x age strata that ignored their blockgroup membership, and now we want to examine variation in the intercepts and slopes across census blockgroups.

How can I do this in Mplus?
 Bengt O. Muthen posted on Thursday, August 13, 2009 - 5:22 pm
Typically, cluster units are different in different strata with strata being different geographical regions. Your application seems different because the same cluster unit appears in several different strata. As a simplified approach, perhaps you should use cluster=bgid and let gender x age groups be handled by covariates.
 Calvin D. Croy posted on Monday, August 17, 2009 - 9:15 am
Bengt, the approach you suggest of including the gender x age groups as covariates is exactly the same suggestion we got a while back from the support service for MLwiN, who said that MLwiN could not allow for stratification variables. Thus we thought we'd try Mplus for our multilevel modeling.

Since you've suggested including the gender x age groups as covariates "as a simplified approach", is there an "unsimplified" approach in Mplus that will take our stratified sampling into account so that the standard errors in our multilevel model will be as small as possible?

The principal investigator for this project, who has considerable experience with MLwiN, thought that Mplus might provide a superior analysis.

Thanks again for your help. It is truly appreciated!
 Tihomir Asparouhov posted on Monday, August 17, 2009 - 11:02 am
There does not seem to be an established multilevel method for handling non-nested structures like these (where the sampling and the modeling are not nested), and one would have to choose between these two slightly disadvantaged alternatives.

Alternative 1. Split each block into gender-by-age subblocks and have a different random effect for each subblock. You do this by adding the following command in the DEFINE section (assuming BGID < 10000):
BGID = BGID + 10000*STR;
(see the sketch at the end of this post). This is actually a more general multilevel model than the one you want because it has an individual random effect for each subblock (the model you want is the one where each block has one random effect, which is the same as highly correlated subblock random effects).

Alternative 2. Have one random effect for each block. Since Mplus does not support non-nested structures you have no choice but to drop the strata variable and make it a covariate (and possibly add interactions between the strata and other covariates where you think there is strata advantage).
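For Alternative 1, the DEFINE line would sit alongside the existing setup, roughly as follows (a sketch; keep the rest of the VARIABLE and ANALYSIS commands as in the original input):

DEFINE:   BGID = BGID + 10000*STR;   ! unique subblock ID (stratum x block)
VARIABLE: CLUSTER = BGID;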
 Tihomir Asparouhov posted on Monday, August 17, 2009 - 11:02 am
One has to keep track of what the stratification really delivers - very often it's not as helpful as one would think. So keep track of the design effect (and maybe run some single-level models first to evaluate the design effect). Careful consideration should help you make the right choice. If you don't have a sizable design effect you can pursue Alternative 2 without adding the strata variable as a covariate and still have a simple model. The disadvantage of Alternative 1 is that you get more but smaller clusters that may have bigger measurement error for the random effect, which will counteract the gain from the stratification.
 Calvin D. Croy posted on Monday, August 17, 2009 - 4:10 pm
Thank you for your ideas. They give me a lot to think about!
 Susan M.  posted on Friday, April 23, 2010 - 8:38 am
I have a very general question:
I have been using SAS for an HLM model that has repeated measures for patients nested within doctors nested within clinics. This nested model can't run due to memory constraints (using PROC MIXED or NLMIXED). The newer SAS procedure (HPMIXED) will allow continuous-outcome models to run because it uses sparse matrix techniques. However, the investigator really prefers nonlinear outcome models.
I was curious whether Mplus might use estimation methods that would enable these particular nested nonlinear models to run, but now that I am reading more in the manual I am concerned that I will encounter similar problems. At least my current thinking is that my situation will be most similar to examples that carry the note below.
"* Example uses numerical integration in the estimation of the model. This can be computationally demanding depending on the size of the problem"
In addition, as I was playing with a very simple model just using clinic nestings, it appeared I would need to create dummy variables for the 20 clinics. That is certainly fine and doable, but I have hundreds of doctors and many thousands of patients…(which wouldn’t be doable). Am I missing something in my understanding?
I don’t expect a detailed answer so much as I am seeking insight as to whether other folks have successfully created large population nested models with categorical outcomes in MPLUS. Thank you !
 Linda K. Muthen posted on Sunday, April 25, 2010 - 11:36 am
I'm not sure whether there would be problems in Mplus. If you send an Mplus input and your data to support@statmodel.com, I can try it and see.

With categorical outcomes and maximum likelihood, numerical integration is needed only if the model includes latent variables with categorical factor indicators.
 Mathias Höglund posted on Saturday, May 01, 2010 - 6:59 am
Hello mplus team. I've been using your software for 6 months and it's really great.

I have a question about multilevel modeling. I've seen more and more HLM models in the management literature lately. In some cases researchers tend to regress unit level variables on individual level variables directly instead of separating the variance of the individual variable to within and between. I'm a bit surprised since in my mind this procedure could bias the relationship by either over or underestimating the effect. Am I mistaken thinking that a group level variable should only influence the intercept or slope of the group rather than the variance of individuals?
 Linda K. Muthen posted on Sunday, May 02, 2010 - 8:50 am
In Mplus a between-level variable cannot be regressed directly on a within-level variable. See Examples 9.1 and 9.2 of the user's guide for further information about how this is handled in Mplus.
 Mathias Höglund posted on Sunday, May 02, 2010 - 11:11 am
Thank you Linda for your quick answer. Sorry for not being more specific. What I'm trying to get my head around is whether the results would be biased if I instead ran the analysis at one level as TYPE = GENERAL, where I regress a variable measured at the between level on within-level observations.
 Mathias Höglund posted on Sunday, May 02, 2010 - 12:49 pm
Sorry about the numerous messages. However, I'm trying to figure out when to use group- and grand-mean centering. I read in an article that grand-mean centering can sometimes lead to biased results in a between-level mediation analysis where the effect of a variable measured at the group level on an outcome measured at the individual level is mediated by a variable measured at the individual level. However, I could not apply group-mean centering to the mediator because it seems group-mean centering can only be applied to X variables.
 Linda K. Muthen posted on Sunday, May 02, 2010 - 1:30 pm
I think that the point estimate will be correct but that there will be a distortion of the standard errors. You can see if this is true by generating data where y is a between variable and x is both a between and a within variable using TYPE=TWOLEVEL, for example,

model population:
%within%
[x@0]; x@1;
%between%
y on x*1 ;
y*.5;

and save it. See mcex9.1.inp and Example 12.6 in the user's guide.

In a second step you can do an external Monte Carlo where you analyze the data as TYPE=GENERAL.
 Mathias Höglund posted on Tuesday, May 11, 2010 - 10:54 am
Thank you Linda very much! It works as you suggested. The standard errors differ between the models.

Further, I noticed that when both X and Y variables are modeled on within and between the relationship between the two can be significant within and not significant on between or vice versa.
 Rob Dvorak posted on Thursday, October 21, 2010 - 10:15 am
Hi,

I have a general analysis question. I'm estimating a two-level model, with an interaction between 2 of the level-1 slopes. The output indicates a random variance component for the interaction, and I am wondering if allowing the interaction slope to vary is OK. I'm wondering if there is some sort of dependency since it's an interaction, or if it's simply handled as if it were any other model predictor. Similarly, what if I added a quadratic slope? If this had a random variance component, would I allow it to vary even though the slope it was built from is in the model?

Thanks,
Rob
 Linda K. Muthen posted on Thursday, October 21, 2010 - 2:32 pm
It sounds to me like you have a model with three random slopes: one for one level 1 variable, a second for another level 1 variable, and a third for the interaction between the first two variables. These should be correlated on level 2 as should a quadratic growth factor if you have one.
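A minimal sketch of that setup, assuming the interaction is formed in DEFINE (all names are hypothetical):

DEFINE:   x1x2 = x1*x2;              ! level-1 interaction term
VARIABLE: CLUSTER = clus;
          USEVARIABLES = y x1 x2 x1x2;
          WITHIN = x1 x2 x1x2;
ANALYSIS: TYPE = TWOLEVEL RANDOM;
MODEL:
%WITHIN%
s1 | y ON x1;
s2 | y ON x2;
s3 | y ON x1x2;
%BETWEEN%
y s1 s2 s3;                          ! intercept and slope variances
s1 WITH s2 s3;                       ! random effects covary on level 2
s2 WITH s3;
y WITH s1 s2 s3;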
 Anne Chan  posted on Sunday, December 12, 2010 - 4:31 pm
Hello! I run a two-level regression, including "ANALYSIS: TYPE = TWOLEVEL RANDOM MISSING;" in my syntax.

There are missing data in my level 2 data and I found that MPLUS did not include the cases with missing data in the analyses. (There was a warning statement in the output: Data set contains cases with missing on x-variables. These cases were not included in the analysis.)

May I ask whether Mplus excludes cases with missing values at level 2 (when these variables are predictors in the model)?
 Linda K. Muthen posted on Sunday, December 12, 2010 - 5:34 pm
Missing data theory does not apply to observed exogenous variables. The model is estimated conditioned on these variables. You can mention the variances of the observed exogenous variables in the MODEL command; these variables will then be treated as dependent variables, distributional assumptions will be made about them, and cases with missing values on them will not be excluded.
 Peggy Clements posted on Sunday, March 13, 2011 - 6:56 am
First of all, I apologize for having asked this same question yesterday (under a different thread).

So -- Linda's post (above) from 12/12 answers the question I posted yesterday about why cases are being excluded from the analysis.

However, I don't understand the practical implications of Linda's answer to Anne Chan.

How would I "mention the variances of the observed exogenous variables in the MODEL command"?

My understanding of Linda's 12/12 response is that if I did this, the cases would not be excluded (am I right about this? that she is saying that by mentioning the variances of the observed exogenous variables in the MODEL command, these variables will be treated as dependent variables and, as a result, the cases will not be excluded?).

Thanks.
 Linda K. Muthen posted on Monday, March 14, 2011 - 12:03 pm
You mention the variance by mentioning the variable name in the MODEL command. This is how variances/residual variances are referred to in Mplus. If you do this, the variables will be treated as dependent variables and the observations with missing data on them will not be excluded from the analysis.
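A minimal sketch of what that looks like for a two-level model with level-2 covariates z1 and z2 (names hypothetical; note that normality-type distributional assumptions are then made about z1 and z2):

MODEL:
%WITHIN%
y ON x;
%BETWEEN%
y ON z1 z2;
z1 z2;          ! mentioning the variances brings z1 and z2 into the model,
                ! so cases with missing values on them are retained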
 Jonny Nesmith posted on Monday, May 30, 2011 - 1:06 am
Hello, I have a question about mixed effects logistic regression with nested random effects. In my data I have individual trees sampled within plots that are nested within prescribed fires. I have been modeling the effects of fire on tree mortality in R using the lme4 package and am trying to replicate the results in Mplus. However, I am getting different results in R and Mplus and think it is due to how I have specified the model.

My Mplus script is:

VARIABLE: CATEGORICAL = Status;
USEVAR = Status4 DBH VolSc;
WITHIN = DBH VolSc;
STRATIFICATION = Site;
CLUSTER = Plot;
ANALYSIS: TYPE = TWOLEVEL COMPLEX;

MODEL:
%WITHIN%
Status4 ON VolSc DBH;
%BETWEEN%

The Mplus output is:
Akaike (AIC) 349.021
Within Level
STATUS4 ON Estimate S.E.
VOLSC -0.047 0.004
DBH 0.027 0.010
Between Level
Thresholds
STATUS4$1 -1.149 0.445
Variances
STATUS4 0.299 0.259

The output I get from R is:
AIC BIC logLik deviance
348.4 368.8 -169.2 338.4
Random effects:
Groups Variance Std.Dev.
PlotID:Site 0.094263 0.30702
Site 0.322407 0.56781
Fixed effects:
Estimate Std. Error
(Intercept) 1.466283 0.467578
DBH 0.024831 0.005512
VolSc -0.048221 0.004925

Thank you for your help!
 Bengt O. Muthen posted on Monday, May 30, 2011 - 7:33 am
You could try sharpening the convergence criteria in the programs. In Mplus, you would use mconvergence, say

mconv = 0.00001;

You can tell which program reaches the best solution by comparing their loglikelihood values (high is good).
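In the posted script this would go in the ANALYSIS command, for example:

ANALYSIS: TYPE = TWOLEVEL COMPLEX;
          MCONVERGENCE = 0.00001;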
 Jonny Nesmith posted on Monday, May 30, 2011 - 11:32 am
I sharpened the convergence criterion as you suggested, but I think there is a more fundamental problem with how I am specifying the model in Mplus that I am not recognizing. I think I am somehow specifying the models differently in Mplus and R because the coefficient standard errors of the fixed effects (DBH, VolSc) are quite different and the variance estimates of the random effects are different as well.

In Mplus when I set
VARIABLE
STRATIFICATION = Site
CLUSTER = Plot

using TYPE = TWOLEVEL COMPLEX

Are both grouping variables being treated as random effects?

I also made sure to set REML=FALSE in the R script as I believe Mplus model estimation is based on ML correct?
 Linda K. Muthen posted on Monday, May 30, 2011 - 3:03 pm
No. When there is clustering due to both primary and secondary sampling stages, the standard errors and chi-square test of model fit are computed taking into account the clustering due to the primary sampling stage using TYPE=COMPLEX whereas clustering due to the secondary sampling stage is modeled using TYPE=TWOLEVEL.
 Jonny Nesmith posted on Monday, May 30, 2011 - 4:32 pm
OK, I see. I do get the same results between R and Mplus when I just use one grouping variable and TYPE=TWOLEVEL, so my problem definitely is related to how I am specifying the first order grouping variable (Site) in Mplus.

So my question is how can I specify in the model that both grouping variables should be treated as nested random effects?

I am not interested in Site or Plot effects, but I do want to account for the nested structure in the data so that I am accurately estimating the individual-level effects of fire damage and tree size on mortality. I know averaging the data at the Plot level is one option, but I don't think this is ideal and would like to avoid doing so if possible.

Thank you both for your prompt responses. Your help has been much appreciated!
 Linda K. Muthen posted on Tuesday, May 31, 2011 - 7:12 am
Both cannot be random in the current version of Mplus.
 Drew C. Coman posted on Thursday, September 08, 2011 - 7:52 am
Hello Dr. Muthen,

I am currently working on my dissertation, which entails an analytic approach involving a 2-2-1 multilevel SEM framework (MSEM). More specifically, I'll be assessing multilevel mediation at level 2. Level 1 involves student outcomes, and level 2 involves teacher/classroom predictors (commitment and burnout). My question is whether this type of approach is appropriate with N = 204 students at level 1 and J = 75 teachers at level 2. If so, do you happen to know of any supporting literature?

Additionally, the number of students nested within classrooms is unequal. Some classrooms have one student, while others have 3 to 4. Does this create any biases in estimating parameters, or can these biases be rectified by the planned MSEM approach? Thanks so much for the help!
 Bengt O. Muthen posted on Thursday, September 08, 2011 - 9:29 am
I think a good way to get more information on this is to do a Monte Carlo simulation study, which can be done in Mplus. See Chapter 12 of the Version 6 UG.

As an alternative to the Monte Carlo approach, there are articles on analytically-determined power by several authors including Raudenbush (Google it), but I am not sure they cover general enough situations to be helpful to you. And they do not cover quality of recovery of parameter estimates, SE estimates, and chi-square test of model fit - which you can get out of a Monte Carlo study.
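For instance, a Monte Carlo sketch for a 2-2-1 design of roughly that size might look like the following. All variable names and population values are hypothetical placeholders, and the 204 students in 75 classrooms are approximated by 21 classrooms of 2 students and 54 classrooms of 3.

MONTECARLO:
  NAMES = y m x;           ! y: student outcome, m: teacher burnout (mediator), x: teacher commitment
  NOBSERVATIONS = 204;
  NCSIZES = 2;
  CSIZES = 21 (2) 54 (3);  ! 75 clusters in total
  NREPS = 500;
  SEED = 4533;
  BETWEEN = x m;
ANALYSIS:
  TYPE = TWOLEVEL;
MODEL POPULATION:
  %WITHIN%
  y*1;
  %BETWEEN%
  x*1;
  m ON x*.4;
  m*.84;
  y ON m*.3 x*.1;
  y*.5;
MODEL:
  %WITHIN%
  y*1;
  %BETWEEN%
  m ON x*.4;
  m*.84;
  y ON m*.3 x*.1;
  y*.5;

The output then summarizes parameter and standard-error recovery and power across the replications.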
 Jaeeun Russell posted on Monday, October 17, 2011 - 9:11 pm
Hi, I just purchased Mplus and tried to run a two-level regression for my dissertation.

Cluster is SCHOOL

How do I write MODEL statement?

%WITHIN%
y on x;
%BETWEEN%
???

Example ex9.1a states:
y ON w xm;
and I'm not sure how to get w and xm from x.

Thank you so much!
 Bengt O. Muthen posted on Monday, October 17, 2011 - 9:35 pm
w is a cluster-level variable; if you don't have one, you ignore this. xm is the cluster-mean of x, which you can create by the cluster_mean(x) option (see the UG). If you don't want that, just say

%Between%
y;
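Putting those pieces together with the variables from your post, a sketch of the cluster-mean version might look like this (the derived variable xm is an assumption, created with the DEFINE command):

VARIABLE:  NAMES = school y x;
           USEVARIABLES = y x xm;   ! xm, created in DEFINE, goes last on the list
           CLUSTER = school;
           WITHIN = x;
           BETWEEN = xm;
DEFINE:    xm = CLUSTER_MEAN(x);
ANALYSIS:  TYPE = TWOLEVEL;
MODEL:
%WITHIN%
y ON x;
%BETWEEN%
y ON xm;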
 Jaeeun Russell posted on Saturday, October 22, 2011 - 11:27 pm
Thank you for your response!

I'm still struggling with the multilevel analysis.

Below is the incomplete statement I tried. These data are nested in 19 different courses. I don't have any independent variables at the cluster level. What would the BETWEEN statement be, and did I miss anything else?

TITLE: MULTI LEVEL;
DATA: FILE IS Motivation_path.csv;
TYPE IS INDIVIDUAL;
VARIABLE: NAMES ARE ID COURSE GRADE Int Tech_eff Pre_Eng Auto_Sup Const Post_Eng Pre_Mot Post_Mot;
USEVARIABLES Grade Tech_eff Auto_Sup Const Pre_Mot Post_Mot Int;
CLUSTER IS COURSE;
ANALYSIS: TYPE=GENERAL TWOLEVEL;
MODEL:
%WITHIN%
Post_Mot ON Int Pre_Mot Tech_eff Auto_Sup Const;
Grade ON Post_Mot;
%BETWEEN%
?????

Thank you so much!
 Bengt O. Muthen posted on Sunday, October 23, 2011 - 12:28 pm
%BETWEEN%
Post_Mot;
Grade;
Post_Mot WITH Grade;

For simplicity, you also want to add in the VARIABLE command:

WITHIN = Tech_eff Auto_Sup Const Pre_Mot Int;
 Jaeeun Russell posted on Sunday, October 23, 2011 - 3:36 pm
Thank you so much!
 Jaeeun Russell posted on Wednesday, October 26, 2011 - 7:37 pm
This time I have a question about a two-level SEM analysis. I don't have independent variables for the between level. I just want to test the proposed model at two levels because students are nested in different courses. Could you take a look at the BETWEEN statement?


WITHIN = Pre_Mot Tech_Eff Auto_S Const;
CLUSTER IS COURSE;
ANALYSIS: TYPE IS TWOLEVEL;
MODEL:
%WITHIN%
Const BY CON1-CON7;
Pre_Mot BY Pre_M1 Pre_M2;
Post_Mot BY Post_M1 Post_M2;
Auto_S BY AS Cont;
Post_Mot ON Pre_Mot Tech_eff Auto_S Const;
Grade ON Post_Mot;
%BETWEEN%
Post_M;
Grade;
Post_M WITH Grade;


Thank you so much!
 Linda K. Muthen posted on Thursday, October 27, 2011 - 10:44 am
It seems post_m should be on the BETWEEN list.

The best way to know if an input is correct is to run it and see if you get what you want.
 Jaeeun Russell posted on Monday, October 31, 2011 - 10:35 pm
I'm sorry to ask the same question again. This is my first time trying an SEM analysis, so I'm a beginner.

I have done a one-level (student-level) SEM just fine. But because students are nested in different courses, I would like to take the course effect into account when testing the model. I don't have separate variables at the between level.

Q1: Is two-level SEM in Mplus the right method to account for course effects?

Q2: When I tried the input below, an error came up saying that within-level variables cannot be used on the between level. What statements should then be at the between level?

Could you please help me? I'd greatly appreciate it.

WITHIN = CON1-CON7 Pre_M1 Pre_M2 Post_M1 Post_M2 AS1 AS2 Tech_eff;
CLUSTER IS COURSE;
ANALYSIS: TYPE IS TWOLEVEL;
MODEL:
%WITHIN%
Const BY CON1-CON7;
Pre_Mot BY Pre_M1 Pre_M2;
Post_Mot BY Post_M1 Post_M2;
AS BY AS1 AS2;
Post_Mot ON Pre_Mot Tech_eff AS Const;
Grade ON Post_Mot;

%BETWEEN%
Post_Mot;
Grade;
Post_Mot WITH Grade;
 Linda K. Muthen posted on Tuesday, November 01, 2011 - 7:05 am
If you don't put variables on either the WITHIN or BETWEEN list, they can be used at both levels. Please read Example 9.1. These issues are described.
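One way to apply this to the input above, as a sketch only (whether a between-level factor for Post_M1 and Post_M2 is warranted is a substantive choice, and the factor name Post_MB is hypothetical):

WITHIN = CON1-CON7 Pre_M1 Pre_M2 AS1 AS2 Tech_eff;   ! Post_M1, Post_M2, and Grade are left off both lists
...
%BETWEEN%
Post_MB BY Post_M1 Post_M2;
Grade;
Post_MB WITH Grade;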
 Jan Eichhorn posted on Tuesday, January 10, 2012 - 2:50 am
Hello,

I am running a 2-level multilevel model, and I have a latent variable at the aggregate level that I would like to interact, cross-level, with an observed binary variable at the individual level.

I would be grateful if you could let me know whether this is at all possible to do in Mplus (i.e., is there a way of combining TYPE=TWOLEVEL with TYPE=RANDOM in some way).

Thank you for your time,
Jan
 Linda K. Muthen posted on Tuesday, January 10, 2012 - 10:15 am
See Example 9.2 which illustrates a cross-level interaction.
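Adapting that example to a between-level latent moderator, a sketch might look like this (the indicator names u1-u4, the binary within-level covariate x, and the outcome y are hypothetical):

VARIABLE:  WITHIN = x;
           BETWEEN = u1-u4;
ANALYSIS:  TYPE = TWOLEVEL RANDOM;
MODEL:
%WITHIN%
s | y ON x;          ! random slope for the binary individual-level covariate
%BETWEEN%
fb BY u1 u2 u3 u4;   ! latent variable at the aggregate level
y ON fb;
s ON fb;             ! the cross-level interaction: fb moderates the within-level slope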
 ElineB posted on Friday, April 20, 2012 - 1:18 am
Hello,
I am trying to test 3 random slopes in a two-level, multivariate regression model with 3 dependent (binary) variables. I included WITH statements to estimate the covariances between the Y's on both levels. However, I get the following errors:

*** ERROR in MODEL command
Covariances for categorical, censored, count or nominal variables with
other observed variables are not defined. Problem with the statement:
Y1 WITH Y2
*** ERROR
The following MODEL statements are ignored:
* Statements in the WITHIN level:
Y1 with Y2;
Y2 with Y3;
Y1 with Y3;

In short, this is the syntax I used:
VARIABLE:
NAMES = Sample Y1 Y2 Y3 X1 X2 X3 X4;
CLUSTER = Sample;
USEVAR = Y1 Y2 Y3 X1 X2 X3 X4;
CATEGORICAL = Y1 Y2 Y3;
MISSING = ALL (9999);
WITHIN =X1 X2 X3 X4;

ANALYSIS:
ALGORITHM = INTEGRATION ;
INTEGRATION = 5;
TYPE = TWOLEVEL RANDOM;

MODEL:
%WITHIN%
Y1 on X1 X2 X3 ;
Y2 on X1 X2 X3;
Y3 on X1 X2 X3;
S1 | Y1 on X4;
S2 | Y2 on X4;
S3 | Y3 on X4;
Y1 with Y2;
Y2 with Y3;
Y1 with Y3;

%BETWEEN%
S1 with Y1;
S2 with Y2;
S3 with Y3;
Y1 with Y2;
Y2 with Y3;
Y1 with Y3;

OUTPUT:
SAMPSTAT;

I hope you can help me solve this problem. Thanks in advance!
 Linda K. Muthen posted on Friday, April 20, 2012 - 8:12 am
With maximum likelihood and categorical outcomes, you cannot use WITH to specify residual covariances. Each residual covariance requires one dimension of integration. You can use BY to specify them as follows:

f1 BY y1@1 y2;
f1@1; [f1@0];

You will find the residual covariance in the factor loading for y2.
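Applied to the three pairs in the input above, that might look like the following sketch; each added factor costs one extra dimension of numerical integration on top of the three random slopes, and I believe the factor means are zero by default at the within level, so the mean statements are omitted here.

%WITHIN%
f12 BY y1@1 y2;   f12@1;   ! the y2 loading is the residual covariance of y1 and y2
f13 BY y1@1 y3;   f13@1;
f23 BY y2@1 y3;   f23@1;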
 Marian Scholz posted on Wednesday, July 18, 2012 - 3:13 am
Hello,

I am handling a dataset with a hierarchical structure. In more detail: I have measures of different constructs at the day level (for example, states) and often the same measures as traits at the person level. My sample consists of 73 participants, each assessed on three days. Furthermore, I would like to control for common confounds like age, sex, and so on.

My problem is that I don't know how to include more than one level-2 predictor in my analysis. It would be great if you could help me out on this one.

Thanks in advance and a big compliment for keeping up with all these requests!
 Linda K. Muthen posted on Wednesday, July 18, 2012 - 11:40 am
In Mplus this would be a single-level model. See Example 6.10 which is a growth model with both time-invariant and time-varying covariates.
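In that single-level (wide-format) setup, a sketch along the lines of Example 6.10 might look like this (all variable names are hypothetical: y1-y3 are the state measure on the three days, a1-a3 are day-varying covariates, and trait, age, and sex are the person-level predictors):

MODEL:
  i s | y1@0 y2@1 y3@2;        ! growth factors over the three days
  i s ON trait age sex;        ! person-level (time-invariant) covariates, as many as needed
  y1 ON a1;
  y2 ON a2;
  y3 ON a3;                    ! time-varying (day-level) covariates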
 Jinni Su posted on Tuesday, August 28, 2012 - 7:55 am
Hi, Dr.Muthen,
I am running a multilevel regression model. In the output file, all S.E.s are 0 and all p-values are 1. Could you please provide some insight into how/why this happens? Here is part of the input:

Analysis:
Type = twolevel ;
h1iteration = 5000;
Model:
%within%
normuse on age gender black hispanic hmong asian drugdisa parinv pardisa peeruse famchaos ;

%between%
normuse on mage mwhite mdruguse mschcon mschprob mdrugdis ;

Thanks,
Jinni
 Linda K. Muthen posted on Tuesday, August 28, 2012 - 8:41 am
Please send the output and your license number to support@statmodel.com.
 Brian Feinstein posted on Monday, January 27, 2014 - 10:52 am
Hi all,

I'm running a simple linear regression (regressing a continuous variable onto a dichotomous variable), but using Mplus to account for a complex survey design.

How can I get Mplus to tell me the mean/SD for each level of the dichotomous predictor variable?

Thanks!
Brian Feinstein
 Linda K. Muthen posted on Monday, January 27, 2014 - 12:04 pm
Use the binary covariate as a GROUPING variable and do a TYPE = COMPLEX BASIC with no MODEL command.
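A sketch of that setup (the design variables strat, psu, wt and the group labels are hypothetical):

VARIABLE:  GROUPING = group (0 = no 1 = yes);   ! the dichotomous predictor
           STRATIFICATION = strat;
           CLUSTER = psu;
           WEIGHT = wt;
ANALYSIS:  TYPE = COMPLEX BASIC;
! no MODEL command; the sample statistics are reported separately for each group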
 SABA posted on Tuesday, December 15, 2015 - 6:55 am
Hi, I want to do a hierarchical regression model in Mplus. By this I do not mean a multilevel (hierarchical) model; I mean controlling for, or taking into account, the impact of different sets of independent variables on the dependent variable and the model. However, I could not find in the user's guide how to specify a model for that. Can Mplus do that? If yes, could you please tell me how to specify the model? Thank you.
 Bengt O. Muthen posted on Tuesday, December 15, 2015 - 2:27 pm
If it is regression or mediation, you simply add the variables as further x's.
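In practice that means running the blocks as separate analyses, with each new set of predictors added as further x's (hypothetical variable names; R-square is printed with the standardized output):

! Block 1
MODEL:   y ON x1 x2;
OUTPUT:  STANDARDIZED;

! Block 2 (a second run adding the next set of predictors)
MODEL:   y ON x1 x2 x3 x4;
OUTPUT:  STANDARDIZED;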
 Xiaoqiao posted on Wednesday, December 16, 2015 - 10:54 am
What about blocks? Is it possible to obtain indices of fit (e.g., R-square) for specific blocks of predictors and to evaluate the change in fit after the addition of a new block for statistical significance?
 Bengt O. Muthen posted on Wednesday, December 16, 2015 - 5:41 pm
No, that's not available.
 Olaug Strand posted on Wednesday, March 28, 2018 - 2:48 pm
The output of my stepwise regression model contains a correlation matrix as part of the sample statistics. The matrix does not come with p-values. How can I get this information?
 Bengt O. Muthen posted on Wednesday, March 28, 2018 - 3:51 pm
Currently this is obtained using a Model statement that has WITH for all variable pairs; look at the STDYX standardization.
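A minimal sketch with four hypothetical variables (the list-WITH-list shorthand requests all pairwise covariances):

MODEL:
  y1-y4 WITH y1-y4;
OUTPUT:
  STDYX;   ! the STDYX covariances are the correlations, reported with SEs and p-values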
 Bharath Shashanka Katkam posted on Friday, March 15, 2019 - 11:37 pm
Dear Ma'am,

I have just gone through the article "When to Use Hierarchical Linear Modeling" (Huta, 2014). In that article:
1) The Within-level Slopes are negative and are "100" in number (i.e., there are 100 Within-level Slopes)
2) Between-level Slope is positive
In the "Multilevel Structural Equation Modeling", there would be:
1) Within-level Structural Model
2) Between-level Structural Model
In the "Between-level Structural Model", there is a "Single Regression Coefficient" (which is positive in nature), between the "Latent Construct (DV)" & "Latent Construct (IV)".
But, in the "Within-level Structural Model", there are "100 Regression Coefficients" (i.e., 100 Within-level Slopes for 100 Individuals, which are negative in nature), between the "Latent Construct (DV)" & "Latent Construct (IV)".
My question is:
Q) How are the "100 Regression Coefficients" at the Within-level represented by a single value (i.e., as a single Regression Coefficient), as is done in the "Between-level Structural Model"?
Do we average all the "100 Within-level Regression Coefficients" and convert them into a single value?
 Bengt O. Muthen posted on Saturday, March 16, 2019 - 4:46 pm
The 100 regression coefficients are values on the Within-level random slope variable. This variable has a mean and a variance estimated on the Between-level.

If you are using Mplus, you might want to study our Short Course Topic 7 video and handout.
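In Mplus terms, that looks roughly like the following sketch (hypothetical names; the 100 individuals are the clusters):

ANALYSIS:  TYPE = TWOLEVEL RANDOM;
MODEL:
%WITHIN%
s | y ON x;   ! one slope value per individual
%BETWEEN%
[s];          ! the mean of the random slope: the single coefficient that is reported
s;            ! the variance of the slope across individuals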
 Bharath Shashanka Katkam posted on Saturday, March 16, 2019 - 11:03 pm
Thank you, Ma'am. Where do I get that? Could you please provide me the web-link??
 Bengt O. Muthen posted on Sunday, March 17, 2019 - 4:23 pm
Our Short Courses are found at

http://www.statmodel.com/course_materials.shtml
 Bharath Shashanka Katkam posted on Monday, March 18, 2019 - 1:16 am
Thank you, Ma'am.
 Marcus Pietsch posted on Tuesday, March 19, 2019 - 3:21 am
Hello,

For an analysis I used 3-level data (students, classes, schools) and applied Example 9.20 to the data.

The model could be summarized as follows: school variables (level 3) have an impact on individual achievement (level 1) mediated by classroom variables (level 2).

Unfortunately, I am somewhat lost with regard to the interpretation of the slopes s1 and s12.

For example, within my data I found that a level-3 variable is positively associated with slope s12 but has no significant association with slope s1.

If I understand example 9.20 correctly, slope s12 moderates the association of a variable at the second level and the intercept of the slope s1.

Further, slope s1 moderates the association between a level-1 independent variable and the intercept of a dependent variable.

Is this right?

Can you please provide some guidance with regard to the interpretation of the slopes s1 and s12 and/or refer me to a publication where such a model has been applied?

Thanks a lot!
 Bengt O. Muthen posted on Tuesday, March 19, 2019 - 1:51 pm
Send your output to Support along with your license number.
 Michael Covell posted on Thursday, August 01, 2019 - 1:15 am
I am regressing a between-level outcome on cluster means. It's a single-level regression. I want to know the degrees of freedom. How can I get this? The output provides the number of observations, but this includes all the level-1 cases. I need to know the number of cluster means included in the analysis in order to compute df. Any ideas?
 Michael Covell posted on Sunday, August 04, 2019 - 12:49 pm
I guess it would be "Number of clusters?"
 Bengt O. Muthen posted on Monday, August 05, 2019 - 12:13 pm
That's right.