Mplus Discussion >> Random Variances

Random Variances

Christoph Weber posted on Thursday, February 01, 2018 - 2:53 pm

Dear Mplus team,
Is it possible to specify a random variance in three-level models (a random Level 2 variance)?
Mplus starts but does not open an output file. If I open the output manually, it is truncated after the input commands.
Regards, Christoph

Bengt O. Muthen posted on Thursday, February 01, 2018 - 3:32 pm

We don't have that yet for 3-level models. The run should have stopped with a message.

Christoph Weber posted on Friday, February 02, 2018 - 12:57 pm

Thanks. I also tried to model the variance using model constraints, but specific constraints are also not allowed with TYPE=THREELEVEL.

Do you have any other suggestions for the following problem?

I have three-level data (students, classes, schools). I want to regress the L2 variance on L3 variables. The outcome is a continuous SES measure, so the L2 variance is a measure of within-school (between-class) segregation. Between-school differences in the L2 variance should capture school differences in within-school segregation, which in turn should be explained by school-level variables.

My next idea was to use a two-step approach. First, estimate a null model for SES and save the factor scores.
Then I would use the factor scores b2_ses and b3_ses to calculate the between-class/within-school variance for each school. Finally, I would use a standard regression at the school level. Is this a reasonable approach, or are there other (better) possibilities?
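
The variance calculation in the second step can be sketched outside Mplus. This is a minimal sketch, assuming the saved factor scores have already been reduced to one record per class with a school identifier. The record layout and the function name `between_class_variance_by_school` are hypothetical, and the sample variance is only one possible summary of between-class spread.

```python
from collections import defaultdict
from statistics import variance


def between_class_variance_by_school(class_rows):
    """Compute the spread of class-level scores within each school.

    class_rows: iterable of (school_id, class_id, b2_ses), one row per class
    (hypothetical layout, not an Mplus file format).
    Returns {school_id: sample variance of b2_ses across that school's classes}.
    Schools with fewer than two classes are skipped because the sample
    variance is undefined for a single value.
    """
    scores = defaultdict(list)
    for school_id, _class_id, b2_ses in class_rows:
        scores[school_id].append(b2_ses)
    return {s: variance(v) for s, v in scores.items() if len(v) >= 2}


# Made-up scores for three schools; school "C" has only one class.
rows = [
    ("A", 1, -0.5), ("A", 2, 0.5),
    ("B", 1, -1.0), ("B", 2, 0.0), ("B", 3, 1.0),
    ("C", 1, 0.2),
]
school_variances = between_class_variance_by_school(rows)
```

The resulting school-level values could then serve as the outcome in a school-level regression. Note that this treats the factor scores as known quantities and ignores their estimation uncertainty.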

And a final question: do I understand correctly that the factor scores in the output (SES, b2_ses, b3_ses) refer to the decomposition x_ijn = xw_ijn (SES) + xb2_jn (b2_ses) + xb3_n (b3_ses)?

Thanks

Tihomir Asparouhov posted on Friday, February 02, 2018 - 11:09 pm

The two-step approach sounds reasonable. Instead of running a three-level model, I would recommend running a two-level model with a random variance (drop the top level) and getting plausible values for the random variance. You can then run the second step using type=imputation and regress these plausible values on predictors. See the plausible values note if you are not familiar with this process: http://statmodel.com/download/Plausible.pdf
Add the top level to the second step, so your second-step model will be run either as type=complex or as type=twolevel with the ML estimator.

The answer to the last question is yes.

Christoph Weber posted on Saturday, February 03, 2018 - 11:44 am

Thanks, but I'm not sure I understand this correctly. The random variance in the two-level model will capture the variance differences between classes (L2). In the next step I would drop L1 and regress the L2 PVs on L3 predictors. Thus, I would explain why some classes (L2 units) are more SES-homogeneous than others. But this way I can't tell whether SES homogeneity is due to school SES homogeneity or due to within-school segregation (i.e., a heterogeneous school but homogeneous classes), can I?
I want to explain why in some schools the (latent) class means deviate more from the (latent) school mean than in other schools.

Christoph Weber posted on Tuesday, February 06, 2018 - 3:18 am

Just to double-check: is my interpretation of the random variance (in the two-level case) correct?

And are PVs generally "better" than ML or IRT factor scores?

Thanks
Christoph

Tihomir Asparouhov posted on Tuesday, February 06, 2018 - 7:42 pm

Maybe you have to estimate the latent class means in the first step and then use these PVs in the second step, where you would have to run a two-level model with a school-specific variance. PVs are better than a single estimate such as factor scores. You can also just use sample statistics, at least for exploratory purposes, such as the sample variance within classroom.

Christoph Weber posted on Wednesday, February 07, 2018 - 2:16 pm

Thanks a lot, that helps!
One final question: the outcome has a lot of missing values (25%), so I would have used multilevel MI to deal with the missing data. But after MI I need Bayes estimation (to estimate the PVs), which is not suited to multiple data sets.
What would you recommend for this problem? Is it possible to use auxiliary variables as missing-data correlates in the PV estimation step?

Tihomir Asparouhov posted on Wednesday, February 07, 2018 - 3:43 pm

There are two ways to proceed, I think.

1. Avoid the MI. Instead, just use the specification for missing data
variable:
missing=all(999);
(that way the imputation is internal).

2. Alternatively, you can in principle proceed in the second step with "manual MI": run each imputed data set separately and then combine the results manually using the standard imputation formulas.
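
The "standard imputation formulas" for combining results are commonly known as Rubin's rules. A minimal sketch of the manual pooling for a single parameter, assuming the point estimate and its standard error have been copied by hand from each separate run; the function name `pool_estimates` and the numbers are hypothetical illustrations, not Mplus output.

```python
from math import sqrt
from statistics import mean, variance


def pool_estimates(estimates, standard_errors):
    """Pool one parameter across m imputed data sets with Rubin's rules.

    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    if m < 2 or m != len(standard_errors):
        raise ValueError("need matching lists from at least two data sets")
    pooled = mean(estimates)                          # average point estimate
    within = mean([se ** 2 for se in standard_errors])  # average sampling variance
    between = variance(estimates)                     # variance across data sets
    total = within + (1 + 1 / m) * between            # total variance
    return pooled, sqrt(total)


# Made-up results from three separate runs.
pooled_estimate, pooled_se = pool_estimates([0.4, 0.5, 0.6], [0.1, 0.1, 0.1])
```

The between-imputation term is what a single-data-set analysis would miss, which is why the pooled standard error is larger than any of the individual ones in this example.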

Christoph Weber posted on Thursday, February 08, 2018 - 1:52 pm

The first approach seems quite convenient. If I use an intercept-only model for the PV estimation, cases with missing values would be excluded, wouldn't they? Thus, the latent means would be based on "listwise" data and might be biased? Or should I use group-mean-centered covariates? This should also lead to "unweighted" class means, shouldn't it?
Christoph

Tihomir Asparouhov posted on Thursday, February 08, 2018 - 4:19 pm

Yes ... I would not recommend using the intercept-only univariate model given the missing data. The variables that could be related should be included (the way you would include them in the imputation model). So a model like this
%within%
Y1-Y10 with Y1-Y10;
%between%
Y1-Y10 with Y1-Y10;
is better to use even if you just want the PVs for Y1_between. I do not see advantages in using group-mean centering compared to the above suggestion. Group-mean centering could be detrimental, see http://www.statmodel.com/bmuthen/articles/Article_127.pdf

Christoph Weber posted on Friday, February 09, 2018 - 1:07 pm

Now I get it, thanks a lot!

Christoph Weber posted on Thursday, April 26, 2018 - 12:17 pm

Dear Mplus team, a further question has come up: how many PVs would you suggest are necessary for the latent class means?
Best, Christoph

Tihomir Asparouhov posted on Thursday, April 26, 2018 - 4:55 pm

The more you do the better but I would do 10. Make sure these don't come from consecutive iterations. If you are using data imputation use thin=100 in the data imputation command (that is the default actually so you don't need to do anything). If you are using the factor scores, that 100 is not there for you by default - so add thin=100 to the analysis command (or save 1000 factor scores and take draws that are 100 apart).
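
The option of saving many draws and keeping only those 100 apart can be done in a few lines. A minimal sketch, assuming the saved draws for one case have been read into a list in iteration order; the function name `thin_draws` and the stand-in data are hypothetical.

```python
def thin_draws(draws, n_keep=10, thin=100):
    """Keep n_keep draws that are `thin` iterations apart.

    draws: sequence of saved draws in iteration order.
    Raises ValueError if there are too few draws for the requested spacing.
    """
    needed = (n_keep - 1) * thin + 1
    if len(draws) < needed:
        raise ValueError(f"need at least {needed} draws, got {len(draws)}")
    return list(draws[::thin][:n_keep])


# Stand-in for 1000 saved draws, labelled by iteration number.
saved = list(range(1, 1001))
kept = thin_draws(saved)
```

With 1000 saved draws and a spacing of 100, this keeps the draws from iterations 1, 101, ..., 901, so no two retained draws come from consecutive iterations.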

Christoph Weber posted on Friday, April 27, 2018 - 1:20 pm

I'd like to check whether I understand the PV applications correctly (referring to example 11.7).

SAVEDATA: FILE = ex11.7plaus.dat;
SAVE = FSCORES (20);

1.) If I use SAVEDATA with SAVE = FSCORES, descriptives of the PVs (mean, median, ...) are saved in ex11.7plaus.dat for each case.

2.) If I want to save each of the 20 PVs per case in 20 files, I have to use
DATA IMPUTATION:
NDATASETS = 20;
SAVE = ex11.7imp*.dat;

3.) When I use the DATA IMPUTATION command, Mplus automatically uses a thinning interval of 100, ...

4.) ... which also affects the PV distribution in ex11.7plaus.dat?

5.) Single PVs can only be saved by using DATA IMPUTATION.

6.) If I use PVs in a two-level model for a categorical (binary) variable X, is the PV X_B the threshold on a probit scale? (If this is correct, what might be the reason that the threshold from the imputation model and the mean threshold based on the PVs differ?)

Thanks
Christoph

Tihomir Asparouhov posted on Friday, April 27, 2018 - 2:40 pm

1-4. Yes

5. No. See page 838 in the user's guide. You can save the plausible values in one file with the FACTORS command, and you can control the thinning there as well: SAVE = FSCORES(20 100) will give you a thinning of 100.

6. No. The cluster-specific threshold is
"threshold parameter - X_B".
X_B is the cluster-specific deviation from the average threshold parameter, which is the non-random threshold parameter.
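
The relation "threshold parameter - X_B" can be illustrated numerically. A minimal sketch with made-up values; the function names are hypothetical. The conversion to a probability is an added assumption: it uses the usual probit convention P(X = 1) = Phi(-threshold) with unit within-level residual variance, which the thread does not spell out.

```python
from math import erf, sqrt


def cluster_threshold(threshold, x_b):
    """Cluster-specific threshold: the non-random threshold minus X_B."""
    return threshold - x_b


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def prob_category_one(threshold, x_b):
    """P(X = 1) in a cluster, ASSUMING a probit link with unit residual variance."""
    return normal_cdf(-cluster_threshold(threshold, x_b))


# Made-up values: an average threshold of 0.5 and two cluster deviations.
tau_low = cluster_threshold(0.5, -0.3)   # cluster below average on X_B
tau_high = cluster_threshold(0.5, 0.3)   # cluster above average on X_B
p_low = prob_category_one(0.5, -0.3)
p_high = prob_category_one(0.5, 0.3)
```

Under these assumptions a larger X_B lowers the cluster's threshold and raises its probability of X = 1, which shows that X_B is a deviation on the probit scale rather than a threshold itself.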

Christoph Weber posted on Friday, April 27, 2018 - 2:56 pm

Thanks!
Regarding 5: so if I only use SAVEDATA and FACTORS, the PVs will be saved in one file, whereas DATA IMPUTATION will save the PVs in multiple files?

Regarding 6: if X is continuous, is X_B the cluster-specific mean?

Tihomir Asparouhov posted on Friday, April 27, 2018 - 4:13 pm

Yes on both