Plausible value

Daniel K posted on Thursday, August 12, 2010 - 11:14 pm

Dear Dr. Muthen

What are plausible values of latent variables for? Are the values simply factor scores? If so, in LGM, are the values the individual intercept and slope values (like EB estimates in HLM)?

Instead of latent means analysis, can we simply compare mean plausible values?

Thank you very much for your kind consideration.

Dan Kim

Linda K. Muthen posted on Friday, August 13, 2010 - 9:45 am

Plausible values are factor scores. I would not use factor scores as their means are not the same as latent variable means in a latent variable model.

Christoph Weber posted on Tuesday, August 16, 2011 - 2:55 am

Dear Dr. Muthen,

I am using complex data (PISA). There are 5 plausible values for test scores in math and reading. I am estimating a path model, where reading is an indep. var and math is the dep. var.

I am using the type = imputation command in conjunction with the cluster, weight and complex command.

1.) Is it necessary to use 25 Data sets (i.e.: Dat1 = mathpv1, readpv1; Dat2 = mathpv1, readpv2; ...; Dat6 = mathpv2, readpv1, ...) or would it also be correct to use just 5 data sets (Dat1 = mathpv1, readpv1; Dat2 = mathpv2, readpv2, ...)?

2.) What happens if I additionally use multiple imputation for the other variables (x1 - x10) of the model? Are 5 data sets (mathpv1, readpv1, x1imput1, x2imput1, ...) enough? Or do I also have to create data sets for each combination of imputed values?

Best regards and thanks

Christoph Weber

Linda K. Muthen posted on Wednesday, August 17, 2011 - 1:08 pm

I think you need to direct this question to someone who is knowledgeable about the PISA data.

Christoph Weber posted on Wednesday, August 17, 2011 - 2:48 pm

Thanks,
but is it correct to use type=imputation to analyse plausible values?
Christoph Weber

Bengt O. Muthen posted on Wednesday, August 17, 2011 - 6:07 pm

Yes. And typically you would have imputed data sets where each includes all your latent variables (math and reading in your case). So you would have say 5 of those, not 5 for one of the latents and 5 for the other.

Christoph Weber posted on Thursday, August 18, 2011 - 6:12 am

Thanks. I have another question.
I'm running a path model with the data.

My dependent variable (y) is a binary var. The other variables are continuous.
y ON x1 x2 x3 x4 x5;

x1 ON x2 x3 x4 x5;

x2 ON x3 x4 x5;

x3 ON x4 x5;

I use MLR Estimation and request the stdyx solution.

How are the coefficients standardized?
How are these standardized coefficients interpreted?
Is R² something like a pseudo-R²?

Also, I do not get the exp(b) coefficients.

Thanks
Christoph Weber

Bengt O. Muthen posted on Friday, August 19, 2011 - 6:31 pm

Standardization is as usual, described in the UG - with the exception that with a categorical DV, the residual variance is either 1 (probit link) or pi-square/3 (logit link).

R-square uses the usual formulas and these same residual variances, as advocated by McKelvey and Zavoina (1975) and also in books on categorical data.

You should get exp(b), but if you don't, define them in Model Constraint as New parameters.
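
The R-square and exp(b) quantities described in this reply can be checked by hand. The following is a minimal Python sketch, not Mplus code and not the Mplus implementation: it assumes you already have the linear predictor (the fitted x*b values for the latent response) and simply applies the fixed residual variances named above. The function names are hypothetical, and the use of the population variance is an assumption of this sketch.

```python
import math
from statistics import pvariance

# Residual variances of the latent response y*, fixed by the link function.
RESIDUAL_VARIANCE = {
    "probit": 1.0,
    "logit": math.pi ** 2 / 3,  # roughly 3.29
}


def latent_r_square(linear_predictor, link="logit"):
    """McKelvey-Zavoina style R-square: explained / (explained + residual).

    linear_predictor: fitted x*b values, one per subject (hypothetical input).
    """
    if link not in RESIDUAL_VARIANCE:
        raise ValueError("link must be 'probit' or 'logit'")
    explained = pvariance(linear_predictor)
    return explained / (explained + RESIDUAL_VARIANCE[link])


def odds_ratio(b):
    """exp(b) for a logit coefficient b."""
    return math.exp(b)
```

With a logit link the residual variance is larger than with probit, so the same explained variance yields a smaller R-square.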

Christoph Weber posted on Saturday, August 20, 2011 - 11:37 am

Thanks for your help
Christoph Weber

Artur Pokropek posted on Friday, February 03, 2012 - 2:07 am

Hi,
I have a small question. The DATA IMPUTATION command has an option for rounding the number of decimals for imputed continuous variables (ROUNDING=). But how do I set the number of decimals for plausible values, i.e., imputations for a latent variable? When I put the name of a latent variable in the ROUNDING option (for instance ROUNDING = f1 (5);), it does not work.
Thank You
Artur

Linda K. Muthen posted on Friday, February 03, 2012 - 8:59 am

There is no such option for plausible values.

Ksnow posted on Tuesday, July 03, 2012 - 10:42 pm

Dear Dr. Muthen,
I am doing a bifactor analysis and want to get plausible values for the bifactor model. Can Mplus do that? I have tried adding the plausible value code to the bifactor model, but it doesn't work. Am I doing something wrong, or can Mplus not do that?

Linda K. Muthen posted on Wednesday, July 04, 2012 - 11:02 am

Please send the output and license number to support@statmodel.com.

Jan Zirk posted on Tuesday, September 25, 2012 - 6:14 pm

Dear Bengt or Tihomir,

In Asparouhov & Muthen (2010; Plausible Values for Latent Variables Using Mplus) you mention the plausible values (pvs), but when they are extracted for a categorical variable or a latent factor, Mplus gives us mean, median, SD and CI values. Did you mean mean or median pvs in Table 3?

Tihomir Asparouhov posted on Wednesday, September 26, 2012 - 5:46 pm

Table 3 is the mean pvs, but the median will give the same result because the posterior distribution for the plausible value is normal in that particular example.

Jan Zirk posted on Thursday, September 27, 2012 - 1:37 am

Thank you very much.

Maria Rasmusson posted on Friday, May 03, 2013 - 1:09 am

Dear Dr. Muthén,

In one of my PhD studies I carried out a secondary analysis of Swedish PISA data. The SEM models were tested with Mplus using the maximum likelihood parameter estimator (MLR) with the two-level complex analysis type. PISA 2009 data were used.

When I performed the analysis I was not aware of the possibility to use all 5 plausible values offered for each student. Instead I used one of the plausible values and tested the models with each PV at a time and compared the five outputs (that did not differ much).

However, one of the anonymous reviewers of my manuscript informed me about the possibility to perform the testing of the CFA models and estimating the parameters using all the five PVs in Mplus in order to get correct standard errors (since group-differences are in focus).

How is this done? Should I use type=imputation as referred to above by Christoph? What will the input instruction look like when it is supposed to call for five different data files? Is it all done in one analysis?

Thanks in advance!
Best regards,
Maria Rasmusson

Bengt O. Muthen posted on Friday, May 03, 2013 - 8:23 am

Yes, use TYPE=IMPUTATION and you will get averaged estimates and correct SEs and chi-2 using all the imputations. See UG ex 11.8, part 2. You find the data, including the "implist" file on our website under the User's Guide examples.
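
For readers who want to see what "averaged estimates and correct SEs" means, here is a minimal Python sketch of Rubin's (1987) combining rules for a single parameter. It is an illustration under stated assumptions, not Mplus code: the function name is hypothetical, the inputs are the per-data-set estimates and standard errors you would get from analysing each plausible-value data set separately, and the pooling of chi-square is not covered.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, standard_errors):
    """Pool one parameter over m imputed (plausible-value) data sets.

    Returns (pooled_estimate, pooled_standard_error).
    """
    m = len(estimates)
    if m < 2 or m != len(standard_errors):
        raise ValueError("need matching lists with at least two data sets")
    pooled = mean(estimates)
    # Within-imputation variance: average of the squared standard errors.
    within = mean([se ** 2 for se in standard_errors])
    # Between-imputation variance: sample variance of the estimates.
    between = variance(estimates)
    total = within + (1 + 1 / m) * between
    return pooled, math.sqrt(total)
```

Because the between-imputation term is added to the average within-imputation variance, the pooled standard error is never smaller than the root mean squared single-data-set standard error, and it grows as the estimates vary more across plausible values.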

Maria Rasmusson posted on Tuesday, May 07, 2013 - 2:27 am

Thank you so much for the help!
It seems to work fine!

Best regards,
Maria

Thomas Rodebaugh posted on Wednesday, June 26, 2013 - 10:32 am

hi there,

we're scratching our heads over this output message we've been getting:

*** ERROR in DATA IMPUTATION command
Unknown option:
PLAUSIBLE

are there circumstances under which even proper use of the plausible command would produce this error? or perhaps somehow we are using it incorrectly?

a slightly shortened version of our input file is below, in case that helps. any insight much appreciated! we've tried a number of permutations of the below, but keep getting the same message. we're using mplus 7.

data: file is informant2.dat;
variable: names are [quite a few];
IDVARIABLE = id;
usevariables are avfmps12 avfmps16 avfmps19 avfmps24 avfmps30
fmps12 fmps16 fmps19 fmps24 fmps30;
missing are avfmps12 avfmps16 avfmps19 avfmps24 avfmps30
fmps12 fmps16 fmps19 fmps24 fmps30 (-99);
categorical are fmps12 fmps16 fmps19 fmps24 fmps30;
analysis: estimator = BAYES;
model: [is a complicated factor structure]
DATA IMPUTATION:
PLAUSIBLE = hplau.dat;
SAVE = hplau2.dat;
output: TECH1 TECH8;

Bengt O. Muthen posted on Wednesday, June 26, 2013 - 3:24 pm

PLAUSIBLE= is not an option in the DATA IMPUTATION command. This was changed in Version 7 to

SAVE = FSCORES(10);

in the SAVEDATA command. So the change is

SAVEDATA:
FILE IS hplau.dat;
SAVE = FSCORES(10);

where 10 is the number of imputations. See the Version 7 User's Guide.

Thomas Rodebaugh posted on Monday, July 01, 2013 - 2:21 pm

thanks, bengt! it works fine now. that will teach us to use version 7 while looking at the version 6 manual!

Hsien-Yuan Hsu posted on Tuesday, July 09, 2013 - 9:17 pm

Dear Dr. Muthen,

I have a quick question regarding Ex 11.7.

Factor scores for the latent variables f1, f2, and f3 would be obtained and can be used in secondary analysis.

My question is that:
Do I need to include important background variables (pointing to indicators) for better estimated factor scores? Do you have any reference?

I appreciate your reply in advance!

Hsien-Yuan

Linda K. Muthen posted on Wednesday, July 10, 2013 - 12:28 pm

If the factors are well-measured by several good factor indicators, you don't need to include background variables. See the special issues of JEBS and JEM on NAEP.

Dan Cloney posted on Monday, October 06, 2014 - 5:45 am

Hi,

I have been provided two datasets that include information on the same participants.

The first is a dataset with observations on a number of variables that include some missing values.

The second is a set of 5 plausible values for a proposed new latent indicator. There are plausible values for a subset of the participants observed in the first dataset.

Can you suggest the best approach to incorporate these into a single imputation analysis?

Should I estimate an arbitrary model with the first dataset, then manually add the PVs to the saved imputed datasets, and then estimate the final model?

Bengt O. Muthen posted on Monday, October 06, 2014 - 10:47 am

I don't know what to recommend here. It sounds like the first data set has more subjects than the second, but the second has more variables - the latent variable's plausible values. Not sure what the final modeling would be.

Dan Cloney posted on Monday, October 06, 2014 - 2:52 pm

Hi Bengt,

Thank you for your response.

You are right, the first data set contains more subjects than the second. e.g., u11, u12, u13...u33 and 3000 subjects long, with some missing data.

The second data set contains fewer variables than the first: 5 PVs for one latent variable. e.g., f1_pv1...f1_pv5 and 2000 subjects long.

The final model is intended to be a growth model, that includes f1 as a (time-invariant) covariate.

Does that extra information give you any ideas?

Bengt O. Muthen posted on Monday, October 06, 2014 - 4:35 pm

I assume that the two datasets have some observed variables in common. And that you don't have the observed indicators of the f1 factor in either dataset.

Dan Cloney posted on Monday, October 06, 2014 - 5:02 pm

That's right, the observed indicators of f1 do not exist in either data set.

The only element common to both data sets are the subject and cluster IDs.

Bengt O. Muthen posted on Monday, October 06, 2014 - 5:48 pm

Then I think all you can do is analyze the n=2000 subjects in common to the two data sets, merging the PV data sets with the n=2000 subset of the other data set to get those observed variables. Then use Type=Imputation data input.
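
The merge step suggested here has to be done outside Mplus. A minimal Python sketch follows, assuming each data set has been read into a list of dictionaries keyed by variable name and that the subject ID is unique within each data set. The function name and the key name "id" are hypothetical; the variable names u11 and f1_pv1 in the usage note follow the earlier posts.

```python
def merge_on_id(observed_rows, pv_rows, id_key="id"):
    """Inner join on the subject ID.

    Keeps only subjects that appear in both data sets and returns one
    combined record (observed variables plus plausible values) per subject.
    """
    pv_by_id = {row[id_key]: row for row in pv_rows}
    merged = []
    for row in observed_rows:
        pv = pv_by_id.get(row[id_key])
        if pv is not None:
            merged.append({**row, **pv})
    return merged
```

For example, merging records such as {"id": 1, "u11": 0} with {"id": 1, "f1_pv1": 0.3, "f1_pv2": 0.1} gives one record holding all of those variables; subjects without plausible values are dropped. The merged records would then still need to be split into one data file per plausible value for Type=Imputation.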

Lisa M. Yarnell posted on Friday, February 13, 2015 - 11:46 am

Hi Bengt and Linda, we are running estimates (means) across 20 plausible values using TYPE=IMPUTATION for one variable in our data set, but are not getting standard errors for the mean score on the variable created through the combination of 20 p.v.'s. Can you help us know how to get the standard errors? We did not specify the variable created from the 20 p.v.'s as latent. Do we need to?

Or is there a specific output command that we need to utilize in order to get the standard errors?

We ran the analysis with gender as a grouping variable, and just receive this output (with no standard errors):

ESTIMATED SAMPLE STATISTICS FOR MALES

Means
MATHCOMP
________
1 218.521

Covariances
MATHCOMP
________
MATHCOMP 1413.018

Correlations
MATHCOMP
________
MATHCOMP 1.000

ESTIMATED SAMPLE STATISTICS FOR FEMALES

Means
MATHCOMP
________
1 225.236

Covariances
MATHCOMP
________
MATHCOMP 1283.463

Correlations
MATHCOMP
________
MATHCOMP 1.000

Tihomir Asparouhov posted on Friday, February 13, 2015 - 2:59 pm

Take a look at User's Guide example 13.13. This is what you will have to do.

The input file should look something like this.

DATA: FILE IS Listof20FileNames.dat; type=imputation;
VARIABLE: NAMES ARE MATHCOMP g;
GROUPING IS g (1 = male 2 = female);
MODEL: MATHCOMP; [MATHCOMP];
MODEL female: MATHCOMP; [MATHCOMP];

Lisa M. Yarnell posted on Thursday, February 19, 2015 - 12:46 pm

Hello, we specified our syntax as above, but are still not receiving SEs (see select input and output below). Do we need to specify any different options to receive SEs? We cannot send the data because it is restricted use.

DATA: FILE IS "G4reading2021215list.txt";
type=imputation;
VARIABLE: NAMES ARE dsex origwt srwt01-srwt62 mathcomp;
Usevariable is mathcomp;
MISSING ARE .;
WEIGHT IS origwt;
REPWEIGHTS = srwt01-srwt62;
GROUPING IS dsex (1=males, 2=females);

ANALYSIS:
TYPE = COMPLEX BASIC;
REPSE = JACKKNIFE2;

MODEL: [mathcomp]; mathcomp;
MODEL females:
[mathcomp]; mathcomp;
OUTPUT: Sampstat STDYX;
-----------------------------------------
(from our output)

NOTE: These are average results over 20 data sets.

ESTIMATED SAMPLE STATISTICS FOR MALES
Means
MATHCOMP
1 218.521

Covariances
MATHCOMP
MATHCOMP 1413.018

Correlations
MATHCOMP
MATHCOMP 1.000

ESTIMATED SAMPLE STATISTICS FOR FEMALES
Means
MATHCOMP
1 225.236

Covariances
MATHCOMP
MATHCOMP 1283.463

Correlations
MATHCOMP
MATHCOMP 1.000

Tihomir Asparouhov posted on Thursday, February 19, 2015 - 4:47 pm

The standard errors are in the results section with the title "MODEL RESULTS".

Lisa M. Yarnell posted on Friday, February 20, 2015 - 7:34 am

Dear Tihomir, I do not see the SE estimates. Our entire RESULTS section of the output is below.

Do we need to select an estimation type other than COMPLEX BASIC? Should we be using TYPE = COMPLEX instead (not specifying BASIC)? Is there an Output option that we need to select, in order to see the SEs? Thank you.

RESULTS FOR BASIC ANALYSIS

NOTE: These are average results over 20 data sets.

ESTIMATED SAMPLE STATISTICS FOR MALES

Means
MATHCOMP
1 218.521

Covariances
MATHCOMP
MATHCOMP 1413.018

Correlations
MATHCOMP
MATHCOMP 1.000

ESTIMATED SAMPLE STATISTICS FOR FEMALES

Means
MATHCOMP
1 225.236

Covariances
MATHCOMP
MATHCOMP 1283.463

Correlations
MATHCOMP
MATHCOMP 1.000

DIAGRAM INFORMATION

Mplus diagrams are currently not available for TYPE=BASIC. No diagram output was produced.

Beginning Time: 14:25:46
Ending Time: 14:34:52
Elapsed Time: 00:09:06

Lisa M. Yarnell posted on Friday, February 20, 2015 - 8:06 am

Tihomir, we are running the analyses without specifying BASIC and are now receiving the SEs.

Many thanks.

Tihomir Asparouhov posted on Friday, February 20, 2015 - 9:23 am

Yes - remove BASIC.

Pamela Medina posted on Saturday, April 18, 2015 - 2:09 pm

Dr.'s Muthen,

I have run imputation using Bayesian estimation for a set of data with missing values. The data is categorical (binary or ordinal) and I would like for the imputed values to remain between 0-7, and not include any decimal values.

I use the commands:
VALUES = 0 1 2 3 4 5 6 7;
ROUNDING = 0;

I am getting the error *** ERROR in DATA IMPUTATION command
Missing values at the end of the ROUNDING option:
0

Can you please advise?

Many thanks!
Pamela

Linda K. Muthen posted on Saturday, April 18, 2015 - 3:06 pm

See the user's guide for the proper specification of the ROUNDING option.

TA posted on Monday, May 25, 2015 - 10:50 am

Dear Linda and Bengt,

If one saves the FSCORE using ESEM, and WLSMV was chosen as an estimator (given the data was categorical), are the factor scores plausible values or factor scores?

Does one have to specify the estimator as Bayesian in order to get plausible values?

Thanks!

Bengt O. Muthen posted on Monday, May 25, 2015 - 3:05 pm

Q1. Factor scores - that is, one value per subject.

Q2. Yes. See our Plausible value paper under Bayesian Analysis.

TA posted on Monday, May 25, 2015 - 4:43 pm

Thanks Bengt. If we eventually conduct a latent class on the factor scores, would you recommend using plausible values or factor scores for this analysis?

Second, how exactly are the factor scores computed, given that there are a number of ways to do so? Can you provide me with a reference for how Mplus does it?

Thanks for your help, as always!

Bengt O. Muthen posted on Tuesday, May 26, 2015 - 2:59 pm

Q1. Plausible values so you get the variation in scores covered. Although that then restricts the tests you can carry out.

Q2. You mentioned WLSMV, where the factor scores are computed using MAP - the maximum a posteriori method. It is described in Technical Appendix 11 for Version 2 on our website.

Lisa M. Yarnell posted on Wednesday, December 09, 2015 - 1:08 pm

Dear Linda and Bengt,

I notice that for a 2-level model using MLR estimation and 20 plausible values (pv's), the average loglikelihood (H0) across all 20 pv's is given -- but no scaling factor is given.

1) Is that because the scaling factor would differ for each of the 20 estimates of the model?

2) Does this mean that I cannot conduct loglikelihood tests for such models (with MLR), in comparing nested models?

When I ran the model based on just one pv, both the H0 value and the scaling factor were given.

Thank you.

Lisa M. Yarnell posted on Wednesday, December 09, 2015 - 1:39 pm

Additionally,

3) I am finding that standard errors for coefficients in the model are INCREASING rather than decreasing when twenty plausible values are used, relative to when I ran the model based on 1 pv.

Is this typical, and what is the reason for this? I know that pv's help derive *unbiased* standard errors...but what is the effect of pvs on precision?

Do the standard errors increase with 20 pv's because there is greater variation (for the estimated regression coefficient) across all 20 runs of the model than there was in the model based on a single p.v.?

Michelle Wu posted on Sunday, February 04, 2018 - 2:28 pm

Hi Linda and Bengt,

I'm new to Mplus. I'm trying to conduct a path analysis with PISA 2012, and after reading this thread I'm still confused about how to handle the plausible values of student performance that are provided in the PISA dataset.

Suppose the plausible variables are pv1 pv2 pv3 pv4 pv5, should I separate these values, along with the covariates into five files and then use DATA: FILE IS implist.dat; TYPE = IMPUTATION;?

Or is it correct to specify pv1-pv5 as imputed values just by using TYPE = IMPUTATION;?

Another thought of mine is to set these five pvs as the imputed values of a newly created variable pv0 in say STATA. Then read this file in Mplus. Is this doable at all?

Please guide me with some directions. Thank you so much.

Bengt O. Muthen posted on Monday, February 05, 2018 - 9:14 am

Yes, separate these values, along with the covariates into five files and then use DATA: FILE IS implist.dat; TYPE = IMPUTATION;
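
Creating the separate files is a data-preparation step done before Mplus is run. A minimal Python sketch is below, under these assumptions: the wide data have been read into a list of dictionaries, the plausible values are named by a stem plus an index (mathpv1, readpv1, ... as in the earlier PISA post), and all values are already numeric with missing data coded as Mplus expects. The function names and the file prefix pv_data are hypothetical. Data set k pairs the k-th plausible value of every stem, which matches the advice earlier in the thread to use 5 data sets rather than all 25 combinations.

```python
import os


def split_plausible_values(rows, covariates, pv_stems, n_pv):
    """Return n_pv data sets; data set k holds the covariates plus the
    k-th plausible value of every stem, in that column order."""
    datasets = []
    for k in range(1, n_pv + 1):
        columns = list(covariates) + [f"{stem}{k}" for stem in pv_stems]
        datasets.append([[row[c] for c in columns] for row in rows])
    return datasets


def write_imputation_files(datasets, directory, prefix="pv_data"):
    """Write one free-format data file per data set (no header line) and
    an implist.dat file that lists the data file names, one per line."""
    names = []
    for k, data in enumerate(datasets, start=1):
        name = f"{prefix}{k}.dat"
        with open(os.path.join(directory, name), "w") as fh:
            for record in data:
                fh.write(" ".join(str(v) for v in record) + "\n")
        names.append(name)
    with open(os.path.join(directory, "implist.dat"), "w") as fh:
        fh.write("\n".join(names) + "\n")
    return names
```

The NAMES list in the Mplus VARIABLE command then has to follow the column order used here (covariates first, then one variable per stem), and the DATA command points at implist.dat with TYPE = IMPUTATION as described above.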

Diane Putnick posted on Monday, April 30, 2018 - 7:23 am

I have a very large (N > 5000) and complex dataset of children nested in families. I am trying to create a number of latent variables and look at relations among them. I have a combination of categorical and continuous indicators, depending on the factor, and some of my CFA models are taking days to run because the factor structures are complex and I'm trying to use MLR rather than WLSMV estimation when possible.

Because of the complexity of these models, I want to save factor scores from these measurement models so I can look at relations between latent variables (which is my main interest). After reading a bit about this, it seems that the regression method is not ideal, and the Bayesian plausible values method does not work with complex data (nested subjects). The regression method clearly does not replicate the correlations (or lack thereof) between factors as they are specified and estimated in the model.

Would it be better to randomly choose a child from each family and use Bayesian plausible values, or keep all of the participants and use regression-based factor scores? Theoretically, I am more interested in the relations among variables than the within-family variance per se, so I am leaning toward Bayesian analysis with a reduced sample. Can I compare plausible values for different factors saved from different measurement models?

Tihomir Asparouhov posted on Monday, April 30, 2018 - 4:20 pm

I would recommend using a two-level Bayes model. That will rescue you from waiting days for the estimation and will improve the quality of the estimation as well, in addition to the fact that the plausible values are more reliable for second step estimation. If you are not interested in the second level you have the option to make that second level fairly simple. For example, estimate a very simple second level model where each variable has a between component. You can even declare as within= those variables that do not have substantively large or statistically significant second level variance. If none of the variables have a substantively large or statistically significant second level variance, you can ignore the second level completely and run the model as a single level Bayes model. Given the many possible alternatives I would not recommend reducing the data. You can also consider running type=complex wlsmv or type=twolevel wlsmv for comparative purposes.

Diane Putnick posted on Tuesday, May 01, 2018 - 5:45 am

Tihomir, thank you for this excellent advice. I will try a two level Bayes model!

Do you have any guidance about how many plausible values to save?

Tihomir Asparouhov posted on Tuesday, May 01, 2018 - 10:14 am

It depends on what you are using the plausible values for. If you are using them to compute correlations between variables (or regression parameters) you can use 5 or 10. Likelihood-ratio tests usually require more, for example 100.

If you are using the plausible values to make inferences at the population level, you can use as few as 5. If you are using them to make inferences at the individual level, you want to use 100.

Matthew Constantinou posted on Tuesday, July 03, 2018 - 10:28 am

In order to use plausible values of factors as factor scores in subsequent analyses, do you expect the user to manually create multiple data sets from the single data set outputted - which saves each random draw in a new column - and then specify a TYPE = IMPUTATION?

Or is there some way of specifying the distributional properties of plausible values from the single data set they are outputted in?

Note I am not referring to plausible values of response data (e.g., UG 11.8), but plausible values of factors. Also note I attempted to save the file as "plaus_*.dat" in the hope that each random draw would be saved in a different file but this failed.

Many thanks,
Matthew

Matthew Constantinou posted on Tuesday, July 03, 2018 - 10:50 am

Put another way, how did you achieve examples 4.1 (and 4.2) in the Asparouhov & Muthen (2010) technical report, in which you describe the following:

"we use the two imputation models
to impute the latent class variable C. We generate 5 imputed data sets.
The imputed data sets are then used to estimate a logistic regression of the
imputed values for C on the predictor variable X. This is done as in the
usual imputation analysis using the Mplus implementation of Rubin (1987)
method".

When I use the following:

"SAVEDATA: FILE = ex11.7plaus.dat;
SAVE = FSCORES (20);
FACTORS = f1-f3;"

I get a single data file which includes each draw/imputation for each factor in a separate column, rather than in a separate dataset.

Is there a way in Mplus to specify an imputation analysis where each draw/imputation is reflected as a variable within a single dataset, or must I manually create the imputation data sets from the single data file outputted?

Tihomir Asparouhov posted on Tuesday, July 03, 2018 - 3:57 pm

You need to add this command

DATA IMPUTATION: NDATASETS = 20; SAVE=plaus*.dat;

Matthew Constantinou posted on Thursday, July 05, 2018 - 5:15 am

Thank you Tihomir; that did the trick.

Herb Marsh posted on Monday, May 25, 2020 - 7:08 pm

Hi Bengt et al.
I am using PISA 2018 data that have 10 plausible values for each achievement test and 1000-plus additional variables. There are no missing values for the achievement PVs but many patterns of missing data for most other variables.

1. is there an easier way to analyze the data that does not require creating 10 separate data sets?

2. Is it reasonable to copy a subset (say 150) of the variables into 10 datasets and use one of the PVs in each dataset, thus using FIML to handle missing data rather than doing a HUGE imputation?

3. Any other suggestions?

Tihomir Asparouhov posted on Tuesday, May 26, 2020 - 1:43 pm

1. I don't think so
2. Yes. This can be set up as in User's Guide example 13.13, and you can treat the missing data with FIML.