Plausible values for Factor Score Est...
Mplus Discussion > Multilevel Data/Complex Sample >
 Jonathon Little posted on Friday, December 03, 2010 - 10:47 pm
Given the following factor model:




F1 BY answer1* answer2 answer3;
F2 BY answer4* answer5 answer6;

F1@1 F2@1;



answer1-answer5 with answer6;
answer1-answer3 with answer5;
answer1-answer2 with answer3;
answer1 with answer2;

I generated 5 multiple imputations using Bayesian estimation with plausible factor scores saved. Using the 5 imputed data sets I ran a multilevel CFA using WLSMV, repeating the same model that was used for the imputation.

I want to examine how these plausible factor scores correspond to the behaviour of the True scores from my factor model.

Where I am confused is in how to go about reproducing the within-level factor correlations from the CFA model using my plausible factor scores. I would like to see how closely the plausible factor score correlations correspond to the correlations between my latents in my within-level CFA.

Any advice on how to proceed would be very welcome.
 Jonathon Little posted on Sunday, December 05, 2010 - 2:01 pm
The line of syntax reading


is an error - please ignore
 Bengt O. Muthen posted on Sunday, December 05, 2010 - 4:53 pm
It seems like there are at least 3 ways to find the correlation between the 2 factors. (1) The 2-level Bayes run that you show gives a Bayes factor correlation estimate. (2) The WLSMV run based on the imputed data gives a WLSMV factor correlation estimate. (3) And the plausible values that your Bayes run generates can be used to compute the factor correlation.

As for (3), you get N*n factor scores for each factor, where N is the number of plausible draws and n is your sample size. You get the factor correlation from those N*n factor scores.
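The stacking step can be sketched in Python. This is a minimal illustration with simulated draws, not Mplus output: the arrays `pv1` and `pv2` stand in for the saved plausible-value file, holding N draws of n scores per factor.

```python
import numpy as np

rng = np.random.default_rng(0)
N, n = 5, 200  # N plausible draws, n subjects

# Hypothetical plausible-value draws for two factors, shape (N, n).
# In practice these would come from the saved plausible-value data sets.
true_f = rng.standard_normal(n)
pv1 = true_f + rng.standard_normal((N, n)) * 0.5
pv2 = 0.6 * true_f + rng.standard_normal((N, n)) * 0.8

# Stack all N*n draws per factor and correlate.
r = np.corrcoef(pv1.ravel(), pv2.ravel())[0, 1]
print(round(r, 3))
```

Note that stacking treats every draw as an observation; for standard errors you would instead pool the m per-data-set estimates as with multiple imputation.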

Also, take a look at this paper on our website:

Asparouhov, T. & Muthén, B. (2010). Plausible values for latent variables using Mplus. Technical Report.
 Jonathon Little posted on Monday, December 06, 2010 - 12:21 am
Thanks Bengt,

I think I didn't explain myself clearly. I was interested in examining how the plausible factor score estimates correlate with their true scores, that is, the within-level factors from my CFA. Is there any reason why I could not enter the plausible factor scores into my multilevel CFA as covariates (as below) in order to see how well they correlate with their respective true factor scores?


F1 BY answer1* answer2 answer3;
F2 BY answer4* answer5 answer6;

F1@1 F2@1 PlausibleF1@1 PlausibleF2@1;

F1 WITH PlausibleF1; ! correlation of plausible factor score with true factor score
F2 WITH PlausibleF2; ! correlation of plausible factor score with true factor score
PlausibleF1 WITH PlausibleF2; ! correlation of plausible factor scores

answer1-answer5 with answer6;
answer1-answer3 with answer5;
answer1-answer2 with answer3;
answer1 with answer2;
PlausibleF1 WITH answer1 to answer6;
PlausibleF2 WITH answer1 to answer6;

Any advice on how to proceed would be very welcome.
 Bengt O. Muthen posted on Monday, December 06, 2010 - 3:26 pm
I am not sure that a high correlation from that approach is indicative of high-quality plausible values. I would instead look at how the plausible values for the two factors relate to each other and to other variables in comparison to estimating those quantities directly in the model.

If you are doing a Monte Carlo study you could see how true scores (generated factor values) compare to plausible values.
 Jonathon Little posted on Monday, December 06, 2010 - 5:28 pm
I agree that looking at how the plausible values for the two factors relate to other variables in comparison to estimating those quantities directly in the model is useful.

Despite having a good-fitting model, I need to provide day-to-day users of the instrument (e.g., clinicians) with confidence that a scoring method has correlational accuracy and univocality, and that the scoring method (factor score estimates or plausible values) correlates with the true scores. This is because even for highly determinate factors (which mine are) I could still end up choosing a poor set of factor score estimates - these need to be evaluated somehow. On factor score estimates, Nunnally writes:

“If the multiple correlation [the proportion of determinacy in the factor] is less than .70, one is in trouble. In that instance the error variance in estimating the factor would be approximately the same as the valid variance. At a very minimum, one should be quite suspicious of factor estimates obtained with a multiple correlation of less than .50, because in that case less than 25 percent of the variance of factor scores can be predicted from the variables. Then one could not trust the variables as actually representing the factor.” (1978, p. 426).
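The multiple correlation Nunnally refers to (factor score determinacy) can be computed from the estimated loadings. For a one-factor model with standardized indicators and regression-method factor scores, the determinacy is sqrt(lambda' R^-1 lambda), where R is the model-implied correlation matrix. A minimal sketch with hypothetical loadings (not the loadings from the model above):

```python
import numpy as np

# Hypothetical standardized loadings for a one-factor model
lam = np.array([0.7, 0.6, 0.65])

# Model-implied correlation matrix: R = lam lam' + diag(uniquenesses)
R = np.outer(lam, lam)
np.fill_diagonal(R, 1.0)

# Determinacy: multiple correlation of the factor with its indicators
rho = np.sqrt(lam @ np.linalg.solve(R, lam))
print(round(rho, 3))  # above Nunnally's .70 benchmark for these loadings
```

Running this against your own estimated loadings would show directly whether the factors clear the .70 and .50 thresholds in the quote.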

Is it that the method that I suggested is technically incorrect to answer the question, or is it that you think it is not a useful test or not as useful as the other method you proposed?
 Bengt O. Muthen posted on Tuesday, December 07, 2010 - 12:04 am
Are you interested in the quality of the scores in terms of the quality of individuals' scores or in terms of the quality of summary measures and relationships with other variables?

Section 4.2 of this paper relates to the latter:

I recommend reading

von Davier M., Gonzalez E. & Mislevy R. (2009) What are plausible
values and why are they useful? IERI Monograph Series Issues and
Methodologies in Large-Scale Assessments. IER Institute. Educational
Testing Service.

If you can't find it, I am happy to send it.
 Jonathon Little posted on Tuesday, December 07, 2010 - 12:50 am
Both actually, but for the moment, the former: the quality of scores in terms of individuals' scores. I tried regression weighting methods using the factor score coefficient matrix generated from a within-level polychoric correlation matrix with ML estimation, and several unit-weighted and coarse factor score methods, but these all seem to perform very poorly using the method I pasted as syntax. The factor score estimates all correlate with their true scores no higher, and often a lot lower, than r = .65 (that's r, not r-squared).

I take it from your question that plausible values may not be suitable for individual use? I suppose this would make sense if we're using imputations, as these are designed to estimate population parameters and standard errors, not individuals' responses.

I will continue to read on the subject of plausible values, but I'm still stuck with the problem of having a good approximate fit but no method for users to score the instrument with confidence. Few people ever seem to bother evaluating their factor scoring methods for instruments, so I wanted to make an extra effort to learn about it.
 Bengt O. Muthen posted on Wednesday, December 08, 2010 - 12:05 am
I think plausible values have an advantage over regular estimated factor scores in that each individual gets a distribution of values, so that the uncertainty is clearly presented and the shape of the distribution is clear (an estimated factor score for an individual at most gets an SE). This is useful to have when you compute the variance over people and when you compute the relationship to other variables (see section 4.2 of our imputation paper that I referred to). I don't know that I have seen that the average plausible value for a certain individual is any better than the estimated factor score for that individual.
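The per-individual distribution can be summarized directly from the draws. A minimal sketch with simulated draws (all arrays hypothetical): each column holds one person's plausible values, from which a point estimate and an interval both fall out.

```python
import numpy as np

rng = np.random.default_rng(1)
N, n = 200, 50  # N plausible draws per person, n people

# Hypothetical draws: each column is one person's plausible-value distribution
draws = rng.standard_normal(n) + rng.standard_normal((N, n)) * 0.4

# Each individual gets a full distribution, not just a point estimate:
means = draws.mean(axis=0)                            # per-person point summary
lo, hi = np.percentile(draws, [2.5, 97.5], axis=0)    # per-person 95% interval
print(means.shape, round(float((hi - lo).mean()), 2))
```

An ordinary estimated factor score would give only the first of these summaries (plus, at most, an SE).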

Factor scores (and plausible values) have different strengths and weaknesses for different uses as was discussed already by Tucker in the 50's. For example, ranking individuals is one use, regression another. For a recent discussion, see Skrondal, A. and Laake, P. (2001). Regression among factor scores. Psychometrika 66, 563-575.

I think the evaluation approach you suggest may have a variation on the Heisenberg problem - in evaluating the plausible values you alter the factor meaning. The factors become determined also by the

F WITH Plausible;

statements, which is not what you want.

Again, to really see how well the factor scores or plausible values work in the model and intended use situation, you want to do a Monte Carlo study so that you can compare the true, generated scores with your estimated ones.
 Jonathon Little posted on Thursday, December 09, 2010 - 5:11 am
Thank you Bengt, I appreciate your thoughts

 Caoimhe Martin posted on Monday, February 23, 2015 - 7:52 am
We asked people to evaluate 3 different profiles across 12 items (2x6) with wording corresponding across the items. We want to treat each set of 6 items as a latent variable and compare scores across the 3 profiles.
We tried using example 11.7 to get factor scores but got an error message.
Any guidance with this would be much appreciated, many thanks,
 Linda K. Muthen posted on Monday, February 23, 2015 - 3:45 pm
Please send your output and license number to
 Thomas Rodebaugh posted on Tuesday, April 07, 2015 - 8:04 pm
hi all,

i'm in receipt of a review in which the reviewer states (in part):

The authors also fail to mention known limitations of the three-step approach to examining relationships among latent variables (i.e., known biases in the effect estimates for the hierarchical regressions conducted on factor scores; see recent papers by Skrondal & Laake, and Lu & Thomas, and further back Tucker, 1971).

what i actually did was output plausible values based on the entire item/latent factor set, allowing the latent variables to correlate. for various reasons we then conducted regressions on the medians of those plausible values. thus, we did not use the three-step approach i've seen described elsewhere. further, as far as i can tell from the 2010 paper on the website about plausible values, the opinion on the mplus side would seem to be that what we did should be basically ok, leading to relatively unbiased estimates in the resulting regressions. i am not 100% sure of my interpretation, though.

any input from anyone knowledgeable in this area would be much appreciated. i'm about to use this basic approach again, and don't want to commit an error multiple times!
 Bengt O. Muthen posted on Wednesday, April 08, 2015 - 12:32 am
The advantage of having plausible values is to treat them as multiple imputations, that is, perform m repeated analyses with them when there are m plausible value data sets. It sounds like what you did lost that advantage. Factor scores, essentially only one draw instead of m, cause biases as the reviewer states.
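The "m repeated analyses" pooling is the standard multiple-imputation combining rule (Rubin's rules). A minimal sketch with hypothetical numbers standing in for the m per-data-set estimates and their squared standard errors:

```python
import numpy as np

# Hypothetical estimates and squared SEs from m analyses,
# one per plausible-value (imputed) data set.
est = np.array([0.52, 0.48, 0.55, 0.50, 0.47])
var = np.array([0.010, 0.012, 0.009, 0.011, 0.010])
m = len(est)

pooled = est.mean()                 # combined point estimate
W = var.mean()                      # within-imputation variance
B = est.var(ddof=1)                 # between-imputation variance
T = W + (1 + 1 / m) * B             # total variance (Rubin's rules)
print(round(pooled, 3), round(float(np.sqrt(T)), 3))  # prints 0.504 0.108
```

Collapsing the draws to a single score per person before analysis discards B, the between-draw variance, which is what understates the uncertainty and biases the resulting regressions.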

See further the plausible value writings we refer to in our plausible value paper among our posted Bayesian papers.
 Thomas Rodebaugh posted on Wednesday, April 08, 2015 - 1:47 pm
thanks, that's helpful. in this case we already had MI data sets (5) in which we had tested factor structure using WLSMV and type=imputation. we ultimately analyzed through type=imputation 5 median plausible value datasets--one from each of the MI datasets. thus, there were 5 draws rather than 1, although 1 draw from each of 5 datasets imputed through other means (amelia II, as it happens).

would the resulting factor scores still be biased?

your answer is cuing me to realize that there are probably better or at least more elegant ways to do what we did, but it would still be helpful to know how far off we were, exactly, to begin with. thanks!
 Bengt O. Muthen posted on Thursday, April 09, 2015 - 12:28 am
The answers are somewhat complex, but found in those plausible values references:

[4] Mislevy R., Johnson E., & Muraki E. (1992) Scaling Procedures in
NAEP. Journal of Educational Statistics, Vol. 17, No. 2, Special Issue:
National Assessment of Educational Progress, pp. 131-154.

[6] von Davier M., Gonzalez E. & Mislevy R. (2009) What are plausible
values and why are they useful? IERI Monograph Series Issues and
Methodologies in Large-Scale Assessments. IER Institute. Educational
Testing Service.

There are probably other refs as well.
 Thomas Rodebaugh posted on Thursday, April 09, 2015 - 1:42 pm
thanks for the specific references!
 Bengt O. Muthen posted on Thursday, April 09, 2015 - 3:01 pm
Here are 2 more:

1. The effect of not using plausible values when they should be: An illustration using TIMSS 2007 grade 8 mathematics data. Draft, June 1, 2010. Ralph Carstens & Dirk Hastedt, IEA Data Processing and Research Center.

2. Predictive Inference Using Latent Variables with Covariates (under review, Psychometrika). June 26, 2014.
Lynne Steuerle Schofield, Department of Mathematics and Statistics, Swarthmore College
Brian Junker, Department of Statistics, Carnegie Mellon University
Lowell J. Taylor, Heinz College, Carnegie Mellon University
Dan A. Black, Harris School, University of Chicago

Abstract: Plausible Values (PVs) are a standard multiple imputation tool for analysis of large education survey data that measures latent proficiency variables. When latent proficiency is the dependent variable, we reconsider the standard institutionally-generated PV methodology and find it applies with greater generality than shown previously. When latent proficiency is an independent variable, we show that the standard institutional PV methodology produces biased inference because the institutional conditioning model places restrictions on the form of the secondary analysts' model. We offer an alternative approach that avoids these biases based on the mixed effects structural equations (MESE) model of Schofield (2008).