

Hello! I have used Mplus to confirm a scale that I developed. Now I want to use my data and the scale to compare groups (demographics) on the latent variables. I think what I need to do is get a factor score for each of the observed variables using Mplus, then transfer the data to SPSS and do the following.

Example: I have 9 latent variables (LVs). The first latent variable (FAC1) comprises the following observed variables (OVs): C7, C11, C20 and C24.

1. Total the OV factor scores (FS) for the OVs that contribute to a particular LV. Example: (FS for C7) + (FS for C11) + (FS for C20) + (FS for C24) = FAC1 FS Sum
2. Divide each OV factor score by the total of the OV factor scores to get a "proportional factor score", so that the proportional scores sum to 1 and give the percentage impact of each OV. Example: (FS for C7)/(FAC1 FS Sum) = PFS for C7, where PFS = proportional factor score
3. Use the proportional factor scores in a weighted sum formula. This would allow me to give meaning to the values because they would be on the same scale as the responses, 1-5. Example: FAC1 = ((PFS for C7)*C7) + ((PFS for C11)*C11) + ((PFS for C20)*C20) + ((PFS for C24)*C24), where C7 is the respondent's answer to question C7 and FAC1 is now the calculated response for that person's beliefs about Factor 1 (Latent Variable 1)

Perhaps there is a better way to do this. Your advice would be greatly appreciated. Stephanie West
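For readers who want to see the arithmetic of the three steps above, here is a minimal Python sketch. The per-item weights and the responses are made-up numbers, not values from Mplus, and the function names are hypothetical. It only illustrates the proposed calculation; it does not say the calculation is a good idea (the replies below point to multiple group analysis instead).

```python
def proportional_weights(raw_weights):
    """Steps 1 and 2: divide each item's weight by the total so they sum to 1."""
    total = sum(raw_weights.values())
    if total == 0:
        raise ValueError("weights sum to zero, cannot normalize")
    return {item: w / total for item, w in raw_weights.items()}


def weighted_composite(responses, weights):
    """Step 3: weighted sum of one respondent's answers."""
    return sum(weights[item] * responses[item] for item in weights)


# Hypothetical per-item weights (what the post calls the "FS" for each OV).
raw = {"C7": 0.6, "C11": 0.5, "C20": 0.4, "C24": 0.5}
weights = proportional_weights(raw)

# Hypothetical answers of one respondent on the 1-5 response scale.
person = {"C7": 4, "C11": 5, "C20": 3, "C24": 4}
fac1 = weighted_composite(person, weights)
print(weights, fac1)
```

As long as all weights are non-negative, the composite stays between the smallest and largest response, which is the "same scale as the responses" property the post is after.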


You can compare the means, variances, and covariances of the latent variables using multiple group analysis and chi-square difference testing. Is this what you mean?
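A rough sketch of what a chi-square difference test computes, for two nested models (for example, factor means held equal across groups versus free). The fit values below are invented for illustration, and `chi2_sf` is a small helper written here, not an Mplus function. One caution: this simple subtraction applies to ordinary ML chi-squares; with robust estimators such as WLSMV or MLR the plain difference is not chi-square distributed, and Mplus has its own procedures for that case (see the user's guide).

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate.

    Uses the power series for the regularized lower incomplete gamma function.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 0
    while abs(term) > 1e-15 * abs(total):
        n += 1
        term *= z / (a + n)
        total += term
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def chi2_difference(chi2_restricted, df_restricted, chi2_free, df_free):
    """Difference test: restricted (nested) model versus the less restricted one."""
    diff = chi2_restricted - chi2_free
    ddf = df_restricted - df_free
    return diff, ddf, chi2_sf(diff, ddf)


# Hypothetical fit results for two nested multiple-group models.
diff, ddf, p = chi2_difference(131.2, 58, 120.4, 55)
print(f"chi-square difference = {diff:.1f}, df = {ddf}, p = {p:.4f}")
```

A small p-value says the equality restrictions worsen fit, i.e. the groups differ on the restricted parameters.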

Stephanie posted on Friday, May 14, 2004  9:56 am



Yes, but I don't know how to do that. 


I don't know how to use multiple group analysis so I was going to use the above method. Would that work and if so, how can I get the factor scores? 


You can get the factor scores using the SAVEDATA command. See the user's guide for details. If you are interested in seeing the steps we use to test measurement invariance and population heterogeneity using multiple groups, you could purchase the Day 1 handout from our short courses.


I have ordered the handout and will try that. In the meantime, I'd like to get those factor scores. But when I used the SAVEDATA command, it said that it couldn't find the data. Any suggestions on what I need to do? *** ERROR in Savedata command Only sample correlation matrix may be saved when there is at least one categorical dependent variable.

Stephanie posted on Friday, May 14, 2004  12:39 pm



Latest error message: THE MODEL ESTIMATION TERMINATED NORMALLY WARNING: THE RESIDUAL COVARIANCE MATRIX (PSI) IS NOT POSITIVE DEFINITE. PROBLEM INVOLVING VARIABLE F3. THE MODEL COVARIANCE MATRIX IS NOT POSITIVE DEFINITE. FACTOR SCORES WILL NOT BE COMPUTED. CHECK YOUR MODEL. 


You must have a negative residual variance or a correlation greater than one. Please send your output to support@statmodel.com and I will look at it. 

Paul Kim posted on Monday, August 09, 2004  9:42 pm



I'm trying to create factor scores but I get the following error message: MINIMIZATION FAILED WHILE COMPUTING FACTOR SCORES FOR THE FOLLOWING OBSERVATION(S) : 457 FOR VARIABLE V32011 484 FOR VARIABLE V32001 860 FOR VARIABLE V32001 Is this a data cleaning problem? I've looked through my data and cleaned it the best I could and checked for missing data, but I can't seem to fix this. Any suggestions? 


These observations must have unusual patterns, for example, getting the easy items wrong and the difficult items correct. This makes optimization difficult. 

Anonymous posted on Monday, February 14, 2005  9:39 am



What are the 8 columns one gets from the Savedata: save=fscores option? thanks 


If you look at the end of your output, the format and order of the variables in SAVEDATA file are described. 


Hello Dr. Muthen, Which regression method is used to obtain factor scores in Mplus: Anderson-Rubin, Bartlett, or some other? Regards

bmuthen posted on Monday, March 21, 2005  5:39 pm



Mplus uses the regression method (see e.g. Lawley-Maxwell's FA book), also known as the modal posterior estimator, for continuous outcomes and for categorical outcomes with WLSMV. In other cases, it uses the expected posterior distribution approach.
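To make the regression method concrete for continuous indicators, here is a small sketch for the simplest case: one factor with mean zero and uncorrelated residuals. In that case the textbook formula, factor variance times loadings' times inverse model covariance matrix times (y minus intercepts), reduces to a ratio of two sums. All parameter values are invented, and this is an illustration of the formula, not Mplus's code.

```python
def regression_factor_score(y, intercepts, loadings, resid_var, factor_var=1.0):
    """Regression-method score for a one-factor model with continuous items.

    Equivalent to factor_var * lambda' * Sigma^{-1} * (y - nu) when
    Sigma = factor_var * lambda * lambda' + diag(resid_var).
    """
    # s = lambda' Theta^{-1} lambda: how much the items tell us about the factor
    s = sum(l * l / t for l, t in zip(loadings, resid_var))
    # precision-weighted sum of each item's deviation from its intercept
    num = sum(l / t * (yi - nu)
              for l, t, yi, nu in zip(loadings, resid_var, y, intercepts))
    return factor_var * num / (1.0 + factor_var * s)


# Hypothetical parameter estimates for three continuous items.
loadings = [1.0, 0.8, 0.6]
resid_var = [0.5, 0.5, 0.5]
intercepts = [3.0, 3.0, 3.0]

score = regression_factor_score([4.0, 4.0, 4.0], intercepts, loadings, resid_var)
at_mean = regression_factor_score(intercepts, intercepts, loadings, resid_var)
print(score, at_mean)
```

Someone who answers exactly at the item intercepts gets a score of zero, and the score is pulled toward zero relative to the raw deviations, more strongly when the items carry little information.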


Hello Dr. Muthen, Thanks for the clarification. 


Hi Linda, I used SAVEDATA: SAVE = FSCORES to save factor scores from the CFA, but I found all the factor scores in the output file are either 0 or 1. Can I ask what goes wrong here? Or if it is what I should get, how should I understand it? Thank you! Shuang 


Sorry Linda, please discard my last question. There are three columns that have values that look like factor scores. Is the first column the factor score for the first factor, the second column the factor score for the second factor, and the third column the residual? Thank you! Shuang


Hi Linda, I have one more followup question. The number of individuals in the factor score file is different from the number of individuals in the input file. Is there any way to match these two files to know which factor score is for which individual? Thank you! Shuang 

bmuthen posted on Wednesday, June 15, 2005  7:41 am



Please send your output, saved file, and data to support@statmodel.com.

Anonymous posted on Wednesday, June 29, 2005  11:41 am



Hi, Linda, I have a question about the factor scores obtained from CFA. I used the save=fscore command. When I changed the sequence of indicators in the BY command, the factor scores also changed. What happened? Which factor scores should be used? Thanks


When you change the sequence of the factor indicators, a different factor loading is fixed to one to set the metric of the factor. This is why you get different factor scores. Both sets of scores are valid and should have a correlation of one with each other.
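The point about the metric can be checked numerically. The sketch below uses a one-factor model with continuous items and the regression method, with invented parameter values. Switching the reference indicator from item 1 to item 2 divides every loading by the old loading of item 2 and multiplies the factor variance by its square; the model-implied covariance matrix is unchanged, and the scores are simply multiplied by a constant.

```python
def score(y, intercepts, loadings, resid_var, factor_var):
    """Regression-method factor score, one factor, uncorrelated residuals."""
    s = sum(l * l / t for l, t in zip(loadings, resid_var))
    num = sum(l / t * (yi - nu)
              for l, t, yi, nu in zip(loadings, resid_var, y, intercepts))
    return factor_var * num / (1.0 + factor_var * s)


intercepts = [3.0, 3.0, 3.0]
resid_var = [0.4, 0.5, 0.6]

# Metric A: loading of item 1 fixed at 1 (hypothetical estimates).
load_a, var_a = [1.0, 0.8, 0.5], 0.64

# Metric B: same model with the loading of item 2 fixed at 1 instead.
c = load_a[1]
load_b = [l / c for l in load_a]
var_b = var_a * c * c

people = [[4.0, 3.0, 5.0], [2.0, 2.0, 3.0], [3.0, 5.0, 1.0]]
scores_a = [score(y, intercepts, load_a, resid_var, var_a) for y in people]
scores_b = [score(y, intercepts, load_b, resid_var, var_b) for y in people]
for a, b in zip(scores_a, scores_b):
    print(round(a, 4), round(b, 4))
```

In this example the second set equals the first multiplied by 0.8, so the two sets rank people identically and correlate one.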

samruddhi posted on Tuesday, September 06, 2005  9:56 am



I have 10 categorical variables and am predicting one factor using CFA. Here is the output:

CFI/TLI
CFI 0.971
TLI 0.986
Number of Free Parameters 10
RMSEA (Root Mean Square Error Of Approximation) Estimate 0.036
SRMR (Standardized Root Mean Square Residual) Value 0.041
WRMR (Weighted Root Mean Square Residual) Value 1.281

I see that all but the WRMR value show a good fit. Could you please share with me your insight on how to interpret these results in light of the WRMR value being so much higher than what indicates a good fit for my model? Thanks much.

Anonymous posted on Tuesday, September 06, 2005  9:58 am



can I get a factor score with Mplus if all my indicators are categorical? Mplus guide says this is not possible, any suggestions are most welcome. thanks! 


Our recent experiences with WRMR have not shown it to be as good as we had hoped. I would not worry about it. But I would look at my chi-square.


Factor scores for factors with categorical indicators have been available for several years. You must be looking at a very old user's guide. 

Anonymous posted on Tuesday, September 06, 2005  10:39 am



thanks, linda, i am using 2001 user's guide reprinted in 2002. i will find a copy of the new one and use that. many thanks. 

samruddhi posted on Tuesday, September 06, 2005  10:46 am



Thanks, Linda, for your quick response re: WRMR. Chi-square is 113.157 (p=0.0000) but, with a sample size of 2444, my understanding is that chi-square is not a good measure. Let me know if you think otherwise. Could you also verify for me whether the following values can be used to assess model fit: SRMR 250) TLI > 0.95 CFI > 0.95 RMSEA < 0.06. Thanks much for your guidance.

samruddhi posted on Tuesday, September 06, 2005  10:52 am



In the previous message, I meant SRMR 250) thanks. 

samruddhi posted on Tuesday, September 06, 2005  10:54 am



In the previous message, I meant SRMR <0.08 works for sample greater than 250. thanks. 


Chi-square can be sensitive to sample size but that is not a reason to ignore it. You can do a sensitivity study where you free parameters until chi-square is acceptable and see if in doing so the parameters in your original model stay the same. If they do, then the poor chi-square was probably due to its sensitivity. If the original parameters change a lot, then the model probably does not fit well. For acceptable cutoffs for fit measures, see the Hu and Bentler article from several years ago in Psych Methods and also the Yu dissertation on our website.
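As a small illustration, the fit values posted above can be compared with the cutoffs quoted in the question (CFI and TLI above 0.95, RMSEA below 0.06, SRMR below 0.08). The table of cutoffs is taken from that post, not from a fixed rule, and the helper name is made up; the references mentioned in the reply are the place to check which cutoffs suit a given model.

```python
# Cutoffs as quoted in the question above; treat them as rules of thumb.
CUTOFFS = {
    "CFI": (">", 0.95),
    "TLI": (">", 0.95),
    "RMSEA": ("<", 0.06),
    "SRMR": ("<", 0.08),
}


def check_fit(indices):
    """Return {index name: True/False} for each index that has a cutoff."""
    results = {}
    for name, value in indices.items():
        if name not in CUTOFFS:
            continue  # e.g. WRMR, which the reply suggests not relying on
        direction, cutoff = CUTOFFS[name]
        results[name] = value > cutoff if direction == ">" else value < cutoff
    return results


# Values reported in the post.
posted = {"CFI": 0.971, "TLI": 0.986, "RMSEA": 0.036, "SRMR": 0.041, "WRMR": 1.281}
results = check_fit(posted)
print(results)
```

Passing these checks does not replace looking at the chi-square, which is the point of the reply.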

Elke Pari posted on Saturday, October 01, 2005  7:01 am



Hi, I have a problem getting factor scores. My dataset consists of 9 variables, which are categorical. If I try to get factor scores through the command "SAVE=FSCORES" I get the following error message: "THE MODEL ESTIMATION TERMINATED NORMALLY THE MODEL COVARIANCE MATRIX IS NOT POSITIVE DEFINITE. FACTOR SCORES WILL NOT BE COMPUTED. CHECK YOUR MODEL." What do you suggest could be wrong?


It is difficult to say exactly what the problem is without more information. Please send your input, data, output, and license number to support@statmodel.com. 

Marco posted on Wednesday, November 16, 2005  5:25 am



Hello Drs. Muthén, I would like to conduct a multilevel regression analysis with Mplus. Since my indicators are not tau-parallel or even tau-equivalent (bad fit for a CFA with equality constraints on the factor loadings), the simple mean/sum of the indicators isn't the best estimator of a person's true score. I know that it is possible to estimate a multilevel model with latent variables, but I would prefer to keep it simple. Therefore...
...would it be advisable to use the factor scores as predictors? I found another estimator in a book by Roderick McDonald (Test Theory, 1999), which takes the factor loadings and the residual variance into account, but reduces the variance of the estimated true score dramatically (compared to the factor scores of Mplus).
...I have a hierarchical sample, so I will estimate the factor scores with TYPE=COMPLEX. Does this option affect the estimation of the factor scores? (Maybe it is a problem to take the non-independence into account twice: first within the CFA with TYPE=COMPLEX and afterwards within the multilevel regression analysis.)
Thanks a lot for your help! Marco

Marco posted on Wednesday, November 16, 2005  5:41 am



Sorry, I forgot one question: It seems to be, that Mplus estimates factor scores even for cases with all missings on the corresponding indicators. How is that possible? (I used MLR with TYPE=COMPLEX MISSING H1) Thanks, Marco 

bmuthen posted on Wednesday, November 16, 2005  8:45 am



Perhaps you have covariates in the model. 

bmuthen posted on Wednesday, November 16, 2005  8:49 am



Regarding your 5:25 AM question, if you are going to use factor scores in a multilevel setting it is best if you base those on a multilevel factor analysis model. So you would use type=twolevel. For type = complex to be formally correct, you have to assume that the factor loadings are (approximately) the same on both the within and between level, which is often not the case. 

Marco posted on Wednesday, November 16, 2005  12:52 pm



Thanks a lot for your reply, but I'm not sure whether I understand you correctly. Does that mean that the factor structure obtained by type=complex is in fact a mixture of two factor structures, one between-group and one within-group?

bmuthen posted on Wednesday, November 16, 2005  4:45 pm



Yes. Type = Complex estimates the same parameters as in a regular (single-level) factor analysis. The aim of Type = Complex is to give SEs and chi-square corrected for the non-independence of the observations due to clustering. Sometimes this is not sufficient and Type = Twolevel is needed.

Marco posted on Friday, November 18, 2005  12:38 am



Thanks for the clarification, but that raises a practical question. My group-level observations are limited to 38 clusters and there are 19 indicators on the individual level (although most of them have little between-variance), so I will probably have to reduce the number of between-parameters. Are there some practical guidelines about how best to do this? For example, is there a rule of thumb about the minimum between-variance needed? In which cases would it be reasonable to fix the residual variances on the between level to zero or constrain the loadings on both levels to be equal? I guess that this is a wide topic, so maybe you could provide a reference. Many thanks!

bmuthen posted on Friday, November 18, 2005  5:29 am



To have success with your two-level factor analysis, I would recommend reading and following the 5 steps of Muthén, B. (1994). Multilevel covariance structure analysis. In J. Hox & I. Kreft (eds.), Multilevel Modeling, a special issue of Sociological Methods & Research, 22, 376-398. (#55) It may be that you need only a 1-factor model on between and perhaps zero between-level residual variances.


Hi! I've a question about the estimation of factor means within a multiple group analyses (model: 3 factors with 4 indicators each). As a default the factor means are fixed to zero in the first group, while in the second group factor means are free. I want to overrule the default and estimate the factor means of both groups by fixing the intercept of an observed dependent variable to zero (one for each factor). But it doesn't work. I hope you can help me solving this problem. thank you in advance, Jantine 


This should work and I am not sure what you mean when you say it does not work. I suspect that you are not freeing the factor mean in the class-specific MODEL command for the first group. If this does not help you solve your problem, please send your input, data, output, and license number to support@statmodel.com.

Jinseok Kim posted on Tuesday, January 03, 2006  10:25 pm



I am trying to estimate a latent interaction model in Mplus. Schumacker introduced an approach by Joreskog that uses "latent variable scores" to estimate an SEM with a latent interaction (http://www.ssicentral.com/lisrel/techdocs/lvscores.pdf). It seems attractive to me, but his explanation is all in LISREL language. So, I was wondering if I can do the same modeling using Mplus. Any of your thoughts and suggestions will be greatly appreciated. Thanks.

bmuthen posted on Wednesday, January 04, 2006  8:53 am



Using estimated factor scores only leads to approximate solutions. A better alternative is to use the Mplus ML approach to latent variable interaction modeling, which is in line with the Klein-Moosbrugger method from Psychometrika.


Hello! I am attempting to generate factor scores for a 2-factor model with combined continuous and categorical (binary) indicators. The model fits adequately. I have been using the save = fscores command, but for some reason the program only saves the raw data. It does not seem to be generating factor scores. Is there a problem generating factor scores with combined categorical and continuous indicators? Thank you in advance for your help!


No, this is not a problem. The only reason factor scores would not be computed is if the model did not converge or there was a warning about negative residual variances or some other problem. The only way to know is to send your input, data, output, and license number to support@statmodel.com.


It looks like the residual variance for one of the categorical variables is negative. Is there a way to fix this? Or do I need to alter my model? Thanks! Karen 


I would suggest modifying the model. Did you do an EFA on these items? If not, that is a good place to start. 


Hi, it's huang.
1. I tried to save the factor scores, but the save data information in the output stated: "Factor scores were not computed. No data were saved." The following is my syntax:

data: file is mi1.dat;
variable: names are cl edu inc gen bmi year at1-at7 ba1-ba5;
  usevariables are at1-at7;
  categorical are at1-at7;
model:
  !set cfa due to varimax.
  !set reference due to loading.
  fat1 by at1*;
  fat1 by at2@1;
  fat1 by at3;
  fat2 by at4*;
  fat2 by at5;
  fat2 by at6@1;
  fat2 by at7;
  fat1 with fat2;
output: sampstat standardized residual modindices (0) tech1 tech2 tech4;
savedata: file is 0618withfc.dat; save=fscores;

Did I do something wrong?
2. If I change the model to modify the relations between indicators and/or factors, do the factor scores also take different values?
3. Can the factor score be interpreted as each respondent's value on the concept/factor? How is it produced from all the categorical observed values?
4. Besides, how do I read the MI to decide whether to free a parameter or not?
Thanks for your time.


For the last question point 2, I meant the covariance b/w factor and/or indicator's measurement error. 


This can be diagnosed if you send your output, data and license number to support@statmodel.com. Typically, factor scores are not computed when a model has inadmissible parameter estimate values. 


It was my carelessness. I found that the (standardized) residual variance of one indicator (at2) is negative, like this: "AT2 0.433 Undefined 0.14327E+01". Is there any suggestion you can offer to solve this problem? Thank you.


To add to the previous statement, the correlation (standardized loading) between indicator at2 and factor fat1 is more than 1 (i.e. 1.197).


This suggests that the model is not appropriate for the data, so the model needs to be modified. Possible modifications can be suggested by modification indices for cross-loadings and residual correlations. Also check that you get only 1 loading fixed to 1 for each of the 2 factors.


Thanks for your suggestion. After modifying the relation b/w measurement errors, the problem is solved. 


Following the question of Paul Kim posted on Monday, August 09, 2004  9:42 pm MINIMIZATION FAILED WHILE COMPUTING FACTOR SCORES FOR THE FOLLOWING OBSERVATION(S) : Is there any way to solve this problem? Does modifying the relation b/w measurement errors mean anything about this problem? 


This means that for some observations, the pattern of values is contradictory, for example, a person who gets the easy items incorrect and the difficult items correct. There is nothing you can do other than change the model. I suggest using Version 4.1 if you are not.


Given the subject of my research, I will not change the loading relations between the factors and indicators, so I modified the model by adding measurement error covariances wherever the residual correlation matrix showed a value above .1. After doing this step by step, the residual correlations are all less than .1, the CFI, RMSEA, and WRMR get better and better, and the number of observations for which factor scores could not be computed drops from 8 to 1. So, I decided to delete this observation from the research sample. Is this procedure proper? Thank you.


I don't think adding residual covariances with no substantive reason to obtain factor scores for problematic individuals is defensible. Although your theory may say one thing, your data may not be valid measures of the constructs in your theory. I would do an EFA to see if the variables are behaving as expected.


Based on my understanding of your statement about the minimization failure: initially the number of observations for which factor scores could not be computed is 8, and the sample size is 1357. If I directly delete these 8 observations and run the CFA again to save the factor scores, is that acceptable? Sorry for my strange question.


I am sorry for this stupid question. After trying the method just mentioned, I found that the failure to compute factor scores simply moves to other observations, which I had not noticed before.


If you are not using Version 4.1, you should upgrade to that. If you need further assistance, you should send your input, data, output, and license number to support@statmodel.com. 

owen fisher posted on Friday, October 20, 2006  3:47 am



It's carey. Looking at a single indicator, every level of the indicator has its own meaning. Do the values of the factor score mean anything if I try to describe their distribution?


Can you describe this in more detail?

owen fisher posted on Friday, October 20, 2006  10:15 am



Sorry, I should describe it this way. There is one factor and three categorical indicators, and the loadings are .455, 1, and .386. If these loadings are probit regression coefficients, how do I interpret the relation between the factor and the indicators? Thank you.

owen fisher posted on Friday, October 20, 2006  10:25 am



Adding to the previous question, the number of these three indicator levels are 3, 4, and 4. Hope to make the question clearer. 


If you are using WLSMV, the factor loadings are probit regression coefficients. You can interpret their sign and significance. That is probably what is most important if they are used as factor indicators.

owen fisher posted on Sunday, October 22, 2006  12:11 pm



Hi, it's carey again. Let me describe my question this way, and sorry for my illiteracy. There is one factor and three categorical indicators, and the loadings are .455, 1, and .386. With robust weighted least squares, these loadings are probit regression coefficients. My teacher told me to describe the distribution of the factor score, but as I understand it (if I don't misunderstand the factor score), it is an index for which I can't find meaningful cutoff points. As in the binary logistic or ordered logit model, the factor score can only be converted into the probability of being in one of the levels of an indicator. After reading the Sage University paper "Logit and Probit: Ordered and Multinomial Models", I got a rough idea of the cutoff point of the factor score that corresponds to the boundary between one level of the indicator and the next or previous one.

owen fisher posted on Sunday, October 22, 2006  12:15 pm



Following the previous statement. If the thing is like that, I will directly describe the distributions of these three indicators without causing mental fatigue to find the good cutoff points of the factor score. However, I still need your suggestion. Is the reason of my decision okay? By the way, first, how to calculate the probability of being one of the level of the indicator from the factor score? Second, how to calculate the cutoff point of the factor score corresponding to the boundary between one level to the next or last of the indicator? Thanks for your patience. 


Even when factor indicators are categorical, the factors are continuous. You should save the factor scores and plot them. See Chapter 13 for how to turn probit regression coefficients into probabilities. 
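For readers without the user's guide at hand, here is a minimal sketch of how a probit loading and a set of thresholds turn a factor value into category probabilities for one ordered item. The loading .455 comes from the post above; the thresholds are invented, the function names are hypothetical, and the residual standard deviation of the latent response is assumed to be 1 (this depends on the parameterization, so check the output before using real estimates).

```python
import math


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def category_probabilities(factor_value, loading, thresholds, resid_sd=1.0):
    """P(item = each category | factor) for an ordered probit item.

    Cumulative form: P(item <= k) = Phi((tau_k - loading * f) / resid_sd).
    """
    cumulative = [norm_cdf((tau - loading * factor_value) / resid_sd)
                  for tau in thresholds]
    bounds = [0.0] + cumulative + [1.0]
    return [bounds[k + 1] - bounds[k] for k in range(len(bounds) - 1)]


loading = 0.455              # from the post
thresholds = [-0.5, 0.5]     # hypothetical: a 3-category item has 2 thresholds

probs_average = category_probabilities(0.0, loading, thresholds)
probs_high = category_probabilities(1.5, loading, thresholds)
print([round(p, 3) for p in probs_average])
print([round(p, 3) for p in probs_high])
```

A higher factor value moves probability from the lowest category toward the highest, which is what a positive loading means for a categorical indicator.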

owen fisher posted on Monday, October 23, 2006  3:15 am



Thanks for the suggestion, but how about the cutoff points of the factor score? 


There is no set way that I know of to find cutoff points for factor scores. 

owen fisher posted on Tuesday, October 24, 2006  5:40 am



Thanks for your patience. 

Anja Weiß posted on Monday, April 16, 2007  4:02 am



Dear Mr. and Mrs. Muthen, I have a confirmatory factor model with ordinal variables and 7 factors. With SAVEDATA I have saved the factor scores. How are these factor scores calculated? What is the theoretical background and where do I find it? Kind regards, anwe


See Technical Appendix 11. 


I saw in previous posts and in the technical appendices that Mplus uses the regression method for estimating factor scores for categorical outcomes with WLSMV. If I output the factor scores using the FSCORES option of the SAVEDATA command, can they be interpreted as any factor score would (i.e. if a respondent had a higher value, it indicates they have a higher level on that given factor)? Are the factor scores included on the saved dataset standardized? Thank you for your help and patience. Jessica 


For both continuous as well as categorical outcomes with weighted least squares, the factor scores are obtained using the maximum of the posterior distribution. For continuous outcomes this approach has been given the name "Regression method", but I don't think that is used with categorical outcomes. For categorical outcomes, the method is iterative. The answers to your questions are yes and yes. 


Hi there, I am trying to output factor scores for one latent variable with 14 indicators. I am using type = complex missing h1. I have 13,570 in my data set. However, I am using the subpopulation command which specifies a subsample of 11,488. When examining the factor score output I have factor scores for 13,557 individuals. I thought I would have factor scores for only the 11,488. Is the missing command generating scores for those not within the subpopulation? I wish to have factor scores for only those within the subpopulation (and not based on the whole population). However, if I do not specify the missing option then I will kick out those individuals who are missing at least one of the indicator variables. I want to make sure that my SEs are correct for the subsample. Thanks, Courtney 


Hello, I did confirmatory factor analysis with mplus and computed factor scores. When I checked the score I found that lot of people have negative factor scores. And remaining had positive factor scores. Now my supervisor wants an explanation for the negative score. Do you think negative score is an error? What is the possible reason for the negative score? Many thanks Joanna 


Typically, the mean of a factor is set at zero so it is natural that some estimated factor scores are negative and some positive. It is not an error. 


This is a response to Courtney Bagge's post. The factor scores that you have obtained are based on the model estimated from the subpopulation (not the entire population). So you can just ignore the factor scores computed for elements not in the subpopulation. Tihomir Asparouhov


Hello, I computed factor scores after CFA and used these factor scores as an independent variable in a logit regression in the next stage of my analysis. Now my supervisor wants to know the procedure by which Mplus computes factor scores. Do you think you could explain the procedure to me in a not too mathematical way? Also, when I compute quintiles of these factor scores I get 2 people in the first quintile, 4 in the second quintile, 30 in the 3rd quintile, 120 in the 4th quintile and 94 in the 5th quintile. The first three groups have very few people, so does it make sense to combine the first 3 groups into one and do the regression with 3 groups instead of 5? I shall be most grateful for your help. Joanna


Factor score estimation depends on the scale of the variables. The algorithms are described in Technical Appendix 11, which is on the website. Why are you using factor scores? Why do you not simply estimate the full model in one step? It is always preferable to do an analysis in one step if possible.


Thanks, Well I have computed the asset index by CFA for the family level data and then did the logit regression (with cluster option) with child level variable as dependent variable and both child level and family level variables (including asset index) as explanatory variables. My supervisor wants me to use quintiles of factor score rather than just scores as computed by CFA. Hence I computed factor score in the first step and then did the second step with quintiles of factor scores. Now I am not certain if it is alright to compute quintiles of factor score as they are standardised. Do you think we could do the whole analysis in one step? Is it alright to compute quintiles of factor score? Thanks Joanna 


In general, if you can avoid the intermediate step of estimating factor scores, you are at an advantage. I'm not sure why you feel you need factor scores nor quintiles of them. 


Hello, I am trying to get factor scores for a Latent Difference Score Model (McArdle). Unfortunately Mplus doesn’t give me any output when using the SAVE IS FSCORES; line. I have already checked the path (savedata)as well as the data set. However, I get an output with all the estimates (and no error messages) without the FSCORES command. Any comments how to solve this problem would be highly appreciated. Thanks! 


Do you also use the FILE statement in the SAVEDATA command to specify where to save the factor scores? 


Yes, I use the FILE statement and the values of my observed variables are saved in this file, but after adding the FSCORE statement I don't even get an output (with an error message, i.e. that the factor scores were not saved). 


Please send your input, data, output, and license number to support@statmodel.com. 


I am using a CFA model where I specified a single factor (F1) based on seven continuous manifest variables. After requesting the factor scores using the option SAVE=FS I observed that the mean of the scores is zero. How can I calculate the factor scores for each observation such that the mean is not zero but the actual estimated mean of F1? Thanks


Typically, the mean parameter for the factor is fixed at (standardized to) zero, unless you have multiple groups. So there isn't a nonzero estimated factor mean, and the estimated factor scores having mean zero is then desirable.


Thanks Bengt for your response. Actually, in my case I do want to estimate the actual mean of the factor F1 and incorporate this into the factor scores for each observation. I have a second factor (F2), with similar variables at time 2, and analogously I have the corresponding factor scores. Since both sets of scores have a mean of zero, I cannot compare them. My goal is to compare the mean scores from F1 and F2. I was thinking of adding the estimated intercepts from F1 to the scores; does that make sense? Any other suggestions?


If you impose measurement invariance across time for items that are the same at the two time points you can identify a factor mean difference for the two time points. You can fix the factor mean at zero for the first time point and free it and let it be estimated for the second time point. The estimated factor scores for the second time point will then take this nonzero estimated factor mean into account (this occurs automatically in the prior of the posterior computations for the estimated factor scores). 


Do the factor scores extracted from a CFA in Mplus have the same desirable properties as an IRT score? Would the factor score for each individual be equal to the IRT score for each individual (say from a graded response IRT model)? If not, is there a consensus on which type of score is superior? Thanks. 


Yes, they are the same. 


Hi, how can I export the saved factor scores in Mplus for further analysis, for example in stata? 


Use the FSCORES option of the SAVEDATA command. See the user's guide for further information. 
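Once the file named in the FILE option has been written, it is plain text and can be converted for other packages. The sketch below is one possible way to do that in Python, under stated assumptions: values are separated by whitespace, each record sits on one line, and missing values appear as an asterisk. The column names must be taken from the list at the end of the Mplus output; the names and values used here are invented.

```python
import csv
import io


def read_saved_scores(handle, names, missing="*"):
    """Parse whitespace-delimited records; the missing flag becomes None."""
    rows = []
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != len(names):
            raise ValueError(f"expected {len(names)} columns, got {len(fields)}")
        rows.append([None if v == missing else float(v) for v in fields])
    return rows


def write_csv(rows, names, handle):
    """Write a header plus rows; None is written as an empty cell."""
    writer = csv.writer(handle)
    writer.writerow(names)
    for row in rows:
        writer.writerow(["" if v is None else v for v in row])


# Stand-in for the saved file; with a real file use open("scores.dat").
sample = io.StringIO("  4.000  5.000  0.731\n  2.000  *  -0.412\n")
names = ["C7", "C11", "FAC1"]   # hypothetical; copy the order from the output

rows = read_saved_scores(sample, names)
out = io.StringIO()
write_csv(rows, names, out)
print(out.getvalue())
```

The resulting CSV, with a header row, can then be imported by most statistics packages.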

nina chien posted on Wednesday, November 04, 2009  10:22 am



Hi, I saved out factor scores for 2 factors, closeness and stress. The factor scores for closeness range from -2.51 to 1.08 (distribution is somewhat negatively skewed), and for stress from -.71 to 2.86 (distribution is very positively skewed). But the original items are on a scale from 1 to 5. Did Mplus automatically center the factor scores? I see that each factor has a mean of 0.00. Thank you for your help.


Factor scores are not centered. They need a metric and factor score estimation gives them a mean of zero. 


I wanted to learn more about the factor scores created from a CFA in MPLUS. Here are a few questions. Thanks! (1) Would it be fair to say that they contain "no error" the way we think of it when we model everything in an SEM framework? (2) How do we know they are generally better than using a regression method or summing/taking means of items to create a composite construct/score? Is there something that can be cited in general and perhaps in particular with regard to the (relatively better) properties of these scores. (3) Would you still refer to the construct represented by these scores as "latent?" (4) I have 2 correlated constructs for which I am generating the scores from the CFA, one is a 3item and the other is a 4item. As usual I constrain the variance of each factor to 1.0 and freely estimate each loading. Am I wrong that this would essentially standardize the factors? I am getting a mean of (essentially) zero, but a SD of around .91 no matter what I do. Is there any problem with standardizing them? I have read Appendix 11, but I was hoping to learn more about their general properties. I use Mplus very regularly and love it. This discussion board is a tremendous resource. Thank you very much. 


It has been established that estimated factor scores do not behave like factors. See for instance Skrondal, A. and Laake, P. (2001). Regression among factor scores. Psychometrika 66, 563-575. This shows the distortion in the means, variances, and relations with other variables when using estimated factor scores. Especially with a small number of items I would recommend instead using SEM, which also makes it possible to test that the item sets are unidimensional. 


Thank you. I would prefer to use SEM but, among other things, there are cross-classified random effects in my models, which I am able to deal with in the "mixed model" framework. I used CFA to test (multi)dimensionality and to examine measurement invariance across 2 groups. What is the best way to get a factor score or an "observed" score for my constructs if I can't/don't use SEM? Is there a better way than the method used in Mplus? I looked over the article; both of my variables are explanatory and one is a DV in one case. I am not sure I have the resources to carry out the method described there. Is it the best way? Perhaps two separate unidimensional IRT GRM models (but I was under the impression that the CFA method in Mplus gives the same scores)? Thanks. 


I am not sure if your items are continuous or categorical. For continuous items, Mplus uses the regression method of factor score estimation, which is equivalent to the Maximum A Posteriori (MAP) method of IRT. For categorical items, Mplus uses EAP (expected a posteriori), which is standard in IRT. With only 3 and 4 items you won't get good factor score estimates; IRT typically works with many more items per factor, say 20 items or more. With continuous items the problem shows up in terms of a low factor determinacy, and with categorical items it shows up in terms of poor information functions. I am not aware of literature comparing summed scores to factor scores with a small number of items, but it probably exists (although see my dissertation paper: Muthén, B. (1977). Some results on using summed raw scores and factor scores from dichotomous items in the estimation of structural equation models. Unpublished Technical Report, University of Uppsala, Sweden, at http://www.gseis.ucla.edu/faculty/muthen/articles/Muthen_Unpublished_01.pdf). Anyone else? So, it sounds like you have to make a compromise in terms of deciding which feature to ignore: measurement error or cross-classified random effects. 


Thank you very much for the valuable input. Yes, that is the dilemma (CCREs vs. measurement error; not the first time and likely not the last). My scores are Likert (5-pt). Most people I have asked think that ignoring the CCREs is the more problematic offense. 


CCREs are on our list of future additions. 

CEKIC Sezen posted on Saturday, February 27, 2010  1:45 am



Hello, my problem is the following: I have to complete analyses that were already done, i.e., calculate factor scores from a CFA that was carried out with TYPE=COMPLEX and CLUSTER=SUBJECT. My first question is the following: the number of observations used in estimating the model is 1936, although my initial database was composed of 1556 observations. 1. What are the criteria Mplus uses for eliminating observations? I then tried to obtain factor scores for the analyses already done: 2. Is it possible to obtain factor scores directly from a CFA with TYPE=COMPLEX and CLUSTER=SUBJECT? As I could not do it, I redid the analysis (keeping the same model as before) as a CFA with TYPE=GENERAL and IDVARIABLE ARE SUBJECT, and computed the factor scores from this last analysis. Unfortunately, the parameters, their standard deviations, and the indices estimated by this last analysis don't correspond exactly to the first analysis, performed with TYPE=COMPLEX and CLUSTER=SUBJECT. 3. If it is not possible to obtain factor scores with TYPE=COMPLEX and CLUSTER=SUBJECT, can the factor scores obtained from the TYPE=GENERAL, IDVARIABLE ARE SUBJECT analysis be interpreted in the framework of the first analysis, even if the estimates of the two models are not exactly identical? I hope that my questions are clear and that you can help me. Cordially 


Mplus uses ML under "MAR" which is sometimes called "FIML" and means that all subjects who have data on any of the analysis variables are used in the analysis. So perhaps your 1556 observations are the listwise present group, while 1936 is what ML under MAR uses. You can get factor scores directly in a Type=complex, Cluster=subject analysis. If you still have problems, please send your input, output, data, and license number to support@statmodel.com. 


My question has to do with latent variable means in a multigroup CFA. Given that by default the latent variable means are set to zero in the first group and freely estimated in the second group, does that mean that I can't compare the means across groups using a chi-square difference test? Basically, I want to conduct analyses in the context of the multiple group model to test whether or not specific latent variable means are significantly different across the two groups. How do I do this and have an identified model? 


You fix the factor means to zero in the second group as well. The chi-square difference between this model and the default model will then test factor mean equality across groups. 


So, the unconstrained model has the factor means set to zero in only the first group (freely estimated in the second group), and the constrained model sets the factor means for both groups to zero. Is that correct? And the chi-square difference between those two models tests equality of means across groups? 


Right. 
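As a rough illustration of the comparison just confirmed, the sketch below computes the chi-square difference and its p-value for the simplest case: one factor and two groups, so the constrained model gains exactly one degree of freedom. The fit statistics are hypothetical, the function name is made up for this example, and the plain subtraction assumes ordinary ML chi-square values; with estimators such as MLR or WLSMV the difference needs the corrections described in the Mplus documentation.

```python
import math


def chi2_diff_test_1df(chi2_constrained, df_constrained, chi2_free, df_free):
    """Chi-square difference test for two nested models that differ by 1 df."""
    diff = chi2_constrained - chi2_free
    ddf = df_constrained - df_free
    if ddf != 1:
        raise ValueError("this sketch only handles a 1-df difference")
    if diff < 0:
        raise ValueError("the constrained model should not fit better")
    # For a chi-square variable with 1 df, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(diff / 2.0))
    return diff, ddf, p_value


# Hypothetical fit statistics (not taken from this thread):
# free model = factor mean free in group 2; constrained = zero in both groups.
diff, ddf, p = chi2_diff_test_1df(
    chi2_constrained=18.9, df_constrained=5, chi2_free=12.3, df_free=4
)
print(f"chi-square difference = {diff:.2f}, df = {ddf}, p = {p:.4f}")
```

A small p-value would suggest that the factor means differ across the two groups. With several factors the difference has more than one degree of freedom and a general chi-square distribution function is needed instead of the 1-df shortcut used here.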


Thanks! How do I get the estimated means for the latent variables in Group 1, then, if they are always set to zero? And the estimated means for Group 2 are only in reference to Group 1, correct? 


You don't need estimated factor means for Group 1. It is only the difference in factor means that is identified and meaningful to discuss. And that difference is captured by the group 2 factor means. Note that this does not imply that every person's factor value is zero in group 1. 

Daiwon Lee posted on Sunday, May 23, 2010  4:28 pm



Hello, My advisor wants me to run a model where I saved the factor scores and then merge the factor scores into a new data set. I think I know how to save factor scores, but I don't know how to merge with original data set to use them in the analysis. Please help me. Many thanks in advance. 


See the new merge options in the SAVEDATA command for Version 6. The most recent user's guide is on the website. 

Daiwon Lee posted on Monday, May 24, 2010  6:28 am



Hi Dr. Muthen, Thank you for the note. However, could you please tell me how to merge saved factor scores with the original data in Version 5.21? I also tried to save the factor scores in "dta" format to merge with the original data using Stata, but Stata failed to read the file of factor scores saved by Mplus. Thanks a lot in advance. 


See the FSCORES option of the SAVEDATA command. If you want variables saved other than the analysis variables and the factor scores, use the AUXILIARY option to name these variables. 
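For readers who end up merging outside Mplus, here is a minimal sketch of joining a saved factor score file back onto the original data by ID. It assumes the ID was saved with IDVARIABLE and that both files are whitespace-delimited text; the column names, column order, and values are hypothetical, so take the actual order from the SAVEDATA INFORMATION section of your own output.

```python
def parse_rows(text, columns):
    """Split whitespace-delimited text into one dict per row, keyed by column name."""
    rows = []
    for line in text.strip().splitlines():
        values = line.split()
        if len(values) != len(columns):
            raise ValueError(f"expected {len(columns)} values, got {len(values)}")
        rows.append(dict(zip(columns, values)))
    return rows


def merge_on_id(original, scores, id_col="id"):
    """Left-join the factor score rows onto the original rows using the ID column."""
    by_id = {row[id_col]: row for row in scores}
    merged = []
    for row in original:
        extra = by_id.get(row[id_col], {})
        # Keep the original values when a column exists in both files.
        new_cols = {k: v for k, v in extra.items() if k not in row}
        merged.append({**row, **new_cols})
    return merged


# Hypothetical contents: names and values are made up for illustration.
original_text = """
101 34 1
102 29 2
103 41 1
"""
scores_text = """
4 5 3 101 0.412
2 1 2 102 -1.037
"""
original = parse_rows(original_text, ["id", "age", "sex"])
scores = parse_rows(scores_text, ["y1", "y2", "y3", "id", "f1"])
merged = merge_on_id(original, scores)
for row in merged:
    print(row)
```

Cases without a saved score (ID 103 here) simply keep their original columns, which makes it easy to see who was dropped from the analysis.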


Hello, I encountered some strange results when saving factor scores for a multigroup CFA. I have a very simple model, with 2 groups and 3 items loading (strongly; >.70) on one latent variable. Factor scores are saved using the SAVEDATA option. In the newly created dataset, however, I find near-zero correlations (<.10) between the obtained factor scores and the original items for the second group. For the first group, correlations between items and factor scores are as expected. I checked beforehand, and factor loadings are about equally strong in both groups. The same strange pattern is found irrespective of the order of the groups, whether I impose or release equality constraints on loadings and/or intercepts, use different items... Am I doing something wrong? This is the syntax I use (for a model with equality constraints on factor loadings and intercepts). FYI: I am still using an older version of Mplus (version 4), maybe this has something to do with it...  variable: names are item1 item2 item3 country; usevariables are item1 item2 item3; grouping is country (1=BE 7=DK); model: Y by item1 item2 item3; output: standardized; modindices; savedata: file is testout.sav; save = fscores;  Thanks beforehand, Bart 


Please send the files to support@statmodel.com. 

Li Lin posted on Monday, July 19, 2010  8:47 am



What are the "*"s in the saved factor scores data set? Thanks. 


The asterisk (*) is the missing value flag. 
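When the saved file is read by other software, the asterisk has to be mapped to that program's own missing value code. The sketch below is one way to do this in Python, assuming a whitespace-delimited saved file; the sample rows and column names are hypothetical. It converts "*" to None and writes a CSV with empty cells, a form that most statistics packages read as missing.

```python
import csv
import io


def parse_token(token, missing_flag="*"):
    """Convert one saved token to a float, mapping the missing flag to None."""
    return None if token == missing_flag else float(token)


def read_saved_rows(text):
    """Parse whitespace-delimited rows from a saved factor score file."""
    return [[parse_token(t) for t in line.split()]
            for line in text.strip().splitlines()]


def to_csv(rows, header):
    """Write rows as CSV text, leaving missing values as empty cells."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    for row in rows:
        writer.writerow(["" if v is None else v for v in row])
    return buffer.getvalue()


# Hypothetical saved rows: three items followed by one factor score.
sample = """
1.000 * 3.000 0.254
2.000 4.000 * -0.871
"""
rows = read_saved_rows(sample)
csv_text = to_csv(rows, ["x1", "x2", "x3", "f1"])
print(csv_text)
```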

Li Lin posted on Monday, July 19, 2010  1:46 pm



Thanks! Another question: what are the first few columns in the saved factor score data set for? For example, I had observed ordinal variables x1 to x4 (values = 1, 2, 3, 4, and 5), and the factor score data set includes 4 columns called "x1" to "x4" in front of the id and factor score. Comparing these columns to the original observed data, it appears that x1 in the factor score data equals the original observed x1 minus 1. 


Mplus requires the data to have the lowest value of zero, so the data are automatically recoded to 0, 1, 2, 3, and 4. The recoded data are saved. See the CATEGORICAL option in the user's guide for more information about this recoding. 
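If the original codes are needed again after reading the saved file, the recoding can be reversed outside Mplus. The sketch below assumes that the observed category codes are mapped to consecutive integers starting at zero in sorted order, which reproduces the 1 to 5 becoming 0 to 4 case described above; check the CATEGORICAL documentation for how your own data were treated. The function names are made up for this example.

```python
def build_recode_map(values):
    """Map each observed category code to 0, 1, 2, ... in sorted order."""
    categories = sorted({v for v in values if v is not None})
    return {cat: i for i, cat in enumerate(categories)}


def recode(values, mapping):
    """Apply a recode map, leaving missing values (None) untouched."""
    return [None if v is None else mapping[v] for v in values]


def invert(mapping):
    """Swap keys and values so recoded values can be mapped back."""
    return {new: old for old, new in mapping.items()}


# Hypothetical item responses on the original 1-5 scale, one of them missing.
x1_original = [1, 5, 3, None, 2, 4, 5]
to_zero_based = build_recode_map(x1_original)
x1_saved = recode(x1_original, to_zero_based)
x1_restored = recode(x1_saved, invert(to_zero_based))
print(x1_saved)
print(x1_restored)
```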

Dallas posted on Saturday, July 31, 2010  8:53 am



Linda, Good morning. I have a question about the comment you make above in replying to James (James L. Lewis posted on Monday, September 28, 2009  6:27 pm). James asks two questions it seems to me. 1) Do factor scores have the same properties as IRT scores? And, 2) are IRT and factor scores the same. You indicate yes to both. For properties, this makes sense (assuming, of course, a factor model that corresponds to an IRT model). However, it doesn't seem true that factor scores EQUAL IRT scores. Correct? In other words, if we used EAP scoring and the loadings, etc. from the factor model, we'd get one set of factor scores. If we then converted the factor model parameters into IRT parameters, and again used EAP scoring to get IRT scores, it doesn't seem to me the factor scores would EQUAL the IRT scores. They would have similar measurement properties, but it doesn't seem like they'd be identical scores (in value). If I am right, can you or Bengt provide a formula to convert factor scores into IRT scores, like formulas do to convert the parameters? Thanks! 


IRT often refers to the 2PL model estimated under maximum likelihood. The 2PL model is the logistic model in Mplus for binary items and using a single factor where you free the factor loadings and fix the factor variance at one. With 2PL and ML estimation, Mplus gets the same loglikelihood as IRT programs. And it computes factor scores using the same EAP (expected a posteriori) method. So this is the same as in IRT and the same factor score values should be expected. ML probit is also possible, and again EAP is used. WLSMV probit is also possible, in which case MAP is used. 

Dallas posted on Saturday, August 07, 2010  5:51 am



Dr. Muthen. Thanks for your reply. You replied so quickly it took me a few days to notice! Yes, I was thinking about using probit and ML, and also thinking about WLSMV probit. In those cases, it seems one would have scores with similar measurement properties (e.g., one would still inherit the properties of IRT models), but not scores identical to those from the "traditionally" estimated IRT model. It seems, though, that a general formula for converting scores from the probit metric to the logistic metric does not exist? And thanks for the nudge regarding logistic and ML. It does make sense, and I do agree, that with logistic and ML (and an appropriately identified model) one would achieve the same results as the IRT model. Thanks. 


When obtaining factor scores via the savedata command, is it possible to also output other variables (such as subject ID) that were not in the CFA? 


oops... never mind the post for crag neumann, just found the idvariable option. 


Are the factor scores that Mplus estimates when using the MLR estimator corrected for measurement error? 


Yes. 


Hello, I have a question about obtaining factor scores. The way I understand it Mplus factor scores from a CFA with Likert items and the WLSMV estimator will give me factor scores that are equivalent to Graded Response Model (GRM) IRT scores (EAP estimation [with a normal prior I presume?]). 1. Is this correct? My other question is with regard to local independence and unidimensionality. 2. I have 15 Likert items (5 point) with which I am trying to measure a single construct. It is pretty clear to me, however, that there are local dependencies and maybe ultimately >1 factor among these items. If I appropriately specify a bifactor model or otherwise appropriately estimate correlations among the residuals however (to get rid of or "account for" the local dependencies), will my estimated factor scores for the GENERAL factor essentially be equal to GRM person scores (theta) from a model where the unidimensionality and local independence assumptions are satisfied? If I could I would just stay within the latent variable framework, but for my application here I really need the factor scores. I hope this makes sense. Any citations would be tremendous also. I love Mplus. Thanks. 


1. With WLSMV it is actually MAP (maximum a posteriori) that is used. With ML it is EAP. For WLSMV factor score estimation, see the Technical Appendix for Version 2 on our web site. 2. Yes. I know of no citation for this. 


Thanks much. MAP with a normal prior? I apologize, I cannot locate the Technical Appendix for Version 2 on the website, only Version 3. It looks like Appendix 11 in the latter only covers estimation of factor scores for continuous and binary data, not ordered categorical(?). Thanks. 


Yes, normal prior. You find it under Technical Appendices (see left margin of the home page) and once on that page, it is the first link, which gets you to: http://www.statmodel.com/download/techappen.pdf See Appendix 11, (229) and below. 


Thanks much. I see it now. I have one final question. MAP and EAP scoring of course require the specification of a prior distribution. I am a bit confused: if I specify a normal prior (for either method), should I not expect that the resulting distribution of theta-hat will be normal or close to it, particularly if the population distribution of theta is indeed normal? What if the population distribution of theta is not normal? I am getting conflicting reports on all this. Can you perhaps clarify? Thanks much. 


The posterior distribution can be quite nonnormal even with a normal prior. For example, if the items are too difficult or too easy we can't discriminate between people who are high/low and the posterior (theta-hat) will be skewed. 
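To make the point about skewness concrete, the following sketch computes EAP scores under a 2PL logistic model with a standard normal prior for a short test made of very easy items. The item parameters are hypothetical, and the integration is a plain evenly spaced grid rather than whatever numerical integration Mplus uses, so this is an illustration of the idea and not a reproduction of Mplus output.

```python
import math


def eap_score(responses, discriminations, difficulties,
              n_points=121, lo=-6.0, hi=6.0):
    """EAP estimate of theta under a 2PL model with a standard normal prior."""
    step = (hi - lo) / (n_points - 1)
    numerator = 0.0
    denominator = 0.0
    for k in range(n_points):
        theta = lo + k * step
        prior = math.exp(-0.5 * theta * theta)  # unnormalised N(0, 1) density
        likelihood = 1.0
        for u, a, b in zip(responses, discriminations, difficulties):
            p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
            likelihood *= p if u == 1 else 1.0 - p
        numerator += theta * prior * likelihood
        denominator += prior * likelihood
    return numerator / denominator


# Four hypothetical items that are all very easy (difficulty -2).
a = [1.5, 1.5, 1.5, 1.5]
b = [-2.0, -2.0, -2.0, -2.0]
scores = {}
for n_correct in range(5):
    pattern = [1] * n_correct + [0] * (4 - n_correct)
    scores[n_correct] = eap_score(pattern, a, b)
    print(n_correct, round(scores[n_correct], 3))
```

With items this easy, most respondents answer everything correctly and receive nearly the same score just above zero, while the few who miss items are spread out far below it, so the distribution of estimated scores is skewed even though the prior is normal.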

ywang posted on Friday, February 11, 2011  8:17 am



Dear Drs. Muthen, Does an IRT model usually stand alone? I included an IRT model in an SEM, but cannot find the model fit criteria for the SEM. What model fit criteria can we get for an SEM with IRT? Also, can you refer me to any paper that describes an SEM with IRT? Can it be described in a similar way to the CFA model in SEM, except that the indicator variables are categorical? 


It sounds like you are using maximum likelihood estimation and categorical outcomes. You will not receive chi-square and related fit statistics in this case. If you use weighted least squares estimation, you will. IRT is CFA with categorical outcomes. There are many IRT books. We have some papers on our website under Papers/IRT. 

xstudylab posted on Wednesday, February 23, 2011  10:09 am



I just switched to Version 6.1 and now I can't get factor scores for my model... I get the error message 'FACTOR SCORES CAN NOT BE COMPUTED FOR THIS MODEL DUE TO A REGRESSION ON A DEPENDENT VARIABLE' but indicators and factors are only regressed on exogenous variables. I ran the same input using Version 5 and it gave me the same estimates as Version 6.1, but it produced factor scores without an error message. Is there something different about how Version 6.1 produces factor scores? 


Please send the output and your license number to support@statmodel.com. This may be a problem in Version 6.1. 

Dena Pastor posted on Wednesday, February 23, 2011  1:52 pm



I'm running a regular CFA model and a little uncertain how to interpret the information provided in the output under SAMPLE STATISTICS FOR ESTIMATED FACTOR SCORES obtained using PLOT3. 


When there are factors in the model, the PLOT command provides factor scores. The descriptive statistics are for the factor scores. 


I'm interested in buying Mplus Base because of its ability to handle binary and ordinal variables in CFA. I do have a question regarding the factor scores that I hope you can address. The scores are continuous, yet I need to convert them to either binary or ordinal, the same scales as the original variables used in the CFA. Does Mplus do this, or is there a way to program this conversion (does Mplus, in fact, have programming capability)? Any papers on converting? Technical Appendices? Notes? 


Mplus does not do this. There would need to be further information in order to define such a conversion. Mplus does not have a programming capability. There is an IRT literature on conversions such as that related to NAEP with writings by Mislevy and others. 

ywang posted on Wednesday, March 02, 2011  10:59 am



I am working on an SEM with a latent factor by IRT (3 dummy indicator variables) as an independent variable. I was asked by other researchers for a description of the latent construct. They believe that the latent variable must have some sort of values and would like to describe the latent construct in terms of its range and distribution. It seems that the latent variable does not have a metric and cannot be described in the same way as an indicator variable. I am wondering whether it is appropriate to describe the latent construct using factor scores instead. However, I have some concerns since (1) estimated factor scores differ between the stand-alone IRT model and the SEM model, and (2) a factor score is not exactly the latent construct and still has measurement error. Do you have any suggestions on how to describe the latent factor in the SEM? Thank you very much for your help! 


In a cross-sectional model, a factor has a mean of zero and an estimated variance. 

ywang posted on Wednesday, March 02, 2011  12:32 pm



Thank you very much for the reply. I have a follow-up question. In the stand-alone IRT model, I got a factor variance of 0.122. However, when the IRT model was included in the SEM, the factor variance changed to 0.202. Which variance should I report? Is this inconsistency due to the SEM not fitting the data well (CFI: 0.873, TLI: 0.762)? 


If the factor is an independent variable in the SEM then the estimated variances should be close within their SEs. If not, as you say, the SEM may be ill-fitting. I would not use estimated factor scores here given that you have only 3 indicators. The factor metric of the SEM is clear: your model postulates a normal variable with a mean of zero and a certain variance (or you can fix the variance at 1 to get a z-score, and then free the first loading). 

ywang posted on Wednesday, March 02, 2011  2:07 pm



Thanks a lot. Your reply greatly helped me. I have another question about factors by subgroups such as gender. If I have to list the mean and variance of the factor score for males, for females, and for the whole sample in one table, as well as the p-value for the difference in factor score between males and females, what should I do? From your previous discussion with other researchers, I understand that multigroup analysis should be used to test whether the factor mean differs between males and females. In the model you previously mentioned, the factor mean in one group (e.g., males) is fixed at 0 and the mean is freely estimated in the other group (e.g., females). For that table, I need to list the mean and variance for males, females, and the overall sample. How can I free the means for both groups in the multigroup analysis? Thanks! 


In multiple group analysis, a test of factor mean differences is a difference test between a model with factor mean zero in one group and free in the other groups versus a model with factor mean zero in all groups. Please see Slide 223 of the Topic 1 course handout. 


Dear Dr. Muthen, I am running a CFA with categorical indicators, and I requested that Mplus compute the factor scores. The model fits well, except that Mplus cannot compute the factor scores. This is the message: THE MODEL ESTIMATION TERMINATED NORMALLY FACTOR SCORES CAN NOT BE COMPUTED FOR THIS MODEL DUE TO A REGRESSION ON A DEPENDENT VARIABLE. Is there a way to overcome this and get the factor scores? Thank you, Fernando 


I think you will find that factor scores are saved in spite of this message. Check that. There is an incorrect error check in Version 6.1 that produces this message but still gives valid factor scores. 


thank you, but i could not find the file with the scores. in the output i had this message: SAVEDATA INFORMATION Factor scores were not computed. No data were saved. this is the command i used to save factor scores savedata: file is E:\fandrade\pisa\paper1\revised soc of educ\scpfscores; save is fscores; 


Please send your input, data, output, and license number to support@statmodel.com. 

Jan Ivanouw posted on Thursday, March 31, 2011  12:52 am



Hi, I wonder if Mplus can give SE's for Factor Scores? 


Yes, for factors with continuous factor indicators. 

Helen Zhao posted on Tuesday, May 17, 2011  12:16 am



Hi, professors Muthen, I wonder is it possible to obtain factor scores for error terms in Mplus? Thanks, Helen 


You should be able to do this. See the FAQ on the website called Regressing on a residual. 


Hello, Reading over this post stream, I understand that SEM is generally preferable to using factor scores. I have a related question. When use of all items, or parcels, is prohibited because of sample size limitations, are there any advantages to using factor scores in lieu of total scores for first-order factors in an SEM model? In other words, I was thinking of using factor scores to represent first-order manifest scales within latent factors in an SEM model. I was hoping to reduce some second-order factors to first-order factors in this way. Are there any advantages of accuracy and reduced error when using factor scores in this way, as opposed to using total scores? Thanks in advance for your help. 


In your case, when you are using the factor scores or sum scores as factor indicators, measurement error is taken into account. I can't think of a reason why one would be preferable to the other. 


Hi. I am working on obtaining factor scores. The problem that I have run into is that when I look at the output data set, the values of my weight variable are not the same as the original weights that I input. The cluster and strata values were not altered. Does Mplus alter the weights when it uses them? For example, the original weight variable ranged from ~4 to ~3000. Now in the output data set it ranges from 0 to ~4. I appreciate any thoughts. Thanks! Here is my code: WEIGHT IS W1; STRATIFICATION IS STR; CLUSTER IS NEWPSU; SUBPOPULATION IS EX EQ 1; MISSING ARE .; IDVARIABLE IS id; ANALYSIS: TYPE = GENERAL COMPLEX; MODEL: !MEASUREMENT MODEL F1 BY V1 V2 V3 V4 V5 ; F2 BY V6 V7 V8 V9 V10 ; 


If the sum of the sampling weights is not equal to the total number of observations in the analysis data set, the weights are rescaled so that they sum to the total number of observations. 
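The rescaling described in this reply is a simple proportional adjustment. A minimal sketch, with hypothetical weights and a made-up function name, shows that the rescaled weights sum to the number of observations while the ratios between weights stay the same.

```python
def rescale_weights(weights):
    """Rescale sampling weights so that they sum to the number of observations."""
    n = len(weights)
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return [w * n / total for w in weights]


# Hypothetical raw weights for the observations in the analysis data set.
raw = [4.0, 250.0, 1200.0, 3000.0]
scaled = rescale_weights(raw)
print([round(w, 4) for w in scaled])
print("sum of rescaled weights:", round(sum(scaled), 6))
```

Because only the scale changes, a weight that was 62.5 times another before rescaling is still 62.5 times as large afterwards, which is why very small saved values next to values around 4 are consistent with original weights between about 4 and 3000.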


Dear Drs. Muthen, I tried to estimate an IRT model, which works, yet it does not return factor scores, and strangely enough there is also no error message. It seems that this or a similar problem was addressed in the forum before; at least there is a thread from Goran Milacevic, posted on Saturday, July 12, 2008, 1:20 am, which sounds similar. Can you reconstruct what the solution was back then? Thank you very much in advance for your advice! Michael 


Please send your input, data, output and license number to support@statmodel.com 


While I was waiting, I found a solution: another run with an increased number of integration points did return the factor scores output. Thanks, Michael 

Evelyn posted on Friday, October 28, 2011  3:33 pm



I have done an EFA and would like to save the factor scores and use them in a path model (the N of the dataset is too low to include the measurement model in the path analysis). I've read it is not possible to use "save=fscores" with the EFA command, so I have used:
Analysis: Type = COMPLEX; !data from students in classes
Model: f1-f3 by es_vrothes_marry (*1); !EFA indicated 3 factor model
Savedata: file is moroc_fscores.dat; save = fscores;
I've compared the GEOMIN ROTATED LOADINGS of the EFA and the CFA and noticed slight differences. Why is that? Could it be problematic? Also, I was wondering how I can name the factors for easy identification. Thank you 

Evelyn posted on Friday, October 28, 2011  3:38 pm



Apologies: when I compare the standardised loadings they are identical. 


Loadings and therefore factor scores are different in EFA than CFA because the models are different, with different degrees of freedom. EFA presents standardized loadings because a correlation matrix is analyzed whereas CFA uses a covariance matrix so that only the standardized solution is close to the EFA. You should watch the video of our Topic 1 course to learn more about these matters. 

Kerry Lee posted on Sunday, November 20, 2011  8:25 pm



Dear Drs. Muthen, I am running a CFA. Depending on the indicator to which the scale of the factor is fixed, the variance of the factor does not always attain significance. Specifically, when I fixed the metric of the latent variable to the indicator with the largest loading (i.e., when the factor variance was fixed at one), it failed to attain significance (p = .076). It attains significance when its metric is fixed to one of the other two indicators. To obtain more information on the distribution of the latent variable, I generated some histograms using the PLOT2 command. When I asked to "view descriptive statistics", the variance value (.133) differed from that in the text output (.147). Would you have some suggestions on why there is a discrepancy? My second question relates to the kurtosis value. Has it been rescaled, with zero denoting no departure from normality? Sincerely, Kerry. 

Kerry Lee posted on Sunday, November 20, 2011  8:34 pm



Regarding my previous post, the histograms were generated using PLOT3, not 2. Kerry. 


I think what you are seeing is the difference between the estimated factor variance parameter and the variance of the estimated factor scores. They are not expected to be the same. Yes, Kurtosis zero is no such departure. 
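Why the two variances differ can be seen from the regression method itself. For a single factor with continuous indicators, writing the factor variance as psi, the loadings as lambda_i and the residual variances as theta_i, the variance of the regression-method scores works out to psi * (psi*s) / (1 + psi*s) with s = sum(lambda_i^2 / theta_i), which is always smaller than psi. The sketch below evaluates this for hypothetical parameter values; it illustrates the shrinkage in general and is not a reconstruction of the numbers reported in this thread.

```python
import math


def regression_score_variance(loadings, residual_variances, factor_variance=1.0):
    """Variance of regression-method factor scores for a one-factor model."""
    s = sum(l * l / t for l, t in zip(loadings, residual_variances))
    # Shrinkage factor, equal to the squared correlation between the
    # factor and its estimated score under this model.
    shrink = factor_variance * s / (1.0 + factor_variance * s)
    return factor_variance * shrink


# Hypothetical standardized solution: three items with loadings of .8.
var_fhat = regression_score_variance([0.8, 0.8, 0.8], [0.36, 0.36, 0.36])
print(f"estimated factor variance parameter: {1.0:.3f}")
print(f"variance of the estimated scores:    {var_fhat:.3f}")
print(f"standard deviation of the scores:    {math.sqrt(var_fhat):.3f}")
```

The fewer and weaker the indicators, the smaller s becomes and the further the score variance falls below the factor variance parameter.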

Kerry Lee posted on Sunday, November 20, 2011  8:54 pm



Thanks very much for the quick reply. If I want to say something about whether there is a significant amount of variance in the latent factor, should I report the value from the text output (I assume this is the estimated factor variance parameter)? Kerry. 


Yes, use the estimated factor variance parameter and its SE in the printed output. 

nanda mooij posted on Wednesday, December 07, 2011  7:48 am



Dear Drs. Muthen, I have a model with first-, second-, and third-order factors, and I want to estimate the factor scores of the first- and second-order factors with the calculated item parameters of the third-order factors. Now I am wondering if I could get these factor scores by putting the item parameters of the third-order factors in the input. The items have 3 categories, so they are polytomous and non-ordered. I saw the appendix about the estimation of factor scores, but that's only about dichotomous or continuous y-variables... So can I put item parameters in the input and, if not, where can I find an appendix about estimating factor scores for categorical, non-ordered y-variables? Thanks in advance, Nanda 


You can obtain factor scores for all of the factors using the FSCORES option of the SAVEDATA command. See Technical Appendix 11 on the website. 

nanda mooij posted on Thursday, December 08, 2011  9:21 am



Dear Drs. Muthen, I know that I can obtain the factor scores for all the factors at once, but I actually want to obtain the factor scores of the total scale using the item parameters of the subscales (so I first estimate the item parameters separately for each subscale, and I want to use these item parameters to estimate the factor scores of the total scale). So what I actually want to do is what is done in MULTILOG, where I must provide a file containing the item parameters I want to use. I'm wondering if it is also possible in Mplus to put a reference to the item parameters in the input. I hope I'm explaining myself better now. Thanks a lot, Nanda 


You can fix the item parameters at the values you want in the MODEL command and only request estimation of the factor scores. 


We are running a CFA with categorical indicators in which some loadings and thresholds are fixed (and others are not). We want to obtain factor scores. Mplus tells us that "FACTOR SCORES CAN NOT BE COMPUTED FOR THIS MODEL DUE TO A REGRESSION ON A DEPENDENT VARIABLE". What does this mean? We have no regressions in the model, or other path structure, apart from a CFA. 


If you are not using Version 6.12, download it. This error message came out in error in an earlier version. 


We want to obtain factor scores from a CFA of categorical indicators to use in a subsequent path analysis because the full model, including the measurement model, would be too large for our sample size. We want to use a single-indicator SEM approach for the path analysis, however, so we want to estimate the reliabilities of the factor scores. We have found a very sensible formula on the internet in which we would divide the factor variance by the sum of the factor variance plus the factor score variance (based on the notion that the factor score variance estimates the error variance). That post on the web indicated that Mplus could give us output corresponding to the within-subject factor score variance (which is constant across subjects). Is this correct? If so, how do we get it? 


If you use SAVE=FSCORES you will obtain factor scores and standard errors of the factor scores. 
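The formula described in the question can be written out as a short calculation once a standard error is available. The sketch below treats the squared standard error as the error variance, which is the assumption stated in the question, and then converts the reliability into the residual variance one would fix in a single-indicator model using the conventional (1 - reliability) times observed variance rule. All numbers and function names are hypothetical.

```python
def factor_score_reliability(factor_variance, score_se):
    """Reliability as factor variance over factor variance plus error variance."""
    error_variance = score_se ** 2  # assumption: squared SE estimates error variance
    return factor_variance / (factor_variance + error_variance)


def single_indicator_error_variance(score_variance, reliability):
    """Residual variance to fix for a single indicator with known reliability."""
    return (1.0 - reliability) * score_variance


# Hypothetical values: factor variance of 1.0, constant score SE of 0.5,
# and a sample variance of 0.84 for the saved factor scores.
reliability = factor_score_reliability(factor_variance=1.0, score_se=0.5)
fixed_residual = single_indicator_error_variance(score_variance=0.84,
                                                 reliability=reliability)
print(f"reliability = {reliability:.3f}")
print(f"residual variance to fix = {fixed_residual:.3f}")
```

This only makes sense when the standard error is the same for every respondent, as it is with continuous indicators and complete data; when it varies across respondents, a single reliability value is at best an average.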


This is Rick Zinbarg's graduate student, Alison, following up on the question that he asked about our analyses. While it is true that when you use SAVE=FSCORES for models of _continuous_ items, you obtain standard errors for the factor scores, we have specified that the items in our data set are categorical. When we do this and use the SAVE=FSCORES command, Mplus still gives us factor scores but no longer gives us their standard errors. Is there a way to get the within-subject factor score variance for categorical data? Our goal in getting this information is to calculate the reliabilities for the factor scores we are obtaining in order to use them in a single-indicator SEM approach for path analysis. Thanks for your help. 


With categorical outcomes, the standard errors vary as a function of the factor values. You need to take those values from the plot of the information function you get using PLOT2 in the PLOT command. The standard errors are computed as 1/square root of the information function value. 
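Once information values have been read off the plot, turning them into standard errors for individual respondents is a small calculation. The sketch below assumes you have recorded a handful of (factor value, information) pairs by hand; the pairs are hypothetical, and the linear interpolation between them and the clamping at the ends of the curve are choices made for this example, not something Mplus does.

```python
import math


def se_from_information(information):
    """Standard error as 1 / sqrt(information)."""
    if information <= 0:
        raise ValueError("information must be positive")
    return 1.0 / math.sqrt(information)


def information_at(theta, curve):
    """Linearly interpolate the information at theta from (theta, info) pairs."""
    points = sorted(curve)
    if theta <= points[0][0]:
        return points[0][1]   # assumption: clamp below the first recorded point
    if theta >= points[-1][0]:
        return points[-1][1]  # assumption: clamp above the last recorded point
    for (t0, i0), (t1, i1) in zip(points, points[1:]):
        if t0 <= theta <= t1:
            fraction = (theta - t0) / (t1 - t0)
            return i0 + fraction * (i1 - i0)


# Hypothetical values read off an information curve.
curve = [(-2.0, 1.2), (-1.0, 2.5), (0.0, 4.0), (1.0, 2.5), (2.0, 1.2)]

# Hypothetical saved factor scores for three respondents.
factor_scores = [-1.5, 0.0, 0.5]
standard_errors = [se_from_information(information_at(f, curve))
                   for f in factor_scores]
for f, se in zip(factor_scores, standard_errors):
    print(f"factor score {f:+.2f}: SE = {se:.3f}")
```

The resulting list lines up one-to-one with the factor scores, so it can be written out as an extra column next to them.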


Thanks for the info about getting the information functions. Unfortunately, we neglected to mention in our earlier posts that we are trying to obtain the within-subject factor score variance for a group factor in a hierarchical model (e.g., one in which items load upon both group factors and a general factor). We are able to use the PLOT2 option to get Mplus to give us the information functions for both a unidimensional model of the items comprising one group factor and a unidimensional model of all items. However, is there any way to get Mplus to give us information functions for each factor in a multidimensional model in which items are categorical? We are hoping to use this information to calculate the reliability of factor scores generated from our model. 


You have a choice of showing individual factors in PLOT2. Go through the windows and you will find one where you can choose which factor to plot. 


Hi, when I conduct CFA with categorical data, one part of the output also shows IRT results. I am wondering whether the saved factor scores can be used as IRT scores? Thank you.


Yes, the saved factor scores can be used as IRT scores. 


Thanks Linda. A follow-up on creating IRT scores. I have, for example, 4 waves of data, and for each wave I create the IRT score from the saved factor score. I would like to use these four waves' saved scores to fit an LGC model. How do I do this in Mplus? Thank you very much


You can save the factor scores using the SAVEDATA command and then use the saved data to estimate the LGC model. Unless factor determinacy is one, the factor scores are not the same as using the factors in the model. I would suggest a multiple indicator growth model instead. 


Thanks Linda, perhaps I did not state my question clearly. I know how to save the factor scores. How do I combine the saved scores for each wave? Or should I run the CFA for all waves at once and save the scores, so that I have only one file?


If you want to use the factor scores from all of the waves in a single analysis, you need to save all of the factor scores in one file. This can be accomplished by running all four waves together and saving the factor scores. You would want to do this as a first step to determine measurement invariance across time.


Dear Linda and Bengt, I need to calculate the standard errors of the factor scores for the CFA model with categorical outcomes. From your explanation above, it looks like in categorical CFA, std. errors can not be computed along with the factor scores in a single step but need to be further derived from the plot of the information function. This is where I am getting lost. Could you please describe the sequence of steps one needs to follow in order to produce the std. errors for the latent factor scores for each respondent (or refer me to an example). I also wonder if there would be a way to output these scores in a txt or dat format in order to append to the vector of the factor scores? Thank you! 


Factor score SEs have been implemented in more cases in Version 7. Still, WLSMV doesn't give them yet. The information function gives the inverse of the square of the SE at different factor score values. 


Dr. Muthen, Thank you for your prompt reply. Just to clarify, in v. 7, I would be able to get the errors if I used ML or Bayes estimators instead of WLSMV, correct? Unfortunately, since I want to calculate S.E. for each unique value of the latent scores (n=3400), using the information function graph would be impractical.


If you send Support your input, we can check if that case gives SEs. 


Hi, I am estimating a model using WLSMV and want to compute factor scores. Mplus (Version 6.1) gives me the following error message: FACTOR SCORES CAN NOT BE COMPUTED FOR THIS MODEL DUE TO A REGRESSION ON A DEPENDENT VARIABLE. I have four ordinal indicators. What does the error message mean and is there a way to solve this problem? Kind regards, Christoph


You cannot obtain factor scores for your model using weighted least squares. You can use maximum likelihood instead. 


Thanks for this really quick reply! Could you maybe give me a short explanation what the error message means and why WLSMV cannot be used in such a case? I would like to understand why it is not working. Thanks in advance! Christoph 


You must have a situation where you regress something on a dependent variable. Factor scores have not been developed for this model. 


Hi, again thanks for the quick reply. I am not sure I understand your response. In my case the model is simply: Model: Y1 by f_oper99 f_kino99 f_sport99 f_freVerw99; Christoph


Oh, I think there was a version where this message came out in error. You should use a newer version. 

Jason Bond posted on Tuesday, July 09, 2013  10:31 am



Hi, I had a question regarding estimated factor scores obtained from a CFA on polytomous factor indicators. For continuous factor indicators, my understanding is that the estimated factor scores are standardized to have mean 0 and variance 1. However, estimated factor scores from polytomous factor indicators have, in my case (a majority of the items are skewed), a negative mean, which naturally raises the question among those I am working with of what the scale (or normalization) is for these factor scores. I found this question difficult to answer with Tech appendix 11. Thanks for any help, Jason


Factor scores are not standardized. They can have any mean or variance. With categorical items that are skewed, you will see this in the factor scores.

J.W. posted on Sunday, August 18, 2013  12:27 pm



To my knowledge, when items are continuous, Mplus uses the regression method to estimate factor scores; when items are categorical, Mplus uses Expected A Posteriori (EAP). I have a CFA with ordinal items measured on a 5-point scale: 1) Treating the items as continuous measures and saving estimated factor scores in Mplus; 2) Treating the items as categorical measures and saving estimated factor scores in Mplus. Both models fit the data very well, and the factor scores estimated from the two models are highly correlated (r=0.98). I am wondering if the model results indicate that I can simply treat my 5-point Likert scale items as continuous measures for CFA? If so, any reference? It is much easier to use CFA with continuous items for measurement invariance testing. As I recall, the ALIGNMENT option is only available for continuous or binary items in the current version of Mplus. Will the option be available for categorical items in a future version of Mplus? Your help will be appreciated!


When items are categorical, WLSMV uses Maximum A Posteriori (MAP) and ML uses EAP. Typically, it is ok to approximate Likert scale variables as continuous and use linear models unless there are strong floor or ceiling effects. You asked for references; here are 2 classic ones: Muthén, B., & Kaplan, D. (1985). A comparison of some methodologies for the factor analysis of non-normal Likert variables. British Journal of Mathematical and Statistical Psychology, 38, 171-189. Muthén, B., & Kaplan, D. (1992). A comparison of some methodologies for the factor analysis of non-normal Likert variables: A note on the size of the model. British Journal of Mathematical and Statistical Psychology, 45, 19-30. Yes, the alignment method is likely to be expanded in various directions.

J.W. posted on Monday, August 19, 2013  8:15 am



Great help! Thank you so much! 


Hello, I am using the alignment method, new to version 7.11. I have attempted to use the SAVEDATA command to save the results/import the output to a data file, but no data file appears or is empty when I open it. Are there SAVEDATA options for the alignment method? I have thousands of pages of output and am trying to find ways to make it easier to interpret/manipulate/analyze the output. Any suggestions you have for this will be much appreciated. Thank you for your time. 


We are not aware of any specific problems involving the RESULTS option and alignment. Please send your input, data, output, and license number to support@statmodel.com. 

deana desa posted on Thursday, November 07, 2013  7:52 am



Hi, Is there any way to tell Mplus to keep cases with missing on all variables as the way there are in the input data, although these cases won't be used in the modeling when computing factor scores for other cases? 


There is no way to do this. The analysis data set is saved and it does not contain these cases. 

Li posted on Tuesday, November 19, 2013  2:37 pm



Dear Drs. Muthen, I have a question about factor scores derived from CFA. I ran a four-factor CFA with 16 items. So each factor has only a few indicators (dichotomous, yes or no). The purpose of the CFA is mainly to confirm the factor structure for some follow-up multilevel analysis. I understand I can use one-step multilevel SEM for this, but I can’t do that due to sample size and other issues. I compared sum scores and factor scores for the four factors and found them very similar. So I used sum scores for the follow-up multilevel regression analysis. A colleague thinks factor scores are superior to sum scores because they take into account the correlation among the factors. BTW, the correlations among the four factors in our CFA range from .48 to .75. According to this colleague, if I use factor scores in the follow-up analysis then multicollinearity is already dealt with. I read the Mplus posts and the technical appendix but still can’t tell whether the factor scores from CFA already incorporate the correlations among the factors or not. What do you think? In particular, do you have a reference for this? I really appreciate it. Li


The factor scores are computed based on the model that is estimated. If the model includes correlations among the factors, the factor scores incorporate them.


I had a question about how Mplus calculates factor scores when cases have some missing data on continuous indicators. From what I've read, I gather that Mplus uses "regression or Bartlett methods" when estimating factor scores for continuous data, and it uses "all available data" to estimate these scores when data are missing. However, it's not very clear exactly how this is accomplished. Could you please say more about how Mplus manages to calculate a factor score when a case has missing data? Thanks, Aidan


If a subject has missing data on all variables in the model, a factor score for that subject cannot be computed; in fact, that subject is not even included in the model estimation. But if the subject has some of the variables observed a factor score is estimated. For instance, if you have a longitudinal model with 1 factor at each of 3 time points and a subject is not present at time point 2, his factor score at time point 2 can be estimated because he has observations at time 1 and 3 and the estimated model says how much the time 2 factor correlates with the time 1 and time 3 factors. The SE of the estimated time 2 factor score for that person is going to be higher than for a person who had no missing. 


Thanks, Bengt. So to clarify, when there is partially missing data for an individual, Mplus uses the individual's available scores and the model-estimated relationships among the variables to estimate the missing factor score. Right? And, if it were a 1-factor model, with 4 items loading on it, and for an individual 1 is missing, then I assume that Mplus will use the model-implied covariance among the items to estimate the score as if the value were present? Would you say this would result in similar values to using EM-based imputation? Thanks for the clarification.


I agree on the first paragraph. EM-based imputation typically concerns the missing data on the items, not the factor scores.

Tom Booth posted on Tuesday, March 04, 2014  8:54 am



Linda/Bengt, I have conducted a CFA on 7 items with a binary response format based on data with a family structure. As such I have included clustering based on a family identifier in the model commands. I need to produce factor scores for subsequent analyses. I wanted to confirm that the clustering does not impact these scores? As I understand it corrects SE, and as a result would not impact on the computation of the score. However, I am not 100% confident on how the factor scores for binary responses are computed in order to be sure the above is correct. Thanks Tom 


It sounds like you use Type=Complex to take care of clustering. In which case you are right about the factor score estimates. If you use Type = Twolevel you can get factor scores on both levels. 

Tom Booth posted on Monday, March 17, 2014  6:32 am



Thanks Bengt 


Hi, I saw previously on this thread that there might be an issue with version 6.1 and reporting factor scores from a CFA. We're having that issue right now: we're running a CFA with categorical indicators and requested factor scores, but each time we're getting the same message: FACTOR SCORES CAN NOT BE COMPUTED FOR THIS MODEL DUE TO A REGRESSION ON A DEPENDENT VARIABLE. Is there a specific solution we should use?


Yes, there was an error in a check that was introduced in Version 6.1. There is no workaround for this. 


Dear Dr. Muthen, We performed CFAs on several measures having from 3 to 9 indicators each, measured on three different scales: 1-3, 1-4, and 1-5. Within each measure, the scale is the same. We treated the measures as continuous. Our purpose is to perform a latent class analysis using the resulting factor scores. We have noticed that the range of the factor scores is wider than the original measurement scale. For example, for a measure with 4 indicators (scale 1-4), our factor scores range from 0.22 to 10.86. We rescaled our items so that the range is 1 to 3 for all of them. However, when we recalculated the factor scores using the rescaled indicators, the factor scores were identical to the FS calculated with the original metric. Given that cluster analysis is sensitive to the metric of the measures, we wanted to ask you how the factor scores are calculated in Mplus, and whether the metric of the indicators has an influence on the factor scores. Thank you.


Please send the output and your license number to support@statmodel.com. 


Dear drs. Muthen, I did a multiple group CFA with equality constraint on the factor loadings and I computed the factor scores. When I checked the factor scores distribution, I found that they had a very small variance (0,015 for the first group and 0,018 for the second group) and a very reduced range of variation (between 0,346 and 0,342 for the first group). My questions are: Are the estimated factor scores standardized? How is the scale of the factor scores defined (mean and variance)? Thank you very much for your help. Sara 


Factor scores are not standardized. The scale of the factor score is determined by the estimated model. Please send your output including SAMPSTAT and your license number to support@statmodel.com. 


I observed that in multiple group models (with known group membership, specified with KNOWNCLASS option) with categorical dependent variables and full measurement invariance assumed (only expected values and variances of latent variable are set as free parameters in non reference groups), estimated with MLR in Mplus 7.2 I don't obtain standard errors of estimated factor scores. Is it planned to enable estimation of factor scores standard errors in such a case (and in multiple group models estimated with MLR in general) in future versions of Mplus? 


It is correct that standard errors for factor scores are not available in this case. This is on our list but won't be in the next update. You can get the standard errors by running one group at a time with parameters fixed at the estimated values. You can use the SVALUES option of the OUTPUT command to get input with the estimated values as starting values and change the * to @ to fix them.
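The "change the * to @" step can be done by hand, but for a large model a small script may help. The Python sketch below is only an illustration under stated assumptions: the function name is hypothetical, and the example statements are made up in the general style of SVALUES output rather than copied from a real run. It replaces a `*` only when a number follows, so a bare asterisk without a value is left alone.

```python
import re

def fix_starting_values(svalues_text):
    """Turn free starting values (param*value) into fixed values (param@value)."""
    # Match '*' only when it is directly followed by an optionally signed number.
    return re.sub(r"\*(?=[-+]?\.?\d)", "@", svalues_text)

# Hypothetical statements in the style of SVALUES output (not real Mplus output).
example = """f BY u1*1.25000;
f BY u2*0.84000;
[ u1$1*-0.31000 ];
f*1.00000;"""

print(fix_starting_values(example))
```

The result would still need to be checked before pasting it into the MODEL command of each single-group run.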


Dear Professors, I apologize for posting several questions (I posted one on a different topic yesterday). I am trying to obtain factor scores for a two-level MSEM model with three factors on level 1 and two factors on level 2. However, the variables are all ordinal, so I am using the WLSMV estimator as you recommend elsewhere for ordinal data. The trouble is that Mplus gives me an error that "Factor scores cannot be computed for TYPE=TWOLEVEL with the estimators ULSMV, WLS, WLSM, and WLSMV." Is it possible to obtain factor scores for a two-level model with ordinal data? Sincerely, Lance Rappaport


You would need to use maximum likelihood to obtain factor scores for this model. 


Dear Drs. Muthen, I have a one-factor model with a mix of ordinal and continuous indicators, with a residual correlation between two continuous indicators, fit using WLSMV. USEVAR = x1 x2 x3 x4 x5 x6 x7; CATEGORICAL = x1 x2 x3 x4 x5; ANALYSIS: ESTIMATOR = WLSMV; MODEL: f BY x1 x2 x3 x4 x5 x6 x7; x6 WITH x7; I am interested in generating factor scores from this model. Will the factor scores in this case incorporate the correlation of the error terms of x6 and x7? I read in tech appendix 11 that when one indicator is binary/categorical, residual correlations are assumed to be zero in the computation of the MAP factor scores. In this case, the two items with residual correlation are continuous. I am hoping that the factor score computation, while assuming zero residual correlations for all the ordinal indicators, preserves the residual correlation between the two continuous variables. Thank you!


The WLSMV factor score does not include x6 WITH x7, but you can do it using ML. Just replace x6 WITH x7; to instead capture the residual covariance as f BY x6 x7; f@1; 


Thank you so much! This is great to know. 

shaun goh posted on Monday, January 18, 2016  3:21 am



Dear Drs Muthen, Are saved factor scores from the following WLSMV one-factor model equivalent to, or at least a reasonable proxy in scale for, the scores estimated by IRT of theta? (i.e. a saved factor score of 1.2 would correspond to 1.2 SD of theta) model: f by u1* u2* u3* u4*; ! Where u1 to u4 are binary [f@0]; f@1


Yes. It's just a different estimator and probit instead of logit link. Note that you can use ML to get the usual theta scores. 


Hi, I have managed to save my factor scores with the following command: savedata: file is fscores1.dat; save = fscores; And under the VARIABLE command in my model, I have identified which is the ID variable: IDVARIABLE IS mcsid; so that I can match up which factor score goes with which id. However, when I run my model, all the Mplus id variables are 0 (and a few are 1). They don't match up to the Mplus dataset like they should (and how they appear in my dataset inp file), i.e. they should run from 1 to 5899. Can you help?


Please send the files and your license number to support@statmodel.com. 

Rick Borst posted on Monday, July 25, 2016  4:11 am



Dear drs. Muthen, I developed a measurement model first using CFA. Everything ran fine. Then I started to relate the variables to one another (structural model) and I received the following message: THE MODEL ESTIMATION TERMINATED NORMALLY MINIMIZATION FAILED WHILE COMPUTING FACTOR SCORES FOR THE FOLLOWING OBSERVATION(S) : 1 FOR VARIABLE EFFIC1 2 FOR VARIABLE EFFIC4 3 FOR VARIABLE EFFIC1 4 FOR VARIABLE EFFIC1 5 FOR VARIABLE EFFIC1 6 FOR VARIABLE EFFIC1 7 FOR VARIABLE PSM2 8 FOR VARIABLE PROACT2 etc. Why did it run properly when I did not relate the factors to one another, and now it does not? What can I do about it? Thanks in advance!


Please send the two outputs and your license number to support@statmodel.com. 

Rick Borst posted on Tuesday, July 26, 2016  4:24 am



I tried to send you my data files but keep receiving the following message by email: Mail size limit exceeded. However, the size is merely 500 kb. Is there an alternative way to send the files?


They have been received. It must be a problem with our mail server. It is being looked into. 

Rick Borst posted on Friday, August 19, 2016  1:39 am



Hello, I am trying to conduct a moderation analysis, and I have a few questions: 1. The latent variable moderation with LOOP plot example (following UG ex 5.13) has two asterisks after the indicators on the right-hand side of the BY statements of the moderator and the independent variable. a) Why is that? b) Do all the indicators in the BY statement need an asterisk or just one indicator of every variable? 2. I have a mediation analysis with 5 IVs, 1 mediator and 2 DVs. They are all latent variables consisting of categorical variables. I want to check whether the moderator (also a latent variable) influences the effect of 2 IVs on the mediator. I need the R-square of the mediator (with and without the interactions) and after that the LOOP plot of both interactions (so two LOOP plots). Is this feasible? Because I get stuck all the time at the moment (errors such as: too many dimensions, the model has reached a saddle point, the model estimation did not terminate normally due to a non-zero derivative... check your starting values... the loglikelihood derivative for parameter .. is 0.86 etc.).

Rick Borst posted on Friday, August 19, 2016  1:40 am



I am sorry, this is in the wrong thread. I will ask it in another area. 


Answered in the other spot. 


Dear all, somewhere in this forum it is noted that Mplus calculates factor scores even for cases that have missing data on some of the continuous indicators. I understand that Mplus uses information based on the other indicators to estimate the factor scores. How precisely is this calculated? More specifically, I want to calculate the Bartlett factor scores. Given that Mplus does not estimate these, I use matrix calculations, and I always end up with missing factor scores for the subjects that have some missingness on their indicators. Thanks in advance.


To estimate the f value for a subject, Mplus maximizes the likelihood function g(y | f) * g(f), where the first term splits up into a product of univariate y_j | f terms, and if a y_j is missing for a subject, it doesn't contribute to the f estimation. I haven't looked at how Bartlett would be done with missing.
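To make the idea concrete, here is a minimal Python sketch for the simplest case only: one factor, continuous normal items, and uncorrelated residuals. Under those assumptions the maximum of g(y | f) * g(f) has a closed form, and a missing item simply drops out of the sums. This is an illustration of the principle, not Mplus's own implementation; the function name, argument names, and parameter values are all hypothetical.

```python
import math

def factor_score(y, nu, lam, theta, alpha=0.0, psi=1.0):
    """Posterior mode of f and its SE for a one-factor model with normal items.

    y     : item responses, with None marking a missing item
    nu    : item intercepts
    lam   : factor loadings
    theta : residual variances (residuals assumed uncorrelated)
    alpha : factor mean, psi : factor variance
    """
    precision = 1.0 / psi       # contribution of the factor distribution g(f)
    weighted_sum = alpha / psi
    for y_j, nu_j, lam_j, theta_j in zip(y, nu, lam, theta):
        if y_j is None:
            continue            # a missing item adds nothing to g(y | f)
        precision += lam_j ** 2 / theta_j
        weighted_sum += lam_j * (y_j - nu_j) / theta_j
    return weighted_sum / precision, math.sqrt(1.0 / precision)

# Made-up parameter values for a 4-item, 1-factor model.
nu, lam, theta = [0.0] * 4, [1.0] * 4, [1.0] * 4

print(factor_score([1.0, 1.0, 1.0, 1.0], nu, lam, theta))   # all items observed
print(factor_score([1.0, 1.0, None, 1.0], nu, lam, theta))  # third item missing
```

In this sketch the subject with a missing item still gets a score, and the returned SE is larger than for the complete case because one fewer item contributes information.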

Luc Watrin posted on Thursday, October 13, 2016  1:09 am



Dear Drs. Muthen, is there a way to save factor scores with more than 3 decimal places? Thanks in advance! 


No. 


Dear Mplus team, In the description of factor score estimation in Mplus (Factor scores.pdf) it is mentioned that FS (regression method) used as predictors will yield unbiased regression slopes. I'm testing a complex moderation model (... F1²*F2), where all fully latent approaches (LMS, unconstrained product indicators, ...) show convergence problems (...). So I'm wondering if it would be reasonable to use FS for the predictors (and the product terms) and use a measurement model for the dependent variable. Christoph Weber


That result holds only for linear models, where the biases in the numerator and denominator of the usual slope formula cancel out.


Thanks a lot! I know that the estimation of the model is quite problematic. It seems to be a question of lesser bias. In this regard, would it be preferable to use FS or a measurement model for the dependent variable?


If you can analyze in a single step, that is best. 

AT Jothees posted on Saturday, March 25, 2017  1:27 am



Dear all, I am very new to psychometrics and Mplus. So, I am trying to understand the difference between latent scores generated from factor analysis and from IRT. In my understanding, there is no real difference in terms of the range, -3 to +3. Is this correct? Thank you very much in advance. Regards, J


No difference. 

Tom Clarke posted on Sunday, June 04, 2017  2:37 pm



Dear Professors, Many thanks for this forum; it is a fantastically useful resource. Could I ask, is there a way to obtain non-standardised factor scores from Mplus? I am running an analysis using categorical indicators where it would be useful to have raw factor scores outputted. Many thanks, Tom


The factor scores are estimated from a model where you can choose the metric of the factor. For instance, the model's factor variance parameter need not be 1, although that seems a natural metric. The factor score estimates themselves are not standardized. 


To follow up on the question in the previous post: is there a way to scale factor scores back to the scale of the indicators? The reason it would be beneficial is that the indicator scale has a meaning (e.g., 1 means Not at all, 4 means A Lot, etc.); thus, it's easier to interpret scores on that scale than as so many standard deviations above/below the mean... Thank you in advance.


Also, I wanted to ask about the latent variable scores regression coefficient matrix that I obtain through the CFA analysis. These coefficients are used to create composite scores, as far as I understand. Is this matrix different from the factor score coefficient matrix that is produced by the EFA analysis? If I want to compute composite scores by hand using the factor score coefficient matrix from EFA, I can do so by multiplying coefficients by standardized item values and then adding them together. But this approach doesn't seem to work this way with the latent variable scores regression coefficient matrix from CFA. Why is that?


Answer to 12:25 post: They refer to analogous things but the CFA coefficient matrix refers to the unstandardized (raw) data. 


Answer to the 12:19 post: Your model has a continuous factor, so it doesn't give only a limited number of distinct values like your observed ordinal variables. Attempting to get back to the observed scale by some kind of categorization would throw away information. It is true, however, that some measurement instruments like NAEP (google it) bring factor values back to an ordinal scale (basic, etc.), but that is a difficult process that involves understanding which items tend to exceed a certain threshold at which factor value. My advice: stick with SDs for the latent variable.

Daria G. posted on Monday, August 21, 2017  5:28 pm



Thank you so much! That's really helpful. Just two more quick questions: 1. I know that I can obtain composite scores through EFA (via a number of methods, such as Regression, Bartlett, Anderson-Rubin, etc.) or through CFA. I am a little confused: which way should I choose? 2. I intend to do a cluster analysis on latent variables. Do I understand correctly that I can do that only using composite scores? In other words, there is no way to do the analysis "in one step"? (CFA + cluster without computing composite scores?)


1. See our FAQ: Factor scores. 2. You can do the clustering together with the CFA in a single analysis. We refer to that as factor mixture modeling; see the Papers section of our web site and also the Topic 5 short course handout and video on our website.

Luo Wenshu posted on Friday, December 22, 2017  4:21 pm



Dear Dr. Muthen, I saved factor scores obtained in CFA for following regression analyses. I found that the correlations between factor scores are higher than the correlations between factors obtained in CFA? Can you help explain why? Thank you very much. 


See the FAQ on our website: Factor scores 

Luo Wenshu posted on Saturday, December 23, 2017  5:34 am



Thank you, Dr. Muthen. I have read the FAQ and relevant article. Does this mean that regression based on factor scores is never recommended due to potential bias in slope estimation? 


I would try to avoid it, except as an approximation when you have very good measurement of the factor. 
