Hello! I have used Mplus to confirm a scale which I developed. Now I want to use my data and the scale to compare groups (demographics) according to the latent variables. I think what I need to do is to get a factor score for each of the observed variables using Mplus then transfer the data to SPSS and do the following:
Example: I have 9 Latent Variables. The first Latent Variable (FAC1) is comprised of the following observed variables: C7, C11, C20 and C24.
1. Total the OV factor scores for each of the OVs that contribute to a particular LV. Example: (FS for C7) + (FS for C11) + (FS for C20) + (FS for C24) = FAC1 FS Sum
2. Divide each OV factor score by the total of the OV factor scores to get a "proportional factor score", so that the proportional factor scores sum to 1 and give the % impact of each OV. Example: (FS for C7)/(FAC1 FS Sum) = PFS for C7, where PFS = proportional factor score
3. Use the proportional factor scores in a weighted sum formula. This would allow me to give meaning to the values because they would be on the same scale as the responses, 1-5. Example: FAC1 = ((PFS for C7)*C7) + ((PFS for C11)*C11) + ((PFS for C20)*C20) + ((PFS for C24)*C24), where C7 is the respondent's answer to question C7 and FAC1 is now the calculated response for that person's beliefs about Factor 1 (Latent Variable 1)
Perhaps there is a better way to do this. Your advice would be greatly appreciated.
You can get the factor scores using the SAVEDATA command. See the user's guide for details. If you are interested in seeing the steps we use to test measurement invariance and population heterogeneity using multiple groups, you could purchase the Day 1 handout from our short courses.
I have ordered the handout and will try that. In the meantime, I'd like to get those factor scores. But when I used the SAVEDATA command, it said that it couldn't find the data. Any suggestions on what I need to do?
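As a rough illustration of the SAVEDATA route mentioned above, a minimal input might look like the following. The file names (scale.dat, fac1_scores.dat) and the id variable are hypothetical, only the first factor from the example is shown, and the items are treated as continuous. It is a sketch, not the setup from the short-course handout.

```
TITLE:    Sketch of saving factor scores for FAC1
DATA:     FILE IS scale.dat;           ! hypothetical file name
VARIABLE: NAMES ARE id c7 c11 c20 c24;
          USEVARIABLES ARE c7 c11 c20 c24;
          IDVARIABLE IS id;            ! carried into the saved file
MODEL:    fac1 BY c7 c11 c20 c24;      ! first loading fixed at 1 by default
SAVEDATA: FILE IS fac1_scores.dat;     ! hypothetical file name
          SAVE = FSCORES;              ! one estimated score per person
```

The saved file can then be read into SPSS or another package. The order of the saved variables is listed at the end of the Mplus output.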
*** ERROR in Savedata command Only sample correlation matrix may be saved when there is at least one categorical dependent variable.
Stephanie posted on Friday, May 14, 2004 - 12:39 pm
Latest error message:
THE MODEL ESTIMATION TERMINATED NORMALLY
WARNING: THE RESIDUAL COVARIANCE MATRIX (PSI) IS NOT POSITIVE DEFINITE. PROBLEM INVOLVING VARIABLE F3.
THE MODEL COVARIANCE MATRIX IS NOT POSITIVE DEFINITE. FACTOR SCORES WILL NOT BE COMPUTED. CHECK YOUR MODEL.
Which method is used to obtain factor scores in Mplus: Anderson-Rubin, Bartlett, or some other method?
bmuthen posted on Monday, March 21, 2005 - 5:39 pm
Mplus uses the regression method (see e.g. Lawley-Maxwell's FA book) - also known as the modal posterior estimator - for continuous outcomes and for categorical outcomes with WLSMV. In other cases, it uses the expected posterior distribution approach.
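For continuous indicators, the regression method referred to here is usually written as below. The notation is standard textbook notation assumed for illustration, not quoted from the post: ν holds the intercepts, Λ the loadings, α and Ψ the factor means and factor covariance matrix, and Θ the residual covariance matrix.

```latex
\Sigma = \Lambda \Psi \Lambda^{\top} + \Theta, \qquad
\hat{\eta}_i = \alpha + \Psi \Lambda^{\top} \Sigma^{-1} \left( y_i - \nu - \Lambda \alpha \right)
```

Under normality this expression is both the mean and the mode of the posterior distribution of η given y_i, which is why the same estimator goes by more than one name.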
I used SAVEDATA: SAVE = FSCORES to save factor scores from the CFA, but I found that all the factor scores in the output file are either 0 or 1. Can I ask what went wrong here? Or, if this is what I should get, how should I understand it?
Sorry Linda, please disregard my last question. There are three columns with values that look like factor scores. Is the first column the factor score for the first factor, the second column the factor score for the second factor, and the third column the residual? Thank you! Shuang
I have one more follow-up question. The number of individuals in the factor score file is different from the number of individuals in the input file. Is there any way to match these two files to know which factor score is for which individual?
Thank you! Shuang
bmuthen posted on Wednesday, June 15, 2005 - 7:41 am
Anonymous posted on Wednesday, June 29, 2005 - 11:41 am
I have a question about the factor scores obtained from CFA. I used the SAVE=FSCORES option. When I changed the sequence of indicators in the BY statement, the factor scores also changed. What happened? Which factor scores should be used?
When you change the sequence of the factor indicators, a different factor loading is fixed to one to set the metric of the factor. This is why you get different factor scores. Both sets of scores are valid and should correlate highly with each other.
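The point about the metric can be sketched with three alternative MODEL commands for the same one-factor model. The names f and y1-y4 are hypothetical, and each version would go in its own input file.

```
! Version A: y1 is listed first, so its loading is fixed at 1
MODEL:  f BY y1 y2 y3 y4;

! Version B: y2 is listed first, so its loading is fixed at 1 instead
MODEL:  f BY y2 y1 y3 y4;

! Version C: free all loadings and set the metric through the factor variance
MODEL:  f BY y1* y2 y3 y4;
        f@1;
```

Versions A and B put the factor on the scale of different items, so the saved scores should differ mainly by a rescaling. In Version C the metric does not depend on the order in which the indicators are listed.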
samruddhi posted on Tuesday, September 06, 2005 - 9:56 am
I have 10 categorical variables and am predicting one factor using CFA. Here is the output:
CFI/TLI
CFI 0.971
TLI 0.986
Number of Free Parameters 10
RMSEA (Root Mean Square Error Of Approximation) Estimate 0.036
SRMR (Standardized Root Mean Square Residual) Value 0.041
WRMR (Weighted Root Mean Square Residual) Value 1.281
I see that all but the WRMR value show a good fit. Could you please share your insight on how to interpret these results, given that the WRMR value is so much higher than what indicates a good fit for my model?
Anonymous posted on Tuesday, September 06, 2005 - 9:58 am
can I get a factor score with Mplus if all my indicators are categorical? Mplus guide says this is not possible, any suggestions are most welcome.
Factor scores for factors with categorical indicators have been available for several years. You must be looking at a very old user's guide.
Anonymous posted on Tuesday, September 06, 2005 - 10:39 am
thanks, linda, i am using 2001 user's guide reprinted in 2002. i will find a copy of the new one and use that. many thanks.
samruddhi posted on Tuesday, September 06, 2005 - 10:46 am
Thanks, Linda, for your quick response re: WRMR. Chi-square is 113.157 (p=0.0000) but, with a sample size of 2444, my understanding is that chi-square is not a good measure. let me know if you think otherwise.
could you, also, verify for me if the following values can be used to assess model fit: SRMR < 0.08 (sample size > 250), TLI > 0.95, CFI > 0.95, RMSEA < 0.06
thanks much for your guidance.
samruddhi posted on Tuesday, September 06, 2005 - 10:52 am
In the previous message, I meant SRMR < 0.08 (sample size > 250)
samruddhi posted on Tuesday, September 06, 2005 - 10:54 am
In the previous message, I meant SRMR <0.08 works for sample greater than 250.
Chi-square can be sensitive to sample size but that is not a reason to ignore it. You can do a sensitivity study where you free parameters until chi-square is acceptable and see if in doing so the parameters in your original model stay the same. If they do, then the poor chi-square was probably due to its sensitivity. If the original parameters change a lot, then the model probably does not fit well.
For acceptable cutoffs for fit measures, see the Hu and Bentler article from several years ago in Psych Methods and also the Yu dissertation on our website.
Elke Pari posted on Saturday, October 01, 2005 - 7:01 am
Hi, I have a problem getting factor scores. My dataset consists of 9 categorical variables. If I try to get factor scores through the command "SAVE=FSCORES", I get the following error message: "THE MODEL ESTIMATION TERMINATED NORMALLY. THE MODEL COVARIANCE MATRIX IS NOT POSITIVE DEFINITE. FACTOR SCORES WILL NOT BE COMPUTED. CHECK YOUR MODEL." What do you suggest could be wrong?
It is difficult to say exactly what the problem is without more information. Please send your input, data, output, and license number to email@example.com.
Marco posted on Wednesday, November 16, 2005 - 5:25 am
Hello Drs. Muthén,
I would like to conduct a multilevel regression analysis with Mplus. Since my indicators are not tau-parallel or even tau-equivalent (bad fit for a CFA with equality constraints on the factor loadings), the simple mean/sum of the indicators is not the best estimator of a person's true score. I know that it is possible to estimate a multilevel model with latent variables, but I would prefer to keep it simple. Therefore:
- Would it be advisable to use the factor scores as predictors? I found another estimator in a book by Roderick McDonald (Test Theory, 1999), which takes the factor loadings and the residual variance into account, but reduces the variance of the estimated true score dramatically (compared to the factor scores of Mplus).
- I have a hierarchical sample, so I will estimate the factor scores with TYPE=COMPLEX. Does this option affect the estimation of the factor scores? (Maybe it is a problem to take the nonindependence into account twice: first within the CFA with TYPE=COMPLEX and afterwards within the multilevel regression analysis.)
Thanks a lot for your help! Marco
Marco posted on Wednesday, November 16, 2005 - 5:41 am
Sorry, I forgot one question: it seems that Mplus estimates factor scores even for cases with missing values on all of the corresponding indicators. How is that possible?
(I used MLR with TYPE=COMPLEX MISSING H1)
bmuthen posted on Wednesday, November 16, 2005 - 8:45 am
Perhaps you have covariates in the model.
bmuthen posted on Wednesday, November 16, 2005 - 8:49 am
Regarding your 5:25 AM question, if you are going to use factor scores in a multilevel setting it is best if you base those on a multilevel factor analysis model. So you would use type=twolevel. For type = complex to be formally correct, you have to assume that the factor loadings are (approximately) the same on both the within and between level, which is often not the case.
Marco posted on Wednesday, November 16, 2005 - 12:52 pm
Thanks a lot for your reply, but I am not sure whether I understand you correctly. Does that mean that the factor structure obtained by type=complex is in fact a mixture of two factor structures, one between- and one within-group?
bmuthen posted on Wednesday, November 16, 2005 - 4:45 pm
Yes. Type = Complex estimates the same parameters as in a regular (single-level) factor analysis. The aim of Type = Complex is to give SEs and chi-square corrected for the non-independence of the observations due to clustering. Sometimes this is not sufficient and Type = Twolevel is needed.
Marco posted on Friday, November 18, 2005 - 12:38 am
Thanks for the clarification, but that raises a practical question. My group-level observations are limited to 38 clusters and there are 19 indicators on the individual level (although most of them have little between-variance), so I will probably have to reduce the number of between-level parameters. Are there practical guidelines on how best to do this? For example, is there a rule of thumb about the minimum between-variance needed? In which cases would it be reasonable to fix the residual variances on the between level to zero or constrain the loadings on both levels to be equal? I guess that this is a wide topic, so maybe you could provide a reference.
bmuthen posted on Friday, November 18, 2005 - 5:29 am
To have success with your two-level factor analysis, I would recommend reading and following the 5 steps of
Muthén, B. (1994). Multilevel covariance structure analysis. In J. Hox & I. Kreft (eds.), Multilevel Modeling, a special issue of Sociological Methods & Research, 22, 376-398. (#55)
It may be that you need only a 1-factor model on between and perhaps zero between-level residual variances.
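A two-level factor model of the kind suggested here, with one factor on each level and zero between-level residual variances, might be sketched as follows. Four indicators are used instead of 19 for brevity, and the file and variable names (clustered.dat, clus, y1-y4, fw, fb) are hypothetical.

```
DATA:     FILE IS clustered.dat;      ! hypothetical file name
VARIABLE: NAMES ARE clus y1-y4;
          USEVARIABLES ARE y1-y4;
          CLUSTER = clus;
ANALYSIS: TYPE = TWOLEVEL;
MODEL:    %WITHIN%
          fw BY y1-y4;                ! within-level factor
          %BETWEEN%
          fb BY y1-y4;                ! single between-level factor
          y1-y4@0;                    ! between-level residual variances at zero
SAVEDATA: FILE IS twolevel_fs.dat;    ! hypothetical file name
          SAVE = FSCORES;
```

Whether the zero residual variances are tenable is an empirical question. The five steps in the reference above are the place to check that before relying on the saved scores.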
I've a question about the estimation of factor means within a multiple group analyses (model: 3 factors with 4 indicators each).
As a default the factor means are fixed to zero in the first group, while in the second group factor means are free. I want to overrule the default and estimate the factor means of both groups by fixing the intercept of an observed dependent variable to zero (one for each factor). But it doesn't work. I hope you can help me solving this problem.
This should work and I am not sure what you mean when you say it does not work. I suspect that you are not freeing the factor mean in the group-specific MODEL command for the first group. If this does not help you solve your problem, please send your input, data, output, and license number to firstname.lastname@example.org
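A sketch of the setup described in this exchange, for three factors with four indicators each. The file name, the grouping variable g, and the group labels grp1 and grp2 are hypothetical, and the sketch assumes the default cross-group equality of intercepts and loadings.

```
DATA:     FILE IS groups.dat;         ! hypothetical file name
VARIABLE: NAMES ARE g y1-y12;
          GROUPING IS g (1 = grp1 2 = grp2);
MODEL:    f1 BY y1-y4;
          f2 BY y5-y8;
          f3 BY y9-y12;
          [y1@0 y5@0 y9@0];           ! one intercept per factor fixed at zero
MODEL grp1:
          [f1* f2* f3*];              ! free the factor means in the first group
```

The factor means in the second group are free by default, so only the first group needs a group-specific statement.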
Jinseok Kim posted on Tuesday, January 03, 2006 - 10:25 pm
I am trying to estimate a latent interaction model in Mplus. Schumacker introduced an approach by Joreskog that uses "latent variable scores" to estimate an SEM with a latent interaction (http://www.ssicentral.com/lisrel/techdocs/lvscores.pdf). It seems attractive to me, but his explanation is all in LISREL language. So I was wondering if I can fit the same model using Mplus. Any thoughts and suggestions will be greatly appreciated. Thanks.
bmuthen posted on Wednesday, January 04, 2006 - 8:53 am
Using estimated factor scores only leads to approximate solutions. A better alternative is to use the Mplus ML approach to latent variable interaction modeling which is in line with the Klein-Moosbrugger method from Psychometrika.
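The ML approach mentioned here is requested with the XWITH option. A minimal sketch, with hypothetical file and variable names (interact.dat, y, x1-x3, z1-z3):

```
DATA:     FILE IS interact.dat;       ! hypothetical file name
VARIABLE: NAMES ARE y x1-x3 z1-z3;
ANALYSIS: TYPE = RANDOM;
          ALGORITHM = INTEGRATION;
MODEL:    f1 BY x1-x3;
          f2 BY z1-z3;
          f1xf2 | f1 XWITH f2;        ! latent interaction term
          y ON f1 f2 f1xf2;
```

Because the interaction is estimated within the model, no factor scores have to be saved and multiplied beforehand.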
I am attempting to generate factor scores for a 2 factor model with combined continuous and categorical (binary) indicators. The model fits adequately. I have been using the save = fscores command, but for some reason the program only saves the raw data. It does not seem to be generating factor scores. Is there a problem generating factor scores with combined categorical and continuous indicators?
No, this is not a problem. The only reason factor scores would not be computed is if the model did not converge or there was a warning about negative residual variances or some other problem. The only way to know is to send your input, data, output, and license number to email@example.com.
1. I tried to save the factor scores, but the SAVEDATA information in the output stated the following:
"Factor scores were not computed. No data were saved."
The following is my syntax:
data: file is mi1.dat;
variable: names are cl edu inc gen bmi year at1-at7 ba1-ba5;
  usevariables are at1-at7;
  categorical are at1-at7;
model:
  !set cfa due to varimax.
  !set reference due to loading.
  fat1 by at1*;
  fat1 by at2@1;
  fat1 by at3;
  fat2 by at4*;
  fat2 by at5;
  fat2 by at6@1;
  fat2 by at7;
  fat1 with fat2;
output: sampstat standardized residual modindices (0) tech1 tech2 tech4;
savedata: file is 0618withfc.dat;
  save=fscores;
Did I do something wrong?
2. If I change the model to modify the relations between indicators and/or factors, will the factor scores also take different values?
3. Can the factor score be interpreted as each respondent's value on the concept/factor? How is it produced from all the categorical observed values?
4. Besides, how do I read the MI to decide whether or not to free a parameter?
It was my carelessness. I found that the (standardized) residual variance of one indicator (at2) is negative, like this: "AT2 -0.433 Undefined 0.14327E+01". Is there any suggestion you can offer to solve this problem? Thank you.
This suggests that the model is not appropriate for the data - so the model needs to be modified. Possible modifications can be suggested by modification indices for cross-loadings and residual correlations.
Also check that you get only 1 loading fixed to 1 for each of the 2 factors.
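This check is relevant to the input posted earlier because, as far as the Mplus defaults go, the first indicator listed in each BY statement has its loading fixed at 1. Splitting one factor across several BY statements may therefore fix more loadings than intended (at3, at5, and at7, in addition to at2 and at6). A sketch of a specification with exactly one fixed loading per factor, using the poster's variable names:

```
MODEL:  fat1 BY at1* at2@1 at3;        ! only at2 is fixed at 1
        fat2 BY at4* at5 at6@1 at7;    ! only at6 is fixed at 1
        fat1 WITH fat2;
```

TECH1, which the poster already requests, shows which loadings are free and which are fixed, so it can be used to confirm the result.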
This means that for some observations, the pattern of values is contradictory, for example, a person who gets the easy items incorrect and the difficult items correct. There is nothing you can do other than change the model. I suggest using Version 4.1 if you are not.
In keeping with the subject of my research, I will not change which indicators load on which factors, so I modified the model by adding measurement error covariances wherever the residual correlation exceeded .1. After doing this step by step, the residual correlations are all less than .1, the CFI, RMSEA, and WRMR have improved, and the number of observations for which factor scores could not be computed has dropped from 8 to 1. So I have decided to delete this observation from the research sample. Is this procedure proper? Thank you.
I don't think adding residual covariances with no substantive reason to obtain factor scores for problematic individuals is defensible. Although your theory may say one thing, your data may not be valid measures of the constructs in your theory. I would do an EFA to see if the variables are behaving as expected.
Based on my understanding of your statement about the minimization failure, and given that factor scores initially could not be computed for 8 observations out of a sample of 1357, would it be acceptable to simply delete these 8 observations and run the CFA again to save the factor scores? Sorry for my strange question.
owen fisher posted on Friday, October 20, 2006 - 10:15 am
Sorry, I should describe it this way. There is one factor and three categorical indicators, and the loadings are .455, 1, and .386. If these loadings are probit regression coefficients, how do I interpret the relation between the factor and the indicators? Thank you.
owen fisher posted on Friday, October 20, 2006 - 10:25 am
Adding to the previous question, the numbers of levels of these three indicators are 3, 4, and 4. I hope this makes the question clearer.
If you are using WLSMV, the factor loadings are probit regression coefficients. You can interpret their sign and significance. That is probably what is most important if they are used as factor indicators.
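To make "probit regression coefficient" concrete, the usual ordered-probit relation between a factor and one ordinal indicator can be written as below. The symbols are assumptions introduced for illustration: u_j is indicator j with categories 0, ..., C_j - 1, λ_j its loading, τ_{jk} its thresholds, and θ_j the residual variance of the underlying latent response variable.

```latex
P(u_j \ge k \mid \eta) = \Phi\!\left( \frac{\lambda_j \eta - \tau_{jk}}{\sqrt{\theta_j}} \right),
\qquad k = 1, \dots, C_j - 1

P(u_j = k \mid \eta) =
\Phi\!\left( \frac{\tau_{j,k+1} - \lambda_j \eta}{\sqrt{\theta_j}} \right)
- \Phi\!\left( \frac{\tau_{jk} - \lambda_j \eta}{\sqrt{\theta_j}} \right),
\qquad \tau_{j0} = -\infty, \; \tau_{jC_j} = +\infty
```

A positive loading therefore means that higher factor values shift probability toward the higher categories. The value of θ_j depends on the parameterization (it is fixed at 1 under Theta; under Delta it is what remains once the latent response variance is scaled), so the output should be checked before plugging in numbers.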
owen fisher posted on Sunday, October 22, 2006 - 12:11 pm
Hi, it's carey again.
Let me describe my question this way, and sorry for my poor wording.
There is one factor and three categorical indicators, and the loadings are .455, 1, and .386. With robust weighted least squares, these loadings are probit regression coefficients.
My teacher told me to describe the distribution of the factor scores, but as far as I understand factor scores (if I am not mistaken), a factor score is an index for which I can't find meaningful cutoff points. As in a binary logistic or ordered logit model, the factor score can only be converted into the probability of being in one of the levels of the indicator.
After reading "logit and probit model: ordered and multinomial analysis", published by Sage University, I got a rough idea of the cutoff point of the factor score that corresponds to the boundary between one level of the indicator and the next or previous one.
owen fisher posted on Sunday, October 22, 2006 - 12:15 pm
Following the previous statement.
If that is the case, I will directly describe the distributions of these three indicators rather than exhausting myself trying to find good cutoff points for the factor score. However, I still need your suggestion.
Is the reasoning behind my decision okay?
By the way, first, how do I calculate the probability of being in one of the levels of the indicator from the factor score? Second, how do I calculate the cutoff point of the factor score that corresponds to the boundary between one level of the indicator and the next or previous one?
There is no set way that I know of to find cutoff points for factor scores.
owen fisher posted on Tuesday, October 24, 2006 - 5:40 am
Thanks for your patience.
Anja Weiß posted on Monday, April 16, 2007 - 4:02 am
Dear Mr. and Mrs. Muthen,
I have a confirmatory factor model with ordinal variables and 7 factors. With SAVEDATA I have saved the factor scores. How are these factor scores calculated? What is the theoretical background and where do I find it?
I saw in previous posts and in the technical appendices that Mplus uses the regression method for estimating factor scores for categorical outcomes with WLSMV. If I output the factor scores using the FSCORES option of the SAVEDATA command, can they be interpreted as any factor score would (i.e. if a respondent had a higher value, it indicates they have a higher level on that given factor)? Are the factor scores included on the saved dataset standardized? Thank you for your help and patience. Jessica
For both continuous as well as categorical outcomes with weighted least squares, the factor scores are obtained using the maximum of the posterior distribution. For continuous outcomes this approach has been given the name "Regression method", but I don't think that is used with categorical outcomes. For categorical outcomes, the method is iterative.
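The "maximum of the posterior distribution" can be written compactly as below. The notation is assumed for illustration: φ is the normal density of the factors with mean α and covariance matrix Ψ, and f is the conditional density or probability of person i's response to item j given the factors.

```latex
\hat{\eta}_i = \arg\max_{\eta} \;
\phi(\eta; \alpha, \Psi) \prod_{j} f(y_{ij} \mid \eta)
```

With continuous normal indicators the maximum has a closed form, which is the regression method. With categorical indicators f is a probit probability and no closed form exists, which is why the computation is iterative.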
I am trying to output factor scores for one latent variable with 14 indicators. I am using type = complex missing h1. I have 13,570 in my data set. However, I am using the subpopulation command which specifies a subsample of 11,488. When examining the factor score output I have factor scores for 13,557 individuals. I thought I would have factor scores for only the 11,488.
Is the missing command generating scores for those not within the subpopulation? I wish to have factor scores for only those within the subpopulation (and not based on the whole population). However, if I do not specify the missing option then I will kick out those individuals who are missing at least one of the indicator variables. I want to make sure that my SEs are correct for the subsample.
Hello, I did a confirmatory factor analysis with Mplus and computed factor scores. When I checked the scores I found that a lot of people have negative factor scores, and the rest have positive factor scores. Now my supervisor wants an explanation for the negative scores. Do you think a negative score is an error? What is the possible reason for the negative scores? Many thanks Joanna
This is a response to Courtney Bagge's post. The factor scores that you have obtained are based on the model estimated from the subpopulation (not the entire population). So you can just ignore the factor scores computed for elements not in the subpopulation.
Hello, I computed factor scores after CFA and used these factor scores as an independent variable in a logit regression in the next stage of my analysis. Now my supervisor wants to know how Mplus computes factor scores. Do you think you could explain the procedure to me in a not too mathematical way? Also, when I compute quintiles of these factor scores I get 2 people in the first quintile, 4 in the second quintile, 30 in the 3rd quintile, 120 in the 4th quintile and 94 in the 5th quintile. The first three groups have very few people, so does it make sense to combine the first 3 groups into one and do the regression with 3 groups instead of 5? I shall be most grateful for your help. Joanna
Thanks. Well, I have computed the asset index by CFA for the family-level data and then did the logit regression (with the cluster option) with a child-level variable as the dependent variable and both child-level and family-level variables (including the asset index) as explanatory variables. My supervisor wants me to use quintiles of the factor scores rather than just the scores as computed by CFA. Hence I computed factor scores in the first step and then did the second step with quintiles of the factor scores. Now I am not certain if it is alright to compute quintiles of factor scores, as they are standardised. Do you think we could do the whole analysis in one step? Is it alright to compute quintiles of factor scores? Thanks Joanna
Hello, I am trying to get factor scores for a Latent Difference Score Model (McArdle). Unfortunately Mplus doesn't give me any output when using the SAVE IS FSCORES; line. I have already checked the path (savedata) as well as the data set. However, I get an output with all the estimates (and no error messages) without the FSCORES command. Any comments on how to solve this problem would be highly appreciated.
Yes, I use the FILE statement and the values of my observed variables are saved in this file, but after adding the FSCORE statement I don't even get an output (with an error message, i.e. that the factor scores were not saved).
I am using a CFA model where I specified a single factor (F1) based on seven continuous manifest variables. After requesting the factor scores using the option SAVE=FS, I observed that the mean of the scores is zero. How can I calculate the factor scores for each observation such that the mean is not zero but the actual estimated mean of F1? Thanks
Typically, the mean parameter for the factor is fixed at (standardized to) zero, unless you have multiple groups. So there isn't a non-zero estimated factor mean, and it is then desirable that the estimated factor scores have mean zero.
Thanks Bengt for your response. In my case I do want to estimate the actual mean of the factor F1 and incorporate this into the factor scores for each observation. I have a second factor (F2), with similar variables at time 2, and analogously I have the corresponding factor scores. Since both sets of scores have a mean of zero I cannot compare them. My goal is to compare the mean scores from F1 and F2. I was thinking of adding the estimated intercepts from F1 to the scores. Does that make sense? Any other suggestions?
If you impose measurement invariance across time for items that are the same at the two time points you can identify a factor mean difference for the two time points. You can fix the factor mean at zero for the first time point and free it and let it be estimated for the second time point. The estimated factor scores for the second time point will then take this non-zero estimated factor mean into account (this occurs automatically in the prior of the posterior computations for the estimated factor scores).
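A sketch of this two-time-point setup with seven items per occasion. The names are hypothetical (a1-a7 for time 1, b1-b7 for time 2, file names twotimes.dat and twotimes_fs.dat), and residual covariances between repeated items are left out for brevity.

```
DATA:     FILE IS twotimes.dat;       ! hypothetical file name
VARIABLE: NAMES ARE a1-a7 b1-b7;      ! a = time 1 items, b = time 2 items
MODEL:    f1 BY a1
                a2-a7 (1-6);          ! loadings held equal across time
          f2 BY b1
                b2-b7 (1-6);
          [a1-a7] (7-13);             ! intercepts held equal across time
          [b1-b7] (7-13);
          [f1@0];                     ! factor mean fixed at zero at time 1
          [f2*];                      ! factor mean estimated at time 2
SAVEDATA: FILE IS twotimes_fs.dat;    ! hypothetical file name
          SAVE = FSCORES;
```

With this specification the saved scores for f2 are centered around the estimated time 2 factor mean rather than zero, so the two sets of scores are on a common scale.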
Do the factor scores extracted from a CFA in Mplus have the same desirable properties as an IRT score? Would the factor score for each individual be equal to the IRT score for each individual (say from a graded response IRT model)? If not, is there a consensus on which type of score is superior? Thanks.
Use the FSCORES option of the SAVEDATA command. See the user's guide for further information.
nina chien posted on Wednesday, November 04, 2009 - 10:22 am
I saved out factor scores for 2 factors, closeness and stress. The factor scores for closeness range from -2.51 to 1.08 (distribution is somewhat negatively skewed), and for stress from -.71 to 2.86 (distribution is very positively skewed). But the original items are on a scale from 1 to 5. Did Mplus automatically center the factor scores? I see that each factor has a mean of 0.00. Thank you for your help.
I wanted to learn more about the factor scores created from a CFA in MPLUS. Here are a few questions. Thanks!
(1) Would it be fair to say that they contain "no error" the way we think of it when we model everything in an SEM framework?
(2) How do we know they are generally better than using a regression method or summing/taking means of items to create a composite construct/score? Is there something that can be cited in general and perhaps in particular with regard to the (relatively better) properties of these scores.
(3) Would you still refer to the construct represented by these scores as "latent?"
(4) I have 2 correlated constructs for which I am generating the scores from the CFA, one is a 3-item and the other is a 4-item. As usual I constrain the variance of each factor to 1.0 and freely estimate each loading. Am I wrong that this would essentially standardize the factors? I am getting a mean of (essentially) zero, but a SD of around .91 no matter what I do. Is there any problem with standardizing them?
I have read Appendix 11, but I was hoping to learn more about their general properties.
I use Mplus very regularly and love it. This discussion board is a tremendous resource. Thank you very much.
It has been established that estimated factor scores do not behave like factors. See for instance
Skrondal, A. and Laake, P. (2001). Regression among factor scores. Psychometrika 66, 563-575.
This shows the distortion in the means, variances, and relations with other variables when using estimated factor scores. Especially with a small number of items I would recommend instead using SEM, which also makes it possible to test that the item sets are unidimensional.
I would prefer to use SEM but, among other things, there are cross-classified random effects in my models, which I am able to deal with in the "mixed model" framework. I used CFA to test (multi)dimensionality and to examine measurement invariance across 2 groups. What is the best way to get a factor score or an "observed" score for my constructs if I can't/don't use SEM? Is there a better way than the method used in Mplus? I looked over the article - both of my variables are explanatory and one is a DV in one case. I am not sure I have the resources to carry out the method described there - is it the best way? Perhaps two separate unidimensional IRT GRM models (but I was under the impression that the CFA method in Mplus gives the same scores)? Thanks.
I am not sure if your items are continuous or categorical. For cont. items, Mplus uses the regression method of factor score estimation, which is equivalent to the Maximum A Posteriori (MAP) method of IRT. For cat. items Mplus uses EAP (expected...) which is standard in IRT. With only 3 and 4 items you won't get good factor score estimates - IRT typically works with many more items per factor, say 20 items or more. With cont. items the problem shows up in terms of a low factor determinacy and with cat. items it shows up in terms of poor information functions. I am not aware of literature comparing summed scores to factor scores with a small number of items, but it probably exists. See, though, my dissertation paper:
Muthén, B. (1977). Some results on using summed raw scores and factor scores from dichotomous items in the estimation of structural equation models. Unpublished Technical Report, University of Uppsala, Sweden.
Thank you much for the valuable input. Yes, that is the dilemma (CCREs vs. Measurement Error - not the first time or likely not the last). My scores are Likert (5pt). Most people who I have asked think that ignoring the CCREs is a more problematic offense.
CEKIC Sezen posted on Saturday, February 27, 2010 - 1:45 am
Hello, my problem is the following: I have to complete analyses that were already done, i.e. calculate factor scores from a CFA which was carried out with TYPE=COMPLEX and CLUSTER=SUBJECT. My first question is the following: the number of observations used in the estimation of the model is 1936, although my initial database was composed of 1556 observations.
1. What are the criteria Mplus uses for eliminating observations?
Then I tried to obtain factor scores for the analyses already done:
2. Is it possible to obtain factor scores directly from a CFA with TYPE=COMPLEX and CLUSTER=SUBJECT? As I could not do it, I redid the analysis (keeping the same model as before) as a CFA with TYPE=GENERAL and IDVARIABLE ARE SUBJECT, and computed the factor scores from this last analysis. Unfortunately, the parameters, their standard deviations and the indices estimated by this last analysis don't correspond exactly to the first analysis performed with TYPE=COMPLEX and CLUSTER=SUBJECT.
3. If it is not possible to obtain factor scores with an analysis using TYPE=COMPLEX and CLUSTER=SUBJECT, can the factor scores obtained from the estimation with TYPE=GENERAL and IDVARIABLE ARE SUBJECT be interpreted in the framework of the first analysis, even if the estimates of the two models are not exactly identical?
I hope that my questions are clear and that you can help me. Cordially
Mplus uses ML under "MAR" which is sometimes called "FIML" and means that all subjects who have data on any of the analysis variables are used in the analysis. So perhaps your 1556 observations are the listwise present group, while 1936 is what ML under MAR uses.
You can get factor scores directly in a Type=complex, Cluster=subject analysis.
If you still have problems, please send your input, output, data, and license number to firstname.lastname@example.org.
My question has to do with latent variable means in a multi-group CFA. Given that the default is for the latent variable means to be set to zero in the first group and freely estimated in the second group, does that mean that I can't compare the means across groups using a chi-square diff test? Basically, I want to conduct analyses in the context of the multiple group model to test whether or not specific latent variable means are significantly different across the two groups. How do I do this and have an identified model?
So, the unconstrained model has the factor means set to zero in only the first group (freely estimated in the second group), and the constrained model sets the factor means for both groups to zero. Is that correct? And the chi-square diff between those two models tests equality of means across groups?
You don't need estimated factor means for Group 1. It is only the difference in factor means that is identified and meaningful to discuss. And that difference is captured by the group 2 factor means.
Note that this does not imply that every person's factor value is zero in group 1.
Daiwon Lee posted on Sunday, May 23, 2010 - 4:28 pm
My advisor wants me to run a model where I saved the factor scores and then merge the factor scores into a new data set. I think I know how to save factor scores, but I don't know how to merge with original data set to use them in the analysis. Please help me. Many thanks in advance.
See the new merge options in the SAVEDATA command for Version 6. The most recent user's guide is on the website.
Daiwon Lee posted on Monday, May 24, 2010 - 6:28 am
Thank you for the note. However, could you please tell me how to merge saved factor scores with original data in version 5.21? I also tried to save factor scores in "dta" to merge with original data using stata program but stata failed to read the mplus saved factor score file.
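For versions without the merge options, one workaround is to merge outside Mplus. The following is a minimal Python sketch under several assumptions: the saved file is whitespace-delimited with no header, an ID was saved with it (via IDVARIABLE), and you take the column order from the list of saved variables printed at the end of the Mplus output. The file contents and the names `id`, `gender`, `age`, `y1`, `y2`, `F1` are made up for illustration.

```python
import io


def read_free_format(handle, names):
    """Read a whitespace-delimited file without a header into a list of dicts."""
    records = []
    for line in handle:
        fields = line.split()
        if fields:
            records.append(dict(zip(names, fields)))
    return records


def id_key(value):
    """Normalise IDs so that '101' and '101.000' match."""
    return int(float(value))


def merge_scores(original, scores, id_name, score_names):
    """Left-join the score columns onto the original records by ID."""
    by_id = {id_key(rec[id_name]): rec for rec in scores}
    merged = []
    for rec in original:
        match = by_id.get(id_key(rec[id_name]), {})
        row = dict(rec)
        for name in score_names:
            row[name] = match.get(name)  # None when no score was saved
        merged.append(row)
    return merged


# Hypothetical stand-ins for the original data file and the saved score file.
original_file = io.StringIO("101 1 34\n102 2 41\n103 1 29\n")
scores_file = io.StringIO("3.000 4.000 101.000 0.532\n"
                          "2.000 5.000 103.000 -0.218\n")

original = read_free_format(original_file, ["id", "gender", "age"])
scores = read_free_format(scores_file, ["y1", "y2", "id", "F1"])
merged = merge_scores(original, scores, "id", ["F1"])
for row in merged:
    print(row)
```

With real files, replace the `io.StringIO` objects with `open(...)` calls. Cases that were not in the analysis keep an empty score rather than being dropped.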
I encountered some strange results when saving factor scores for a multi-group CFA.
I have a very simple model, with 2 groups and 3 items loading (strongly; >.70) on one latent variable. Factor scores are saved using the SAVEDATA option. In the newly created dataset, however, I find near-zero correlations (<.10) between the obtained factor scores and the original items for the second group. For the first group, correlations between items and factor scores are as expected.
I checked beforehand, and factor loadings are about equally strong in both groups. The same strange pattern is found irrespective of the order of the groups, whether I impose or release equality constraints on loadings and/or intercepts, use different items...
Am I doing something wrong? This is the syntax I use (for a model with equality constraints on factor loadings and intercepts). FYI: I am still using an older version of Mplus (version 4), maybe this has something to do with it...
-----------------------------
variable: names are item1 item2 item3 country;
usevariables are item1 item2 item3;
grouping is country (1=BE 7=DK);
model: Y by item1 item2 item3;
output: standardized; modindices;
savedata: file is testout.sav; save = fscores;
----------------------------------
Thanks! Another question - What are the first few columns in the saved factor score data set for? For example, I had observed ordinal variables x1 to x4 (values = 1, 2, 3, 4, and 5), and the factor score data set includes 4 columns called "x1" to "x4" in front of the id and factor score. Comparing these columns with the original observed data, it appears that x1 in the factor score data equals the original observed x1 minus 1.
Mplus requires the data to have the lowest value of zero so the data are automatically recoded to 0, 1, 2, 3, and 4. The recoded data are saved. See the CATEGORICAL option in the user's guide for more information about this recoding.
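If you want the saved item columns back on the original 1-5 coding after reading them into another program, a small sketch like the following would do it. It assumes the original categories were consecutive integers, as in the 1-5 example discussed here; the function name and values are made up.

```python
def restore_original_codes(saved_values, lowest_original):
    """Shift zero-based saved categories back to the original coding.

    Assumes the original categories were consecutive integers starting
    at lowest_original (e.g. 1, 2, 3, 4, 5).
    """
    return [value + lowest_original for value in saved_values]


saved_x1 = [0, 1, 2, 3, 4]  # hypothetical column from the saved file
original_x1 = restore_original_codes(saved_x1, lowest_original=1)
print(original_x1)
```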
Dallas posted on Saturday, July 31, 2010 - 8:53 am
Good morning. I have a question about the comment you make above in replying to James (James L. Lewis posted on Monday, September 28, 2009 - 6:27 pm).
James asks two questions it seems to me. 1) Do factor scores have the same properties as IRT scores? And, 2) are IRT and factor scores the same.
You indicate yes to both. For properties, this makes sense (assuming, of course, a factor model that corresponds to an IRT model).
However, it doesn't seem true that factor scores EQUAL IRT scores. Correct? In other words, if we used EAP scoring and the loadings, etc. from the factor model, we'd get one set of factor scores. If we then converted the factor model parameters into IRT parameters, and again used EAP scoring to get IRT scores, it doesn't seem to me the factor scores would EQUAL the IRT scores. They would have similar measurement properties, but it doesn't seem like they'd be identical scores (in value).
If I am right, can you or Bengt provide a formula to convert factor scores into IRT scores, like formulas do to convert the parameters?
IRT often refers to the 2PL model estimated under maximum likelihood. The 2PL model is the logistic model in Mplus for binary items and using a single factor where you free the factor loadings and fix the factor variance at one. With 2PL and ML estimation, Mplus gets the same loglikelihood as IRT programs. And it computes factor scores using the same EAP (expected a posteriori) method. So this is the same as in IRT and the same factor score values should be expected.
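As a rough illustration of why the values coincide, here is a Python sketch that computes an EAP score for one response pattern twice: once from factor-model parameters (loading and threshold, logit link, factor mean 0 and variance 1) and once from the converted 2PL parameters. The conversion used, a = loading and b = threshold / loading, assumes no 1.7 scaling constant. The item values are made up, and the simple grid integration is only a stand-in for whatever quadrature a real program uses.

```python
import math


def to_irt(loading, threshold):
    """Convert logit factor parameters to 2PL (a, b); assumes mean 0, variance 1."""
    return loading, threshold / loading


def factor_prob(loading, threshold):
    """P(u = 1 | theta) in the factor parameterization."""
    return lambda theta: 1.0 / (1.0 + math.exp(-(loading * theta - threshold)))


def irt_prob(a, b):
    """P(u = 1 | theta) in the 2PL parameterization."""
    return lambda theta: 1.0 / (1.0 + math.exp(-a * (theta - b)))


def eap(pattern, prob_fns, n_points=201, lo=-6.0, hi=6.0):
    """Posterior mean of theta under a standard normal prior (grid integration)."""
    step = (hi - lo) / (n_points - 1)
    numerator = 0.0
    denominator = 0.0
    for k in range(n_points):
        theta = lo + k * step
        weight = math.exp(-0.5 * theta * theta)  # unnormalised N(0, 1) prior
        for u, prob in zip(pattern, prob_fns):
            p = prob(theta)
            weight *= p if u == 1 else 1.0 - p
        numerator += theta * weight
        denominator += weight
    return numerator / denominator


items = [(1.2, -0.5), (0.8, 0.3), (1.5, 1.0)]  # hypothetical (loading, threshold)
factor_fns = [factor_prob(lam, tau) for lam, tau in items]
irt_fns = [irt_prob(*to_irt(lam, tau)) for lam, tau in items]

pattern = [1, 0, 1]
score_factor = eap(pattern, factor_fns)
score_irt = eap(pattern, irt_fns)
print(score_factor, score_irt)
```

The two parameterizations describe the same response probabilities, so the same prior and the same scoring rule give the same score. Differences only arise when the link (probit vs. logit) or the scoring method (MAP vs. EAP) differs.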
ML probit is also possible, and again EAP is used.
WLSMV probit is also possible, in which case MAP is used.
Dallas posted on Saturday, August 07, 2010 - 5:51 am
Dr. Muthen. Thanks for your reply. You replied so quickly it took me a few days to notice! Yes, I was thinking about using probit and ML, and also thinking about WLSMV probit. In those cases, it seems one would have scores with similar measurement properties (e.g., one would still inherit the properties of IRT models), but not scores identical to those from the "traditionally" estimated IRT model. It seems, though, that a general formula for converting scores from the probit metric to the logistic metric does not exist?
And, thanks for the nudge regarding logistic and ML. It does make sense, and I do agree, that with logistic and ML (and an appropriately identified model), one would achieve the same results as the IRT model.
Hello, I have a question about obtaining factor scores. The way I understand it Mplus factor scores from a CFA with Likert items and the WLSMV estimator will give me factor scores that are equivalent to Graded Response Model (GRM) IRT scores (EAP estimation [with a normal prior I presume?]).
1. Is this correct?
My other question is with regard to local independence and unidimensionality.
2. I have 15 Likert items (5 point) with which I am trying to measure a single construct. It is pretty clear to me, however, that there are local dependencies and maybe ultimately >1 factor among these items. If I appropriately specify a bi-factor model or otherwise appropriately estimate correlations among the residuals however (to get rid of or "account for" the local dependencies), will my estimated factor scores for the GENERAL factor essentially be equal to GRM person scores (theta) from a model where the unidimensionality and local independence assumptions are satisfied?
If I could I would just stay within the latent variable framework, but for my application here I really need the factor scores.
I hope this makes sense. Any citations would be tremendous also. I love Mplus. Thanks.
I apologize, I cannot locate Technical Appendix Version 2 on the website, only Version 3. It looks like Appendix 11 in the latter only covers estimation of factor scores for continuous and binary data, not ordered categorical(?).
MAEP and EAP scoring of course require the specification of a prior distribution. I am a bit confused in that if I specify a normal prior (for either method), should I not expect that the resulting distribution of Theta-Hat will be normal or close to it?-- particularly if the population distribution of Theta is indeed normal? What if the population distribution of Theta is not normal? I am getting conflicting reports on all this. Can you perhaps clarify. Thanks much.
The posterior distribution can be quite non-normal even with a normal prior. For example, if the items are too difficult or too easy we can't discriminate between people who are high/low and the posterior (theta-hat) will be skewed.
ywang posted on Friday, February 11, 2011 - 8:17 am
Dear Drs. Muthen,
Does an IRT model usually stand alone? I included IRT in the SEM, but cannot find the model fit criteria for the SEM. What model fit criteria can we get for an SEM with IRT? Also, can you refer me to any paper that describes the model of SEM with IRT? Can it be described in a similar way as the CFA model in SEM, except that the indicator variables are categorical?
It sounds like you are using maximum likelihood estimation and categorical outcomes. You will not receive chi-square and related fit statistics in this case. If you use weighted least squares estimation, you will. IRT is CFA with categorical outcomes. There are many IRT books. We have some papers on our website under Papers/IRT.
xstudylab posted on Wednesday, February 23, 2011 - 10:09 am
I just switched to Version 6.1 and now I can't get factor scores for my model... I get the error message 'FACTOR SCORES CAN NOT BE COMPUTED FOR THIS MODEL DUE TO A REGRESSION ON A DEPENDENT VARIABLE' but indicators and factors are only regressed on exogenous variables.
I ran the same input using Version 5 and it gave me the same estimates as Version 6.1, but it produced factor scores without an error message.
Is there something different about how Version 6.1 produces factor scores?
I'm interested in buying MPlus Base because of the ability to handle binary and ordinal variables in CFA. I do have a question regarding the factor scores that I hope you can address. The scores are continuous, yet I need to convert them to either binary or ordinal -- the same scales as the original variables used in the CFA. Does MPlus do this or is there a way to program this conversion (does MPlus, in fact, have programming capability?)? Any papers on converting? Technical Appendices? Notes?
Mplus does not do this. There would need to be further information in order to define such a conversion. Mplus does not have a programming capability. There is an IRT literature on conversions such as that related to NAEP with writings by Mislevy and others.
ywang posted on Wednesday, March 02, 2011 - 10:59 am
I am working on an SEM with a latent factor by IRT (3 dummy indicator variables) as an independent variable. I was asked by other researchers for a description of the latent construct. They believe that the latent variable must have some sort of values and would like to describe the latent construct in terms of its range and distribution.
It seems that the latent variable does not have a metric and cannot be described in the same way as an indicator variable. I am wondering whether it is appropriate to describe the latent construct using factor scores instead. However, I have some concerns since (1) estimated factor scores differ between the stand-alone IRT model and the SEM model, and (2) a factor score is not exactly the latent construct and still has measurement error.
Do you have any suggestions on how to describe the latent factor in the SEM?
In a cross-sectional model, a factor has a mean of zero and an estimated variance.
ywang posted on Wednesday, March 02, 2011 - 12:32 pm
Thank you very much for the reply. I have a follow-up question. In the stand-alone IRT model, I got the factor variance as 0.122. However, when the IRT was included in the SEM, the factor variance was changed to 0.202. Which variance should I report? Is this inconsistency due to that the SEM does not fit the data well (CFI 0.873, TLI: 0.762)?
If the factor is an independent variable in the SEM then the estimated variances should be close within their SEs. If not, as you say, the SEM may be ill-fitting.
I would not use estimated factor scores here given that you have only 3 indicators. The factor metric of the SEM is clear: your model postulates a normal variable with a mean of zero and a certain variance (or you can fix the variance at 1 to get a z score, and then free the first loading).
ywang posted on Wednesday, March 02, 2011 - 2:07 pm
Thanks a lot. Your reply greatly helped me. I have another question for factors by subgroups such as gender. If I have to list the mean and variance of factor score for males, for females and for all the sample in one table as well as the p value of difference of factor score between males and females, what should I do?
From your previous discussion with other researchers, I understand that multi-group analyses should be used to compare whether the factor mean differs between males and females. In the model you previously mentioned, the mean of the factor in one group (e.g. males) is fixed at 0 and the mean is freely estimated in the other group (e.g. females). For that table, I need to list the mean and variance for males, females, and the overall sample. How can I free the means for both groups in the multi-group analyses? Thanks!
In multiple group analysis, a test of factor mean differences is a difference test between a model with factor mean zero in one group and free in the other groups versus a model with factor mean zero in all groups. Please see Slide 223 of the Topic 1 course handout.
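Once both models have been run, the difference test itself is simple arithmetic. The following Python sketch computes it from the two chi-square values and their degrees of freedom; the fit values in the example are made up. It applies only to plain ML chi-squares. With MLR a scaling correction is needed, and with WLSMV the DIFFTEST option should be used instead of subtracting chi-squares.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{r=0}^{df/2-1} (x/2)^r / r!
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= half / r
            total += term
        return math.exp(-half) * total
    # Odd df: erfc term plus a finite series over odd-number products.
    term, total = 1.0, 0.0
    for k in range((df - 1) // 2):
        if k > 0:
            term *= x / (2 * k + 1)
        total += term
    return (math.erfc(math.sqrt(half))
            + math.sqrt(2.0 * x / math.pi) * math.exp(-half) * total)


def difference_test(chi2_restricted, df_restricted, chi2_free, df_free):
    """Chi-square difference test between two nested ML models."""
    diff = chi2_restricted - chi2_free
    ddf = df_restricted - df_free
    return diff, ddf, chi2_sf(diff, ddf)


# Hypothetical fit values: factor means fixed at zero in all groups (restricted)
# versus free in the second group (free), with two factors.
diff, ddf, p_value = difference_test(58.3, 26, 50.1, 24)
print(f"chi-square difference = {diff:.2f}, df = {ddf}, p = {p_value:.4f}")
```

The difference in degrees of freedom equals the number of factor means that are fixed in the restricted model.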
Dear Dr. Muthen, I am running a CFA with categorical indicators, and I requested that Mplus compute the factor scores. The model fits well, except that Mplus cannot compute the factor scores. This is the message:
THE MODEL ESTIMATION TERMINATED NORMALLY
FACTOR SCORES CAN NOT BE COMPUTED FOR THIS MODEL DUE TO A REGRESSION ON A DEPENDENT VARIABLE.
Is there a way to overcome this and get the factor scores? Thank you, Fernando
I think you will find that factor scores are saved in spite of this message. Check that. There is an incorrect error check in Version 6.1 that produces this message but still gives valid factor scores.
Reading over this post stream, I understand that SEM is generally preferable to using factor scores. I have a related question.
When use of all items, or parcels, is prohibited because of sample size limitation, are there any advantages to using factor scores in lieu of total scores for first order factors in an SEM model? In other words, I was thinking of using factor scores to represent first order manifest scales within latent factors in an SEM model. I was hoping to reduce some second-order factors to first order factors in this way. Are there any advantages of accuracy and reduced error when using factor scores in this way, as opposed to using total scores?
Hi. I am working on obtaining factors scores. The problem that I have run into is when I look at the output data set the values for my weight variable that are output are not the same as the original weights that I input. The cluster and strata values were not altered. Does Mplus alter the weights when it is using it?
For example the original weight variable ranged from ~4 to ~3000. Now in the output dataset it ranges from 0 to ~4.
I appreciate any thoughts.
Here is my code: WEIGHT IS W1;
STRATIFICATION IS STR;
CLUSTER IS NEWPSU;
SUBPOPULATION IS EX EQ 1;
MISSING ARE .;
IDVARIABLE IS id;
TYPE = GENERAL COMPLEX;
!MEASUREMENT MODEL F1 BY V1 V2 V3 V4 V5 ; F2 BY V6 V7 V8 V9 V10 ;
I tried to estimate an IRT-Model, which works, yet it does not return factor scores, but strangely enough there is also no error message. It seems like this or some similar problem was addressed in the forum before, at least there is a thread from Goran Milacevic posted on Saturday, July 12, 2008 - 1:20 am which sounds similar. Can you reconstruct what the solution was back then?
Thank you very much in advance for your advice! Michael
Loadings and therefore factor scores are different in EFA than CFA because the models are different, with different degrees of freedom. EFA presents standardized loadings because a correlation matrix is analyzed whereas CFA uses a covariance matrix so that only the standardized solution is close to the EFA.
You should watch the video of our Topic 1 course to learn more about these matters.
Kerry Lee posted on Sunday, November 20, 2011 - 8:25 pm
Dear Drs Muthen,
I am running a CFA. Depending on the indicator to which the scale of the factor is fixed, the variance of the factor does not always attain significance. Specifically, when I fixed the metric of the latent variable to the indicator with the largest loading (i.e., when the factor variance was fixed at one), it failed to attain significance (p = .076). It attains significance when its metric is fixed to one of the other two indicators. To obtain more information on the distribution of the latent variable, I generated some histograms using the PLOT2 command. When I asked to "view descriptive statistics", the variance value (.133) differed from that in the text output (.147). Would you have some suggestions on why there is a discrepancy? My second question relates to the kurtosis value. Has it been rescaled, with zero denoting no departure from normality?
Kerry Lee posted on Sunday, November 20, 2011 - 8:34 pm
Regarding my previous post, the histograms were generated using PLOT3, not 2.
I think what you are seeing is the difference between the estimated factor variance parameter and the variance of the estimated factor scores. They are not expected to be the same. Yes, Kurtosis zero is no such departure.
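To see why the two numbers differ, here is a small Python sketch for the simplest case: one factor, continuous indicators, uncorrelated residuals, and regression-method scores. Under those assumptions the model-implied variance of the estimated scores is the factor variance multiplied by a shrinkage factor below one. The parameter values are made up, and this is not a reproduction of the Mplus computation.

```python
def score_variance(psi, loadings, residual_vars):
    """Model-implied variance of regression-method factor scores.

    Assumes one factor with variance psi, continuous indicators, and
    uncorrelated residuals.
    """
    s = sum(lam * lam / theta for lam, theta in zip(loadings, residual_vars))
    shrink = psi * s / (1.0 + psi * s)  # always between 0 and 1
    return psi * shrink


psi = 0.147                    # hypothetical estimated factor variance
loadings = [1.0, 0.9, 1.1]     # hypothetical loadings
residual_vars = [0.05, 0.06, 0.04]  # hypothetical residual variances

var_scores = score_variance(psi, loadings, residual_vars)
print(f"factor variance parameter: {psi:.3f}")
print(f"variance of estimated scores: {var_scores:.3f}")
```

The less reliable the indicators, the stronger the shrinkage and the larger the gap between the two values.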
Kerry Lee posted on Sunday, November 20, 2011 - 8:54 pm
Thanks very much for the quick reply. If I want to say something about whether there is a significant amount of variance in the latent factor, should I report the value from the text output (I assume this is the estimated factor variance parameter)?
Yes, use the estimated factor variance parameter and its SE in the printed output.
nanda mooij posted on Wednesday, December 07, 2011 - 7:48 am
Dear Drs. Muthen,
I have a model with first, second and third order factors, and I want to estimate the factor scores of the first and second order factors with the calculated item parameters of the third order factors. Now I am wondering if I could get these factor scores through putting the item parameters of the third order factors in the input. The items have 3 categories, so are polytomous and are non-ordered. I saw the appendix about the estimation of factor scores, but that's only about dichotomous or continuous y-variables... So can I put item parameters in the input and if not, where can I find an appendix about estimating factor scores of categorical, non-ordered, y-variables?
You can obtain factor scores for all of the factors using the FSCORES option of the SAVEDATA command. See Technical Appendix 11 on the website.
nanda mooij posted on Thursday, December 08, 2011 - 9:21 am
Dear Drs. Muthen,
I know that I can obtain the factor scores for all the factors at once, but I actually want to obtain the factor scores of the total scale using the item parameters of the subscales (so I first estimate the item parameters separately for each subscale, and I want to use these item parameters to estimate the factor scores of the total scale). So what I want to do is what is done in MULTILOG, where I must provide a file containing the item parameters I want to use. I'm wondering if it is also possible in Mplus to refer to the item parameters in the input. I hope I'm explaining myself better now.
We are running a CFA with categorical indicators in which some loadings and thresholds are fixed (and others are not). We want to obtain factor scores. Mplus tells us that "FACTOR SCORES CAN NOT BE COMPUTED FOR THIS MODEL DUE TO A REGRESSION ON A DEPENDENT VARIABLE". What does this mean? We have no regressions in the model, or other path structure, apart from a CFA.
We want to obtain factor scores from a CFA of categorical indicators to use in a subsequent path analysis because the full model, including the measurement model, would be too large for our sample size. We want to use a single-indicator SEM approach for the path analysis, however, so we want to estimate the reliabilities of the factor scores. We have found a very sensible formula on the internet in which we would divide the factor variance by the sum of the factor variance plus the factor score variance (based on the notion that the factor score variance estimates the error variance). That post on the web indicated that Mplus could give us output corresponding to the within-subject factor score variance (which is constant across subjects). Is this correct? If so, how do we get it?
With categorical outcomes, the standard errors vary as a function of the factor values. You need to take those values from the plot of the information function you get using PLOT2 in the PLOT command. The standard errors are computed as 1/square root of the information function value.
Thanks for the info about getting the information functions.
Unfortunately, we neglected to mention in our earlier posts that we are trying to obtain the within-subject factor score variance for a group factor in a hierarchical model (e.g., one in which items load upon both group factors and a general factor).
We are able to use the PLOT2 option to get Mplus to give us the information functions for both a unidimensional model of the items comprising one group factor and a unidimensional model of all items.
However, is there any way to get Mplus to give us information functions for each factor in a multi-dimensional model in which items are categorical?
We are hoping to use this information to calculate the reliability of factor scores generated from our model.
Thanks Linda. A follow-up on creating IRT scores. I have, for example, 4 waves of data, and for each wave I create the IRT score from the saved factor scores. I would like to use the saved scores from these four waves to fit an LGC model. How do I do this in Mplus? Thank you very much
You can save the factor scores using the SAVEDATA command and then use the saved data to estimate the LGC model. Unless factor determinacy is one, the factor scores are not the same as using the factors in the model. I would suggest a multiple indicator growth model instead.
Thanks Linda, perhaps I did not state my question clearly. I know how to save the factor scores. How do I combine the saved scores for each wave? Or should I run the CFA for all waves at once and save the scores, so that I will have only one file?
If you want to use the factor scores from all of the waves in a single analysis, you need to save all of the factor scores in one file. This can be accomplished by running all four waves together and saving the factor scores. You would want to do this as a first step to determine measurement invariance across time.
I need to calculate the standard errors of the factor scores for the CFA model with categorical outcomes. From your explanation above, it looks like in categorical CFA, std. errors can not be computed along with the factor scores in a single step but need to be further derived from the plot of the information function. This is where I am getting lost.
Could you please describe the sequence of steps one needs to follow in order to produce the std. errors for the latent factor scores for each respondent (or refer me to an example). I also wonder if there would be a way to output these scores in a txt or dat format in order to append to the vector of the factor scores?
Thanks for this really quick reply! Could you maybe give me a short explanation what the error message means and why WLSMV cannot be used in such a case? I would like to understand why it is not working.
Oh, I think there was a version where this message came out in error. You should use a newer version.
Jason Bond posted on Tuesday, July 09, 2013 - 10:31 am
I had a question regarding estimated factor scores obtained from a CFA on polytomous factor indicators. For continuous factor indicators, my understanding is that the estimated factor scores are standardized to have mean 0 and variance 1. However, estimated factor scores from polytomous factor indicators have, in my case (a majority of the items are skewed), a negative mean, which naturally raises the question among those I am working with of what the scale (or normalization) is for these factor scores. I found this question difficult to answer with Technical Appendix 11. Thanks for any help,
To my knowledge, when items are continuous, Mplus uses the regression method to estimate factor scores; when items are categorical, Mplus uses Expected A Posteriori (EAP). I have a CFA with ordinal items measured on a 5-point scale, and I ran it two ways: 1) treating the items as continuous measures and saving the estimated factor scores in Mplus; 2) treating the items as categorical measures and saving the estimated factor scores in Mplus.
Both models fit the data very well, and the factor scores estimated from the two models are highly correlated (r=0.98). I am wondering if the model results indicate that I can simply treat my 5-point Likert scale items as continuous measures for CFA? If so, any reference? It is much easier to use CFA with continuous items for measurement invariance testing. As I recall, the ALIGNMENT option is only available for continuous or binary items in the current version of Mplus. Will the option be available for categorical items in a future version of Mplus? Your help will be appreciated!
When items are categorical, WLSMV uses Maximum A Posteriori (MAP) and ML uses EAP.
Typically, it is ok to approximate Likert scale variables as continuous and use linear models unless there are strong floor or ceiling effects. You asked for references - here are 2 classic ones:
Muthén, B., & Kaplan D. (1985). A comparison of some methodologies for the factor analysis of non-normal Likert variables. British Journal of Mathematical and Statistical Psychology, 38, 171-189.
Muthén, B., & Kaplan, D. (1992). A comparison of some methodologies for the factor analysis of non-normal Likert variables: A note on the size of the model. British Journal of Mathematical and Statistical Psychology, 45, 19-30.
Yes, alignment method is likely to be expanded in various directions.
Hello, I am using the alignment method, new to version 7.11. I have attempted to use the SAVEDATA command to save the results/import the output to a data file, but no data file appears or is empty when I open it. Are there SAVEDATA options for the alignment method? I have thousands of pages of output and am trying to find ways to make it easier to interpret/manipulate/analyze the output. Any suggestions you have for this will be much appreciated. Thank you for your time.
We are not aware of any specific problems involving the RESULTS option and alignment. Please send your input, data, output, and license number to email@example.com.
deana desa posted on Thursday, November 07, 2013 - 7:52 am
Is there any way to tell Mplus to keep cases with missing on all variables as the way there are in the input data, although these cases won't be used in the modeling when computing factor scores for other cases?
Dear Drs. Muthen, I have a question about factor scores derived from CFA. I ran a four-factor CFA with 16 items. So each factor has only a few indicators (dichotomous, yes or no). The purpose of the CFA is mainly to confirm the factor structure for some follow-up multilevel analysis. I understand I can use one-step multilevel SEM for this, but I can’t do that due to sample size and other issues.
I compared sum scores and factor scores for the four factors and found them very similar. So I used sum scores for the follow-up multilevel regression analysis. A colleague thinks factor scores are superior to sum scores because they take into account the correlations among the factors. BTW, the correlations among the four factors in our CFA range from .48 to .75. According to this colleague, if I use factor scores in the follow-up analysis then multicollinearity is already dealt with.
I read the Mplus posts and the technical appendix but still can’t tell whether the factor scores from a CFA already incorporate the correlations among the factors or not. What do you think? In particular, do you have a reference for this? I really appreciate it. Li
I had a question about how Mplus calculates factor scores when cases have some missing data on continuous indicators. From what I've read, I gather that Mplus uses "regression or Bartlett methods" when estimating factor scores for continuous data, and it uses "all available data" to estimate these scores when data are missing. However, it's not very clear exactly how this is accomplished. Could you please say more about how Mplus manages to calculate a factor score when a case has missing data?
If a subject has missing data on all variables in the model, a factor score for that subject cannot be computed; in fact, that subject is not even included in the model estimation. But if the subject has some of the variables observed a factor score is estimated. For instance, if you have a longitudinal model with 1 factor at each of 3 time points and a subject is not present at time point 2, his factor score at time point 2 can be estimated because he has observations at time 1 and 3 and the estimated model says how much the time 2 factor correlates with the time 1 and time 3 factors. The SE of the estimated time 2 factor score for that person is going to be higher than for a person who had no missing.
Thanks, Bengt. So to clarify, when there is partially missing data for an individual, Mplus uses the individual's available scores and the model estimated relationships among the variables to estimate the missing factor score. Right? And, if it were a 1-factor model, with 4 items loading on it, and for an individual 1 is missing, then I assume that Mplus will use the model implied covariance among the items to estimate the score as if the value were present?
Would you say this would result in similar values to using EM based imputation?
I agree on the first paragraph. EM-based imputation typically concerns the missing data on the items, not the factor scores.
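For the 1-factor, 4-item case raised above, the idea can be sketched in Python. This assumes continuous indicators, uncorrelated residuals, a factor mean of zero, and regression-method (posterior mean) scoring. With those assumptions the score uses only the items a person actually answered, and the standard error grows when an item is missing. All parameter values are made up, and this is a sketch of the principle rather than the Mplus code.

```python
import math


def regression_score(y, intercepts, loadings, residual_vars, psi=1.0):
    """Posterior mean and SD of a single factor given the observed items.

    Items with value None are skipped, so only available data are used.
    """
    precision = 1.0 / psi   # prior precision of the factor
    weighted_sum = 0.0
    for yi, nu, lam, theta in zip(y, intercepts, loadings, residual_vars):
        if yi is None:
            continue        # missing item contributes no information
        precision += lam * lam / theta
        weighted_sum += lam * (yi - nu) / theta
    posterior_var = 1.0 / precision
    return posterior_var * weighted_sum, math.sqrt(posterior_var)


intercepts = [3.0, 2.8, 3.1, 2.9]      # hypothetical item intercepts
loadings = [1.0, 0.8, 1.2, 0.9]        # hypothetical loadings
residual_vars = [0.5, 0.6, 0.4, 0.7]   # hypothetical residual variances

complete = [4.0, 3.5, 4.2, 3.0]
partial = [4.0, None, 4.2, 3.0]        # same person, second item missing

score_complete, se_complete = regression_score(
    complete, intercepts, loadings, residual_vars)
score_partial, se_partial = regression_score(
    partial, intercepts, loadings, residual_vars)
print(f"complete: score = {score_complete:.3f}, SE = {se_complete:.3f}")
print(f"partial:  score = {score_partial:.3f}, SE = {se_partial:.3f}")
```

Note that nothing is imputed for the missing item. The score is simply based on less information, which is why its standard error is larger.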
Tom Booth posted on Tuesday, March 04, 2014 - 8:54 am
I have conducted a CFA on 7 items with a binary response format based on data with a family structure. As such I have included clustering based on a family identifier in the model commands.
I need to produce factor scores for subsequent analyses. I wanted to confirm that the clustering does not impact these scores? As I understand it corrects SE, and as a result would not impact on the computation of the score. However, I am not 100% confident on how the factor scores for binary responses are computed in order to be sure the above is correct.
I saw previously on this thread that there might be an issue with version 6.1 and reporting factor scores from a CFA. We're having that issue right now - we're running a CFA with categorical indicators and requested factor scores, but each time we're getting the same message: FACTOR SCORES CAN NOT BE COMPUTED FOR THIS MODEL DUE TO A REGRESSION ON A DEPENDENT VARIABLE. Is there a specific solution we should use?
We performed CFAs on several measures having from 3 to 9 indicators each, measured on three different scales: 1-3, 1-4, and 1-5. Within each measure, the scale is the same. We treated the measures as continuous. Our purpose is to perform a latent class analysis using the resulting factor scores. We have noticed that the range of the factor scores is wider than the original measurement scale. For example, for a measure with 4 indicators (scale 1-4), our factor scores range from -0.22 to 10.86. We rescaled our items so that the range is 1 to 3 for all of them. However, when we re-calculated the factor scores using the re-scaled indicators, the factor scores were identical to the FS calculated with the original metric. Given that cluster analysis is sensitive to the metric of the measures, we wanted to ask you how the factor scores are calculated in Mplus, and whether the metric of the indicators has an influence on the factor scores. Thank you.
Dear drs. Muthen, I did a multiple group CFA with equality constraint on the factor loadings and I computed the factor scores. When I checked the factor scores distribution, I found that they had a very small variance (0,015 for the first group and 0,018 for the second group) and a very reduced range of variation (between -0,346 and 0,342 for the first group). My questions are: Are the estimated factor scores standardized? How is the scale of the factor scores defined (mean and variance)?
I observed that I don't obtain standard errors of the estimated factor scores in multiple group models estimated with MLR in Mplus 7.2. The models have known group membership (specified with the KNOWNCLASS option), categorical dependent variables, and full measurement invariance (only the means and variances of the latent variable are free parameters in the non-reference groups).
Is it planned to enable estimation of factor scores standard errors in such a case (and in multiple group models estimated with MLR in general) in future versions of Mplus?
It is correct that standard errors for factor scores are not available in this case. This is on our list but won't be in the next update.
You can get the standard errors by running one group at a time with parameters fixed at the estimated values. You can use the SVALUES option of the OUTPUT command to get input with the estimated values as starting values and then change each * to @ to fix them.
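The "change * to @" step can be scripted when there are many parameters. The following is a minimal sketch, assuming the SVALUES statements have been pasted into a string and that free start values appear as `name*value`; the function name `fix_svalues` and the example parameter values are hypothetical, not taken from any output in this thread.

```python
import re


def fix_svalues(model_text: str) -> str:
    """Turn every free start value (name*value) into a fixed value (name@value)."""
    # Only replace a '*' that is directly followed by a number (optionally signed),
    # so a '*' in any other position is left untouched.
    return re.sub(r"\*(?=[-+]?(\d|\.\d))", "@", model_text)


# Hypothetical SVALUES-style statements, made up for illustration.
svalues = """f BY u1@1;
f BY u2*0.85321;
[ u1$1*-0.41250 ];
f*1.20764;"""

print(fix_svalues(svalues))
```

The result can be pasted into the MODEL command of the single-group run. Statements that were already fixed with @ pass through unchanged.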
I apologize for posting several questions (I posted one on a different topic yesterday). I am trying to obtain factor scores for a two-level MSEM model with three factors on level 1 and two factors on level 2. However, the variables are all ordinal, so I am using the WLSMV estimator as you recommend elsewhere for ordinal data.
The trouble is that Mplus gives me an error that "Factor scores cannot be computed for TYPE=TWOLEVEL with the estimators ULSMV, WLS, WLSM, and WLSMV." Is it possible to obtain factor scores for a two-level model with ordinal data?
I am interested in generating factor scores from this model. Will the factor scores in this case incorporate the correlation of the error terms of x6 and x7? I read in Technical Appendix 11 that when one indicator is binary/categorical, residual correlations are assumed to be zero in the computation of the MAP factor scores. In this case, the two items with a residual correlation are continuous. I am hoping that the factor score computation, while assuming zero residual correlations for all the ordinal indicators, preserves the residual correlation between the two continuous variables.
shaun goh posted on Monday, January 18, 2016 - 3:21 am
Dear Drs Muthen,
Are saved factor scores from the following WLSMV one-factor model equivalent to, or at least a reasonable proxy in scale for, the theta scores estimated by IRT? (i.e., would a saved factor score of 1.2 correspond to 1.2 SD of theta?)
model: f by u1* u2* u3* u4*; ! where u1 to u4 are binary
[f@0];
f@1;
I have managed to save my factor scores with the following command
savedata: file is fscores1.dat; save = fscores;
And under the VARIABLE command in my model, I have identified which is the ID variable:
IDVARIABLE IS mcsid;
so that I can match up which factor score goes with which id. However, when I run my model, all the Mplus id variables are 0 (and a few are 1) - they don't match up to the mplus dataset like they should (and how they appear in my dataset inp file) (i.e. they should run from 1 - 5899)
They have been received. It must be a problem with our mail server. It is being looked into.
Rick Borst posted on Friday, August 19, 2016 - 1:39 am
I am trying to conduct moderation analysis. And I have a few questions:
1. The latent variable moderation with LOOP plot example (following UG ex 5.13) has asterisks after two of the indicators on the right-hand side of the BY statements of the moderator and the independent variable. a) Why is that? b) Do all the indicators in the BY statement need an asterisk or just one indicator of every variable?
2. I have a mediation analysis with 5 IVs, 1 mediator and 2 DVs. They are all latent variables measured by categorical indicators. I want to check whether the moderator (also a latent variable) influences the effect of 2 IVs on the mediator. I need the R-square of the mediator (with and without the interactions) and after that the LOOP plot of both interactions (so two LOOP plots). Is this feasible? I keep getting stuck at the moment (errors such as: too many dimensions, the model has reached a saddle point, the model estimation did not terminate normally due to a non-zero derivative... check your starting values... the loglikelihood derivative for parameter .. is -0.86, etc.).
Rick Borst posted on Friday, August 19, 2016 - 1:40 am
I am sorry, this is in the wrong thread. I will ask it in another area.
Somewhere in this forum it is noted that Mplus calculates factor scores even for cases that have missing data on some of the continuous indicators. I understand that Mplus uses information from the other indicators to estimate the factor scores.
How is this precisely calculated?
More specifically, I want to calculate Bartlett factor scores. Since Mplus does not estimate these, I use matrix calculations, but I always end up with missing factor scores for the subjects that have some missingness on their indicators.
To estimate the f value for a subject, Mplus maximizes the likelihood function
g(y | f) * g(f)
where the first term factors into a product of univariate densities g(y_j | f). If y_j is missing for a subject, that term doesn't contribute to the estimation of f. I haven't looked at how Bartlett would be done with missing.
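To make the maximization concrete, here is a minimal sketch for the simplest case: one factor, continuous indicators, and normal distributions for both g(y_j | f) and g(f). Under those assumptions the maximum has a closed form. The function name, argument names and parameter values are hypothetical; with categorical indicators the maximization would have to be done numerically.

```python
def map_factor_score(y, nu, lam, theta, alpha=0.0, psi=1.0):
    """MAP estimate of f for one case under a one-factor model with normal indicators.

    y      observed values, with None marking a missing indicator
    nu     indicator intercepts
    lam    factor loadings
    theta  residual variances
    alpha  factor mean
    psi    factor variance
    """
    # Start with the contribution of the prior g(f) to the first-order condition.
    num = alpha / psi
    den = 1.0 / psi
    for y_j, nu_j, lam_j, th_j in zip(y, nu, lam, theta):
        if y_j is None:
            continue  # a missing y_j drops out of the product g(y | f)
        num += lam_j * (y_j - nu_j) / th_j
        den += lam_j ** 2 / th_j
    return num / den


# Hypothetical parameter values, for illustration only.
nu = [2.0, 2.5, 3.0]
lam = [1.0, 0.8, 1.2]
theta = [0.5, 0.6, 0.4]

print(map_factor_score([2.4, 3.1, 3.3], nu, lam, theta))   # complete case
print(map_factor_score([2.4, None, 3.3], nu, lam, theta))  # second indicator missing
```

A case with every indicator missing gets the factor mean, because only the prior term remains. Starting `num` and `den` at zero instead would give a Bartlett-type weighted score from the observed indicators only; that variant is my own extension of the sketch, not something confirmed in this thread.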
Luc Watrin posted on Thursday, October 13, 2016 - 1:09 am
Dear Drs. Muthen,
is there a way to save factor scores with more than 3 decimal places?
Dear Mplus-team, In the description for factor score estimation in mplus (Factor scores.pdf) it is mentioned that FS (Regression method) used as predictors will yield unbiased regression slopes.
I'm testing a complex moderation model (... F1²*F2), where all fully latent approaches (LMS, unconstrained product indicators, ...) show convergence problems (...). So I'm wondering if it would be reasonable to use FS for the predictors (and the product terms) and use a measurement model for the dependent variable.
Thanks a lot! I know that the estimation of the model is quite problematic. It seems to be a question of the lesser bias. In this regard, would it be preferable to use FS or a measurement model for the dependent variable?
Tom Clarke posted on Sunday, June 04, 2017 - 2:37 pm
Many thanks for this forum; it is a fantastically useful resource. Could I ask whether there is a way to obtain non-standardised factor scores from Mplus? I am running an analysis using categorical indicators where it would be useful to have raw factor scores outputted.
The factor scores are estimated from a model where you can choose the metric of the factor. For instance, the model's factor variance parameter need not be 1, although that seems a natural metric. The factor score estimates themselves are not standardized.
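One way to see that the estimates are not standardized: even when the factor variance is fixed at 1, estimated scores typically have a variance below 1, because they are shrunk toward the factor mean. The sketch below works this out for a one-factor model with continuous indicators and regression-method (MAP) scores. That is a simplifying assumption (the question above concerns categorical indicators), and the function name and parameter values are hypothetical.

```python
def factor_score_variance(lam, theta, psi=1.0):
    """Model-implied variance of regression-method scores in a one-factor linear model."""
    # Information about f carried by the indicators.
    a = sum(l * l / t for l, t in zip(lam, theta))
    # Factor variance psi multiplied by a shrinkage factor that is always below 1.
    return psi * (psi * a) / (1.0 + psi * a)


# Hypothetical loadings and residual variances, factor variance fixed at 1.
lam = [0.7, 0.6, 0.8, 0.5]
theta = [0.51, 0.64, 0.36, 0.75]

var_scores = factor_score_variance(lam, theta, psi=1.0)
print(var_scores)          # below the factor variance of 1
print(var_scores ** 0.5)   # standard deviation of the estimated scores
```

The more reliable the indicators, the closer the score variance gets to the factor variance.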
To follow up on the question in the previous post -- is there a way to scale factor scores back to the scale of the indicators? The reason it would be beneficial is that the indicator scale has a meaning (e.g., 1 means Not at all, 4 means A Lot, etc.); thus, scores on that scale are easier to interpret than a number of standard deviations above/below the mean... Thank you in advance.
Also, I wanted to ask about the latent variable scores regression coefficient matrix that I obtain through the CFA analysis -- these are used to create composite scores as far as I understand. Is this matrix different from the factor score coefficient matrix that is produced by the EFA analysis?
If I want to compute composite scores by hand using the factor score coefficient matrix from EFA, I can do so by multiplying coefficients by standardized item values and then adding them together. But this approach doesn't seem to work this way with the latent variable scores regression coefficient matrix from CFA. Why is that?
Your model has a continuous factor so it doesn't give only a limited number of distinct values like your observed ordinal variables. Attempting to get back to the observed scale by some kind of categorization would throw away information. It is true, however, that some measurement instruments like NAEP (google it) bring factor values back to an ordinal scale (basic, etc.), but that is a difficult process that involves understanding which items tend to exceed a certain threshold at which factor value.
My advice - stick with SDs for the latent variable.
Daria G. posted on Monday, August 21, 2017 - 5:28 pm
Thank you so much! That's really helpful.
Just two more quick questions:
1. I know that I can obtain composite scores through EFA (via a number of methods, such as Regression, Bartlett, Anderson-Rubin, etc.) or through CFA. I am a little confused -- which way should I choose?
2. I intend to do a cluster analysis on latent variables. Do I understand correctly that I can do that only using composite scores? In other words, there is no way to do the analysis "in one step"? (CFA + cluster without computing composite scores?)
2. You can do the clustering together with the CFA in a single analysis. We refer to that as factor mixture modeling - see the Papers section of our web site and also the Topic 5 short course handout and video on our website.
Luo Wenshu posted on Friday, December 22, 2017 - 4:21 pm
Dear Dr. Muthen,
I saved factor scores obtained in a CFA for use in subsequent regression analyses. I found that the correlations between the factor scores are higher than the correlations between the factors estimated in the CFA. Can you help explain why? Thank you very much.
Hi, I have generated factor scores for teacher-level CFAs adjusting for school clustering. I did this instead of traditional SEL because I wanted to create school-level values (school n=27) from the teacher factor scores and then use these values in a MLM to predict student-level outcomes, which I can only link to schools (not teachers/classrooms). The CFA items are likert scales that I'm treating as ordinal. I have two questions:
(1) Is this a reasonable way to handle such data/questions?
(2) I noticed that the factor score mean does not equal zero. Do you know why this is the case?
Here's an example of a CFA model:
USEVAR u1 u2 u3 u4;
CATEGORICAL ARE u1 u2 u3 u4;
CLUSTER=School;
SUBPOP=(TRole EQ 2);
ANALYSIS: TYPE=COMPLEX; PROCESSOR = 2;
MODEL: f BY u1 u2 u3 u4;
OUTPUT: SAMP STDYX;
SAVEDATA: FILE IS TEACHERS.dat; SAVE = FSCORES;
Thank you for clarifying the Type=Complex issue. I previously considered the 2-level approach you mentioned but thought that with a between sample size of 27, it might be too small of a number to generate trustworthy factor scores for a between CFA. My thinking was that doing the CFA at the teacher-level (where I had larger sample size) and aggregating to schools would be more appropriate. But it sounds like taking the two-level approach is reasonable with this size. Is that correct? Would using Bayesian estimation also help with the small size?
Lastly, If I take the two level approach and use the factor scores as predictors, do I interpret the unstandardized output as if it were standardized?
Hi, I am doing a CFA for a rating scale with 16 items and 5 factors. I need to allow a correlation between the errors of two of the items to get acceptable model fit. When I try to generate factor scores using this model, I get the following message:
THE MODEL CONTAINS A NON-ZERO CORRELATION BETWEEN DEPENDENT VARIABLES. SUCH CORRELATIONS ARE IGNORED IN THE COMPUTATION OF THE FACTOR SCORES. THE MODEL COVARIANCE MATRIX IS NOT POSITIVE DEFINITE. FACTOR SCORES WILL NOT BE COMPUTED. CHECK YOUR MODEL.
Dr. Muthen, I am estimating a model using WLSMV and want to compute factor scores. The Mplus (version 7.4) output showed:
Factor scores were not computed. No data were saved.
Here is the command (the model reported favorable fit):
Title: THIS IS AN EXAMPLE OF matched sample;
Data: FILE IS 396ANE.dat;
Variable: Name are NO GENDER ps CARF JOBF STAY WPV OE OI rebo1-rebo22;
IDVARIABLE = NO;
Usev are NO GENDER JOBF STAY WPV OE OI E1-E3;
Categorical=STAY;
Missing=All(-9);
DEFINE: E1=(rebo1+rebo2+rebo3)/3;
E2=(rebo6+rebo8+rebo13)/3;
E3=(rebo14+rebo16+rebo20)/3;
Model: ANE by WPV OE OI;
EE by E1 E2 E3;
STAY on JOBF EE GENDER;
JOBF on EE GENDER;
EE on ANE GENDER;
ANE on GENDER;
Model indirect: STAY ind ANE;
JOBF ind ANE;
output: SAMPSTAT TECH1 TECH4 Stdyx Mod;
SAVEDATA: FILE IS ANE.sav; SAVE IS fscores; FORMAT IS free;
Is anything wrong with my SAVEDATA command? How can I compute factor scores with such a WLSMV-estimated model? Any advice would be much appreciated.
This may be due to negative residual variances (Heywood cases) or a factor correlation greater than 1. If that is the case, simplify the model to avoid the problem. If that is not the case, send your example to firstname.lastname@example.org
Send your output to Support along with your license number.
mboer posted on Monday, November 05, 2018 - 9:11 am
Dear prof. Muthén,
I have a longitudinal model with 1 factor at each of three time points (wide format), where there is drop-out. The factor has categorical items.
I want to calculate factor scores from the measures in all three time points using the 'save=fscores' command, and I understand that factor scores are being calculated for drop-out cases when using ML, based on information from previous waves.
However, I noticed that factor scores for drop-out cases are also computed when WLSMV is used. My question is, how are factor scores computed with WLSMV? I know that WLSMV uses pairwise present data in dealing with missing data, but I don't understand how pairwise present information can accommodate factor scores for drop-out cases.
WLSMV uses pairwise present data to estimate the sample statistics (see Muthen 1984 in Psychometrika) to which the model is fitted. But the factor scores are based on the estimated model parameters (and the data), not on these pairwise sample statistics.
I have five uncorrelated factors (varimax rotated) and am trying to get factor scores that are similarly uncorrelated. I tried putting the items into a CFA fixing the factor loadings to the values from the EFA output and requesting Fscores. However, I'm still getting factor scores that are correlated > .90!
I'm confused because when I use the factor scoring in Stata, the scores are uncorrelated (r < .1). Why are the scores so highly correlated when I fix the loadings in Mplus?
Thank you for your response. To clarify, I'm talking about factor scores, not factors, and the correlations between scores go from < .1 to > .9 in Mplus when I fix the factor loadings to the EFA output values. Would cross-loaders be responsible for such a huge change?
I wish to use factor scores obtained from scalar models of 8 of my constructs. I have exported the factor scores to SPSS. I noticed that some of the factor scores for the constructs are negative. If I have to use them in ON statements, Mplus will not recognise negative scores. How then can I use the factor scores in my model?