Hello! I have used Mplus to confirm a scale which I developed. Now I want to use my data and the scale to compare groups (demographics) according to the latent variables. I think what I need to do is to get a factor score for each of the observed variables using Mplus then transfer the data to SPSS and do the following:
Example: I have 9 Latent Variables. The first Latent Variable (FAC1) is comprised of the following observed variables: C7, C11, C20 and C24.
1. Total the OV factor scores for each of the OV that contribute to a particular LV.
   Example: (FS for C7) + (FS for C11) + (FS for C20) + (FS for C24) = FAC1 FS Sum
2. Divide each OV factor score by the total of OV factor scores to get a "proportional factor score" - so that all of the OV factor scores will equal 1 when summed - to give a % of the impact of each OV.
   Example: (FS for C7)/(FAC1 FS Sum) = PFS for C7, where PFS = Proportional factor score
3. Use the proportional factor scores in a weighted sum formula. This would allow me to give meaning to the values b/c they would be on the same scale as the responses, 1-5.
   Example: FAC1 = ((PFS for C7)*C7) + ((PFS for C11)*C11) + ((PFS for C20)*C20) + ((PFS for C24)*C24), where C7 is the respondent's answer to question C7 and FAC1 is now the calculated response for that person's beliefs about Factor 1 (Latent Variable 1)
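The three steps above can be sketched in a few lines of Python. This is only an illustration of the arithmetic the poster describes, not something Mplus produces: the per-item weights in `fac1_raw` are hypothetical numbers (the FSCORES option saves one score per factor per person, so any per-item weight would have to come from elsewhere, for example loadings), and the function names are made up for this sketch.

```python
def proportional_weights(raw_weights):
    """Steps 1 and 2: rescale per-item weights so that they sum to 1."""
    total = sum(raw_weights.values())
    if total == 0:
        raise ValueError("weights sum to zero; cannot normalise")
    return {item: w / total for item, w in raw_weights.items()}


def weighted_composite(responses, weights):
    """Step 3: weighted sum of the item responses."""
    return sum(weights[item] * responses[item] for item in weights)


# Hypothetical per-item weights for FAC1 (illustrative only, not Mplus output).
fac1_raw = {"C7": 0.80, "C11": 0.60, "C20": 0.40, "C24": 0.20}
fac1_weights = proportional_weights(fac1_raw)

# One respondent's answers on the 1-5 response scale.
respondent = {"C7": 4, "C11": 5, "C20": 3, "C24": 2}
fac1_score = weighted_composite(respondent, fac1_weights)
```

Because the weights are non-negative and sum to 1, the composite stays inside the 1-5 response range, which is the property the poster is after.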
Perhaps there is a better way to do this. Your advice would be greatly appreciated.
You can get the factor scores using the SAVEDATA command. See the user's guide for details. If you are interested in seeing the steps we use to test measurement invariance and population heterogeneity using multiple groups, you could purchase the Day 1 handout from our short courses.
I have ordered the handout and will try that. In the meantime, I'd like to get those factor scores. But when I used the SAVEDATA command, it said that it couldn't find the data. Any suggestions on what I need to do?
*** ERROR in Savedata command Only sample correlation matrix may be saved when there is at least one categorical dependent variable.
Stephanie posted on Friday, May 14, 2004 - 12:39 pm
Latest error message:
THE MODEL ESTIMATION TERMINATED NORMALLY
WARNING: THE RESIDUAL COVARIANCE MATRIX (PSI) IS NOT POSITIVE DEFINITE. PROBLEM INVOLVING VARIABLE F3.
THE MODEL COVARIANCE MATRIX IS NOT POSITIVE DEFINITE. FACTOR SCORES WILL NOT BE COMPUTED. CHECK YOUR MODEL.
Which method is used to obtain factor scores in Mplus: Anderson-Rubin, Bartlett, or some other?
bmuthen posted on Monday, March 21, 2005 - 5:39 pm
Mplus uses the regression method (see e.g. Lawley-Maxwell's FA book) - also known as the modal posterior estimator - for continuous outcomes and for categorical outcomes with WLSMV. In other cases, it uses the expected posterior distribution approach.
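For the continuous-outcome case, the regression method can be written out for a one-factor model. The sketch below uses the textbook formula w = Sigma^-1 * lambda * psi in its closed form for uncorrelated residuals. It assumes complete data and a single factor, uses made-up parameter values, and is not a description of the Mplus implementation.

```python
def regression_score_weights(loadings, residual_variances, factor_variance=1.0):
    """Regression-method weights for one factor with uncorrelated residuals.

    Closed form of w = Sigma^-1 * lambda * psi, where
    Sigma = psi * lambda lambda' + diag(theta).
    """
    information = sum(l * l / t for l, t in zip(loadings, residual_variances))
    precision = 1.0 / factor_variance + information
    return [(l / t) / precision for l, t in zip(loadings, residual_variances)]


def factor_score(item_values, item_means, weights, factor_mean=0.0):
    """Estimated score: factor mean plus weighted deviations from item means."""
    return factor_mean + sum(
        w * (y - m) for w, y, m in zip(weights, item_values, item_means)
    )


def score_variance(loadings, residual_variances, factor_variance=1.0):
    """Variance of the estimated scores; it is below the factor variance."""
    information = sum(l * l / t for l, t in zip(loadings, residual_variances))
    shrink = factor_variance * information / (1.0 + factor_variance * information)
    return factor_variance * shrink


# Hypothetical estimates for three continuous indicators (illustrative only).
lam = [0.8, 0.7, 0.6]
theta = [0.36, 0.51, 0.64]
w = regression_score_weights(lam, theta)
var_hat = score_variance(lam, theta)
```

The shrinkage in `score_variance` is why estimated scores from this method have a smaller variance than the factor itself, and more so when there are few indicators.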
I used SAVEDATA: SAVE = FSCORES to save factor scores from the CFA, but I found all the factor scores in the output file are either 0 or 1. Can I ask what goes wrong here? Or if it is what I should get, how should I understand it?
Sorry Linda, please disregard my last question. There are three columns with values that look like factor scores. Is the first column the factor scores for the first factor, the second column the factor scores for the second factor, and the third column a residual? Thank you! Shuang
I have one more follow-up question. The number of individuals in the factor score file is different from the number of individuals in the input file. Is there any way to match these two files to know which factor score is for which individual?
Thank you! Shuang
bmuthen posted on Wednesday, June 15, 2005 - 7:41 am
Anonymous posted on Wednesday, June 29, 2005 - 11:41 am
I have a question about the factor scores obtained from CFA. I used the save=fscores command. When I changed the sequence of indicators in the BY command, the factor scores also changed. What happened? Which factor scores should be used?
When you change the sequence of the factor indicators, a different factor loading is fixed to one to set the metric of the factor. This is why you get different factor scores. Both sets of scores are valid and should correlate one with each other.
samruddhi posted on Tuesday, September 06, 2005 - 9:56 am
I have 10 categorical variables and am predicting one factor using CFA. Here is the output:
CFI/TLI
CFI 0.971
TLI 0.986
Number of Free Parameters 10
RMSEA (Root Mean Square Error Of Approximation) Estimate 0.036
SRMR (Standardized Root Mean Square Residual) Value 0.041
WRMR (Weighted Root Mean Square Residual) Value 1.281
I see that all but the WRMR value show a good fit. Could you please share your insight on how to interpret these results, given that the WRMR value is so much higher than what indicates a good fit for my model?
Anonymous posted on Tuesday, September 06, 2005 - 9:58 am
can I get a factor score with Mplus if all my indicators are categorical? Mplus guide says this is not possible, any suggestions are most welcome.
Factor scores for factors with categorical indicators have been available for several years. You must be looking at a very old user's guide.
Anonymous posted on Tuesday, September 06, 2005 - 10:39 am
thanks, linda, i am using 2001 user's guide reprinted in 2002. i will find a copy of the new one and use that. many thanks.
samruddhi posted on Tuesday, September 06, 2005 - 10:46 am
Thanks, Linda, for your quick response re: WRMR. Chi-square is 113.157 (p=0.0000) but, with a sample size of 2444, my understanding is that chi-square is not a good measure. let me know if you think otherwise.
could you, also, verify for me if the following values can be used to assess model fit: SRMR < 0.08 (sample size > 250) TLI > 0.95 CFI > 0.95 RMSEA < 0.06
thanks much for your guidance.
samruddhi posted on Tuesday, September 06, 2005 - 10:52 am
In the previous message, I meant SRMR 250)
samruddhi posted on Tuesday, September 06, 2005 - 10:54 am
In the previous message, I meant SRMR <0.08 works for sample greater than 250.
Chi-square can be sensitive to sample size but that is not a reason to ignore it. You can do a sensitivity study where you free parameters until chi-square is acceptable and see if in doing so the parameters in your original model stay the same. If they do, then the poor chi-square was probably due to its sensitivity. If the original parameters change a lot, then the model probably does not fit well.
For acceptable cutoffs for fit measures, see the Hu and Bentler article from several years ago in Psych Methods and also the Yu dissertation on our website.
Elke Pari posted on Saturday, October 01, 2005 - 7:01 am
Hi, I have a problem getting factor scores. My dataset consists of 9 variables, which are categorical. If I try to get factor scores through the command "SAVE=FSCORES", I get the following error message: "THE MODEL ESTIMATION TERMINATED NORMALLY THE MODEL COVARIANCE MATRIX IS NOT POSITIVE DEFINITE. FACTOR SCORES WILL NOT BE COMPUTED. CHECK YOUR MODEL." What do you suggest could be wrong?
It is difficult to say exactly what the problem is without more information. Please send your input, data, output, and license number to email@example.com.
Marco posted on Wednesday, November 16, 2005 - 5:25 am
Hello Drs. Muthén,
I would like to conduct a multilevel regression analysis with Mplus. Since my indicators are not tau-parallel or even tau-equivalent (bad fit for a CFA with equality constraints on the factor loadings), the simple mean/sum of the indicators isn't the best estimator of a person's true score. I know that it is possible to estimate a multilevel model with latent variables, but I would prefer to keep it simple. Therefore:
1. Would it be advisable to use the factor scores as predictors? I found another estimator in a book by Roderick McDonald (Test Theory, 1999), which takes the factor loadings and the residual variance into account, but reduces the variance of the estimated true score dramatically (compared to the factor scores of Mplus).
2. I have a hierarchical sample, so I will estimate the factor scores with TYPE=COMPLEX. Does this option affect the estimation of the factor scores? (Maybe it is a problem to take the nonindependence into account twice: first within the CFA with TYPE=COMPLEX and afterwards within the multilevel regression analysis.)
Thanks a lot for your help! Marco
Marco posted on Wednesday, November 16, 2005 - 5:41 am
Sorry, I forgot one question: It seems that Mplus estimates factor scores even for cases with missing values on all of the corresponding indicators. How is that possible?
(I used MLR with TYPE=COMPLEX MISSING H1)
bmuthen posted on Wednesday, November 16, 2005 - 8:45 am
Perhaps you have covariates in the model.
bmuthen posted on Wednesday, November 16, 2005 - 8:49 am
Regarding your 5:25 AM question, if you are going to use factor scores in a multilevel setting it is best if you base those on a multilevel factor analysis model. So you would use type=twolevel. For type = complex to be formally correct, you have to assume that the factor loadings are (approximately) the same on both the within and between level, which is often not the case.
Marco posted on Wednesday, November 16, 2005 - 12:52 pm
Thanks a lot for your reply, but I'm not sure whether I understand you correctly. Does that mean that the factor structure obtained by type=complex is in fact a mixture of two factor structures, one between-group and one within-group?
bmuthen posted on Wednesday, November 16, 2005 - 4:45 pm
Yes. Type = Complex estimates the same parameters as in a regular (single-level) factor analysis. The aim of Type = Complex is to give SEs and chi-square corrected for the non-independence of the observations due to clustering. Sometimes this is not sufficient and Type = Twolevel is needed.
Marco posted on Friday, November 18, 2005 - 12:38 am
Thanks for the clarification, but that raises a practical question. My group-level observations are limited to 38 clusters and there are 19 indicators on the individual level (although most of them have little between-variance), so I will probably have to reduce the number of between-parameters. Are there some practical guidelines about how best to do this? For example, is there a rule of thumb about the minimum between-variance needed? In which cases would it be reasonable to fix the residual variances on the between level to zero or constrain the loadings on both levels to be equal? I guess that this is a wide topic, so maybe you could provide a reference.
bmuthen posted on Friday, November 18, 2005 - 5:29 am
To have success with your two-level factor analysis, I would recommend reading and following the 5 steps of
Muthén, B. (1994). Multilevel covariance structure analysis. In J. Hox & I. Kreft (eds.), Multilevel Modeling, a special issue of Sociological Methods & Research, 22, 376-398. (#55)
It may be that you need only a 1-factor model on between and perhaps zero between-level residual variances.
I've a question about the estimation of factor means within a multiple group analyses (model: 3 factors with 4 indicators each).
As a default the factor means are fixed to zero in the first group, while in the second group factor means are free. I want to overrule the default and estimate the factor means of both groups by fixing the intercept of an observed dependent variable to zero (one for each factor). But it doesn't work. I hope you can help me solving this problem.
This should work and I am not sure what you mean when you say it does not work. I suspect that you are not freeing the factor mean in the class-specific MODEL command for the first group. If this does not help you solve your problem, please send your input, data, output, and license number to firstname.lastname@example.org
Jinseok Kim posted on Tuesday, January 03, 2006 - 10:25 pm
I am trying to estimate a latent interaction model in Mplus. Schumacker introduced an approach by Joreskog that uses "latent variable scores" to estimate an SEM with a latent interaction (http://www.ssicentral.com/lisrel/techdocs/lvscores.pdf). It seems attractive to me, but his explanation is all in LISREL language. So I was wondering if I can fit the same model using Mplus. Any of your thoughts and suggestions will be greatly appreciated. Thanks.
bmuthen posted on Wednesday, January 04, 2006 - 8:53 am
Using estimated factor scores only leads to approximate solutions. A better alternative is to use the Mplus ML approach to latent variable interaction modeling which is in line with the Klein-Moosbrugger method from Psychometrika.
I am attempting to generate factor scores for a 2-factor model with combined continuous and categorical (binary) indicators. The model fits adequately. I have been using the save = fscores command, but for some reason the program only saves the raw data. It does not seem to be generating factor scores. Is there a problem generating factor scores with combined categorical and continuous indicators?
No, this is not a problem. The only reason factor scores would not be computed is if the model did not converge or there was a warning about negative residual variances or some other problem. The only way to know is to send your input, data, output, and license number to email@example.com.
1. I tried to save the factor scores, but the save data information in the output stated the following:
"Factor scores were not computed. No data were saved."
Following is my syntax:
data: file is mi1.dat;
variable: names are cl edu inc gen bmi year at1-at7 ba1-ba5;
  usevariables are at1-at7;
  categorical are at1-at7;
model:
  !set cfa due to varimax.
  !set reference due to loading.
  fat1 by at1*;
  fat1 by at2@1;
  fat1 by at3;
  fat2 by at4*;
  fat2 by at5;
  fat2 by at6@1;
  fat2 by at7;
  fat1 with fat2;
output: sampstat standardized residual modindices (0) tech1 tech2 tech4;
savedata: file is 0618withfc.dat;
  save=fscores;
Did I do something wrong?
2. If I change the model to modify the relations between indicators and/or factors, will the factor scores also have different values?
3. Could the factor score be interpreted as each respondent's value on the concept/factor? How is it produced when all the observed values are categorical?
4. Besides, how do I read the MI to decide whether to free a parameter or not?
It was my carelessness. I found that the (standardized) residual variance of one indicator (at2) is negative, like this: "AT2 -0.433 Undefined 0.14327E+01". Is there any suggestion you can offer to solve this problem? Thank you.
This suggests that the model is not appropriate for the data - so the model needs to be modified. Possible modifications can be suggested by modification indices for cross-loadings and residual correlations.
Also check that you get only 1 loading fixed to 1 for each of the 2 factors.
This means that for some observations, the pattern of values is contradictory, for example, a person who gets the easy items incorrect and the difficult items correct. There is nothing you can do other than change the model. I suggest using Version 4.1 if you are not.
Given the subject of my research, I will not change which indicators load on which factors, so I modified the model by adding measurement error covariances wherever the residual correlation was larger than .1. Proceeding step by step, the residual correlations are now all less than .1, the CFI, RMSEA, and WRMR have steadily improved, and the number of observations for which factor scores could not be computed has gone from 8 to 1. So I have decided to delete this observation from the research sample. Is this procedure appropriate? Thank you.
I don't think adding residual covariances with no substantive reason to obtain factor scores for problematic individuals is defensible. Although your theory may say one thing, your data may not be valid measures of the constructs in your theory. I would do an EFA to see if the variables are behaving as expected.
Based on my understanding of your statement about the minimization failure: initially the number of observations for which factor scores could not be computed was 8, and the sample size is 1357. If I directly delete these 8 observations and run the CFA again to save the factor scores, is that acceptable? Sorry for my strange question.
owen fisher posted on Friday, October 20, 2006 - 10:15 am
Sorry, I should describe it this way. There is one factor and three categorical indicators, and the loadings are .455, 1, and .386. If these loadings are probit regression coefficients, how do I interpret the relation between the factor and the indicators? Thank you.
owen fisher posted on Friday, October 20, 2006 - 10:25 am
Adding to the previous question, the number of these three indicator levels are 3, 4, and 4. Hope to make the question clearer.
If you are using WLSMV, the factor loadings are probit regression coefficients. You can interpret their sign and significance. That is probably what is most important if they are used as factor indicators.
owen fisher posted on Sunday, October 22, 2006 - 12:11 pm
Hi, it's carey again.
Let me describe my question this way, and sorry for my illiteracy.
There is one factor and three categorical indicators, and the loadings are .455, 1, and .386. With robust weighted least squares, these loadings are probit regression coefficients.
My teacher told me to look at the distribution of the factor scores, but as I understand it (if I don't misunderstand the factor score), the factor score is an index for which I can't find meaningful cutoff points. As in a binary logistic or ordered logit model, the factor score can only be converted into the probability of being in one of the levels of the indicator.
After reading "logit and probit model: order and multinomial analysis", published by Sage University, I got a rough idea of the cutoff point of the factor score corresponding to the boundary between one level of the indicator and the next or previous one.
owen fisher posted on Sunday, October 22, 2006 - 12:15 pm
Following the previous statement.
If that is the case, I will directly describe the distributions of these three indicators, without the mental fatigue of trying to find good cutoff points for the factor score. However, I still need your suggestion.
Is the reasoning behind my decision okay?
By the way, first, how do I calculate the probability of being in one of the levels of the indicator from the factor score? Second, how do I calculate the cutoff point of the factor score corresponding to the boundary between one level of the indicator and the next or previous one?
There is no set way that I know of to find cutoff points for factor scores.
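Although there is no set cutoff rule, the first of the two "by the way" questions has a mechanical answer under a probit model: the probability of each category given a factor value follows from the thresholds and the loading. The sketch below is a minimal illustration with hypothetical threshold values. The `residual_sd` argument is an assumption that depends on the parameterization (my understanding is that under the Delta parameterization it would be sqrt(1 - loading^2 * factor variance), but that is not confirmed in this thread).

```python
from math import erf, sqrt


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def category_probabilities(eta, loading, thresholds, residual_sd=1.0):
    """P(y = k | eta) for an ordinal item with a probit link.

    thresholds must be in increasing order; an item with K categories
    has K - 1 thresholds.
    """
    cumulative = [
        normal_cdf((tau - loading * eta) / residual_sd) for tau in thresholds
    ]
    bounds = [0.0] + cumulative + [1.0]
    return [bounds[k + 1] - bounds[k] for k in range(len(bounds) - 1)]


def boundary_factor_value(loading, threshold):
    """Factor value at which P(y <= k | eta) equals 0.5 for this threshold."""
    return threshold / loading


# Loading taken from the post; the two thresholds are hypothetical.
item_loading = 0.455
item_thresholds = [-0.5, 0.8]
probs = category_probabilities(0.3, item_loading, item_thresholds)
eta_star = boundary_factor_value(item_loading, item_thresholds[1])
```

`boundary_factor_value` gives one possible reading of "the boundary between one level and the next": the factor value where falling at or below a category and falling above it are equally likely. It differs from item to item, which is one reason a single cutoff on the factor score is hard to define.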
owen fisher posted on Tuesday, October 24, 2006 - 5:40 am
Thanks for your patience.
Anja Weiß posted on Monday, April 16, 2007 - 4:02 am
Dear Mr. and Mrs. Muthen,
I have a confirmatory factor model with ordinal variables and 7 factors. With SAVEDATA I have saved the factor scores. How are these factor scores calculated? What is the theoretical background and where do I find it?
I saw in previous posts and in the technical appendices that Mplus uses the regression method for estimating factor scores for categorical outcomes with WLSMV. If I output the factor scores using the FSCORES option of the SAVEDATA command, can they be interpreted as any factor score would (i.e. if a respondent had a higher value, it indicates they have a higher level on that given factor)? Are the factor scores included on the saved dataset standardized? Thank you for your help and patience. Jessica
For both continuous as well as categorical outcomes with weighted least squares, the factor scores are obtained using the maximum of the posterior distribution. For continuous outcomes this approach has been given the name "Regression method", but I don't think that is used with categorical outcomes. For categorical outcomes, the method is iterative.
I am trying to output factor scores for one latent variable with 14 indicators. I am using type = complex missing h1. I have 13,570 in my data set. However, I am using the subpopulation command which specifies a subsample of 11,488. When examining the factor score output I have factor scores for 13,557 individuals. I thought I would have factor scores for only the 11,488.
Is the missing command generating scores for those not within the subpopulation? I wish to have factor scores for only those within the subpopulation (and not based on the whole population). However, if I do not specify the missing option then I will kick out those individuals who are missing at least one of the indicator variables. I want to make sure that my SEs are correct for the subsample.
Hello, I did confirmatory factor analysis with mplus and computed factor scores. When I checked the score I found that lot of people have negative factor scores. And remaining had positive factor scores. Now my supervisor wants an explanation for the negative score. Do you think negative score is an error? What is the possible reason for the negative score? Many thanks Joanna
This is a response to Courtney Bagge's post. The factor scores that you have obtained are based on the model estimated from the subpopulation (not the entire population). So you can just ignore the factor scores computed for elements not in the subpopulation.
Hello, I computed factor scores after CFA and used these factor scores as an independent variable in a logit regression in the next stage of my analysis. Now my supervisor wants to know the procedure by which Mplus computes factor scores. Do you think you could explain the procedure to me in a not too mathematical way? Also, when I compute quintiles of these factor scores I get 2 people in the first quintile, 4 in the second quintile, 30 in the 3rd quintile, 120 in the 4th quintile and 94 in the 5th quintile. The first three groups have very few people, so does it make sense to combine the first 3 groups into one and do the regression with 3 groups instead of 5? I shall be most grateful for your help. Joanna
Thanks, Well I have computed the asset index by CFA for the family level data and then did the logit regression (with cluster option) with child level variable as dependent variable and both child level and family level variables (including asset index) as explanatory variables. My supervisor wants me to use quintiles of factor score rather than just scores as computed by CFA. Hence I computed factor score in the first step and then did the second step with quintiles of factor scores. Now I am not certain if it is alright to compute quintiles of factor score as they are standardised. Do you think we could do the whole analysis in one step? Is it alright to compute quintiles of factor score? Thanks Joanna
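Quintiles by definition hold roughly equal shares of the cases, so group sizes as uneven as those reported would usually come from equal-width bins on the score range or from heavy ties, not from rank-based cut points. The sketch below contrasts the two on made-up, skewed scores. The data and function names are hypothetical, and this says nothing about whether categorizing the scores is advisable.

```python
from bisect import bisect_right
from collections import Counter
from statistics import quantiles


def quintile_groups(scores):
    """Assign groups 1-5 using rank-based cut points (20th to 80th percentile)."""
    cuts = quantiles(scores, n=5)  # four cut points
    return [bisect_right(cuts, s) + 1 for s in scores]


def equal_width_groups(scores, n_bins=5):
    """Assign groups 1-5 by slicing the score range into equal-width bins."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        raise ValueError("all scores are identical; no bins can be formed")
    width = (hi - lo) / n_bins
    return [min(int((s - lo) / width), n_bins - 1) + 1 for s in scores]


# Hypothetical, negatively skewed factor scores for 100 cases.
scores = [-((100 - i) ** 2) / 2500.0 for i in range(100)]

by_rank = Counter(quintile_groups(scores))
by_width = Counter(equal_width_groups(scores))
```

With distinct scores, `by_rank` puts the same number of cases in every group, while `by_width` does not when the distribution is skewed.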
Hello, I am trying to get factor scores for a Latent Difference Score Model (McArdle). Unfortunately Mplus doesn’t give me any output when using the SAVE IS FSCORES; line. I have already checked the path (savedata)as well as the data set. However, I get an output with all the estimates (and no error messages) without the FSCORES command. Any comments how to solve this problem would be highly appreciated.
Yes, I use the FILE statement and the values of my observed variables are saved in this file, but after adding the FSCORE statement I don't even get an output (with an error message, i.e. that the factor scores were not saved).
I am using CFA model where i specified a single factor (F1) based on seven continuous manifest variables. After requesting the factor scores using the option SAVE=FS I observed that the mean scores are zero. How can i calculate the factor scores for each observation such that the mean is not zero but the actual estimated mean of F1? Thanks
Typically, the mean parameter for the factor is fixed at (standardized to) zero, unless you have multiple groups. So there isn't a non-zero estimated factor mean, and the estimated factor scores having mean zero is then desirable.
Thanks Bengt for your response. Actually in my case I do want to estimate the actual mean of the factor F1, and incorporate this into the factor scores for each observation. I have a second factor (F2), with similar variables at time 2, and analogously I have the corresponding factor scores. Since both sets of scores have a mean of zero, I cannot compare them. My goal is to compare the mean scores from F1 and F2. I was thinking of adding the estimated intercepts from F1 to the scores. Does that make sense? Any other suggestions?
If you impose measurement invariance across time for items that are the same at the two time points you can identify a factor mean difference for the two time points. You can fix the factor mean at zero for the first time point and free it and let it be estimated for the second time point. The estimated factor scores for the second time point will then take this non-zero estimated factor mean into account (this occurs automatically in the prior of the posterior computations for the estimated factor scores).
Do the factor scores extracted from a CFA in Mplus have the same desirable properties as an IRT score? Would the factor score for each individual be equal to the IRT score for each individual (say from a graded response IRT model)? If not, is there a consensus on which type of score is superior? Thanks.
Use the FSCORES option of the SAVEDATA command. See the user's guide for further information.
nina chien posted on Wednesday, November 04, 2009 - 10:22 am
I saved out factor scores for 2 factors, closeness and stress. The factor scores for closeness range from -2.51 to 1.08 (distribution is somewhat negatively skewed), and for stress from -.71 to 2.86 (distribution is very positively skewed). But the original items are on a scale from 1 to 5. Did Mplus automatically center the factor scores? I see that each factor has a mean of 0.00. Thank you for your help.
I wanted to learn more about the factor scores created from a CFA in MPLUS. Here are a few questions. Thanks!
(1) Would it be fair to say that they contain "no error" the way we think of it when we model everything in an SEM framework?
(2) How do we know they are generally better than using a regression method or summing/taking means of items to create a composite construct/score? Is there something that can be cited in general and perhaps in particular with regard to the (relatively better) properties of these scores.
(3) Would you still refer to the construct represented by these scores as "latent?"
(4) I have 2 correlated constructs for which I am generating the scores from the CFA, one is a 3-item and the other is a 4-item. As usual I constrain the variance of each factor to 1.0 and freely estimate each loading. Am I wrong that this would essentially standardize the factors? I am getting a mean of (essentially) zero, but a SD of around .91 no matter what I do. Is there any problem with standardizing them?
I have read Appendix 11, but I was hoping to learn more about their general properties.
I use Mplus very regularly and love it. This discussion board is a tremendous resource. Thank you very much.
It has been established that estimated factor scores do not behave like factors. See for instance
Skrondal, A. and Laake, P. (2001). Regression among factor scores. Psychometrika 66, 563-575.
This shows the distortion in the means, variances, and relations with other variables when using estimated factor scores. Especially with a small number of items I would recommend instead using SEM, which also makes it possible to test that the item sets are unidimensional.
I would prefer to use SEM but, among other things, there are cross-classified random effects in my models, which I am able to deal with in the "mixed model" framework. I used CFA to test (multi)dimensionality and to examine measurement invariance across 2 groups. What is the best way to get a factor score or an "observed" score for my constructs if I can't/don't use SEM? Is there a better way than the method used in Mplus? I looked over the article - both of my variables are explanatory and one is a DV in one case. I am not sure I have the resources to carry out the method described there - is it the best way? Perhaps two separate IRT unidimensional GRM models (but I was under the impression that the CFA method in Mplus gives the same scores)? Thanks.
I am not sure if your items are continuous or categorical. For cont. items, Mplus uses the regression method of factor score estimation, which is equivalent to the Maximum A Posteriori (MAP) method of IRT. For cat. items Mplus uses EAP (expected...) which is standard in IRT. With only 3 and 4 items you won't get good factor score estimates - IRT typically works with many more items per factor, say 20 items or more. With cont. items the problem shows up in terms of a low factor determinacy and with cat. items it shows up in terms of poor information functions. I am not aware of literature comparing summed scores to factor scores with a small number of items, but it probably exists, although see my dissertation paper:
Muthén, B. (1977). Some results on using summed raw scores and factor scores from dichotomous items in the estimation of structural equation models. Unpublished Technical Report, University of Uppsala, Sweden.
Thank you much for the valuable input. Yes, that is the dilemma (CCREs vs. Measurement Error - not the first time or likely not the last). My scores are Likert (5pt). Most people who I have asked think that ignoring the CCREs is a more problematic offense.
CEKIC Sezen posted on Saturday, February 27, 2010 - 1:45 am
Hello, my problem is the following: I have to complete analyses already done, i.e. calculate factor scores from a CFA which has been carried out with TYPE=COMPLEX and CLUSTER=SUBJECT. My first question is the following: the number of observations used in the estimation of the model is 1936, although my initial database was composed of 1556 observations.
1. What are the criteria Mplus uses for eliminating observations?
Then I tried to obtain factor scores for the analyses already done:
2. Is it possible to obtain factor scores directly from a CFA with TYPE=COMPLEX and CLUSTER=SUBJECT? As I could not do it, I redid the analysis (keeping the same model as before) as a CFA with TYPE=GENERAL and IDVARIABLE ARE SUBJECT, and computed the factor scores from this last analysis. Unfortunately, the parameters, their standard errors and the fit indices estimated by this last analysis don't correspond exactly to the first analysis performed with TYPE=COMPLEX and CLUSTER=SUBJECT.
3. If it is not possible to obtain factor scores with an analysis using TYPE=COMPLEX and CLUSTER=SUBJECT, can the factor scores obtained from the analysis with TYPE=GENERAL and IDVARIABLE ARE SUBJECT be interpreted in the framework of the first analysis, even if the estimates of the two models are not exactly identical?
I hope that my questions are clear and that you can help me. Cordially
Mplus uses ML under "MAR" which is sometimes called "FIML" and means that all subjects who have data on any of the analysis variables are used in the analysis. So perhaps your 1556 observations are the listwise present group, while 1936 is what ML under MAR uses.
You can get factor scores directly in a Type=complex, Cluster=subject analysis.
If you still have problems, please send your input, output, data, and license number to firstname.lastname@example.org.
My question has to do with latent variable means in a multi-group CFA. Given that the default is for the latent variable means to be set to zero in the first group and freely estimated in the second group, does that mean that I can't compare the means across groups using a chi-square diff test? Basically, I want to conduct analyses in the context of the multiple group model to test whether or not specific latent variable means are significantly different across the two groups. How do I do this and have an identified model?
So, the unconstrained model has the factor means set to zero in only the first group (freely estimated in the second group), and the constrained model sets the factor means for both groups to zero. Is that correct? And the chi-square diff between those two models tests equality of means across groups?
You don't need estimated factor means for Group 1. It is only the difference in factor means that is identified and meaningful to discuss. And that difference is captured by the group 2 factor means.
Note that this does not imply that every person's factor value is zero in group 1.
Daiwon Lee posted on Sunday, May 23, 2010 - 4:28 pm
My advisor wants me to run a model where I saved the factor scores and then merge the factor scores into a new data set. I think I know how to save factor scores, but I don't know how to merge with original data set to use them in the analysis. Please help me. Many thanks in advance.
See the new merge options in the SAVEDATA command for Version 6. The most recent user's guide is on the website.
Daiwon Lee posted on Monday, May 24, 2010 - 6:28 am
Thank you for the note. However, could you please tell me how to merge saved factor scores with the original data in version 5.21? I also tried to save the factor scores in "dta" format to merge with the original data in Stata, but Stata failed to read the factor score file saved by Mplus.
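One possible workaround outside Mplus, for versions without the merge options, is to save an ID with the factor scores (IDVARIABLE in the VARIABLE command) and join the two text files on that ID. The sketch below is in Python and makes several assumptions: both files are whitespace-delimited text with no header row, the column names and their order (here item1 item2 item3 f1 id for the saved file) are hypothetical and must be taken from the listing in your own output, and all data values are invented for illustration.

```python
def read_free_format(text, names):
    """Parse whitespace-delimited records without a header row into dicts."""
    records = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != len(names):
            raise ValueError("expected %d fields, got %d" % (len(names), len(fields)))
        records.append(dict(zip(names, fields)))
    return records


def id_key(value):
    """Normalize IDs so that '101' and '101.000' match."""
    return int(float(value))


def merge_on_id(original, scores, id_name, keep, missing="*"):
    """Left-join the score columns in `keep` onto the original records by ID."""
    by_id = {id_key(rec[id_name]): rec for rec in scores}
    merged = []
    for rec in original:
        extra = by_id.get(id_key(rec[id_name]), {})
        row = dict(rec)
        for name in keep:
            row[name] = extra.get(name, missing)
        merged.append(row)
    return merged


# Hypothetical file contents (in practice, read these with open(...).read()).
original_text = "101 3 4 2 1\n102 5 5 4 2\n103 1 2 1 1\n"
saved_text = " 3.000 4.000 2.000  0.412 101\n 1.000 2.000 1.000 -1.105 103\n"

original = read_free_format(original_text, ["id", "item1", "item2", "item3", "gender"])
scores = read_free_format(saved_text, ["item1", "item2", "item3", "f1", "id"])
merged = merge_on_id(original, scores, "id", keep=["f1"])

for row in merged:
    print(row["id"], row["gender"], row["f1"])
```

Cases that are in the original data but not in the saved file get the missing flag "*", which is a choice made for this sketch.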
I encountered some strange results when saving factor scores for a multi-group CFA.
I have a very simple model, with 2 groups and 3 items loading (strongly; >.70) on one latent variable. Factor scores are saved using the SAVEDATA option. In the newly created dataset, however, I find near-zero correlations (<.10) between the obtained factor scores and the original items for the second group. For the first group, correlations between items and factor scores are as expected.
I checked beforehand, and factor loadings are about equally strong in both groups. The same strange pattern is found irrespective of the order of the groups, whether I impose or release equality constraints on loadings and/or intercepts, use different items...
Am I doing something wrong? This is the syntax I use (for a model with equality constraints on factor loadings and intercepts). FYI: I am still using an older version of Mplus (version 4), maybe this has something to do with it...
-----------------------------
variable: names are item1 item2 item3 country;
usevariables are item1 item2 item3;
grouping is country (1=BE 7=DK);
model: Y by item1 item2 item3;
output: standardized; modindices;
savedata: file is testout.sav; save = fscores;
----------------------------------
Thanks! Another question: what are the first few columns in the saved factor score data set for? For example, I had observed ordinal variables x1 to x4 (values 1, 2, 3, 4, and 5), and the factor score data set includes 4 columns called "x1" to "x4" in front of the id and the factor score. Comparing these columns to the original observed data, it appears that x1 in the factor score data equals the original observed x1 minus 1.
Mplus requires the data to have the lowest value of zero, so the data are automatically recoded to 0, 1, 2, 3, and 4. The recoded data are saved. See the CATEGORICAL option in the user's guide for more information about this recoding.
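If you need the saved columns back on the original 1-5 scale, the recoding can be mimicked and undone outside Mplus. This is a small Python sketch, assuming the recoding maps the sorted distinct categories to consecutive integers starting at zero (check the CATEGORICAL option in the user's guide for the exact rule); the function name and the data are hypothetical.

```python
def recode_from_zero(values):
    """Map sorted distinct categories to 0, 1, 2, ...; return recoded data and the mapping."""
    levels = sorted(set(values))
    mapping = {old: new for new, old in enumerate(levels)}
    return [mapping[v] for v in values], mapping


x1_original = [1, 2, 3, 4, 5, 3]  # hypothetical responses on the 1-5 scale
x1_saved, mapping = recode_from_zero(x1_original)

# Invert the mapping to get from saved values back to the original categories.
inverse = {new: old for old, new in mapping.items()}
x1_restored = [inverse[v] for v in x1_saved]

print(x1_saved)
print(x1_restored)
```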
Dallas posted on Saturday, July 31, 2010 - 8:53 am
Good morning. I have a question about the comment you make above in replying to James (James L. Lewis posted on Monday, September 28, 2009 - 6:27 pm).
James asks two questions, it seems to me: 1) Do factor scores have the same properties as IRT scores? And 2) are IRT scores and factor scores the same?
You indicate yes to both. For properties, this makes sense (assuming, of course, a factor model that corresponds to an IRT model).
However, it doesn't seem true that factor scores EQUAL IRT scores. Correct? In other words, if we used EAP scoring and the loadings, etc. from the factor model, we'd get one set of factor scores. If we then converted the factor model parameters into IRT parameters, and again used EAP scoring to get IRT scores, it doesn't seem to me the factor scores would EQUAL the IRT scores. They would have similar measurement properties, but it doesn't seem like they'd be identical scores (in value).
If I am right, can you or Bengt provide a formula to convert factor scores into IRT scores, like formulas do to convert the parameters?
IRT often refers to the 2PL model estimated under maximum likelihood. The 2PL model is the logistic model in Mplus for binary items and using a single factor where you free the factor loadings and fix the factor variance at one. With 2PL and ML estimation, Mplus gets the same loglikelihood as IRT programs. And it computes factor scores using the same EAP (expected a posteriori) method. So this is the same as in IRT and the same factor score values should be expected.
ML probit is also possible, and again EAP is used.
WLSMV probit is also possible, in which case MAP is used.
Dallas posted on Saturday, August 07, 2010 - 5:51 am
Dr. Muthen. Thanks for your reply. You replied so quickly it took me a few days to notice! Yes, I was thinking about using probit and ML, and also thinking about WLSMV probit. In those cases, it seems one would have scores with similar measurement properties (e.g., one would still inherit the properties of IRT models), but not identical scores with respect to the "traditionally" estimated IRT model. It seems, though, that a general formula for converting scores from the probit metric to the logistic metric does not exist?
And, thanks for the nudge regarding logistic and ML. It does make sense, and I do agree, that with logistic and ML (and an appropriately identified model), one would achieve the same results as the IRT model.
Hello, I have a question about obtaining factor scores. The way I understand it Mplus factor scores from a CFA with Likert items and the WLSMV estimator will give me factor scores that are equivalent to Graded Response Model (GRM) IRT scores (EAP estimation [with a normal prior I presume?]).
1. Is this correct?
My other question is with regard to local independence and unidimensionality.
2. I have 15 Likert items (5 point) with which I am trying to measure a single construct. It is pretty clear to me, however, that there are local dependencies and maybe ultimately >1 factor among these items. If I appropriately specify a bi-factor model or otherwise appropriately estimate correlations among the residuals however (to get rid of or "account for" the local dependencies), will my estimated factor scores for the GENERAL factor essentially be equal to GRM person scores (theta) from a model where the unidimensionality and local independence assumptions are satisfied?
If I could I would just stay within the latent variable framework, but for my application here I really need the factor scores.
I hope this makes sense. Any citations would be tremendous also. I love Mplus. Thanks.
I apologize, I cannot locate Technical Appendix Version 2 on the website, only Version 3. It looks like Appendix 11 in the latter only covers estimation of factor scores for continuous and binary data, not ordered categorical(?).
MAEP and EAP scoring of course require the specification of a prior distribution. I am a bit confused in that if I specify a normal prior (for either method), should I not expect that the resulting distribution of Theta-Hat will be normal or close to it?-- particularly if the population distribution of Theta is indeed normal? What if the population distribution of Theta is not normal? I am getting conflicting reports on all this. Can you perhaps clarify. Thanks much.
The posterior distribution can be quite non-normal even with a normal prior. For example, if the items are too difficult or too easy we can't discriminate between people who are high/low and the posterior (theta-hat) will be skewed.
ywang posted on Friday, February 11, 2011 - 8:17 am
Dear Drs. Muthen,
Does an IRT model usually stand alone? I included IRT in the SEM, but cannot find the model fit criteria for the SEM. What model fit criteria can we get for the SEM with IRT? Also, can you refer me to any paper that describes the model of SEM with IRT? Can it be described in a similar way to the CFA model in SEM, except that the indicator variables are categorical?
It sounds like you are using maximum likelihood estimation and categorical outcomes. You will not receive chi-square and related fit statistics in this case. If you use weighted least squares estimation, you will. IRT is CFA with categorical outcomes. There are many IRT books. We have some papers on our website under Papers/IRT.
xstudylab posted on Wednesday, February 23, 2011 - 10:09 am
I just switched to Version 6.1 and now I can't get factor scores for my model... I get the error message 'FACTOR SCORES CAN NOT BE COMPUTED FOR THIS MODEL DUE TO A REGRESSION ON A DEPENDENT VARIABLE' but indicators and factors are only regressed on exogenous variables.
I ran the same input using Version 5 and it gave me the same estimates as Version 6.1, but it produced factor scores without an error message.
Is there something different about how Version 6.1 produces factor scores?
I'm interested in buying Mplus Base because of its ability to handle binary and ordinal variables in CFA. I do have a question regarding the factor scores that I hope you can address. The scores are continuous, yet I need to convert them to either binary or ordinal, i.e. the same scales as the original variables used in the CFA. Does Mplus do this, or is there a way to program this conversion (does Mplus, in fact, have programming capability)? Any papers on converting? Technical Appendices? Notes?
Mplus does not do this. There would need to be further information in order to define such a conversion. Mplus does not have a programming capability. There is an IRT literature on conversions such as that related to NAEP with writings by Mislevy and others.
ywang posted on Wednesday, March 02, 2011 - 10:59 am
I am working on a SEM with a latent factor by IRT (3 dummy indicator variables) as an independent variable. I was asked by other researchers for a description of the latent construct. They believe that the latent variable must have some sort of values and would like to describe the latent construct in terms of its range and distribution.
It seems that the latent variable does not have a metric and cannot be described in the same way as an indicator variable. I am wondering whether it is appropriate to describe the latent construct using factor scores instead. However, I have some concerns since (1) estimated factor scores differ between the stand-alone IRT model and the SEM model, and (2) a factor score is not exactly the latent construct and still has measurement error.
Do you have any suggestions on how to describe the latent factor in the SEM?
In a cross-sectional model, a factor has a mean of zero and an estimated variance.
ywang posted on Wednesday, March 02, 2011 - 12:32 pm
Thank you very much for the reply. I have a follow-up question. In the stand-alone IRT model, I got a factor variance of 0.122. However, when the IRT was included in the SEM, the factor variance changed to 0.202. Which variance should I report? Is this inconsistency due to the fact that the SEM does not fit the data well (CFI 0.873, TLI 0.762)?
If the factor is an independent variable in the SEM then the estimated variances should be close within their SEs. If not, as you say, the SEM may be ill-fitting.
I would not use estimated factor scores here given that you have only 3 indicators. The factor metric of the SEM is clear: your model postulates a normal variable with a mean of zero and a certain variance (or you can fix the variance at 1 to get a z score, and then free the first loading).
ywang posted on Wednesday, March 02, 2011 - 2:07 pm
Thanks a lot. Your reply greatly helped me. I have another question for factors by subgroups such as gender. If I have to list the mean and variance of factor score for males, for females and for all the sample in one table as well as the p value of difference of factor score between males and females, what should I do?
From your previous discussion with other researchers, I understand that multi-group analyses should be used to compare whether the factor mean differs between males and females. In the model you previously mentioned, the mean of the factor in one group (e.g. males) is fixed at 0 and the mean is freely estimated in the other group (e.g. females). For that table, I need to list the mean and variance for males, females, and the overall sample. How can I free the means for both groups in the multi-group analyses? Thanks!
In multiple group analysis, a test of factor mean differences is a difference test between a model with factor mean zero in one group and free in the other groups versus a model with factor mean zero in all groups. Please see Slide 223 of the Topic 1 course handout.
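For readers who want the p value of that difference test, here is a minimal Python sketch using only the standard library. It assumes ordinary ML chi-square values, where the difference of the two chi-squares is itself chi-square distributed; with estimators such as WLSMV or MLR the plain difference is not valid and the DIFFTEST option or a scaling correction is needed instead. The chi-square values in the example are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    if df < 1 or x < 0:
        raise ValueError("need df >= 1 and x >= 0")
    y = x / 2.0
    # Start from Q(1, y) for even df or Q(1/2, y) for odd df, then use the
    # recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def difference_test(chi2_restricted, df_restricted, chi2_free, df_free):
    """Chi-square difference test of a restricted model against a freer one."""
    diff = chi2_restricted - chi2_free
    ddf = df_restricted - df_free
    return diff, ddf, chi2_sf(diff, ddf)


# Hypothetical fit results: factor mean zero in all groups (restricted)
# versus factor mean zero in group 1 and free in group 2 (free).
diff, ddf, p = difference_test(58.3, 27, 49.1, 26)
print("chi-square difference = %.2f, df = %d, p = %.4f" % (diff, ddf, p))
```

With one factor and two groups the difference has one degree of freedom, because only the group 2 factor mean is constrained.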
Dear Dr. Muthen, I am running a CFA with categorical indicators, and I requested Mplus to compute the factor scores. The model fits well, except that Mplus cannot compute the factor scores. This is the message:
THE MODEL ESTIMATION TERMINATED NORMALLY
FACTOR SCORES CAN NOT BE COMPUTED FOR THIS MODEL DUE TO A REGRESSION ON A DEPENDENT VARIABLE.
Is there a way to overcome this and get the factor scores? Thank you, Fernando
I think you will find that factor scores are saved in spite of this message. Check that. There is an incorrect error check in Version 6.1 that produces this message but still gives valid factor scores.
Reading over this post stream, I understand that SEM is generally preferable to using factor scores. I have a related question.
When use of all items, or parcels, is prohibited because of sample size limitation, are there any advantages to using factor scores in lieu of total scores for first order factors in an SEM model? In other words, I was thinking of using factor scores to represent first order manifest scales within latent factors in an SEM model. I was hoping to reduce some second-order factors to first order factors in this way. Are there any advantages of accuracy and reduced error when using factor scores in this way, as opposed to using total scores?
Hi. I am working on obtaining factor scores. The problem that I have run into is that when I look at the output data set, the values of my weight variable are not the same as the original weights that I input. The cluster and strata values were not altered. Does Mplus alter the weights when it uses them?
For example the original weight variable ranged from ~4 to ~3000. Now in the output dataset it ranges from 0 to ~4.
I appreciate any thoughts.
Here is my code: WEIGHT IS W1;
STRATIFICATION IS STR;
CLUSTER IS NEWPSU;
SUBPOPULATION IS EX EQ 1;
MISSING ARE .;
IDVARIABLE IS id;
TYPE = GENERAL COMPLEX;
!MEASUREMENT MODEL
F1 BY V1 V2 V3 V4 V5 ;
F2 BY V6 V7 V8 V9 V10 ;
I tried to estimate an IRT-Model, which works, yet it does not return factor scores, but strangely enough there is also no error message. It seems like this or some similar problem was addressed in the forum before, at least there is a thread from Goran Milacevic posted on Saturday, July 12, 2008 - 1:20 am which sounds similar. Can you reconstruct what the solution was back then?
Thank you very much in advance for your advice! Michael
Loadings and therefore factor scores are different in EFA than CFA because the models are different, with different degrees of freedom. EFA presents standardized loadings because a correlation matrix is analyzed whereas CFA uses a covariance matrix so that only the standardized solution is close to the EFA.
You should watch the video of our Topic 1 course to learn more about these matters.
Kerry Lee posted on Sunday, November 20, 2011 - 8:25 pm
Dear Drs Muthen,
I am running a CFA. Depending on the indicator to which the scale of the factor is fixed, the variance of the factor does not always attain significance. Specifically, when I fixed the metric of the latent variable to the indicator with the largest loading (i.e., when the factor variance was fixed at one), it failed to attain significance (p = .076). It attains significance when its metric is fixed to one of the other two indicators. To obtain more information on the distribution of the latent variable, I generated some histograms using the PLOT2 command. When I asked to "view descriptive statistics", the variance value (.133) differed from that in the text output (.147). Would you have some suggestions on why there is a discrepancy? My second question relates to the kurtosis value. Has it been rescaled, with zero denoting no departure from normality?
Kerry Lee posted on Sunday, November 20, 2011 - 8:34 pm
Regarding my previous post, the histograms were generated using PLOT3, not 2.
I think what you are seeing is the difference between the estimated factor variance parameter and the variance of the estimated factor scores. They are not expected to be the same. Yes, Kurtosis zero is no such departure.
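A small worked example may help show why the two numbers differ. The Python sketch below assumes a one-factor model with continuous indicators and regression-method factor scores, which may not match the model in question; in that case the variance of the estimated scores is psi * (psi*s) / (1 + psi*s), with s the sum of squared loadings over residual variances, so it is always smaller than the factor variance parameter psi. All parameter values are hypothetical.

```python
def regression_score_variance(psi, loadings, residual_variances):
    """Model-implied variance of regression-method factor scores (one factor)."""
    # s collects the information the indicators carry about the factor.
    s = sum(l * l / t for l, t in zip(loadings, residual_variances))
    shrinkage = (psi * s) / (1.0 + psi * s)  # always between 0 and 1
    return psi * shrinkage


psi = 0.147                      # hypothetical factor variance parameter
loadings = [1.0, 0.9, 1.2]       # hypothetical unstandardized loadings
residuals = [0.05, 0.06, 0.08]   # hypothetical residual variances

score_var = regression_score_variance(psi, loadings, residuals)
print("factor variance parameter: %.3f" % psi)
print("variance of estimated factor scores: %.3f" % score_var)
```

The gap between the two shrinks as the indicators become more reliable, that is, as s grows.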
Kerry Lee posted on Sunday, November 20, 2011 - 8:54 pm
Thanks very much for the quick reply. If I want to say something about whether there is a significant amount of variance in the latent factor, should I report the value from the text output (I assume this is the estimated factor variance parameter)?
Yes, use the estimated factor variance parameter and its SE in the printed output.
nanda mooij posted on Wednesday, December 07, 2011 - 7:48 am
Dear Drs. Muthen,
I have a model with first, second and third order factors, and I want to estimate the factor scores of the first and second order factors with the calculated item parameters of the third order factors. Now I am wondering if I could get these factor scores through putting the item parameters of the third order factors in the input. The items have 3 categories, so are polytomous and are non-ordered. I saw the appendix about the estimation of factor scores, but that's only about dichotomous or continuous y-variables... So can I put item parameters in the input and if not, where can I find an appendix about estimating factor scores of categorical, non-ordered, y-variables?
You can obtain factor scores for all of the factors using the FSCORES option of the SAVEDATA command. See Technical Appendix 11 on the website.
nanda mooij posted on Thursday, December 08, 2011 - 9:21 am
Dear Drs. Muthen,
I know that I can obtain the factor scores for all the factors at once, but I actually want to obtain the factor scores of the total scale using the item parameters of the subscales (so I first estimate the item parameters of the subscales, one subscale at a time, and I want to use these item parameters to estimate the factor scores of the total scale). So what I actually want to do is what is done in MULTILOG, where I must provide a file containing the item parameters I want to use. I'm wondering whether it is also possible in Mplus to put a reference to the item parameters in the input. I hope I'm explaining myself better now.
We are running a CFA with categorical indicators in which some loadings and thresholds are fixed (and others are not). We want to obtain factor scores. Mplus tells us that "FACTOR SCORES CAN NOT BE COMPUTED FOR THIS MODEL DUE TO A REGRESSION ON A DEPENDENT VARIABLE". What does this mean? We have no regressions in the model, or other path structure, apart from a CFA.
We want to obtain factor scores from a CFA of categorical indicators to use in a subsequent path analysis because the full model, including the measurement model, would be too large for our sample size. We want to use a single-indicator SEM approach for the path analysis, however, so we want to estimate the reliabilities of the factor scores. We have found a very sensible formula on the internet in which we would divide the factor variance by the sum of the factor variance plus the factor score variance (based on the notion that the factor score variance estimates the error variance). That post on the web indicated that Mplus could give us output corresponding to the within-subject factor score variance (which is constant across subjects). Is this correct? If so, how do we get it?
With categorical outcomes, the standard errors vary as a function of the factor values. You need to take those values from the plot of the information function you get using PLOT2 in the PLOT command. The standard errors are computed as 1/square root of the information function value.
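Once information values have been read off the plot, the conversion described above is easy to script. The Python sketch below uses hypothetical (factor value, information) pairs. The conditional reliability line applies the formula from the question (factor variance over factor variance plus error variance) separately at each factor value, which is an assumption of this sketch and not an Mplus output.

```python
import math


def se_from_information(information):
    """Standard error of a factor score: 1 / sqrt(information)."""
    if information <= 0:
        raise ValueError("information must be positive")
    return 1.0 / math.sqrt(information)


def conditional_reliability(factor_variance, information):
    """Factor variance / (factor variance + squared SE) at one factor value."""
    error_variance = se_from_information(information) ** 2
    return factor_variance / (factor_variance + error_variance)


# Hypothetical (factor value, information) pairs read off an information plot.
information_curve = [(-2.0, 1.5), (-1.0, 3.2), (0.0, 4.0), (1.0, 2.8), (2.0, 1.1)]
factor_variance = 1.0  # hypothetical; use the estimate from your own output

for theta, info in information_curve:
    se = se_from_information(info)
    rel = conditional_reliability(factor_variance, info)
    print("theta = %5.1f  SE = %.3f  reliability = %.3f" % (theta, se, rel))
```

Because the information changes with the factor value, so do the standard error and the reliability; a single reliability figure would require averaging over the factor distribution.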
Thanks for the info about getting the information functions.
Unfortunately, we neglected to mention in our earlier posts that we are trying to obtain the within-subject factor score variance for a group factor in a hierarchical model (e.g., one in which items load upon both group factors and a general factor).
We are able to use the PLOT2 option to get Mplus to give us the information functions for both a unidimensional model of the items comprising one group factor and a unidimensional model of all items.
However, is there any way to get Mplus to give us information functions for each factor in a multi-dimensional model in which items are categorical?
We are hoping to use this information to calculate the reliability of factor scores generated from our model.
Thanks Linda. A follow-up on creating IRT scores. I have, for example, 4 waves of data, and for each wave I create the IRT score from the saved factor scores. I would like to use these four waves' saved scores to fit an LGC model. How do I do this in Mplus? Thank you very much
You can save the factor scores using the SAVEDATA command and then use the saved data to estimate the LGC model. Unless factor determinacy is one, the factor scores are not the same as using the factors in the model. I would suggest a multiple indicator growth model instead.
Thanks Linda, perhaps I did not state my question clearly. I know how to save the factor scores. How do I combine the saved scores for each wave? Or should I run the CFA for all waves at once and save the scores, so that I will have only one file?
If you want to use the factor scores from all of the waves in a single analysis, you need to save all of the factor scores in one file. This can be accomplished by running all four waves together and saving the factor scores. You would want to do this as a first step to determine measurement invariance across time.
I need to calculate the standard errors of the factor scores for the CFA model with categorical outcomes. From your explanation above, it looks like in categorical CFA, std. errors can not be computed along with the factor scores in a single step but need to be further derived from the plot of the information function. This is where I am getting lost.
Could you please describe the sequence of steps one needs to follow in order to produce the std. errors for the latent factor scores for each respondent (or refer me to an example). I also wonder if there would be a way to output these scores in a txt or dat format in order to append to the vector of the factor scores?