I would like to estimate a CFA model with categorical variables that includes one general factor (all ten items loading on this one) and two specific factors (items 1-5 loading on the first one and items 6-10 loading on the second one). How should I specify this sort of general-factor, specific-factor model?
bmuthen posted on Monday, October 14, 2002 - 2:43 pm
Thanks. This is how I thought it should be. However, I keep running into convergence problems. What might be the problem here? I have two data sets, both with dichotomous variables, but the outcome is the same for both of them: no convergence.
bmuthen posted on Tuesday, October 15, 2002 - 12:39 pm
That can sometimes be a sign that the factor variance for the specific factor is almost zero (so there is no specific factor). Sometimes the modeling is helped by having some items that are influenced only by the general factor (in cognitive applications, these are for example reasoning items). But perhaps you want to send your input and data to email@example.com for a detailed diagnosis.
The low factor variance for specific factors was indeed the reason for non-convergence.
Now that I finally have found the best fitting measurement model, I would like to introduce some predictor variables into the model. This I would do by specifying a MIMIC model with 3 exogenous variables (x1-x3) influencing the endogenous (latent) variables identified in the above analysis. However, I would also like to test the influence of one interaction term (x2x3), but I'm not sure how to evaluate the extent of its effect. I'm using the WLSMV estimator so I can't use the chi-square difference test to compare the main effect model with the one including the interaction term. Should I just compare the changes in other fit indexes (e.g., CFI and RMSEA) or could (should) changes in latent factor R-squares be used here?
bmuthen posted on Monday, October 21, 2002 - 12:17 am
You want to see if this interaction variable (1) has significant effects on the factors and (2) doesn't worsen the overall model fit. (1) may be the case even if overall model fit is worsened.
I wouldn't handle the interaction effect differently from the other x variables. I would just include all these x variables and see if the model fit is adequate. If it is not, I would include direct effects from x's to y's. Once the model fit is adequate I would simply see if the interaction is significant by its t value.
Ok. I intended to conduct the two-step approach mainly for interpretative reasons (since main effects become conditional when included with their product term). In my case the model with main effects had the following fit (with WLSMV, that is): chi-square(145)=206.889, p=.0006, CFI=.945, TLI=.956, RMSEA=.036, WRMR=.980, while for the model with both main effects and the interaction term, the fit was: chi-square(149)=216.961, p=.0002, CFI=.938, TLI=.950, RMSEA=.037, WRMR=.993. So, if I follow your advice correctly, the few significant effects for the interaction term should be explicitly considered, even though the fit for that model was slightly worse?
A couple of additional questions:
- In my preliminary analyses, one of the y's seemed to flag for DIF. In the MIMIC framework, could this bias be approached by including a direct path from the relevant grouping variable to the y?
- As I mentioned, a few significant effects for the product term emerged. How could I use the output to create a plot illustrating the interaction?
- Do you have a reference available on WLSMV?
Thank you very much!
bmuthen posted on Monday, October 21, 2002 - 2:37 pm
You can certainly report the fit of both models as you do above - this gives the reader the impression that there is not an important worsening of fit, in a descriptive sense, when adding the interactions.
Yes, add a direct path to that y.
You can do an ANOVA-like mean plot to show the interaction. The means now refer to the latent variables. So for example, with an interaction between 2 binary x variables, you need to get the latent variable means at the 4 different combinations of those x values, perhaps evaluated at the means of the other x's. The mean of a factor f is
E(f) = alpha + gamma1*E(x1) + ...+ gammaq*E(xq)
where E(x) is estimated by the sample mean.
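A minimal sketch of how the four cell means for such a plot could be computed, assuming two binary covariates x1 and x2 coded 0/1, their product term, and a centered continuous x3 held at its sample mean. The intercept and gamma values below are hypothetical placeholders, not estimates from this thread; in practice they would be read from the model output.

```python
# Hypothetical MIMIC estimates for one factor f (illustrative values only,
# not taken from any model in this thread).
alpha = 0.0                      # factor intercept, assumed fixed at 0
gamma = {"x1": 0.40, "x2": -0.25, "x1x2": 0.30, "x3": 0.15}
mean_x3 = 0.0                    # sample mean of the centered covariate x3


def latent_mean(x1, x2):
    """E(f | x1, x2) with x3 held at its sample mean."""
    return (alpha
            + gamma["x1"] * x1
            + gamma["x2"] * x2
            + gamma["x1x2"] * x1 * x2
            + gamma["x3"] * mean_x3)


# One latent mean per combination of the two binary covariates.
cell_means = {(x1, x2): latent_mean(x1, x2) for x1 in (0, 1) for x2 in (0, 1)}

for (x1, x2), m in sorted(cell_means.items()):
    print(f"x1={x1} x2={x2}  E(f)={m:.2f}")
```

Plotting these four values with x1 on the horizontal axis and one line per level of x2 gives the usual interaction plot; non-parallel lines reflect the product-term coefficient.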
A WLSMV reference is
Muthén, B., du Toit, S.H.C., & Spisic, D. (1997). Robust inference using weighted least squares and quadratic estimating equations in latent variable modeling with categorical and continuous outcomes. Accepted for publication in Psychometrika. (#75)
Still another question concerning the above discussion: Of the exogenous variables included in the model two are dichotomous (grouping variables x1 and x2) and one is continuous (x3). The interaction terms are x1x2, x1x3 and x2x3. Thus, one of the interactions is based on categorical variables. Now, what coding should I use for x1 and x2? x3 will be centered, but should I use contrasts (e.g., -1 and 1) or simple dummy coding (e.g., 0 and 1) for the dichotomous variables? The coding naturally influences the main effect coefficients in the full model, but it also seems to influence the chi-square/df values (and thus fit indices), which is more troublesome.
bmuthen posted on Thursday, October 24, 2002 - 12:40 pm
My own choice is to use dummy coding for easy interpretation. The choice should not influence the WLS chi-square but the WLSMV might be influenced by such data features although the p value should remain about the same.
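To illustrate why the coding changes the main-effect coefficients but not the implied means, here is a small sketch for a saturated 2x2 design (x1, x2 and their product only). The four cell means are hypothetical, and the function names are made up for this illustration.

```python
# Hypothetical latent means in the four x1-by-x2 cells, keyed by the
# original 0/1 group labels (illustrative values only).
cell = {(0, 0): 0.00, (1, 0): 0.40, (0, 1): -0.25, (1, 1): 0.45}


def dummy_coefficients(m):
    """Intercept, x1, x2 and x1x2 coefficients with groups coded 0/1."""
    b0 = m[(0, 0)]
    b1 = m[(1, 0)] - m[(0, 0)]          # x1 effect when x2 = 0
    b2 = m[(0, 1)] - m[(0, 0)]          # x2 effect when x1 = 0
    b12 = m[(1, 1)] - m[(1, 0)] - m[(0, 1)] + m[(0, 0)]
    return b0, b1, b2, b12


def effect_coefficients(m):
    """Same four coefficients with the groups recoded as -1/+1."""
    b0 = sum(m.values()) / 4            # unweighted mean of the four cells
    b1 = (m[(1, 0)] + m[(1, 1)] - m[(0, 0)] - m[(0, 1)]) / 4
    b2 = (m[(0, 1)] + m[(1, 1)] - m[(0, 0)] - m[(1, 0)]) / 4
    b12 = (m[(1, 1)] - m[(1, 0)] - m[(0, 1)] + m[(0, 0)]) / 4
    return b0, b1, b2, b12


def predict(coefs, c1, c2):
    """Implied mean for coded values c1, c2 under either coding."""
    b0, b1, b2, b12 = coefs
    return b0 + b1 * c1 + b2 * c2 + b12 * c1 * c2


dummy = dummy_coefficients(cell)
effect = effect_coefficients(cell)
print("dummy  (0/1): ", [round(b, 3) for b in dummy])
print("effect (-1/1):", [round(b, 3) for b in effect])
```

With dummy coding each main effect is the group difference at the reference level of the other variable; with -1/+1 coding it is half the difference averaged over the other variable. Both sets of coefficients reproduce the same four cell means.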
Well, this certainly is a bit puzzling. Below are my results from three identical models (using WLSMV) with three different codings for the grouping variables:
(-1 and 1): baseline chi-square(129)=1215.097; model chi-square(152)=214.606, p<.0006; CFI=.942; TLI=.951; RMSEA=.035
(0 and 1): baseline chi-square(125)=1022.673; model chi-square(152)=215.294, p<.0006; CFI=.929; TLI=.942; RMSEA=.035
(1 and 2): baseline chi-square(120)=931.367; model chi-square(145)=202.741, p<.0011; CFI=.929; TLI=.941; RMSEA=.035
Since these stats do not differ when using e.g., ML, it would be nice to know what exactly are the key reasons underlying these differences. After all, they do impact the outcome quite importantly. Any help on this?
bmuthen posted on Friday, October 25, 2002 - 3:25 pm
Did you check to see if the 3 choices affected WLS? My prediction was that it might influence WLSMV. WLSMV tries to make a better chi-square approximation and is influenced by changes in the raw data (the 3 scorings change the raw data). The results are, however, approximately the same in my view. We should not expect a more precise picture than this. For example, the CFI difference between 0.929 and 0.942 should be viewed as small and not lead to different conclusions. Also, the p values are all approximately 0.001 - we should not make anything out of small differences such as 0.0006 and 0.0011.
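As a rough check on where the CFI and TLI differences come from, the values can be recomputed from the model and baseline chi-squares reported above for the three codings. This sketch assumes the standard textbook CFI and TLI formulas; that the program computes them exactly this way is an assumption, although the recomputed values match the reported ones to three decimals.

```python
def cfi(chi_m, df_m, chi_b, df_b):
    """Comparative fit index from model (m) and baseline (b) chi-squares."""
    num = max(chi_m - df_m, 0.0)
    den = max(chi_b - df_b, chi_m - df_m, 0.0)
    return 1.0 - num / den if den > 0 else 1.0


def tli(chi_m, df_m, chi_b, df_b):
    """Tucker-Lewis index from model (m) and baseline (b) chi-squares."""
    ratio_b = chi_b / df_b
    return (ratio_b - chi_m / df_m) / (ratio_b - 1.0)


# (model chi-square, model df, baseline chi-square, baseline df) as posted
runs = {
    "-1/1": (214.606, 152, 1215.097, 129),
    "0/1": (215.294, 152, 1022.673, 125),
    "1/2": (202.741, 145, 931.367, 120),
}

results = {k: (round(cfi(*v), 3), round(tli(*v), 3)) for k, v in runs.items()}
for coding, (c, t) in results.items():
    print(f"{coding}: CFI={c:.3f} TLI={t:.3f}")
```

The model chi-square minus its df is roughly 58 to 63 in all three runs, while the baseline chi-square ranges from about 931 to 1215, so most of the CFI and TLI variation traces back to the baseline model rather than to the fitted model.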
I was not able to run the analysis with WLS due to a non-positive definite matrix. In any case, I didn't mean to claim that the results were that different - just that differences occurred in important outcomes. I do agree that the conclusions drawn based on the fit stats are still the same. I just hoped I could understand better why this variation took place in order to take it into account in future work.
Mukadder posted on Friday, March 04, 2011 - 12:18 pm
In the Topic 7 videos a general factor is taken into consideration within two-level CFA. How is this general factor interpreted in conjunction with the within and between level models? In other words, what is the logic underlying the inclusion of a general factor along with other latent factors in a model? The first thing that came to my mind was, for example, that the general factor has to do with the unidimensionality of a multiple-choice test, but I couldn't be sure. I would appreciate it if you could give an extended viewpoint on the general factor.
If you are considering a general factor on the within level, a general factor represents a general construct such as test-taking ability. On the between level where cluster may be classroom, it represents a general construct such as the teacher's contribution to test-taking ability.
Kati Aus posted on Monday, October 29, 2012 - 7:08 am
Hi! We are running bifactor models where we have specified one general factor (with 6 indicators) and two specific factors (with 3 indicators each). Correlations are set to be zero among the factors. We are running into estimation problems. Is it even viable to conduct bifactor modeling with so few items and factors (as our models are underidentified unless we set certain constraints)?