Jen Bailey posted on Monday, April 19, 2004 - 4:39 pm
Dear Drs. Muthen:
I am using Version 3 and trying to run a CFA in preparation for looking at the stability of general, latent substance use within individuals across 3 time points (high school, early adulthood, and age 27). My indicators are either zero-inflated Poisson (e.g., frequency of binge drinking in the past month) or ordered categorical. At each time point, I'm trying to create a latent factor from cigarette use (categorical), binge drinking frequency (zero-inflated Poisson), and pot use frequency (zero-inflated Poisson). I have 2 questions. First, I hypothesize autocorrelation in substance-specific residuals across time (i.e., the high school marijuana use residual will be correlated with the age 27 marijuana use residual). I read in the manual, however, that residuals are not calculated for count variables. How can I specify the hypothesized autocorrelations? In my syntax below, you can see some of the ways I've tried to do this.
My second question concerns large fully standardized estimates for the means of my inflation variables. Several of the estimates are 999.00. I am wondering if this could be due to the fact that my count variables have a much larger variance than my other variables (e.g., 32 vs 1). Any other ideas as to why these values might be showing up? I've checked the data file for out of range values, unspecified "missing" values, etc., and found nothing out of order.
I would greatly appreciate any thoughts on either question. Thank you!
DATA: FILE IS C:\JENNIFER\PSUBEH3.DAT;
TYPE IS INDIVIDUAL;
FORMAT IS 40F8.2;
!HSPTFR#1 WITH SSPT27#1; - using this line gives the error message "hsptfr#1 is not observed."
![HSPTFR#1 WITH SSPT27#1]; - using this line gives the error message "unknown variable [ ."
!HSPTFR WITH SSPT27; - using this line gives the error message "interaction problem."
!SSPT27#1 ON HSPTFR#1; - using this line gives the error message "sspt27#1 is not observed."
!SSPT27 ON HSPTFR; - using this line causes a fully standardized loading of sspt27 on its factor that is greater than 1.
The estimates of 999 are most likely caused by negative residual variances of the categorical outcomes or factor correlations greater than one.
See Example 7.16 to see how to estimate a residual covariance for a categorical outcome using maximum likelihood. You define a factor that influences the two outcomes for which you want a residual covariance. You can use the same approach for a count outcome. Note that you are using numerical integration which can be computationally heavy when there are many factors.
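The approach described in this reply can be sketched roughly as follows for the two marijuana count indicators from the question. HSPTFR and SSPT27 come from the original post. The other indicator names (hscig, hsbinge, sscig27, ssbng27), the factor names (fhs, f27, rpot), and the choice of ESTIMATOR = MLR are assumptions made for illustration, and the middle time point is left out for brevity. This is a sketch under those assumptions, not the poster's actual input.

```
VARIABLE:
  ! NAMES and USEVARIABLES omitted; every indicator name other than
  ! hsptfr and sspt27 is hypothetical
  CATEGORICAL = hscig sscig27;
  COUNT = hsbinge (i) hsptfr (i) ssbng27 (i) sspt27 (i);  ! (i) = zero-inflated Poisson
ANALYSIS:
  ESTIMATOR = MLR;
  ALGORITHM = INTEGRATION;   ! each factor adds a dimension of integration
MODEL:
  ! substantive substance use factors at two of the time points
  fhs BY hscig hsbinge hsptfr;
  f27 BY sscig27 ssbng27 sspt27;
  ! rpot stands in for the residual covariance between the two
  ! marijuana counts: it influences only those two outcomes
  rpot BY hsptfr@1 sspt27;
  rpot@1;
  [rpot@0];
  ! keep rpot uncorrelated with the substantive factors
  rpot WITH fhs@0 f27@0;
```

The free loading of sspt27 on rpot carries the residual association. Each additional factor of this kind adds a dimension of numerical integration, which is the computational cost the reply warns about.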
Jen Bailey posted on Tuesday, April 20, 2004 - 10:41 am
Thanks very much. The example you pointed me to was quite helpful!
sara perry posted on Saturday, May 19, 2007 - 10:58 am
I have a couple of related questions to this one.
We want to run a measurement model in MPLUS of a latent variable with factor indicators that are all count data, in a zero-inflated negative binomial distribution. We have a few questions about this:
1) Should we consider the latent variable as a count variable as well, since all the factor indicators are counts? If so, is this considered mixture modeling, and is there a section in the manual for the code to do that?
2) Regardless, we know we need to tell MPLUS that the factor indicators are count data. We tried to do this using the code in the manual and came across an error regarding residual variance. What is the limitation of estimating residual variance in a measurement model w/ count factor indicators? Is there something else we should be considering regarding residual variance/covariance that is different from a normal measurement model?
When testing measurement invariance on a CFA with categorical indicators, the delta parametrisation allows us to work with scale factors and the theta parametrisation allows us to work with residuals (scale factors and residuals being a function of one another, both cannot be estimated simultaneously). Is that right?
Then, my question is: how can we estimate intercepts in Mplus in a CFA with categorical indicators? I read somewhere that it is possible but did not find out how. I might be wrong, but are intercepts not directly related to scale factors and residuals?
If you want to estimate intercepts, you need to put a factor behind each observed variable such that the factor is equivalent to the observed variable. There is no relationship between intercepts and scale factors and residuals.
If I understand correctly, that would give a higher-order CFA model in which each item loads on its own factor (1 item per factor) and then all of these factors load on the "higher-order" factors of interest? Or would that give a kind of bifactor model in which each item loads simultaneously on two factors? I believe the first possibility is the right one, but the "behind each observed variable" part got me confused.
Then the invariance would be at the level of the relationship between the lower- and higher-order factors (treated as continuous): baseline, loadings, intercepts, residuals, etc., without the possibility of testing the invariance of scale factors?
Then, if my original model is:
f1 BY wp1 wp2 wp7;
f2 BY wp3 wp4 wp9;
I will have to redefine it as:
f10 BY wp1@1; wp1@0;
f11 BY wp2@1; wp2@0;
f12 BY wp7@1; wp7@0;
f13 BY wp3@1; wp3@0;
f14 BY wp4@1; wp4@0;
f15 BY wp9@1; wp9@0;
f1 BY f10 f11 f12;
f2 BY f13 f14 f15;
And then, for the tests of invariance, I only work with the f10-f15 "factors", for which I can constrain or relax, as with wp1-wp9 in the preceding model: thresholds, loadings, residuals/scale factors (theta/delta), intercepts, etc. In other words, I then conduct my tests of invariance as I did before (let's suppose I did it right), but replacing wp1-wp9 with f10-f15? Or do I, for instance, constrain thresholds on wp1-wp9 and loadings/residuals/intercepts on f10-f15? And, if you have time, a follow-up question: why is the nu matrix not opened by default in Mplus? Thank you very much for taking the time to answer our questions!
Thank you again. However, this means I can't be completely right... If the nu and tau matrices cannot be identified at the same time, does it mean that opening the nu matrix closes the tau matrix and that the invariance of thresholds cannot be evaluated in such a model? Or is it still possible to evaluate threshold invariance at the level of the "pseudo factors"?
Using pseudo factors your nu's will be put into alpha (the factor means) and you will in this way have access to both nu and tau (the thresholds). The question is which restrictions do you have to place on the model to identify nu and tau parameters. Roger Millsap has written (in MBR?) about parameterizations different from the Mplus defaults where nu and tau are used, but with certain restrictions on other parameters.
Thank you very much. Yes, I was planning to work from the Millsap paper (2004, MBR). In this case, if I'm right, I will have to work with the thresholds at the item level (wp1-wp9) and with the other parameters at the pseudo-factor level (f10-f15: residuals, intercepts, loadings)?
I am using a zero-inflated Poisson model for onset of cigarette smoking, using measures of inattention to predict whether or not an adolescent has started smoking and the level of smoking. My confusion is that when I look at the 3 different standardized models, I see different results and p-values for the relationship of the predictor to smoking. Can you tell me which of the standardizations I should be using? Here is some of the output: MODEL RESULTS
STDYX Standardization CIGDAYT6 ON INATTEN 1.000 0.000 ********* 0.000
CIGDAYT6#1 ON INATTEN 0.076 0.366 0.207 0.836
YSRATTNT WITH SRINAT45 0.478 0.045 10.687 0.000
CIGDAYT6 ON INATTEN 1.000 0.000 ********* 0.000
CIGDAYT6#1 ON INATTEN 0.076 0.366 0.207 0.836
YSRATTNT WITH SRINAT45 0.478 0.045 10.687 0.000
STD Standardization CIGDAYT6 ON INATTEN 1.864 0.532 3.502 0.000
We have observed several instances where WRMR does not seem to work well. It did, however, work well in most of the simulations of the Yu dissertation - see our web site. If most of the other fit indices are good, I think you should ignore WRMR.
Hi, I am testing a proposed 5-factor measurement model using CFA. The observed variables are questionnaire items of an ordinal categorical form, with 5 categories coded 0-4, that would often be treated as continuous. However...
Whilst half of them (which ask about the likelihood of behaving well in different aspects of one's job) have a roughly uniform distribution across the 5 categories, the other half (which ask about the likelihood of behaving badly) have a very skewed distribution, with 70-80% of cases selecting category 0.
As such, the most honest way of defining these variables seemed to be to treat the positive items as continuous and the negative items as count data, since whilst they don't actually represent a count, they are a measure of the occurrence of rare events.
I therefore ran the model in Mplus 5. Whilst the model runs OK, there are very limited measures of model fit given in the output: just the chi-sq (1910.744 on 49947 df!), AIC and BIC.
What are your thoughts on the allocation of variable types? Would it be better to treat all items as continuous or all as categorical?
And if my choices are correct, how can I get an indication of model fit? The chi-sq statistic above looks very strange, with a huge df relative to the actual chi-sq value.
I would not treat ordered categorical variables as count variables. I would use the CATEGORICAL option for all of the ordered categorical variables both those with and without floor effects. The default estimator in this situation is weighted least squares which gives you chi-square and the related fit measures that you are used to.
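A minimal sketch of the setup recommended in this reply, assuming ten positively worded and ten negatively worded items. The item names (pos1-pos10, neg1-neg10), the data file name, and the assignment of items to the five factors are all hypothetical, since the original post does not give them.

```
DATA:
  FILE IS items.dat;                 ! hypothetical file name
VARIABLE:
  NAMES = pos1-pos10 neg1-neg10;     ! hypothetical item names
  ! all ordered categorical items, with or without floor effects
  CATEGORICAL = pos1-pos10 neg1-neg10;
ANALYSIS:
  ! weighted least squares is the default with CATEGORICAL outcomes;
  ! WLSMV is named explicitly here for clarity
  ESTIMATOR = WLSMV;
MODEL:
  ! hypothetical item-to-factor assignment
  f1 BY pos1-pos4;
  f2 BY pos5-pos7;
  f3 BY pos8-pos10;
  f4 BY neg1-neg5;
  f5 BY neg6-neg10;
```

With this specification the output should include the model chi-square and the related fit measures mentioned in the reply, rather than the frequency-table statistics reported for count outcomes.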
The Pearson and likelihood ratio chi-square statistics that you obtain with count data are for the frequency table.
I am wondering whether to use EFA before conducting CFA (e.g., to see cross-loadings) or to start with CFA right away. The problem is that when I conduct CFA, the 4-factor model has a good fit (and loadings are high). However, when I conduct EFA, several items seem to be loading on several factors (and none of the loadings are particularly high). And, when I start taking items out, the model seems to be unstable (one of the residual variances becomes very large and negative, so a 3-factor solution might actually be enough). So, I am a bit puzzled. Should my choice be based on whether we have a good theory vs. whether we want to explore the number of factors underlying the variables?
If the variables are not behaving as you expect, this points to them not being valid measures. You should think about why this might be the case. Ultimately, the meaning of the factors based on theory needs to be considered.
I am running a CFA with categorical indicators. I get a message that some of the bivariate tables have an empty cell. When I remove these problematic items, the model fit is pretty much the same. Is it absolutely necessary to remove these items?
I have count indicators for a latent factor, with missing data. 1) Is FIML applicable to this problem? 2) To correct for non-normality I want to use MLR; is this correct? 3) Can you offer a reference dealing with the missing data problem for count data?
1. Yes. 2. MLR is robust against non-normality of continuous outcomes. For count outcomes, the statistical model takes into account the nature of the data. In this case, MLR can help with model misspecification, for example, using a Poisson model when a negative binomial model is needed. 3. It is the same for all variable types. See the Little and Rubin reference in the user's guide.
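A minimal sketch of such a model, assuming four count indicators named u1-u4, a missing value code of -999, and a data file called counts.dat. All of these names and values are hypothetical and are not taken from the question.

```
DATA:
  FILE IS counts.dat;        ! hypothetical file name
VARIABLE:
  NAMES = u1-u4;             ! hypothetical indicator names
  COUNT = u1-u4;             ! Poisson model for each indicator
  MISSING = ALL (-999);      ! hypothetical missing value code
ANALYSIS:
  ESTIMATOR = MLR;           ! maximum likelihood with robust standard errors
  ALGORITHM = INTEGRATION;   ! a factor with count indicators uses numerical integration
MODEL:
  f BY u1-u4;
```

Because estimation is by maximum likelihood, cases with missing values on some indicators contribute the information they do have, which is the sense in which the first answer is "Yes".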
Mplus provides both censored-normal and censored-normal inflated modeling for such variables.
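As a rough illustration of the two options named in this reply, the sketch below assumes a single outcome y that is censored from below and one covariate x. Both variable names and the regression setup are hypothetical, since the question this reply answers is not shown in the thread.

```
VARIABLE:
  NAMES = y x;               ! hypothetical variable names
  ! (b)  = censored-normal, censored from below
  ! (bi) = censored-normal inflated, censored from below
  CENSORED = y (bi);
ANALYSIS:
  ESTIMATOR = MLR;
MODEL:
  y ON x;                    ! censored-normal part
  y#1 ON x;                  ! inflation part (only available with the inflated setting)
```

Replacing (bi) with (b) and dropping the y#1 statement would give the plain censored-normal model.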
Cecily Na posted on Thursday, June 14, 2012 - 12:33 pm
Hello Professors, I have a latent variable with several indicators which are either count or categorical. Should I convert the count indicator into a categorical indicator to perform CFA? Does it matter? Note that the count indicator has a range of 0-200, but the categorical indicator has a range of 0-6.
I'm testing for measurement invariance on a depression measure between 4 ethnic groups. The measure has 20 items scored on a 4-point scale. I have two questions regarding fitting the unconstrained factor loadings and unconstrained thresholds models.
1. I first ran separate CFAs for each of the 4 ethnic groups, and all of the models fit well. I next ran the fully constrained multi-group model, which ran fine and also fit the data well. When I tried to run the model with unconstrained thresholds, I did not get fit indices because the model is underidentified. I am not sure if there is something wrong in my syntax or if having 3 thresholds for each of 20 items is too many. I am wondering (a) whether one solution would be to fix or constrain some of the model parameters, and (b) whether there are certain parameters that I should try to fix or constrain.
2. My second issue occurred when trying to run the unconstrained factor loadings model with MLR estimation (I did this using an LCA framework). I was able to get fit indices when I first ran the WLSMV model. However, when I tried to run the MLR model in order to get AIC and BIC, the model did not converge. I am confused as to why it would converge using one estimator but not another, and am wondering if you have any suggestions for how to get it to converge.
Good morning! I have a question about the scale of a latent variable with categorical indicator variables. I am conducting an ESEM analysis and using categorical indicator variables. I understand that when I have predictor variables with direct paths to my observed variables, I can interpret their effects as the log of odds. However, I am not clear on what the scale is for my latent variable underlying the observed categorical variables. My model fits best with a single factor, and I have significant effects from my predictor variables to the single factor. I understand in CFA with continuous variables, the scale is determined by the indicators if I set one path to equal 1. However, I am unclear on the scale for my latent variable with categorical indicators. Should I attempt an interpretation at all, or is it better to just report if the effects are significant? I greatly appreciate your guidance on this matter.
Thank you very much for your response. I appreciate it very much. In terms of reporting results for the effect of predictors on the latent variable, is standardization a wise option? Or is it better to just report the effects as significant or not significant only? Again, I greatly appreciate your time.