
Jen Bailey posted on Monday, April 19, 2004  4:39 pm



Dear Drs. Muthen: I am using Version 3 and trying to run a CFA in preparation for looking at the stability of general latent substance use within individuals across 3 time points (high school, early adulthood, and age 27). My indicators are either zero-inflated Poisson (e.g., frequency of binge drinking in the past month) or ordered categorical. At each time point, I'm trying to create a latent factor from cigarette use (categorical), binge drinking frequency (ZIP), and pot use frequency (ZIP). I have 2 questions. First, I hypothesize autocorrelation in substance-specific residuals across time (i.e., the high school marijuana use residual will be correlated with the age 27 marijuana use residual). I read in the manual, however, that residuals are not calculated for count variables. How can I specify the hypothesized autocorrelations? In my syntax below, you can see some of the ways I've tried to do this. My second question concerns large fully standardized estimates for the means of my inflation variables. Several of the estimates are 999.00. I am wondering if this could be due to the fact that my count variables have a much larger variance than my other variables (e.g., 32 vs. 1). Any other ideas as to why these values might be showing up? I've checked the data file for out-of-range values, unspecified "missing" values, etc., and found nothing out of order. I would greatly appreciate any thoughts on either question. Thank you!
DATA:     FILE IS C:\JENNIFER\PSUBEH3.DAT;
          TYPE IS INDIVIDUAL;
          FORMAT IS 40F8.2;
VARIABLE: NAMES ARE STUID SEX G1AFRQ7 G1AFRQ8 G1BINGE7 G1BINGE8 G1ALAMT8
            G1CIG7 G1CIG8 G1POT7 G1POT8 G1POTFR7 G1POTFR8 HSALQT HSBIN
            HSALFR HSCIG HSTOB HSPTQTY HSPTFR EAALQTY EABIN EAALFR EACIG
            EATOB EAPTQTY EAPTFR SSAL27 SSBN27 SSALQTY SSCG27 SSTOB27 Q364
            SSPT27 G2CDt G2ODDt G2ADHDt CDT ODDT ADHDT;
          USEVARIABLES G2CDt G2ODDT G2ADHDT SSBN27 SSPT27 HSCIG HSBIN
            HSPTFR SSCG27;
          MISSING = BLANK;
          CATEGORICAL ARE SSCG27 HSCIG;
          COUNT ARE SSBN27 (I) SSPT27 (I) HSBIN (I) HSPTFR (I);
MODEL:    G2PROB BY G2CDt* G2ODDT* G2ADHDT*;
          G2ASU BY HSBIN* HSCIG* HSPTFR*;
          !EASU BY EABIN* EACIG* EAPTFR*;
          G2SU27 BY SSBN27* SSPT27* SSCG27*;
          G2PROB@1; G2ASU@1; !EASU@1;
          G2SU27@1;
          ! Attempts at the residual autocorrelation, each tried separately:
          !HSPTFR#1 WITH SSPT27#1;    ! error: "hsptfr#1 is not observed."
          ![HSPTFR#1 WITH SSPT27#1];  ! error: "unknown variable [ ."
          !HSPTFR WITH SSPT27;        ! error: "interaction problem."
          !SSPT27#1 ON HSPTFR#1;      ! error: "sspt27#1 is not observed."
          !SSPT27 ON HSPTFR;          ! gives a fully standardized loading of
                                      ! SSPT27 on its factor greater than 1
ANALYSIS: ESTIMATOR = ML;
          TYPE = MEANSTRUCTURE;
          !ADAPTIVE = OFF;
          !MITERATIONS = 1000;
          !MCITERATIONS = 5;
          !MUITERATIONS = 5;
          ITERATIONS = 30000;
          !MATRIX = COVA;
OUTPUT:   STAND SAMPSTAT TECH1 TECH8;


The estimates of 999 are most likely caused by negative residual variances of the categorical outcomes or by factor correlations greater than one. See Example 7.16 for how to estimate a residual covariance for a categorical outcome using maximum likelihood: you define a factor that influences the two outcomes for which you want a residual covariance. You can use the same approach for a count outcome. Note that you are using numerical integration, which can be computationally heavy when there are many factors.
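
A minimal sketch of how the Example 7.16 idea might look for the marijuana-frequency indicators in the syntax above (not tested on these data). The factor name POTRES is hypothetical, and fixing both loadings at 1 with a free factor variance is an assumed identification choice: the factor variance then plays the role of the residual covariance and is expected to be positive.

```mplus
MODEL:
  G2PROB BY G2CDt* G2ODDT* G2ADHDT*;
  G2ASU  BY HSBIN* HSCIG* HSPTFR*;
  G2SU27 BY SSBN27* SSPT27* SSCG27*;
  G2PROB@1; G2ASU@1; G2SU27@1;

  ! Hypothetical residual factor: it influences only the two
  ! marijuana-frequency counts, so its variance captures their
  ! association over and above the substantive factors.
  POTRES BY HSPTFR@1 SSPT27@1;
  POTRES;                                  ! variance estimated
  POTRES WITH G2PROB@0 G2ASU@0 G2SU27@0;   ! orthogonal to the other factors
```

Each factor added this way is one more dimension of numerical integration, so adding one per substance will lengthen the run considerably.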

Jen Bailey posted on Tuesday, April 20, 2004  10:41 am



Thanks very much. The example you pointed me to was quite helpful! 

sara perry posted on Saturday, May 19, 2007  10:58 am



I have a couple of questions related to this one. We want to run a measurement model in Mplus of a latent variable whose factor indicators are all count data with a zero-inflated negative binomial distribution. We have a few questions about this: 1) Should we consider the latent variable as a count variable as well, since all the indicators are counts? If so, is this considered mixture modeling, and is there a section in the manual with the code to do that? 2) Regardless, we know we need to tell Mplus that the factor indicators are count data. We tried to do this using the code in the manual and came across an error regarding residual variance. What is the limitation on estimating residual variances in a measurement model with count factor indicators? Is there something else we should consider regarding residual variances/covariances that is different from a standard measurement model? Thanks in advance!


Mplus does not handle zero-inflated negative binomial modeling. Instead, it uses zero-inflated Poisson modeling with or without mixtures. If your factor indicators are count variables, the factor itself is continuous. Poisson regression does not include residual variance parameters: the variance of a Poisson variable equals its mean, so there is no separate residual variance to estimate for count indicators.
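
A sketch of that set-up, with hypothetical indicator names y1-y4 and the DATA command omitted: (i) on the COUNT list requests zero-inflated Poisson, the factor is an ordinary continuous factor, and no residual variances are mentioned for the count indicators.

```mplus
VARIABLE:
  NAMES ARE y1-y4;          ! hypothetical count indicators
  COUNT ARE y1-y4 (i);      ! zero-inflated Poisson
ANALYSIS:
  ESTIMATOR = MLR;
MODEL:
  f BY y1-y4;               ! f is a continuous factor
  ! No "y1-y4;" statement here: Poisson indicators have no
  ! residual variance parameters to estimate.
```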


Greetings, When testing measurement invariance on a CFA with categorical indicators, the delta parameterization allows us to work with scale factors and the theta parameterization allows us to work with residual variances (scale factors and residual variances being functions of one another, both cannot be estimated simultaneously). Is that right? Then my question is: how can we estimate intercepts in Mplus in a CFA with categorical indicators? I read somewhere that it is possible but did not find how. I might be wrong, but aren't intercepts directly related to scale factors and residuals? Thank you in advance.


The answer to your first paragraph is yes. If you want to estimate intercepts, you need to put a factor behind each observed variable such that the factor is equivalent to the observed variable. There is no relationship between intercepts and scale factors and residuals. 


Thank you! I was not too far off. If I understand correctly, that would give a higher-order CFA model in which each item loads on its own factor (1 item per factor) and then all these factors load on the higher-order factors of interest? Or would that give a kind of bifactor model in which each item loads simultaneously on two factors? I believe the first possibility is the right one, but "behind each observed variable" got me confused. Then the invariance would be at the level of the relationship between the lower- and higher-order factors (treated as continuous): baseline, loadings, intercepts, residuals, etc., without the possibility of testing the invariance of scale factors?


When you put a factor behind each observed variable as follows, you are simply making the factor identical to the observed variable:

f1 BY y1@1;
y1@0;

This simply opens up the alpha matrix so that intercepts can be estimated, because the nu matrix is not opened with categorical outcomes. It is a trick.


Then, if my original model is:

f1 BY wp1 wp2 wp7;
f2 BY wp3 wp4 wp9;

I will have to redefine it as:

f10 BY wp1@1; wp1@0;
f11 BY wp2@1; wp2@0;
f12 BY wp7@1; wp7@0;
f13 BY wp3@1; wp3@0;
f14 BY wp4@1; wp4@0;
f15 BY wp9@1; wp9@0;
f1 BY f10 f11 f12;
f2 BY f13 f14 f15;

And then, for the tests of invariance, I only work with the f10-f15 "factors", which I can constrain or relax as with wp1-wp9 in the preceding model: thresholds, loadings, residuals/scale factors (theta/delta), intercepts, etc. In other words, I then conduct my tests of invariance as I did before (let's suppose I did it right) but replacing wp1-wp9 by f10-f15? Or do I, for instance, constrain thresholds on wp1-wp9 and loadings/residuals/intercepts on f10-f15? And, if you have time, a follow-up question: why is the nu matrix not opened by default in Mplus? Thank you very much for taking the time to answer our questions!


You are correct. We don't open the nu matrix as the default because both tau and nu cannot be identified at the same time. We don't see a strong need to work with nu parameters. 


Thank you again. However, this means I can't be completely right... If the nu and tau matrices cannot be identified at the same time, it means that opening the nu matrix closes the tau matrix and that the invariance of thresholds cannot be tested in such a model? Or is it still possible to evaluate threshold invariance at the level of the "pseudo factors"?


Using pseudo factors your nu's will be put into alpha (the factor means) and you will in this way have access to both nu and tau (the thresholds). The question is which restrictions do you have to place on the model to identify nu and tau parameters. Roger Millsap has written (in MBR?) about parameterizations different from the Mplus defaults where nu and tau are used, but with certain restrictions on other parameters. 


Thank you very much. Yes, I was planning to work from the Millsap paper (2004, MBR). In this case, if I'm right, I will have to work with the thresholds at the item level (wp1-wp9) and with the other parameters at the pseudo-factor level (f10-f15: residuals, intercepts, loadings)?


Yes, it works this way. No need to answer me, I tried it. Thanks for your time. 


I am using a zero-inflated Poisson model for onset of cigarette smoking, using measures of inattention to predict whether or not an adolescent has started smoking and the level of smoking. My confusion is that when I look at the 3 different standardized models, I see different results and p-values for the relationship of the predictor to smoking. Can you tell me which of the standardizations I should be using? Here is some of the output:

MODEL RESULTS

STDYX Standardization
                              Estimate     S.E.  Est./S.E.   P-Value
 CIGDAYT6 ON INATTEN             1.000    0.000  *********     0.000
 CIGDAYT6#1 ON INATTEN           0.076    0.366      0.207     0.836
 YSRATTNT WITH SRINAT45          0.478    0.045     10.687     0.000

STDY Standardization
                              Estimate     S.E.  Est./S.E.   P-Value
 CIGDAYT6 ON INATTEN             1.000    0.000  *********     0.000
 CIGDAYT6#1 ON INATTEN           0.076    0.366      0.207     0.836
 YSRATTNT WITH SRINAT45          0.478    0.045     10.687     0.000

STD Standardization
                              Estimate     S.E.  Est./S.E.   P-Value
 CIGDAYT6 ON INATTEN             1.864    0.532      3.502     0.000
 CIGDAYT6#1 ON INATTEN           0.138    0.670      0.205     0.837
 YSRATTNT WITH SRINAT45          0.243    0.037      6.471     0.000


STDYX is used for continuous covariates. STDY is used for binary covariates. See pages 577-579 of the user's guide for more information.
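
As a rough summary of the definitions in the User's Guide, for a slope b of y regressed on x (sketched for continuous x and y; how SD(y) is obtained for count or categorical outcomes depends on the model):

```latex
\begin{aligned}
b^{\mathrm{STDYX}} &= b\,\frac{\mathrm{SD}(x)}{\mathrm{SD}(y)}
  && \text{change in } y \text{ (in SDs) per SD change in } x \\
b^{\mathrm{STDY}}  &= \frac{b}{\mathrm{SD}(y)}
  && \text{change in } y \text{ (in SDs) per unit change in } x \\
b^{\mathrm{STD}}   &= b\,\frac{\mathrm{SD}(x)}{\mathrm{SD}(y)}
  && \text{using only latent-variable SDs (1 for observed variables)}
\end{aligned}
```

For a binary covariate, a one-unit change is the move from 0 to 1, which is why STDY is the natural choice there, whereas a one-SD change in a 0/1 variable has no clear meaning.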

QianLi Xue posted on Thursday, September 10, 2009  8:07 pm



I ran Example 5.2 in the User's Guide. Here are the fit statistics: CFI 1.000, TLI 1.001, RMSEA 0.000, WRMR 0.342. Why do CFI, TLI, and RMSEA all indicate good fit, while WRMR suggests the opposite?


We have observed several instances where WRMR does not seem to work well. It did, however, work well in most of the simulations in the Yu dissertation (see our web site). If most of the other fit indices are good, I think you should ignore WRMR.
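
For reference, WRMR is usually written as follows (a sketch following the Yu dissertation and the Mplus technical appendices), where s_r is the r-th sample statistic, \hat{\sigma}_r its model-estimated value, v_r the estimated asymptotic variance of s_r, and e the number of sample statistics:

```latex
\mathrm{WRMR} = \sqrt{\frac{1}{e}\sum_{r=1}^{e}\frac{\left(s_r - \hat{\sigma}_r\right)^2}{v_r}}
```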


Hi, I am testing a proposed 5-factor measurement model using CFA. The observed variables are questionnaire items in ordinal categorical form (5 categories, coded 0-4) that would often be treated as continuous. However, whilst half of them (which ask about the likelihood of behaving well in different aspects of one's job) have a roughly uniform distribution across the 5 categories, the other half (which ask about the likelihood of behaving badly) have a very skewed distribution, with 70-80% of cases selecting category 0. As such, the most honest way of defining these variables seemed to be to treat the positive items as continuous and the negative ones as count data, since, whilst they don't actually represent a count, they are a measure of the occurrence of rare events. I therefore ran the model in Mplus 5. Whilst the model runs OK, there are very limited measures of model fit given in the output: just the chi-square (1910.744 on 49947 df!), AIC, and BIC. So, what are your thoughts on the allocation of variable types? Would it be better to treat all items as continuous or all as categorical? And if my choices are correct, how can I get an indication of model fit? The chi-square statistic above looks very strange, with a huge df relative to the actual chi-square value.


I would not treat ordered categorical variables as count variables. I would use the CATEGORICAL option for all of the ordered categorical variables, both those with and without floor effects. The default estimator in this situation is weighted least squares, which gives you chi-square and the related fit measures that you are used to. The Pearson and likelihood-ratio chi-square statistics that you obtain with count data are for the frequency table.
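
A sketch of that suggestion, with hypothetical item names job1-job20 and a hypothetical assignment of items to the five factors: every 0-4 item, skewed or not, goes on the CATEGORICAL list, and the weighted least squares estimator (WLSMV, the default in Mplus 5 for this case) is written out explicitly.

```mplus
VARIABLE:
  NAMES ARE job1-job20;          ! hypothetical item names
  CATEGORICAL ARE job1-job20;    ! all ordered 0-4 items
ANALYSIS:
  ESTIMATOR = WLSMV;             ! weighted least squares
MODEL:
  f1 BY job1-job4;               ! hypothetical factor assignments
  f2 BY job5-job8;
  f3 BY job9-job12;
  f4 BY job13-job16;
  f5 BY job17-job20;
```

With this set-up the output reports the chi-square test of model fit along with CFI, TLI, RMSEA, and WRMR.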


I am wondering whether to use EFA before conducting CFA (e.g., to see cross-loadings) or to start with CFA right away. The problem is that when I conduct CFA, the model (4 factors) has a good fit (and loadings are high). However, when I conduct EFA, several items seem to load on several factors (and none of the loadings are particularly high). And when I start taking items out, the model seems to be unstable (one of the residual variances becomes large and negative, so a 3-factor solution might actually be enough). So I am a bit puzzled. Should my choice be based on whether we have a good theory versus whether we want to explore the number of factors underlying the variables?


If the variables are not behaving as you expect, this points to them not being valid measures. You should think about why this might be the case. Ultimately, the meaning of the factors based on theory needs to be considered. 
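
If it helps to look at the 3- versus 4-factor question directly, Mplus can fit a range of exploratory solutions in one run. A sketch with hypothetical item names y1-y16 (DATA command omitted):

```mplus
VARIABLE:
  NAMES ARE y1-y16;       ! hypothetical item names
ANALYSIS:
  TYPE = EFA 3 4;         ! fit both the 3- and the 4-factor EFA
  ROTATION = GEOMIN;      ! oblique rotation (the default)
```

Comparing the two rotated loading matrices, and checking in which solution the negative residual variance appears, may help show which number of factors the items actually support.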


Hi, I am running a CFA with categorical indicators. I get a message that some of the bivariate tables have an empty cell. When I remove these problematic items, the model fit is pretty much the same. Is it absolutely necessary to remove these items?


I would not use them. 


Hello, I have count indicators with missing data for a latent factor. 1) Is FIML applicable to this problem? 2) To correct for non-normality I want to use MLR; is this correct? 3) Can you offer a reference dealing with the missing-data problem for count data? Thanks a lot. Alex


1. Yes. 2. MLR is robust against nonnormality of continuous outcomes. For count outcomes, the statistical model takes into account the nature of the data. In this case, MLR can help with model misspecification, for example, using a Poisson model when a negative binomial model is needed. 3. It is the same for all variable types. See the Little and Rubin reference in the user's guide. 
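
A sketch with hypothetical indicator names and a hypothetical missing-value code: with a maximum likelihood estimator such as MLR, Mplus uses all available data (FIML) for cases with partly missing indicators. This is the default in recent versions; older versions required TYPE = MISSING.

```mplus
VARIABLE:
  NAMES ARE y1-y5;          ! hypothetical count indicators
  MISSING ARE ALL (-99);    ! hypothetical missing-value code
  COUNT ARE y1-y5;          ! Poisson
ANALYSIS:
  ESTIMATOR = MLR;          ! ML with robust standard errors
MODEL:
  f BY y1-y5;
```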


Hello. I have three follow-up questions. 1) Can I censor a ZIP-distributed indicator variable in Mplus? 2) I have a continuous variable that is distributed similarly to the one in 1). Which distributional model would you suggest using with Mplus? I want to form a factor out of these two variables. 3) Is there a source that describes which empirical distribution functions I can use for observed variables in Mplus? Thanks.


1) If you mean a count variable where a set of high counts have been combined, I would not use ZIP, but perhaps instead an ordinal categorical approach. 2) If you mean a continuous variable which is censored, you can use a censored-normal approach in Mplus. The variable types that Mplus handles are shown in the UG.


Hello Bengt, I would like to clarify regarding 2): my continuous variable is a percentage measure and has a lot of zeros. Can I just use normal-theory ML inflated at zero by putting (i) or something like this after the variable? Best, Alexander


Mplus provides both censored-normal and censored-normal inflated modeling for such variables.
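
A sketch of the corresponding VARIABLE command for the two variables discussed here, with hypothetical names u (the count) and y (the percentage): (b) after a CENSORED variable means censored from below, and (bi) adds the zero-inflation part.

```mplus
VARIABLE:
  NAMES ARE u y;            ! hypothetical variable names
  COUNT ARE u (i);          ! zero-inflated Poisson count
  CENSORED ARE y (bi);      ! censored-normal from below, inflated
                            ! (use y (b) for plain censored-normal)
```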

Cecily Na posted on Thursday, June 14, 2012  12:33 pm



Hello Professors, I have a latent variable with several indicators which are either count or categorical. Should I convert the count indicator into a categorical indicator to perform CFA? Does it matter? Note that the count indicator has a range of 0-200, but the categorical indicator has a range of 0-6. Greatly appreciate your advice! Cecily


You should treat the count variable as a count variable by putting it on the COUNT list and the categorical variable should be put on the CATEGORICAL list. 
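
A sketch of that advice, with hypothetical variable names: the 0-200 variable goes on the COUNT list and the 0-6 variables on the CATEGORICAL list, and both load on the same continuous factor. Because of the count indicator, the model is estimated by maximum likelihood rather than weighted least squares.

```mplus
VARIABLE:
  NAMES ARE cnt cat1-cat3;       ! hypothetical names
  COUNT ARE cnt;                 ! 0-200 count: Poisson model
  CATEGORICAL ARE cat1-cat3;     ! 0-6 ordered categories
ANALYSIS:
  ESTIMATOR = MLR;
MODEL:
  f BY cnt cat1-cat3;
```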

Cecily Na posted on Friday, June 15, 2012  11:58 am



Hello, Thank you. I want to follow up on my previous post. When outcomes are categorical, the slopes are probit. When outcomes are counts, does Mplus use Poisson regression? Thanks.


For variables on the CATEGORICAL list, weighted least squares gives probit regression. Maximum likelihood gives logistic regression as the default. Probit regression can be requested. Poisson models are used for variables on the COUNT list. 
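
Written out for a factor \eta with loading \lambda_j, the three cases look roughly like this (a sketch for a binary u_j, where \tau_j is the threshold, \nu_j the intercept of the log rate for a count y_j, and \Phi the standard normal distribution function; the probit line assumes the residual variance of the underlying response is fixed at 1):

```latex
\begin{aligned}
\text{logit (ML default):}\quad & P(u_j = 1 \mid \eta) = \frac{1}{1 + \exp\left(\tau_j - \lambda_j \eta\right)} \\
\text{probit:}\quad & P(u_j = 1 \mid \eta) = \Phi\left(-\tau_j + \lambda_j \eta\right) \\
\text{Poisson (COUNT list):}\quad & \log E(y_j \mid \eta) = \nu_j + \lambda_j \eta
\end{aligned}
```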


Hello Professors, I'm testing for measurement invariance of a depression measure across 4 ethnic groups. The measure has 20 items scored on a 4-point scale. I have two questions regarding fitting the unconstrained factor loading and unconstrained threshold models. 1. I first ran separate CFAs for each of the 4 ethnic groups, and all of the models fit well. I next ran the fully constrained multigroup model, which ran fine and also fit the data well. When I tried to run the model with unconstrained thresholds, I did not get fit indices because the model is underidentified. I am not sure whether there is something wrong in my syntax or whether having 3 thresholds for each of 20 items is too many. I am wondering: (a) would one solution be to fix or constrain some of the model parameters? and (b) are there certain parameters that I should try to fix/constrain? 2. My second issue occurred when trying to run the unconstrained factor loadings model with MLR estimation (I did this using an LCA framework). I was able to get fit indices when I first ran the model with WLSMV. However, when I tried to run the MLR model in order to get AIC and BIC, my model did not converge. I am confused as to why it would converge using one estimator but not another, and am wondering if you have any suggestions for how to get it to converge. I would greatly appreciate any advice. Best, Laura


1. I assume you treat the items as polytomous categorical, in which case you need to constrain some parameters (see Millsap-Tein (2004) in Multiv Behav Res). 2. The output for this needs to be looked at by Support.


Thanks so much for the advice. I will look at Millsap-Tein (2004) and try constraining some parameters. I will send the output to Support. Laura

Kathy Xiao posted on Sunday, April 09, 2017  5:22 am



I am doing an LCA where one of the variables is the number of disorders the participant has had, which takes the values 0, 1, 2, 3, where 3 combines those having 3 or more disorders. Shall I treat it as a categorical variable instead of a continuous one? Thanks!


Yes, if you have a strong floor effect. 
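
A sketch of an LCA set-up in which the disorder count (hypothetically named ndis, coded 0, 1, 2, 3 with 3 meaning three or more) is treated as ordered categorical alongside hypothetical indicators u1-u5; the number of classes is also only an example.

```mplus
VARIABLE:
  NAMES ARE ndis u1-u5;          ! hypothetical variable names
  CATEGORICAL ARE ndis u1-u5;    ! ndis: 0, 1, 2, 3 (= 3 or more)
  CLASSES = c(3);                ! hypothetical number of classes
ANALYSIS:
  TYPE = MIXTURE;
```

With no MODEL command, the thresholds of each categorical variable are estimated separately in each class, which is the standard LCA specification.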


Good morning! I have a question about the scale of a latent variable with categorical indicator variables. I am conducting an ESEM analysis and using categorical indicator variables. I understand that when I have predictor variables with direct paths to my observed variables, I can interpret their effects as the log of odds. However, I am not clear on what the scale is for my latent variable underlying the observed categorical variables. My model fits best with a single factor, and I have significant effects from my predictor variables to the single factor. I understand in CFA with continuous variables, the scale is determined by the indicators if I set one path to equal 1. However, I am unclear on the scale for my latent variable with categorical indicators. Should I attempt an interpretation at all, or is it better to just report if the effects are significant? I greatly appreciate your guidance on this matter. 


The regression of the factor on its predictors is a linear regression because the factor is continuous even when the factor indicators are categorical. Because the indicators are categorical DVs, you have logistic regression if the Link=logit, otherwise probit regression. 
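
A sketch of the two parts with hypothetical names (ten categorical items u1-u10, covariates x1 and x2), written as a CFA rather than ESEM for brevity. By default the first loading is fixed at 1, which sets the factor's metric: for a binary u1, a one-unit increase in f raises the log odds of u1 = 1 by one unit, so the coefficients in f ON x1 x2 are linear effects in that metric.

```mplus
VARIABLE:
  NAMES ARE u1-u10 x1 x2;        ! hypothetical names
  CATEGORICAL ARE u1-u10;
ANALYSIS:
  ESTIMATOR = MLR;
  LINK = LOGIT;                  ! ML default; LINK = PROBIT for probit
MODEL:
  f BY u1-u10;                   ! logistic regressions of each u on f
  f ON x1 x2;                    ! linear regression of f on covariates
```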


Thank you very much for your response. I appreciate it very much. In terms of reporting results for the effect of predictors on the latent variable, is standardization a wise option? Or is it better to just report the effects as significant or not significant only? Again, I greatly appreciate your time. 


I would report both unstandardized and standardized estimates with their significance. You may want to ask these general analysis questions on SEMNET to get broader input.
