Mplus Discussion >> Underlying normality and polychoric correlations


Michael Bohlig posted on Tuesday, March 28, 2000 - 6:26 am

Muthen (1993) indicates that LISCOMP could test the underlying normality in conjunction with polychoric correlations (p. 218). Is such a test available in Mplus? If so, how does one request such a test? Thanks for your assistance.

Muthen, B. O. (1993). Goodness of fit with categorical and other nonnormal variables. In K. A. Bollen & J. S. Long (Eds.), Testing Structural Equation Models (pp. 205-234). Newbury Park, CA: Sage.

bmuthen@ucla.edu posted on Tuesday, March 28, 2000 - 7:52 am

No, this test is not included in Mplus. I have found that it is hard to draw conclusions from the multitude of such tests obtained with many variables.

Svend Kreiner posted on Tuesday, May 16, 2000 - 5:44 am

Despite the well-known problems with summarizing information from multiple testing it would be nice to have some indication of whether or not the assumption of underlying normality seems to be adequate.

Is there any way that we can get Mplus to show us the tables from which the polychoric correlations have been calculated and the fitted tables corresponding to the estimated thresholds and correlation coefficients?

Linda K. Muthen posted on Tuesday, May 16, 2000 - 7:33 am

This information is not currently printed in Mplus. I will add it to our list of suggestions for future improvements.

delfino posted on Thursday, July 11, 2002 - 7:08 am

Is it fair to say that Mplus uses Generalized Least Squares (GLS) methods to estimate factor loadings in the case of factor analysis with binary variables, as indicated in Muthen (1989) and Mislevy (1986)?

bmuthen posted on Monday, July 15, 2002 - 2:51 pm

Short answer: yes. Long answer follows. The GLS estimator discussed in Muthen (1978, 1984, 1989) and Mislevy (1986) is called WLS in Mplus. A newer weighted least squares version is called WLSMV and is the default in Mplus. Technically speaking, WLSMV is not a GLS estimator, since it is customary to use the "G" (generalized) to mean that the weight matrix is based on an estimate of the asymptotic covariance matrix of the sample quantities analyzed. WLSMV uses a diagonal weight matrix, which is not the full asymptotic covariance matrix. For WLSMV, the term weighted least squares estimator can be used.
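To make the full-versus-diagonal distinction concrete, here is a minimal Python sketch (not Mplus code) of two least-squares fit functions. It assumes `s` holds the sample statistics analyzed (e.g., thresholds and polychorics), `sigma` the model-implied values, and `W` an estimate of their asymptotic covariance matrix; all function names are hypothetical. The only difference between the two criteria is whether the full `W` or just its diagonal is inverted.

```python
def quad_form_inverse(W, r):
    """Return r' W^{-1} r by solving W z = r (Gaussian elimination, partial pivoting)."""
    n = len(r)
    a = [list(row) + [r[i]] for i, row in enumerate(W)]  # augmented copy; W is untouched
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(a[i][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for i in range(col + 1, n):
            factor = a[i][col] / a[col][col]
            for j in range(col, n + 1):
                a[i][j] -= factor * a[col][j]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        z[i] = (a[i][n] - sum(a[i][j] * z[j] for j in range(i + 1, n))) / a[i][i]
    return sum(ri * zi for ri, zi in zip(r, z))


def fit_full_weight(s, sigma, W):
    """WLS-type criterion: residuals weighted by the inverse of the full matrix W."""
    r = [si - gi for si, gi in zip(s, sigma)]
    return quad_form_inverse(W, r)


def fit_diagonal_weight(s, sigma, W):
    """Diagonal-weight criterion: only the diagonal elements of W are used."""
    r = [si - gi for si, gi in zip(s, sigma)]
    return sum(ri * ri / W[i][i] for i, ri in enumerate(r))


W = [[1.0, 0.5], [0.5, 1.0]]  # hypothetical asymptotic covariance estimate
print(fit_full_weight([1.0, 1.0], [0.0, 0.0], W), fit_diagonal_weight([1.0, 1.0], [0.0, 0.0], W))
```

When `W` has nonzero off-diagonal elements, the two criteria weight the same residuals differently, so their minimizers can differ; when `W` is diagonal they coincide. The sketch covers only the estimation criterion, not how standard errors or chi-square tests are computed.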

Andreas Oranje posted on Friday, July 26, 2002 - 9:18 am

Are the polychorics in Mplus computed the same way as in PRELIS? That is, is Olsson's (1979) ML estimation used? If so, are the thresholds computed from the marginals, or are the parameters estimated simultaneously?

bmuthen posted on Friday, July 26, 2002 - 10:06 am

Yes, they are computed as in Olsson's article. The thresholds are computed from the univariate marginals, not from bivariate information.
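The two-step idea can be sketched for the simplest (2 × 2, tetrachoric) case. This is a minimal Python illustration under stated assumptions, not Mplus's implementation: thresholds come from the univariate marginals, and the correlation is then found by maximizing the bivariate-normal likelihood with the thresholds held fixed. The Simpson's-rule approximation of the bivariate normal CDF and the golden-section search are choices made for this sketch, and all names are hypothetical.

```python
from math import log, sqrt
from statistics import NormalDist

PHI = NormalDist()  # standard normal distribution


def bvn_cdf(a, b, rho, steps=800):
    """P(X <= a, Y <= b) for a standard bivariate normal with correlation rho.

    Uses P = integral from -inf to a of phi(x) * Phi((b - rho*x) / sqrt(1 - rho^2)) dx,
    approximated with Simpson's rule on [-8, a] (steps must be even).
    """
    lower = -8.0
    if a <= lower:
        return 0.0
    s = sqrt(1.0 - rho * rho)
    h = (a - lower) / steps
    total = 0.0
    for k in range(steps + 1):
        x = lower + k * h
        weight = 1 if k in (0, steps) else (4 if k % 2 else 2)
        total += weight * PHI.pdf(x) * PHI.cdf((b - rho * x) / s)
    return total * h / 3.0


def tetrachoric(n00, n01, n10, n11):
    """Two-step estimate for a 2 x 2 table; n01 counts x = 0, y = 1, etc.

    Step 1: thresholds from the univariate marginals (z scores of P(category 0)).
    Step 2: correlation maximizing the bivariate-normal log-likelihood with the
    thresholds held fixed (golden-section search on [-0.999, 0.999]).
    Both marginal proportions must lie strictly between 0 and 1.
    """
    n = n00 + n01 + n10 + n11
    px0 = (n00 + n01) / n  # P(x = 0)
    py0 = (n00 + n10) / n  # P(y = 0)
    tau_x, tau_y = PHI.inv_cdf(px0), PHI.inv_cdf(py0)

    def loglik(rho):
        p00 = bvn_cdf(tau_x, tau_y, rho)
        cells = [(n00, p00), (n01, px0 - p00), (n10, py0 - p00),
                 (n11, 1.0 - px0 - py0 + p00)]
        return sum(c * log(max(p, 1e-300)) for c, p in cells)

    lo, hi = -0.999, 0.999
    g = (sqrt(5.0) - 1.0) / 2.0  # golden-section ratio
    for _ in range(50):
        m1, m2 = hi - g * (hi - lo), lo + g * (hi - lo)
        if loglik(m1) < loglik(m2):
            lo = m1  # maximum lies in [m1, hi]
        else:
            hi = m2  # maximum lies in [lo, m2]
    return (tau_x, tau_y), (lo + hi) / 2.0


thresholds, rho = tetrachoric(40, 10, 10, 40)
print(thresholds, round(rho, 3))
```

For a 2 × 2 table, the fitted likelihood reproduces the observed cell proportions exactly. With both thresholds at zero, P(x = 0, y = 0) = 1/4 + arcsin(ρ)/(2π), so the hypothetical 40/10/10/40 table should give ρ close to sin(0.3π) ≈ .809.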

Svend Kreiner posted on Monday, September 09, 2002 - 5:50 am

The chi-square test of model fit for categorical y variables does not seem to be discussed in the User's Guide. Could you give me a reference?

Linda K. Muthen posted on Monday, September 09, 2002 - 8:39 am

Following is the reference for the chi-square test of fit when one or more dependent variables are categorical. You can request the paper from Bengt by emailing him at bmuthen@ucla.edu.

Muthén, B., du Toit, S.H.C., & Spisic, D. (1997). Robust inference using weighted least squares and quadratic estimating equations in latent variable modeling with categorical and continuous outcomes. Accepted for publication in Psychometrika. (#75)

You can find reference to this paper on page 357 of the Mplus User's Guide.

Annarita Roscino posted on Tuesday, December 03, 2002 - 11:33 am

Hi. I read the paper 'Testing the assumption underlying tetrachoric correlation' by Muthén & Hofacker in Psychometrika (53). I would like to know about the subsequent developments, i.e., testing the same assumption when the latent variables are not normal, or for polychoric correlations, and so on. Thank you.

bmuthen posted on Tuesday, December 03, 2002 - 7:51 pm

For the polychoric correlation, you can test underlying normality from a 2-way table since you have an over-identified model. The triplet testing is only necessary for binary items.
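The over-identification point can be made explicit by counting degrees of freedom for an R × C table of two ordered categorical variables, assuming the only parameters are the thresholds of each variable and one correlation:

```latex
\begin{aligned}
\text{free cell probabilities} &= RC - 1,\\
\text{parameters} &= (R-1) + (C-1) + 1 \quad \text{(thresholds plus one correlation)},\\
\text{df} &= (RC - 1) - (R-1) - (C-1) - 1 = (R-1)(C-1) - 1.
\end{aligned}
```

A 2 × 2 table therefore has 0 degrees of freedom (the tetrachoric model fits any such table exactly, so there is nothing to test), while a 3 × 3 table has 3 and a 5 × 5 table has 15.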

I am not sure I understand your question about next developments. If you mean what to do when rejecting normality, you might want to take a look at the article:

Muthén, B. (1993). Goodness of fit with categorical and other non-normal variables. In K. A. Bollen & J. S. Long (Eds.), Testing Structural Equation Models (pp. 205-243). Newbury Park, CA: Sage. (#45)

bmuthen posted on Monday, August 18, 2003 - 8:51 am

The discussion under this rubric gets into the question of whether polychoric and related techniques can be used only when the categorical outcomes are truly cut-off continuous ones, so that there are underlying continuous latent variable distributions. This question and surrounding misunderstandings come up about once a year on SEMNET. Now and then I try to clarify the misunderstandings, but since they are repeated in the literature, new researchers are misled, so I'll offer an attempt at a clarification again. Clarification is also needed regarding the related SEMNET comments about "Satorra-Bentler corrections" and tests for multivariate normality that have come up in the discussion.

In short, the polychoric techniques for categorical data do not necessitate underlying continuous latent variables behind the outcomes, not even for "true dichotomies". Here is why. When analyzing categorical outcomes, SEMnetters are typically interested in a factor analysis model or an SEM, so consider for example a factor model with categorical outcomes (the same argument can be made using the example of a path analysis model with observed covariates replacing the factors). The confusion arises because there are two equivalent formulations of this model: the IRT (Item Response Theory) formulation and the underlying continuous latent variable formulation. In both cases, the factors are typically specified as normally distributed. The IRT model does not invoke the notion of a categorized underlying continuous latent variable, but merely describes the regression of each item on the factors using a regular probit or logit regression model. In contrast, the underlying continuous latent variable formulation adds to the factor specification the specification of a residual ("measurement error") that is normal or logistic. The sum of the factors and the residual is the underlying continuous latent variable for each outcome. The fact is that the IRT model amounts to the same thing as assuming underlying continuous latent variables behind the categorical outcomes. The two model formulations are statistically equivalent: the same item probability as in IRT is generated by the underlying continuous latent variable exceeding a threshold. Since the two model formulations are the same, this shows that no assumptions about "true" underlying continua are needed. This is true even when considering events like death, a truly unequivocal state.
If you believe that a factor such as disease severity influences the probability of death as in the IRT probit/logit regression, then this can equivalently be seen as the death item having an underlying continuous latent variable (propensity to die, or "frailty" as survival analysts would say).
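The equivalence can be checked numerically. The Python sketch below (hypothetical names and values) assumes a single factor value `eta`, a probit link, and a standard normal residual. It compares the IRT item probability Φ(λη − τ) with the simulated proportion of cases in which the latent response y* = λη + ε exceeds the threshold τ.

```python
import random
from statistics import NormalDist


def irt_probability(eta, loading, threshold):
    """Probit IRT item probability P(u = 1 | eta) = Phi(loading * eta - threshold)."""
    return NormalDist().cdf(loading * eta - threshold)


def latent_response_probability(eta, loading, threshold, draws=200_000, seed=1):
    """Share of simulated y* = loading * eta + e (e ~ N(0, 1)) exceeding the threshold."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(draws) if loading * eta + rng.gauss(0.0, 1.0) > threshold)
    return hits / draws


print(irt_probability(0.5, 0.8, 0.2), latent_response_probability(0.5, 0.8, 0.2))
```

The two numbers agree up to simulation error, because P(λη + ε > τ) = P(ε > τ − λη) = Φ(λη − τ) when ε is standard normal.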

This paragraph may be skipped by those less technically interested. One may raise the question of why in IRT there is typically no discussion of underlying normality and polychoric (or tetrachoric) correlations. I think this is because IRT uses maximum-likelihood estimation, in contrast to the limited-information (WLS) estimators connected with polychorics. The focus on second-order moment estimation with polychorics, as opposed to regular correlations ("phi coefficients"), is what raises the question. Ironically, the two types of estimators give very similar results. It follows, then, that a more pointed question than the existence of underlying continuous latent variables is whether the application warrants specifying the factors as normally distributed (something that can be studied using, e.g., finite mixture, i.e. latent class, modeling).

Satorra-Bentler corrections are good for giving better inference, i.e. better chi-square tests of overall model fit, but assume that linear models for continuous outcomes are relevant. When data are categorical and strongly non-normal (e.g. with strong floor/ceiling effects), the corrections are not sufficient because the parameter estimates themselves are wrong due to the linearity assumptions being off. Also, multivariate normality testing is not very useful when data are categorical because non-normality is a foregone conclusion.

More technical aspects of categorical outcomes modeling are discussed in Mplus Web Note #4 at www.statmodel.com. For an early article, see

Muthén, B., & Kaplan, D. (1985). A comparison of some methodologies for the factor analysis of non-normal Likert variables. British Journal of Mathematical and Statistical Psychology, 38, 171-189.

Bengt Muthen

Anonymous posted on Thursday, August 21, 2003 - 7:54 am

Hi,
How do you test the significance of tetrachoric/biserial correlations in Mplus?
Thanks.

Linda K. Muthen posted on Thursday, August 21, 2003 - 9:39 am

If you ask for TYPE=BASIC, you will get the estimates of the correlations and their standard errors. The ratio of an estimate to its standard error is approximately distributed as a z-score, so you can use it to determine significance.
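As a minimal sketch (Python, with hypothetical estimate and standard error values rather than Mplus output), the z ratio and a two-sided normal p-value can be computed like this:

```python
from statistics import NormalDist


def z_test(estimate, standard_error):
    """Return the z ratio and its two-sided p-value under a standard normal reference."""
    z = estimate / standard_error
    p_two_sided = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_two_sided


# Hypothetical tetrachoric estimate of .30 with a standard error of .10
print(z_test(0.30, 0.10))
```

Here z is about 3 and the two-sided p-value is about .0027; a |z| above 1.96 corresponds to p < .05.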

Anonymous posted on Saturday, October 25, 2003 - 11:53 am

Drawing from the B Muthen posting on August 18, 2003 and the recent publication on IRT & SEM by Glockner-Rist & Hoijtink in the SEM journal, the information shows how IRT & SEM are equivalent when analyzing categorical variables.

When using CVM, many books and articles state that the assumption of underlying normality is needed to analyze categorical variables.

Can WLSMV and WLS be used when no assumption of underlying normality is made (as in the IRT case)?
Is the requirement of underlying normality tied to CVM or the estimation technique?

bmuthen posted on Saturday, October 25, 2003 - 12:34 pm

It follows from my August 18 posting above that if IRT is used with the standard assumption of normally distributed factor(s), then underlying normality (of the underlying latent response variable y*) essentially holds as well. This is because y* is a weighted sum of the factor(s) and the residual and the residual is either assumed normal or logistic (which is close to normal) in both CVM and IRT - so this weighted sum is (close to) normal. If the factor is not normal, underlying normality doesn't hold. The underlying normality assumption is tied to the estimation technique (limited-information weighted least squares), but it follows that the same can be said about maximum likelihood for IRT when you apply the standard assumption of factor normality. Note that Mplus allows the more relaxed assumption of conditional normality of the factor given covariates. Note also that the forthcoming Mplus Version 3 has maximum likelihood estimation just like IRT (except no guessing parameters yet).
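To see how close "logistic" is to "normal", the Python sketch below compares the standard normal CDF with a rescaled logistic CDF on a grid. The scaling constant 1.702 is the classic normal-ogive approximation constant from the IRT literature, used here as an assumption; a rescaling is needed because the standard logistic distribution has variance π²/3, larger than the normal's.

```python
from math import exp
from statistics import NormalDist


def logistic_cdf(x):
    """Standard logistic CDF."""
    return 1.0 / (1.0 + exp(-x))


def max_gap(scale=1.702, lo=-6.0, hi=6.0, points=2001):
    """Largest |logistic(scale * x) - Phi(x)| over an evenly spaced grid on [lo, hi]."""
    phi = NormalDist()
    gap = 0.0
    for k in range(points):
        x = lo + (hi - lo) * k / (points - 1)
        gap = max(gap, abs(logistic_cdf(scale * x) - phi.cdf(x)))
    return gap


print(max_gap(1.702), max_gap(1.0))
```

After rescaling, the two CDFs differ by less than .01 everywhere on the grid; without rescaling the gap exceeds .1.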

Anonymous posted on Sunday, April 04, 2004 - 4:13 am

Hello Drs Muthen,

Following up on the above thread about underlying normality of latent response variables, may I ask the following: (I hope it's not too long!)

In a recent paper in Educational and Psychological Measurement by Greer, Dunlap and Beatty (Dec 2003, pp. 931-950), they use Monte Carlo simulation to examine some situations involving tetrachoric correlations where the underlying response variables are skewed but transformable to normal distributions (i.e., they are lognormal in their simulations). They then show empirically that the tetrachoric correlations estimate the correlation between the *normal-transformed* underlying response variables. That is, tetrachorics do not estimate the correlation between the skewed underlying response variables themselves, but rather the correlations after these variables are transformed to normality.

Is the following logic concerning categorical response variables in SEM (eg CFA, EFA) correct?

1. There exist monotone transformations to normality of the latent underlying response variables, and it is these transformed variables that have structure defined by SEM involving latent factors and residual terms (which are assumed to be normally distributed).

2. Therefore the tetrachoric correlations based on the observed categorical response variables then estimate the correlations of the normally distributed transformed underlying response variables.

3. Therefore, if the transformation(s) to normality of the underlying response variables are monotonic, the probit relationship between observed categorical responses and underlying response variables (on their natural scale) would then also hold for the normal-transformed underlying response variables, i.e., Pr(Y > a) = Pr(g(Y) > g(a)) for a monotone increasing function g() translating Y to normality, where Y is the "natural" scale of the underlying response variable and a is the threshold on the "natural" scale.

4. We proceed as usual, and obtain estimates of factor structure on the normal-transformed scale.

Conclusion: the normality of the underlying response variables themselves is therefore NOT required for (e.g.) CFA/EFA. All that is needed is the assumption of a monotone transformation to normality, which (I think) is OK for any unimodal distribution.

Might a problem be the interpretation of the latent structure results on the transformed scale?

Also - would extensions to ordinal scales be fairly automatic? (as it would seem to me)

I hope these questions make sense, as there are sizable ramifications for robustness if the removal of the assumption of underlying response variable normality is valid!

bmuthen posted on Sunday, April 04, 2004 - 5:36 am

Three points come to mind. First, I need to be convinced that any bivariate distribution can be transformed into a bivariate normal distribution. Note that a univariate transformation is not sufficient. Second, a substantively interpretable model with simple (say linear) relationships may exist for underlying correlations corresponding to non-normal distributions, not normal ones, in which case transformations would make those relationships nonlinear and perhaps difficult to interpret. Third, if there is substantive reason to want to investigate non-normal distributions, a non-parametric representation of latent variables can be made using maximum-likelihood mixture modeling (see ex7.26 in the User's Guide of Mplus Version 3).

Anonymous posted on Wednesday, May 12, 2004 - 10:48 am

I recently used Mplus to test a model based on 29 observed variables. Because the data for the observed variables were responses to a 5-pt Likert scale, I was recommended to use Mplus and identify the variables as CATEGORICAL. Apparently this would allow for polychoric calculations? I emailed Mplus Help for an explanation of what "polychoric" is or does but was told to look at the website. But I still seem to be missing something. Although the term polychoric appears above, everyone seems to assume we all know what it means or what it does. I could really use a very basic explanation of what needs to occur if your data is categorical and why? THANKS!

Linda K. Muthen posted on Wednesday, May 12, 2004 - 11:34 am

Polychoric correlations are a type of correlation between ordered categorical variables. Tetrachoric correlations are a type of correlation between binary variables. Pearson correlations are a type of correlation between continuous variables. If you go to the reference section of the website under categorical variables, you will find references that discuss these correlations.

Jim posted on Friday, August 20, 2004 - 5:44 am

I am new to MPLUS, having previously used LISREL and EQS heavily. I have a question concerning modeling categorical outcomes. Say I want to test a unidimensional CFA model using 10 variables measured using a 3-point scale. In LISREL, I would have to compute a polychoric correlation matrix and the asymptotic covariance matrix of this correlation in PRELIS and read it into LISREL. The model would then be estimated using these correlations in order to estimate a linear relationship between the factor and y* (the underlying continuous variable). After reading the MPLUS manual it seems that MPLUS calculates the relationship between y* and the factor directly using a probit model (factor loadings are probit regressions). Is this correct? Does MPLUS calculate polychoric (or tetrachoric) correlations in the process of CVM? I ask because there is all this talk of polychorics on the discussion board but when reading the manual it seems that they aren't necessary.

Would it be correct to say that a difference between MPLUS and the other two software programs when modeling categorical outcomes is that MPLUS calculates the parameters directly via a probit model, whereas the others calculate the parameters indirectly (by first computing the appropriate correlation matrix (e.g., polychorics) then using that matrix when estimating the parameters)?

Linda K. Muthen posted on Friday, August 20, 2004 - 9:17 am

When there are no covariates in the model, Mplus estimates tetrachoric/polychoric correlations to use in model estimation. This is done in one step, not two steps with PRELIS followed by LISREL. When there are covariates in the model, Mplus estimates probit thresholds, probit regression coefficients, and residual correlations to use in model estimation. In both cases, a probit model is used. Note, however, that with maximum likelihood, a logistic model is used. A discussion of the differences between Mplus and LISREL in their handling of categorical outcomes can be found in Web Note 4, which is posted on the Mplus website.

Anonymous posted on Wednesday, September 15, 2004 - 8:11 am

In your paper published in Sociological Methods & Research in 1989 (vol. 18, no. 1, pp. 19-65), you mentioned a two-stage maximum likelihood procedure for estimating the "non-normal tetrachorics" (p. 30) needed because of the non-normality of the underlying factor. Is this procedure implemented in Mplus? If so, how can I use it?

Linda K. Muthen posted on Wednesday, September 29, 2004 - 4:20 pm

No, it is not.

yang posted on Tuesday, October 12, 2004 - 5:15 pm

Hi, Dr. Muthen:
thanks for the reply.
Following up on the last question: if Mplus does not include the two-stage maximum likelihood procedure, do you know of any software that can do it, or can you recommend a reference for the details of how to do it? Thanks again.

Bengt O. Muthen posted on Tuesday, October 12, 2004 - 5:22 pm

I am not aware of any software that uses this estimator or any reference that describes it in enough detail that you could do it yourself. In my experience, the covariates often do not have enough explanatory power for this estimator to give results that are very different from the regular approach.

Anonymous posted on Sunday, October 17, 2004 - 11:46 pm

Hi,
I have two questions, one regarding estimation, and the second one regarding interpretation of a path model with categorical variables (I am not using any factors). Assuming a causal flow from left to right, my left hand side of the model has one ordered dependent variable with three categories, and six independent variables (five binary and one continuous). I declared the dependent variable as categorical and I use the WLSMV estimation method.
When I run just that portion of the model, the estimates I get are quite different from a similar model estimated using the Ordered Probit method.
Why the difference? If I understand this thread correctly, in Mplus my regression coefficients should be probit estimates. Am I missing something? I see that polychoric and tetrachoric correlations were mentioned, so I wonder if this has something to do with those differences (if so, then my next question may make 'some' sense).

Can I assume an underlying continuum for the binary "independent" variables? (I know the standard interpretation of how the likelihood of Y* increases for a change from 0 to 1.) But I understand that with polychoric or tetrachoric correlations, one assumes an underlying continuum for all binary variables. An interesting application for me would be to be able to say that an increase of, let's say, .1 to .3 in the binary independent variables (rather than from 0 to 1) leads to an X percentage point increase in one of the categories of the dependent variable (right now, I can only estimate percentage point differences when I change the independent variable from 0 to 1, holding other variables at a constant value).
The independent variable could be whether or not people are aware of Kerry's health care program, and the dependent variable the likelihood of voting for him. I would like to say that if the proportion aware increases from 20% to 30% (.2 to .3), the "very likely" category changes by X percentage points.
I understand that the coefficients for dummy variables measure the average difference in the dependent for two groups, but I wonder if with polychoric or tetrachoric correlations (using MPlus) we could make this kind of interpretation (or approximation). Please, any advice or comment would be welcome.

Thanks!

Linda K. Muthen posted on Monday, October 18, 2004 - 10:52 am

I believe that you should get the same results. How are you estimating your model using the Ordered Probit method?

The binary independent variables are not part of the model estimation in Mplus unless you explicitly force them into the model. Then the model is not estimated conditioned on the x's. When you force them into the model as y variables (for example, x ON z@0; would make it a y), the normality assumptions are stricter. But then you would have the underlying continuum interpretation that you want.

Anonymous posted on Monday, October 18, 2004 - 1:55 pm

Hi Linda,
Looking back at my results I may have overstressed the differences. Most coefficients are pretty similar, except one.
For one binary independent variable, the probit coefficient under MPlus is .523 while the figure in Stata is .468. Other coefficients and threshold values are quite similar. Can this be explained? I am pretty confident about Stata's results but I wonder if this has to do with slightly different estimation methods.

Regarding the second part of your message, let me see if I understood it correctly. Are you suggesting I can transform each of the binary independent variables into a continuous variable by treating it as a 'latent variable'? For example, if X1 is a binary independent variable, I first create the latent factor X1* (X1* on X1@0), and then regress Y* on X1*? Is that how you incorporate the binary independent variables into the model as continuous (they are in the model already as dummy variables)?
If so, it would be what I need. My next question is more technical. How can I simulate the predicted probabilities in Y* for a change in X1*? An example with a formula would be very helpful. With observed variables I use Pr(yi = 1 | xi) = CDF(Tau1 - Alpha - Beta*xi), and so on. But Mplus does not give a constant (I do get the threshold values). It would help a lot in translating these models for the lay person.
Thanks a bunch.

Linda K. Muthen posted on Monday, October 18, 2004 - 4:45 pm

The results for the regression of one y on a set of x's should be identical in Mplus and Stata. There must be some difference that you are not seeing, for example, a different sample size. If you send the complete Mplus and Stata outputs to support@statmodel.com, I will see if I can find the difference.

The answer to your second paragraph is yes. Thinking about this further, I would create a factor behind each x instead of using an ON statement, so f1 BY x1; then use f1 as a covariate.

Instead of beta*xi, you would use beta*fi where fi is the latent response variable behind your binary independent variable. The values for fi would be chosen according to the percentiles of a z distribution because the mean of f is zero and the variance of f is one. For example, if you are interested in the probability of y for individuals who are one standard deviation above the mean of fi, you choose the value of one for fi.
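A minimal Python sketch of that calculation follows, assuming an ordinary ordered probit: the latent response is y* = βf + ε with ε standard normal, and thresholds take the place of an intercept (which is why no constant appears in the output), so P(y ≤ k | f) = Φ(τ(k+1) − βf). The threshold and slope values are hypothetical, and the function name is illustrative.

```python
from statistics import NormalDist


def ordered_probit_probs(thresholds, beta, f):
    """Category probabilities P(y = 0), ..., P(y = K) at covariate value f.

    Assumes y* = beta * f + e with e ~ N(0, 1) and y = k when
    thresholds[k-1] < y* <= thresholds[k] (thresholds sorted ascending).
    """
    phi = NormalDist()
    cumulative = [phi.cdf(t - beta * f) for t in thresholds] + [1.0]  # P(y <= k)
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)
        previous = c
    return probs


# Hypothetical three-category outcome: thresholds -0.5 and 0.8, slope 0.5
at_mean = ordered_probit_probs([-0.5, 0.8], 0.5, 0.0)   # f at its mean
one_sd_up = ordered_probit_probs([-0.5, 0.8], 0.5, 1.0)  # f one SD above the mean
print(at_mean, one_sd_up)
```

Differences between the two lists give the percentage-point changes in each category for a one-standard-deviation increase in f.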

Anonymous posted on Monday, October 18, 2004 - 5:34 pm

Thanks for your quick answer.
I will send copies of the output to the e-mail above (sample sizes are identical in both runs).

Thank you for the rest of your message. I've done this before, but it did not occur to me that the new factor would be an underlying continuum. So I have to assume that it is a standardized variable. Is it possible to transform it back to the original scale (ranging from 0 to 1), so that I can talk about an increase in the proportion of people, let's say aware of Kerry's health plan, going from .3 (30%) to .5 (50%) or so? For the average person, the impact of a one-standard-deviation change in the independent variable on the dependent variable does not mean much, and my whole problem is how to translate these models into something a lay person could understand outside an academic environment (even if it is a rough approximation). I would just like to know if this is possible or just wishful thinking. I understand this may go beyond the scope of support you provide here.

Thanks again for all your help.

Linda K. Muthen posted on Tuesday, October 19, 2004 - 9:27 am

Thanks for sending your output for Mplus and Stata. I see that you are including a weight variable in the Mplus analysis. I assume that you are doing the same in the Stata analysis. The differences you are seeing are most likely because of differences in how weights are handled in Mplus and Stata. See Web Note 7 on the Mplus website for a discussion of weights in Mplus.

Regarding a change in the proportion of the independent variable, it is hard to transform this variable to a zero/one range. Instead, how about translating the proportion changes that you are interested in to values on the standardized factor, so the audience is presented with proportion changes but you calculate them behind the scenes as standardized factor values? For example, changing a proportion from 50 percent to 75 percent implies a standardized factor value change from 0 to about .674 (the z scores for 50 percent and 75 percent).
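The translation from proportions to standardized factor values is just the inverse standard normal CDF. A minimal Python sketch, assuming (as in the reply above) that the factor behind the binary covariate has mean zero and variance one; the function name is hypothetical:

```python
from statistics import NormalDist


def factor_values_for_proportions(p_from, p_to):
    """Map two proportions to standard-normal factor values (z scores).

    Assumes the factor behind the binary covariate is standard normal.
    Proportions must lie strictly between 0 and 1.
    """
    z = NormalDist().inv_cdf
    return z(p_from), z(p_to)


print(factor_values_for_proportions(0.50, 0.75))  # the 50% -> 75% example
print(factor_values_for_proportions(0.20, 0.30))  # the 20% -> 30% awareness example
```

The resulting pair of values can then be plugged in as the fi values in the probit formula from the previous reply, so the audience sees proportion changes while the computation stays on the standardized factor scale.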

Anonymous posted on Monday, February 07, 2005 - 5:14 am

Dr Muthen

I was wondering if you would be so kind as to help me with a stats query regarding my dissertation.

I am attempting to factor analyze dichotomous (present/absent) data and am in a quandary as to which correlational method to apply to my data set. I wondered if you could highlight the benefits and limitations of binary methods such as the Jaccard, Hamann, tetrachoric, and phi (four-point) coefficients, or point me in the right direction for literature addressing these approaches.

Thank you for your time

Linda K. Muthen posted on Monday, February 07, 2005 - 9:08 am

I don't think there is one single overview of this topic although I think that tetrachoric correlations are probably the most accepted method. You might look at the following papers:

1. Flora and Curran in the December 2004 issue of Psychological Methods.

2. Muthén, B., & Kaplan, D. (1985). A comparison of some methodologies for the factor analysis of non-normal Likert variables. British Journal of Mathematical and Statistical Psychology, 38, 171-189.

3. Muthén, B., & Kaplan, D. (1992). A comparison of some methodologies for the factor analysis of non-normal Likert variables: A note on the size of the model. British Journal of Mathematical and Statistical Psychology, 45, 19-30.

4. Muthén, B. (1993). Goodness of fit with categorical and other non-normal variables. In K. A. Bollen & J. S. Long (Eds.), Testing Structural Equation Models (pp. 205-243). Newbury Park, CA: Sage.

You might also consider doing your own comparison of the methods as part of your dissertation.

Anonymous posted on Tuesday, February 08, 2005 - 12:56 am

Dr Muthen,

Thank you for your swift reply and for the advice; it is much appreciated. I will consult the references.

Thanks again

Anonymous posted on Sunday, January 22, 2006 - 1:28 pm

Concerning ordered categorical variables, what is the definition of a threshold? Do you have a reference? Thanks.

bmuthen posted on Sunday, January 22, 2006 - 7:39 pm

For a normal latent response variable with a unit variance, thresholds are simply z scores. See definitions and references in Appendix 1 of the Technical Appendices for Mplus Version 2 on the Mplus web site. Or, read the Muthen (1993) "Goodness of fit..." chapter in the Bollen-Long book on the web site.
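For the no-covariate case, this means each threshold is the z score of a cumulative category proportion. A minimal Python sketch with hypothetical counts and a hypothetical function name:

```python
from itertools import accumulate
from statistics import NormalDist


def thresholds_from_counts(counts):
    """Thresholds for an ordered categorical variable from its category counts.

    With a standard normal latent response, threshold k is the z score of the
    cumulative proportion in categories 0..k-1. Every cumulative proportion
    must lie strictly between 0 and 1 (no empty first or last category).
    """
    n = sum(counts)
    cumulative = list(accumulate(counts))[:-1]  # drop the final total (proportion 1)
    return [NormalDist().inv_cdf(c / n) for c in cumulative]


print(thresholds_from_counts([20, 30, 50]))  # three categories, two thresholds
```

With 20%, 30%, and 50% in the three categories, the cumulative proportions are .20 and .50, giving thresholds of about -0.84 and 0.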

Gabriel Schlomer posted on Tuesday, March 06, 2007 - 3:28 pm

Hi,

In the model that I am analyzing, I have endogenous and exogenous binary ordinal variables as well as endogenous and exogenous continuous variables. Relationships in the model include binary-->binary, continuous-->continuous, binary-->continuous, and continuous-->binary. Some of the binary variables can be assumed to have an underlying normal distribution while others cannot. Furthermore, all of my latent variables are specified using single indicators, save one, which is continuous and specified from three.

After a lot of digging, I am still unsure as to what is the most appropriate approach. I have experience using LISREL and have recently been turned on to Mplus because I'm told it can handle truly dichotomous variables. I understand (referring to the bmuthen Aug. 18, 2003 post) that there is some debate about the robustness of the assumptions regarding polychoric (tetrachoric) correlations, specifically the underlying normal distribution of binary ordinal variables. If it is tenable to use polychoric correlations even when indicators are not assumed to have an underlying normal distribution, what is the advantage of using Mplus to analyze this type of data in contrast to using the PRELIS/LISREL two-step technique? What is the appropriate technique in Mplus when these assumptions cannot be met?

Any insight you could provide would be of great help and please let me know if I misinterpreted anything.

Linda K. Muthen posted on Wednesday, March 07, 2007 - 10:12 am

Suppose you have the following model:

x -> u1 -> u2

where u1 and u2 are categorical.

For u2 it does not matter if the variable u2 or the variable u2*, underlying latent variable, is used. In both cases, the model is the same because u2 is a final variable in the chain. However, for the mediating variable u1, the model is not the same when using u1 versus u1*.

One difference between LISREL and Mplus for such a model is that in Mplus you can treat u1 as u1* by using weighted least squares estimation, or you can treat it as u1 by using maximum likelihood estimation. In LISREL, I believe you can treat it only as u1*.

Linda K. Muthen posted on Wednesday, March 07, 2007 - 10:13 am

One other thing -- it is not necessary to put a latent variable behind a single indicator in Mplus. This is done automatically if it is necessary.

Gabriel Schlomer posted on Wednesday, March 07, 2007 - 10:33 am

Hi Dr. Muthen,

Thank you for the info. I am curious to know what is going on computationally in these kinds of analyses when using Mplus. Drawing on your example, what regression technique does Mplus use for each of the paths? Would it use logistic regression for all the paths, or something different for each given the nature of the variables? Also, computationally, how does the model change when using u1 vs. u1*? The only reason I put my single indicators into latent space is because, you're right, LISREL doesn't allow Betas to be specified among indicators. So what is Mplus doing in these different situations?

Linda K. Muthen posted on Wednesday, March 07, 2007 - 10:50 am

WLS estimates a probit regression model when the dependent variable is categorical. ML can estimate either a probit or a logistic regression model when the dependent variable is categorical. Logistic is the default. For both estimators, if the dependent variable is continuous, simple linear regression coefficients are estimated.
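To make the probit/logistic distinction concrete, here is a sketch of the two response functions for a single covariate. The intercept a and slope b are illustrative, and 1.702 is the familiar scaling constant under which logistic coefficients are roughly 1.7 times probit coefficients. This is not Mplus code and does not fit either model; it only compares the two curves.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF: the probit response function

def probit_prob(a, b, x):
    """P(u = 1 | x) under a probit regression with intercept a and slope b."""
    return PHI(a + b * x)

def logit_prob(a, b, x):
    """P(u = 1 | x) under a logistic regression with intercept a and slope b."""
    return 1.0 / (1.0 + math.exp(-(a + b * x)))

# Rescaling the slope by about 1.7 makes the two curves nearly coincide.
SCALE = 1.702
grid = [k / 100 for k in range(-500, 501)]  # x from -5 to 5
max_gap = max(abs(probit_prob(0.0, 1.0, x) - logit_prob(0.0, SCALE, x)) for x in grid)
print(f"largest probit/logit gap on the grid: {max_gap:.4f}")
```

Because the curves stay within about one percentage point of each other once rescaled, the choice of link usually affects the scale of the coefficients more than the substantive conclusions.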

The difference here is in how u1 is treated when it is a covariate in the regression of u2 on u1. With weighted least squares, the latent response variable u1* is used as the covariate. With maximum likelihood, the observed variable u1 is used. In both cases, as in regular regression, the covariate is treated as a continuous variable.

Mplus puts a factor behind the variable automatically. I think this is safer because people often do this incorrectly. There will be no difference in the results. It is just a convenience feature.

Lois Downey posted on Thursday, July 05, 2007 - 8:32 am

I have 4 questions related to bmuthen's posting of Monday, August 18, 2003 - 8:51 am. I read this posting to mean that if I build a probit regression model with an OBSERVED ordinal outcome, the appropriateness of that model is NOT contingent on an assumption that the observed outcome represents an underlying normally distributed variable.
QUESTION 1: Is my interpretation correct?

I have 26 measured outcomes, representing 26 priority scores. For each outcome, the value is 5 if the respondent rated the item as their top priority, down through 1 if it was their 5th most important priority. If the item was not in the respondent's top 5 priorities, it was coded 0. So by definition, most of the outcomes have an abundance of 0's, since each respondent gave a non-0 rating to only 5 of the 26. I would like to use multiple regression models to test whether any of the priority scores are contingent on demographic features such as gender, racial/ethnic minority status, age, etc. Distributionally, the outcomes suggest ZIP regressions, except the scores are not counts.
QUESTION 2: Is probit regression with the WLSMV estimator the optimal choice for this problem?
QUESTION 3: Would ordinal logistic regression be an acceptable alternative, and are there circumstances that would make it a better choice than probit?
QUESTION 4: Are ZIP regressions definitely inappropriate, given that the scores are not counts?

Thank you.

Linda K. Muthen posted on Thursday, July 05, 2007 - 9:29 am

1. Your interpretation is correct for final dependent variables. This would not be the case if the variable is a mediating variable.
2.-3. Either a probit or a logistic regression would be appropriate.
4. Yes.

Jeannine Tamez posted on Monday, July 09, 2007 - 9:18 am

Hello,

I am validating a measure that uses dichotomous variables. The literature suggests using a tetrachoric correlation matrix in the analysis. However, I keep getting the following error and am unsure how to tackle it. Please advise, as I am new to this program and this type of analysis. Thank you.

Jeannine

Error Message

WARNING: THE BIVARIATE TABLE OF A51 AND A11 HAS AN EMPTY CELL.

THE MODEL ESTIMATION TERMINATED NORMALLY

WARNING: THE LATENT VARIABLE COVARIANCE MATRIX (PSI) IS NOT POSITIVE
DEFINITE. THIS COULD INDICATE A NEGATIVE VARIANCE/RESIDUAL VARIANCE FOR A
LATENT VARIABLE, A CORRELATION GREATER OR EQUAL TO ONE BETWEEN TWO LATENT
VARIABLES, OR A LINEAR DEPENDENCY AMONG MORE THAN TWO LATENT VARIABLES.

CHECK THE TECH4 OUTPUT FOR MORE INFORMATION.
PROBLEM INVOLVING VARIABLE F5.

Linda K. Muthen posted on Monday, July 09, 2007 - 9:55 am

When a bivariate table has an empty cell, that implies a tetrachoric correlation of one (or minus one, depending on which cell is empty). So only one of the two variables should be used in the analysis.
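A rough sketch of why an empty cell pushes the tetrachoric correlation to the boundary. It uses a simple two-step estimator (thresholds from the margins, then a grid search over rho for the maximum-likelihood value), chosen here for illustration; it is not necessarily how Mplus computes the correlation. The two 2x2 tables are made-up counts.

```python
import math
from statistics import NormalDist

STD = NormalDist()

def upper_orthant(t1, t2, rho, n=1000, upper=8.0):
    """P(u1* > t1, u2* > t2) for a standard bivariate normal with correlation rho.

    Integrates pdf(x) * P(u2* > t2 | u1* = x) over (t1, upper) by Simpson's rule
    (n must be even); the mass beyond `upper` is negligible.
    """
    s = math.sqrt(1.0 - rho * rho)
    h = (upper - t1) / n

    def f(x):
        return STD.pdf(x) * STD.cdf((rho * x - t2) / s)

    total = f(t1) + f(upper)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(t1 + k * h)
    return total * h / 3.0

def tetrachoric(table):
    """Two-step tetrachoric estimate; table[i][j] counts cases with u1 = i, u2 = j."""
    n = sum(map(sum, table))
    t1 = STD.inv_cdf((table[0][0] + table[0][1]) / n)  # threshold from P(u1 = 0)
    t2 = STD.inv_cdf((table[0][0] + table[1][0]) / n)  # threshold from P(u2 = 0)
    p1, p2 = 1.0 - STD.cdf(t1), 1.0 - STD.cdf(t2)      # P(u1 = 1), P(u2 = 1)

    def loglik(rho):
        p11 = upper_orthant(t1, t2, rho)
        probs = {(1, 1): p11, (1, 0): p1 - p11, (0, 1): p2 - p11,
                 (0, 0): 1.0 - p1 - p2 + p11}
        # An empty cell contributes nothing to the multinomial log-likelihood.
        return sum(table[i][j] * math.log(max(p, 1e-300))
                   for (i, j), p in probs.items() if table[i][j] > 0)

    grid = [k / 100 for k in range(-99, 100)]  # rho in [-0.99, 0.99]
    return max(grid, key=loglik)

full = [[40, 20], [10, 30]]   # no empty cell
empty = [[40, 30], [0, 30]]   # nobody has u1 = 1 and u2 = 0
print(tetrachoric(full), tetrachoric(empty))
```

For the full table the estimate is an interior value. For the table with the empty (u1 = 1, u2 = 0) cell, rho = 1 together with the marginal thresholds reproduces the observed table exactly, so the likelihood keeps rising toward the upper edge of the grid; an unrestricted estimator would go all the way to one.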

The warning says that f5 either has a negative variance/residual variance which you can see in the results, a correlation greater or equal to one between it and another latent variable which you can see in TECH4, or a linear dependency with another latent variable. Any of these makes the model inadmissible.

Cecily Na posted on Saturday, February 05, 2011 - 12:22 pm

Hi,
I did polychoric correlation analysis using WLS and WLSMV respectively. How come I get the same polychoric correlations and standard errors for both estimators? Are they supposed to be the same? thanks!

Linda K. Muthen posted on Sunday, February 06, 2011 - 10:44 am

The sample statistics for model estimation are the same for both estimators.

Cecily Na posted on Thursday, July 12, 2012 - 8:06 am

Hello Linda,
I have a heavily skewed variable which I want to treat as categorical (ordered ordinal), but I don't think the variable has an underlying normal distribution. Is this a problem? I think I read somewhere that categorical variables are assumed to have underlying normal distributions; thus we can calculate, say, polychoric correlations.
Thanks!

Linda K. Muthen posted on Thursday, July 12, 2012 - 10:06 am

How many categories does the variable have?

Cecily Na posted on Saturday, July 14, 2012 - 1:39 pm

Hello,
Some are dichotomous, some have three or five categories.

Cecily Na posted on Saturday, July 14, 2012 - 1:44 pm

I'm sorry, a correction: one latent variable has four to five dichotomous indicators; the other latent variables have either three-level or five-level indicators. They're not supposed to be normally distributed underlyingly. Can I treat these indicators as categorical? Thanks!

Linda K. Muthen posted on Sunday, July 15, 2012 - 9:51 am

Categorical variables with floor or ceiling effects can have normal underlying latent response variables. See Slide 117 of the Topic 2 course handout on the website.

Categorical data methodology can deal with floor and ceiling effects.
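A minimal sketch of this point, assuming the usual latent response formulation: u* is standard normal and u falls in category k when u* lies between consecutive thresholds. The proportions (80/15/5 percent) are made up for illustration. The thresholds are inverse-normal cumulative proportions, so a strong floor effect shows up as thresholds far out in the upper tail rather than as non-normality of u*.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal latent response u*

def thresholds_from_proportions(props):
    """Thresholds tau_1 < ... < tau_{K-1} implied by K category proportions."""
    taus, cum = [], 0.0
    for p in props[:-1]:
        cum += p
        taus.append(STD.inv_cdf(cum))  # tau = Phi^{-1}(cumulative proportion)
    return taus

def category_probs(taus):
    """Category probabilities implied by thresholds on a standard normal u*."""
    cuts = [0.0] + [STD.cdf(t) for t in taus] + [1.0]
    return [hi - lo for lo, hi in zip(cuts, cuts[1:])]

# A heavily skewed three-category item: most respondents at the floor.
props = [0.80, 0.15, 0.05]
taus = thresholds_from_proportions(props)
print([round(t, 3) for t in taus])
print([round(p, 3) for p in category_probs(taus)])
```

Because K categories get K - 1 free thresholds, any marginal distribution of a single ordinal item is reproduced exactly, however skewed. The normality of u* only has testable content jointly, for example in the bivariate tables behind a polychoric correlation.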

Alden Gross posted on Wednesday, September 12, 2012 - 1:10 pm

Dear Drs. Muthen,
I need to run CFA models in Mplus from a correlation matrix. My indicators are categorical. If I feed Mplus with a polychoric correlation matrix instead of columns of real data, may I say I have done a CFA using categorical indicators, without having to specify that indicators are categorical? Or is the "categorical = u1 u2 u3;" subcommand under Variables doing something more than calculating polychoric/tetrachoric correlations among my indicators?

Linda K. Muthen posted on Thursday, September 13, 2012 - 10:48 am

It will be the same if you use the ULS estimator. With all other estimators it is not the same, because when you analyze the correlation matrix you are not using the weight matrix that the categorical analysis uses.
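A sketch of why ULS is the exception, using the standard least-squares fitting function. Here s stacks the sample thresholds and polychoric correlations, sigma(theta) is the model-implied counterpart, and Gamma-hat is an estimate of the asymptotic covariance matrix of s; the notation is chosen for this sketch and is not taken from the Mplus documentation.

```latex
F(\theta) = \bigl(s - \sigma(\theta)\bigr)^{\top} W^{-1} \bigl(s - \sigma(\theta)\bigr),
\qquad
W =
\begin{cases}
I & \text{ULS} \\
\widehat{\Gamma} & \text{WLS} \\
\operatorname{diag}\bigl(\widehat{\Gamma}\bigr) & \text{WLSMV (point estimates)}
\end{cases}
```

Only the ULS weight ignores Gamma-hat, so only ULS can give the same parameter estimates when a polychoric matrix is entered as if it were an ordinary correlation matrix. Standard errors and fit statistics also depend on Gamma-hat, so they would generally differ in any case.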

Rolf Gjestad posted on Wednesday, August 28, 2013 - 4:01 am

Dear Dr. Muthen

I installed the latest version of Mplus after using version 5.21 for some time. I then found that ordinal predictors that I had declared as categorical in earlier analyses could no longer be declared that way. In the earlier version, this declaration gave estimates of thresholds. Now, a message said that such a declaration is possible only for outcome variables, not predictors. I have used LISREL for several years, and I am now unsure whether polychoric correlations between two ordinal variables, as estimated with PRELIS, are possible in Mplus. Or does Mplus assume that a predictor variable is at the interval level even if it is actually an ordinal variable?

Thank you.

Bengt O. Muthen posted on Wednesday, August 28, 2013 - 6:16 pm

Yes, Mplus can estimate correlations between polytomous, ordered variables.

But, no, you don't want to specify covariates (predictors) as categorical because that involves unnecessarily strong model assumptions. This is explained in the Muthen (1984) Psychometrika article, where with covariates you want to take the "Case B" approach.

Yes, I would treat an ordinal covariate as continuous unless I had very strong reasons not to.