Sanjoy posted on Thursday, April 28, 2005 - 9:41 pm
Dear professor/s ... can we estimate polychoric correlations in Mplus ... what should my "analysis" and "output" commands be ... I couldn't find anything in the Mplus CD
I have six categorical variables, each with five categories
thanks and regards
Sanjoy posted on Thursday, April 28, 2005 - 9:45 pm
Oh! in connection to my earlier post ... I forgot to mention one thing ... none of them are covariates; all six are indicator outcome variables ... and they are categorical in nature. I suppose that in order to check the association among these six, we need to find the polychoric correlations ...
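[An input sketch for the question above; the data file name and the variable names U1-U6 are placeholders, not taken from the thread. With all variables on the CATEGORICAL list, TYPE=BASIC prints the sample statistics, which for categorical outcomes are the polychoric correlations.]

```
TITLE:    polychoric correlations among six categorical indicators
          (sketch; file and variable names are placeholders)
DATA:     FILE IS mydata.txt;
VARIABLE: NAMES ARE U1-U6;
          CATEGORICAL ARE U1-U6;
ANALYSIS: TYPE = BASIC;
```

No MODEL command is needed; the polychoric correlation matrix appears in the sample statistics section of the output.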
It seems right. Some parameters are free as the default. You can read about defaults in the Mplus User's Guide. If you want this parameter to be fixed to zero, say r WITH b@0;
The correlation between r and r1 is a biserial correlation. It is estimated from the sample statistics of the observed variables. You can think of the correlation between r and r1 as the correlation between the factor scores for r and the scores for r1 but factor scores are not actually computed in order to estimate the correlation between r and r1.
Sanjoy posted on Saturday, April 30, 2005 - 2:22 am
Thanks ... well madam, a mild confusion remains
1. Are we then calculating everything simultaneously in Mplus?
I mean the factor analysis part is OK (in regular textbook jargon, y = Lambda*eta + epsilon, along with threshold adjustment, since the Ri's and Bi's are all categorical)
R BY R7-R9; B BY B6-B8; R1 WITH R; B2 WITH B;
Now the vector "eta" is 2*1; one element is "R" and the other is "B", right ... R and B are our two continuous latent variables ...
The next two lines (the WITH statements) require two "polyserial" correlation calculations, one between R1 (5-category) and R, and the other between B2 (5-category) and B
So how is Mplus estimating these (asking about the program logic behind it) ... is it a two-step approach or some kind of full-information technique?
Thanks and regards
Sanjoy posted on Saturday, April 30, 2005 - 3:38 am
Madam, in connection to my previous post ... kindly check my output and please tell me why we are getting two DIFFERENT correlation matrices
Model 1: we are running everything simultaneously
TITLE:    polychoric test
DATA:     FILE IS d:\mpluspaper1.txt;
VARIABLE: NAMES ARE X1-X19 Y1-Y4 XB1-XB6 XP1-XP9 R1-R9 B1-B11 T1-T4;
          USEVARIABLES ARE R1 R7-R9 B2 B6-B8;
          CATEGORICAL ARE R1 R7-R9 B2 B6-B8;
MODEL:    R BY R7-R9;
          B BY B6-B8;
          R1 WITH R;
          B2 WITH B;
SAVEDATA: FILE IS d:\polychoric.txt;
          SAVE = FSCORES;
                 Estimates     S.E.  Est./S.E.
 R       BY
    R7             1.000      0.000     0.000
    R8             0.924      0.220     4.205
    R9             0.855      0.195     4.386
 B       BY
    B6             1.000      0.000     0.000
    B7             0.890      0.221     4.037
    B8             0.960      0.227     4.233
 R1      WITH
    R              0.369      0.074     4.972
 B2      WITH
    B              0.421      0.084     5.035
 B       WITH
    R              0.023      0.055     0.416
 Variances
    R              0.605      0.179     3.386
    B              0.579      0.195     2.965
Model 2: using the dataset "d:\polychoric.txt", which has the factor scores saved from model 1 ... here we are calculating only the polyserial correlations between R1 and R and between B2 and B (using TYPE=BASIC)
LOOK AT THE OUTPUT ... each value is different, the correlations as well as the variances of R and B
TITLE:    polyserial test between factor scores and R1 and B2
DATA:     FILE IS d:\polychoric.txt;
VARIABLE: NAMES ARE R1 R7-R9 B2 B6-B8 R B;
          USEVARIABLES ARE R1 B2 R B;
          CATEGORICAL ARE R1 B2;
ANALYSIS: PARAMETERIZATION = THETA;
          TYPE = BASIC;
CORRELATION MATRIX (WITH VARIANCES ON THE DIAGONAL)

           R1         B2         R          B
 R1
 B2      0.191
 R       0.353     -0.006      0.332
 B       0.224      0.405      0.040      0.316
Thanks and regards
bmuthen posted on Saturday, April 30, 2005 - 2:52 pm
The WLSMV estimator first computes a sample correlation matrix (tetrachoric, polychoric) and then fits the model to that, thereby estimating the model parameters. So the fitting of the model is similar to what would be done if the outcomes had been continuous. No factor score estimation is involved in this; the parameters are estimated directly.
If you instead estimate factor scores and then fit a model to a covariance matrix involving those estimated scores, you will get biased results. These biases are well-known in psychometrics and are due to the fact that estimated factor scores do not have the same variances or covariances with other variables as the true factors. See literature on factor score estimation in Psychometrika.
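[For readers less familiar with the psychometric background, the polychoric correlation that WLSMV takes as its first-stage sample statistic has a standard latent response variable definition; this is the general formulation, not anything Mplus-specific.]

```latex
% Each ordinal item u_j is a coarsened version of a latent response
% variable y_j^*, cut at thresholds \tau_{j,c}:
u_j = c \quad\Longleftrightarrow\quad \tau_{j,c} < y_j^* \le \tau_{j,c+1},
\qquad \tau_{j,0} = -\infty,\ \tau_{j,C_j} = +\infty .

% Each pair (y_j^*, y_k^*) is assumed bivariate standard normal with
% correlation \rho_{jk}; this \rho_{jk} is the polychoric correlation,
% estimated by maximizing the bivariate likelihood of the observed
% cross-classification frequencies n_{ab}:
\hat{\rho}_{jk} = \arg\max_{\rho}\ \sum_{a,b} n_{ab}\,
  \log \Pr\!\big(\tau_{j,a} < y_j^* \le \tau_{j,a+1},\;
  \tau_{k,b} < y_k^* \le \tau_{k,b+1} \,\big|\, \rho \big).
```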
Sanjoy posted on Saturday, April 30, 2005 - 10:41 pm
Thank you Professor ... I think I got your words, at least partially
1. In our model 1 we keep the idea of checking the correlation between R1 and R (the factor common to R7-R9); however, we are not calculating factor scores ... hence we circumvent the problems associated with factor score estimation, like Thurstone's method (validity maximization at the cost of non-orthogonality) or Anderson's procedure (which ensures orthogonality but lacks determinacy), and so on
2. In model 2, instead of R, we are using an "estimated R", which itself incorporates some measurement error, and hence we end up with some bias when calculating the correlation between R and R1 in the second step ... am I right?
I never had psychometrics; my majors were Statistics and Economics, so my acquaintance with the psychometric literature is very limited ... could you please refer me to one seminal article, like yours (1984, 1983), so that I can understand the basic nuances of, and solutions to, the factor score estimation problem ... I'm relatively comfortable with mathematical rigor
Yes, for an unconditional model using weighted least squares regression. For a conditional model, the sample statistics used for model estimation are the thresholds, probit regression coefficients, and residual polychoric correlations.
Hi, I was wondering if there is an adequate procedure to obtain the polychoric correlation between two variables with underlying non-normal distributions that have, in addition, been censored in the middle (so that only the extremes are used) and dichotomized? Thanks a lot, Sanja
Cecily Na posted on Wednesday, December 15, 2010 - 3:41 am
Dear Linda, I did an SEM with WLSMV. I suppose the correlation matrix in the output before the model estimation is the polychoric matrix of the variables? Why, on the diagonal, is the correlation not 1 but very close to 1? I am copying the diagonal of the correlation matrix from the output, all with non-1 values: 0.851 0.993 0.998 0.994 0.747 0.744 0.985
Thank you very much!
Cecily Na posted on Wednesday, December 15, 2010 - 5:47 am
Dear Linda, a follow-up to my previous post. I think I mistook the covariance coverage of the data for the correlation matrix, so there shouldn't be any confusion regarding it. I would like to know what the covariance coverage of the data in the output is. Thank you very much for your time.
I have one simple question on obtaining tetrachoric/biserial correlations. I have tried the "TYPE=BASIC" command, as you pointed out. In addition, I also tried the MODEL command along with "SAMPSTAT". The correlation matrices are somewhat different, and I was wondering why this occurs?
Eric Deemer posted on Saturday, December 28, 2013 - 4:58 pm
Hello, I'd like to save the correlation matrix for a set of ordered categorical variables using the SAMPLE option. Just to be sure, will the saved matrix consist of polychoric or tetrachoric correlations? Many thanks.
If the variables are on the CATEGORICAL list, the correlations will be polychoric correlations.
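[A minimal input along the following lines saves that matrix; the data file name, variable names, and output file name here are placeholders. The SAMPLE option of the SAVEDATA command writes the sample correlation matrix, which consists of polychoric correlations when the analysis variables are on the CATEGORICAL list.]

```
TITLE:    save polychoric correlations (sketch; names are placeholders)
DATA:     FILE IS mydata.txt;
VARIABLE: NAMES ARE u1-u6;
          CATEGORICAL ARE u1-u6;
ANALYSIS: TYPE = BASIC;
SAVEDATA: SAMPLE = polychor.dat;
```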
Eric Deemer posted on Saturday, December 28, 2013 - 7:53 pm
Great, thanks Linda!
Lars Bocker posted on Tuesday, January 28, 2014 - 8:39 am
I understand the CATEGORICAL list is for dependent variables only, and my independent dummy variables are read by Mplus as continuous. If I understand it correctly, the correlation matrix then contains polychoric correlations between the dependent variables, but not between the dependent and the independent variables. I am asking because I am trying to understand differences in model outcomes between Mplus and LISREL, which we found are based on a different correlation matrix in LISREL, where we specified the independent variables as categorical as well.
Do you know if it would be possible, or necessary, to also specify the measurement level of the independent variables in Mplus?
You can treat the independent variables as dependent variables in Mplus and put them on the CATEGORICAL list. In regression analysis, however, the model is estimated conditioned on the independent variables. Treating them as dependent variables is not advantageous. See the Muthen 1984 Psychometrika article, where Case A is compared to Case B.
Nara Jang posted on Friday, March 28, 2014 - 3:16 am
Dear Dr. Muthen,
Would you tell me what type of correlation I need to use for mixed variables such as binary, ordinal, rank, and continuous variables?
Thank you so much for your expert advice in advance!
Dear Dr. Muthen, above you mention that polychoric covariance matrices do not exist. However, I have found several references online that describe at least a method to estimate such a matrix, e.g., in Bollen & Curran (2005) on page 238, as a rescaled polychoric correlation matrix (using the standard deviations/variances of the variables in their original form for the rescaling). Unfortunately, I do not have access to the entire book, but I wonder: if the polychoric covariance matrix is estimated that way, can any model be applied to that matrix without having to worry about scale invariance (Cudeck, 1989), etc.?
The term polychoric refers to correlations, not covariances as there is typically no information on variances for categorical items. A special approach to multiple-group and longitudinal modeling with ordinal data has been described by Joreskog which uses the term polychoric covariance matrix and that term is picked up in the Bollen-Curran book. I have criticized this approach in my Mplus Web Note 4, see section 8. A better approach is available in Mplus using WLSMV and the default Delta parameterization. Alternatively, maximum-likelihood estimation can be used, bypassing the limited-information polychorics.
Thank you for your fast response! I have read section 8 of note 4 you are referring to. I am still having trouble recognizing the direct connection between estimation of a polychoric covariance matrix and the issues discussed in aforementioned note, and I apologize for asking you to bear with me. The main criticism of Joreskog's ideas seems to lie with assumed threshold invariance, and I do not understand the role of polychoric covariance estimation in that.
Aside from that, I agree that typically the calculation of means and standard deviations for categorical variables (measured on an ordinal level) is inappropriate. However, when calculating polychoric correlations, are we not estimating a property for an underlying variable with a number of assumed properties, including interval level measurement? And could we therefore assume that means and variances based on the values of the categorical variables are our best estimates for means and variances of the underlying continuous variables?
Yes, the issue is the means and, in this case in particular, the variances of the underlying continuous-normal latent response variables. So in principle such variances are well-defined. It is just a matter of which assumptions you are willing to make in order to identify those variances. That's what my web note deals with.