Polychoric correlation

Sanjoy posted on Thursday, April 28, 2005 - 3:41 pm

Dear professors, can we compute polychoric correlations in Mplus? What should my ANALYSIS and OUTPUT commands be? I couldn't find anything about this on the Mplus CD.

I have six categorical variables, each on a five-point scale.

thanks and regards

Sanjoy posted on Thursday, April 28, 2005 - 3:45 pm

Oh, in connection with my earlier post, I forgot to mention one thing: none of them are covariates. All six are indicator (outcome) variables, and they are categorical in nature. I suppose that to check the association among these six we need the polychoric correlations.

thanks

Linda K. Muthen posted on Thursday, April 28, 2005 - 5:16 pm

If you put your outcomes on the CATEGORICAL list in the VARIABLE command and ask for TYPE = BASIC; in the ANALYSIS command without the MODEL command, you will get polychoric correlations.
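
A minimal input sketch of this setup, assuming a hypothetical data file mydata.dat that holds six five-category items named u1-u6 (the file name and variable names are placeholders, not taken from the posts in this thread):

```mplus
TITLE:    Polychoric correlations via TYPE = BASIC (sketch)
DATA:     FILE IS mydata.dat;      ! hypothetical file name
VARIABLE: NAMES ARE u1-u6;         ! hypothetical item names
          CATEGORICAL ARE u1-u6;   ! treat all six as ordered categorical
ANALYSIS: TYPE = BASIC;            ! sample statistics only
! no MODEL command is given
```

With all six items on the CATEGORICAL list, the correlation matrix printed in the output should be the polychoric one.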

Sanjoy posted on Friday, April 29, 2005 - 5:06 pm

Thank you madam, it worked nicely ...

Two very quick questions before the weekend starts

Q1. Kindly tell me whether my code is correct. Below is what I want to do.

I have a latent factor "R" measured by R7-R9.
I have a latent factor "B" measured by B6-B8.

I want to check whether another indicator, "R1", is related to the latent factor "R", and similarly "B2" to "B".

Below is my code:

DATA: FILE IS d:\mpluspaper1.txt;
VARIABLE:
NAMES ARE X1-X19 Y1-Y4 XB1-XB6 XP1-XP9 R1-R9 B1-B11 T1-T4;
USEVARIABLES ARE R1 R7-R9 B2 B6-B8;
CATEGORICAL ARE R1 R7-R9 B2 B6-B8;

ANALYSIS: PARAMETERIZATION=THETA;
ESTIMATOR=WLSMV;

MODEL: R BY R7-R9;
B BY B6-B8;
R1 WITH R;
B2 WITH B;

The output is OK, and in fact matches my expectation. I just want to make sure I have done the correct thing.

* Though I have NOT asked for the correlation between "R" and "B", Mplus reports it in the output as well. Why?

Q2. I understand the maths behind factor analysis, but how does Mplus measure the correlation between our latent "R" and the categorical "R1"?

Does it use the estimated values of the factor, or something else?

Linda K. Muthen posted on Friday, April 29, 2005 - 5:29 pm

It seems right. Some parameters are free as the default. You can read about defaults in the Mplus User's Guide. If you want this parameter to be fixed to zero, say r WITH b@0;

The correlation between r and r1 is a polyserial correlation (r1 has five categories; with a binary variable it would be a biserial correlation). It is estimated from the sample statistics of the observed variables. You can think of the correlation between r and r1 as the correlation between the factor scores for r and the scores for r1, but factor scores are not actually computed in order to estimate the correlation between r and r1.
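
Applied to the model posted above, fixing the factor covariance to zero might look like the following sketch. Only the last line is new; the rest repeats the MODEL command from Sanjoy's post.

```mplus
MODEL:  R BY R7-R9;
        B BY B6-B8;
        R1 WITH R;
        B2 WITH B;
        R WITH B@0;   ! factor covariance is free by default; fix it at zero
```

Leaving the last line out keeps the default, in which the covariance between R and B is estimated and reported.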

Sanjoy posted on Friday, April 29, 2005 - 8:22 pm

Thanks. Well madam, a mild confusion remains.

1. Are we then calculating everything simultaneously in Mplus?

I mean, the factor analysis part is OK (in regular textbook notation, y = delta*eta + epsilon, along with threshold adjustment since the Ri's and Bi's are all categorical):

R BY R7-R9;
B BY B6-B8;
R1 WITH R;
B2 WITH B;

Now the vector "eta" is 2*1; one element is "R" and the other is "B", right? R and B are our two continuous latent variables.

The next two lines (the WITH statements) require two "polyserial" correlations, one between R1 (5-category) and R, and the other between B2 (5-category) and B.

So how is Mplus computing this (I am asking about the program logic behind it)? Is it a two-step or some kind of full-information technique?

Thanks and regards

Sanjoy posted on Friday, April 29, 2005 - 9:38 pm

Madam, in connection with my previous post, kindly check my output and tell me why we get two DIFFERENT correlation matrices.

Model 1: we are running everything simultaneously

TITLE: polychoric test
DATA: FILE IS d:\mpluspaper1.txt;
VARIABLE:
NAMES ARE X1-X19 Y1-Y4 XB1-XB6 XP1-XP9 R1-R9 B1-B11 T1-T4;
USEVARIABLES ARE R1 R7-R9 B2 B6-B8;
CATEGORICAL ARE R1 R7-R9 B2 B6-B8;

ANALYSIS: PARAMETERIZATION=THETA;
ESTIMATOR=WLSMV;

MODEL: R BY R7-R9;
B BY B6-B8;
R1 WITH R;
B2 WITH B;

SAVEDATA:
SAVE=FSCORES;

file is d:\polychoric.txt;

MODEL RESULTS

              Estimates     S.E.  Est./S.E.

 R        BY
    R7            1.000    0.000      0.000
    R8            0.924    0.220      4.205
    R9            0.855    0.195      4.386

 B        BY
    B6            1.000    0.000      0.000
    B7            0.890    0.221      4.037
    B8            0.960    0.227      4.233

 R1       WITH
    R             0.369    0.074      4.972

 B2       WITH
    B             0.421    0.084      5.035

 B        WITH
    R             0.023    0.055      0.416

 Variances
    R             0.605    0.179      3.386
    B             0.579    0.195      2.965

Model 2: using the data set "d:\polychoric.txt", which contains the factor scores saved from model 1. Here we calculate only the polyserial correlations between R1 and R and between B2 and B (using TYPE=BASIC).

LOOK AT THE OUTPUT: every value is different, the correlations as well as the variances of R and B.

TITLE: Polyserial test between factor scores and R1 and B2
DATA: FILE IS d:\polychoric.txt;
VARIABLE:
NAMES ARE R1 R7-R9 B2 B6-B8 R B;
USEVARIABLE ARE R1 B2 R B;
CATEGORICAL ARE R1 B2;

ANALYSIS: PARAMETERIZATION=THETA;
TYPE=BASIC;

MODEL RESULTS

CORRELATION MATRIX (WITH VARIANCES ON THE DIAGONAL)

           R1       B2        R        B
R1
B2      0.191
R       0.353   -0.006    0.332
B       0.224    0.405    0.040    0.316

Thanks and regards

bmuthen posted on Saturday, April 30, 2005 - 8:52 am

The WLSMV estimator first computes a sample correlation matrix (tetrachoric, polychoric) and then fits the model to that, thereby estimating the model parameters. So the fitting of the model is similar to what is done if the outcomes had been continuous. No factor score estimation is involved in this, but the parameters are estimated directly.

If you instead estimate factor scores and then fit a model to a covariance matrix involving those estimated scores, you will get biased results. These biases are well-known in psychometrics and are due to the fact that estimated factor scores do not have the same variances or covariances with other variables as the true factors. See literature on factor score estimation in Psychometrika.
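
The variance shrinkage described here can be sketched for the simpler linear case. The following assumes continuous, mean-centered indicators and regression-method factor scores, which is an illustrative simplification and not necessarily how Mplus computes factor scores for categorical indicators. Write the model as y = Λη + ε with Var(η) = Ψ, Var(ε) = Θ, and Σ = ΛΨΛ' + Θ (Λ plays the role of "delta" in Sanjoy's notation). Then:

```latex
\hat{\eta} = \Psi \Lambda^{\top} \Sigma^{-1} y,
\qquad
\operatorname{Var}(\hat{\eta})
  = \Psi \Lambda^{\top} \Sigma^{-1} \Lambda \Psi
  = \Psi - \left( \Psi^{-1} + \Lambda^{\top} \Theta^{-1} \Lambda \right)^{-1}
```

Because the subtracted matrix is positive definite, the variance of the estimated scores is smaller than the factor variance Ψ under these assumptions. This agrees in direction with the outputs above, where R has an estimated variance of 0.605 in model 1 but the saved scores have a variance of 0.332 in model 2.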

Sanjoy posted on Saturday, April 30, 2005 - 4:41 pm

Thank you Professor. I think I follow you, at least partially.

1. In model 1 we keep the idea of checking the correlation between R1 and R (the common factor of R7-R9), but we do not calculate factor scores. Hence we circumvent the problems associated with factor score calculation, such as Thurstone's validity maximization, which comes at the cost of non-orthogonality, or Anderson's method, which ensures orthogonality but lacks determinacy, and so on.

2. In model 2, instead of R we use "estimated R", which itself incorporates some measurement error, and hence we end up with some bias when calculating the correlation between R and R1 in the second step. Am I right?

I never studied psychometrics; my majors were statistics and economics, so my acquaintance with the psychometric literature is minimal. Could you please refer me to one seminal article, like yours (1984, 1983), so that I can understand the basic nuances of, and solutions to, the factor score calculation problem? I'm relatively comfortable with mathematical rigor.

Thanks and regards

bmuthen posted on Sunday, May 01, 2005 - 5:05 pm

Sounds like you got that right. As for factor score literature, search for Skrondal's Psychometrika article in the last 5 years.

Sanjoy posted on Sunday, May 01, 2005 - 6:02 pm

Thank you professor ... I will look for his articles ... regards

Sanja Franic posted on Tuesday, April 07, 2009 - 3:07 am

If I specify all indicator variables as ordinal, does MPlus calculate (and perform all subsequent calculations on) polychoric correlation matrices by default?

Linda K. Muthen posted on Tuesday, April 07, 2009 - 10:18 am

Yes, for an unconditional model using weighted least squares regression. For a conditional model, the sample statistics used for model estimation are the thresholds, probit regression coefficients, and residual polychoric correlations.

Andrea Vocino posted on Sunday, October 04, 2009 - 9:21 pm

I am trying to estimate the polychoric asymptotic covariance matrix in text format in Mplus 5.21 and am wondering whether the following syntax is appropriate. Thanks in advance.

TITLE: This is the Mplus syntax to extract
polychoric asympt cov matrix in text format

DATA: FILE IS c:\tetrad\file.txt;

VARIABLE: NAMES ARE q83 q84 q85 q88 q89 q90 q91;
CATEGORICAL q83 q84 q85 q88 q89 q90 q91;

ANALYSIS: TYPE = GEN;
ESTIMATOR = WLS;
MODEL: q83-q90 WITH q91;
q83-q89 WITH q90;
q83-q88 WITH q89;
q83-q85 WITH q88;
q83-q84 WITH q85;
q83 WITH q84;

SAVEDATA: tech3 is Jason22.acm;
OUTPUT: SAMPSTAT;

Linda K. Muthen posted on Monday, October 05, 2009 - 8:02 am

That should do it.

Sanja Franic posted on Tuesday, August 03, 2010 - 8:00 am

Hi, I was wondering whether there is an adequate procedure to obtain the polychoric correlation between two variables with underlying non-normal distributions that have, in addition, been censored in the middle (so that only the extremes are used) and dichotomized.
Thanks a lot,
Sanja

Linda K. Muthen posted on Tuesday, August 03, 2010 - 8:56 am

I am not aware of such a procedure.

Cecily Na posted on Tuesday, December 14, 2010 - 9:41 pm

Dear Linda,
I did an SEM with WLSMV. I suppose the correlation matrix in the output before the model estimation is the polychoric matrix of the variables? Why are the values on the diagonal not 1, but very close to 1?
I am copying the diagonal of the correlation matrix from the output, all with non-1 values.
0.851
0.993
0.998
0.994
0.747
0.744
0.985

Thank you very much!

Cecily Na posted on Tuesday, December 14, 2010 - 11:47 pm

Dear Linda,
A follow-up of my previous post. I think I mistook the covariance coverage of data for correlation matrix. So there shouldn't be any confusion regarding it.
I would like to know what the covariance coverage of data in the output is.
Thank you very much for your time.

Linda K. Muthen posted on Wednesday, December 15, 2010 - 6:10 am

It tells you the proportion of observations with no missing data for that variable (diagonal) or pair of variables (off-diagonal).

Cecily Na posted on Wednesday, December 15, 2010 - 8:34 am

Dear Linda,
Thanks! When I use WLSMV, the correlation matrix in the output before the model estimates should contain the polychoric correlations of the observed variables, right?
Thanks!

Linda K. Muthen posted on Wednesday, December 15, 2010 - 8:40 am

If you ask for SAMPSTAT and put the ordered polytomous variables on the CATEGORICAL list, the correlations for those variables are polychoric correlations.
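
A minimal sketch of such an input, assuming a hypothetical data file, seven hypothetical ordinal items u1-u7, and a placeholder one-factor model (none of these come from Cecily's actual analysis):

```mplus
DATA:     FILE IS mydata.dat;      ! hypothetical file name
VARIABLE: NAMES ARE u1-u7;         ! hypothetical item names
          CATEGORICAL ARE u1-u7;   ! ordered polytomous items
ANALYSIS: ESTIMATOR = WLSMV;
MODEL:    f BY u1-u7;              ! placeholder one-factor model
OUTPUT:   SAMPSTAT;                ! request sample statistics
```

The sample statistics section of the output, which appears before the model results, is where the polychoric correlations among u1-u7 would be found.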

Cecily Na posted on Saturday, February 05, 2011 - 11:11 am

Dear Linda,
I used WLS to generate a polychoric covariance matrix. Why could I get only the correlation matrix and not the covariance matrix? What command can I use?
Thanks a lot!

Linda K. Muthen posted on Sunday, February 06, 2011 - 10:45 am

There is no such thing.

Byungbae Kim posted on Sunday, July 01, 2012 - 5:12 pm

Hi Dr. Muthen,

I have one simple question on obtaining tetrachoric/biserial correlations.
I have tried the "type=basic" command, as you pointed out. In addition, I also tried the "model" command along with "sampstat". The correlation matrices are somewhat different, and I was wondering why this occurs.

Thank you.

Linda K. Muthen posted on Sunday, July 01, 2012 - 7:57 pm

Please send the two outputs and your license number to support@statmodel.com.

Eric Deemer posted on Saturday, December 28, 2013 - 10:58 am

Hello,
I'd like to save the correlation matrix for a set of ordered categorical variables using the SAMPLE option. Just to be sure, will the saved matrix consist of polychoric or tetrachoric correlations? Many thanks.

Eric

Linda K. Muthen posted on Saturday, December 28, 2013 - 1:40 pm

If the variables are on the CATEGORICAL list, the correlations will be polychoric correlations.
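
A minimal sketch of saving the matrix with the SAMPLE option, assuming hypothetical file names and five hypothetical ordinal items u1-u5. TYPE = BASIC is used here only to keep the example short; the same SAVEDATA line can accompany a MODEL command.

```mplus
DATA:     FILE IS mydata.dat;      ! hypothetical input file
VARIABLE: NAMES ARE u1-u5;         ! hypothetical item names
          CATEGORICAL ARE u1-u5;   ! makes the correlations polychoric
ANALYSIS: TYPE = BASIC;
SAVEDATA: SAMPLE IS polycorr.dat;  ! hypothetical output file name
```

If some of the items were binary, the corresponding entries in the saved matrix would be tetrachoric correlations, the two-category special case of the polychoric correlation.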

Eric Deemer posted on Saturday, December 28, 2013 - 1:53 pm

Great, thanks Linda!

Eric

Lars Bocker posted on Tuesday, January 28, 2014 - 2:39 am

Dear Linda,

I understand the CATEGORICAL list is for dependent variables only and my independent dummy variables are read by Mplus as continuous. If I understand it correctly the correlation matrix then estimates polychoric correlations between the dependent variables, but not between the dependent and the independent variables. I am asking this because I am trying to understand differences in model outcomes between Mplus and LISREL, which we found out are based on a different correlation matrix in LISREL, in which we specified also the independent variables as categorical.

Do you know if it would be possible, or necessary, to also specify the measurement level of independent variables in Mplus?

Linda K. Muthen posted on Tuesday, January 28, 2014 - 12:01 pm

You can treat the independent variables as dependent variables in Mplus and put them on the CATEGORICAL list. In regression analysis, however, the model is estimated conditioned on the independent variables. Treating them as dependent variables is not advantageous. See the Muthen 1984 Psychometrika article where Case A is compared to Case B.

Nara Jang posted on Thursday, March 27, 2014 - 10:16 pm

Dear Dr. Muthen,

Would you tell me what type of correlation I need to use for mixed variables, such as binary, ordinal, rank, and continuous variables?

Thank you so much for your expert advice in advance!

Best regards,
Nara Jang

Linda K. Muthen posted on Friday, March 28, 2014 - 12:19 pm

A FAQ called Correlations with Categorical Variables will be posted on the website this afternoon.

Nara Jang posted on Saturday, March 29, 2014 - 11:58 am

Dear Dr. Muthen,

Thank you very much for your expert help!

Best regards,
Nara Jang

Miriam Kraatz posted on Tuesday, September 30, 2014 - 2:23 pm

Dear Dr. Muthen, above you mention that polychoric covariance matrices do not exist. However, I have found several references online that describe at least a method to estimate such, e.g., in Bollen & Curran (2005) on page 238 as a rescaled polychoric correlation matrix (using standard deviations/variances of the variables in their original form for the rescaling). Unfortunately, I do not have access to the entire book, but I wonder: If the polychoric covariance matrix is estimated that way, can any model be applied to that matrix without having to worry about scale invariance (Cudeck, 1989) etc.?

Sincerely,

Miriam Kraatz

Bengt O. Muthen posted on Tuesday, September 30, 2014 - 3:02 pm

The term polychoric refers to correlations, not covariances as there is typically no information on variances for categorical items. A special approach to multiple-group and longitudinal modeling with ordinal data has been described by Joreskog which uses the term polychoric covariance matrix and that term is picked up in the Bollen-Curran book. I have criticized this approach in my Mplus Web Note 4, see section 8. A better approach is available in Mplus using WLSMV and the default Delta parameterization. Alternatively, maximum-likelihood estimation can be used, bypassing the limited-information polychorics.

Miriam Kraatz posted on Wednesday, October 01, 2014 - 1:57 pm

Thank you for your fast response! I have read section 8 of note 4 you are referring to.
I am still having trouble recognizing the direct connection between estimation of a polychoric covariance matrix and the issues discussed in aforementioned note, and I apologize for asking you to bear with me. The main criticism of Joreskog's ideas seems to lie with assumed threshold invariance, and I do not understand the role of polychoric covariance estimation in that.

Aside from that, I agree that typically the calculation of means and standard deviations for categorical variables (measured on an ordinal level) is inappropriate. However, when calculating polychoric correlations, are we not estimating a property for an underlying variable with a number of assumed properties, including interval level measurement? And could we therefore assume that means and variances based on the values of the categorical variables are our best estimates for means and variances of the underlying continuous variables?

--- mk

Bengt O. Muthen posted on Wednesday, October 01, 2014 - 5:42 pm

Yes, the issue is the means and, in this case in particular, the variances of the underlying continuous-normal latent response variables. So in principle such variances are well-defined. It is just a matter of which assumptions you are willing to make in order to identify those variances. That's what my web note deals with.