
Sanjoy posted on Thursday, April 28, 2005  3:41 pm



Dear professors, can we compute polychoric correlations in Mplus? What should my ANALYSIS and OUTPUT commands be? I couldn't find an example on the Mplus CD. I have six five-category categorical variables. Thanks and regards

Sanjoy posted on Thursday, April 28, 2005  3:45 pm



Oh! In connection with my earlier post, I forgot to mention one thing: none of them are covariates. All six are indicator outcome variables, and they are categorical in nature. I suppose that in order to check the association among these six, we need to find the polychoric correlations. Thanks


If you put your outcomes on the CATEGORICAL list in the VARIABLE command and ask for TYPE = BASIC; in the ANALYSIS command without the MODEL command, you will get polychoric correlations. 
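A minimal input along these lines might look like the following sketch. The file name items.dat and the variable names u1-u6 are placeholders chosen for illustration; they do not come from the thread.

```mplus
TITLE: Polychoric correlations for six ordinal items
DATA: FILE IS items.dat;        ! hypothetical file name
VARIABLE: NAMES ARE u1-u6;      ! hypothetical variable names
  CATEGORICAL ARE u1-u6;        ! treat all six outcomes as ordered categorical
ANALYSIS: TYPE = BASIC;         ! sample statistics only, no MODEL command
```

With all six variables on the CATEGORICAL list and no MODEL command, the sample statistics printed in the output should be the polychoric correlations described above.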

Sanjoy posted on Friday, April 29, 2005  5:06 pm



Thank you madam, it worked nicely. Two very quick questions before the weekend starts.
Q1. Kindly tell me whether my code is correct. Here is what I want to do: the latent factor "R" is measured by R7-R9 and the latent factor "B" is measured by B6-B8. I WANT to check whether another indicator, "R1", is related to the latent factor "R", and similarly "B1" with "B". Below is my code:
    DATA: FILE IS d:\mpluspaper1.txt;
    VARIABLE: NAMES ARE X1-X19 Y1-Y4 XB1-XB6 XP1-XP9 R1-R9 B1-B11 T1-T4;
      USEVARIABLES ARE R1 R7-R9 B2 B6-B8;
      CATEGORICAL ARE R1 R7-R9 B2 B6-B8;
    ANALYSIS: PARAMETERIZATION=THETA;
      ESTIMATOR=WLSMV;
    MODEL: R BY R7-R9;
      B BY B6-B8;
      R1 WITH R;
      B2 WITH B;
The output is OK, in fact in line with my expectations. I just want to make sure I have done the correct thing. Also, though I have NOT asked for the correlation between "R" and "B", Mplus reports it in the output. WHY?
Q2. I understand the maths behind factor analysis, but how does Mplus measure the correlation between our latent "R" and the other categorical variable "R1"? Does it use the "estimated value of the factor", or something else?


It seems right. Some parameters are free as the default. You can read about defaults in the Mplus User's Guide. If you want this parameter to be fixed to zero, say r WITH b@0; The correlation between r and r1 is a biserial correlation. It is estimated from the sample statistics of the observed variables. You can think of the correlation between r and r1 as the correlation between the factor scores for r and the scores for r1 but factor scores are not actually computed in order to estimate the correlation between r and r1. 
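Applied to the model in the question, that suggestion would look roughly like the sketch below. Only the last line is new; the rest repeats the poster's MODEL command, with the variable ranges written out with hyphens.

```mplus
MODEL: R BY R7-R9;
  B BY B6-B8;
  R1 WITH R;
  B2 WITH B;
  R WITH B@0;    ! fix the factor covariance at zero instead of estimating it by default
```

Whether fixing this covariance at zero is sensible is a substantive modeling decision; the sketch only shows where the @0 restriction would go.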

Sanjoy posted on Friday, April 29, 2005  8:22 pm



Thanks. Well madam, a mild confusion remains.
1. Are we then calculating everything simultaneously in Mplus? I mean, the factor analysis part is OK (in regular textbook jargon, y = delta*eta + epsilon, along with threshold adjustment since the Ri's and Bi's are all categorical):
    R BY R7-R9;
    B BY B6-B8;
    R1 WITH R;
    B2 WITH B;
Now the vector "eta" is 2 x 1; one element is "R" and the other is "B", right? R and B are our two continuous latent variables. The next two lines (the WITH statements) require the calculation of two "polyserial" correlations, one for R1 (5-category categorical) and R, and the other for B1 (5-category categorical) and B. So how is Mplus computing this (I am asking for the program logic behind it)? Is it a two-step or some kind of full-information technique?
Thanks and regards

Sanjoy posted on Friday, April 29, 2005  9:38 pm



Madam, in connection with my previous post, kindly check my output and tell me why we are getting two DIFFERENT correlation matrices.
Model 1: we are running everything simultaneously
    TITLE: polychoric test
    DATA: FILE IS d:\mpluspaper1.txt;
    VARIABLE: NAMES ARE X1-X19 Y1-Y4 XB1-XB6 XP1-XP9 R1-R9 B1-B11 T1-T4;
      USEVARIABLES ARE R1 R7-R9 B2 B6-B8;
      CATEGORICAL ARE R1 R7-R9 B2 B6-B8;
    ANALYSIS: PARAMETERIZATION=THETA;
      ESTIMATOR=WLSMV;
    MODEL: R BY R7-R9;
      B BY B6-B8;
      R1 WITH R;
      B2 WITH B;
    SAVEDATA: SAVE=FSCORES;
      file is d:\polychoric.txt;
    MODEL RESULTS
                   Estimates    S.E.   Est./S.E.
    R BY
      R7               1.000   0.000     0.000
      R8               0.924   0.220     4.205
      R9               0.855   0.195     4.386
    B BY
      B6               1.000   0.000     0.000
      B7               0.890   0.221     4.037
      B8               0.960   0.227     4.233
    R1 WITH
      R                0.369   0.074     4.972
    B2 WITH
      B                0.421   0.084     5.035
    B WITH
      R                0.023   0.055     0.416
    Variances
      R                0.605   0.179     3.386
      B                0.579   0.195     2.965
Model 2: using the data set "d:\polychoric.txt", which has the factor scores saved from model 1. Here we are calculating only the polyserial correlations between R1 and R and between B2 and B (using TYPE=BASIC). LOOK AT THE OUTPUT: each value is different, the correlations as well as the variances of R and B.
    TITLE: Polyserial test between factor scores and R1 and B2
    DATA: FILE IS d:\polychoric.txt;
    VARIABLE: NAMES ARE R1 R7-R9 B2 B6-B8 R B;
      USEVARIABLES ARE R1 B2 R B;
      CATEGORICAL ARE R1 B2;
    ANALYSIS: PARAMETERIZATION=THETA;
      TYPE=BASIC;
    MODEL RESULTS
    CORRELATION MATRIX (WITH VARIANCES ON THE DIAGONAL)
               R1       B2       R        B
    R1
    B2       0.191
    R        0.353    0.006    0.332
    B        0.224    0.405    0.040    0.316
Thanks and regards

bmuthen posted on Saturday, April 30, 2005  8:52 am



The WLSMV estimator first computes a sample correlation matrix (tetrachoric, polychoric) and then fits the model to that, thereby estimating the model parameters. So the fitting of the model is similar to what is done if the outcomes had been continuous. No factor score estimation is involved in this; the parameters are estimated directly. If you instead estimate factor scores and then fit a model to a covariance matrix involving those estimated scores, you will get biased results. These biases are well-known in psychometrics and are due to the fact that estimated factor scores do not have the same variances or covariances with other variables as the true factors. See the literature on factor score estimation in Psychometrika.
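The source of the bias can be sketched in the simplest case. The setup below is an illustrative assumption that goes beyond the thread: a single factor with continuous, mean-centered indicators $y = \Lambda\eta + \varepsilon$, $\mathrm{Var}(\eta) = \psi$, $\mathrm{Var}(\varepsilon) = \Theta$ positive definite, regression-method factor scores, and an outside variable $x$ that is related to $y$ only through $\eta$. The categorical case in the thread is not identical, but the direction of the effect is the same.

```latex
\begin{aligned}
\Sigma &= \mathrm{Var}(y) = \Lambda\psi\Lambda^{\top} + \Theta, \qquad
\hat\eta = \psi\,\Lambda^{\top}\Sigma^{-1}y, \qquad
\rho = \psi\,\Lambda^{\top}\Sigma^{-1}\Lambda, \quad 0 < \rho < 1,\\[4pt]
\mathrm{Var}(\hat\eta) &= \psi\,\Lambda^{\top}\Sigma^{-1}\Lambda\,\psi = \rho\,\psi \;<\; \psi,\\[4pt]
\mathrm{Cov}(\hat\eta, x) &= \psi\,\Lambda^{\top}\Sigma^{-1}\,\mathrm{Cov}(y, x)
  = \psi\,\Lambda^{\top}\Sigma^{-1}\Lambda\,\mathrm{Cov}(\eta, x) = \rho\,\mathrm{Cov}(\eta, x),\\[4pt]
\mathrm{Corr}(\hat\eta, x) &= \sqrt{\rho}\;\mathrm{Corr}(\eta, x).
\end{aligned}
```

Under these assumptions the variance of the scores, their covariance with $x$, and their correlation with $x$ are all shrunk relative to the true factor. The poster's output points the same way: the saved scores for R have variance 0.332, against a model-estimated factor variance of 0.605.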

Sanjoy posted on Saturday, April 30, 2005  4:41 pm



Thank you Professor. I think I got your point, at least partially.
1. In our model 1 we keep the idea of checking the correlation between R1 and R (which is the common factor of R7-R9), but we are not calculating factor scores. Hence we circumvent the problems associated with factor score calculation, like Thurstone's validity maximization at the cost of non-orthogonality, or Anderson's procedure, which ensures orthogonality but lacks determinacy, and so on.
2. In model 2, instead of R we are using "estimated R", which itself incorporates some measurement error, and hence we end up with some bias when calculating the correlation between R and R1 at the second step. Am I right?
I never studied psychometrics; my major was statistics and economics, so my acquaintance with the psychometric literature is minimal. Could you please refer me to one seminal article, like yours (1984, 1983), so that I can understand the basic nuances and the solution of the factor score calculation problem? I'm relatively comfortable with mathematical rigor.
Thanks and regards

bmuthen posted on Sunday, May 01, 2005  5:05 pm



Sounds like you got that right. As for factor score literature, search for Skrondal's Psychometrika article in the last 5 years. 

Sanjoy posted on Sunday, May 01, 2005  6:02 pm



Thank you professor ... I will look for his articles ... regards 


If I specify all indicator variables as ordinal, does MPlus calculate (and perform all subsequent calculations on) polychoric correlation matrices by default? 


Yes, for an unconditional model using weighted least squares regression. For a conditional model, the sample statistics used for model estimation are the thresholds, probit regression coefficients, and residual polychoric correlations. 


I am trying to estimate the polychoric asymptotic covariance matrix in text format in Mplus 5.21 and am wondering whether the following syntax is appropriate. Thanks in advance.
    TITLE: This is the Mplus syntax to extract polychoric asympt cov matrix in text format
    DATA: FILE IS c:\tetrad\file.txt;
    VARIABLE: NAMES ARE q83 q84 q85 q88 q89 q90 q91;
      CATEGORICAL ARE q83 q84 q85 q88 q89 q90 q91;
    ANALYSIS: TYPE = GEN;
      ESTIMATOR = WLS;
    MODEL: q83-q90 WITH q91;
      q83-q89 WITH q90;
      q83-q88 WITH q89;
      q83-q85 WITH q88;
      q83-q84 WITH q85;
      q83 WITH q84;
    SAVEDATA: tech3 is Jason22.acm;
    OUTPUT: SAMPSTAT;


That should do it. 


Hi, I was wondering if there is an adequate procedure to obtain the polychoric correlation between two variables with underlying non-normal distributions that have, in addition, been censored in the middle (so that only the extremes are used) and dichotomized? Thanks a lot, Sanja


I am not aware of such a procedure. 

Cecily Na posted on Tuesday, December 14, 2010  9:41 pm



Dear Linda, I did an SEM with WLSMV. I suppose the correlation matrix in the output before the model estimation is the polychoric matrix of the variables? Why is the diagonal of the correlation matrix not 1, but only close to 1? I am copying the diagonal of the correlation matrix from the output, all with non-1 values: 0.851 0.993 0.998 0.994 0.747 0.744 0.985 Thank you very much!

Cecily Na posted on Tuesday, December 14, 2010  11:47 pm



Dear Linda, a follow-up to my previous post. I think I mistook the covariance coverage of data for the correlation matrix, so there shouldn't be any confusion regarding it. I would like to know what the covariance coverage of data in the output is. Thank you very much for your time.


It tells you the proportion of observations with no missing data for that variable (on the diagonal) or for that pair of variables (off the diagonal).

Cecily Na posted on Wednesday, December 15, 2010  8:34 am



Dear Linda, Thanks! When I use WLSMV, the correlation matrix generated in the output before the model estimates should be the polychoric correlations of the observed variables, right? Thanks!


If you ask for SAMPSTAT and put the ordered polytomous variables on the CATEGORICAL list, the correlations for those variables are polychoric correlations.

Cecily Na posted on Saturday, February 05, 2011  11:11 am



Dear Linda, I used WLS to generate polychoric covariance matrix. Why couldn't I get the covariance matrix, but only the correlation matrix? What is the command I can use? Thanks a lot! 


There is no such thing. 


Hi Dr. Muthen, I have one simple question on obtaining tetrachoric/biserial correlations. I have tried the "type=basic" command, as you pointed out. In addition, I also tried the "model" command along with "sampstat." The correlation matrices are somewhat different, and I was wondering why this occurs? Thank you.


Please send the two outputs and your license number to support@statmodel.com. 

Eric Deemer posted on Saturday, December 28, 2013  10:58 am



Hello, I'd like to save the correlation matrix for a set of ordered categorical variables using the SAMPLE option. Just to be sure, will the saved matrix consist of polychoric or tetrachoric correlations? Many thanks. Eric 


If the variables are on the CATEGORICAL list, the correlations will be polychoric correlations. 
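A sketch of an input that saves the sample correlations this way is shown below. The file names items.dat and polycorr.dat and the variable names u1-u6 are hypothetical placeholders; the SAMPLE option itself is the one named in the question.

```mplus
TITLE: Save polychoric correlations to a text file
DATA: FILE IS items.dat;             ! hypothetical input file
VARIABLE: NAMES ARE u1-u6;           ! hypothetical variable names
  CATEGORICAL ARE u1-u6;             ! makes the saved correlations polychoric
ANALYSIS: TYPE = BASIC;
SAVEDATA: SAMPLE IS polycorr.dat;    ! hypothetical output file for the sample matrix
```

If a variable on the CATEGORICAL list has only two categories, the corresponding correlations are tetrachoric, which is the two-category special case of the polychoric correlation.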

Eric Deemer posted on Saturday, December 28, 2013  1:53 pm



Great, thanks Linda! Eric 

Lars Bocker posted on Tuesday, January 28, 2014  2:39 am



Dear Linda, I understand the CATEGORICAL list is for dependent variables only and my independent dummy variables are read by Mplus as continuous. If I understand it correctly the correlation matrix then estimates polychoric correlations between the dependent variables, but not between the dependent and the independent variables. I am asking this because I am trying to understand differences in model outcomes between Mplus and LISREL, which we found out are based on a different correlation matrix in LISREL, in which we specified also the independent variables as categorical. Do you know if it would be possible, or necessary, to also specify the measurement level of independent variables in mplus? 


You can treat the independent variables as dependent variables in Mplus and put them on the CATEGORICAL list. In regression analysis, however, the model is estimated conditioned on the independent variables. Treating them as dependent variables is not advantageous. See the Muthen 1984 Psychometrika article where Case A is compared to Case B.

Nara Jang posted on Thursday, March 27, 2014  10:16 pm



Dear Dr. Muthen, Would you tell me what type of correlation I need to conduct for the mixed variables such as binary, ordinal, rank, and continuous variables. Thank you so much for your expert advice in advance! Best regards, Nara Jang 


A FAQ called Correlations with Categorical Variables will be posted on the website this afternoon. 

Nara Jang posted on Saturday, March 29, 2014  11:58 am



Dear Dr. Muthen, Thank you very much for your expert help! Best regards, Nara Jang 


Dear Dr. Muthen, above you mention that polychoric covariance matrices do not exist. However, I have found several references online that describe at least a method to estimate such, e.g., in Bollen & Curran (2005) on page 238 as a rescaled polychoric correlation matrix (using standard deviations/variances of the variables in their original form for the rescaling). Unfortunately, I do not have access to the entire book, but I wonder: If the polychoric covariance matrix is estimated that way, can any model be applied to that matrix without having to worry about scale invariance (Cudeck, 1989) etc.? Sincerely, Miriam Kraatz 


The term polychoric refers to correlations, not covariances, as there is typically no information on variances for categorical items. A special approach to multiple-group and longitudinal modeling with ordinal data has been described by Joreskog which uses the term polychoric covariance matrix, and that term is picked up in the Bollen-Curran book. I have criticized this approach in my Mplus Web Note 4, see section 8. A better approach is available in Mplus using WLSMV and the default Delta parameterization. Alternatively, maximum-likelihood estimation can be used, bypassing the limited-information polychorics.
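One way to see why ordinal data carry no variance information is the usual latent response formulation, sketched here with illustrative notation that is not from the thread: an ordinal item $y$ with categories $0, 1, \dots, K$ arises from an underlying $y^{*} \sim N(\mu, \sigma^{2})$ cut at thresholds $\tau_1 < \dots < \tau_K$.

```latex
\begin{aligned}
P(y \le k - 1) &= P(y^{*} \le \tau_k) = \Phi\!\left(\frac{\tau_k - \mu}{\sigma}\right), \qquad k = 1, \dots, K,\\[4pt]
(\tau_k,\ \mu,\ \sigma) &\;\longmapsto\; (c\,\tau_k + d,\ c\,\mu + d,\ c\,\sigma), \quad c > 0,
\quad \text{leaves every } \frac{\tau_k - \mu}{\sigma} \text{ unchanged},\\[4pt]
\Sigma^{*} &= D\,P\,D, \qquad D = \mathrm{diag}(\sigma_1, \dots, \sigma_p).
\end{aligned}
```

The category probabilities identify only the standardized thresholds, so $\mu$ and $\sigma$ have to be set by convention (commonly $\mu = 0$, $\sigma = 1$), which yields the polychoric correlation matrix $P$. Any "polychoric covariance matrix" $\Sigma^{*}$ therefore depends on how the scale matrix $D$ is chosen, that is, on additional assumptions rather than on the observed item variances.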


Thank you for your fast response! I have read section 8 of note 4 you are referring to. I am still having trouble recognizing the direct connection between estimation of a polychoric covariance matrix and the issues discussed in aforementioned note, and I apologize for asking you to bear with me. The main criticism of Joreskog's ideas seems to lie with assumed threshold invariance, and I do not understand the role of polychoric covariance estimation in that. Aside from that, I agree that typically the calculation of means and standard deviations for categorical variables (measured on an ordinal level) is inappropriate. However, when calculating polychoric correlations, are we not estimating a property for an underlying variable with a number of assumed properties, including interval level measurement? And could we therefore assume that means and variances based on the values of the categorical variables are our best estimates for means and variances of the underlying continuous variables?  mk 


Yes, the issue is the means and, in this case in particular, the variances of the underlying continuous-normal latent response variables. So in principle such variances are well-defined. It is just a matter of which assumptions you are willing to make in order to identify those variances. That's what my web note deals with.
