Correlation Matrix in Mplus and other...
Mplus Discussion > Exploratory Factor Analysis
 Dorothee Durpoix posted on Wednesday, January 28, 2009 - 2:54 pm

I am running an exploratory factor analysis in Mplus with the ML estimator. My dataset consists of ordinal variables and has missing values (total n = 806 cases). Missing values were specified in Mplus as: MISSING IS ALL(999);
and the variables were treated as continuous in the ML analysis.
However, the correlation matrix in the Mplus output is slightly different from the Pearson correlations computed in SPSS and in R (the latter two are identical, by the way).
Would you have any idea why the Mplus correlation matrix is different from the two others?
Thanks heaps in advance.
 Linda K. Muthen posted on Wednesday, January 28, 2009 - 2:56 pm
I would imagine that the sample size may be different between the programs and also perhaps the type of estimation. The default in Mplus since Version 5 is to use TYPE=MISSING which uses all available information. I think SPSS would do a listwise deletion or pairwise present analysis.
 Dorothee Durpoix posted on Wednesday, January 28, 2009 - 3:07 pm
Thanks a lot for your prompt answer.
I did specify pairwise deletion in SPSS and R. Both programs seem to perform pairwise deletion, because the number of observations differs for each pair of variables. I really don't see where the differences come from...
 Linda K. Muthen posted on Wednesday, January 28, 2009 - 3:09 pm
We don't use pairwise deletion. We have the same sample size for each estimated correlation. So different data are being analyzed. This is why you see differences in the correlations.
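A minimal sketch of the distinction (plain Python on a small hypothetical dataset, not Mplus output): under pairwise deletion each correlation uses its own subset of rows, so N changes from pair to pair, whereas listwise deletion uses one common sample. Mplus's default, described above, is a third approach (full-information ML under missing-data theory) that keeps one common N without discarding incomplete rows; it is not reproduced here.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def pairwise_corr(data, i, j):
    """Pairwise deletion: keep rows where columns i and j are both observed."""
    rows = [r for r in data if r[i] is not None and r[j] is not None]
    return pearson([r[i] for r in rows], [r[j] for r in rows]), len(rows)

def listwise_corr(data, i, j):
    """Listwise deletion: keep only rows with no missing value in any column."""
    rows = [r for r in data if all(v is not None for v in r)]
    return pearson([r[i] for r in rows], [r[j] for r in rows]), len(rows)

# Hypothetical data: three variables, None marks a missing value
# (the role played by MISSING IS ALL(999) in Mplus).
data = [
    [1, 2, 3],
    [2, 3, None],
    [3, 5, 4],
    [4, 4, 6],
    [5, None, 5],
    [6, 7, 8],
]

r_pair, n_pair = pairwise_corr(data, 0, 1)  # uses 5 rows
r_list, n_list = listwise_corr(data, 0, 1)  # uses 4 rows
print(f"pairwise r = {r_pair:.4f} (N = {n_pair}), listwise r = {r_list:.4f} (N = {n_list})")
```

The same pair of variables gets a different correlation under each rule simply because different rows enter the computation.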
 Dorothee Durpoix posted on Wednesday, January 28, 2009 - 3:30 pm
Sorry, I thought that when you said "uses all available information" you meant pairwise deletion...
Thanks a lot for your help!
 Renee McDonald posted on Wednesday, February 23, 2011 - 8:54 am
I am trying to find a way to carry out an EFA with a Euclidean distance matrix (instead of a correlation or covariance matrix). I am coming up with nothing. Can anyone help me with this?
 Linda K. Muthen posted on Wednesday, February 23, 2011 - 12:24 pm
Mplus does not read a Euclidean distance matrix per se. You could use TYPE=CORRELATION in the DATA command to read the matrix. You would be on your own as far as interpreting the results, however. I can't comment on other programs.
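If (and only if) the distances are between variables measured on the same cases, and those variables have been standardized, there is an exact link to correlations: with z-scores computed using the divide-by-n standard deviation, d^2 = 2n(1 - r), so r = 1 - d^2/(2n). The plain-Python sketch below (made-up data) checks that identity. It is not an Mplus feature, and a matrix of distances between cases rather than between variables has no such conversion.

```python
import math

def zscores(xs):
    """Standardize with the population (divide-by-n) standard deviation."""
    n = len(xs)
    m = sum(xs) / n
    sd = math.sqrt(sum((x - m) ** 2 for x in xs) / n)
    return [(x - m) / sd for x in xs]

def distance(a, b):
    """Euclidean distance between two variables (columns) across n cases."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

def corr_from_distance(d, n):
    """For z-scored variables, d^2 = 2n(1 - r), hence r = 1 - d^2 / (2n)."""
    return 1 - d ** 2 / (2 * n)

# Made-up scores for two variables on five cases.
x = [2.0, 4.0, 5.0, 7.0, 9.0]
y = [1.0, 3.0, 2.0, 6.0, 8.0]
zx, zy = zscores(x), zscores(y)
r = corr_from_distance(distance(zx, zy), len(x))
print(f"correlation recovered from distance: {r:.4f}")
```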
 Jan Zirk posted on Tuesday, July 24, 2012 - 6:19 pm
Dear Linda or Bengt,
Is it possible to automatically obtain the correlation matrix in Mplus with Bayesian estimation (non-informative priors)? Also, how can I obtain the significance levels for the ML correlations?
 Bengt O. Muthen posted on Tuesday, July 24, 2012 - 7:04 pm
If you mean the sample correlation matrix, you don't get that automatically in a Bayes run, but you can get it by running ML Type=Basic.
 Jan Zirk posted on Wednesday, July 25, 2012 - 10:33 am
Yes, I see. Thank you. I see the coefficients but can I also obtain their significance levels?
 Linda K. Muthen posted on Thursday, July 26, 2012 - 11:43 am
We don't provide standard errors for the correlations.
 Jan Zirk posted on Thursday, July 26, 2012 - 2:48 pm
I see, thank you and best wishes.

 Guanyi Lu posted on Wednesday, November 07, 2012 - 4:55 pm
Hi Linda,

Can we get the correlation matrix from a measurement model?

I have a few latent factors predicting a few sets of objective performance measures. Theoretically, it does not make sense to put all the objective performance measures in one model, so I test my hypotheses using different structural models, each with one set of objective performance measures as the DVs. Now I want to get a correlation matrix that includes all the objective measures and the latent variables.

Tech4 only gives me the correlation matrix of latent variables.

 Linda K. Muthen posted on Thursday, November 08, 2012 - 9:35 am
The only way you can do that is to put a latent variable behind each observed variable, for example,

f1 BY gender@1;
f2 BY y;

and use the latent variables in the MODEL command, for example,

f2 ON f1;
 Marissa Allen posted on Friday, April 19, 2013 - 2:12 pm

My correlation matrix in Mplus is different from the one computed in SAS. In Mplus I specified LISTWISE = ON in the DATA command, and in SAS I selected listwise deletion, to ensure that the correlations were computed on the same sample in both programs. However, this didn't solve the problem. Any suggestions as to what I can do in Mplus to fix this?

 Linda K. Muthen posted on Friday, April 19, 2013 - 4:11 pm
If the data are the same, the correlations will be the same. Be sure you have the same sample size in both. If you can't see the difference, send both outputs and your license number to
 Nara Jang posted on Friday, September 26, 2014 - 11:29 pm
Dear Drs. Muthen,

What is the command to get the p-values of correlation coefficients in Mplus?

Thank you so much!
 Linda K. Muthen posted on Saturday, September 27, 2014 - 9:18 am
For ML, MLR, and MLF, use the H1SE option along with TYPE=BASIC. For WLSMV, they are given automatically with TYPE=BASIC.
 Nara Jang posted on Saturday, September 27, 2014 - 10:11 pm
Thank you so much, Dr. Muthen!
 Tracy Witte posted on Friday, January 30, 2015 - 6:49 am
I am working on a manuscript and would like to include the MLR correlations I get from Mplus. When I use the H1SE option with TYPE=BASIC, I get standard errors for the covariances, but not for the correlations. Would it be inappropriate to use the statistical significance of the covariances (i.e., covariance divided by standard error) to indicate whether the correlations are statistically significant? If so, what options are there for determining the statistical significance of correlations in Mplus?
 Linda K. Muthen posted on Friday, January 30, 2015 - 7:07 am
We don't give these. The standard errors for the covariances will not be the same as those for the correlations. You can standardize the variables and use the covariances, which will then be the correlations, and you will get standard errors for those.
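The identity behind this suggestion is easy to check: once each variable is standardized, its covariance with another standardized variable equals their Pearson correlation. Below is a plain-Python sketch with made-up numbers. One caveat, stated here as an assumption rather than taken from the thread: standardizing with the sample standard deviations treats them as known, so the standard errors obtained this way may differ somewhat from ones that also account for estimating those standard deviations.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    """Sample covariance with the n - 1 denominator."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def standardize(xs):
    """Rescale to mean 0 and variance 1 (n - 1 denominator, matching cov)."""
    m, sd = mean(xs), math.sqrt(cov(xs, xs))
    return [(x - m) / sd for x in xs]

# Made-up raw scores for two variables.
x = [3.0, 5.0, 4.0, 8.0, 10.0, 6.0]
y = [1.0, 2.0, 2.5, 4.0, 5.5, 3.0]

r = cov(x, y) / math.sqrt(cov(x, x) * cov(y, y))  # Pearson correlation
cov_std = cov(standardize(x), standardize(y))      # covariance of z-scores
print(f"r = {r:.6f}, covariance of standardized variables = {cov_std:.6f}")
```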
 Tracy Witte posted on Monday, February 02, 2015 - 7:04 am
Thanks! One thing I've tried to do is model the correlations between the observed variables using syntax similar to this:

v1 v2 v3 v4;
[v1 v2 v3 v4];
v1 with v2 v3 v4;
v2 with v3 v4;
v3 with v4;

However, when I do this, I get an error message that says

Parameter 99, EDEQ18 WITH BMIADMIT


My sample size is 99, and I'm trying to model bivariate correlations among 22 different variables. Would I be better off standardizing the variables and using the approach you suggested above? Are there any downsides to standardizing the variables in this way (and does it make a difference that I'm using MLR and have some variables with notable skew)? Or is this error message ignorable?

Thank you very much for your time!
 Bengt O. Muthen posted on Monday, February 02, 2015 - 10:39 am
Why don't you explore this by working with only 2 variables, doing it the way you show here and pre-standardizing. Compare the results of those 2 approaches to each other and to the approach you show for all 22 variables. Make sure you use the same sample size in all 3 cases so the runs are comparable.
 Tracy Witte posted on Tuesday, February 03, 2015 - 8:17 am
Thank you so much for the suggestion! So far, I've run 20 models, each with just two variables. All of the estimates have been identical to the full model that contains all 22 variables, with the exception of one that was .004 different.

Do you think that I should run all 210 pairwise correlations just to be sure, or is this enough to be confident that the warning message can be ignored for the full model?

Thanks again for your input!
 Bengt O. Muthen posted on Tuesday, February 03, 2015 - 8:19 am
That's enough of an exploration.
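One likely reason for the warning, offered here as an assumption since the thread does not state it: the saturated model above has p means, p variances, and p(p-1)/2 covariances, which for p = 22 is 275 parameters, far more than the 99 cases. MLR standard errors use a first-order derivative product matrix that sums one outer product per case, so its rank is at most n (at most n - 1 when the casewise scores sum to zero, as they do at the ML estimate), and it cannot be positive definite when the parameters outnumber the cases. That the flagged parameter number equals the sample size is consistent with this. The plain-Python sketch below counts the parameters and shows the rank deficiency using random, hypothetical scores.

```python
import random

def saturated_param_count(p):
    """Means + variances + covariances of an unrestricted model for p variables."""
    return p + p + p * (p - 1) // 2

def matrix_rank(rows, rel_tol=1e-9):
    """Rank via Gauss-Jordan elimination with partial pivoting."""
    m = [list(row) for row in rows]
    scale = max(abs(v) for row in m for v in row) or 1.0
    rank = 0
    for c in range(len(m[0])):
        if rank == len(m):
            break
        pivot = max(range(rank, len(m)), key=lambda r: abs(m[r][c]))
        if abs(m[pivot][c]) <= rel_tol * scale:
            continue  # no usable pivot in this column
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(len(m)):
            if r != rank:
                f = m[r][c] / m[rank][c]
                m[r] = [a - f * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

# Hypothetical casewise score vectors: more parameters (8) than cases (5).
random.seed(1)
n_cases, n_params = 5, 8
scores = [[random.gauss(0, 1) for _ in range(n_params)] for _ in range(n_cases)]
# Center each column so the scores sum to zero, as they do at the ML estimate.
col_means = [sum(row[j] for row in scores) / n_cases for j in range(n_params)]
scores = [[row[j] - col_means[j] for j in range(n_params)] for row in scores]
# First-order derivative product matrix: sum over cases of score outer products.
B = [[sum(row[a] * row[b] for row in scores) for b in range(n_params)]
     for a in range(n_params)]
rank_B = matrix_rank(B)
print(f"{n_params} x {n_params} matrix from {n_cases} cases has rank {rank_B}")
print(f"saturated model for 22 variables: {saturated_param_count(22)} parameters")
```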
 Simone Schmidt posted on Monday, May 11, 2015 - 3:51 am
I wonder why the correlations given in the sample statistics differ from the correlations given in the results section when using the WITH statement. Is this because, when using WITH, a Pearson correlation is requested regardless of the metric of the scale?
Thank you!
 Linda K. Muthen posted on Monday, May 11, 2015 - 6:50 am
I would need to see the output to answer that. Please send it and your license number to
 Rachael Gribble posted on Wednesday, May 20, 2015 - 5:33 am

My question is related to differing results for correlation matrices.

I have run correlations of mental health measure scores for couples in both Stata and Mplus. However, I get vastly different results.

I have 340 couples with no missing data. I have checked the data file, the FORMAT statement, etc. in case I made a mistake somewhere, but cannot work out why the results would be so different. I am using ML estimation in Mplus.

Any advice welcome please!
 Linda K. Muthen posted on Wednesday, May 20, 2015 - 6:13 am
Please send the Mplus and Stata outputs and your license number to
 Jamin Day posted on Sunday, August 09, 2015 - 11:29 pm
I have a dataset containing missing values and I'm curious as to how Mplus calculates the correlation/covariance matrices that are reported at the top of the output when running EFA.

From what I can determine, these seem to be *estimated* matrices, calculated using missing data theory, hence N is the same for all variables (is this correct?). Most other statistical packages I've tried calculate correlation matrices using listwise or pairwise deletion. I'm able to replicate the Mplus correlation matrix in R when I use LISTWISE = ON in the Mplus DATA command, but not otherwise, so it would seem Mplus approaches this differently.

Are you able to shed any light on what's happening 'behind the scenes' here?

Many thanks
 Linda K. Muthen posted on Monday, August 10, 2015 - 6:34 am
The default in Mplus is to use all available information using missing data theory.
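To make "missing data theory" concrete in the simplest case: with one fully observed variable x, one variable y that is missing for some cases, bivariate normality, and missingness that depends only on x (MAR), the ML estimates have a closed form. The mean and variance of x use all cases, the regression of y on x uses the complete cases, and the two are combined, so the incomplete cases still inform the moments of y. This plain-Python sketch illustrates the principle only; it is not Mplus's algorithm, and general missing-data patterns need iterative methods such as EM or direct FIML.

```python
def mle_bivariate_monotone(x, y):
    """ML means/covariances for (x, y) under bivariate normality and MAR,
    when x is fully observed and some y values are None (monotone pattern)."""
    n = len(x)
    pairs = [(a, b) for a, b in zip(x, y) if b is not None]
    nc = len(pairs)
    mxc = sum(a for a, _ in pairs) / nc
    myc = sum(b for _, b in pairs) / nc
    sxx_c = sum((a - mxc) ** 2 for a, _ in pairs) / nc
    sxy_c = sum((a - mxc) * (b - myc) for a, b in pairs) / nc
    syy_c = sum((b - myc) ** 2 for _, b in pairs) / nc
    beta = sxy_c / sxx_c               # slope of y on x (complete cases)
    alpha = myc - beta * mxc           # intercept of y on x
    resid = syy_c - beta * sxy_c       # residual variance of y given x
    mu_x = sum(x) / n                  # x moments use ALL cases
    s_xx = sum((a - mu_x) ** 2 for a in x) / n
    mu_y = alpha + beta * mu_x         # combine regression with full x info
    s_xy = beta * s_xx
    s_yy = resid + beta ** 2 * s_xx
    return mu_x, mu_y, s_xx, s_xy, s_yy

# Made-up data: y is missing for the two largest x values (MAR on x).
x = [1, 2, 3, 4, 5, 6]
y = [2, 4, 5, 8, None, None]
mu_x, mu_y, s_xx, s_xy, s_yy = mle_bivariate_monotone(x, y)
listwise_mean_y = sum(b for b in y if b is not None) / 4
print(f"ML mean of y = {mu_y:.2f}, listwise mean of y = {listwise_mean_y:.2f}")
```

Here the ML mean of y (6.65) is pulled above the listwise mean (4.75) because the cases with missing y have large x values, and x is positively related to y.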
 Jamin Day posted on Monday, August 10, 2015 - 5:54 pm
Thanks Linda.

Would it be correct to say that Mplus generates an estimated correlation matrix using FIML, whereas other software (e.g. SPSS) uses an observed correlation matrix with deletion methods?

Just trying to clarify why I'm getting different matrices.
 Bengt O. Muthen posted on Monday, August 10, 2015 - 6:41 pm
Not sure how SPSS does it. Experiment by starting with one pair of variables and check the sample size.
 Kelly M Allred posted on Monday, December 07, 2015 - 8:38 am
I am running a latent growth curve analysis using MLR in a sample of 130 subjects. I found that the correlation between the intercept and slope variables of interest is .27, p = .18. Normally, a Pearson correlation of this size (.27) would be highly significant in a sample of 130 subjects, but here it is not significant. My questions are: Are these correlations in the output strict Pearson correlations? Also, how is the p-value for these correlations calculated? Could missing data estimation using MLR affect the p-value?
 Linda K. Muthen posted on Monday, December 07, 2015 - 10:41 am
I think you are looking at a covariance not a correlation. The p-value is taken from a z-table.
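The two tests being compared can be written side by side. The plain-Python sketch below computes the two-sided p-value for an estimate divided by its standard error against the standard normal (the z-table approach described above) and, for contrast, the usual t statistic for a sample Pearson r with n cases, t = r*sqrt(n - 2)/sqrt(1 - r^2). A latent intercept-slope covariance has a model-based standard error, so its z-test need not agree with a Pearson-r test on the same n.

```python
import math

def two_sided_p(estimate, se):
    """Wald z-test: two-sided p-value of estimate/se from the standard normal,
    i.e. 2 * (1 - Phi(|z|)) written via the complementary error function."""
    z = estimate / se
    return math.erfc(abs(z) / math.sqrt(2))

def pearson_t(r, n):
    """t statistic (n - 2 df) for testing that a sample Pearson r is zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

p = two_sided_p(1.96, 1.0)   # about 0.05
t = pearson_t(0.27, 130)     # about 3.17 for the r and n mentioned above
print(f"p = {p:.4f}, t = {t:.2f}")
```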
 Kelly M Allred posted on Thursday, December 17, 2015 - 9:59 am
Thank you, Dr. Muthen. I have a follow up question.

In the latent growth curve analysis, I am looking at the relationship between a latent intercept and a latent slope with the following statement:

Alpha_C WITH Beta2_P;

Here I get an estimate of .453, p =.026

How should I refer to such a parameter estimate when I report it in a journal article?
 Linda K. Muthen posted on Thursday, December 17, 2015 - 5:16 pm
It is either a covariance or a residual covariance depending on whether the variables are exogenous or endogenous.
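For reporting, a covariance such as the .453 above can be put on a correlation scale by dividing it by the product of the two standard deviations (for a residual covariance, the two residual standard deviations). A one-function plain-Python sketch follows; the variances 1.2 and 0.9 are hypothetical placeholders, not values from this thread.

```python
import math

def cov_to_corr(cov, var1, var2):
    """Rescale a covariance to a correlation: cov / (sd1 * sd2).
    For a residual covariance, pass the two residual variances."""
    return cov / math.sqrt(var1 * var2)

# 0.453 is the estimate from the post; the variances are HYPOTHETICAL placeholders.
r = cov_to_corr(0.453, 1.2, 0.9)
print(f"correlation-scale value: {r:.3f}")
```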
 Hue-Ryen Jang posted on Saturday, April 08, 2017 - 5:39 am

Mplus gives the correlation matrix for the latent variables with TECH4.

But this correlation matrix has implausible values (0.000) between some variables and gender and grade.

The factor correlation matrix shows plausible values with gender and grade, such as 0.104, 0.096, etc.

I think this is logically impossible, and I cannot understand it.

Are the values (i.e., 0.000) in the correlation matrix for the latent variables real?
 Linda K. Muthen posted on Saturday, April 08, 2017 - 11:30 am
Please send the output and your license number to
 Georg Kessler posted on Friday, August 11, 2017 - 1:25 am

Is it possible to specify the order of variables in the observed covariance matrix?

The default seems to be that variables are processed according to their position in the Model command.

In autoregressive models, the second measurement is mentioned before the third, and the third before the first.
(x2 on x1; x3 on x2; x3 on x1;)
It would be nice to see the increase/decrease in the correlations due to the underlying change process, with the measurements in their natural order, without resorting to another statistical package.

 Linda K. Muthen posted on Friday, August 11, 2017 - 6:40 am
When you say observed covariance matrix, I assume you mean the correlation matrix from SAMPSTAT or TYPE=BASIC. I believe the order of the variables is taken from the USEVARIABLES list.
 Georg Kessler posted on Saturday, August 12, 2017 - 4:18 am
Hi Linda,

Yes, you are right. I thought I had checked that. The USEVARIABLES option determines the order of the variables in the correlation matrix from the SAMPSTAT option.
 Paraskevas Petrou posted on Monday, October 16, 2017 - 2:39 am

In my 2-level analysis, I get in the beginning of the output the correlation matrix for all my study variables.

Two questions:
1) Where can I find the p values for all correlations?
2) Do I understand correctly that there is no way I can get correlations between within-level variables and between-level variables in my within-level correlation matrix?

Thank you in advance!

 Bengt O. Muthen posted on Monday, October 16, 2017 - 10:33 am
1) You would have to specify a model with

y1-yp WITH y1-yp;

where p is the number of variables.

2) You can get correlations between a between-level variable and the between-level part of a variable measured on within.
 Holly Levin-Aspenson posted on Tuesday, March 19, 2019 - 7:22 am
I'm working on a series of ESEM models using a dataset that includes dichotomous, polytomous, and continuous variables (N = 8,405). The tetrachoric, polychoric, and Pearson correlations I get from Mplus are generally similar to those I get from R (mixedcor, psych package), but the biserial and polyserial correlations are much smaller. For example, a polyserial correlation of .77 in R is .59 in Mplus (no missing data for the polytomous variable, 0.54% missing data for the continuous variable). As another example, a different polyserial pair is correlated .51 in R and .27 in Mplus (no missing data for either variable).

What concerns me is that only the polyserial/biserial correlations deviate substantially across programs, which could distort the parameter estimates in my models.

Because I'm working with psychopathology data, the distributions of these variables are highly skewed. However, to my knowledge, there aren't substantial differences in skewness or missing data across measurement types that would explain these results. Do you have any ideas?


 Bengt O. Muthen posted on Tuesday, March 19, 2019 - 1:50 pm
One question is if the estimation in R and in Mplus (1) uses ML or not and (2) takes place for each pair of variables separately or all variables together. You can send the 2 outputs to Support along with your license number.
 Holly Levin-Aspenson posted on Friday, April 05, 2019 - 11:55 am
Thanks, Dr. Muthén! The Mplus estimation uses WLSMV, whereas R psych uses 2-stage ML. I'm guessing that accounts for some of the discrepancies. I ended up just sticking with the Mplus correlations.
 Bengt O. Muthen posted on Saturday, April 06, 2019 - 12:48 pm
The sample correlations (polychoric, polyserial, tetrachoric, biserial) used in WLSMV are computed using 2-stage ML for each pair of variables (first thresholds, then correlations). WLSMV is used to fit the model parameters to those sample correlations. So there are two kinds of estimation going on.
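The simplest case of this two-stage procedure is the tetrachoric correlation for two binary variables: stage 1 sets each threshold from its marginal proportion via the inverse normal CDF, and stage 2 holds the thresholds fixed and maximizes the likelihood of the 2x2 table over the correlation alone. The plain-Python sketch below is illustrative, not Mplus's implementation; it evaluates the bivariate normal CDF by integrating its derivative with respect to the correlation (Simpson's rule) and uses a golden-section search. Polychoric and polyserial correlations follow the same two-stage pattern with more thresholds or a continuous margin.

```python
import math
from statistics import NormalDist

N01 = NormalDist()  # standard normal

def bvn_cdf(h, k, rho, steps=200):
    """P(X <= h, Y <= k) for a standard bivariate normal with correlation rho.
    Uses dPhi2/dr = phi2(h, k; r): Phi2 = Phi(h)Phi(k) + integral_0^rho phi2 dr,
    computed with Simpson's rule (steps must be even)."""
    def phi2(r):
        q = (h * h - 2 * r * h * k + k * k) / (2 * (1 - r * r))
        return math.exp(-q) / (2 * math.pi * math.sqrt(1 - r * r))
    width = rho / steps
    total = phi2(0.0) + phi2(rho)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * phi2(i * width)
    return N01.cdf(h) * N01.cdf(k) + total * width / 3

def tetrachoric(table):
    """Two-stage ML for a 2x2 table [[n00, n01], [n10, n11]] (row = x, col = y)."""
    (n00, n01), (n10, n11) = table
    n = n00 + n01 + n10 + n11
    tau_x = N01.inv_cdf((n00 + n01) / n)   # stage 1: threshold for x
    tau_y = N01.inv_cdf((n00 + n10) / n)   # stage 1: threshold for y
    px, py = N01.cdf(tau_x), N01.cdf(tau_y)

    def loglik(rho):
        p00 = bvn_cdf(tau_x, tau_y, rho)
        cells = (p00, px - p00, py - p00, 1 - px - py + p00)
        return sum(c * math.log(max(p, 1e-300))
                   for c, p in zip((n00, n01, n10, n11), cells))

    # Stage 2: golden-section search for the rho that maximizes loglik.
    lo, hi = -0.99, 0.99
    g = (math.sqrt(5) - 1) / 2
    for _ in range(60):
        a, b = hi - g * (hi - lo), lo + g * (hi - lo)
        if loglik(a) < loglik(b):
            lo = a
        else:
            hi = b
    return (lo + hi) / 2

rho_hat = tetrachoric([[400, 200], [200, 400]])
print(f"tetrachoric estimate: {rho_hat:.4f}")
```

For the table [[400, 200], [200, 400]] both thresholds are 0 and the estimate is close to 0.5, because with zero thresholds P(X <= 0, Y <= 0) = 1/4 + arcsin(rho)/(2*pi), which equals 1/3 at rho = 0.5.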
 Holly Levin-Aspenson posted on Monday, April 08, 2019 - 9:59 am
Dr. Muthén,

Thank you—that's very helpful!
 Bharath Shashanka Katkam posted on Saturday, July 20, 2019 - 7:27 am
Hello Mplus team,

In my research model, there is a need to use the Correlation Matrix instead of Covariance Matrix. I am going to use Two-level SEM.

1Q) So, would there be any change in the model fit indices, such as the chi-square test of model fit, CFI, RMSEA, AIC, etc., from using the correlation matrix instead of the covariance matrix?

2Q) In the web link "", it is said that "WLS" is the only estimator suitable for a correlation matrix.
Is that true? Or could we use other estimators as well?
 Bengt O. Muthen posted on Saturday, July 20, 2019 - 1:16 pm
1) You cannot do multilevel analysis using a correlation matrix. You want to know about variances and covariances - these are the quantities that you are dividing up into within and between parts of the model. See e.g. the article on our website:

Muthén, B. (1994). Multilevel covariance structure analysis. In J. Hox & I. Kreft (eds.), Multilevel Modeling, a special issue of Sociological Methods & Research, 22, 376-398.

2) Yes, WLS specifies the correct weight matrix for a correlation matrix but see answer 1).
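Following the idea in the article above, the within and between sample information is usually summarized by a pooled within-cluster covariance matrix S_PW (divisor N - G) and a between-cluster covariance matrix S_B (divisor G - 1), for N cases in G clusters. The plain-Python sketch below computes both for two variables on made-up clustered data. The check that the total sums of cross-products split into within and between parts works on the raw covariance scale; it is the variances themselves that get divided up, which a correlation matrix (all variances fixed at 1) no longer contains.

```python
def within_between(groups):
    """Pooled within-cluster (S_PW) and between-cluster (S_B) covariance
    matrices for two variables; groups is a list of clusters of (y1, y2)."""
    N = sum(len(g) for g in groups)
    G = len(groups)
    grand = [sum(obs[v] for g in groups for obs in g) / N for v in (0, 1)]
    sw = [[0.0, 0.0], [0.0, 0.0]]
    sb = [[0.0, 0.0], [0.0, 0.0]]
    for g in groups:
        n_g = len(g)
        mean_g = [sum(obs[v] for obs in g) / n_g for v in (0, 1)]
        for obs in g:  # deviations from the cluster mean
            d = [obs[v] - mean_g[v] for v in (0, 1)]
            for a in (0, 1):
                for b in (0, 1):
                    sw[a][b] += d[a] * d[b]
        e = [mean_g[v] - grand[v] for v in (0, 1)]  # cluster mean vs grand mean
        for a in (0, 1):
            for b in (0, 1):
                sb[a][b] += n_g * e[a] * e[b]
    s_pw = [[v / (N - G) for v in row] for row in sw]
    s_b = [[v / (G - 1) for v in row] for row in sb]
    return s_pw, s_b

# Made-up data: three clusters of (y1, y2) observations.
groups = [
    [(1, 2), (2, 3), (3, 3)],
    [(4, 6), (5, 5)],
    [(7, 9), (8, 8), (9, 10)],
]
s_pw, s_b = within_between(groups)
print("S_PW:", s_pw)
print("S_B:", s_b)
```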
 Bharath Shashanka Katkam posted on Sunday, July 21, 2019 - 5:01 am
Thank you Sir. In my research Model, there are three Constructs.
a) One Construct is measured on a Non-Likert scale by using decimal values
b) Other two constructs are measured on a Likert scale.

1Q) In this context, can we use the covariance matrix for the three constructs, given that two constructs are measured on a Likert scale and one on a decimal scale?
2Q) If I cannot use the Covariance Matrix, what is the remedy?
 Bengt O. Muthen posted on Sunday, July 21, 2019 - 4:47 pm
Treating a Likert scale variable as continuous may be ok if you don't have strong floor or ceiling effects. If you don't have that problem and treat the variable as continuous, working with a covariance matrix is ok.
 Bharath Shashanka Katkam posted on Sunday, July 21, 2019 - 10:15 pm
Do you mean to say that, we can use the Covariance Matrix of the three Constructs which has three different Likert scales?
 Bengt O. Muthen posted on Tuesday, July 23, 2019 - 5:56 pm
 Bharath Shashanka Katkam posted on Tuesday, July 23, 2019 - 11:54 pm
Thank you, Sir.
But assume that one construct is measured on an 11-point Likert scale and another on a 5-point Likert scale.
Wouldn't a structural equation model that includes constructs measured on different Likert scales be erroneous?
 Bengt O. Muthen posted on Wednesday, July 24, 2019 - 5:35 pm
This general analysis question is suitable for SEMNET.
 Bharath Shashanka Katkam posted on Thursday, July 25, 2019 - 9:38 am
Okay Sir. Thank you.