Anonymous posted on Saturday, December 20, 2003 - 3:44 pm
Hello. In running a latent variable growth mixture model, I get an error message that includes the note that the solution returned "A NON-POSITIVE DEFINITE FIRST-ORDER DERIVATIVE PRODUCT MATRIX". Is it true that I cannot trust the standard errors from such a solution? But can I interpret the parameter estimates? I guess I'm wondering how improper this solution is. Thanks much.
bmuthen posted on Saturday, December 20, 2003 - 4:04 pm
It is generally a very good check of non-identification. In some cases, however, the MLR SEs that come out - if they do - are trustworthy. For example, if a binary y is treated as continuous you get a singularity between the mean and the variance, but the MLR SEs may be ok. You can do a Monte Carlo simulation to check.
If a model is truly not identified, the parameter estimates should not be interpreted.
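To make the binary-as-continuous case concrete, here is a minimal numerical sketch in Python/NumPy (not Mplus). It assumes a single binary y modeled with a normal mean and variance; the sample size, seed, and variable names are illustrative only. Because y*y = y for a binary variable, the two score contributions are both proportional to (y - mean) at the ML estimates, so the first-order derivative product matrix has rank 1.

```python
import numpy as np

# Illustrative data: a binary y that we (incorrectly) treat as normal.
rng = np.random.default_rng(0)
y = rng.binomial(1, 0.3, size=500).astype(float)

# ML estimates under the normal model (variance divides by n).
mu = y.mean()
sigma2 = y.var()

# Per-observation scores (first derivatives of the normal loglikelihood).
score_mu = (y - mu) / sigma2
score_s2 = -0.5 / sigma2 + (y - mu) ** 2 / (2 * sigma2 ** 2)

# First-order derivative product matrix: average outer product of scores.
scores = np.column_stack([score_mu, score_s2])
opg = scores.T @ scores / len(y)

# Ratio of smallest to largest eigenvalue; near zero means near singular.
eig = np.linalg.eigvalsh(opg)
cond = eig[0] / eig[-1]
print("eigenvalues:", eig)
print("smallest/largest eigenvalue ratio:", cond)
```

The ratio comes out at zero up to rounding error (it can even be slightly negative), even though the mean and variance estimates themselves are perfectly well defined. This only illustrates why the message appears; whether the MLR SEs are acceptable in a given model is what the Monte Carlo check is for.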
Anonymous posted on Wednesday, December 24, 2003 - 10:58 am
Thanks for the reply Dr. Muthen. What puzzles me is that there are other solutions that converge just fine. However, these other converged solutions had a worse loglikelihood than this solution with the warning. So should I discard this better solution because of the warning in favor of a solution with a worse loglikelihood? Or should I suspect that the model is in general unidentified despite the other proper solutions? Thanks so much.
bmuthen posted on Thursday, December 25, 2003 - 2:10 pm
When you say that you have "other solutions that converge just fine", do you mean for the same model using other starting values, or do you mean other model variations?
Anonymous posted on Saturday, December 27, 2003 - 9:20 pm
I mean that other solutions converge just fine for the same model using other starting values.
bmuthen posted on Sunday, December 28, 2003 - 3:16 pm
This may indicate that your data do not support a model with this many classes or this parametric structure. I would not choose a solution with worse likelihood, but change the model. Note also the part in my first response:
"if a binary y is treated as continuous you get a singularity between the mean and the variance, but the MLR SEs may be ok. "
bmuthen posted on Sunday, December 28, 2003 - 4:23 pm
Another reason you might get non-identification is that one class collapses, so that there is almost nobody in it (check the class count section in the output) - this is an indication to reduce the number of classes.
If you like, you can also send your input, output and data to Mplus support.
Hello, I have come across a problem similar to that of "Anonymous" in the exchange above. I am trying to estimate an LCA with c=6 for five indicators and three covariates. Both the latent categorical variable and the indicators are regressed on the covariates. With random starts, I get a solution that converges fine. However, using specific starting values, I can achieve a substantially different solution with a lower loglikelihood, but with this solution I get the same warning as "Anonymous" (NON-POSITIVE DEFINITE FIRST-ORDER DERIVATIVE PRODUCT MATRIX). The warning identifies a problem parameter: it's a threshold that's estimated to be 16.679. Moreover, the same warning gives a condition number (0.146D-12), which I don't know how to interpret.
1. What is the condition number, and what does the value 0.146D-12 indicate?
2. Do my results indicate that a model with fewer than 6 classes might be appropriate? Or are there other changes to the model that I could introduce to make a six-class model viable?
I assume that you don't regress all indicators on the covariates - that is not an identified model.
You say you get a lower loglikelihood - but you want a higher one (max).
1. Thresholds that go large like that are harmless causes of the non-pos def message. With large thresholds, the information matrix estimate obtained by the first-order derivative approach can be numerically determined as singular. Degree of singularity is measured by the condition number, which is the ratio of the smallest to largest eigenvalue of the info matrix estimate. You don't want a very small condition number and 0.146D-12 is very small, very close to exactly zero in machine numerical precision terms.
2. Not necessarily. A large threshold of 16 simply indicates that for this class, this item has value 1 instead of 0 (assuming a binary item). Such items are useful for defining classes. It seems strange, however, that the automatic fixing of large thresholds does not come into play for your run. If you are using Version 4.0 and don't get SEs for this solution, please send your input, output, data, and license number to email@example.com.
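As a rough illustration of the condition number described in point 1, the following Python/NumPy sketch converts the D-exponent notation to an ordinary float and computes the smallest-to-largest eigenvalue ratio for two small made-up matrices. The helper names and the example matrices are assumptions for illustration; they are not taken from any Mplus output.

```python
import numpy as np

def parse_fortran_double(text):
    """Convert D-exponent notation such as '0.146D-12' to a Python float."""
    return float(text.strip().upper().replace("D", "E"))

def condition_number(info):
    """Ratio of the smallest to the largest eigenvalue of a symmetric matrix."""
    eig = np.linalg.eigvalsh(info)  # eigenvalues in ascending order
    return eig[0] / eig[-1]

# The value quoted in the question: 0.146 x 10^-12.
reported = parse_fortran_double("0.146D-12")
print("reported condition number:", reported)

# A comfortably invertible matrix versus one that is almost singular.
well_conditioned = np.array([[2.0, 0.3],
                             [0.3, 1.0]])
near_singular = np.array([[1.0, 1.0],
                          [1.0, 1.0 + 1e-12]])

print("well conditioned:", condition_number(well_conditioned))
print("near singular:   ", condition_number(near_singular))
```

With this definition, a ratio well away from zero indicates a comfortably invertible matrix, while a ratio on the order of 1e-12 or smaller is numerically indistinguishable from singular.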
I realize I have to understand some basics. I did try to regress all indicators on the covariates. Why is this model not identifiable? Given that my 5 indicators each have 3 or 4 categories, I thought I had up to 3*3*3*4*4 - 1 = 431 independent pieces of information. In my attempted 6-class model, I was estimating 107 parameters, so I assumed that it was identifiable. I'd be grateful for any hints about where my thinking is flawed, or for a reference to a good introduction to the issue of identifiability in latent class models. I'm interested in understanding this thoroughly. Moreover, my immediate practical concerns are these: Can I achieve identifiability by introducing restrictions, or by reducing the number of classes?
Having more pieces of information than parameters is only a necessary, not sufficient condition for identification. Your model regresses the latent class variable on the covariates. In addition to that you try to get "direct effects" by regressing each latent class indicator on the covariates. That is not identified. Think of the information that contributes to those estimates - it is the regression of each indicator on all covariates. Say that you have p indicators and q covariates giving p*q slopes. These slopes can't be divided up into both p*q direct effects and regression slopes for the latent class variable on the covariates. You can have some direct effects, but not all.
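The counting argument above can be written out as simple bookkeeping. This Python sketch is only a rough tally under a simplifying assumption (one slope per indicator per covariate, as in the p*q argument), not a formal identification check; the function name and the choice of (classes - 1)*q slopes for a multinomial regression of the class variable on the covariates are illustrative assumptions.

```python
def slope_budget(p, q, n_classes):
    """Tally available indicator-on-covariate slopes against requested ones.

    Simplification: each indicator contributes one slope per covariate,
    following the p*q argument; real categorical indicators are messier.
    """
    available = p * q                     # slopes of indicators on covariates
    direct_effects = p * q                # every indicator on every covariate
    class_slopes = (n_classes - 1) * q    # class variable on the covariates
    requested = direct_effects + class_slopes
    return available, requested

# Numbers from the question: 5 indicators, 3 covariates, 6 classes.
available, requested = slope_budget(p=5, q=3, n_classes=6)
print("available slopes:", available)
print("requested slopes:", requested)
print("over budget:", requested > available)
```

Under these assumptions the full set of direct effects already uses up every available slope, so nothing is left over for the regression of the class variable on the covariates. Dropping some direct effects frees up information, which matches the advice that some direct effects are possible but not all.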
No, the condition number that Mplus prints is the ratio of the smallest to the largest eigenvalue of the estimated information matrix. The smaller it is, the closer the matrix is to being singular, that is, the closer the model is to not being identified.
The singularity of the sample statistics covariance matrix is evaluated separately.
Ben Chapman posted on Friday, August 31, 2007 - 3:12 pm
I am interested in the extent to which outliers in a latent profile model introduce a non-positive definite first-order derivative product matrix at large numbers of classes.
I don't have McLachlan & Peel in front of me but I believe they offer some cautions about outliers in mixtures of normal distributions.
What happens is that the outliers tend to form one or more of their own tiny classes (the "collapsing classes" mentioned above). The parameters where the SEs are not trustworthy are in these outlier classes.
So I am assuming the small classes produced by the outlier are introducing non-identification.
My intuition is to say the model can't estimate, say, 5 class-specific means, 5 variances, and a class proportion from only, say, 2 observations, but I am not sure this is technically correct: isn't other information in the sample used to some degree to estimate these parameters?
I don't plan on retaining this model, and it is easily estimable without outliers; I am just curious about the possible effect of outliers on normal mixtures.
Small classes can produce non-identification if the number of parameters specific to such a class exceeds the number of people in the class. This would produce the first-derivative-product-based non-identification message. Class-specific parameters draw only on the information from the people in that class. So you can have a mean parameter for 1 outlying person, but not also a variance parameter specific to this person.
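The reason a tiny class triggers the message can be seen from a rank argument: a product matrix built from n score vectors has rank at most n, so with more class-specific parameters than people it must be singular. The Python/NumPy sketch below uses random numbers as stand-ins for the score contributions and treats class membership as known, both of which are simplifying assumptions; the sizes (2 people, 5 parameters) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

n_people = 2   # people effectively in the small class (assumed known here)
n_params = 5   # parameters specific to that class

# Stand-in for per-person score vectors; real scores come from the model.
scores = rng.normal(size=(n_people, n_params))

# First-order derivative product matrix for the class-specific block.
opg = scores.T @ scores

# Count eigenvalues that are non-negligible relative to the largest one.
eig = np.linalg.eigvalsh(opg)
rank = int(np.sum(eig > 1e-10 * eig[-1]))

print("eigenvalues:", eig)
print("rank:", rank, "of", n_params)
```

Here the 5 x 5 block has only 2 non-negligible eigenvalues, so it cannot be inverted to give standard errors for all 5 parameters. In an actual mixture, class membership is probabilistic, so this is only a heuristic picture of the problem.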
I got a message about "A NON-POSITIVE DEFINITE FIRST-ORDER DERIVATIVE PRODUCT MATRIX," while the output also says, "THE MODEL ESTIMATION TERMINATED NORMALLY." Should I not trust the standard errors of the model parameter estimates even though I got a converged solution?
I ran a model testing a potential interaction among latent variables using TYPE=RANDOM, ALGORITHM=INTEGRATION, and the XWITH approach for creating latent variable interactions. I obtained the error message "THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES MAY NOT BE TRUSTWORTHY FOR SOME PARAMETERS DUE TO A NON-POSITIVE DEFINITE FIRST-ORDER DERIVATIVE PRODUCT MATRIX." The model estimation terminated normally though, and I am modeling some dichotomous variables. Can I trust these standard errors given that (based on your replies above) the dichotomous variable issue sometimes causes this error message to arise?
Thanks in advance for your time and consideration.
I have a model with 3 latent and 1 observed DV and 4 covariates (2 binary, 2 continuous).
When I run the model, with either ML or MLR estimation, the model converges but I receive a message that "THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES MAY NOT BE TRUSTWORTHY FOR SOME PARAMETERS DUE TO A NON-POSITIVE DEFINITE FIRST-ORDER DERIVATIVE PRODUCT MATRIX."
The message indicates that the problem is with 1 continuous covariate and 1 binary covariate (when I remove these from the model I no longer receive the message).
I can find no reason for non-identification, the results are quite sensible, and when I ran the model in another SEM program to check, I didn't receive any error message.
I would like to be sure that I can indeed trust the SEs. Any guidance would be appreciated.
If you have brought the binary covariate into the model by mentioning its mean, variance, or covariance with another variable, you will get that message because the mean and variance of a binary variable are not orthogonal. The message can be ignored if this is the reason for it.
Hi, I ran a multi-level multi-indicator latent growth model:
MODEL:
%within%
f1w BY CELF1C*(1)PLN1C(2)PP1C PWPA1W(4);
f2w BY CELF3C(1)PLN3C(2)PP3C(3)PWPA3W(4);
f3w BY CELF4C*(1)PLN4C(2)PP4C(3)PWPA4W(4);
iw sw | f1w@0f2w@1f3w@2;
%between%
f1b BY CELF1C*(5)PLN1C(6)PP1C(7)PWPA1W(8);
f2b BY CELF3C*(5)PLN3C(6)PP3C(7)PWPA3W(8);
f3b BY CELF4C*(5)PLN4C(6)PP4C(7)PWPA4W(8);
ib sb | f1b@0f2b@1f3b@2;
I got two warnings: 1) A NON-POSITIVE DEFINITE FIRST-ORDER DERIVATIVE PRODUCT MATRIX. 2) THE LATENT VARIABLE COVARIANCE MATRIX IS NOT POSITIVE DEFINITE. Comparing my input with Example 9.15 in the User's Guide, I noticed that the example 1) sets the cross-level loadings equal, 2) uses WLSM, 3) doesn't use the fixed-factor method of scaling as I did, and 4) sets the between-level intercept growth factor to zero and holds the residual variances of the factors equal over time. Could any of these differences be the reason for the warnings?
need to be separated by a semicolon or be on separate lines.
A growth model with multiple indicators needs to have intercept invariance, not only loading invariance. See UG pages 687-692 for how to parameterize your model.
Sophie Dan posted on Saturday, April 22, 2017 - 9:44 pm
Hello! I ran a two-level CFA and got a warning like this: "MAXIMUM LOG-LIKELIHOOD VALUE FOR THE UNRESTRICTED (H1) MODEL IS -28091.078
THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES MAY NOT BE TRUSTWORTHY FOR SOME PARAMETERS DUE TO A NON-POSITIVE DEFINITE FIRST-ORDER DERIVATIVE PRODUCT MATRIX. THIS MAY BE DUE TO THE STARTING VALUES BUT MAY ALSO BE AN INDICATION OF MODEL NONIDENTIFICATION. THE CONDITION NUMBER IS -0.314D-16. PROBLEM INVOLVING PARAMETER 45.
THE NONIDENTIFICATION IS MOST LIKELY DUE TO HAVING MORE PARAMETERS THAN THE NUMBER OF CLUSTERS. REDUCE THE NUMBER OF PARAMETERS.
THE MODEL ESTIMATION TERMINATED NORMALLY" But I also get all the results I need. Does the warning matter, or can I ignore it?
Another problem: if I run a two-level EFA and get a negative residual variance, how can I make it positive?
Thank you! Any help from you will be greatly appreciated!
Use TECH1 to check that you don't have more between-level parameters than the number of clusters.
EFA with negative residual variances often suggests that too many factors have been extracted.
Sophie Dan posted on Wednesday, April 26, 2017 - 1:23 am
Thanks very much for your reply!
When you say "between-level parameters", do you mean the m*(m+1)/2 parameters? If the number of variables I use at the within level is the same as at the between level, should the number of estimated parameters at the within and between levels be the same?
As for the EFA with a negative residual variance: although it indicates that I should extract fewer factors, if the solution meets the requirements of theory and the factor correlations do not reach the critical value that would suggest combining them into one factor, can I just ignore the negative residual variance, or is the solution inadmissible?