

A question came up about references discussing identification in SEM, and I thought I'd post my answer here. No really comprehensive and thorough references come to mind, perhaps because it is difficult to say something general about the topic. In practice, people accept a nonsingular information matrix as evidence of empirical identification. But the following classic references have bits and pieces of the puzzle:
Wiley, D (1973). The identification problem for structural equation models... In Goldberger & Duncan (eds), Structural Equation Models in the Social Sciences (Seminar Press, NY), 69-73. This talks about the column rank of the matrix of derivatives of the covariance elements with respect to the parameters.
Joreskog, KG (1977). Structural equation models in the social sciences. In Krishnaiah (ed), Applications of Statistics. Amsterdam: North-Holland Publishing, 265-287. Some general statements, mostly.
Joreskog, KG (1979). Author's Addendum (to Confirmatory Factor Analysis, the 1969 Psychometrika article). In Magidson (ed), Advances in Factor Analysis and Structural Equation Models. Cambridge, Mass: Abt Books. This has a little more detail than is typical.
More recently, Bartholomew's book Latent Variable Models and Factor Analysis (Griffin, 2nd ed.) has some more general identifiability sections.
I invite others to add more good references.
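To make the column-rank idea concrete, here is a minimal Python sketch. It is not what Mplus runs internally. It assumes a one-factor model with the first loading fixed at 1, so that sigma_ij = lam_i * lam_j * psi, plus theta_i on the diagonal. The function names and the parameter values are made up for illustration. At the chosen values, the model is locally identified when the Jacobian of the distinct covariance elements has as many independent columns as there are free parameters.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Exact rank by Gaussian elimination over rational numbers."""
    m = [[Fraction(v) for v in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [x - factor * y for x, y in zip(m[r], m[rank])]
        rank += 1
    return rank


def one_factor_jacobian(loadings, psi):
    """Derivatives of each distinct sigma_ij (rows) with respect to the
    free parameters (columns): loadings 2..p, psi, theta_1..theta_p.
    loadings[0] is the fixed loading and gets no column."""
    p = len(loadings)
    rows = []
    for i in range(p):
        for j in range(i + 1):
            d_lam = [
                ((loadings[j] if k == i else 0) + (loadings[i] if k == j else 0)) * psi
                for k in range(1, p)
            ]
            d_psi = [loadings[i] * loadings[j]]
            d_theta = [1 if (i == j == k) else 0 for k in range(p)]
            rows.append(d_lam + d_psi + d_theta)
    return rows


def is_locally_identified(loadings, psi):
    jac = one_factor_jacobian(loadings, psi)
    return matrix_rank(jac) == len(jac[0])


two_indicators = is_locally_identified([1, 2], 1)       # 3 moments, 4 parameters
three_indicators = is_locally_identified([1, 2, 3], 1)  # 6 moments, 6 parameters
print(two_indicators, three_indicators)
```

The check is only local: it says something about the chosen parameter values, not about every point in the parameter space.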

Anonymous posted on Tuesday, April 16, 2002 - 9:29 am



What methods does Mplus use to solve overidentified models / SEMs? I've been running a rather large SEM with latent variables and have noticed that one or two portions of the model are overidentified. Mplus doesn't return any error messages, the model converges without difficulty, and all the parameter estimates appear to be well-behaved. I'm wondering if the estimates for the coefficients are valid. Thanks.


I assume you mean underidentified, that is, not identified. A part of the model can be not identified if analyzed alone but can become identified when it is part of a larger model. For example, a factor model with two indicators is not identified, but a model with two correlated factors, each with two indicators, is identified because the two parts borrow information from each other. If this does not answer your question, let me know. If Mplus does not complain about the model not being identified, it is most likely identified.

Anonymous posted on Tuesday, April 16, 2002 - 2:04 pm



I am using Hanushek and Jackson's definition of "overidentified". In a "just identified" model, you have an equal number of equations and unknowns, so you can get a unique solution for each unknown from all the information provided in the model. In an "overidentified" model, you have more equations than unknowns, which at first blush appears to be a good thing: you have many different ways of obtaining estimates for your unknown quantities. A problem occurs with overidentified models because, with sample data, the errors and variances are not the same across variables. Recommended ways of obtaining solutions include two-stage least squares (TSLS), indirect least squares (ILS), etc. Thus, I'm wondering how Mplus obtains parameter estimates in the overidentified case.


A model with more unknowns (parameters) than equations (sample statistics) is not identified and will not be estimated by Mplus. A model with fewer unknowns than equations will, if it is identified, be estimated using maximum likelihood or weighted least squares.
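The counting of equations and unknowns can be sketched in a few lines of Python. This is only the counting rule for a covariance structure (no means), with parameter counts worked out by hand for the two-indicator examples mentioned earlier in the thread; the function names are hypothetical. A nonnegative count is necessary for identification but does not guarantee it.

```python
def distinct_moments(n_observed):
    """Number of distinct variances and covariances among the observed variables."""
    return n_observed * (n_observed + 1) // 2


def degrees_of_freedom(n_observed, n_free_parameters):
    """Equations (sample statistics) minus unknowns (free parameters)."""
    return distinct_moments(n_observed) - n_free_parameters


# One factor, two indicators, first loading fixed at 1:
# 1 free loading + 2 residual variances + 1 factor variance = 4 parameters.
df_one_factor = degrees_of_freedom(2, 4)

# Two correlated factors, two indicators each, one loading per factor fixed at 1:
# 2 free loadings + 4 residual variances + 2 factor variances + 1 covariance = 9.
df_two_factors = degrees_of_freedom(4, 9)

print(df_one_factor, df_two_factors)  # -1 and 1
```

A negative value means more unknowns than equations, so the one-factor, two-indicator model cannot be identified on its own.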

Jason posted on Thursday, November 06, 2003 - 4:15 am



Very simple question from a new user. If I have a single indicator of a construct to be used in a larger SEM mediation model that has other latent constructs with multiple indicators, what is the best way to identify it? I tried fixing the factor loading of the single indicator to "1", but the program said it could not estimate the error variances. Suggestions?


You do not need to create a latent variable if you have a single indicator. Just use the observed variable in the analysis and Mplus will take care of it. The only reason you would want to create a latent variable behind a single indicator is if you wanted to fix the residual variance to a value that reflects the reliability of the measure. 


Linda, following on from this last posting, I was wondering if you could tell me how one goes about fixing the residual variance to reflect a previous estimate of reliability. I have tried the following (fixing the error variance at 0.3), and it does not seem to work (the model does not converge): latent by x1@1; error by x1@1; error@.3;


f BY y@1; y@a; where a is the error variance in y chosen as a = (1 – reliability) * sample variance 
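As a small worked example of the formula, here is a Python sketch that computes the fixed error variance and prints the corresponding statements. The helper names and the numbers (reliability 0.75, sample variance 4.0) are invented for illustration and do not come from any data set in this thread.

```python
def single_indicator_error_variance(reliability, sample_variance):
    """a = (1 - reliability) * sample variance."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    if sample_variance < 0.0:
        raise ValueError("sample variance cannot be negative")
    return (1.0 - reliability) * sample_variance


def mplus_statements(indicator, error_variance, factor="f"):
    """Format the two statements: loading fixed at 1, residual variance fixed at a."""
    return f"{factor} BY {indicator}@1; {indicator}@{error_variance:.3f};"


a = single_indicator_error_variance(reliability=0.75, sample_variance=4.0)
line = mplus_statements("y", a)
print(line)  # f BY y@1; y@1.000;
```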

Anonymous posted on Thursday, March 04, 2004 - 2:57 pm



I’d like to pick up the discussion of singular information matrices and identifiability again. You said above that the model is probably identified if Mplus does not report any error messages, but I am wondering what types of error messages to look for, or which will come up, when a model is not identified.
1. Is it correct to conclude that even if a model estimation terminates normally and yields fit statistics, class counts and memberships, model results (e.g., thresholds and/or proportions, the regression model part, etc.) … WHENEVER you get an error message in Tech11 that states that the information matrix with one less class is singular, this is always empirical evidence that the model is unidentified?
2. If this statement is true: when I encountered this error message, I noted that Mplus still reported Estimates, S.E., and Est./S.E. when the model was not identified, but did not report Std or StdYX. Is the absence of values for Std and StdYX in the output another way to tell that a model is not identified?
3. Is Tech11 the only place in the output that indicates whether the information matrix with one less class is singular (or that indicates in some way that the model is not identified)? If so, is it true that the only way to determine that the information matrix is nonsingular is to make sure that there are no error messages in Tech11?
Thanks in advance. This discussion has been very helpful.

bmuthen posted on Thursday, March 04, 2004 - 6:21 pm



1. The usual nonidentifiability message in the regular results section refers to the k-class model; use that. Tech11 refers to the k-1 class model.
2. My answer to 1 also answers 2: SEs are not reported when the model (the k-class model) is not identified.
3. A better way to check if the k-1 class model is identified is to run it (without Tech11).

Anonymous posted on Thursday, September 30, 2004 - 11:49 am



Hi Linda, I have the following model and I am using Mplus version 3.
MODEL: A ON C D; B ON C D; E ON A B; F ON A B; G ON A B; H ON A B E F G;
Modification indices indicate that the model can be improved by adding A WITH B; E WITH F; F WITH G; Looking at this model, is it still identified after correlating these errors of the mediating variables? Thanks a lot.


You should add one term at a time because the modification indices for the other terms may change when a term is added. Mplus will notify you if the model is not identified.

Anonymous posted on Monday, November 15, 2004 - 9:24 pm



I have two questions. First, I am interested in estimating a model where: y1* = w1-w5 + u1, where y* is a categorical variable, while also estimating: y2 = y1* + x1-x5 + u2. There is some concern that the first equation may be overspecified. Is there a way to test this? Second, assuming that y1* was continuous, in which of the Tech files would I find the data to compute a Hausman test to see if I needed to instrument y1* at all? Thank you.

bmuthen posted on Tuesday, November 16, 2004 - 12:19 pm



Let me ask some questions so I understand this setup: are the w1 and w5 exogenous variables, and if so, what do you mean by overspecified? Remind me, the Hausman test concerns left-out exogenous variables that might make residuals correlate with the included exogenous variables, right? I am not sure the Mplus output has the information needed for this test (but it is a test we are interested in adding).

Anonymous posted on Monday, April 18, 2005 - 5:48 pm



Can someone please help? How can the model be unidentified when I have 52 degrees of freedom? See below.
Computation of degrees of freedom
Number of distinct sample moments = 91
Number of distinct parameters to be estimated = 39
Degrees of freedom = 91 - 39 = 52
The model is probably unidentified. In order to achieve identifiability, it will probably be necessary to impose 3 additional constraints. The (probably) unidentified parameters are marked.


Can you send the full output to support@statmodel.com along with your license number? This is not enough information to answer your question. Note also that a model with positive degrees of freedom may still not be identified, because a part of the model may not be identified.

Matt Moehr posted on Monday, December 04, 2006 - 7:42 am



The data I have come from a study where babies were given 3 different toys for sixty seconds each. The babies' behavior was timed and coded as Focused Attention (f), Casual Attention (c), or Not Looking (n). A subsample of the babies was coded by two raters, and a reliability score, given as percent agreement, was calculated for each of the three categories:
Focused = 55.8%
Casual = 79.6%
Not Look = 95.8%
I found Linda's post from 1/23/2004, where she recommends using the formula (1 - reliability) * sample variance to specify the error variance. My question is, can this same method be applied to latent variables with more than one measure? I tried the model below, but some of the results are confusing:
ANALYSIS: type = general missing h1; estimator = ml;
MODEL: mood1 BY f1@1 c1 n1*1;
mood2 BY f2@1 c2 n2*1;
mood3 BY f3@1 c3 n3*1;
mood2 ON mood1;
mood3 ON mood2;
n1@10.724; n2@10.747; n3@9.133;
f1@63.258; f2@20.849; f3@27.685;
c1@37.651; c2@34.753; c3@36.456;
I triple checked my math, so I suspect something is going wrong because I have multiple measures on a single latent variable. Can I still use the interrater reliability to specify the error variance?


You would only want to fix the residual variance to correct for reliability when you have a single indicator. When you have a factor with several indicators, the factor model itself captures the unreliability. You should remove the following statements: n1@10.724; n2@10.747; n3@9.133; f1@63.258; f2@20.849; f3@27.685; c1@37.651; c2@34.753; c3@36.456;

Matt Moehr posted on Monday, December 04, 2006 - 2:50 pm



My strategy on this analysis was to run three separate models with just one category of attention as the indicators. This made 3 simple hidden Markov models, but I had to build in some type of assumption because they were all underidentified. I made the choice to bring in the interrater reliability because it seemed like the least restrictive assumption I could make. Then I began to wonder if I could make a model that used all three indicators for each trial, but I could never get this model to converge. When I went back and fixed the error variances, the model estimated just fine and most of the structural and measurement paths looked great. However, the standardized error variances were no longer equal to the interrater reliability, so I agree that this model is probably misspecified. Do you have any suggestions to improve model convergence in hidden Markov models? Is there an inherent problem in using multiple indicators, which are almost perfectly (negatively) correlated due to the mutually exclusive coding groups?


I am a little confused by your mentioning both factor analysis and hidden Markov models, but let me focus on the latter. With a single observed binary indicator at each of 3 time points, a hidden Markov model is identifiable if you estimate measurement error that is constrained to be invariant across time. If you have problems with this, send input, output, data and license number to support@statmodel.com. Yes, you don't want to use multiple indicators created from mutually exclusive coding groups. Such variables could be treated as nominal, and then LTA (hidden Markov) is possible.

Matt Moehr posted on Friday, December 08, 2006 - 10:36 am



I'm also a little confused by the use of factor analysis and hidden Markov, but I was handed this data long after the study was designed and executed. The theory we're working with says that 6-month-old babies will show habituation to a novel stimulus, in this case new toys. However, interwoven with the habituation, each baby should show early signs of innate temperament, or "personality" if you like. The main goal of the project is to measure temperament at 6 months and relate it to follow-up measures of temperament and psychosocial adjustment at age 24 months. Seems like a good idea, but my models don't like the variables for timespans of attention. I think this is because they are mutually exclusive categories. For instance, if a baby spends a lot of time "Not Looking" during the trial, that baby is going to spend less time in "Focused Attention". There are only sixty seconds in each trial, so more time in one category means less time in another. Simple correlations among the residuals could account for this, but then the habituation that occurs between trials would be lost. I think what I'm trying to do is a basic LTA, but with a panel analysis (or hidden Markov?) stuck on the other side of the variables. I'll try to clean up my syntax and send you all of the files.


It sounds like you have a 3-category nominal variable (a single indicator) measured at two time points and that you want to do an LTA; see the UG for how to do that.


Identification in Latent Class, Cluster Path Model
I have a data set with 49 clusters, and I am estimating a path model with 61 parameters. I get the following error message:
THE NONIDENTIFICATION IS MOST LIKELY DUE TO HAVING MORE PARAMETERS THAN THE NUMBER OF CLUSTERS. REDUCE THE NUMBER OF PARAMETERS.
I am a novice here, and don't quite follow why the number of clusters has to exceed the number of parameters. Can you please clarify? Thanks in advance for your help!


You can see formulas 167, 168, and 169 in Technical Appendix 8, which is on the website. Basically, an information matrix that is used in the computation of the standard errors is singular. It does not need to be inverted for the MLR estimator, so we provide standard errors. This is a warning that we are not sure if they are accurate when there are fewer clusters than parameters. The effect of this matrix being singular has not been studied.
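Why the number of clusters matters can be illustrated with a toy Python sketch. It does not reproduce the formulas in Technical Appendix 8. It simply assumes, for illustration, that the matrix in question is built as a sum of one outer product per cluster (one made-up "score" vector per cluster), in which case its rank cannot exceed the number of clusters. All names and numbers here are hypothetical.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Exact rank by Gaussian elimination over rational numbers."""
    m = [[Fraction(v) for v in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [x - factor * y for x, y in zip(m[r], m[rank])]
        rank += 1
    return rank


def summed_outer_products(score_vectors):
    """Sum of s * s' over clusters; one vector s per cluster (assumed form)."""
    n = len(score_vectors[0])
    total = [[0] * n for _ in range(n)]
    for s in score_vectors:
        for i in range(n):
            for j in range(n):
                total[i][j] += s[i] * s[j]
    return total


# Three parameters, two clusters: the 3 x 3 matrix has rank 2, so it is singular.
two_clusters = [[1, 2, 3], [0, 1, 4]]
rank_two = matrix_rank(summed_outer_products(two_clusters))

# Adding a third, linearly independent cluster vector restores full rank.
three_clusters = two_clusters + [[1, 0, 1]]
rank_three = matrix_rank(summed_outer_products(three_clusters))

print(rank_two, rank_three)  # 2 and 3
```

Under that assumption, 49 clusters could support a rank of at most 49, which is below the 61 parameters in the question.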


Thanks for the prompt response! Much appreciated... Raji 

dm posted on Sunday, May 20, 2007 - 12:54 pm



Hi, I am using the following model in a paper:
…
CATEGORICAL IS Y1 Y2;
ANALYSIS: PARAMETERIZATION=THETA; TYPE=MEANSTRUCTURE;
MODEL: Y1 ON Y2 Y3 X1 X2 X3;
Y2 ON Y3 X1 X4 X5 X6 X7;
Y3 ON Y1 X1 X4 X5 X2 X3 X8 X9;
Y1, Y2 and Y3 are endogenous variables and X1 … X9 are exogenous variables. The paper received an R&R from a major journal, and one of the reviewers questioned whether MPLUS can solve the causation (feedback effect) between Y1 and Y3.
(1) The reviewer said, “I do not see how your model can be identified without some instruments for the variables ‘Y3’ and ‘Y1.’” Do I really need instrumental variables? I think if the model cannot be identified, MPLUS will generate an error message (it didn’t in my case). If I don’t need instrumental variables, how should I respond to the reviewer?
(2) The reviewer said, “Your very schematic description of the model (you need to provide its graphic representation) does not give me any idea how your model is identified.” For “graphical representation,” do you think the reviewer just wants a graph indicating possible causal paths between all the variables?
(3) It seems that the reviewer wants more technical discussion, so could you please give me suggestions on methodological details of model identification in MPLUS (for the procedure I used)?


(1) The reciprocal interaction between y1 and y3 is identified because your model fulfills the rule of having at least one x variable that influences y1 and not y3 and at least one x variable that influences y3 and not y1. I think Bollen's SEM book covers this. Empirically, you are right that Mplus would complain if the model was not identified. (2) Draw the model and also refer to a section in some SEM book for identification. (3) Identification matters in Mplus are the same as in other SEM programs, so Bollen's book is a good resource.
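For a reviewer who wants something concrete, one related check is the classical order condition from the simultaneous-equations literature: each equation should exclude at least as many exogenous variables as it has endogenous variables on its right-hand side. The Python sketch below applies that counting rule to the three ON statements from the question. It is an assumption on my part that this is a useful rule to code here; it is a necessary condition only, it is not how Mplus checks identification, and it ignores the fact that Y1 and Y2 are categorical.

```python
# Right-hand sides copied from the MODEL statements in the question.
equations = {
    "Y1": ["Y2", "Y3", "X1", "X2", "X3"],
    "Y2": ["Y3", "X1", "X4", "X5", "X6", "X7"],
    "Y3": ["Y1", "X1", "X4", "X5", "X2", "X3", "X8", "X9"],
}


def order_condition(eqs):
    """For each equation, compare the number of excluded exogenous variables
    with the number of endogenous variables on the right-hand side."""
    endogenous = set(eqs)
    exogenous = {v for rhs in eqs.values() for v in rhs} - endogenous
    result = {}
    for dv, rhs in eqs.items():
        included_endogenous = [v for v in rhs if v in endogenous]
        excluded_exogenous = sorted(exogenous - set(rhs))
        passes = len(excluded_exogenous) >= len(included_endogenous)
        result[dv] = (passes, excluded_exogenous)
    return result


check = order_condition(equations)
for dv, (passes, excluded) in check.items():
    print(dv, "passes" if passes else "fails", "excluded:", excluded)
```

In this model every equation passes the count; for example, the Y3 equation excludes X6 and X7, which enter the system only through Y2.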


Dear Linda and/or Bengt, I know that if a portion of a model is underidentified then the entire model is underidentified (e.g., a second-order CFA with two first-order factors and one second-order factor is always underidentified in the absence of a constraint on the higher-order loadings, because the second-order portion of the model is underidentified regardless of whether the first-order portion is identified or overidentified). What I am wondering is whether an overall model is still overidentified if a portion of it is just identified. As one example, consider a second-order CFA with three first-order factors and one second-order factor and four observed indicators of each first-order factor. The first-order portion of the model is overidentified but the second-order portion is just identified. Is the overall model overidentified? Thanks! Rick Zinbarg


The overall model is overidentified as far as degrees of freedom are concerned. The just-identified part of the model fits perfectly, so the test of fit applies to the part of the model that is overidentified.
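The degrees of freedom in this example can be worked out with a short Python sketch. It assumes a covariance structure without means, one loading fixed at 1 per factor for scaling, and no cross-loadings or correlated residuals; these are my assumptions for the count, not details given in the question, and the function names are hypothetical.

```python
def distinct_moments(n):
    """Distinct variances and covariances among n variables."""
    return n * (n + 1) // 2


def second_order_parameters(n_first_order):
    """Free second-order loadings (one fixed at 1), the second-order factor
    variance, and one disturbance variance per first-order factor."""
    return (n_first_order - 1) + 1 + n_first_order


n_indicators, n_first_order = 12, 3

# Measurement part: 9 free loadings + 12 residual variances.
measurement = (n_indicators - n_first_order) + n_indicators

# First-order CFA with freely correlated factors: 3 variances + 3 covariances.
df_correlated = distinct_moments(n_indicators) - (measurement + distinct_moments(n_first_order))

# Second-order CFA: the 6 factor (co)variances are replaced by 6 parameters.
df_second_order = distinct_moments(n_indicators) - (measurement + second_order_parameters(n_first_order))

print(df_correlated, df_second_order)  # 51 and 51
```

Both models have the same degrees of freedom under these assumptions, which is what "just identified" means for the second-order part. With two first-order factors the same count gives 4 second-order parameters against 3 factor variances and covariances, the underidentified case mentioned in the question.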


thanks very much! 


I've fitted 2 different models, but I think I've obtained the same results (are they equivalent?), whereas I expected to find some difference. (All observed variables are continuous and normally distributed.)
model_1 -> f1 BY y1 y2 y3 x;
model_2 -> f1 BY y1 y2 y3; x ON f1;
I thought that if the error distribution of the "structural part" (x ON f1) is different from the error distribution of the "measurement part", then the two models could be different, but if the error distribution is the same, then it makes no difference and both models are equivalent. Is that correct?


f1 BY x is the same as x ON f1 so these models are the same. 


I am running a CFA with several constructs, one of which has only 1 indicator. I fixed error variance to (1 – reliability) * sample variance. Would you happen to know the reference for that formula? I tried looking in some multivariate and SEM textbooks, but I could not find it. Thx! 


I don't know offhand. I would think the Bollen book might discuss this.


I cannot get the Bollen book (Structural Equations with Latent Variables)... it is not on PsycINFO nor in any library in our country. I also looked at his paper "Latent variables in psychology and the social sciences", with no success. Any other suggestions? Thx.


This should be documented in most SEM books. We show this in slide 43 of our Topic 1 course handout. You can try Google or ask on SEMNET. Does anyone else on this forum know of a reference? 


Maša, as Linda says, Bollen's book, chapter 5 ("The consequences of measurement error"), is the standard reference. Other easy-to-find e-book references are:
Schumacker & Lomax: A Beginner's Guide to Structural Equation Modeling (2nd ed.), pp. 198-199
Kline: Principles and Practice of Structural Equation Modeling (2nd ed.), pp. 229-231
(Or try Google with: latent variable with single indicator.)

Vlad posted on Wednesday, October 21, 2009 - 8:48 am



Hello, I am estimating a model with 3 classes and always get this message from Mplus: WARNING: WHEN ESTIMATING A MODEL WITH MORE THAN TWO CLASSES, IT MAY BE NECESSARY TO INCREASE THE NUMBER OF RANDOM STARTS USING THE STARTS OPTION TO AVOID LOCAL MAXIMA. I've tried to use STARTS but nothing has changed. Any suggestions?


Vlad, the message is just a reminder. It will pop up no matter how many starting values you specify. /Amir


Hi, I have a question about an underidentified model with 2 indicators. I equated both loadings for this model. However, the results give me one loading=1.00, se=0.000, so that est/se=999.00 and p=999. The other loading is 1.42 with se=.056, est/se=25.35. My question: what does est/se=999 and p=999 mean? Thanks


The 999 means the value cannot be computed. A fixed parameter has no standard error or significance test. It does not sound like you held the loadings equal. To do this, specify: f BY y1* (1) y2 (1);


Hi, After reading through some of the earlier postings, it looks as though a globally identified model will estimate even if there are some locally nonidentified parameters. Is that correct, or is there a warning if local underidentification is present? If there is no warning, does this mean that it is safe to interpret all model parameters or the locally identified parameters? Thanks for your help! 


We check for local, not global, identification. I think in 99% of the cases, Mplus will catch nonidentification and give a message. If you want to be certain about identification, you can use the STARTS option to generate random starts. Perhaps you mean that some parameters are not identified and others are.


Hi, Thank you for your response. Yes, I was asking about how Mplus would handle a situation in which some parameters are not identified and others are. From your response, it sounds like Mplus will notice this and give a warning message in almost all circumstances. Is that correct? 


Yes. 

Kip Sorgen posted on Tuesday, February 08, 2011 - 10:37 am



I have a latent construct with three indicators that is just identified in CFA and explains 92% of the variance. What are some considerations of having a factor with zero degrees of freedom when including it in the full SEM model? 


The major one for me is that you can't test its fit. 


In a measurement model, what is the relation between the reliability of a single indicator and its sensitivity and specificity?


I don't know if there is a relationship between reliability and sensitivity and specificity. You might look at the LCA literature. 

Joni posted on Thursday, March 22, 2012 - 12:41 pm



Hi, I am a very new user of Mplus and need a question answered about identification. I have a latent factor with 3 indicators that will be part of a larger SEM. I wanted to check my measurement models before I added them to the larger model. This is my code:
CATEGORICAL b15a b15b b15c; Weight W1; cluster C1; STRATIFICATION C3;
Analysis: Type = complex;
Model: NEG BY b15a* b15b b15c@1;
My model has 0 degrees of freedom for the Chi-Square Test of Model Fit, and therefore I don't get estimates for the test of fit or RMSEA, and my TLI/CFI=1. Does this mean I need to fix my factor variance? And if so, how do I do that? Thanks!


A factor with three indicators is just-identified. Model fit cannot be assessed for a factor with fewer than four indicators.

Cecily Na posted on Wednesday, October 31, 2012 - 12:26 pm



Hello, I have a model with two latent factors and other observed covariates. Each latent factor has two indicators. One latent factor together with other covariates cause the other latent factor. No errors are correlated. Is this model identified? How do these two factors borrow information from each other? Thanks a lot! 


Sounds like an identified model given that a model with 2 factors that are correlated and each has 2 indicators is identified. The covariances between the 2 sets of factor indicators make it identified. And your model also has covariates. 

Paula Vagos posted on Friday, December 14, 2012 - 4:12 am



Hello, I am trying to test a second-order model with two first-order factors, each with 8 indicators. I fixed the paths between the two first-order factors and the higher-order factor. Would this model be possible/identified? Thank you.


If both factor loadings are fixed, then the model would be identified. But I wonder what the point of this is. If they are fixed at one, the one parameter that is estimated is the factor variance which is the covariance between the two factor indicators. 
