

I have a model that estimates two separate LCAs (two 4-class models for two sets of behaviors, each made up of ordinal and dichotomous indicators) and regresses the later class variable on the first. In addition, I include some covariates and test a covariate-by-latent-class interaction. To determine the appropriate number of classes for the two LCAs, I first ran each separately and used the adjusted BIC, LMR-LRT, and entropy as a guide (the data come from a complex sample, so I did not use TECH14). Then I ran a combined model informed by these separate results. The reviews were very positive, but one reviewer suggests the need to test the validity of the local independence assumption, referencing the Garrett & Zeger (2000) Log-Odds Ratio Check or evaluation of the bivariate residuals (Vermunt & Magidson, 2000). My questions: is this assumption necessary to test (wouldn't non-independence be relaxing this assumption of the method); are the two recommended tests appropriate for my data type (i.e., ordinal categorical) and model; and are they readily implemented in Mplus (my sense from Karen Nylund's 2007 paper is "no")?


You can check the assumption by looking to see if you have significant bivariate residuals in TECH10. 


With regard to TECH10: is there a particular standard I should use in evaluating local independence? With 6 variables (each having 2 to 4 levels), there are a number of standardized residual z-scores in excess of 1.96. Does the presence of any significant residuals pose a problem for the local independence assumption, or is there a "rule of thumb" regarding how many (or what percent) may be significant? Would a significant residual indicate that I need to have a direct relation between two given indicators in the model? Also, is there a way to request (or assess) the expected information matrix to formally check model identification (another request from the reviewer)? I see the ratio of smallest to largest eigenvalue in my output (0.137E-05); would that indicate that none of my eigenvalues equal "0" and, thus, that my model is identified? Thank you, Christian


The first thing I would do is try one more class and see if the significant standardized residuals are still significant. It may be that you need one more class. Significant standardized residuals indicate that the assumption of conditional independence is not met. There is no rule of thumb here as to how many are acceptable. Each residual covariance is one dimension of integration, so adding more than 4 may not be feasible. If the information matrix is singular, we give a message. The ratio of smallest to largest eigenvalue in your output (0.137E-05) indicates that your model is identified.


I had run additional models with more classes, but the adjusted BIC, LMR Likelihood Ratio Test, and entropy did not suggest these models were appropriate. Significant bivariate residuals are still present (assuming that I'm interpreting the output correctly: would an absolute value of the standardized residual (z-score) > 1.96 indicate significance?). Do such findings invalidate the model results? In the case of conflicting evidence (i.e., if standardized residuals did decrease), which set of standards should guide model selection? Vermunt appears to indicate there is a trade-off between local independence and the number of classes (i.e., by relaxing this assumption and allowing direct relations among indicators you may reduce the number of classes). However, by allowing such relations you may also ignore potentially meaningful classes. I don't recall a clear statement as to how to weigh the decision. Also, would one relax this assumption by modeling a covariance between indicators? In the current model, for example, I am modeling substance use classes based upon categorical indicators of frequency of use for various substances. Would I have to indicate that some of the substances (tobacco and alcohol use) are also independently associated with one another apart from the class structure that I am reporting?


Yes, a standardized residual greater than 1.96 in absolute value indicates significance. It is not clear how to handle this situation. I would use the meaningfulness of the classes as a guide. Yes, you can relax the assumption by modeling the covariance.


I am using the bivariate results in TECH10 to evaluate local independence of an IRT mixture model and would like to clarify the following: 1) How are the standardized residuals calculated? 2) Are the standardized residuals expected to follow a particular distribution (e.g., as in Yen's Q3 statistic which, under multivariate normality, is expected to be normally distributed)? 3) Is the chi-square for the bivariate association the same as the chi-square for local independence suggested by Chen (1997)? 4) How are the degrees of freedom for the chi-square statistics calculated? Are they simply (rows - 1) * (columns - 1) (e.g., df = 4 for two variables with 3 ordinal categories each), or must the number of estimated parameters (thresholds) be taken into account (e.g., df = 5 for two variables with three ordinal categories each)? Thank you very much for clarifying these points.


The standardized residuals given in TECH10 are the standardized Pearson residuals. See Agresti's Categorical Data Analysis book, Sections 3.3.1 and 4.5.5. The original article on this topic is The Analysis of Residuals in Cross-Classified Tables, Shelby J. Haberman, Biometrics, Vol. 29, No. 1 (Mar., 1973), pp. 205-220. They are normally distributed z-scores. For the bivariate tables the standardized residuals are computed by (O - E)/[sqrt(E)*sqrt(1 - E/n)]. O and E are the Observed and Expected (model estimated) quantities for a pattern in the categorical data.
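As a rough illustration of the formula in the reply above, the following Python sketch computes one standardized Pearson residual and its two-sided normal p-value. It assumes O and E are cell counts and n is the sample size; the function names and the example numbers are hypothetical and are not taken from any Mplus output.

```python
import math


def standardized_pearson_residual(observed, expected, n):
    """(O - E) / [sqrt(E) * sqrt(1 - E/n)] for one cell of a bivariate table.

    Assumption: observed and expected are counts and n is the sample size.
    """
    return (observed - expected) / (
        math.sqrt(expected) * math.sqrt(1.0 - expected / n)
    )


def two_sided_p(z):
    """Two-sided p-value for a z-score under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical cell: 130 observed, 110 expected under the model, n = 1000.
z = standardized_pearson_residual(observed=130, expected=110, n=1000)
p = two_sided_p(z)
print(f"z = {z:.3f}, two-sided p = {p:.4f}")
```

With these made-up counts the residual lands just above 1.96, so it would be flagged at the 5% level under the criterion discussed earlier in the thread.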

Rob Dvorak posted on Friday, November 27, 2009 - 8:55 am



Hi Drs. Muthen, I was wondering if there is a way to evaluate the condition number for the information matrix. I have been told that > 0.xE-06 is a rule of thumb, but I'm wondering if there is a citation I am missing (perhaps in the Mplus manual). In my analysis, it is currently 0.133E-04.


I don't know about citations, although I assume the numerical analysis literature would have something on it. Depending on the algorithm, I think our epsilon limit for calling it singular, and most likely non-identified, is E-09 or E-10. I think different sized models with different sized parameter values can influence whether or not a small value should be seen as an indicator of non-identification. Then there is also the matter of which estimator of the information matrix one uses. Mplus works with MLF, ML, and MLR. MLF seems to be most sensitive to possible singularity/non-identification.
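A minimal Python sketch of the comparison described in the reply above: it reads a condition number printed in Mplus-style notation and checks it against a singularity limit. The limit of 1e-10 is an assumption based on the "E-09 or E-10" remark, and the function name is hypothetical. As the reply notes, model size, parameter scale, and the information matrix estimator all matter, so this is only a rough screen.

```python
def looks_singular(condition_number, epsilon=1e-10):
    """Return True if the smallest-to-largest eigenvalue ratio is below epsilon.

    Assumption: epsilon = 1e-10 stands in for the approximate limit mentioned
    in the reply; the limit Mplus actually applies may depend on the algorithm.
    """
    return condition_number < epsilon


# The value from the question, taken here as 0.133E-04.
reported = float("0.133E-04")
print(looks_singular(reported))
```

Python's float() accepts the E notation directly, so values copied from the output can be compared without reformatting.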


I am trying to evaluate local independence for an LCA with a four-class solution. I have requested TECH10 from Mplus, but I wasn't sure what part of the output under the bivariate model information section to interpret. I see z-scores for the different combinations of variables as well as two chi-square tests. Thank you for your time.


You can use both. The z-scores give you detailed information about sources of misfit. The chi-square presents it more globally.


Hi. I am doing a CFA with one factor and ordinal categorical outcome variables. I have used multiple imputation with WLSMV. I guess I can't get the bivariate correlation of the standardized residuals to evaluate local independence. Is there another way I can get this information? Thanks so much! New to Mplus and still learning.


You can use MLR on the original data and ask for TECH10. With only one factor and categorical factor indicators, you require only one dimension of integration. 


Hi. I am doing a CFA with one factor and ordinal categorical outcome variables. In order to evaluate local independence, I am using Reeve's > .2 criterion. The output I get with the following syntax shows correlations across categories. Is there a way I can collapse this to just show the residual correlation averaged across the indicators? This is the syntax I used:

TITLE: CFA safety tbi mlr with tech 10_CC
DATA: FILE IS "C:\Users\kathy\Desktop\shepherd_safety_project\ ControlFIle_cg_cc3_3_2012.dat";
VARIABLE: NAMES ARE cc1 cc2 cc3 cc4 cc5 cc6 cc7 cc8 cc9 cc10 cc11 cc12 cc13 cc14 cc15 cc16 cc17 cc18 cc19 cc20 cc21;
CATEGORICAL ARE cc1 cc2 cc3 cc4 cc5 cc6 cc7 cc8 cc9 cc10 cc11 cc12 cc13 cc14 cc15 cc16 cc17 cc18 cc19 cc20 cc21;
MISSING ARE ALL (9);
ANALYSIS: Estimator=mlr;
Model: f BY cc1 cc2 cc3 cc4 cc5 cc6 cc7 cc8 cc9 cc10 cc11 cc12 cc13 cc14 cc15 cc16 cc17 cc18 cc19 cc20 cc21; f@1;
OUTPUT: tech10 SAMPSTAT; STAND; RESIDUAL; PATTERNS;
SAVEDATA: FILE IS COGCAP_03022012cfa.DAT; FORMAT IS F2.0;

Thanks!


Please send the output and your license number to support@statmodel.com. 

Andy Daniel posted on Wednesday, December 12, 2012 - 6:59 am



Hi, I'm running an LCA with repeated measures of one nominal variable in a longitudinal dataset (6 categories in each wave). It is very plausible that the local independence assumption isn't met in this case and that the measurement errors are associated. I was wondering if there is a way to check the local independence assumption in this model with Mplus. Because the variables are nominal, TECH10 is not provided. My follow-up question is whether it is possible to model the association between the measurement errors to deal with the violation of the local independence assumption. Many thanks for your help! Best, Andy


It is difficult to model such associations. You can take the approach in UG ex 7.16. 


Dr. Muthen, I requested TECH10 for my LCA; I have continuous indicators and one categorical indicator. In this output I only see information for the categorical indicator. How do I request residuals for the continuous indicators? Thanks, Danyel


Use the RESIDUAL option if it is available for your analysis. 


Thanks, Dr. Muthen. I requested the residual, but I don't see the Z tests associated with the covariances between the indicators. Am I supposed to request something additional? Thanks so much. Danyel 


Also, I have another question about interpreting the odds ratio. The output reads: latent class 1 compared to latent class 2, C4GEN Category > 1: 1.266, p-value .03. The categorical variable is gender, whereby 1 = female and 2 = male. Would a correct interpretation be the following? In comparison to class 2, those in class 1 are more likely to be male? Thanks so much. Danyel


Also, here is another table.

RESULTS IN PROBABILITY SCALE

Latent Class 1
 C4GEN
    Category 1    0.486    0.103     4.739    0.000
    Category 2    0.514    0.103     5.016    0.000

Latent Class 2
 C4GEN
    Category 1    0.545    0.037    14.588    0.000
    Category 2    0.455    0.037    12.194    0.000


The standardized residuals are z-scores. See pages 496-497 of the user's guide for the interpretation of odds ratio results. The table above is a translation of the logits in the results section to probabilities.
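As a hedged check on how the posted numbers fit together, the Python sketch below recomputes an odds ratio from the Category 2 probabilities in the probability-scale table. It assumes the reported 1.266 is the odds of falling in a category above 1 (Category 2, coded male in the post) in class 1 relative to class 2; the helper name is hypothetical.

```python
def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)


# Category 2 probabilities from the posted probability-scale table.
p_cat2_class1 = 0.514
p_cat2_class2 = 0.455

# Odds of Category 2 in class 1 relative to class 2.
odds_ratio = odds(p_cat2_class1) / odds(p_cat2_class2)
print(f"odds ratio (class 1 vs. class 2, category > 1): {odds_ratio:.3f}")
```

The result agrees with the reported 1.266 up to rounding of the probabilities. Under the stated assumption, this is consistent with reading the odds of Category 2 (male) as higher in class 1 than in class 2.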


Hello, I am using TECH10 to evaluate the conditional independence assumption of an LCA model. I have 2 questions: 1) How do I calculate the degrees of freedom for the Overall Bivariate Pearson Chi-square posted at the very end of the TECH10 output? 2) None of the individual bivariate standardized Pearson residuals are significant at 1.96. However, some of the Bivariate Pearson Chi-squares for variable pairs are significant (> 3.84). Should I consider these violations of the conditional independence assumption? Thanks!


1) The distribution of the Overall Bivariate Pearson Chi-square statistic is not known. It is computed mostly for comparative purposes. 2) No. These are again computed for comparative purposes, and I would not recommend a cutoff value. The proper use is as follows. Consider the "Chi-Square Test of Model Fit" in the "MODEL FIT INFORMATION" section. If the model is rejected, examine the TECH10 tables and modify the model for the pairs of variables with the largest Bivariate Pearson Chi-square values. Modifications along the lines of those on page 8 of the following document are recommended: http://www.statmodel.com/download/Version7.2LanguageAddendum.pdf


Thanks for the speedy response. Given that the chi-square statistic does not follow a known distribution, is it possible to bootstrap the residuals in Mplus as recommended by Oberski et al.? http://members.home.nl/jeroenvermunt/oberski2013a.pdf http://daob.nl/wpcontent/uploads/2013/05/oberskibreschia.pdf


Also, as a quick follow-up question to your previous recommendation: when would you recommend modeling the residual covariances as constrained to be equal across classes vs. free across classes?


It is possible, Evann, but it will require a bit of programming on your end. Generate 100 data sets according to your estimated model. Then compute the TECH10 statistics for each and assemble the values of these statistics to obtain the null hypothesis distribution of these statistics. You can use https://www.statmodel.com/utility/extractor.shtml http://www.statmodel.com/examples/webnotes/web10.zip or use R https://www.statmodel.com/usingmplusviar.shtml For the "equal across classes vs. free across classes" question, start with unequal and test for equality using MODEL TEST or MODEL CONSTRAINT.
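Once the statistic has been collected from each generated data set, the last step described above can be sketched in Python as an empirical p-value. The function name and the example values are hypothetical. The +1 correction in the numerator and denominator is a common Monte Carlo convention and is an assumption here, not something specified in the reply.

```python
def bootstrap_p_value(observed_stat, simulated_stats):
    """Share of simulated statistics at least as large as the observed one.

    Uses the (k + 1) / (B + 1) convention so the p-value is never exactly 0.
    """
    exceed = sum(1 for s in simulated_stats if s >= observed_stat)
    return (exceed + 1) / (len(simulated_stats) + 1)


# Hypothetical bivariate chi-square values from data sets generated under the
# estimated model, and a hypothetical value observed in the real data.
simulated = [1.2, 0.8, 3.1, 2.4, 5.6, 0.5, 1.9, 2.2, 4.0]
observed = 4.7
print(bootstrap_p_value(observed, simulated))
```

In practice the list would hold one value per generated data set (for example, 100 values for each variable pair), extracted from the TECH10 output of each replication.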


Thanks for the advice. I'd like to give bootstrapping a try. Extracting the model parameters from the output was relatively simple in R (my native statistical language), but I'm having trouble sorting out the best way to generate data from them. Given the unstandardized threshold estimates and standard errors, how would you proceed? Thanks again


You can use the SVALUES option of the OUTPUT command to get the input with ending values as starting values and use those statements as input in MODEL POPULATION to generate data sets. See Chapter 12 for examples of Monte Carlo inputs. 


Thanks for the SVALUES tip. Using the starting values, however, I'm now running into the error:

*** ERROR in MODEL POPULATION command
One or more pairs of ordered thresholds are not increasing in Class 1.
Check your population values.
Problem with the following pairs:
PT_EL$2 (1.427) and PT_EL$3 (1.427)

What's the best way to proceed?


Please send the output with the SVALUES and the output with the error message along with your license number to support@statmodel.com. 

Evann Smith posted on Thursday, July 02, 2015 - 10:32 am



Hi, I've now successfully generated data and bootstrapped the distributions of my BVRs. Because my data are clustered, I did this in two steps (following the user's guide): 1) generate the data using the twolevel specification, 2) run the models using the complex specification. A few of my bootstrapped p-values were significant. I've modeled the largest dependency using type=complex, parameterization=rescov, and the "with" statement in the model. Now I'd like to generate new data and get new bootstrapped p-values for the BVRs, having modeled one local dependence. I'm having trouble, however, figuring out how to generate clustered data that also have a "with" parameter. I keep getting the error that twolevel and rescov don't work together. Is there a way to use Monte Carlo to generate new clustered data from the parameters of my new model, which accounts for one local dependency? Thanks!


RESCOV is for TYPE=MIXTURE. You can create a covariance between two categorical variables when maximum likelihood is used by saying: f BY u1@1 u2; f@1; [f@0]; where the factor loading of u2 is the covariance parameter. Note that in this case, each covariance requires one dimension of integration.

Evann Smith posted on Thursday, July 02, 2015 - 2:40 pm



I've been using type=mixture because the model for which I'm generating data and bootstrapping the BVRs is a latent class model. For the first iteration (bootstrapping the BVRs for a clustered latent class model with all residual covariances held at 0), I used "type=twolevel mixture" to generate the data and then "type=complex mixture" to bootstrap the BVRs. Are you suggesting that for this second iteration (where I have parameters for a residual covariance) I generate the data in the first step not as a mixture model? Thanks!


You say above you generate as twolevel. If you have TYPE=MIXTURE, you should be able to use RESCOV. If you can't see the problem, send the output and your license number to support@statmodel.com. 
