Local Independence Assumption

Mplus Discussion > Latent Variable Mixture Modeling >

Christian M. Connell posted on Saturday, January 24, 2009 - 12:14 pm

I have a model that estimates two separate LCAs (two 4-class models for two sets of behaviors comprised of ordinal and dichotomous indicators) and regresses the later class variable on the first. In addition, I include some covariates and test a covariate by latent class interaction.

To determine appropriate class sizes for the two LCAs I ran each separately, first, and used the adjusted BIC, LMR-LRT, and entropy as a guide (data is complex sample, so I did not use Tech14). Then I ran a combined model informed by these separate results.

The reviews were very positive, but one reviewer suggests the need to test the validity of the local independence assumption -- referencing Garrett & Zeger (2000) Log-Odds Ratio Check or evaluation of the bivariate residuals (Vermunt & Magidson, 2000).

My questions -- is this assumption necessary to test (wouldn't non-independence be relaxing this assumption of the method); are the two recommended tests appropriate for my data type (i.e., ordinal categorical) and model; and are they readily implemented in Mplus (my sense from Karen Nylund's 2007 paper is "no").

Bengt O. Muthen posted on Saturday, January 24, 2009 - 4:03 pm

You can check the assumption by looking to see if you have significant bivariate residuals in TECH10.

Christian M. Connell posted on Monday, January 26, 2009 - 1:04 pm

With regard to Tech10 -- is there a particular standard I should use in evaluating local independence? With 6 variables (each having 2 to 4 levels), there are a number of standardized residual z-scores in excess of |1.96|. Does the presence of any significant residuals pose a problem for the local independence assumption, or is there a "rule of thumb" regarding how many (or what percent) are significant? Would a significant residual indicate that I need to have a direct relation between two given indicators in the model?

Also, is there a way to request (or assess) the expected information matrix to formally check model identification (another request from the reviewer)? I see the ratio of smallest to largest eigenvalue in my output (0.137E-05), would that indicate that none of my eigenvalues equal "0" -- and, thus, my model is identified?

Thank you,
Christian

Linda K. Muthen posted on Monday, January 26, 2009 - 4:32 pm

The first thing I would do is try one more class and see if the significant standardized residuals are still significant. It may be you need one more class.

Significant standardized residuals indicate that the assumption of conditional independence is not met. There is no rule of thumb here as to how many are acceptable. Each residual covariance is one dimension of integration so adding more than 4 may not be feasible.

If the information matrix is singular, we give a message. The ratio of smallest to largest eigenvalue in your output (0.137E-05) indicates that your model is identified.

Christian M. Connell posted on Monday, January 26, 2009 - 5:47 pm

I had run additional models with more classes, but the Adjusted BIC, LMR Likelihood Ratio Test, and entropy did not suggest these models were appropriate. Significant bivariate residuals are still present (assuming that I'm interpreting the output correctly -- would the absolute value of the standardized residual (z-score) > 1.96 indicate significance?).

Do such findings invalidate the model results? In the case of conflicting evidence (i.e., if standardized residuals did decrease) -- which set of standards should guide model selection? Vermunt appears to indicate there is a trade-off between local independence and class size (i.e., by relaxing this assumption and allowing direct relations among indicators you may reduce classes). However, by allowing such relations you may also ignore potentially meaningful classes. I don't recall a clear statement as to how to weigh the decision.

Also, would one relax this assumption by modeling a covariance between indicators? In the current model, for example, I am modeling substance use classes based upon categorical indicators of frequency of use for various substances. Would I have to indicate that some of the substances (tobacco and alcohol use) are also independently associated with one another apart from the class structure that I am reporting?

Linda K. Muthen posted on Tuesday, January 27, 2009 - 10:32 am

Yes, a standardized residual greater than 1.96 in absolute value indicates significance at the 5% level.
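
The 1.96 cutoff corresponds to a two-sided test at the 5% level for a standard normal z-score. A minimal sketch of that conversion, using only the Python standard library (the function name is hypothetical, and treating the residual as exactly standard normal is an assumption):

```python
import math

def two_sided_p(z: float) -> float:
    """Two-sided p-value for a z-score, assuming a standard normal reference."""
    return math.erfc(abs(z) / math.sqrt(2.0))

# Illustrative z-scores only; these are not values from any TECH10 output.
for z in (1.50, 1.96, 2.58):
    print(f"|z| = {z:.2f} -> p = {two_sided_p(z):.4f}")
```

With many bivariate cells, some residuals can be expected to exceed the cutoff by chance alone, which is one reason no fixed count of "acceptable" significant residuals is offered in this thread.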

It is not clear how to handle this situation. I would use the meaningfulness of the classes as a guide.

Yes, you can relax the assumption by modeling the covariance.

Rick Sawatzky posted on Saturday, January 31, 2009 - 9:38 am

I am using the bivariate results in TECH10 to evaluate local independence of an IRT mixture model and would like to clarify the following:

1) How are the standardized residuals calculated?

2) Are the standardized residuals expected to follow a particular distribution (e.g., as in Yen's Q3 statistic which, under multivariate normality, is expected to be normally distributed)?

3) Is the chi-square for the bivariate association the same as the chi-square for local independence suggested by Chen (1997)?

4) How are the degrees of freedom for the chi-square statistics calculated? Are they simply (rows-1) * (columns-1) (e.g., DF = 4 for two variables with 3 ordinal categories each) or must the number of estimated parameters (thresholds) be taken into account (e.g., DF = 5 for two variables with three ordinal categories each)?

Thank you very much for clarifying these points.

Linda K. Muthen posted on Sunday, February 01, 2009 - 11:31 am

The standardized residuals given in tech10 are the standardized Pearson residuals. See Agresti's Categorical Data Analysis book, Sections 3.3.1 and 4.5.5. The original article on this topic is The Analysis of Residuals in Cross-Classified Tables, Shelby J. Haberman, Biometrics, Vol. 29, No. 1 (Mar., 1973), pp. 205-220. They are normally distributed z-scores.

For the bivariate tables the standardized residuals are computed by (O-E)/[sqrt(E)*sqrt(1-E/n)]. O and E are the Observed and Expected (model estimated) quantities for a pattern in the categorical data.
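
The formula quoted above can be written out as a short function. This is only a sketch of the arithmetic, not Mplus's own code; it takes n to be the total sample size, which is an assumption here, and the function name and cell counts are hypothetical.

```python
import math

def standardized_residual(observed: float, expected: float, n: float) -> float:
    """(O - E) / [sqrt(E) * sqrt(1 - E/n)], as quoted in the post above."""
    return (observed - expected) / (
        math.sqrt(expected) * math.sqrt(1.0 - expected / n)
    )

# Hypothetical cell: 30 observed, 25 expected under the model, n = 100 (assumed).
z = standardized_residual(30, 25, 100)
print(round(z, 4))
```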

Rob Dvorak posted on Friday, November 27, 2009 - 8:55 am

Hi Drs. Muthen,

I was wondering if there is a way to evaluate the Condition Number for the Information Matrix. I have been told that > 0.xE-06 is a rule of thumb, but I'm wondering if there is a citation I am missing (perhaps in the Mplus manual that I've missed). In my analysis, mine is currently 0.133E-04.

Bengt O. Muthen posted on Friday, November 27, 2009 - 10:33 am

I don't know about citations, although I assume the numerical analysis literature would have something on it. Depending on the algorithm, I think our epsilon limit for calling it singular, and most likely non-identified, is E-09 or E-10. I think different sized models with different sized parameter values can influence whether or not a small value should be seen as an indicator of non-identification. Then there is also the matter of which estimator of the information matrix one uses. Mplus works with MLF, ML, and MLR. MLF seems to be most sensitive to possible singularity/non-identification.
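
As a rough illustration of the comparison described above, the sketch below parses a condition number as printed in the output and compares it with a singularity limit. The default limit of 1e-10 is taken from the "E-09 or E-10" recollection in this post and is an assumption, not a documented constant; the function name is hypothetical.

```python
def below_singularity_limit(printed_value: str, epsilon: float = 1e-10) -> bool:
    """Return True if the printed condition number is at or below epsilon.

    Accepts Fortran-style exponents (e.g. '0.137D-05') as well as 'E' notation.
    """
    ratio = float(printed_value.strip().upper().replace("D", "E"))
    return ratio <= epsilon

# Values quoted earlier in this thread.
for value in ("0.137E-05", "0.133E-04"):
    print(value, "flagged:", below_singularity_limit(value))
```

As the post notes, model size, parameter scale, and the information matrix estimator (MLF, ML, MLR) all affect how a small value should be read, so a single cutoff is at best a first screen.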

Jerry Cochran posted on Wednesday, June 15, 2011 - 1:47 pm

I am trying to evaluate local independence for an LCA with a four-class solution. I have requested TECH10 from Mplus, but I wasn't sure what part of the output under the bivariate model information section to interpret. I see z scores for the different combinations of variables as well as two chi square tests.

Thank you for your time.

Bengt O. Muthen posted on Wednesday, June 15, 2011 - 5:21 pm

You can use both. The z scores give you detailed information about sources of misfit. Chi-square presents it more globally.

Kathleen Berger posted on Wednesday, March 07, 2012 - 9:03 am

Hi. I am doing a CFA with one factor and ordinal categorical outcome variables. I have used multiple imputation with wlsmv. I guess I can't get the bivariate correlation of the standardized residuals to evaluate local independence. Is there another way I can get this information?

Thanks so much! New to MPLUS and still learning.

Linda K. Muthen posted on Wednesday, March 07, 2012 - 10:13 am

You can use MLR on the original data and ask for TECH10. With only one factor and categorical factor indicators, you require only one dimension of integration.

Kathleen Berger posted on Monday, March 19, 2012 - 2:11 pm

Hi.
I am doing a CFA with one factor and ordinal categorical outcome variables. In order to evaluate local independence, I am using Reeve's >.2 criterion. The output I get with the following syntax shows correlations across categories. Is there a way I can collapse this to just show the residual correlation averaged across the indicators?

This is the syntax I used:

TITLE: CFA safety tbi mlr with tech 10_CC
DATA: FILE IS "C:\Users\kathy\Desktop\shepherd_safety_project\
ControlFIle_cg_cc3_3_2012.dat";
VARIABLE:
NAMES ARE cc1 cc2 cc3 cc4 cc5 cc6 cc7 cc8 cc9 cc10 cc11 cc12 cc13 cc14
cc15 cc16 cc17 cc18 cc19 cc20 cc21;
CATEGORICAL ARE cc1 cc2 cc3 cc4 cc5 cc6 cc7 cc8 cc9 cc10 cc11 cc12
cc13 cc14 cc15 cc16 cc17 cc18 cc19 cc20 cc21;
MISSING ARE ALL (9);
ANALYSIS:
Estimator=mlr;
Model:
f BY cc1 cc2 cc3 cc4 cc5 cc6 cc7 cc8 cc9 cc10 cc11 cc12 cc13 cc14
cc15 cc16 cc17 cc18 cc19 cc20 cc21;
f@1;
OUTPUT:
tech10
SAMPSTAT;
STAND;
RESIDUAL;
PATTERNS;
SAVEDATA:
FILE IS COGCAP_03022012cfa.DAT;
FORMAT IS F2.0;

Thanks!

Linda K. Muthen posted on Monday, March 19, 2012 - 6:52 pm

Please send the output and your license number to support@statmodel.com.

Andy Daniel posted on Wednesday, December 12, 2012 - 6:59 am

Hi,

I'm running a LCA with repeated measures of one nominal variable in a longitudinal dataset (6 categories in each wave). It is very plausible that the Local Independence Assumption isn't met in this case and that the measurement errors are associated.

I was wondering if there is a way to check the Local Independence Assumption in this model with mplus. Due to the fact that the variables are nominal TECH10 is not provided.

The follow-up question would be whether it is possible to model the association between the measurement errors to deal with the violation of the LI assumption.

Many Thanks for your help!!!

Best,

Andy

Bengt O. Muthen posted on Wednesday, December 12, 2012 - 7:19 pm

It is difficult to model such associations. You can take the approach in UG ex 7.16.

Danyel A.Vargas posted on Monday, February 10, 2014 - 8:34 am

Dr. Muthen,

I requested tech 10 for my LCA; I have continuous indicators and one categorical indicator. In this output I only see info for the categorical indicator. How do I request residuals for the continuous indicators?

Thanks,

Danyel

Linda K. Muthen posted on Monday, February 10, 2014 - 10:50 am

Use the RESIDUAL option if it is available for your analysis.

Danyel A.Vargas posted on Monday, February 10, 2014 - 11:39 am

Thanks, Dr. Muthen. I requested the residual, but I don't see the Z tests associated with the covariances between the indicators. Am I supposed to request something additional?
Thanks so much.
Danyel

Danyel A.Vargas posted on Monday, February 10, 2014 - 1:09 pm

Also, I have another question about interpreting the odds ratio.
latent class 1 compared to latent class 2
c4gen
category > 1 1.266 p value .03
The categorical variable is gender whereby 1 = female and 2 = male. Would a correct interpretation be the following?

In comparison to class 2, those in class 1 are more likely to be male?

Thanks so much.

Danyel

Danyel A.Vargas posted on Monday, February 10, 2014 - 1:25 pm

Also, here is another table.

RESULTS IN PROBABILITY SCALE

                  Estimate    S.E.    Est./S.E.   P-Value

Latent Class 1

 C4GEN
    Category 1      0.486     0.103      4.739      0.000
    Category 2      0.514     0.103      5.016      0.000

Latent Class 2

 C4GEN
    Category 1      0.545     0.037     14.588      0.000
    Category 2      0.455     0.037     12.194      0.000

Linda K. Muthen posted on Monday, February 10, 2014 - 3:41 pm

The standardized residuals are z-scores.

See pages 496-497 of the user's guide for the interpretation of odds ratio results.

The table above is a translation of the logits in the results section to probabilities.
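
The probability table and the odds ratio posted above can be checked against each other. The sketch below assumes the reported odds ratio compares the odds of category 2 versus category 1 in class 1 relative to class 2; that reading is an assumption, and the user's guide pages cited above remain the authority on interpretation.

```python
def odds(p: float) -> float:
    """Odds corresponding to a probability p."""
    return p / (1.0 - p)

# Probabilities of C4GEN category 2, copied from the table posted above.
p_cat2_class1 = 0.514
p_cat2_class2 = 0.455

odds_ratio = odds(p_cat2_class1) / odds(p_cat2_class2)
print(round(odds_ratio, 3))
```

The result agrees with the reported 1.266 up to rounding of the probabilities, which is consistent with class 1 having higher odds of category 2 than class 2.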

Evann Smith posted on Sunday, June 28, 2015 - 12:37 pm

Hello,

I am using tech10 to evaluate the conditional independence assumption of an LCA model.

I have 2 questions:

1) How do I calculate the degrees of freedom for the Overall Bivariate Pearson Chi-square posted at the very end of the tech10 output?

2) None of the individual bivariate standardized Pearson residuals are significant at |1.96|. However some of the Bivariate Pearson Chi-squares for variable pairs are significant (> 3.84). Should I consider these violations of the conditional independence assumption?

Thanks!

Tihomir Asparouhov posted on Monday, June 29, 2015 - 9:34 am

1) The distribution of the Overall Bivariate Pearson Chi-square statistic is not known. It is computed mostly for comparative purposes.

2) No. These are again computed for comparative purposes and I would not recommend a cutoff value. The proper use is as follows. Consider the "Chi-Square Test of Model Fit" in the "MODEL FIT INFORMATION" section. If the model is rejected, examine the tech10 tables and modify the model for pairs of variables with the largest Bivariate Pearson Chi-square values. Modifications along the lines of those on page 8 of the following document are recommended:
http://www.statmodel.com/download/Version7.2LanguageAddendum.pdf

Evann Smith posted on Monday, June 29, 2015 - 10:43 am

Thanks for the speedy response.

Given that the Chi-square statistic does not follow a known distribution, is it possible to bootstrap the residuals in Mplus as recommended by Oberski et al.?

http://members.home.nl/jeroenvermunt/oberski2013a.pdf
http://daob.nl/wp-content/uploads/2013/05/oberski-breschia.pdf

Evann Smith posted on Monday, June 29, 2015 - 10:48 am

Also, as a quick follow-up question to your previous recommendation:

When would you recommend modeling the residual covariances as constrained to be equal across classes vs. free across classes?

Tihomir Asparouhov posted on Monday, June 29, 2015 - 10:56 am

It is possible, Evann, but it will require a bit of programming on your end. Generate 100 data sets according to your estimated model. Then compute the tech10 statistics for each and assemble the values of these statistics to obtain the null hypothesis distribution of these statistics.

You can use
https://www.statmodel.com/utility/extractor.shtml
http://www.statmodel.com/examples/webnotes/web10.zip
or use R
https://www.statmodel.com/usingmplusviar.shtml

For the "equal across classes vs. free across classes" question, start with unequal and test for equality using MODEL TEST or MODEL CONSTRAINT.
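
Once the tech10 statistics from the simulated data sets have been collected, the observed statistic can be located in that simulated null distribution. A minimal sketch, assuming the statistics have already been extracted into a list; the add-one correction is one common convention rather than something specified in this thread, and the function name and all numbers are hypothetical.

```python
from typing import Sequence

def bootstrap_p_value(observed: float, simulated: Sequence[float]) -> float:
    """Share of simulated statistics at least as large as the observed one.

    Uses the (count + 1) / (B + 1) convention so the p-value is never exactly 0.
    """
    exceed = sum(1 for s in simulated if s >= observed)
    return (exceed + 1) / (len(simulated) + 1)

# Hypothetical bivariate chi-square values from 10 simulated data sets.
simulated_bvr = [0.4, 1.1, 2.7, 0.9, 3.5, 0.2, 1.8, 5.1, 0.6, 2.2]
print(round(bootstrap_p_value(4.0, simulated_bvr), 3))
```

In practice the list would hold one value per generated data set (100 in the suggestion above), computed separately for each variable pair.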

Evann Smith posted on Monday, June 29, 2015 - 7:57 pm

Thanks for the advice. I'd like to give bootstrapping a try.

Extracting the model parameters from the output was relatively simple in R (my native statistical language), but I'm having trouble sorting out the best way to generate data from them.

Given the unstandardized threshold estimates and standard errors, how would you proceed?

Thanks again

Linda K. Muthen posted on Monday, June 29, 2015 - 8:19 pm

You can use the SVALUES option of the OUTPUT command to get the input with ending values as starting values and use those statements as input in MODEL POPULATION to generate data sets. See Chapter 12 for examples of Monte Carlo inputs.

Evann Smith posted on Monday, June 29, 2015 - 9:08 pm

Thanks for the SVALUES tip.

Using the starting values, however, I'm now running into the error:

*** ERROR in MODEL POPULATION command
One or more pairs of ordered thresholds are not increasing in Class 1.
Check your population values. Problem with the following pairs:
PT_EL$2 (-1.427) and PT_EL$3 (-1.427)

What's the best way to proceed?

Linda K. Muthen posted on Tuesday, June 30, 2015 - 5:57 am

Please send the output with the SVALUES and the output with the error message along with your license number to support@statmodel.com.

Evann Smith posted on Thursday, July 02, 2015 - 10:32 am

Hi,

I've now successfully generated data and bootstrapped the distributions of my BVRs. Because my data is clustered, I did this in two steps (following the user's guide): 1) generate the data using the twolevel specification, 2) run the models using the complex specification.

A few of my bootstrapped p-values were significant. I've modeled the largest dependency using type=complex, parameterization=rescov, and the "with" statement in the model.

Now I'd like to generate new data and get new bootstrapped p-values for the BVRs, having modeled one local dependence.

I'm having trouble, however, figuring out how to generate clustered data that also had a "with" parameter. I keep getting the error that twolevel and rescov don't work together.

Is there a way to use Monte Carlo to generate new clustered data from the parameters of my new model, which accounts for one local dependency?

Thanks!

Linda K. Muthen posted on Thursday, July 02, 2015 - 12:51 pm

RESCOV is for TYPE=MIXTURE. You can create a covariance between two categorical variables when maximum likelihood is used by saying:

f BY u1@1 u2;
f@1; [f@0];

where the factor loading of u2 is the covariance parameter. Note that in this case, each covariance requires one dimension of integration.

Evann Smith posted on Thursday, July 02, 2015 - 2:40 pm

I've been using type=mixture because the model for which I'm generating data and bootstrapping the BVRs is a latent class model.

For the first iteration (bootstrapping the BVRs for a clustered latent class model with all residual covariances held at 0), I used "type=twolevel mixture" to generate the data and then "type=complex mixture" to bootstrap the BVRs.

Are you suggesting that for this second iteration (where I have parameters for a residual covariance) that I generate the data in the first step not as mixture model?

Thanks!

Linda K. Muthen posted on Thursday, July 02, 2015 - 3:47 pm

You say above you generate as twolevel.

If you have TYPE=MIXTURE, you should be able to use RESCOV. If you can't see the problem, send the output and your license number to support@statmodel.com.

Massimiliano Orri posted on Tuesday, December 22, 2015 - 7:53 am

Dear Drs. Muthen,
I'm using LPA to find latent affective profiles in a group of men (fathers) and women (mothers). My indicators are 6 continuous variables. I estimated men and women profiles separately, because my instrument measuring the 6 indicators is not gender invariant (and the profiles I obtained using the entire sample did not make any sense). I obtained 3 profiles for fathers and 3 for mothers.
I would like to ask 2 questions:
1) I used the RESIDUAL option to verify the conditional independence assumption, but I'm not sure about how to interpret the output. Can you give me any hint?
2) I would like to estimate whether these profiles predict a distal outcome (their child behavior) using the three-step approach (R3STEP). However, I cannot figure out how to do that, since I studied mothers and fathers separately (and I need to consider both profiles in my model predicting the distal outcome). Is there any solution? Or is the only one to first classify and then analyze?
Thanks in advance
Max

Bengt O. Muthen posted on Thursday, December 24, 2015 - 6:25 pm

1) Residuals are hard to interpret here. You can instead run the model that includes all the within-class covariances and see if some are significant and also check with BIC.

2) To predict the distal I think you first have to do an analysis that considers the family, not the individual mother or father. This model would have 2 latent class variables, one for fathers and one for mothers. That model can then be used to take a manual BCH approach to the distal, drawing on the ideas in the paper on our website:

Asparouhov, T. & Muthén, B. (2014). Auxiliary variables in mixture modeling: Using the BCH method in Mplus to estimate a distal outcome model and an arbitrary second model. Web note 21. Download appendices with Mplus scripts.

Massimiliano Orri posted on Sunday, December 27, 2015 - 12:21 pm

Thanks Dr Muthén for your answers and suggestions.

Just to be sure:

1) Are residuals in z-values?

2) Are you suggesting me to analyze fathers and mothers together using gender as a "Knownclass" latent class variable?

Bengt O. Muthen posted on Sunday, December 27, 2015 - 6:01 pm

1) No, not unless the output says so.

2) No. You will have one latent class variable for fathers and one for mothers, each with your 3 profiles.

'Alim Beveridge posted on Monday, December 28, 2015 - 7:31 am

Dear Bengt and Linda,

I am running an LCA with 50 DVs and 358 observations. 14 are counts, all the rest are categorical (binary or ordinal). I requested TECH10 and received the following warning:
TECH10 OUTPUT FOR CATEGORICAL VARIABLES IS NOT AVAILABLE BECAUSE THE FREQUENCY TABLE FOR THE LATENT CLASS INDICATOR MODEL PART IS TOO LARGE.

As a result, I am only given the BIVARIATE MODEL FIT INFORMATION for the count variables. Is there a way to force Mplus to provide the same for the categorical variables?

Is there a recommended approach to removing extraneous DVs?

What is the max number of categorical variables Mplus will handle and still show the TECH10 output?

Thanks,
'Alim

Massimiliano Orri posted on Monday, December 28, 2015 - 8:47 am

Ok, thank you, I figured it out.

Using this approach, the profiles that I found for mothers and fathers are different from the profiles I obtained doing the analysis separately for mothers and fathers.
In the same way, in the separate analysis I found 3 profiles for each parent (using lower BIC as criterion), but the BIC in the model with 2 latent class variables suggests 3 latent profiles for mothers and 2 (not 3) for fathers (i.e. 6 joint profiles, not 9)

So I'd like to ask two other questions about that.

1) These differences are normal because the models are different, am I right?

2) Should these 2 approaches be considered as 2 alternative ways of analyzing my data (e.g. a family-centred approach vs. an individual-centred approach), suitable to answer 2 different questions?
(if yes, I could explain the fact that the profiles of each parent obtained in the separate analysis make more sense according to the theory than the profiles of each parent obtained in the analysis with 2 latent class variables).

Thanks in advance for your help!

Max

Bengt O. Muthen posted on Monday, December 28, 2015 - 11:13 am

First make sure that your two latent class variables are allowed to correlate. Use

c1 WITH c2;

Bengt O. Muthen posted on Monday, December 28, 2015 - 4:08 pm

Alim - which version of Mplus are you using?

'Alim Beveridge posted on Monday, December 28, 2015 - 6:23 pm

Dear Bengt,

I am using 7.4.

best,
'Alim

Tihomir Asparouhov posted on Tuesday, December 29, 2015 - 9:18 pm

> Is there a way to force Mplus to provide the same for the categorical variables?

- Yes. Rewrite the model only with the categorical variables, drop the count variables, fix all parameters to the values in the original model.

> Is there a recommended approach to removing extraneous DVs?

- Use the univariate entropy (drop variables with small values)

> What is the max number of categorical variables Mplus will handle and still show the TECH10 output?

- Any joint distribution with at most 2^31 cells, i.e. a maximum of 31 binary variables, but again if you don't have the count variables there is no limit.
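
The cell count of the joint frequency table is the product of the number of categories of each indicator, so it can be checked before requesting TECH10. A minimal sketch based on the 2^31 figure quoted above; treating the limit as inclusive is an assumption, and the function names are hypothetical.

```python
import math
from typing import Sequence

MAX_CELLS = 2 ** 31  # figure quoted in the reply above; inclusiveness is assumed

def joint_table_cells(categories: Sequence[int]) -> int:
    """Number of cells in the joint table, given each indicator's category count."""
    return math.prod(categories)

def within_cell_limit(categories: Sequence[int]) -> bool:
    """True if the joint table has no more than MAX_CELLS cells."""
    return joint_table_cells(categories) <= MAX_CELLS

print(within_cell_limit([2] * 31))  # 31 binary indicators
print(within_cell_limit([2] * 36))  # e.g. 36 binary indicators
```

Ordinal indicators with more than two categories use up the limit faster, since each contributes its full category count to the product.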

'Alim Beveridge posted on Wednesday, December 30, 2015 - 7:20 am

Thank you Tihomir,

I added the ENTROPY command to the output section to request univariate entropy.
The results I got are strange.
First, in the list of univariate entropies only 25 (out of 50) variables (DVs) are shown
Second, 8 variables show a univ. entropy of 999.000 - I thought entropy is bounded by 0 and 1.
Third, for 3 variables the entropy value is blank.

Have I done something wrong?

'Alim

'Alim Beveridge posted on Wednesday, December 30, 2015 - 7:37 am

> Yes. Rewrite the model only with the categorical variables, drop the count variables, fix all parameters to the values in the original model.

Dear Tihomir,

as for your first recommendation, can you tell me which parameters have to be fixed and how? Are you referring to "user-specified starting values without random starts" as shown in Example 7.4 of the manual:

%OVERALL%
%c#1%
[u1$1*1 u2$1*1 u3$1*-1 u4$1*-1];
%c#2%
[u1$1*-1 u2$1*-1 u3$1*1 u4$1*1];

Would I find the desired values by looking at STARTING VALUES FOR LATENT CLASS INDICATOR MODEL PART under the TECH1 output?

Thanks.

Bengt O. Muthen posted on Thursday, December 31, 2015 - 5:40 pm

Please re-post your questions for when Tihomir returns at the end of the week of January 11.

Tihomir Asparouhov posted on Monday, January 25, 2016 - 3:46 pm

All of the parameters should be fixed. Use the output:svalues; command and change * to @. These are the values of the estimated model.
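
Changing every * to @ by hand is error-prone in a model with many indicators, so the substitution can be scripted. A minimal sketch, assuming the SVALUES statements have been copied into a string and that every * followed by a number marks a starting value; the sample lines are a hypothetical illustration, not actual Mplus output.

```python
import re

# Match '*' only when it is followed by a (possibly negative) number.
_FREE_VALUE = re.compile(r"\*(?=-?\.?\d)")

def fix_all_parameters(svalues_text: str) -> str:
    """Replace starting-value markers (*) with fixed-value markers (@)."""
    return _FREE_VALUE.sub("@", svalues_text)

example = """%C#1%
[ u1$1*1.23456 ];
[ u2$1*-0.98765 ];"""
print(fix_all_parameters(example))
```

The resulting statements can then be pasted into the MODEL command of the reduced input that contains only the categorical variables.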

Please send the entropy example to support@statmodel.com.

Fangsheng Zhu posted on Monday, February 13, 2017 - 10:03 am

Dear Professors,

I'm doing LCA with a battery of attitude questions, and the local independence assumptions are violated. There appear to be several approaches to this (see following).

Currently, I assume I should try a couple of them, and settle on the model with the best BIC. Am I on the right course? Are there any other approaches I should consider?

My other concern is, I'm going to run multi-level regressions in the LC model. Will all of these approaches be compatible with this next step?

1. Try other models, such as factor mixture models. However, Lubke and Muthen (2012) stress that the method is "exploratory" for population heterogeneity -- does this method yield more uncertainties than the standard LC model?

2. Relax the local independence assumption. Start with the unstructured model, and then remove pairs of least significant local dependence, one by one.

3. Follow Uebersax's suggestions in his "practical guide" piece, such as
a. combining manifest variables;
b. finding a latent variable to manifest variables -- if I use factor model to do this, how is this different from the factor mixture model?
c. try a loglinear form of the model

Thank you for your time!

Bengt O. Muthen posted on Monday, February 13, 2017 - 5:57 pm

Q1: Yes.

1. See Topic 5's treatment of FMM.

2. See the paper on our website:

Asparouhov, T. & Muthen, B. (2015). Residual associations in latent class and latent transition analysis. Structural Equation Modeling: A Multidisciplinary Journal, 22:2, 169-177, DOI: 10.1080/10705511.2014.935844. Download Mplus files.

For these general analysis strategy questions you want to use SEMNET.

Fangsheng Zhu posted on Thursday, February 23, 2017 - 11:08 am

Dear Professors,

Thank you for your previous responses!

I have a quick technical question on Mplus: is it possible to test local independence on multi-level LCA models? I saw that TECH10 is only available for TYPE=MIXTURE.

Thanks!

Bengt O. Muthen posted on Thursday, February 23, 2017 - 6:28 pm

Mplus does not have that feature.

Leia Saltzman posted on Friday, April 14, 2017 - 8:29 am

Dear Drs. Muthen,

I am trying to solve two challenges with my LPA model. I have a four class solution using a combination of continuous and binary indicators.

1) How can I test the assumption of local independence (within-class correlation)? I tried using TECH10 and the Residual command but I am not sure if this is correct and if so how to read the output.

2) comparing the LPA model to a Factor Analytic (Dimensional) model and/or a Factor Mixture Model. Can this be done in Mplus? if so could you direct me to where I might be able to find an example of the syntax.

Thank you

Bengt O. Muthen posted on Friday, April 14, 2017 - 3:57 pm

Factor mixture examples are given in the User's Guide. Having a continuous factor added to the LCA gives a way to relax the local independence assumption. See our short course videos and handouts on our website for Topic 5, Factor Mixture Modeling.

Leia Saltzman posted on Saturday, April 15, 2017 - 2:42 am

Dear Dr. Muthen,

Thank you for your response re: FMM.

Is there also a way to test the assumption of local independence in an LPA model with a combination of continuous and binary indicators? if so can you point me towards an example of the syntax?

Bengt O. Muthen posted on Saturday, April 15, 2017 - 11:38 am

Only by introducing a factor that influences both of these items. Fixing the factor variance @1, the covariance between the items is the second loading.

Leia Saltzman posted on Saturday, April 15, 2017 - 10:37 pm

Dear Dr. Muthen,

That is very helpful, thank you so much.

nidhi gupta posted on Wednesday, November 01, 2017 - 3:56 am

Dear Dr. Muthen
Is there a way I can test the assumption of local independence when I am performing latent profile analysis? I know that TECH10 can be used if I am using latent class analysis but I am not sure how to check for this assumption when I am using latent profile analysis.
Regards
Nidhi

Bengt O. Muthen posted on Thursday, November 02, 2017 - 3:00 pm

You can either

1) add all possible WITH statements, or

2) add a factor and see which loadings are significant.

'Alim Beveridge posted on Saturday, November 11, 2017 - 4:33 pm

Dear Dr. Muthen,

I am conducting LCA with categorical variables (binary and ordinal). In one model I have 22 DVs, in another 59. When I request TECH10 output, I get the following warning:

TECH10 OUTPUT FOR THE CATEGORICAL VARIABLES IS NOT AVAILABLE BECAUSE THE FREQUENCY TABLE FOR THE LATENT CLASS INDICATOR MODEL PART IS TOO LARGE.

Can I use your suggestion above - "add a factor and see which loadings are significant" - to test the assumption of local independence?

Is it enough to add an "F by" all DVs to the %overall% part of the model, or is it necessary to add additional syntax to freely estimate or constrain thresholds, means, variances, etc across the 2 classes?

Thanks, 'Alim

Bengt O. Muthen posted on Sunday, November 12, 2017 - 5:44 pm

f BY can be class-varying if need be.

Rachael Robnett posted on Tuesday, April 17, 2018 - 5:13 pm

Really basic question: I'm trying to test the assumption of local independence with LPA. Dr. Muthen's response above to Nidhi suggests adding all possible WITH statements. It would be helpful to have an example of what WITH statement to add and where to add it.

My input is below. Thank you!!

variable:
names=id common serious;
usevariables = common serious;
idvariable = id;
classes=c(3);
missing = all(99);
analysis:
type=mixture;
model:
%OVERALL%
common WITH serious;
%C#1%
[common serious *1];
%C#2%
[common serious *2];
%C#3%
[common serious *3];

Bengt O. Muthen posted on Wednesday, April 18, 2018 - 4:02 pm

There is not a straightforward way to pick the right covariances (WITH). Modindices are not necessarily reliable with mixtures.

J.D. Haltigan posted on Thursday, April 19, 2018 - 1:13 am

Q: In an LPA, if we test the assumption of LI using the approach described above (adding all possible WITH statements) and find some to be significant (thus violating LI), it would seem one needs to decide between several options:

fixing them to zero
modeling the covariance and, related to this, adding a factor (formally crossing over to FMM)

if we model a covariance (say between two items), is this analogous to a 'method factor' and thus the model becomes a special case of an FMM?

Although this seems like what many have criticized as 'post-hoc' model tweaking (e.g., to get a best-fitting CFA), if we are interested in the most precise class assignment (rather than any latent structure per se) would this not be a defensible strategy so as to properly model the subpopulations in the data with clear ramifications for external validation etc.?

Rachael Robnett posted on Thursday, April 19, 2018 - 8:51 am

Thank you for the quick reply! Per the discussion above, I modeled the covariance between indicators within each class to test for LI.

My input is below. Am I correct to look at my WITH statements for information about LI?

variable:
names=id common serious;
usevariables = common serious;
idvariable = id;
classes=c(3);
missing = all(99);
analysis:
type=mixture;
model:
%OVERALL%
common WITH serious;
%C#1%
common WITH serious;
%C#2%
common WITH serious;
%C#3%
common WITH serious;

Bengt O. Muthen posted on Thursday, April 19, 2018 - 4:11 pm

Answer for Robnett: Yes.

Bengt O. Muthen posted on Thursday, April 19, 2018 - 4:22 pm

Answer for Haltigan:

Yes, model the significant covariance.

It could be a method factor in certain applications.

Last question: I think so.

Sam Crawley posted on Thursday, June 28, 2018 - 8:08 pm

Dear Drs. Muthen,

I note from the above that it's not possible to get the TECH10 output for TYPE=TWOLEVEL MIXTURE.

In these cases, are there any steps I can take to check local independence?

Thanks.

Tihomir Asparouhov posted on Monday, July 02, 2018 - 1:13 pm

You can use a single level model to check that. If there are such violations they should show up in the single level analysis.

Sam Crawley posted on Tuesday, September 04, 2018 - 7:52 pm

Thanks for the previous response.

I am running a single-level LCA and using TECH10 & RESIDUAL output to detect LI violations. The output indicates there are some LI problems, so I have attempted to add associations between some of the most problematic variables, as per Asparouhov & Muthén (2015).

I have also included "PARAMETERIZATION = RESCOVARIANCES" in the ANALYSIS section, as per the Mplus manual.

However, when I do this, the output from TECH10 shows H0 is always 0.000, and so the z score is always displayed as "********", and so is the Pearson Chi-square value. The RESIDUAL output still gives something useful. This happens even when I don't specify any associations, but just set PARAMETERIZATION = RESCOVARIANCES

Is this expected, or is there something I need to do differently?

Thanks.

Bengt O. Muthen posted on Thursday, September 06, 2018 - 2:55 pm

We need to see your full output - send to Support along with your license number.

Sam Crawley posted on Wednesday, September 26, 2018 - 9:32 pm

Thanks again for the previous answers.

Another question: given that nominal variables do not show up in TECH10 output, is it reasonable to declare nominal latent class indicators as categorical to check if they are contributing to local independence violations?

In my model, I seem to get very similar (perhaps identical?) results when changing the nominal variables to categorical (same LL, class prevalences).

Bengt O. Muthen posted on Friday, September 28, 2018 - 11:07 am

The Nominal option won't give the same results as the Categorical option unless you have only 2 categories.

Sam Crawley posted on Friday, September 28, 2018 - 1:53 pm

Thanks. So how should I check for violation of the local independence assumption with nominal indicator variables?

Bengt O. Muthen posted on Friday, September 28, 2018 - 3:59 pm

There is not a way to do that unless you break up the nominal into a series of binary dummies or try a factor behind a pair of items that you suspect violate the independence assumption. Perhaps the IRT literature has something to suggest on this topic.
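
A minimal sketch of the "series of binary dummies" step, done outside Mplus before the data file is written. The function name, the column naming scheme, and the missing-value code 99 are illustrative assumptions, not anything specified in the thread. The sketch returns one 0/1 column per observed category; which of those columns to declare as categorical indicators is left to the analyst.

```python
def nominal_to_dummies(values, name="x", missing=99):
    """Turn one nominal variable into 0/1 dummy columns, one per category.

    Missing codes are carried through unchanged so they can still be
    declared as missing in the analysis.
    """
    categories = sorted({v for v in values if v != missing})
    return {
        f"{name}_{cat}": [missing if v == missing else int(v == cat) for v in values]
        for cat in categories
    }


# Hypothetical nominal item with categories 1-3 and one missing value.
party = [1, 2, 3, 2, 99]
dummies = nominal_to_dummies(party, name="party")
for column, column_values in dummies.items():
    print(column, column_values)
```

Each resulting column is binary, so it can be treated as a categorical indicator and will then appear in the bivariate TECH10 tables.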

Yuhan Ni posted on Monday, December 10, 2018 - 9:21 am

Dear Dr Muthen,

I am running a Monte Carlo analysis on an LPA. I would like to generate a data set with local (conditional) dependence.

There are two plans:

Plan A:
f BY u1@1 u2;
f@1;
[f@0];

Plan B:
u1 with u2 @0.4;

Which method is better for generating data?

Thank you very much for your time.

Bengt O. Muthen posted on Monday, December 10, 2018 - 5:26 pm

Do you get a difference between these two ways? Are your items binary?

Tihomir Asparouhov posted on Monday, December 10, 2018 - 6:11 pm

Either way should work but you will need estimator=bayes for B.
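
To see what either plan implies for the generated data, here is a rough simulation sketch in Python rather than Mplus. It assumes continuous items, two equally sized classes, unit residual variances, and a second loading of 0.4 so that Plan A matches the 0.4 covariance in Plan B. All of these values are illustrative assumptions, not settings from the thread.

```python
import random


def covariance(x, y):
    """Unbiased sample covariance of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)


def generate(n_per_class=50_000, loading2=0.4, class_means=(0.0, 3.0), seed=7):
    """Two-class data where u1 and u2 share a within-class factor (Plan A)."""
    rng = random.Random(seed)
    rows = []  # each row is (class, u1, u2)
    for c, mean in enumerate(class_means, start=1):
        for _ in range(n_per_class):
            f = rng.gauss(0.0, 1.0)                      # factor: mean 0, variance 1
            u1 = mean + 1.0 * f + rng.gauss(0.0, 1.0)    # loading fixed at 1
            u2 = mean + loading2 * f + rng.gauss(0.0, 1.0)
            rows.append((c, u1, u2))
    return rows


data = generate()
within = {
    c: covariance([r[1] for r in data if r[0] == c],
                  [r[2] for r in data if r[0] == c])
    for c in (1, 2)
}
pooled = covariance([r[1] for r in data], [r[2] for r in data])
print("within-class covariances:", within)
print("pooled covariance:", pooled)
```

The within-class covariances should land near 0.4, which is the local dependence being built in, while the pooled covariance is much larger because it also reflects the class mean differences.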

Dinh Son Bui posted on Sunday, March 03, 2019 - 5:33 pm

Hi,
I repeatedly assess 6 diseases (all binary variables: yes/no status) at 1, 3 and 5 years. I want to model patterns of these diseases over 3 time points.

1. I am thinking of including 18 binary variables (6 diseases x 3 time points) in an LCA model. This may not be the right way. Might it violate the independence and other assumptions?

2. Can you suggest a better way to model the pattern? Is latent transition analysis or LCA the right way?

Thanks

Bengt O. Muthen posted on Monday, March 04, 2019 - 5:19 pm

1. See our Short Course Topic 6 video and handout on our website for how to use LCA on longitudinal data as a precursor to growth modeling. See slides 73-88.

2. This is a substantive question about what is the best representation of your situation. Do you have smooth development over time (growth modeling) or changes in status (LTA)? Do you consider the diseases as having separate developments (analyze one disease at a time), or do they have common features (perhaps work with a factor or LCA model for each time point and then apply either growth or LTA)?

Dinh Son Bui posted on Sunday, March 17, 2019 - 5:02 am

Thank you very much

Jilian Halladay posted on Monday, April 22, 2019 - 8:55 am

Hello Dr. Muthen and Dr. Asparouhov,

I am trying to test the assumption of local independence within LPA models with 8 continuous indicator variables in a large (n~15,000) multilevel dataset.

It appears TECH10 is only available for categorical indicators and not available in multilevel datasets. Previous suggestions say you can examine residuals in LPA, but it is difficult to interpret.

What are your thoughts on the following approach to (attempting to) measure local independence:

(1) run the model as a single level LCA with TECH10 by dummy coding the ordinal response options for each indicator. My concern with this approach is: (a) my final model is continuous, not categorical; (b) my final model is multilevel; (c) I have such a large sample size, I am worried anything will yield significant results.

AND/OR

(2) Compare multi-level model fit indices between LPAs where all, some, and no indicators are allowed to covary.

Please let me know your thoughts and if you happen to have any other suggestions. Thanks for your time.

Bengt O. Muthen posted on Monday, April 22, 2019 - 4:41 pm

You can try adding a factor to pick up within-class correlations - maybe that gives you ideas for where WITHs are needed.

You can also try WITH@0 (for all pairs) and check Modindices to see which ones should not be fixed.
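
With 8 indicators there are 28 pairs, so typing the WITH@0 lines by hand is error-prone. A small helper along these lines can write them out for pasting into the MODEL command. The indicator names y1-y8 are hypothetical placeholders, and the helper itself is just a convenience sketch, not part of Mplus.

```python
from itertools import combinations


def with_statements(indicators, fixed_at_zero=True):
    """Return one Mplus WITH statement per pair of indicators."""
    suffix = "@0" if fixed_at_zero else ""
    return [f"{a} WITH {b}{suffix};" for a, b in combinations(indicators, 2)]


indicators = [f"y{i}" for i in range(1, 9)]  # hypothetical variable names
lines = with_statements(indicators)
print("\n".join(lines))
```

Calling `with_statements(indicators, fixed_at_zero=False)` gives the freely estimated version instead.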

Jilian Halladay posted on Friday, April 26, 2019 - 12:36 pm

Hi Dr. Muthen,

Thank you so much for your suggestion. For LPAs, is there guidance on the magnitude of modindices that suggests modification is required? Especially with a large sample size?

Thanks again

Bengt O. Muthen posted on Saturday, April 27, 2019 - 6:15 am

With a very large sample size, I would look more for magnitude of the suggested parameter estimate change - does it make a substantive difference.
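
A rough illustration of why significance alone is a poor guide at this sample size. It uses a simple Fisher z test of a correlation as a stand-in. This is an analogy only, not how Mplus computes modification indices, and the correlation of 0.03 and both sample sizes are made-up values.

```python
import math


def fisher_z_statistic(r, n):
    """Approximate z statistic for testing that a correlation r is zero."""
    return math.atanh(r) * math.sqrt(n - 3)


small_r = 0.03  # a substantively negligible correlation (illustrative)
z_500 = fisher_z_statistic(small_r, 500)
z_15000 = fisher_z_statistic(small_r, 15000)
print(f"n = 500:   z = {z_500:.2f}")
print(f"n = 15000: z = {z_15000:.2f}")
```

The same tiny association falls below the 1.96 cutoff at n = 500 and above it at n = 15,000, which is why the size of the expected parameter change is the more informative quantity here.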

Jilian Halladay posted on Thursday, May 02, 2019 - 9:14 am

Thank you Dr. Muthen,

When looking at correlations between variables (using SAMPSTAT), is there such a thing as "too high" a correlation when doing a latent profile analysis? I have 2 variables that are correlated at 0.7 - is there a risk of multicollinearity within LPAs? Or only a risk of residual shared error after accounting for profiles (i.e., violating local independence)?

When looking at changes in parameter estimates - is it best to look at STD EPC or STDYX EPC? I am also wondering what the difference between epc and std epc is (my values are identical) and what is being further adjusted for STDYX EPC when I do not have covariates in my model, as these values are different (but I do have one clustering variable)?

Is there a particular cut off for "too large" of a standardized change? I have STDYXs ranging from 0.05 to 0.44 for all my variables and STDs ranging from 0.03 to 1.78. Again, I have ~12000 people in 68 clusters, so quite a large sample size that picks up small changes.

Thanks in advance,

Bengt O. Muthen posted on Thursday, May 02, 2019 - 5:07 pm

Q1: No

Q2: No

Q3: Right

Q4: We need to see the full output to say - send to Support along with your license number.

Q5: That's a subject-matter question, not a statistical one.

Jilian Halladay posted on Friday, May 03, 2019 - 8:29 am

Hi Dr. Muthen,

Thanks so much - I will be in touch soon with model output re: interpretation of STDYX EPC.

In the interim, I am wondering if it is appropriate to look at the variances (homogeneity) within profiles to help determine whether WITH statements should be included in the models?

For example, when I run a model without any WITH statements, the variance for each of my indicator variables is smaller within profiles compared to the variance estimates when I add WITH statements, suggesting that the model without WITH statements is yielding higher profile homogeneity. Would this be appropriate to cite as (one of) the reasons to not include a covariance?

Thanks again for all your help,
Jillian

Bengt O. Muthen posted on Friday, May 03, 2019 - 3:43 pm

I don't think that is a well-recognized approach - I think that consideration should be far down on the list.