I am running a confirmatory analysis on several outcomes that are best modeled in semi-continuous fashion (i.e. a large proportion of each observed variable is at the censored value). Of the original four observed variables, reviewers have raised concerns about two, to the extent that they recommend dropping them. All four variables required coding in semi-continuous fashion, meaning that there were originally eight recoded variables loading on the factor. If I drop the two observed variables as the reviewers suggest, this will leave two variables that are recoded, in semi-continuous fashion, into four observed variables. This concerns me because I am not clear to what extent I am really modeling a doublet (i.e. a factor with only two observed variables) and to what extent I am actually modeling a factor with four observed variables. Could you comment?
If you have turned 8 observed variables into four by summing pairs, you have four factor indicators as far as model identification is concerned. If you drop two, you have two factor indicators, which in my opinion is a very weak model because it is not identified on its own. What kind of variables were the original 8? And why did you sum them?
I'm not sure I understand your question. There were originally four observed variables. Each of these was distributed such that > 25% of the values were at zero (the lower censoring point). We conducted a semi-continuous analysis by modeling for each variable: a) the log-odds that it was zero/non-zero, and b) conditional on being non-zero, the value of the log of the continuous outcome. This essentially turned each observed variable into two variables: a binary outcome and a continuous outcome. This meant that, according to the Mplus specification, we had eight manifest variables loading on the factor. The reviewers recommended dropping two of the original four observed measures. Ordinarily the model would not be identified with only the two remaining observed measures. However, because > 25% of each remaining variable's distribution was censored from below, these variables were again treated in semi-continuous fashion (i.e. two binary outcomes and two continuous outcomes). Our concern is that this model may not be identified because the four variables we have created are not really four distinct variables.
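To make the recoding described above concrete, here is a minimal Python sketch of splitting one semi-continuous variable into a binary part and a log-continuous part. It is only an illustration of the idea, not Mplus syntax and not necessarily the exact procedure used in the analysis; the function name `two_part`, the `cutpoint` argument, and the sample values are all hypothetical.

```python
import math

def two_part(values, cutpoint=0.0):
    """Split a semi-continuous variable into two parts (illustrative only).

    binary:     0 if the value is at or below the cutpoint, 1 otherwise
    continuous: log of the value when it is above the cutpoint,
                None (treated as missing) when it is at the cutpoint
    A missing input (None) stays missing in both parts.
    """
    binary, continuous = [], []
    for y in values:
        if y is None:
            binary.append(None)
            continuous.append(None)
        elif y <= cutpoint:
            binary.append(0)
            continuous.append(None)
        else:
            binary.append(1)
            continuous.append(math.log(y))
    return binary, continuous

# Hypothetical scores for one observed variable, with zeros at the floor.
sample = [0.0, 2.0, 0.0, 7.5, 1.0]
u, c = two_part(sample)
```

Each original variable yields one `u` and one `c`, which is why four observed measures became eight manifest variables and why two observed measures become four. The two parts come from the same measure, which is the source of the identification concern raised above.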
I don't think you should have used the eight variables on one factor. I think you should have had one factor for each part of the original variable. So if you drop two of the original variables, you would be left with two factors with two indicators which is a weak model given that identification is not possible without borrowing from other parts of the model.
I want to create “problem alcohol use” factor scores from 3 indicators: frequency (alfreq; 13% 0s, 0-24, a sum of 4 vars ranging from 0=never to 7=every day), dependence symptoms (aldep; count 1-7, 73% 0s), and social consequences (alsoc; count 1-7, 71% 0s). I'm also creating factor scores for drugs. I will use the scores in later analyses.

Model:
alc by alfreq* aldep* alsoc*;
alc@1;

I have tried treating the indicators in different ways:
1. continuous: LL= -951, BIC=1948
2. alsoc and aldep as censored: LL= -774, BIC=1594
   censored are alsoc (b) aldep (b);
3. all 3 vars treated as censored: LL= -748, BIC=1542
4. alsoc and aldep treated as NON-inflated count vars, alfreq censored: LL= -728, BIC=1491
   count are alsoc aldep; censored is alfreq (b);
5. alsoc and aldep as inflated count vars, alfreq as censored: LL= -726, BIC=1498, out of range mean for alsoc#1
   count are alsoc (i) aldep (i); censored is alfreq (b);

It seems #4 is the best way to create the factor scores (#4 also gave lowest LL & BIC for the drug factor scores). My questions are:
1. Is it OK to treat alsoc and aldep as count variables WITHOUT inflation?
2. Are the factor scores interpreted as continuous? A histogram looks abnormal, but skewness and kurtosis are normal.
3. Is it OK to treat alfreq as a censored variable?
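As a rough check on the fit statistics listed above, the sketch below uses the standard definition BIC = -2·LL + k·ln(n) to back out the penalty term k·ln(n) that each reported LL/BIC pair implies. The figures are the rounded values from the post, so the penalties are approximate; the labels and the helper name `implied_penalty` are hypothetical. Whether these statistics are comparable across different treatments of the indicators is a separate question that this arithmetic does not address.

```python
# (label, log-likelihood, BIC) as reported in the post, rounded values.
fits = [
    ("1 all continuous", -951, 1948),
    ("2 alsoc/aldep censored", -774, 1594),
    ("3 all censored", -748, 1542),
    ("4 counts (no inflation) + censored alfreq", -728, 1491),
    ("5 inflated counts + censored alfreq", -726, 1498),
]

def implied_penalty(loglik, bic):
    """Return k*ln(n), using BIC = -2*LL + k*ln(n)."""
    return bic + 2 * loglik

penalties = {label: implied_penalty(ll, bic) for label, ll, bic in fits}
best = min(fits, key=lambda f: f[2])[0]  # label of the lowest-BIC model

for label, ll, bic in fits:
    print(f"{label}: LL={ll}, BIC={bic}, implied penalty={penalties[label]}")
print("Lowest BIC:", best)
```

On these rounded figures, model 5 raises the log-likelihood by only 2 points over model 4 while carrying a larger implied penalty, which is consistent with model 4 estimating fewer parameters and having the lower BIC.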
Thanks for your help! Unfortunately, the negative binomial model gives error messages:
THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES MAY NOT BE TRUSTWORTHY FOR SOME PARAMETERS DUE TO A NON-POSITIVE DEFINITE FIRST-ORDER DERIVATIVE PRODUCT MATRIX. THIS MAY BE DUE TO THE STARTING VALUES BUT MAY ALSO BE AN INDICATION OF MODEL NONIDENTIFICATION. THE CONDITION NUMBER IS -0.811D-17. PROBLEM INVOLVING PARAMETER 1.
ONE OR MORE PARAMETERS WERE FIXED TO AVOID SINGULARITY OF THE INFORMATION MATRIX. THE SINGULARITY IS MOST LIKELY DUE TO THE MODEL IS NOT IDENTIFIED, OR DUE TO A LARGE OR A SMALL PARAMETER ON THE LOGIT SCALE. THE FOLLOWING PARAMETERS WERE FIXED: 4
Parameter 1 is the observed mean for aldep (nu matrix). Parameter 4 is the loading for aldep. The results (I realize they are not trustworthy) indicate the dispersion parameter for alsoc is non-significant (p = .923).
Negative binomial works for the drug model when treating frequency as continuous (it has 65% 0s). The dispersion parameters are non-significant. If drugfreq is treated as censored, I get several error messages.
Two follow-up questions:
1. Are there ways I can get the negative binomial model to run for the alcohol model?
2. For the drug model, is it OK to treat the frequency variable as continuous, despite the preponderance of zeros? Or should I treat it as censored (it is not a count variable)?