Defining the distribution of dependen...
 Mario posted on Monday, January 11, 2016 - 11:28 pm
We are trying to define the distribution of our dependent variables, and we know that there are three main commands for that: CATEGORICAL, NOMINAL, and COUNT. However, we are encountering two problems:
1) One of our mediators is nominal but when we define it as NOMINAL the program gives an error message
“*** ERROR in MODEL command
A nominal variable may not appear on the right-hand side of an ON statement: SP”
We know that we must define the distribution of mediators and that mediators appear on the right-hand side of an ON statement, so we don’t understand how one can ever define a nominal dependent variable.

2) Another dependent variable has a left-skewed distribution but is not a count variable (has many zeros and the numbers are not integers). No transformation helps to make the distribution Normal. In a regular GLM, we would use a Gamma distribution (after changing the “0” to “0.00000001”), but I could not find a way in MPLUS to define a GAMMA distribution for the dependent variable. Could you offer me an alternative solution?

Thanks,
Mario
 Bengt O. Muthen posted on Tuesday, January 12, 2016 - 7:05 pm
1) Nominal mediators are an advanced topic that I cover in my 2011 paper on our website.

2) You can use two-part modeling which is described in our UG examples.
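As a rough illustration of what a two-part split does to a semicontinuous outcome, here is a minimal Python sketch (not Mplus syntax). It assumes a cutpoint of 0 and a log transform of the values above the cutpoint; the function name `two_part_split` and the data values are made up for the example.

```python
import math

def two_part_split(values, cutpoint=0.0):
    """Split a semicontinuous outcome into a binary and a continuous part.

    binary[i] is 1 when values[i] > cutpoint, else 0.
    continuous[i] is log(values[i]) when binary[i] is 1, else None (missing).
    Missing inputs (None) stay missing in both parts.
    """
    binary, continuous = [], []
    for v in values:
        if v is None:
            binary.append(None)
            continuous.append(None)
        elif v > cutpoint:
            binary.append(1)
            continuous.append(math.log(v))
        else:
            binary.append(0)
            continuous.append(None)
    return binary, continuous

# Made-up outcome with many zeros and non-integer positive values.
rep_succ = [0.0, 0.0, 1.5, 0.0, 3.2]
bin1, cont1 = two_part_split(rep_succ)
print(bin1)   # [0, 0, 1, 0, 1]
print(cont1)  # None for the zeros, log(value) for the positive values
```

The binary part is then modeled like any other binary outcome, and the continuous part is modeled only for the cases that exceed the cutpoint.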
 Mario posted on Wednesday, February 03, 2016 - 1:23 am
Dear Dr. Muthen,
coming back to point 2 of my question, I will try to illustrate it with an example.
The original model I want to run is this one:

USEVARIABLES sand GApdic dens Fbrdn Myc Bart Bartf Mycf RepSucc;
CATEGORICAL GApdic Myc Bart Bartf Mycf;
COUNT Fbrdn;
MISSING ARE *;

ANALYSIS: TYPE = random;
ALGORITHM = INTEGRATION;
ESTIMATOR = MLR;

MODEL: dens GApdic Mycf Bartf RepSucc on sand;
Myc Bart on dens GApdic MycF RepSucc Bartf;

Following the two-part modeling example from 16.6, I tried the following:

DATA TWOPART:
NAMES = RepSucc;
BINARY = bin1;
CONTINUOUS = cont1;

USEVARIABLES sand GApdic dens Fbrdn Myc Bart Bartf Mycf bin1 cont1;

CATEGORICAL GApdic Myc Bart Bartf Mycf bin1;
COUNT Fbrdn;
MISSING ARE *;

ANALYSIS: TYPE = random;
ALGORITHM = INTEGRATION;
ESTIMATOR = MLR;

MODEL: dens GApdic Mycf Bartf bin1 cont1 on sand;
Myc Bart on dens GApdic MycF bin1 cont1 Bartf;

However, I receive the following message:
*** ERROR
Categorical variable BIN1 contains less than 2 categories.
 Mario posted on Wednesday, February 03, 2016 - 3:16 am
Solved it!

I did not define a correct CUTPOINT.

Thanks
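The post does not say which cutpoint was wrong. One scenario that would produce a single-category binary variable is sketched below in Python (not Mplus syntax), under the assumption that the zeros had already been replaced by 0.00000001 for the Gamma GLM while the cutpoint was left at 0. The helper `binary_part` and the data are hypothetical.

```python
def binary_part(values, cutpoint):
    """Return 1 for values strictly above the cutpoint and 0 otherwise."""
    return [1 if v > cutpoint else 0 for v in values]

# Made-up data in which zeros were replaced by 0.00000001.
rep_succ = [1e-8, 1e-8, 1.5, 1e-8, 3.2]

at_zero = binary_part(rep_succ, cutpoint=0.0)
at_placeholder = binary_part(rep_succ, cutpoint=1e-8)

print(sorted(set(at_zero)))         # [1]: every case exceeds 0, one category
print(sorted(set(at_placeholder)))  # [0, 1]: two categories
```

With the cutpoint at the placeholder value, the former zeros fall into category 0 again and the binary variable has two categories.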
 Mario posted on Wednesday, February 03, 2016 - 6:26 am
Dear Dr. Muthen,

However, the output shows the statistics (Est., s.e., P-value,…) of the binary (bin1) and continuous (cont1) variables created from our Gamma-distributed variable (named Reproductive success).
In order to include the results in a paper, we wonder if it is possible to combine the values obtained for bin1 and cont1 to obtain the Est. and s.e. of the original “Reproductive success” variable.

Thanks,
Mario
 Bengt O. Muthen posted on Thursday, February 04, 2016 - 6:52 pm
I would not try to combine the effects - the nice part of two-part is that you get a richer answer, one for each part.
 Mario posted on Friday, February 05, 2016 - 2:11 am
Dear Dr. Muthen,
thanks a lot for your answer. However, although it is statistically interesting to show the results for the two variables, we need to give a biological answer using the original variable. So, sorry to insist, but is there any way to combine the values of the two newly created variables to get values for the original one?

Thanks a lot
 Bengt O. Muthen posted on Friday, February 05, 2016 - 5:48 pm
If you used a Gamma distribution, maybe you would have a single answer, but two-part modeling is different. You can use BIC to tell how much better two-part fits than Gamma. I think it is the nature of the two-part model that you don't get one answer, but a more detailed two-part answer. I can see your dilemma if a Gamma distribution and its regression coefficient are the standard approach in your field, but I can't think of a way to come up with a single combined answer from the two-part results. If instead you used a censored-normal model you would get a single answer, but a censored-inflated model might fit better, and then again you have two answers.
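For readers unfamiliar with the BIC comparison mentioned above, here is a small Python sketch of the standard formula, BIC = -2 logL + p ln(n), where a smaller value indicates the preferred model. The log-likelihoods, parameter counts, and sample size are invented for illustration and do not come from this thread; the comparison also assumes both log-likelihoods describe the same observations on the same scale.

```python
import math

def bic(log_likelihood, n_parameters, n_observations):
    """Bayesian information criterion: -2*logL + p*ln(n). Smaller is better."""
    return -2.0 * log_likelihood + n_parameters * math.log(n_observations)

# Hypothetical fit results for two competing models on the same n = 200 cases.
bic_two_part = bic(log_likelihood=-512.3, n_parameters=9, n_observations=200)
bic_gamma = bic(log_likelihood=-530.1, n_parameters=6, n_observations=200)

preferred = "two-part" if bic_two_part < bic_gamma else "Gamma"
print(round(bic_two_part, 1), round(bic_gamma, 1), preferred)
```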
 Bengt O. Muthen posted on Saturday, February 20, 2016 - 1:24 pm
The expected value of the two-part outcome involves both parts of the model.
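To make that statement concrete, here is a minimal Python sketch of one common form of the combined mean. It assumes a logistic model for the binary part and a normal model for the log of the positive values, so that E[y] = P(y > 0) * exp(mu + sigma^2 / 2). The function names and all coefficient values are hypothetical, and the sketch does not produce a standard error for the combined quantity.

```python
import math

def logistic(z):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def two_part_mean(p_positive, mean_log, var_log):
    """E[y] = P(y > 0) * E[y | y > 0], assuming log(y) | y > 0 is normal
    with mean mean_log and variance var_log (lognormal positive part)."""
    return p_positive * math.exp(mean_log + 0.5 * var_log)

# Hypothetical intercepts and slopes for one covariate x in each part.
x = 1.0
p = logistic(-0.2 + 0.8 * x)   # binary part: probability of a positive value
mu = 0.5 + 0.3 * x             # continuous part: mean of log(y) given y > 0
sigma2 = 0.4                   # residual variance of log(y) given y > 0

print(two_part_mean(p, mu, sigma2))
```

Because the covariate enters both factors, its effect on the combined mean is not a single regression coefficient, which is consistent with the point that the two-part model gives two answers.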
 John C posted on Saturday, February 15, 2020 - 6:53 pm
Hello,

I have a similar question to the above. In my case, the dependent variable is a count variable in the context of GMM with 3 latent classes. This variable is a month-to-month count within a year, ranging from zero to 12, so that there are 13 categories, and is highly left-skewed.

Can I do two-part modeling with such a variable? My understanding of the syntax is that I would have to specify the category of interest, in this case category #13. Can this be done?
 Bengt O. Muthen posted on Sunday, February 16, 2020 - 11:36 am
So count=12 has the highest frequency? If so, a count model would not fit unless you work with 2 classes. Treating the counts as interval scaled, you could use two-part. If easier to handle, you can turn the scale around so that 12 becomes zero and 0 becomes 12.
 John C posted on Monday, February 17, 2020 - 9:06 pm
Yes, thanks, count 12 is the highest, followed by count zero. Because of this I thought it might be better to preserve the order after recoding the last category, so that the recoding would be as follows: 12->0, 0->1, 1->2, ...,11->12.
Would this be ok for a two-part model, even though the binary and continuous parts were not in the same direction?
Or would it still be advisable to just reverse code all the categories?
 Bengt O. Muthen posted on Tuesday, February 18, 2020 - 5:16 pm
Hmm; I think your scoring would be ok too. I would do both and check.
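The two recodings discussed above can be written out as follows. This is a Python sketch for checking the mappings, not Mplus syntax; the function names and the example counts are made up.

```python
def reverse_code(count, maximum=12):
    """Full reversal: 12 -> 0, 11 -> 1, ..., 0 -> 12."""
    return maximum - count

def shift_code(count, maximum=12):
    """Order-preserving alternative: 12 -> 0, 0 -> 1, 1 -> 2, ..., 11 -> 12."""
    return 0 if count == maximum else count + 1

counts = [12, 0, 1, 11, 12, 5]   # made-up monthly counts
print([reverse_code(c) for c in counts])  # [0, 12, 11, 1, 0, 7]
print([shift_code(c) for c in counts])    # [0, 1, 2, 12, 0, 6]
```

Both schemes send the most frequent value, 12, to zero, so the binary part of a two-part model is the same under either one. Only the ordering of the remaining values differs, which is why running both and comparing the continuous-part results is a reasonable check.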