Asta B. posted on Tuesday, August 02, 2011 - 6:07 am
I just started working with Mplus (great program, actually) and want to compare results of cluster analysis in SPSS with more or less restrictive LPA.
I had no problem performing the exploratory kind, but the confirmatory variations are difficult for me as a novice using Mplus. I managed to "reproduce" the response patterns, but now I'd like to fix class proportions - just to see how that works.
Please correct me if I did it the wrong way.
I translated the 5 class proportions I found in the CA (42.1%, 11.4%, 2.2%, 1.5%, 42.9%) into thresholds (-0.319, -2.050, -3.795, -4.185) by using theta = -ln((1/p)-1). In the output I found my thresholds again as "means of categorical latent variables" but under the heading "final class counts and proportions for the latent classes based on the estimated model" (38.4%, 6.8%, 1.2%, 0.8%, 52.8%) and "...estimated posterior probabilities" (38.2%, 7.5%, 0.2%, 1.7%, 52.5%) the values are not like my original proportions. Thought they should be. What’s wrong: my analysis or my interpretation?
I think I read about a similar problem somewhere on this board, but I couldn't transfer the answer to my case, so I would appreciate your help.
Btw: I am talking about alcohol use patterns and my cluster 5 (42.9%) are the abstainers.
If you have the class proportions, the correct formula is:
logit = log (pclassj/pclassJ)
where classJ is the last class.
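As a rough check of the arithmetic, the sketch below applies this formula to the five proportions from the first post and converts the logits back into proportions. It also converts the four values from the first post, computed with -ln((1/p)-1), and those come out at about 38.4%, 6.8%, 1.2%, 0.8% and 52.8%, the same as the model-estimated proportions reported there. The function names are made up for this illustration and are not part of Mplus. The posted percentages add up to 100.1 because of rounding, so the round trip is only exact to about three decimals.

```python
import math

# Class proportions from the cluster analysis; the last class is the reference.
proportions = [0.421, 0.114, 0.022, 0.015, 0.429]


def multinomial_logits(props):
    """logit_j = log(p_j / p_J) for every class except the last (reference) class."""
    ref = props[-1]
    return [math.log(p / ref) for p in props[:-1]]


def implied_proportions(logits):
    """Class proportions implied by the logits, with the reference class logit fixed at 0."""
    expd = [math.exp(a) for a in logits] + [1.0]
    total = sum(expd)
    return [e / total for e in expd]


logits = multinomial_logits(proportions)
recovered = implied_proportions(logits)

# The values from the first post, computed with -ln((1/p) - 1).
binary_logits = [-0.319, -2.050, -3.795, -4.185]
implied_by_binary = implied_proportions(binary_logits)

print("logits to fix:        ", [round(a, 3) for a in logits])
print("implied proportions:  ", [round(p, 3) for p in recovered])
print("implied by first try: ", [round(p, 3) for p in implied_by_binary])
```

The logits come out at roughly -0.019, -1.325, -2.970 and -3.353 for classes 1 to 4, with class 5 as the reference.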
Asta B. posted on Wednesday, August 03, 2011 - 2:03 am
Ah, okay, thank you! I tried that and the analyses ran well.
But I still have some difficulty interpreting the results. Would you mind explaining that to me as well?
The order of magnitude of the final class proportions is like the one in my CA, but I don't know why the values are different. If that is a consequence of estimation why should I fix the values in the first place?
It seems that the order in which I specify my (in)equality constraints for the classes affects the model fit. For instance, using a model with 2 continuous indicator variables X1 and X2 and 3 classes, I find BIC=994.60 if I specify the constraints as [X1>X2], [X1=X2], and [X1<X2] for class 1, 2 and 3 respectively, whereas I find BIC=985.09 if I swap around the constraints for class 2 and 3.
I thought that the order of constraints should not matter? Would you have any idea why this could happen? Thanks in advance!
Thank you for thinking along! The weird thing is, I did specify starting values for each indicator variable and within each class (which I also swapped around between class 2 and 3 for the two analyses described above) and still I obtained the different BICs. Would you know an explanation? Thanks again!
Entropy was sufficient (>.80). However, when I export the most likely class membership, I find patterns in the data that are in direct contrast with the constraints. For instance, class 1 will contain members with V1 < V2, even though the constraint for that class is V1 > V2.
I wonder how this can happen and whether the class solution is trustworthy? Thanks in advance!
The model constraint applies to the means of the variables and not to the variables themselves. Since observations are allowed to vary around the mean, the inequality would not necessarily hold for the variables. To give you an example: suppose the class 1 means are p1=100 and p2=99, and the class 2 means are p3=1 and p4=0. If you have V1=99 and V2=100, you would want that classified in class 1. If you want some kind of deterministic method to go into the classification, look up the training option in the user's guide.
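The example in this reply can be worked through numerically. The sketch below is only an illustration and not how Mplus does the classification internally: it assumes that p1 and p2 are the class 1 means of V1 and V2, that p3 and p4 are the class 2 means, that both variables are independent normals with a standard deviation of 1 in each class, and that the two classes are equally large. All function names are made up for the illustration.

```python
import math

# Assumed class-specific means for (V1, V2), taken from the example in the reply:
# class 1 has means (100, 99), class 2 has means (1, 0).
class_means = {1: (100.0, 99.0), 2: (1.0, 0.0)}


def log_density(obs, means, sd=1.0):
    """Log density of independent normals with a common sd (an assumption)."""
    return sum(
        -0.5 * math.log(2 * math.pi * sd ** 2) - (x - m) ** 2 / (2 * sd ** 2)
        for x, m in zip(obs, means)
    )


def posterior(obs, means_by_class):
    """Posterior class probabilities, assuming equal class proportions."""
    logs = {k: log_density(obs, m) for k, m in means_by_class.items()}
    top = max(logs.values())  # subtract the maximum for numerical stability
    weights = {k: math.exp(v - top) for k, v in logs.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


# This case has V1 < V2, although the class 1 means satisfy mean(V1) > mean(V2).
observation = (99.0, 100.0)
post = posterior(observation, class_means)
most_likely = max(post, key=post.get)
print("most likely class:", most_likely)
print("posterior probabilities:", post)
```

Under these assumptions the case is assigned to class 1 even though its own values violate V1 > V2, because it is far closer to the class 1 means than to the class 2 means.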
Thanks so much! Could you maybe help me one bit more? I tried this training option but keep getting an error:
*** ERROR in VARIABLE command
  Unknown variable in TRAINING option: REACPURE1
*** ERROR in VARIABLE command
  Unknown variable in TRAINING option: REACPURE2
Now it is probably something very silly but I fail to find my mistake. The .dat file is correct: it does run without the TRAINING option. The training variables I created are dummy-coded 1 and 0, with each case having a 1 on only one dummy variable. This is the syntax:
VARIABLE:
  NAMES = ID IDclass IRPApro IRPAre Reacpure1 Reacpure2;
  USEVARIABLES = IRPApro IRPAre;
  MISSING = ALL (999);
  CLASSES = c(3);
  TRAINING = Reacpure1 Reacpure2;
  CLUSTER = IDclass;
ANALYSIS:
  TYPE = MIXTURE COMPLEX;
Would you happen to see my mistake? Thanks in advance!