1. I am running some multi-group invariance tests in which I expect to see differences in the factor means between two groups. Some indicators manifest differences in their means but others do not. It seems to me that either (1) the latent means are actually the same and the indicators with varying means are potentially biased, or (2) the latent means actually differ and the indicators with invariant means are potentially biased. If I trust my theory, is it reasonable to favor (2)? Is there further testing I can do to settle the matter?
2. More generally, some in the literature on invariance (e.g., Steenkamp and Baumgartner) describe the intercept invariance test as testing whether the differences in indicator means between two groups are proportional to the difference in the latent mean. Is that what a model test of invariant item means actually does? I had thought that a test of invariant item means simply compared the means for two groups on an indicator (or set of indicators), regardless of whether those differences are proportional to the difference in latent means. Is there, perhaps, some combination of constraints that tests this idea of the "proportionality" of differences in indicator means to differences in latent means?
If you are interested in whether factor indicator means vary across groups as a function of the factors, the way to assess this is to hold the intercepts of the factor indicators equal across groups and assess the factor mean differences. If the factor means differ, this implies the factor indicator means differ. Some factor indicators may not follow this rule, and this can be detected by modification indices.
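A short derivation may clarify the "proportionality" point raised in the question (standard LISREL-style notation, not from the original posts). For indicator i in group g, the measurement model and its implied mean are

```latex
x_{ig} = \tau_{ig} + \lambda_{ig}\,\xi_{g} + \epsilon_{ig}
\qquad\Rightarrow\qquad
\mu_{ig} = \tau_{ig} + \lambda_{ig}\,\kappa_{g}
```

Under metric invariance (\(\lambda_{i1} = \lambda_{i2} = \lambda_i\)) and scalar invariance (\(\tau_{i1} = \tau_{i2} = \tau_i\)), the observed mean difference for each indicator reduces to \(\mu_{i1} - \mu_{i2} = \lambda_i(\kappa_1 - \kappa_2)\): every indicator's mean difference is its loading times the single latent mean difference. That is the sense in which the scalar invariance test checks "proportionality," and an indicator violating it shows up as a large modification index on its intercept.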
My principal goal is to evaluate invariance in the indicator intercepts in a model in which the factor means are expected to differ. It seems to me that the test of such invariance SHOULD be a test of consistency in the pattern of mean-to-mean comparisons across items. For example, if ALL intercepts in one group were half of their counterparts in another group, that would suggest to me differences in the latent mean but NOT measurement bias, and, therefore, it would be appropriate to constrain the intercepts to be equal across groups and estimate the difference in the latent mean. However, if only one or two indicator means were half of their counterparts and the other indicator means were equivalent, that would suggest to me scalar nonequivalence in at least one of those sets of indicators (though, since the latent means could differ, it still would not be clear WHICH set manifested bias).
But the basic question is that I don't know how one would set up a model comparison to test for such consistency in mean differences across items, or whether that sort of test even makes sense.
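One common way to set up that model comparison is a chi-square (likelihood-ratio) difference test between a model with intercepts free across groups (latent means fixed to zero for identification) and a nested model with intercepts constrained equal and the latent mean freed in the second group. A minimal sketch of the arithmetic, using purely hypothetical fit statistics (the chi-square and df values below are made up for illustration):

```python
from scipy.stats import chi2

# Hypothetical fit statistics (illustrative only):
# Model A: intercepts free across groups, latent means fixed to 0.
# Model B: intercepts constrained equal across groups, latent mean
#          estimated freely in group 2. B is nested in A.
chi2_free, df_free = 112.4, 48            # hypothetical Model A fit
chi2_constrained, df_constrained = 121.9, 53  # hypothetical Model B fit

# Chi-square difference test for the intercept constraints
delta_chi2 = chi2_constrained - chi2_free
delta_df = df_constrained - df_free
p_value = chi2.sf(delta_chi2, delta_df)  # survival function = upper tail

print(f"delta chi2 = {delta_chi2:.1f}, delta df = {delta_df}, p = {p_value:.3f}")
```

A nonsignificant difference would support the "consistent" pattern (mean differences absorbed by a single latent mean difference), while a significant difference would point to noninvariant intercepts, which could then be localized with modification indices.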