Mplus Discussion > Missing Data Modeling > Number of observations

yawen posted on Thursday, November 12, 2009 - 5:04 am

Dear Dr Muthen,

I am doing a basic latent class analysis of cross-sectional data. The sample size is 2,000. Some independent variables have missing values, but none of the latent class indicators are missing.

In the VARIABLE command, I typed "Missing are all (-9999);" and in the ANALYSIS command, I typed "type=mixture;". I ran several models, and each has a different number of observations because each uses a different set of independent variables. The User's Guide says: "The default is to estimate the model under missing data theory using all available data." I have the following questions.

1) Were all available data, including variables with missing values, used when I specified the commands this way? I would like all of the available information to be used.
2) My models have different numbers of observations, all of which are smaller than my sample size. Should I report both the number of observations and the sample size in my tables?

3) I run different models to see whether the coefficient of a specific variable is stable. But is it appropriate to report models with different numbers of observations (from the same data set) in one paper? Or should I drop all of the cases with missing values so that the number of observations is fixed?

Thank you in advance.

yawen

Linda K. Muthen posted on Thursday, November 12, 2009 - 9:06 am

You can report the results for the different sample sizes, or you can bring the covariates into the model and treat them as dependent variables. This is done by mentioning their variances in the MODEL command. This is similar to multiple imputation.
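To see why each model reports its own number of observations, here is a minimal Python sketch of the counting rule described in this thread, assuming a model estimated conditional on the covariates: a case missing on any covariate is dropped (missing data theory covers dependent variables only), and a case missing on every dependent variable contributes nothing and is dropped too. Mentioning the covariates' variances moves them into the dependent set. The function name `analysis_n` and the toy data are hypothetical; this illustrates the bookkeeping and is not Mplus code.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical record type: None marks a missing value (e.g. a -9999 flag).
Row = Dict[str, Optional[float]]


def analysis_n(rows: List[Row], x_vars: Iterable[str], y_vars: Iterable[str]) -> int:
    """Count the cases a model conditioned on x_vars can use.

    Illustrative rule only: a case with any missing covariate is dropped
    (listwise on x), and a case with no observed dependent variable is
    dropped as well.
    """
    x_list, y_list = list(x_vars), list(y_vars)
    n = 0
    for row in rows:
        if any(row.get(x) is None for x in x_list):
            continue  # listwise deletion on the covariates
        if all(row.get(y) is None for y in y_list):
            continue  # no information on any outcome
        n += 1
    return n


rows = [
    {"u1": 1, "u2": 0, "x1": 2.0, "x2": 1.0},
    {"u1": 0, "u2": 1, "x1": None, "x2": 3.0},
    {"u1": 1, "u2": 1, "x1": 1.5, "x2": None},
    {"u1": 0, "u2": 0, "x1": 0.5, "x2": 2.5},
    {"u1": None, "u2": None, "x1": 1.0, "x2": 1.0},  # covariate data only
]

print(analysis_n(rows, ["x1"], ["u1", "u2"]))        # 3: model with x1
print(analysis_n(rows, ["x1", "x2"], ["u1", "u2"]))  # 2: model with x1 and x2
# Mentioning the covariate variances moves x1 and x2 into the dependent set:
print(analysis_n(rows, [], ["u1", "u2", "x1", "x2"]))  # 5
```

Each covariate set defines its own listwise subset, which is why the counts differ across models. Note also that the last case, which has data only on the covariates, is counted only once the covariates are treated as dependent variables; that is the situation that arises later in this thread with the CFA sample.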

Oliver Arranz-Becker posted on Monday, February 22, 2010 - 6:24 am

Dear Drs. Muthén,

I have a question about the proper treatment of missing data due to panel attrition. I have a two-variable, two-wave cross-lagged model with repeated measures of X and Y; that is, X1 and Y1 (at t1) are associated with both X2 and Y2 (at t2). Some of the missing data at t2 are due to panel attrition. My questions:

1. Is it sensible to use a "FIML" approach with panel data?
2. What N should I report? Giving the full sample size reported by Mplus looks odd because it suggests that there is no panel attrition.

I would appreciate any suggestions.
Thank you in advance,
Oliver Arránz Becker

Linda K. Muthen posted on Monday, February 22, 2010 - 8:40 am

1. Yes.
2. Report the full sample size along with the fact that you are using FIML. Also give the coverage.
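Coverage is the proportion of cases with data present on each variable and on each pair of variables; Mplus prints it as covariance coverage when data are missing. A minimal Python sketch for a toy two-wave panel shows what the figure summarizes. The data and the function name `coverage` are hypothetical.

```python
from itertools import combinations_with_replacement
from typing import Dict, List, Optional, Tuple

Row = Dict[str, Optional[float]]  # None marks a missing value


def coverage(rows: List[Row], variables: List[str]) -> Dict[Tuple[str, str], float]:
    """Proportion of cases observed on each variable (diagonal) and on each pair."""
    n = len(rows)
    result = {}
    for a, b in combinations_with_replacement(variables, 2):
        both = sum(1 for r in rows if r.get(a) is not None and r.get(b) is not None)
        result[(a, b)] = both / n
    return result


# Toy two-wave panel: X2 and Y2 are missing for the two cases lost to attrition.
panel = [
    {"X1": 1.0, "Y1": 2.0, "X2": 1.5, "Y2": 2.5},
    {"X1": 0.5, "Y1": 1.0, "X2": 0.7, "Y2": 1.2},
    {"X1": 2.0, "Y1": 2.2, "X2": None, "Y2": None},
    {"X1": 1.2, "Y1": 0.8, "X2": None, "Y2": None},
]

cov = coverage(panel, ["X1", "Y1", "X2", "Y2"])
print(cov[("X1", "X1")])  # 1.0: every case observed at t1
print(cov[("X1", "X2")])  # 0.5: half the cases observed at both waves
```

Reporting N = 4 together with FIML and a minimum coverage of 0.5 tells readers both how many cases enter the likelihood and how much of the t2 information is actually observed.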

Oliver Arranz-Becker posted on Wednesday, February 24, 2010 - 2:11 am

Thanks a lot for your advice on my last questions.
In the same paper, we also present a discrete-time event history analysis (with a dichotomous event indicator as the DV and several covariates). Based on other postings on this board, we think there are two options to avoid listwise deletion of cases with missing values: (1) a probit model (WLS) that includes the predictors by mentioning their variances, or (2) estimating the model via ML with Monte Carlo integration. My questions:

1. Which of the two methods would you recommend? (If you know of any references, that would be great.)
2. Is the second approach essentially FIML estimation? If not, how should we describe the method in the paper?
3. Can you provide any reference explaining why Monte Carlo integration is necessary in this case?

Thank you so much for your help,
best regards,
O. Arránz Becker

Linda K. Muthen posted on Wednesday, February 24, 2010 - 10:17 am

You must use maximum likelihood to estimate a discrete-time survival model. This means that the missing data estimation is done using FIML. If you don't want observations with missing values on observed exogenous variables removed from the analysis, you can mention their variances in the MODEL command. They will then be treated as dependent variables, and distributional assumptions will be made about them. Monte Carlo integration is required when there are missing data on mediators.

Oliver Arranz-Becker posted on Thursday, February 25, 2010 - 4:59 am

I think I misstated my previous question. We do not use a latent variable framework for the discrete-time event history analysis; rather, we have just one event indicator (i.e., there are only two time points: covariates are measured at t1 and the event at t2). So the analysis is essentially a path analysis with a dichotomous outcome.

What is the difference between an FIML and a WLS estimator in this case, and which one is preferable? How are missing data treated in WLS if the exogenous variables are mentioned in the MODEL command?

Thank you so much for your support,
best regards,
O. Arránz Becker

Linda K. Muthen posted on Thursday, February 25, 2010 - 10:14 am

With weighted least squares, missing data are handled using pairwise present information. Given the choice, I would use FIML.
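"Pairwise present" means each sample statistic is computed from the cases observed on the variables it involves, so different statistics rest on different subsets. A minimal Python sketch of that bookkeeping, assuming plain Pearson correlations on continuous toy data for simplicity (with binary outcomes and WLSMV, Mplus works with thresholds and probit-based statistics, which this does not compute). The function name `pairwise_corr` and the data are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple

Row = Dict[str, Optional[float]]  # None marks a missing value


def pairwise_corr(rows: List[Row], a: str, b: str) -> Tuple[float, int]:
    """Pearson correlation of a and b from the cases observed on both (pairwise present)."""
    pairs = [(r[a], r[b]) for r in rows if r.get(a) is not None and r.get(b) is not None]
    n = len(pairs)
    mean_a = sum(p for p, _ in pairs) / n
    mean_b = sum(q for _, q in pairs) / n
    s_ab = sum((p - mean_a) * (q - mean_b) for p, q in pairs)
    s_aa = sum((p - mean_a) ** 2 for p, _ in pairs)
    s_bb = sum((q - mean_b) ** 2 for _, q in pairs)
    return s_ab / math.sqrt(s_aa * s_bb), n


data = [
    {"x1": 1.0, "x2": 2.0, "y": 1.0},
    {"x1": 2.0, "x2": None, "y": 2.0},
    {"x1": 3.0, "x2": 1.0, "y": 3.0},
    {"x1": None, "x2": 3.0, "y": 5.0},
]

for a, b in [("x1", "y"), ("x2", "y"), ("x1", "x2")]:
    r, n = pairwise_corr(data, a, b)
    print(a, b, round(r, 3), n)  # each statistic rests on its own n

listwise_n = sum(1 for row in data if all(v is not None for v in row.values()))
print("listwise n:", listwise_n)  # 2
```

Pairwise present uses 3, 3, and 2 cases for the three correlations, while listwise deletion would keep only 2 cases for everything. FIML instead maximizes one likelihood over every case's observed values, which is one reason to prefer it when the choice is available.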

Melanie Wall posted on Monday, August 30, 2010 - 6:27 am

There appears to be a difference between Mplus 6 and Mplus 5 in how the "Number of observations" is determined when there are missing values. In Mplus 6, my "Number of observations" was reduced to only those individuals with complete x-variables, whereas Mplus 5 gives the total n in the data set regardless of the model being fit. Previously, observations with missing x-variables were included in the "Number of missing data patterns"; now those individuals are simply not included in the analysis at all.

Hence, for a basic SEM, I get different results in Mplus 5 and Mplus 6, presumably because of this difference in how missing data are handled. Is there a way in Mplus 6 to include individuals with missing x-variables in FIML without resorting to multiple imputation?

Linda K. Muthen posted on Monday, August 30, 2010 - 6:37 am

If you specify the variances of the covariates in the MODEL command, you will obtain the Version 5 results. In Version 6, with all continuous outcomes and maximum likelihood estimation, we started estimating the model conditioned on the covariates to be in line with the rest of Mplus and with regular regression. With all continuous outcomes, maximum likelihood, and no missing data, estimating the model conditioned on x or jointly gives the same results.
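The last point can be checked numerically in the simplest case: one continuous outcome y regressed on one continuous covariate x, with x missing for some cases. The sketch below contrasts (a) the conditional model, which can use only cases with x observed, and (b) a joint bivariate-normal ML estimate that keeps every case. Because y is always observed here, the joint ML estimate has a closed form via the factored likelihood: the mean and variance of y come from all cases, and the regression of x on y comes from the complete cases. This is a textbook illustration under normality and missing at random, not Mplus's actual FIML routine; all function names and data are hypothetical.

```python
from typing import List, Optional, Tuple


def ols(xs: List[float], ys: List[float]) -> Tuple[float, float, float]:
    """Intercept, slope, and ML residual variance of ys regressed on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    syy = sum((y - my) ** 2 for y in ys) / n
    slope = sxy / sxx
    return my - slope * mx, slope, syy - slope * sxy


def conditional_slope(x: List[Optional[float]], y: List[float]) -> float:
    """Regression of y on x using only cases with x observed (listwise on x)."""
    pairs = [(xi, yi) for xi, yi in zip(x, y) if xi is not None]
    return ols([p[0] for p in pairs], [p[1] for p in pairs])[1]


def joint_ml_slope(x: List[Optional[float]], y: List[float]) -> float:
    """Slope of y on x from bivariate-normal ML that uses all cases.

    Assumes y is always observed and x is missing at random, so the
    likelihood factors into f(y) (all cases) times f(x | y) (complete cases).
    """
    n = len(y)
    mu_y = sum(y) / n
    var_y = sum((yi - mu_y) ** 2 for yi in y) / n
    pairs = [(xi, yi) for xi, yi in zip(x, y) if xi is not None]
    _, beta, resid = ols([p[1] for p in pairs], [p[0] for p in pairs])  # x on y
    cov_xy = beta * var_y
    var_x = resid + beta ** 2 * var_y
    return cov_xy / var_x


y = [1.2, 1.9, 3.4, 3.9, 5.3, 5.8]
x_full = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x_miss = [1.0, 2.0, 3.0, None, None, 6.0]

# No missing data: conditional and joint estimates agree.
print(abs(joint_ml_slope(x_full, y) - conditional_slope(x_full, y)) < 1e-12)  # True
# Missing x: the joint estimate also uses the two cases that lack x.
print(conditional_slope(x_miss, y), joint_ml_slope(x_miss, y))
```

With complete data the two slopes coincide, which is the equivalence stated above; once x is missing, the conditional estimate drops those cases and the joint estimate does not, so the two differ. That is the kind of Version 5 versus Version 6 difference described in the previous post.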

Melanie Wall posted on Monday, August 30, 2010 - 6:58 am

Hi Linda

So I did as you suggested and "specified the variances of the covariates in the MODEL command" simply by adding another line listing them with a semicolon, e.g.
x1 x2 x3;

I see this change in the default from Mplus 5 to Mplus 6 as a substantial change in the handling of missing data. If I understand correctly, missing values on x-variables are now handled by default with listwise deletion, while missing values on y-variables are handled with FIML. Previously, all variables were handled with FIML. Perhaps I missed it, but it might be useful to make this change more prominent on your UPDATES in Mplus 6 page.

Linda K. Muthen posted on Monday, August 30, 2010 - 7:49 am

The handling of missing data has not changed. Missing data theory applies to dependent variables only; it has never applied to independent variables. Prior to Version 6, for models with all continuous variables and maximum likelihood estimation, all variables were treated as dependent variables for the reason stated above.

JEP posted on Wednesday, October 24, 2012 - 5:41 pm

I have a data set with missing data from a survey I conducted (N = 610). I ran a CFA with 12 binary variables and the WLSMV estimator.

f1 by y1 y2 y3;
f2 by y4 y5 y6 y7 y8 y9;
f3 by y10 y11 y12;

My resulting N for the CFA was 572.

I followed up my CFA with the following:

f1 by y1 y2 y3;
f2 by y4 y5 y6 y7 y8 y9;
f3 by y10 y11 y12;

f1 on x1 x2 x3 x4 x5 x6 x7 x8 x9;
f2 on x1 x2 x3 x4 x5 x6 x7 x8 x9;
f3 on x1 x2 x3 x4 x5 x6 x7 x8 x9;

which results in N = 418 because 192 cases have missing values on the x-variables. However, I receive an error that says:

THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES COULD NOT BE COMPUTED. THE MODEL MAY NOT BE IDENTIFIED. CHECK YOUR MODEL.

When I re-run it like this (per advice I received):

f1 by y1 y2 y3;
f2 by y4 y5 y6 y7 y8 y9;
f3 by y10 y11 y12;

f1 on x1 x2 x3 x4 x5 x6 x7 x8 x9;
f2 on x1 x2 x3 x4 x5 x6 x7 x8 x9;
f3 on x1 x2 x3 x4 x5 x6 x7 x8 x9;

x1 x2 x3 x4 x5 x6 x7 x8 x9;

my N=610.

This model runs successfully and has good model fit. However, I'm not sure I can justify presenting a CFA with 572 cases followed by a structural model with 610 cases. Or can I?

Is the latter method appropriate and justifiable? I am new to SEM and Mplus, so any thoughts on this matter would be greatly appreciated.

Thank you.

Linda K. Muthen posted on Thursday, October 25, 2012 - 9:52 am

I would use 572 for both analyses. This would eliminate the observations that have missing values on all of the y's and data only on the x's.

JEP posted on Thursday, October 25, 2012 - 10:22 am

Hi Dr. Muthen,

That would be fantastic. I'm not at a computer with Mplus right now, but would the syntax be:

f1 by y1 y2 y3;
f2 by y4 y5 y6 y7 y8 y9;
f3 by y10 y11 y12;

f1 on x1 x2 x3 x4 x5 x6 x7 x8 x9;
f2 on x1 x2 x3 x4 x5 x6 x7 x8 x9;
f3 on x1 x2 x3 x4 x5 x6 x7 x8 x9;

y1 y2 y3 y4 y5 y6 y7 y8 y9 y10 y11 y12;

Thank you.

Linda K. Muthen posted on Thursday, October 25, 2012 - 2:12 pm

You would need to use the USEOBSERVATIONS option to exclude observations with missing values on all of the y's, and then use the following:

f1 by y1 y2 y3;
f2 by y4 y5 y6 y7 y8 y9;
f3 by y10 y11 y12;

f1 on x1 x2 x3 x4 x5 x6 x7 x8 x9;
f2 on x1 x2 x3 x4 x5 x6 x7 x8 x9;
f3 on x1 x2 x3 x4 x5 x6 x7 x8 x9;

x1 x2 x3 x4 x5 x6 x7 x8 x9;

John Perry posted on Friday, April 11, 2014 - 4:06 am

Dear Drs Muthen,

Hopefully I am missing something obvious here. I have a sample of 342 participants and no missing data. However, in a simple ML SEM, I am getting 171 observations in the output (exactly half). I'm really not sure what I'm doing wrong, because I haven't come across this before.

John

Bengt O. Muthen posted on Friday, April 11, 2014 - 6:12 am

Probably some data reading issue. Please send data, input, output, and license number to Support.