Anonymous posted on Wednesday, March 24, 2004 - 11:19 am
I am estimating an SEM using a 50% sample of the total population. The model fits, and all path coefficients are statistically significant at p <= 0.05. The model also fits in both 50% subsamples (and in the full sample). When I introduce survey weights, things get difficult: some path coefficients become nonsignificant in one of the two 50% subsamples, while nothing changes in the other. The model still fits the full data. What does that mean for my model? Which results should I pay attention to: the results for the full sample or the results for the two 50% subsamples?
It sounds strange that the weighted samples behave so differently. Are these random samples? If the data were generated with unequal probabilities of selection, then weights should be used.
Anonymous posted on Thursday, March 25, 2004 - 6:42 am
Thank you very much for your response. I should have paid more attention to my programming: I forgot to include an error covariance in one of my groups. Including this error covariance makes a world of difference.
Here is another question. There appears to be an open debate about whether to include error covariances to improve model fit. I included error covariances for variables that were highly correlated within each latent factor. The highly correlated variables had similar word stems or represented a series of questions on the same specific topic. I only included the error covariances after I explored alternative versions of the latent factor(s). In other words, I explored whether the latent factor(s) containing the highly correlated variables were robust. Does this appear to be a valid approach, or am I just fitting my model to the data?
Anonymous posted on Tuesday, March 29, 2005 - 7:47 am
I'd like to weight both levels of analysis: individuals and neighbourhoods (racially diverse neighbourhoods have been oversampled). How do I specify that? It seems that the WEIGHT option can be used only once.
Tihomir, thanks a lot for this advice, I'll look into it. I have another question though: I'd like to construct a cross-level interaction term, as I'd like to test whether neighbourhood's low status amplifies the negative effect of individual-level deprivation on interpersonal trust. I'll be very grateful for your advice on how to do this (neighbourhood status is a latent level-2 variable).
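%WITHIN% s | trust ON depriv; ! trust and depriv are placeholder names for interpersonal trust and individual-level deprivation
%BETWEEN% s ON status; ! status is a placeholder name for the latent level-2 neighbourhood-status factor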
Anonymous posted on Tuesday, April 26, 2005 - 9:13 am
I just read over the tech 9 report. We're working with data that have sampling weights and stratification at 2 levels. There are roughly 15,000 primary strata (PSU) and 2 replicates (rep) within each stratum.
I'm having problems fitting a CFA with ordinal observed variables.
I used the following setup to estimate a CFA with 6 factors. I also used the GROUPING option to identify the replicates, along with the CLUSTER and TYPE=COMPLEX options. The program works for continuous observed variables but does not produce estimates when I add the CATEGORICAL option. Given that these data are Likert-type, perhaps an ordinal specification is better.
Title: all groups 4/6/05;
Data: File is c:\obs_short.dat;
variable: names are rep psu var1 var2 var3 var4 var5 var6 wt;
usevar = psu rep wt var1 var2 var3 var4 var5 var6;
missing = var1 var2 var3 var4 var5 var6 (-999);
categorical are var1 var2 var3 var4 var5 var6;
grouping is rep (1=repone, 2=reptwo);
cluster is psu;
weight is wt;
analysis: type=complex;
model: latent by var1*1 var2 var3 var4 var5 var6;
latent@1;
output: sampstat tech1 standardized;
The error message is:
Cluster ID cannot appear in more than one group. Problem with cluster ID: 10101
I looked at the data and here are the first 6 or so lines:
I'm confused as to why the program won't run. When I removed the CATEGORICAL option, the program produced results! I also tried adding ESTIMATOR=MLR while keeping the CATEGORICAL option. The error then said that I needed to specify TYPE=MIXTURE.
Following is an explanation of what you are experiencing. In multiple group analysis, the groups should be composed of independent observations. When observations from the same cluster are in more than one group, this is violated. For continuous outcomes using TYPE=COMPLEX;, Mplus takes this lack of independence into account. For categorical outcomes, it does not and therefore stops the analysis. You can trick the program so that it does not complain as described below. In our experience, this violation does not result in large differences in the results.
The trick is to define new cluster values that are unique for each group with the following:
DEFINE: psu = rep*100000 + psu;
! 100000 should be a number with more digits than the cluster value;
! in this case, 100000 is used because the original cluster ID has 5 digits (10101)
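The arithmetic of this DEFINE statement can be checked outside Mplus. Below is a minimal Python sketch with made-up (rep, psu) pairs. Before recoding, cluster 10101 appears in both replicates. After recoding, each cluster ID belongs to exactly one group. The multiplier must exceed the largest original cluster ID; otherwise two different (rep, psu) pairs could map to the same new ID.

```python
from collections import defaultdict


def recode_cluster(rep: int, psu: int, multiplier: int = 100000) -> int:
    """Mirror of the DEFINE statement psu = rep*100000 + psu."""
    if psu >= multiplier:
        raise ValueError("multiplier must exceed the largest cluster ID")
    return rep * multiplier + psu


def groups_per_cluster(records):
    """Map each cluster ID to the set of groups (replicates) it appears in."""
    seen = defaultdict(set)
    for rep, psu in records:
        seen[psu].add(rep)
    return seen


# Hypothetical (rep, psu) records: each cluster appears in both replicates.
records = [(1, 10101), (2, 10101), (1, 10102), (2, 10102)]

before = groups_per_cluster(records)
after = groups_per_cluster([(rep, recode_cluster(rep, psu)) for rep, psu in records])

print(max(len(g) for g in before.values()))  # 2: a cluster shared across groups
print(max(len(g) for g in after.values()))   # 1: every cluster in exactly one group
print(sorted(after))  # [110101, 110102, 210101, 210102]
```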
A colleague has asked me an interesting question about a situation that I suspect is rather common when one analyzes weighted data in SEM. Suppose the weighting variable is computed for the whole population of survey respondents, but the analysis model considers only a subset of those respondents (e.g., sexually active female adolescents rather than sexually inactive and active males and females). What is the recommended course of action? Would one include the original weight variable for those cases in the Mplus analysis as the WEIGHT variable? Or must the weight variable somehow be recomputed?
I think this is dealt with by the new SUBPOPULATION option of the VARIABLE command. Following is an excerpt from the user's guide (see page 403):
The SUBPOPULATION option is used with TYPE=COMPLEX to select observations for an analysis when a subpopulation (domain) is analyzed. When the SUBPOPULATION option is used, all observations are included in the analysis although observations not in the subpopulation are assigned weights of zero (see Korn & Graubard, 1999, pp. 207-211).
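To illustrate the zero-weight idea in this excerpt, here is a minimal Python sketch with made-up data and a hypothetical domain indicator. For a weighted mean, there are two approaches. One keeps all observations and gives zero weight to those outside the domain. The other drops those observations. Both give the same point estimate. The reason for keeping the full sample is that design-based standard errors depend on the full design, and this sketch does not compute them.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(w * y) / sum(w)."""
    total = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total


# Made-up outcome, sampling weights, and domain membership.
y = [3.0, 5.0, 4.0, 6.0, 2.0]
weight = [1.5, 2.0, 1.0, 0.5, 3.0]
in_sub = [True, False, True, True, False]  # hypothetical domain indicator

# Domain approach: keep every observation, zero the weights outside the domain.
domain_w = [w if s else 0.0 for w, s in zip(weight, in_sub)]
est_domain = weighted_mean(y, domain_w)

# Subsetting approach: drop observations outside the domain.
sub_y = [v for v, s in zip(y, in_sub) if s]
sub_w = [w for w, s in zip(weight, in_sub) if s]
est_subset = weighted_mean(sub_y, sub_w)

print(round(est_domain, 4), round(est_subset, 4))  # 3.8333 3.8333
```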
I had previously addressed this issue (examining subpopulations) with the USEOBSERVATIONS option, defining my subpopulation that way. I had also excluded participants with missing values on the weight variable. [NELS data]
With the new upgrade to version 4, I am interested in using the SUBPOPULATION option to examine the same subpopulation.
weight is f3f1pnwt; cluster is sch_id; stratification is sstratid;
subpopulation = f2race1 ne 4 and ses1band eq 1 and f2evdost eq 0;
However, when I do so, I receive the following error message:
*** ERROR Weight variable has missing value at observation 19.
*** WARNING Data set contains unknown or missing values for GROUPING, PATTERN, COHORT and/or CLUSTER variables. Number of cases with unknown or missing values: 759
*** WARNING Data set contains cases with missing on all variables. These cases were not included in the analysis. Number of cases with missing on all variables: 13510
Any guidance? I have tried defining my subpopulation to those cases without missing values for the weight variable as well, with similar results.
If you are using the same input and data as you used with USEOBSERVATIONS and are just substituting SUBPOPULATION for USEOBSERVATIONS, this should not be happening. If you are using a similar input and different data, you may be reading your data incorrectly. If this does not help, send your input, data, output, and license number to email@example.com.
I am running a multigroup model with TYPE=COMPLEX. I am testing for measurement invariance across group a and group b.
When I run group a only I get 40 degrees of freedom. Logically I also get the same 40 degrees of freedom for group b only (using subpopulation command).
When I test for complete measurement non-invariance using for example:
Model: f1 by x1 x2 x3 x4;
Model b: f1 by x2 x3 x4;
I get 88 degrees of freedom. I was under the impression that the low-group chi-square* and df plus the high-group chi-square* and df should equal the chi-square* and df of the complete measurement non-invariance model. By my calculation, 88 degrees of freedom is the number I should get for factor loading invariance alone. Is TYPE=COMPLEX changing the factor loadings in the complete measurement non-invariance model?
*Assuming the MLR chi-square is converted to the ML chi-square via the scaling correction factor.
I would need to see the three outputs to comment. TYPE=COMPLEX; doesn't change anything. Please send the outputs, data if you don't have TECH1 in the outputs, and your license number to firstname.lastname@example.org.
I'm using the stratification, cluster (psu), and weight options, along with TYPE=COMPLEX, to analyze a subpopulation with NELS data. My preliminary Mplus analyses run into a problem because clusters appear to be nested within more than one stratum (I think this is what the error message means). I have both continuous and categorical indicators/variables in the model.
This is the MPlus error message I am receiving:
ERROR Each stratum must contain unique cluster IDs. Clusters are not nested within strata.
My understanding is that this may have happened because the PSUs (clusters) in the current data file are not unique. In other words, the same cluster (school) identifier, such as 29, can appear in two different strata. I think Mplus assumes that each cluster is contained within only one stratum, while NELS data may use the same cluster identifier across different strata.
Following the helpful response to a similar issue earlier in this thread, I've tried to address this by creating a new, unique cluster (school) ID as follows (SPSS code):
COMPUTE PSURECODED = STRATUM*1000 + PSU.
EXECUTE.
[Multiplied by 1000 because the largest cluster value is 999]
1. Is this the correct procedure for creating unique cluster identifiers?
2. What unintended consequences might this procedure have for the estimation of standard errors?
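One way to check a recode like this is with the minimal Python sketch below, which uses made-up stratum/PSU values. It verifies that every recoded cluster ID falls in exactly one stratum. It also shows the main pitfall of this kind of recode: if the multiplier is not larger than the largest PSU value, two distinct PSUs can end up with the same ID. As long as the recode is one-to-one, it only relabels the (stratum, PSU) pairs.

```python
from collections import defaultdict


def recode(stratum: int, psu: int, multiplier: int = 1000) -> int:
    """Mirror of the SPSS line PSURECODED = STRATUM*1000 + PSU."""
    return stratum * multiplier + psu


def strata_per_cluster(rows):
    """Map each cluster ID to the set of strata it appears in."""
    seen = defaultdict(set)
    for stratum, cluster in rows:
        seen[cluster].add(stratum)
    return seen


def is_nested(rows):
    """True if every cluster ID appears in exactly one stratum."""
    return all(len(s) == 1 for s in strata_per_cluster(rows).values())


# Hypothetical (stratum, psu) rows: PSU 29 is reused in strata 1 and 2.
rows = [(1, 29), (1, 999), (2, 29), (2, 5)]
print(is_nested(rows))  # False

good = [(s, recode(s, p)) for s, p in rows]
print(is_nested(good))  # True

# Pitfall: a multiplier (100) not larger than the largest PSU (150) causes a collision.
bad = {(s, p): recode(s, p, multiplier=100) for s, p in [(1, 150), (2, 50)]}
print(len(set(bad.values())))  # 1: both PSUs become 250
```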
I have a similar problem to Matthew Diemer (March 15 2006). I am trying to compute a CFA model using the subpopulation command. I am using data from NESARC and I only want to use current or former drinkers (u4). When I use the following options:
missing are all (-9);
weight is weight;
stratification is stratum;
cluster is psu;
subpopulation is u4 == 1 or u4 == 2;
type = complex missing;
I get the following error message:
*** ERROR Weight variable has missing value at observation 259.
*** WARNING Data set contains unknown or missing values for GROUPING, PATTERN, COHORT and/or CLUSTER variables. Number of cases with unknown or missing values: 8238
Could you please explain what this error message means and how to amend it.
It sounds like you may be reading your data incorrectly or declaring your missing value flags incorrectly. Please send your input, data, output, and license number to email@example.com.
David Bard posted on Friday, December 08, 2006 - 7:05 am
Regarding the April 26, 2005 Anonymous post and Linda's response on the 28th: has the procedure for multiple-group SEM with categorical data and complex sampling changed with version 4.0? I noticed that one of the new 4.0 features was improved multiple-group complex sampling analysis for the WLS estimators. Does this mean that dependency across multiple groups with categorical DVs can now be handled using the CLUSTER option, TYPE=COMPLEX, and one of the WLS estimators?
The modified Pfeffermann test of informativeness is not yet available.
Cecily Na posted on Tuesday, February 15, 2011 - 9:16 am
Dear professors, does Mplus incorporate sampling weights in SEM? If so, do I just include the weights in the data file (variable wt) and use the syntax WEIGHT IS wt;? Will Mplus then automatically incorporate the weights in the SEM? Thanks a lot!
Hi, I'm using the PIRLS data to estimate a two-level SEM (L1 = students, L2 = classes). The data set has two population weights:
1. totwgt inflates the sample to the population size and, as a result, underestimates the standard errors;
2. houwgt is also intended for population analysis, but it does not underestimate the standard errors.
There is also a school weight, which accounts for the stratification factors of the sampling procedure. My question is: how can I implement the weights in "complex twolevel random" models?
1. Should I use two cluster variables, IDCLASS and IDSCHOOL (because the students are clustered in classes, which are clustered in schools)?
2. Should I then use weights on both L1 and L2? (I'm modeling L2 effects on L1.)
You say that students are clustered in classes that are clustered in schools, and it sounds like you want to do 2-level modeling of students within classes, while taking into account clustering in schools. If I am understanding that correctly, you then say
cluster = school class;
where school goes with the Complex part of Type=Complex Twolevel and class goes with the Twolevel part of Type= Complex Twolevel.
Mplus can use weights for both level 1 (students) and level 2 (classes): WEIGHT and BWEIGHT. There is no weight option that goes with the Complex part of Type=Complex Twolevel. You have to decide from the PIRLS data description which choice of weights is most suitable.
Jinni Su posted on Tuesday, August 21, 2012 - 11:48 am
I am running a multilevel regression analysis with students clustered in 35 schools (all schools in a county). During data collection, a census strategy was used in some schools, whereas random sampling was used in others. Individual weights are available in the data set, but in some schools the weights are all 1, whereas in other schools they vary. In a multilevel analysis, is it sufficient and correct to just use WEIGHT = weight to take the sampling weights into account? Or do I need to do something more, such as use WTSCALE?
Rachel V posted on Tuesday, April 09, 2013 - 2:57 pm
I am messaging because I am doing a path analysis over 5 time points. I have sampling weights (from a publicly available data set) to be used, however, these weights vary over time, and so when I reshaped my data to be wide (to model the same variable at 5 different time points within students), this weight also had to be reshaped. As such, I do not have a single weight to be applied, but a weight that corresponds to the specific time period in question (e.g. wgt1-wgt5). The WEIGHT= samplwgt command seems to only work for a stable weight. Is there an alternative I should be considering?
If you have multiple weights, you must use long format.
Rachel V posted on Tuesday, April 09, 2013 - 6:09 pm
Thank you. This is helpful. I see the description of the long option on page 522. I originally had my data in long format and reshaped it wide in Stata before importing it into Mplus for the sake of path analysis over time. Is there a way to do the path analysis in non-wide format? Or is there a way to reshape the data wide again after specifying the weight? for example, my models currently have the following kinds of statements:
math4 ON skill3;
skill4 ON math3;
These represent cross-lagged predictors (the number at the end of each variable name indicates the time of measurement). I don't know how the analysis would be written in long format.
In the long format, you cannot do the model you describe if the 4 and 3 refer to repeated measures. In the long format you have access to only math and skill.
Rachel V posted on Wednesday, April 10, 2013 - 4:30 pm
Thanks Linda. Yes, I understand long and wide. I guess what I am trying to figure out is how I can apply a time-varying sample weight to a path model. Is that possible? It seems like perhaps not given what you are telling me.
Dan Feaster posted on Wednesday, April 10, 2013 - 5:46 pm
If you only need one-period lags, there is no reason you could not use the long-format approach. You would need to include lagged variables at each time point. With 5 time points, you would have 4 long records per student: times 5 and 4, times 4 and 3, times 3 and 2, and times 2 and 1. I would call this a stacked autoregressive/cross-lagged model. In addition to math4 ON skill3, I would recommend that you also include autoregressive terms, i.e., math4 ON math3. At each time point, you would use the weighting variable associated with the dependent measure (i.e., times 5, 4, 3, and 2). The time 1 weight should not be needed, since you would only condition on time 1.
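The restructuring described above can be sketched in Python. This minimal example assumes hypothetical wide variables mathT, skillT, and wgtT (T = 1..5) with made-up values. Each student contributes 4 stacked records. Each record carries the current measures, the lag-1 predictors, and the weight of the dependent time point, so the time 1 weight is never used. The id is kept so that repeated records from the same student can still be identified.

```python
# Hypothetical variable names: mathT, skillT, wgtT for time points T = 1..5.
def stack_lagged(wide, n_times=5):
    """Turn one wide record into stacked lag-1 records for times 2..n_times."""
    records = []
    for t in range(2, n_times + 1):
        records.append({
            "id": wide["id"],                  # identifies records of the same student
            "time": t,
            "math": wide[f"math{t}"],          # dependent measures at time t
            "skill": wide[f"skill{t}"],
            "math_lag": wide[f"math{t - 1}"],  # predictors from time t - 1
            "skill_lag": wide[f"skill{t - 1}"],
            "wgt": wide[f"wgt{t}"],            # weight of the dependent time point
        })
    return records


weights = {1: 0.9, 2: 1.1, 3: 1.2, 4: 1.0, 5: 0.8}  # made-up time-varying weights
student = {"id": 7}
for t in range(1, 6):
    student[f"math{t}"] = 10 * t
    student[f"skill{t}"] = t
    student[f"wgt{t}"] = weights[t]

stacked = stack_lagged(student)
print(len(stacked))  # 4
print(stacked[0]["time"], stacked[0]["math_lag"], stacked[0]["wgt"])  # 2 10 1.1
```

In the stacked file, regressing the current measure on both lagged measures gives the autoregressive and cross-lagged paths in a single equation.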
Rachel V posted on Thursday, April 11, 2013 - 6:24 pm
Thank you. This is helpful.
Liz C posted on Saturday, March 12, 2016 - 7:35 pm
I'm conducting a subpopulation analysis of a study with complex survey data. The data set has 2 weights: one for the Latino subsample and one for the Asian subsample. Since I am only conducting the analysis with the Latino sample, I only included the Latino weight. But I get the following error message: "Weight variable has missing value at observation 2." Is this happening because I need to include both weights? How do I fix this?
*** WARNING in VARIABLE command "Clusters with the same IDs have been found in different strata. These clusters are assumed to be different because clusters are not allowed to appear in more than one stratum."
I receive model results despite this warning, but I was wondering about the validity of the results given the warning. Isn't it common and expected to have multiple PSUs within each stratum? I applied the weight/strata/cluster variables as instructed in the NESARC-III "Data Notes" manual (pp. 7-8 of link below). Many thanks!
The way clusters are coded in the data set is the reason for the message. Probably your data set looks like this
strata cluster
1 1
1 2
1 3
2 1
2 2
2 3
Mplus would prefer if you code the clusters as
strata cluster
1 1
1 2
1 3
2 11
2 12
2 13
so there is no confusion about what is meant by clusters. This estimation assumes that each stratum consists of clusters/PSUs and that no PSU is in two or more strata. Each PSU should be entirely contained in exactly one stratum. You don't need to change anything, however, since Mplus treats the PSU with code 1 as two separate PSUs: one in stratum 1 and one in stratum 2.
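The minimal Python sketch below illustrates the point made in this answer, using the two layouts above. If PSUs are identified by the (stratum, cluster) pair, both codings yield 6 PSUs. If PSUs are identified by the cluster label alone, the original coding yields only 3.

```python
# Layout as coded in the data set: (stratum, cluster).
as_coded = [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3)]
# Layout Mplus would prefer: cluster labels unique across strata.
preferred = [(1, 1), (1, 2), (1, 3), (2, 11), (2, 12), (2, 13)]


def n_psus_by_pair(rows):
    """Count PSUs when a PSU is identified by its (stratum, cluster) pair."""
    return len(set(rows))


def n_psus_by_label(rows):
    """Count PSUs when a PSU is identified by its cluster label only."""
    return len({cluster for _, cluster in rows})


print(n_psus_by_label(as_coded))   # 3
print(n_psus_by_pair(as_coded))    # 6
print(n_psus_by_label(preferred))  # 6
```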