What is the correct way to sort data for a cross-classified DSEM with continuous time?
The clustering variables are (1) subject ID (subject) and (2) time bins, which we'll call TID. Each day of the week (Monday through Sunday) is broken into 6 four-hour chunks (e.g., Monday 8am-noon, Monday noon-4pm, etc.), giving 42 TID values. I use these bins because effects may vary by day of week and time of day. I do not have any systematic trends across weeks.
The time variable for TINTERVAL is the exact number of hours between each person's study start and the moment an observation was taken. We'll call this "time".
I know Mplus wants the data sorted by ID and by the TINTERVAL variable, "time". However, if I sort by:
(1) subject, time, TID: TID will be out of order and Mplus will complain;
(2) subject, TID, time: time will be out of order and Mplus will complain.
A temporary solution is to base TID on elapsed study time, but this is not optimal because participants start on different days of the week and at different times, so 6 hours after study start might be Wednesday night for one person and Monday afternoon for another. I want the random effects by time to account for potential day-of-week and time-of-day effects, not study-time effects.
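One way to get a TID that reflects the calendar rather than study time is to compute it from each observation's actual timestamp. The sketch below is a data-preparation illustration only, not an Mplus procedure. It assumes you have each person's start datetime and each observation's datetime, and it assumes (hypothetically) that the six chunks start at midnight (00-04, 04-08, 08-12, ...), so Monday 8am-noon is TID 3. The helper names are made up for illustration. After sorting by subject and elapsed time, TID is not monotone within a person because it wraps every week, which is expected.

```python
from datetime import datetime, timedelta

CHUNK_HOURS = 4  # 24 hours split into 6 chunks per day


def hours_since_start(start: datetime, stamp: datetime) -> float:
    """Elapsed study time in hours: the 'time' variable given to TINTERVAL."""
    return (stamp - start).total_seconds() / 3600.0


def calendar_tid(stamp: datetime) -> int:
    """Calendar-based TID in 1..42 (weekday x 4-hour chunk).

    Assumption: chunks start at midnight (00-04, 04-08, 08-12, ...), so
    Monday 8am-noon is TID 3 and Sunday 8pm-midnight is TID 42.
    """
    chunk = stamp.hour // CHUNK_HOURS        # 0..5 within the calendar day
    return stamp.weekday() * 6 + chunk + 1   # weekday(): Monday=0 ... Sunday=6


def build_rows(subject, start, stamps):
    """One (subject, time, TID) row per observation timestamp."""
    return [(subject, hours_since_start(start, s), calendar_tid(s)) for s in stamps]


# Hypothetical participants: A starts Monday 9am, B starts Wednesday 9pm.
start_a = datetime(2024, 1, 1, 9, 0)   # 2024-01-01 was a Monday
start_b = datetime(2024, 1, 3, 21, 0)  # Wednesday
offsets = (0, 6, 170)                  # hours after each person's start

rows = build_rows("A", start_a, [start_a + timedelta(hours=h) for h in offsets])
rows += build_rows("B", start_b, [start_b + timedelta(hours=h) for h in offsets])

# Sort by subject, then elapsed time; TID is just a label and wraps weekly.
rows.sort(key=lambda r: (r[0], r[1]))
for row in rows:
    print(row)
```

Here the same study hour (0, 6, 170) maps to different TIDs for A and B, which is the day-of-week and time-of-day coding the question asks for.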
I don't think Mplus requires you to sort the data in any way. You do have to provide the ID and time variable (real time). I see your concern about the different starting points, but you don't need to worry about it: Mplus will treat a Wednesday observation as a Wednesday observation even if it is the person's first observation (as long as you are using TYPE = CROSSCLASSIFIED). The best way to check that Mplus interprets the data the way you intend is to use SAVEDATA: FILE = 1.dat; that file contains the data Mplus actually analyzed, which for a person who starts on a Wednesday would include missing values for the Monday and Tuesday observations.
I'm interested in implementing a two-level DSEM on data from an EMA study. I'm not directly interested in the effect of time, which is why I have selected the two-level model rather than the cross-classified model. My challenge is that the predictor and outcome are measured at different intervals. The predictor is categorical (but has 16 levels, so it could conceivably be treated as continuous) and was measured once per day during an "evening assessment". The outcome is a 5-level categorical variable that was measured 5 times throughout each day (prompt times were random but fell within specific time windows, so they are approximately evenly spaced). Participants completed 10 to 14 days of assessments, so there are about 12 responses on the predictor and around 60 responses on the outcome per person. Would it be feasible to use DSEM in this situation? If so, would I just use the TINTERVAL approach and let the approximately 48 (60 - 12) missing values on the predictor be treated as missing?
Lastly, there are only about 50 participants in the study. If I use all of the outcome observations, there is a substantial number of observations in total, but I still wonder whether you think this is a sufficient sample for DSEM.
The sample size should be sufficient. One simple approach is to assume that the covariate is constant over the entire day, so you can use the same value of the covariate for all five observations of the dependent variable. I don't see the need for the TINTERVAL option if the observations are approximately evenly spaced; all you would need to do is make sure the observations are sequentially ordered in the data set. Alternatively, you can insert missing values for the covariate at the other observations, but in that case I would recommend also modeling the covariate as a lagged variable, using X ON X&1 in the model for the covariate. You might also consider using RDSEM instead of DSEM (^ instead of &) in your situation, as RDSEM is somewhat less dependent on the time stamp (see section 6 of http://www.statmodel.com/download/RDSEM.pdf), although I am not quite sure whether this point of view applies to your situation.
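The two data layouts described above can be sketched as a small data-preparation step. This is a hypothetical illustration: the field names (id, day, prompt, DB0, SM) and the assumption that the evening assessment coincides with the fifth (last) prompt of the day are mine, not from the thread. Approach 1 copies the evening value to all five prompts of that day; approach 2 keeps it only at the evening prompt and leaves the other prompts missing (the layout for which modeling X ON X&1 was recommended). Both sort rows sequentially within person, and missing values are written as "." to match MISSING = .;

```python
def _ordered(ema_rows):
    """Sequential order within person: day, then prompt number within day."""
    return sorted(ema_rows, key=lambda r: (r["id"], r["day"], r["prompt"]))


def carry_daily_covariate(ema_rows, daily_sm):
    """Approach 1: treat the evening SM value as constant over the whole day.

    ema_rows: list of dicts with keys 'id', 'day', 'prompt', 'DB0'
    daily_sm: dict mapping (id, day) -> SM from that day's evening assessment
    """
    out = []
    for row in _ordered(ema_rows):
        new = dict(row)
        new["SM"] = daily_sm.get((row["id"], row["day"]))  # None if no evening report
        out.append(new)
    return out


def evening_only_covariate(ema_rows, daily_sm, evening_prompt=5):
    """Approach 2: SM observed only at the evening prompt, missing elsewhere."""
    out = []
    for row in _ordered(ema_rows):
        new = dict(row)
        if row["prompt"] == evening_prompt:
            new["SM"] = daily_sm.get((row["id"], row["day"]))
        else:
            new["SM"] = None
        out.append(new)
    return out


def to_mplus_line(row, columns=("id", "DB0", "SM")):
    """Free-format data line with '.' for missing, matching MISSING = .;"""
    return " ".join("." if row[c] is None else str(row[c]) for c in columns)


# Hypothetical data: one person, two days, five prompts per day.
ema = [{"id": 1, "day": d, "prompt": p, "DB0": (d + p) % 5}
       for d in (2, 1) for p in (5, 4, 3, 2, 1)]   # deliberately unsorted
sm = {(1, 1): 3, (1, 2): 7}

for row in evening_only_covariate(ema, sm):
    print(to_mplus_line(row))
```

Approach 1 yields no missing values on SM (except on days without an evening report), at the cost of assuming SM is constant within the day; approach 2 avoids that assumption but leaves about four of every five SM values missing.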
Thanks, Tihomir, this is very helpful! Based on your feedback, I think I'll go with RDSEM. Apologies ahead of time for this lengthy follow-up. Below is an excerpt of my syntax, which I'm fairly sure is misspecified because I get a non-convergence warning. DB0 is the categorical dependent variable, and SM is the continuous predictor.
VARIABLE:
  NAMES = ID_num DB0 DB1 DB2r DBscale DR0 SM DA;
  USEVARIABLES = DB0 SM;
  CATEGORICAL = DB0;
  MISSING = .;
  CLUSTER = ID_num;
  LAGGED = DB0(1);
ANALYSIS:
  TYPE = TWOLEVEL RANDOM;
  ESTIMATOR = BAYES;
  PROCESSORS = 2;
  BITERATIONS = (2000);
MODEL:
  %WITHIN%
  sDB0 | DB0^ ON DB0^1;
  sSM | DB0 ON SM;
  %BETWEEN%
  DB0 ON SM;
  sDB0 ON SM;
  sSM ON SM;
  DB0 sDB0 sSM WITH DB0 sDB0 sSM;
In the output, I also get these warnings:

*** WARNING in MODEL command
In the MODEL command, the x variable on the WITHIN level has been turned into a y variable to enable latent variable decomposition. This variable will be treated as a y-variable on both levels: SM

*** WARNING
Data set contains cases with missing on x-variables converted to dependent variables. The autocorrelation of these variables should be included in the model and the variables added to the LAGGED option.
I was attempting to do latent centering because I have missingness on the predictor, but I'm unsure:
1) where the model is misspecified, leading to the non-convergence;
2) whether I'm specifying the latent centering correctly, given the warnings (I thought it would be something like SM^ ON SM^1, based on the example in the Version 8.1 Language Addendum (p. 3), but I might have misunderstood);
3) how to substantively interpret the output given the flipping of x and y variables noted in the first warning (in particular the DB0 ON SM output and the thresholds).
Relatedly, I'm assuming the output is based on a probit link, given the categorical dependent variable. Is that correct?
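For question 3, a probit model for an ordered 5-category outcome turns thresholds and a linear predictor into category probabilities. As far as I know, ESTIMATOR = BAYES uses a probit link for categorical dependent variables, but treat that and the following as assumptions to check against the Mplus documentation: the sketch uses the convention P(DB0 <= k) = Phi((tau_k - eta) / resid_sd), where eta is the person's linear predictor (between-level intercept plus the within-level slope times centred SM) and resid_sd is the residual standard deviation of the latent response. resid_sd defaults to 1 here, but in an RDSEM with an autocorrelated residual the scale may be identified differently. All numbers below are hypothetical, not output from the model above.

```python
from math import erf, sqrt


def std_normal_cdf(z: float) -> float:
    """Standard normal CDF Phi(z) via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def ordered_probit_probs(thresholds, eta, resid_sd=1.0):
    """Category probabilities for an ordered probit model.

    Assumed convention: P(Y <= k) = Phi((tau_k - eta) / resid_sd),
    with strictly increasing thresholds tau_1 < ... < tau_{K-1}.
    Returns K probabilities that sum to 1.
    """
    if any(b <= a for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    cumulative = [std_normal_cdf((t - eta) / resid_sd) for t in thresholds] + [1.0]
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)  # P(Y = k) = P(Y <= k) - P(Y <= k - 1)
        previous = c
    return probs


# Hypothetical values, not output from the model above
taus = [-1.0, 0.0, 1.0, 2.0]   # 4 thresholds -> 5 categories of DB0
slope = 0.3                    # hypothetical within-person effect of SM
for sm in (-1.0, 0.0, 1.0):    # person-centred SM values
    print(sm, [round(p, 3) for p in ordered_probit_probs(taus, slope * sm)])
```

Under this convention a positive slope shifts probability toward the higher categories of DB0 as SM increases, while the thresholds fix where the category boundaries sit on the latent scale.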
Finally, once I get this up and running, what is the thinking on assessing comparative model fit? Do you have suggestions for (perhaps simpler) models to compare this one to, using the DIC?
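For reference, the standard definition of the DIC is below, where D is the deviance, the bar over D denotes its posterior mean, the bar over theta denotes the posterior mean of the parameters, and p_D is the effective number of parameters. Lower DIC is preferred. It is probably only meaningful to compare DICs from models fitted to the same data with the same set of dependent variables; for example, a model in which SM has been turned into a y-variable includes SM in its likelihood, unlike a model where SM remains an x-variable. How Mplus computes the deviance for DSEM models (for instance, which latent variables count as parameters) affects p_D, so check the Mplus documentation before relying on the comparison.

```latex
D(\theta) = -2 \log p(y \mid \theta), \qquad
\bar{D} = \mathbb{E}_{\theta \mid y}\left[ D(\theta) \right], \qquad
p_D = \bar{D} - D(\bar{\theta}), \qquad
\mathrm{DIC} = \bar{D} + p_D = D(\bar{\theta}) + 2 p_D
```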