Defining a new variable

Susan Scott posted on Wednesday, November 22, 2006 - 10:39 am

I have the feeling I may be missing something very obvious, but when I add the following statement to my model

DEFINE:
sf36 = (pcs_ob + mcs_ob)/2;

with sf36 used in the following model statement:

MODEL:
HRQL BY eqsum eq5dw10 hu3m eqtherm sf36;

I get the error message:

** FATAL ERROR

THE SAMPLE COVARIANCE MATRIX COULD NOT BE INVERTED. THIS CAN OCCUR IF A
VARIABLE HAS NO VARIATION, OR IF TWO VARIABLES ARE PERFECTLY CORRELATED, OR
IF THE NUMBER OF OBSERVATIONS IS NOT GREATER THAN THE NUMBER OF VARIABLES.
CHECK YOUR DATA. THIS PROBLEM IS DUE TO:

VARIABLE : SF36

I am able to run the model with each of these variables individually, i.e., with pcs_ob and mcs_ob instead of sf36. Also, I tried defining sf36 as sf36 = pcs_ob, and I still get this message. So I am guessing the problem is with my DEFINE statement.

Thank you,
Susan

Linda K. Muthen posted on Wednesday, November 22, 2006 - 10:52 am

Please send your input, data, output, and license number to support@statmodel.com. It is not obvious what the problem is without further information.

David Bard posted on Wednesday, November 22, 2006 - 9:44 pm

It's not obvious to me why, but adding a define command to my syntax resulted in loss of observations for a model that did not even contain the variables used in the define command. The define syntax was used for a previous model and was simply left in this new model after a cut and paste. It's not a big deal, I will simply exclude define commands not needed for a given model, but can you explain why this happens? tx, db.

If it helps, here's the define syntax:
DEFINE:
citpreg = citynum*pregnant;
twkhighc = twkhigh;
if twkhigh > 5 then twkhighc = 5;
twkamhdc = twkamhd2;
if twkamhd2 > 5 then twkamhdc = 5;
formrbc = formrb;

Linda K. Muthen posted on Thursday, November 23, 2006 - 6:35 am

I would need more information to answer this question. Please send your input, data, output, and license number to support@statmodel.com.

Roxana Dragan posted on Monday, June 08, 2009 - 9:13 am

I want to add a random variable to an existing variable. I do not plan to do this repeatedly like in a Monte Carlo study. I just need to use a command like DEFINE y=x+e, where:
y - new variable
x - old variable
e - random variable from the normal distribution with zero mean and a standard error I can choose. Can/(how do) I do this in Mplus? Thank you.

Bengt O. Muthen posted on Monday, June 08, 2009 - 11:33 am

If e is latent, you don't need Define but can do it in the Model command as:

e BY;

If e is observed, Mplus does not currently offer this capability.

PETER TOYINBO posted on Saturday, July 17, 2010 - 11:24 am

I wish to create MD_A from G12A and G15A, where G12A tests for the presence of a condition (yes=1, no=5), while G15A scores severity at 3 levels if 'yes' was checked on G12A but is assigned 'missing' if 'no' was checked.

I wish to combine G12A and G15A in a new 4-level scale MD_A where 0 indicates absence of condition. Below is the partial syntax and error I am getting.

USEVARIABLES ARE

G18A G18B G18C G18D G18E G18F G18G G18H G18I G18J
MD_A MD_B MD_C MD_D MD_E MD_F MD_G MD_H MD_I ;

CATEGORICAL ARE

G18A G18B G18C G18D G18E G18F G18G G18H G18I G18J
MD_A MD_B MD_C MD_D MD_E MD_F MD_G MD_H MD_I ;

MISSING=. ;

DEFINE:

IF (G12A == 1) THEN MD_A == 0 ;
IF (G12A == 5 AND G15A == 1) THEN MD_A == 1 ;
IF (G12A == 5 AND G15A == 2) THEN MD_A == 2 ;
IF (G12A == 5 AND G15A >= 3) THEN MD_A == 3 ;

IF (G12B == 1) THEN MD_B == 0 ;
IF (G12B == 5 AND G15B == 1) THEN MD_B == 1 ;
IF (G12B == 5 AND G15B == 2) THEN MD_B == 2 ;
IF (G12B == 5 AND G15B >= 3) THEN MD_B == 3 ;

....

ANALYSIS: TYPE = EFA 1 4;

PLOT: TYPE IS PLOT3 ;

*** ERROR
Variable names must begin with an alphabet character:
EQ

Linda K. Muthen posted on Saturday, July 17, 2010 - 11:31 am

See the DEFINE option of the user's guide. You can't use == on both the right and left-hand sides of THEN. On the right-hand side use =.

PETER TOYINBO posted on Saturday, July 17, 2010 - 12:19 pm

I spotted an error in my syntax above and corrected it to now read:

IF (G12A == 5) THEN MD_A == 0 ;
IF (G12A == 1 AND G15A == 1) THEN MD_A == 1 ;
IF (G12A == 1 AND G15A == 2) THEN MD_A == 2 ;
IF (G12A == 1 AND G15A >= 3) THEN MD_A == 3 ;

But I am still getting the same error (below) about variable names which I could not figure out :

*** ERROR
Variable names must begin with an alphabet character:
EQ

Thanks for your help.

Linda K. Muthen posted on Saturday, July 17, 2010 - 12:34 pm

On the right-hand side use = not ==.
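
As a minimal sketch of the correction described in this reply, the first block from the post would read as follows. The == operator stays inside the IF condition, where it compares, and a single = after THEN does the assignment. The variable names and codes are taken from the post; nothing else is assumed.

```mplus
DEFINE:
  ! == compares inside the IF condition; = assigns after THEN
  IF (G12A == 5) THEN MD_A = 0;
  IF (G12A == 1 AND G15A == 1) THEN MD_A = 1;
  IF (G12A == 1 AND G15A == 2) THEN MD_A = 2;
  IF (G12A == 1 AND G15A >= 3) THEN MD_A = 3;
```

The same pattern would apply to the remaining MD_ variables.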

Spencer James posted on Tuesday, November 16, 2010 - 11:03 am

Why does Mplus use a degree of freedom to compute a new variable? How do I ensure that the added degree(s) of freedom does not influence my measures of model fit? Thanks for your help.

Linda K. Muthen posted on Tuesday, November 16, 2010 - 2:11 pm

I don't understand your question. Please send outputs that illustrate what you are saying along with your license number to support@statmodel.com.

Kerry Lee posted on Monday, December 06, 2010 - 3:15 am

Dear Dr. Muthen,

Is it necessary to list both the original variables and new variables created using DEFINE in USEVARIABLES? I thought this was the case after reading the User's Guide. However, when I did this, both original and new were included in a CFA model even though only the new variables were specified in MODEL.

On a related matter, I ran the same analysis using either the original or the new variables (original/x to bring the scale back to 1 - 10). The raw bivariate correlations naturally remain the same, but the standardized CFA factor loadings and correlations are different.
Are such differences to be expected?

The difference in time needed to run the two analyses was astonishing: 37 min versus 56 sec (scaled).

Sincerely,
Kerry.

Linda K. Muthen posted on Monday, December 06, 2010 - 6:34 am

Every variable specified on the USEVARIABLES list is included in the model to be estimated. New variables created using DEFINE must be placed on the USEVARIABLES list if they are used in the MODEL command. If any original variables are used in the MODEL command, the new variables created in DEFINE must follow them on the USEVARIABLES list.

Large variances make model convergence more difficult so this could increase the time. I would have to see the two outputs and your license number at support@statmodel.com to comment on the standardized coefficients.
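
The ordering rule in the first paragraph of this reply can be sketched as below. All names (y1-y4, x1, x2, xsum, f) and the file name are hypothetical placeholders. The point is only that the variable created in DEFINE comes last on the USEVARIABLES list, and that x1 and x2, which are needed only to build xsum, are left off the list so that they do not enter the model.

```mplus
DATA:
  FILE = mydata.dat;          ! hypothetical file name
VARIABLE:
  NAMES = y1-y4 x1 x2;        ! hypothetical variable names
  ! original variables first, the DEFINE variable last;
  ! x1 and x2 are omitted because only xsum is in the model
  USEVARIABLES = y1-y4 xsum;
DEFINE:
  xsum = x1 + x2;
MODEL:
  f BY y1-y4;
  f ON xsum;
```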

Luo Wenshu posted on Thursday, February 19, 2015 - 2:53 am

Dear Dr. Muthen,

I am using Mplus 7.3 for a two-level analysis. I created cluster means for some observed variables and want to use these cluster means at level 2. I listed the cluster means after the original variables on the USEVARIABLES list. Running the analysis then led to the error that the number of records is 0.
Currently the DEFINE command directly follows the USEVARIABLES list. How should I position USEVARIABLES and the DEFINE command?

Linda K. Muthen posted on Thursday, February 19, 2015 - 5:40 am

DEFINE should precede or follow another command. It should not be placed among the options of another command.
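
A sketch of one layout that follows this rule, assuming the cluster means are built with the CLUSTER_MEAN function of DEFINE. The names schid, y, x, and xmean and the file name are hypothetical. Every option of the VARIABLE command is finished before DEFINE begins, so DEFINE sits between two commands rather than in the middle of one.

```mplus
DATA:
  FILE = mydata.dat;          ! hypothetical file name
VARIABLE:
  NAMES = schid y x;          ! hypothetical variable names
  USEVARIABLES = y x xmean;   ! the DEFINE variable goes last
  CLUSTER = schid;
  WITHIN = x;
  BETWEEN = xmean;
DEFINE:
  ! placed after all VARIABLE options, before ANALYSIS
  xmean = CLUSTER_MEAN (x);
ANALYSIS:
  TYPE = TWOLEVEL;
MODEL:
  %WITHIN%
  y ON x;
  %BETWEEN%
  y ON xmean;
```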

Jane Doe posted on Monday, March 02, 2015 - 7:49 am

I know how to include an interaction of two latent variables in my analysis. But how about a linear combination of two latent variables?

Say, I want to define a latent variable (call it f3) which is a linear combination of two other latent variables (e.g. f3=f1+f2) and then use this f3 as (for example):

x ON z f3;

where x and z are observed variables.

How can I do this?

Bengt O. Muthen posted on Monday, March 02, 2015 - 11:01 am

Try

f3 BY;

f3 ON f1@1 f2@1; f3@0; ! this is f3=f1+f2
f3 with f1-f2@0;

x ON z f3;

Jane Doe posted on Monday, March 02, 2015 - 1:19 pm

Thank you. It worked.
In the meanwhile I also tried:

x ON z
f1 (a1)
f2 (a2);

MODEL CONSTRAINT: a1 = a2;

This gave me identical results.

Are these doing the same thing basically?

Thanks a lot.

Bengt O. Muthen posted on Monday, March 02, 2015 - 4:25 pm

Yes.

Jane Doe posted on Thursday, March 12, 2015 - 11:11 am

Is it possible to use her absolute value of a latent variable. Say I want to estimate the following model:

f1 BY x1 x2 x3;
z ON f1 x4;

But in the regression of z on f1 and x4 I want to use the obsolete value of f1. Is this possible?

Thanks.

Jane Doe posted on Thursday, March 12, 2015 - 11:13 am

The question above is full of typos! Apologies!

Is it possible to use the absolute value of a latent variable. Say I want to estimate the following model:

f1 BY x1 x2 x3;
z ON f1 x4;

But in the regression of z on f1 and x4 I want to use the absolute value of f1. Is this possible?

Thanks.

Linda K. Muthen posted on Thursday, March 12, 2015 - 11:14 am

No, this is not possible.

Jane Doe posted on Thursday, March 12, 2015 - 11:35 am

Ok.

Let me elaborate the question a bit. How about I have two latent variables f1 and f2. I generate a third latent variable which is the difference between these two latent variables: f3=f1-f2. (With your help I can now do this.) And I use f3 in my model further on as: x ON z f3;

But what I am interested in is the absolute difference between f1 and f2. Hence the previous question: can I use the absolute value of f3?

If the answer is still no. Then is it reasonable to save the factor scores for f3, take their absolute value and use that subsequently?

OR can I at least generate a variable that takes the value 1 if f3 is positive, 0 if f3 is 0, and -1 if f3 is negative?

Apologies for the long question.
Thanks in advance.

Thanks.

Bengt O. Muthen posted on Thursday, March 12, 2015 - 3:35 pm

Using plausible values (sets of factor scores for each subject) and then getting the absolute difference would seem the way to go. No automatic option of the absolute kind.

Jane Doe posted on Friday, March 13, 2015 - 5:12 am

Thanks a lot. This is helpful.

Lisa M. Yarnell posted on Wednesday, March 18, 2015 - 12:56 pm

Hi Linda and Bengt,

Is it possible to use a "by" statement on the DEFINE line--or something that will achieve the same result as a "by" statement?

For instance, I want to calculate the Black-White achievement gap separately for each of the numerous schools in my sample by calculating Black students' average achievement by school and White students' average achievement by school, and then subtracting one from the other (also by school).

I will then use this variable in my model.

Is there a "by" statement available for the DEFINE line, in order to do this? I did not see one mentioned in the User's Guide.

Thank you,
Lisa

Linda K. Muthen posted on Wednesday, March 18, 2015 - 2:29 pm

No, the DEFINE command is for observed variables only.

Lisa M. Yarnell posted on Wednesday, March 18, 2015 - 2:52 pm

Hi Linda,
I do have School ID as an observed variable. Can you clarify?
Thank you sincerely.

Linda K. Muthen posted on Wednesday, March 18, 2015 - 5:44 pm

There is no BY statement in DEFINE. A BY statement defines a latent variable. If you want the difference between two observed variables, say

DEFINE:
diff = y - x;

Abbas Firoozabadi posted on Tuesday, March 24, 2015 - 5:55 am

Dear Linda,
Below I describe my data and what I am working on:
My main hypothesis concerns the effect of recovery during the weekend (positive activation change over the weekend) on health over time.
I had three measurements over 1 year. For each I have one score of Health and three scores of positive affect: before the weekend (PAb), during the weekend (PAd), and at the end of the weekend (PAe). In a within-person design I have to define the slope of positive affect change over the weekend as a proxy for recovery, which in turn will be used as the predictor of Health. I have to analyze my data at two levels, between and within person:
Variables are: Health1 Health2 Health3 gender age and PAb1 PAd1 PAe1 (will define slope of recovery1)
PAb2 PAd2 PAe2 (will define slope of recovery2) PAb3 PAd3 PAe3 (will define slope of recovery3)

For %between%: intercept and slope of health: I S |Health1@0 Health2@1 Health3@2
Then: I S ON gender age
For %within%:
Health1 ON slope of recovery1
Health2 ON slope of recovery2
Health3 ON slope of recovery3
So I need to DEFINE the slope of recovery (as a new variable) by taking 3 points of positive affect over each weekend.
How can I have all of these analyses in one syntax of Mplus?

Bengt O. Muthen posted on Tuesday, March 24, 2015 - 11:31 am

One approach is to do it in a wide, single-level format. Then you have 3*3 recovery outcomes and 3 health outcomes, so 12 columns in the data, not counting any covariates. You can formulate 3 growth models for the recoveries and let their growth factors predict the 3 health outcomes. I am not sure you need/want a growth model for the 3 health outcomes.

IW posted on Sunday, August 23, 2015 - 4:08 pm

Is there a way to check defined variables by exporting the raw data set without having to run an analysis?

Linda K. Muthen posted on Tuesday, August 25, 2015 - 7:29 am

You can use TYPE=BASIC; in the ANALYSIS command with no MODEL command. Then use SAVEDATA and the FILE option to save the variables.
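
A minimal sketch of this suggestion, with hypothetical variable names (y, x, diff) and hypothetical file names. There is no MODEL command; the run only reads the data, applies DEFINE, and writes the variables on the USEVARIABLES list to the file named in SAVEDATA so that the new variable can be inspected.

```mplus
DATA:
  FILE = mydata.dat;          ! hypothetical input file
VARIABLE:
  NAMES = y x;                ! hypothetical variable names
  USEVARIABLES = y x diff;    ! the DEFINE variable goes last
DEFINE:
  diff = y - x;
ANALYSIS:
  TYPE = BASIC;               ! descriptive run, no MODEL command
SAVEDATA:
  FILE = check.dat;           ! hypothetical output file
```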

Krisztian Posch posted on Friday, February 19, 2016 - 12:12 pm

Dear Linda,

I've encountered some difficulties when I wanted to derive new variables from latent variables. I understand that I cannot use the define command for latent variables, but to show what I am aiming for I will put it as if I could.

m by m1 m2 m3;
l by l1 l2 l3;

DEFINE:
m2 = m**2;
l2 = l**2;

In other words could you please let me know how I could derive the different transformations (e.g. squared, cubic) of latent variables in Mplus?

Thank you in advance for your kind help.

Krisztian Posch posted on Friday, February 19, 2016 - 4:58 pm

Just to add to the previous question: I understand how to specify quadratic functions, which would look like this:

m2 | m XWITH m;
l2 | l XWITH l;

I am still wondering whether there is any less computation intensive way to estimate these for latent variables?

Thanks so much again.

Linda K. Muthen posted on Friday, February 19, 2016 - 5:09 pm

Interactions among latent variables are specified using the XWITH option, for example,

int | f1 XWITH f2;
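
For the quadratic case asked about above, a sketch along the same lines might look like this. The outcome y is hypothetical, and the squared term is named msq here because m2 is already the name of an indicator in the earlier post. XWITH needs TYPE = RANDOM together with ALGORITHM = INTEGRATION in the ANALYSIS command, which is also why such models are computationally heavy.

```mplus
ANALYSIS:
  TYPE = RANDOM;
  ALGORITHM = INTEGRATION;
MODEL:
  m BY m1 m2 m3;
  ! latent quadratic term: m multiplied by itself
  msq | m XWITH m;
  ! y is a hypothetical observed outcome
  y ON m msq;
```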

Kieran Mepham posted on Tuesday, September 27, 2016 - 2:15 am

Dear Dr. Muthén,
In a model I use for producing correlation tables I would like to define a new variable (bias) which is the difference between an affective latent variable (feels) and an observed variable (v9_6). To do this, I attempted a slightly modified version of Prof. Bengt's approach used above (post on March 02, 2015 - 11:01 am) for summing latent variables:

bias BY;
bias ON feels@-1 v9_6@1;
bias@0;
bias WITH feels@0 v9_6@0;

As you can see, the main change is that I altered the value of the regression coefficient of one of the variables to -1 so that the score would be the difference.

However, I received the message that psi is not positive definite. TECH4 output doesn't reveal anything I recognise as odd. Have I perhaps modified something from the original suggestion incorrectly?

Thanks very much in advance,

Kieran

Linda K. Muthen posted on Tuesday, September 27, 2016 - 6:20 am

Please send the output and your license number to support@statmodel.com.

'Alim Beveridge posted on Sunday, November 20, 2016 - 5:23 am

Dear Linda and Bengt,

I have an ordinal variable for size, but the order is wrong:

1: 1-9
2: 10-49
3: 1000+
4: 250-999
5: 50-249

Also, the values are strings.
Can I transform into a new variable using DEFINE like so:

IF (SizeforCurrentAssessment == "1-9") THEN size = 0;

I get an error when I try.
thanks,
'Alim

'Alim Beveridge posted on Sunday, November 20, 2016 - 5:46 am

Please ignore or delete my question. I figured out that Mplus can't deal with string data.

Dan Y. posted on Thursday, March 23, 2017 - 1:13 pm

Dear Linda and Bengt,

Is there a way to add scores across rows of a data file? I want to define a new variable for persons' total scores. Thank you.

Bengt O. Muthen posted on Thursday, March 30, 2017 - 9:26 am

Yes, use Define:

Define:

sum = y1 + y2;

Sara Namazi posted on Thursday, February 22, 2018 - 7:48 pm

Hi Dr. Muthen,

I want to dichotomize one of my variables in Mplus. Below are the detailed steps that I used and I wanted to be sure it is correct:

The variable I want to dichotomize is called ChildC:

1. I have no children under 18 at home
2. Another adult has primary responsibility
3. I share responsibility with another adult
4. I have primary responsibility

I am creating a dichotomous variable called, CC1, where I want to lump the following:

I have no children under 18 at home = 0

Another adult has primary responsibility; I share responsibility with another adult; I have primary responsibility (all lumped together) = 1

Below is the syntax I used in Mplus:

CC1=ChildC;
if (ChildC eq 1) then CC1 = 0;
if (ChildC eq 2) then CC1 = 1;
if (ChildC eq 3) then CC1 = 1;
if (ChildC eq 4) then CC1 = 1;

Is this a correct way to create a dichotomous variable in mplus?

Thanks,
Sara

Bengt O. Muthen posted on Friday, February 23, 2018 - 4:41 pm

Yes, it looks okay. You could instead use

CC1=0; ! assume no children under 18 at home before checking conditions
! to change this variable.
if (ChildC eq 2) then CC1 = 1;
if (ChildC eq 3) then CC1 = 1;
if (ChildC eq 4) then CC1 = 1;

Thomas Rodebaugh posted on Thursday, April 12, 2018 - 8:13 am

I'm running into a problem that's baffling me while trying to create a new variable. Here's what I have under define:

first = 0;
if (time eq 1) then first = 1;

Mplus returns an error that nearly every case has data missing for the variable "first." I'm having trouble seeing how that can be, since the define command specifies that every case should have 0 unless the other variable (which has no missing data and varies between 1 and 12) is 1.

I'm running an ML-DSEM model, in case that matters. Any help much appreciated.

Bengt O. Muthen posted on Thursday, April 12, 2018 - 12:21 pm

We need to see your data and full output - send to Support along with your license number.

Pevitr S. Bansal posted on Monday, August 03, 2020 - 9:22 am

Hello,
I'm overlooking something obvious but need assistance.
I am creating scores from items on a 0-to-3 Likert scale. Scores of 2 or 3 = symptom present.
Disorder A: 4 or more endorsed symptoms. Disorder B: 3 or more endorsed symptoms.

grouping is CPstat (0=no, 1=yes);
DEFINE:
ODDComp = 0; ! 8 items; for brevity, showing only a few:
IF (DBD03 GE 2) THEN ODDComp = 1;
IF (DBD13 GE 2) THEN ODDComp = 1;
IF (DBD15 GE 2) THEN ODDComp = 1;
...
IF (DBD39 GE 2) THEN ODDComp = 1;
ODDSum = SUM(ODDComp);

CDComp = 0; ! 15 items; for brevity, showing only a few:
IF (DBD02 GE 2) THEN CDComp = 1;
IF (DBD04 GE 2) THEN CDComp = 1;
IF (DBD06 GE 2) THEN CDComp = 1;
IF (DBD08 GE 2) THEN CDComp = 1;
...
IF (DBD45 GE 2) THEN CDComp = 1;
CDSum = SUM(CDComp);

CPstat = 0;
IF (ODDSum GE 4 OR CDSum GE 3 AND IRSover GE 3) THEN CPstat = 1;

Error: "YES group has 0 observations."
Thank you.

Bengt O. Muthen posted on Monday, August 03, 2020 - 5:30 pm

We need to see your full output and data to diagnose this - send to Support along with your license number.
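
One thing that may be worth checking in the meantime, offered as a guess rather than a diagnosis: in the posted DEFINE, ODDComp is a single 0/1 variable that each IF statement overwrites, and SUM in DEFINE adds the listed variables within each observation. If that reading is right, SUM(ODDComp) can never exceed 1, so ODDSum GE 4 and CDSum GE 3 would never be true and no case would get CPstat = 1. A sketch of a per-item symptom count, using hypothetical indicator names o03, o13, and o15 and only the items shown in the post:

```mplus
DEFINE:
  ! one 0/1 indicator per item instead of one shared variable
  o03 = 0;
  IF (DBD03 GE 2) THEN o03 = 1;
  o13 = 0;
  IF (DBD13 GE 2) THEN o13 = 1;
  o15 = 0;
  IF (DBD15 GE 2) THEN o15 = 1;
  ! the remaining items would follow the same pattern
  ! symptom count: sum of the indicators for each observation
  ODDSum = SUM (o03 o13 o15);
```

The same approach would apply to the CD items before CPstat is computed.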