I just bought a new computer (AMD Athlon 64 X2 4000+), and I am not sure how to take advantage of the multiprocessing speed gains. Do I need a special setup for Mplus, or does Mplus 4.21 do this automatically? I have noticed that when I run my models (for instance, some MIMIC models with zero-inflated Poisson endogenous indicators in a sample of 4000 subjects), only one of the processors is used, never both. Is this normal? Thanks, Caridad
You should specify the number of logical processors that you want to use. So on a quad-core computer with hyperthreading, you can specify PROCESSORS=8.
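A minimal Python sketch of the advice above, assuming you want to look up the logical processor count programmatically rather than by hand. `os.cpu_count()` reports logical processors (so 8 on a quad-core machine with hyperthreading). The helper names `suggested_processors` and `processors_line` and the `reserve` parameter are hypothetical, introduced only for illustration; the generated line is meant to be pasted into the ANALYSIS command of an Mplus input file.

```python
import os


def suggested_processors(reserve: int = 0) -> int:
    """Return the logical processor count minus an optional reserve (never below 1).

    `reserve` is a hypothetical knob for leaving some logical processors
    free for the operating system; it is not an Mplus setting.
    """
    logical = os.cpu_count() or 1  # os.cpu_count() can return None
    return max(1, logical - reserve)


def processors_line(n: int) -> str:
    """Format the PROCESSORS option as it would appear under ANALYSIS."""
    if n < 1:
        raise ValueError("number of processors must be at least 1")
    return f"PROCESSORS = {n};"


print(processors_line(suggested_processors()))
```

Whether using every logical processor is actually fastest depends on the model and estimator, so it is worth timing a run or two before settling on a value.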
Peter posted on Sunday, January 29, 2017 - 8:28 am
Dear Drs. Muthén,
1) Could you post an updated list (as of Jan 30, 2017) of the specific M-Plus commands that run faster on multiple cores, i.e., when specifying
PROCESSORS = 12
2) Have all of these listed M-Plus commands in 1) above been parallelized in your code so that they can take advantage of multiple cores?
Would you please comment on the following:
3) In benchmarking timing tests within Stata and Geekbench, we found faster parallel processing times when only hardware cores were specified to be used; that is, while virtual cores were left free.
Our best impression for this difference, while watching Mac OSX and Windows 8 allocate the work load among multiple processors (on the same dual boot machine), is that it is best to leave all virtual cores free for operating system calls and other "housekeeping" chores that are required while floating point, matrix algebra math computation is ongoing on the hardware cores.
Have tests WITHIN M-Plus on your multi-core machine confirmed or refuted point 3) above?
If I understand Thuy Nguyen's post of April, 2016 properly, the processor command can "pick up" threads and cores as well as physical processors. So, even on a single processor machine with hyperthreading, I can specify processors = 2. If I had a dual core (single physical CPU) machine with hyperthreading, I could specify processors = 4.
I am considering purchasing a new machine primarily to do some multilevel mixture models with some larger samples (e.g., 3,000 level 1, and 250 level 2). I assume the chips will provide hyperthreading. Do you have any sense about where (in terms of the number of processors) the performance curve dips? That is, I assume 2 is better than 1, 4 is better than 2, and so forth; however, the law of diminishing returns may apply here.
Along these lines, what is the impact of RAM? Of course, more is usually better, but is there an inflection point in terms of performance here also? I assume a starting point of 16gb, but I will be running win7 pro, which allows up to 192gb.
Thank you for considering these questions, and please accept my apologies if they are poorly informed.
While we hesitate to comment on hardware, we feel that for Mplus it is better to get the fastest multi-core processor rather than the processor with the most cores. Not all Mplus computations are done in parallel, so having the fastest single-processor speed avoids running into a computational bottleneck.
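The point that not all Mplus computations run in parallel can be illustrated with Amdahl's law, which bounds the speedup from adding processors when part of the work stays serial. The sketch below is generic and says nothing about Mplus internals; the parallel fraction of 0.8 is a purely hypothetical value chosen for illustration, not a measured property of any Mplus analysis.

```python
def amdahl_speedup(parallel_fraction: float, n_processors: int) -> float:
    """Theoretical speedup under Amdahl's law.

    parallel_fraction: share of the run time that can be parallelized (0 to 1).
    n_processors: number of processors applied to the parallel part.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if n_processors < 1:
        raise ValueError("n_processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_processors)


# Hypothetical workload in which 80% of the run time is parallelizable.
for n in (1, 2, 4, 8, 16):
    print(f"{n:2d} processors: speedup {amdahl_speedup(0.8, n):.2f}x")
```

Under this assumption, each doubling of processors buys less than the previous one, and the speedup can never exceed 1 / (1 - 0.8) = 5 regardless of core count. That is the sense in which a faster single core helps with the serial portion that extra cores cannot touch.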
The ideal amount of RAM depends on the size of the model and data and on how much multitasking is done while Mplus is running. Our best computer has 32GB of RAM. We have run into memory limitations even with that much RAM, but the number of variables in those models was very large.
Kurt Beron posted on Tuesday, August 29, 2017 - 12:24 pm
I thought I'd add to this thread (sorry) as I've been dealing with the question of the optimal number of processors/threads myself. Below are my benchmarks for my 4 core/8 thread PC with 32gb ram. I can open quite a few Mplus threads, but the ones that match my CPU work better than increasing the number. Obviously I am running a specific program, but I repeated with a different program with similar results:
Thanks for the post. As I understand things, a quad-core processor with hyperthreading gives you 8 logical processors. So the fact that specifying 8 processors provides the most efficient solution makes sense. That is, although you can specify more processors than you have, it seems that doing so does no good.
Dear Linda, dear Bengt, I have a few questions regarding the processor specifications for THREELEVEL analysis.
1) Is it correct that for TYPE=THREELEVEL, I can specify the number of processors, e.g. PROCESSORS=4, but for TYPE=THREELEVEL RANDOM, I need to specify the number of processors and threads in conjunction with the STARTS option, e.g. PROCESSORS = 4 2; STARTS= 200 20; Otherwise, only one processor would be used?
2) If this is correct, how would I be able to increase computation speed for TYPE=THREELEVEL RANDOM if I use a multi-core processor without hyperthreading? Would PROCESSORS = 4 1; in conjunction with the STARTS option work?
3) If I use a quad-core processor with hyperthreading, would a sensible specification of processors and threads be PROCESSORS = 4 2; for TYPE=THREELEVEL RANDOM?
4) I am not sure if I understand the STARTS option correctly. If I do not specify this option, what are the default values for the number of random sets of starting values and the number of optimizations to use in the final stage in TYPE=THREELEVEL RANDOM? Can I find them in the output?
1) This does not sound correct. The starts command is not usually used with type=threelevel. It is mostly used for mixture models. The processor command for type=threelevel is available for the bayes estimator and for the ML estimator when numerical integration is performed, for example with categorical variables. It is not available with the ML estimator and continuous variables, where the computation is usually quite fast. You should be able to see the number of processors Mplus uses in the task manager.
2) If the processor command is available (bayes or ML with algo=int) then specifying processor=4 would help. You might also be experiencing slow convergence due to too many random slopes. What can help in that case is to run one random slope at a time to see if it is really needed (significant variance) and just keep those that are needed in the final model.
3) For ML with numerical integration I would recommend proc=8 for your computer. For Bayes I would recommend proc=4.
4) We would use the default starting values (see tech1) and perform just one optimization. Starts is generally not needed for type=threelevel but is available.