If you have problems during the execution of MRCC, please attach the output together with an adequate description of your case, as well as the following:
• the way mrcc was invoked
• the way build.mrcc was invoked
• the output of build.mrcc
• compiler version (for example: ifort -V, gfortran -v)
• blas/lapack versions
• gcc and glibc versions

This information really helps us during troubleshooting :)
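The version information requested above can be collected with standard commands; a minimal sketch (which tools exist varies by system, so missing ones are simply skipped):

```shell
# Collect toolchain and library versions for an MRCC bug report.
{
  echo "== compilers =="
  command -v ifort    >/dev/null && ifort -V 2>&1 | head -n 2 || true
  command -v gfortran >/dev/null && gfortran --version | head -n 1 || true
  command -v gcc      >/dev/null && gcc --version | head -n 1 || true
  echo "== glibc =="
  getconf GNU_LIBC_VERSION 2>/dev/null || true
} > versions.txt
cat versions.txt
```

Attaching the resulting versions.txt to a report covers most of the list above in one go.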

OpenMP performance for CCSDTQ, etc.

9 months 5 days ago #1243 by kipeters
What sort of parallel performance should I be seeing with OpenMP in modules such as CCSDT(Q), CCSDTQ, and/or CCSDTQ(5)? If, for example, I specify 10 OpenMP threads but watch the process in top, I never see more than 200%, i.e., 2 threads, being used. This doesn't seem right to me. This version was built with an older Intel Fortran compiler, so perhaps I just need to upgrade, or perhaps it is something else with my system?

9 months 5 days ago #1245 by kallay

Best regards,
Mihaly Kallay

9 months 5 days ago #1246 by kipeters
As far as I know. Before invoking mrcc, I simply do:

setenv OMP_NUM_THREADS $cpuct
setenv MKL_NUM_THREADS $cpuct

where the $cpuct variable has been previously set to the number of cores. The number of cores is displayed correctly in the mrcc output.

9 months 3 days ago #1247 by kipeters
In retrospect, setting MKL_NUM_THREADS to the same value as OMP_NUM_THREADS probably isn't a great idea, but I have since tried many combinations and can never get past what looks to me like only 2 threads being used. I also tried setting OMP_PLACES to various values, with no effect.

9 months 3 days ago #1248
Dear Kirk,

You should set both OMP_NUM_THREADS and MKL_NUM_THREADS to the maximum number of cores, that is, $cpuct in your case. Some codes in MRCC use OpenMP directives, some use parallel MKL, and some use both.
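For completeness, the same settings in Bourne-shell/bash syntax (a sketch; $cpuct is assumed to hold the core count, here hard-coded to 10 for illustration):

```shell
# Bourne/bash equivalent of the csh setenv lines above.
cpuct=10                       # number of cores allocated to the job
export OMP_NUM_THREADS=$cpuct  # threads for OpenMP-parallel MRCC code
export MKL_NUM_THREADS=$cpuct  # threads for MKL-parallel BLAS/LAPACK calls
env | grep -E 'OMP_NUM_THREADS|MKL_NUM_THREADS'
```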

As we note in Sect. II.L of the MRCC paper (doi.org/10.1063/1.5142048), the OpenMP parallel scaling of the iterative higher-order CC methods, like CCSDTQ, is somewhat disappointing due to heavy disk I/O, but the perturbative parts, like (Q), should scale well with 10 cores.

Could you therefore test some other methods in MRCC that spend most of their time in OpenMP-parallel code regions, such as calc=DF-CCSD(T), and let us know the number of processes/threads shown in top?
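The actual thread count of a running process can be read more reliably than from top's CPU percentage; a sketch using the NLWP (number of lightweight processes) column of ps, assuming the process is named mrcc:

```shell
# Count OS threads (NLWP) for each running mrcc process, if any.
for pid in $(pgrep -x mrcc || true); do
  printf 'pid %s: %s threads\n' "$pid" "$(ps -o nlwp= -p "$pid" | tr -d ' ')"
done
# The same query works for any PID, e.g. the current shell:
ps -o nlwp= -p $$
```

In top, pressing H toggles a per-thread view, which shows the same information interactively.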

Perhaps this is a scheduler issue, with only 2 threads/cores allowed for the mrcc process? Do you see the same problem when running MRCC from the command line without a scheduler?
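One way to check whether the scheduler has pinned the job to two cores is to inspect the CPU affinity mask of the job's shell, since child processes inherit it; a sketch using standard Linux tools (the /proc interface is Linux-specific):

```shell
# Which CPUs may this shell (and its children, including mrcc) run on?
grep Cpus_allowed_list /proc/self/status
# taskset (from util-linux) reports the same affinity mask, if installed.
command -v taskset >/dev/null && taskset -cp $$ || true
```

If the allowed list shows only two CPUs inside a batch job but all CPUs in an interactive shell, the scheduler is the culprit.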

Do you run the dmrcc executable standalone and call mrcc that way, or do you use an interface via another package?

I guess this is the same ifort 17.9 compile and 11-year-old CentOS 6.2 system you noted in a previous post. We similarly struggle with older software environments. Unfortunately, the latest, free Intel oneAPI compilers are only compatible back to glibc 2.17, which was the problem when you tried to run the 2022 MRCC binary that we compiled.
In principle, ifort 17.9 should work, but you may try to update ifort, just not all the way to the free oneAPI versions released after 2021, as I guess your glibc is older than 2.17.
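The glibc version can be checked directly, with 2.17 being the floor mentioned above for the recent oneAPI binaries; a sketch:

```shell
# Report the glibc version and whether it meets the 2.17 floor.
glibc=$(getconf GNU_LIBC_VERSION | awk '{print $2}')
echo "glibc $glibc"
# sort -V orders version strings; if 2.17 sorts first (or ties), glibc >= 2.17.
if [ "$(printf '%s\n' 2.17 "$glibc" | sort -V | head -n 1)" = "2.17" ]; then
  echo "glibc >= 2.17: recent oneAPI binaries should run"
else
  echo "glibc < 2.17: stay with older compilers/binaries"
fi
```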

The binary of our previous, 2020 MRCC release may still be compatible with your OS, so it could be worth trying whether the 2020 code utilizes all the threads properly. That would give us a clue as to whether this is a compiler issue. Since there have been no major changes in the higher-order CC parts since the 2020 release, it could also serve as a practical workaround.

Sorry for the long message without a specific solution, but so far it is hard to see what the problem could be.

Best wishes,
Peter

9 months 3 days ago #1249 by kipeters
Dear Peter,

Thank you for the extensive reply; I appreciate the time it took. Yes, CentOS 6.2 is getting pretty old these days. Updating my whole cluster to a new kernel needs to happen at some point, but I admit I haven't found the time yet (and I use the Rocks environment too, which limits how far I can go).

I'm working on updating my glibc and related libraries so that perhaps I can use the newer oneAPI versions of the Intel compiler. I'm also updating gcc so that a gfortran build might work as well.

My scheduler (Torque/Maui) seems to allocate CPU cores correctly, and even when I set OMP_PLACES to cores (rather than threads) it didn't help. Of course, something could still be set incorrectly - I'm much more familiar with running/building MPI applications than thread-based ones. I will try running MRCC from the command line instead of through the Molpro interface. Since MRCC echoes the correct number of cores, I assumed the environment variables were at least being picked up correctly.

-Kirk