If you have problems during the execution of MRCC, please attach the output together with an adequate description of your case, as well as the following:
• the way mrcc was invoked
• the way build.mrcc was invoked
• the output of build.mrcc
• compiler version (for example: ifort -V, gfortran -v)
• blas/lapack versions
• gcc and glibc versions

This information really helps us during troubleshooting :)
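The version information requested above can be collected with standard commands; a minimal sketch (which tools exist varies by system, so missing ones are simply skipped):

```shell
# Collect toolchain and library versions for an MRCC bug report.
{
  echo "== compilers =="
  command -v ifort    >/dev/null && ifort -V 2>&1 | head -n 2 || true
  command -v gfortran >/dev/null && gfortran --version | head -n 1 || true
  command -v gcc      >/dev/null && gcc --version | head -n 1 || true
  echo "== glibc =="
  getconf GNU_LIBC_VERSION 2>/dev/null || true
} > versions.txt
cat versions.txt
```

Attaching the resulting versions.txt to a report covers most of the list above in one go.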

OpenMP performance for CCSDTQ, etc.

9 months 5 days ago #1243 by kipeters
What sort of parallel performance should I be seeing with OpenMP in modules such as CCSDT(Q), CCSDTQ, and/or CCSDTQ(5)? If, for example, I specify 10 OpenMP threads but watch the process in top, I never see more than 200%, i.e., 2 threads, being used. This doesn't seem right to me. This version was built with an older Intel Fortran compiler, so perhaps I just need to upgrade, or perhaps it is something else with my system?

9 months 5 days ago #1245 by kallay

Best regards,
Mihaly Kallay

9 months 5 days ago #1246 by kipeters
As far as I know. Before invoking mrcc, I simply do:

setenv OMP_NUM_THREADS $cpuct
setenv MKL_NUM_THREADS $cpuct

where the $cpuct variable has been previously set to the number of cores. The number of cores is displayed correctly in the mrcc output.

9 months 3 days ago #1247 by kipeters
In retrospect, setting MKL_NUM_THREADS to the same value as OMP_NUM_THREADS probably isn't a great idea, but I have since tried many combinations and can never get past what looks to me like only 2 threads being used. I also tried setting OMP_PLACES to various values, with no effect.

9 months 3 days ago #1248
Dear Kirk,

You should set both OMP_NUM_THREADS and MKL_NUM_THREADS to the maximum number of cores, that is, $cpuct in your case. Some codes in MRCC use OpenMP directives, some use parallel MKL, and some use both.
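For completeness, the same settings in Bourne-shell/bash syntax (a sketch; $cpuct is assumed to hold the core count, here hard-coded to 10 for illustration):

```shell
# Bourne/bash equivalent of the csh setenv lines above.
cpuct=10                       # number of cores allocated to the job
export OMP_NUM_THREADS=$cpuct  # threads for OpenMP-parallel MRCC code
export MKL_NUM_THREADS=$cpuct  # threads for MKL-parallel BLAS/LAPACK calls
env | grep -E 'OMP_NUM_THREADS|MKL_NUM_THREADS'
```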

As we note in Sect. II.L of the MRCC paper (doi.org/10.1063/1.5142048), the OpenMP parallel scaling of the iterative higher-order CC methods, like CCSDTQ, is somewhat disappointing due to heavy disk I/O, but the perturbative parts, like (Q), should scale well with 10 cores.

Could you therefore test some other methods in MRCC that spend most of their time in OpenMP-parallel code regions, such as calc=DF-CCSD(T), and let us know the number of processes/threads shown in top?
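The actual thread count of a running process can be read more reliably than from top's CPU percentage; a sketch using the NLWP (number of lightweight processes) column of ps, assuming the process is named mrcc:

```shell
# Count OS threads (NLWP) for each running mrcc process, if any.
for pid in $(pgrep -x mrcc || true); do
  printf 'pid %s: %s threads\n' "$pid" "$(ps -o nlwp= -p "$pid" | tr -d ' ')"
done
# The same query works for any PID, e.g. the current shell:
ps -o nlwp= -p $$
```

In top, pressing H toggles a per-thread view, which shows the same information interactively.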

Perhaps this is a scheduler issue, with only 2 threads/cores allowed for the mrcc process? Do you see the same problem when running MRCC from the command line without a scheduler?
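One way to check whether the scheduler has pinned the job to two cores is to inspect the CPU affinity mask of the job's shell, since child processes inherit it; a sketch using standard Linux tools (the /proc interface is Linux-specific):

```shell
# Which CPUs may this shell (and its children, including mrcc) run on?
grep Cpus_allowed_list /proc/self/status
# taskset (from util-linux) reports the same affinity mask, if installed.
command -v taskset >/dev/null && taskset -cp $$ || true
```

If the allowed list shows only two CPUs inside a batch job but all CPUs in an interactive shell, the scheduler is the culprit.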

Do you run the dmrcc executable standalone and call mrcc that way, or do you use an interface via another package?

I guess this is the same ifort 17.9 compile and 11-year-old CentOS 6.2 system you noted in a previous post. We similarly struggle with older software environments. Unfortunately, the latest, free Intel oneAPI compilers are only compatible back to glibc 2.17, which was the problem when you tried to run the 2022 MRCC binary that we compiled.
In principle, ifort 17.9 should work, but you may try to update ifort, just not all the way to the free oneAPI versions released after 2021, as I guess your glibc is older than 2.17.
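The glibc version can be checked directly, with 2.17 being the floor mentioned above for the recent oneAPI binaries; a sketch:

```shell
# Report the glibc version and whether it meets the 2.17 floor.
glibc=$(getconf GNU_LIBC_VERSION | awk '{print $2}')
echo "glibc $glibc"
# sort -V orders version strings; if 2.17 sorts first (or ties), glibc >= 2.17.
if [ "$(printf '%s\n' 2.17 "$glibc" | sort -V | head -n 1)" = "2.17" ]; then
  echo "glibc >= 2.17: recent oneAPI binaries should run"
else
  echo "glibc < 2.17: stay with older compilers/binaries"
fi
```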

The binary of our previous, 2020 MRCC release may still be compatible with your OS, so it could be worth trying whether the 2020 code utilizes all the threads properly. That would give us a clue as to whether this is a compiler issue. Since there have been no major changes in the higher-order CC parts since the 2020 release, it could also serve as a practical workaround.

Sorry for the long message without a specific solution, but so far it is hard to see what the problem could be.

Best wishes,
Peter

9 months 3 days ago #1249 by kipeters
Dear Peter,

Thank you for the extensive reply; I appreciate the time it took. Yes, CentOS 6.2 is getting pretty old these days. Updating my whole cluster to a new kernel needs to happen at some point, but I admit I haven't found the time yet (and I use the Rocks environment too, which limits how far I can go).

I'm working on updating my glibc and related libraries so that perhaps I can use the newer oneAPI versions of the Intel compiler. I'm also updating gcc so that a gfortran build might work as well.

My scheduler (Torque/Maui) seems to allocate CPU cores correctly, and even when I set OMP_PLACES to cores (rather than threads) it didn't help. Of course, something could still be set incorrectly - I'm much more familiar with running/building MPI applications than thread-based ones. I will try running MRCC from the command line instead of through the Molpro interface. Since MRCC echoes the correct number of cores, I assumed the environment variables were at least being picked up correctly.

-Kirk