If you have problems during the execution of MRCC, please attach the output and an adequate description of your case, as well as the following:
  • the way mrcc was invoked
  • the way build.mrcc was invoked
  • the output of build.mrcc
  • the compiler version (for example: ifort -V, gfortran -v)
  • the BLAS/LAPACK versions
  • the gcc and glibc versions

This information really helps us during troubleshooting :)
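
Most of this can be collected with standard commands, for example (a sketch; the exact flags depend on your toolchain, and module list applies only to clusters using environment modules):

ifort -V            # Intel Fortran compiler version
gfortran --version  # GNU Fortran compiler version
gcc --version       # gcc version
ldd --version       # glibc version (printed by GNU ldd)
module list         # shows loaded BLAS/LAPACK modules on typical HPC systems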

drpa_mpi dies with forrtl file not found

#1563 by rfanta
Dear MRCC developers,

I have encountered a recurring issue with MRCC (version 25.1.1) during an LNO-CCSD(T) calculation on a single-node HPC setup (480 GB RAM, Intel MPI environment, AMD Rome 7702 - 120 usable cores).

Calculation settings:
basis = atomtype
Fe:def2-TZVPP
N,O,C:def2-TZVP
H:def2-SVP
calc = LNO-CCSD(T)
dfbasis = def2-TZVPP-RI
dfbasis_scf = def2-QZVPPD-RI-JK
dfbasis_cor = def2-TZVPP-RI
dft = B3LYP
lccoporder = trffirst
lcorthr = Normal
mem = 32GB
mpitasks = 12
mult = 6
qro = on
scfalg = direct
scftype = uks

(Full MINP is in the attached file.)

The job fails during the three-index DF transformation step (drpa), causing some MPI ranks to be killed, followed by “forrtl: file not found” errors related to fort.1010.

Job execution and memory info are in the uploaded slurm-8449818.output.txt file.

What I’ve tried:
  • Reducing the number of MPI ranks and increasing the thread count (e.g., 4 ranks / 30 threads) - same error.
  • Matching RI basis to orbital basis (def2-TZVPP-RI) and using smaller bases - still fails.
  • Changing lccoporder to trffirst - no improvement.
Memory usage monitoring and peak memory utilization are included in the output log.

Please let me know if you need further information. Thank you for your help!

Best regards,
Roman


#1564 by nagypeter (MRCC developer)
Replied by nagypeter on topic drpa_mpi dies with forrtl file not found
Dear Roman,

MPI parallelization is not implemented for LNO-CCSD(T). Please remove mpitasks from MINP if you want to use LNO-CCSD(T).
You can employ OpenMP threaded parallelization for LNO-CCSD(T), but it will not scale to as many cores as you want to use.
In the released version, MPI parallelization is only available for the DF-CCSD(T) and FNO-CCSD(T) methods, for closed-shell systems.
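
For illustration, a minimal sketch of the change (assuming, as for most OpenMP codes, that the thread count is set via the standard OMP_NUM_THREADS environment variable; the remaining MINP keywords stay as in the original input; please check the MRCC manual for the exact recommendations):

# in MINP, delete the line:
#   mpitasks = 12
# before invoking dmrcc, set for example:
export OMP_NUM_THREADS=16
export MKL_NUM_THREADS=16   # assumption: binaries linked against MKL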
Best regards,
Peter


#1565 by rfanta
Replied by rfanta on topic drpa_mpi dies with forrtl file not found
Thank you for the clarification. I was confused about MPI support for LNO-CCSD(T) because of the recent Chemical Science article's Supporting Information. My bad, sorry.

Is the MPI layer only available in the development/internal version? Is there any roadmap for enabling MPI for LNO-CCSD(T) in the public release?

In the meantime, I will run with pure OpenMP as you recommend. Could you please advise on best practices for running large LNO-CCSD(T) jobs using OpenMP? From the examples in the SI (e.g., wall-time measurements for the 63-atom and 90-atom cases), it seems that scaling is optimal up to about 16 cores per node, with diminishing returns beyond that.

Thank you!

Best regards,
Roman


#1566 by nagypeter (MRCC developer)
Replied by nagypeter on topic drpa_mpi dies with forrtl file not found
That paper (doi.org/10.1039/D4SC04755A) says nothing about the parallel scaling performance of LNO-CCSD(T), neither about MPI nor about OpenMP.

For large jobs, you may try 10-20 cores with OpenMP, but in general there is unfortunately not much speedup beyond that. If the integral transformation step is the slowest (e.g., a large molecule with a large/QZ basis), the scaling is somewhat worse. If the CCSD and (T) steps are slower (the usual case, especially with tighter settings and a complicated electronic structure), the OpenMP scaling should be better.

Advice:
Check the "Maximum memory requirement" block of the output and set mem a few tens of percent higher than the largest value reported in that part of the output. Then set the number of cores according to this minimal memory requirement of the LNO-CCSD(T) job, since on a cluster you often have to allocate n cores to obtain n times the per-core memory.
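
A hypothetical worked example (the numbers are illustrative, not taken from this job): if the largest value in the "Maximum memory requirement" block is 150 GB, set mem to about 1.2 x 150 GB = 180 GB; on a cluster granting 4 GB per core, that means allocating at least 180 / 4 = 45 cores, even if fewer threads are actually used, e.g. with SLURM:

#SBATCH --ntasks=1
#SBATCH --cpus-per-task=45   # allocated to reach the memory, not for scaling
#SBATCH --mem-per-cpu=4G     # 45 x 4 GB = 180 GB in total
export OMP_NUM_THREADS=16    # actual thread count, per the scaling advice above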

LNO-CCSD(T) is restartable with frequent checkpointing, which is a useful feature for large jobs.

Thanks for using our methods and code; I hope this will be useful.
