If you have problems during the execution of MRCC, please attach the output with an adequate description of your case, as well as the following:
- the way mrcc was invoked
- the way build.mrcc was invoked
- the output of build.mrcc
- the compiler version (for example: ifort -V, gfortran -v)
- the BLAS/LAPACK versions
- the gcc and glibc versions
This information really helps us during troubleshooting.

drpa_mpi dies with forrtl file not found
#1563 by rfanta (Topic Author)
Dear MRCC developers,
I have encountered a recurring issue with MRCC (version 25.1.1) during an LNO-CCSD(T) calculation on a single-node HPC setup (480 GB RAM, Intel MPI environment, AMD Rome 7702, 120 usable cores).
Calculation settings:
basis = atomtype
Fe:def2-TZVPP
N,O,C:def2-TZVP
H:def2-SVP
calc = LNO-CCSD(T)
dfbasis = def2-TZVPP-RI
dfbasis_scf = def2-QZVPPD-RI-JK
dfbasis_cor = def2-TZVPP-RI
dft = B3LYP
lccoporder = trffirst
lcorthr = Normal
mem = 32GB
mpitasks = 12
mult = 6
qro = on
scfalg = direct
scftype = uks
(Full MINP is in the attached file.)
The job fails during the three-index DF transformation step (drpa): some MPI ranks are killed, followed by “forrtl: file not found” errors related to fort.1010.
Job execution and memory info are in the uploaded slurm-8449818.output.txt file.
What I’ve tried:
- Reducing the number of ranks and increasing the number of threads (e.g., 4 ranks / 30 threads) - same error.
- Matching the RI basis to the orbital basis (def2-TZVPP-RI) and using smaller bases - still fails.
- Changing lccoporder to trffirst - no improvement.
Please let me know if you need further information. Thank you for your help!
Best regards,
Roman
#1564 by nagypeter (MRCC developer)
Replied by nagypeter on topic drpa_mpi dies with forrtl file not found
Dear Roman,
MPI parallelization is not implemented for LNO-CCSD(T). Please remove mpitasks from MINP if you want to use LNO-CCSD(T).
You can employ OpenMP threaded parallelization for LNO-CCSD(T), although it will not scale to as many cores as you want to use.
In the released version, MPI parallelization is only available for the DF-CCSD(T) and FNO-CCSD(T) methods, for closed-shell systems.
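For illustration, a minimal sketch of the relevant MINP lines (all other keywords unchanged, the mpitasks line simply removed; this assumes the usual setup where the OpenMP thread count is taken from the OMP_NUM_THREADS environment variable):
calc = LNO-CCSD(T)
mem = 32GB
and in the job script, for example:
export OMP_NUM_THREADS=16
dmrcc > mrcc.out 2>&1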
Best regards,
Peter
#1565 by rfanta (Topic Author)
Replied by rfanta on topic drpa_mpi dies with forrtl file not found
Thank you for the clarification. I was confused about MPI support for LNO-CCSD(T) because of the recent Chemical Science article's Supporting Information. My bad, sorry.
Is the MPI layer only available in the development/internal version? Is there any roadmap for enabling MPI for LNO-CCSD(T) in the public release?
In the meantime, I will run with pure OpenMP as you recommend. Could you please advise on best practices for running large LNO-CCSD(T) jobs using OpenMP? From the examples in the SI (e.g., wall-time measurements for the 63-atom and 90-atom cases), it seems that scaling is optimal up to about 16 cores per node, with diminishing returns beyond that.
Thank you!
Best regards,
Roman
#1566 by nagypeter (MRCC developer)
Replied by nagypeter on topic drpa_mpi dies with forrtl file not found
There is nothing written in this paper (doi.org/10.1039/D4SC04755A) about the parallel scaling performance of LNO-CCSD(T), neither about MPI nor about OpenMP.
For large jobs, you may try 10-20 cores with OpenMP, but in general there is unfortunately not much speedup beyond that. If the integral transformation step is the slowest (e.g., large molecule + large/QZ basis), then the scaling is a bit worse. If the CCSD and (T) steps are slower (the usual case, especially with tighter settings and complicated electronic structure), then the OpenMP scaling should be better.
Advice:
- You can check the "Maximum memory requirement" block of the output and set your memory a few tens of percent larger than the maximum reported there. Then you can set the number of cores according to this minimal memory requirement for the LNO-CCSD(T) job, as you often have to allocate n cores to get n times the memory/core value on your cluster.
- LNO-CCSD(T) is restartable with frequent checkpointing, which is a useful feature for large jobs.
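For example, a hypothetical single-node SLURM sketch along these lines (the numbers are purely illustrative, not taken from your output; if the output reported, say, a 200 GB maximum, you would set mem to roughly 250GB and request enough cores to reach that allocation on your cluster):
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=16
#SBATCH --mem=250G
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
dmrcc > mrcc.out 2>&1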
Thanks for using our methods and code, I hope this will be useful.