If you have problems running MRCC, please attach the output and an adequate description of your case, as well as the following:
  • the way mrcc was invoked
  • the way build.mrcc was invoked
  • the output of build.mrcc
  • compiler version (for example: ifort -V, gfortran -v)
  • blas/lapack versions
  • gcc and glibc versions

This information really helps us during troubleshooting :)

Questions about running the mpi version of MRCC

1 year 3 months ago #1309 by bkwx97
I have a few questions on how to run the MPI version of MRCC and on best practices.

Let's say I have a cluster with 4 nodes and 128 cores on each node. How would I use mpitasks? Would I set mpitasks=4 and then, in my Slurm script, set OMP_NUM_THREADS and MKL_NUM_THREADS to 128?

My other question is about the mem keyword when using MPI. Is that the memory per node, or does it request a total pool of memory across X nodes?

I'm attempting to use the MPI version of MRCC with the 2022 binary and all current patches. I set mpitasks=2 in the input file. I am using impi/2021.2.0. Here is part of my Slurm submission script, followed by the error I get:

#SBATCH --ntasks=64
#SBATCH --ntasks-per-node=32
#SBATCH --nodes=2
#SBATCH --time=0-05:00:00

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 0 PID 14214 RUNNING AT amr-193
=   KILLED BY SIGNAL: 7 (Bus error)
===================================================================================


 


  • MRCC developer
1 year 3 months ago #1310 by nagypeter
Replied by nagypeter on topic Questions about running the mpi version of MRCC
Hello!

Best practices depend on which method you want to use (e.g., what is the calc keyword?). You can find a couple of scaling benchmarks in the MRCC release paper and the references therein:
aip.scitation.org/doi/abs/10.1063/1.5142048

For 4 nodes with 128 cores each, you can set in MINP:
mpitasks=4
and in the scheduler script:
export OMP_NUM_THREADS=128
export MKL_NUM_THREADS=128
See more details in sect. 9 of the MRCC manual.

The mem keyword sets the maximum memory available to a single MPI task.
If you run 1 task per node, you can set (almost) the whole memory of the node.

However, except perhaps for extremely large examples, the current codes will not scale well up to 128 OpenMP threads. So I would recommend running several MPI tasks within a single node, each with fewer OpenMP threads, provided the node has sufficient memory for every MPI task placed on it (currently each MPI task replicates everything in memory).
Using 2 or 4 MPI tasks per node (or 8, if memory allows) will be much more efficient than 1.

Best regards,
Peter
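
The sizing advice in this reply can be turned into a small calculation. The following is a minimal sketch, not an official MRCC recipe. It assumes that mpitasks counts the total number of MPI tasks over all nodes (consistent with mpitasks=4 for 4 nodes with one task each), that mem accepts a value in GB, and that each node has 512 GB of memory. NODE_MEM_GB, TASKS_PER_NODE and the 10% headroom are illustrative values that do not come from the thread.

```shell
#!/usr/bin/env bash
# Hypothetical cluster layout; adjust to your own machine.
NODES=4
CORES_PER_NODE=128
NODE_MEM_GB=512      # assumed value, not taken from the thread
TASKS_PER_NODE=4     # 2, 4, or 8 as suggested in the reply above

# Total MPI tasks and the share of cores each task gets on its node.
MPITASKS=$(( NODES * TASKS_PER_NODE ))
THREADS_PER_TASK=$(( CORES_PER_NODE / TASKS_PER_NODE ))

# Every task replicates its data, so split the node memory between the
# tasks on that node and keep roughly 10% free for the system (assumed margin).
MEM_PER_TASK_GB=$(( NODE_MEM_GB * 9 / 10 / TASKS_PER_NODE ))

export OMP_NUM_THREADS=$THREADS_PER_TASK
export MKL_NUM_THREADS=$THREADS_PER_TASK

# Values to copy into MINP by hand.
echo "mpitasks=${MPITASKS}"
echo "mem=${MEM_PER_TASK_GB}GB"
echo "OMP_NUM_THREADS=${OMP_NUM_THREADS} MKL_NUM_THREADS=${MKL_NUM_THREADS}"
```

With TASKS_PER_NODE=1 the same arithmetic reproduces the mpitasks=4, 128-thread setup quoted earlier in the reply.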


1 year 3 months ago #1311 by bkwx97
Peter

It would be the density-fitted CCSD(T) code, calc=DF-CCSD(T).

I fixed a problem with my .bashrc that was causing the printed error. But it seems that if I set mpitasks=4 I don't see four instances of scf or ccsd when I use ps -u. The code itself executes and doesn't print any immediate errors.
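
One way to check how many instances of each program are running on a node is to tally the ps output by executable name. This is a minimal sketch; count_by_name is a hypothetical helper, not an MRCC tool. Keep in mind that ps only lists processes on the node where it is run, so MPI tasks placed on other nodes will not show up, and OpenMP threads are not separate processes.

```shell
#!/usr/bin/env bash
# count_by_name: read one executable name per line on stdin and
# print "<name> <count>" for each distinct name.
count_by_name() {
    sort | uniq -c | awk '{ print $2, $1 }'
}

# Usage on a compute node while the job is running:
#   ps -u "$USER" -o comm= | count_by_name
```

If the job spans two nodes with mpitasks=4, seeing fewer than four scf or ccsd entries on one node would not by itself indicate a problem.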


1 year 3 months ago #1312 by nagypeter
Replied by nagypeter on topic Questions about running the mpi version of MRCC
It is hard to say anything based on so little information.
Could you please check sections 9.3 and 9.4 of the manual again to see whether any of the suggestions there helps?

Sect. 9.4 also contains 4 links to forum discussions on troubleshooting MPI jobs. If the issue is not solved, could you provide more details along the lines of what we asked for in those forum posts (e.g., the Slurm script, the full MRCC output with at least verbosity=3 and I_MPI_DEBUG=5, runtime info from ps / top...)?

For calc=DF-CCSD(T) with a relatively large number of occupied orbitals, you could also try increasing ccsdthreads and ptthreads to 4 (or maybe 8), so that the total number of OpenMP threads is approximately mkl*ccsdthreads (with mkl=8 or 16), to better utilize the 128 cores of these large nodes.
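
The thread arithmetic in the last paragraph can be sketched as follows. This is only an illustration: CORES_PER_TASK and the variable names are hypothetical, and how the inner MKL thread count is actually passed to MRCC is not stated in this thread, so please check the manual for that.

```shell
#!/usr/bin/env bash
# Cores available to one MPI task: 128 with one task per node,
# or e.g. 32 with four tasks per node (hypothetical layout).
CORES_PER_TASK=128

# Outer thread count suggested in the reply: 4, or maybe 8.
CCSDTHREADS=8
PTTHREADS=$CCSDTHREADS

# Inner (MKL) threads chosen so that mkl * ccsdthreads fills the cores.
MKL_THREADS=$(( CORES_PER_TASK / CCSDTHREADS ))
TOTAL_THREADS=$(( MKL_THREADS * CCSDTHREADS ))

# Keywords to copy into MINP by hand, plus the resulting thread counts.
echo "ccsdthreads=${CCSDTHREADS}"
echo "ptthreads=${PTTHREADS}"
echo "MKL threads per outer thread: ${MKL_THREADS}"
echo "total OpenMP threads: ${TOTAL_THREADS}"
```

With CORES_PER_TASK=32 and CCSDTHREADS=4 the same calculation gives 8 MKL threads, the other value mentioned in the reply.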

