If you have problems during the execution of MRCC, please attach the output together with an adequate description of your case, as well as the following:
  • the way mrcc was invoked
  • the way build.mrcc was invoked
  • the output of build.mrcc
  • compiler version (for example: ifort -V, gfortran -v)
  • blas/lapack versions
  • gcc and glibc versions

This information really helps us during troubleshooting :)

Running MRCC with MPI tasks on multiple compute nodes

6 months 1 week ago #1089 by ddatta
Hello,

I am trying to run CC calculations using a Slurm job script. The goal is to run the job on multiple nodes, e.g., using mpitasks=4 such that two MPI tasks run on one compute node and the other two run on another. In addition, each MPI task spawns a number of OpenMP threads. I guess the latter is simpler. I find that either the job hangs indefinitely or is unable to copy the input file. 

Any help/suggestion will be much appreciated.

Thanks in advance and best regards,
Dipayan Datta

Here are some specifications used in the Slurm job script.

#SBATCH --nodes=2
#SBATCH --ntasks=4 
#SBATCH --ntasks-per-node=2
#SBATCH --cpus-per-task=16

export WRKDIR=$PWD
export MRCCPATH=path-to-MRCC
export INPUTDIR=$WRKDIR

# Set scratch
if [ -n "${TMPDIR:-}" ]; then
    export SCR=$TMPDIR
fi

JOB=...
cd $SCR
cp $INPUTDIR/$JOB.inp MINP

srun --output $INPUTDIR/$JOB.out $MRCCPATH/dmrcc

----

This script works when mpitasks is set to 1, i.e., the job is run on a single node with one mpitask and any number of OpenMP threads.

6 months 1 week ago #1090 by nagypeter
Dear Dipayan,

You should run MRCC with
> $MRCCPATH/dmrcc > $INPUTDIR/$JOB.out
instead of srun, or
> mpirun -np 1 [additional options] $MRCCPATH/dmrcc_mpi > $INPUTDIR/$JOB.out
if you need [additional options] for mpirun.

Additionally, for SLURM with these settings you must use:
#SBATCH --overcommit
because there will be mpitasks+2 processes running altogether: mpitasks of them doing the work and the other 2 mostly waiting in the background.
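For illustration, here is a minimal job-script sketch combining the two points above: one dmrcc_mpi started with mpirun -np 1 instead of srun, plus --overcommit. The installation path, the job name and the MPITASKS variable are hypothetical placeholders (MPITASKS should simply mirror mpitasks= in your MINP), copying the input to MINP is left out, and outside a Slurm allocation the script only prints the command it would run.

```bash
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=16
#SBATCH --overcommit

# Hypothetical placeholders: adjust to your installation and input.
MRCCPATH=${MRCCPATH:-/path/to/mrcc}
INPUTDIR=${INPUTDIR:-$PWD}
JOB=${JOB:-job}
MPITASKS=${MPITASKS:-2}   # should mirror mpitasks= in MINP

# mpitasks worker processes plus 2 mostly idle helper processes run in total,
# which exceeds the number of Slurm tasks; hence --overcommit above.
NPROCS=$(( MPITASKS + 2 ))
echo "expected MRCC processes in total: $NPROCS"

# Start a single dmrcc_mpi with mpirun (not srun); MRCC starts the
# mpitasks workers according to MINP.
if [ -n "${SLURM_JOB_ID:-}" ]; then
    mpirun -np 1 "$MRCCPATH/dmrcc_mpi" > "$INPUTDIR/$JOB.out"
else
    echo "not inside a Slurm job; would run: mpirun -np 1 $MRCCPATH/dmrcc_mpi > $INPUTDIR/$JOB.out"
fi
```

With mpitasks=2 this expects 4 processes in total, which is more than the 2 Slurm tasks requested, hence --overcommit.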

If the problem remains, please, read these related forum posts:
www.mrcc.hu/index.php/forum/running-mrcc...penmp-and-slurm#1031
www.mrcc.hu/index.php/forum/running-mrcc...penmp-and-slurm#1032
and provide additional information:
- answers to the questions in #1032,
- MRCC version, compiler version, MPI version, SLURM version,
- complete input, output and error messages for "I find that either the job hangs indefinitely or is unable to copy the input file.",
- does the MPI parallel version work in other scenarios? (There are MPI test jobs; try mpitasks=2 on a single node, both with your SLURM script and from the command line.)
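A small helper sketch for gathering most of these versions in one go. The tool list (ifort, gfortran, gcc, mpirun, srun, and ldd for the glibc version) is an assumption about what your cluster provides; anything missing is reported as "not found", and only the first line of each version message is kept. The MRCC version itself still has to be taken from your MRCC download or output.

```bash
#!/bin/bash
# Hypothetical helper: print the first line of each tool's version output,
# reporting tools that are not installed as "not found".
report_versions() {
    local entry tool
    for entry in "ifort -V" "gfortran --version" "gcc --version" "mpirun -V" \
                 "srun --version" "ldd --version" "bash --version"; do
        tool=${entry%% *}   # command name = first word of the entry
        if command -v "$tool" >/dev/null 2>&1; then
            # $entry is left unquoted on purpose so it splits into command + flag.
            printf '%-10s %s\n' "$tool:" "$($entry 2>&1 | head -n 1)"
        else
            printf '%-10s %s\n' "$tool:" "not found"
        fi
    done
}

report_versions
```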


Note on your OpenMP setup:
For OpenMP threading you also have to set:
export OMP_NUM_THREADS=8
export MKL_NUM_THREADS=8
and we recommend also:
export OMP_PLACES=cores
export OMP_PROC_BIND="spread,close"
(Here I assume the node has 16 physical cores and you want to use 8 for each task.
If the node has 32 physical cores, we find that, at least for some versions of SLURM, --cpus-per-task has to be the total number of cores occupied by all (here 2) tasks of a node, not just by one task.)
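The thread arithmetic in this note can be written down as a short sketch, assuming 16 physical cores and 2 MPI tasks per node as above; threads_per_task is a hypothetical helper that refuses uneven splits rather than oversubscribing cores.

```bash
#!/bin/bash
# Hypothetical helper: split the physical cores of one node evenly among the
# MPI tasks running on that node.
threads_per_task() {
    local cores_per_node="$1" tasks_per_node="$2"
    if [ "$tasks_per_node" -le 0 ]; then
        echo "tasks per node must be positive" >&2
        return 1
    fi
    if [ $(( cores_per_node % tasks_per_node )) -ne 0 ]; then
        echo "$cores_per_node cores cannot be split evenly over $tasks_per_node tasks" >&2
        return 1
    fi
    echo $(( cores_per_node / tasks_per_node ))
}

# 16 physical cores and 2 tasks per node, as assumed in the note above.
NTHREADS=$(threads_per_task 16 2)
export OMP_NUM_THREADS=$NTHREADS
export MKL_NUM_THREADS=$NTHREADS
export OMP_PLACES=cores
export OMP_PROC_BIND="spread,close"
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS MKL_NUM_THREADS=$MKL_NUM_THREADS"
```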

I hope this helps; let us know.
Best wishes,
Peter

6 months 1 week ago #1091 by ddatta
Thanks, Péter.

I managed to get MRCC running on a single compute node with mpitasks=2 using mpirun -np 1 ..., both from a Slurm script and from the command line.

I think the problem is with the connection between nodes. Following the discussions under

www.mrcc.hu/index.php/forum/running-mrcc...penmp-and-slurm#1031
www.mrcc.hu/index.php/forum/running-mrcc...penmp-and-slurm#1032

I tried changing the bootstrap server, but it did not solve the problem.

This problem arises only with MRCC, and only when using multiple compute nodes. Working on a single node with a number of OpenMP and MKL threads was fine. I have been using the specifications that you mentioned about the OpenMP thread pinning.

The MRCC version is the latest one, February 2020. I am using Intel compiler version 18.3 and the associated Intel MPI library. The SLURM version is 20.11.3.

I have also been using #SBATCH --overcommit.

Thanks,
Dipayan

6 months 1 week ago #1092 by nagypeter
Dear Dipayan,

Sorry to hear that the issue remains.
To my understanding, you cannot submit a job using more than one node, right?
So 2 nodes with 2 MPI processes (1 process per node) does not work either?

Could you expand on what you meant by "the job is unable to copy the input file"?

Next round of ideas for you to try:

1) Please, upload the full input and output files. Or is there absolutely no output/error message? That would be very strange.
You can also try to increase the verbosity level using:
> mpirun -np 1 -genv I_MPI_DEBUG 5 ...
Do you see any processes (dmrcc, minp, integ, scf...) start on any of the nodes allocated to your job by SLURM?
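One way to answer this is a small helper run on each allocated node while the job is alive (for example from an ssh session, if your cluster allows it). The function name and the process-name pattern are assumptions based on the executables listed above.

```bash
#!/bin/bash
# Hypothetical helper: list MRCC-related processes of the current user on this
# node. The pattern (an extended regex matched against process names) is an
# assumption; "mrcc" also covers dmrcc and dmrcc_mpi.
list_mrcc_procs() {
    local pattern='mrcc|minp|integ|scf'
    if ! pgrep -l -u "$(id -un)" "$pattern"; then
        echo "no MRCC processes found on $(uname -n)"
    fi
}
```

If one node shows dmrcc and the MRCC modules but the other node shows nothing, the second node's workers were never started.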

2) Your cluster should allow ssh between nodes, at least between the nodes allocated to the SLURM job. Is this correct?
Can you try
I_MPI_HYDRA_BOOTSTRAP=ssh
in your slurm script?
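A quick sketch for checking this from inside the job script before trying I_MPI_HYDRA_BOOTSTRAP=ssh. It assumes scontrol is available to expand the node list; job_hosts and check_ssh are hypothetical helper names, and BatchMode=yes makes ssh fail instead of waiting for a password, which could otherwise look like a hang.

```bash
#!/bin/bash
# List the hosts of the current Slurm allocation (only this host outside Slurm).
job_hosts() {
    if [ -n "${SLURM_JOB_NODELIST:-}" ] && command -v scontrol >/dev/null 2>&1; then
        scontrol show hostnames "$SLURM_JOB_NODELIST"
    else
        uname -n
    fi
}

# Non-interactive ssh test to every allocated host. BatchMode=yes makes ssh
# fail instead of prompting; ConnectTimeout avoids long waits.
check_ssh() {
    local host status=0
    for host in $(job_hosts); do
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true >/dev/null 2>&1; then
            echo "ssh OK:     $host"
        else
            echo "ssh FAILED: $host"
            status=1
        fi
    done
    return "$status"
}
```

For example, call `check_ssh || echo "no passwordless ssh between the nodes"` just before mpirun.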

3) Could you share the cluster documentation webpage? Do all the nodes share a common network file system, or does your job use a different local directory on each node?
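To check this yourself, a hypothetical helper like the one below writes a marker file in a directory (for example $SCR) on the first node and asks every allocated node, one srun task per node, whether it can see it. If $SCR points to a node-local $TMPDIR, the other node will likely report "NOT visible", which could be related to the input-copy problem. Outside Slurm it only checks the local node.

```bash
#!/bin/bash
# Hypothetical check: is a file written in DIR on this node visible on every
# allocated node?
check_shared_dir() {
    local dir="$1" marker
    if [ ! -d "$dir" ]; then
        echo "not a directory: $dir" >&2
        return 1
    fi
    marker="$dir/.shared_fs_check.$$"
    : > "$marker" || return 1
    if [ -n "${SLURM_JOB_ID:-}" ] && command -v srun >/dev/null 2>&1; then
        # One task per allocated node; each reports whether it sees the marker.
        srun --nodes="${SLURM_JOB_NUM_NODES:-1}" --ntasks-per-node=1 \
            bash -c 'if [ -e "$1" ]; then echo "$(uname -n): visible"; else echo "$(uname -n): NOT visible"; fi' _ "$marker"
    elif [ -e "$marker" ]; then
        echo "$(uname -n): visible"
    fi
    rm -f "$marker"
}
```

Inside the job script, `check_shared_dir "$SCR"` would be the relevant call.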

4) Did you try the binary version of MRCC? You can also try updating the Intel MPI library to Intel MPI 2019 Update 3.

5) What is the Libfabric version in your cluster? See page 26 of the manual on what we recommend.

6) You can also try to recompile MRCC with the OpenMPI library, but read the manual for a number of "Open MPI" comments first.

I hope there will be some progress along any of the above lines.
Bests,
Peter

6 months 5 days ago #1093 by ddatta
 

File attachment: coronene.tar.gz (9 KB)

Hi Péter,

Thanks for your detailed and very helpful comments and suggestions. I used -genv I_MPI_DEBUG 5 as you suggested; this was very useful.

I am attaching an input file and two output files enclosed in a tar.gz attachment. Both calculations used mpitasks=2, OMP_NUM_THREADS=MKL_NUM_THREADS=16 per MPI rank (our cluster has 36 physical cores per node and one hardware thread associated with each core).

One of the calculations was run on a single node with both MPI tasks on the same node. This calculation completed successfully. The other calculation was intended to use 2 nodes with 1 MPI rank per node; the output file names should indicate how each calculation was performed. The second calculation did not proceed beyond RI-MP2. I used

I_MPI_HYDRA_BOOTSTRAP=slurm

This is the only value that allows at least the SCF calculation to complete. Using I_MPI_HYDRA_BOOTSTRAP=ssh makes the program hang indefinitely before the beginning of SCF iterations. 

With I_MPI_HYDRA_BOOTSTRAP=slurm, the program is able to map the two MPI ranks to two different compute nodes. I allowed 10 hours of run time, but the calculation did not proceed beyond RI-MP2; eventually the allowed time was used up and the job was terminated.

Please let me know if you need more information.

Thank you very much for your help. 

Best regards,
Dipayan

6 months 4 days ago #1094 by nagypeter
Dear Dipayan,

Does this job work on 2 nodes with 2 MPI tasks without using SLURM?
For that, you should specify the -hosts option with
mpirun -np 1 $MRCCPATH/dmrcc_mpi
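A sketch of such a test outside SLURM: the helper builds the comma-separated host list that Intel MPI's -hosts option takes, either from names you give it or from the current allocation. The node names and the installation path are hypothetical, and the mpirun command is only printed so it can be checked before use.

```bash
#!/bin/bash
# Build a comma-separated host list for -hosts, either from explicit names or
# from the current Slurm allocation (this host if neither is available).
hosts_csv() {
    if [ "$#" -gt 0 ]; then
        (IFS=,; echo "$*")
    elif [ -n "${SLURM_JOB_NODELIST:-}" ] && command -v scontrol >/dev/null 2>&1; then
        scontrol show hostnames "$SLURM_JOB_NODELIST" | paste -sd, -
    else
        uname -n
    fi
}

MRCCPATH=${MRCCPATH:-/path/to/mrcc}   # hypothetical install path
HOSTS=$(hosts_csv node01 node02)      # hypothetical node names
# Printed rather than executed, so the command can be checked first.
echo mpirun -hosts "$HOSTS" -np 1 "$MRCCPATH/dmrcc_mpi"
```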


There might be incompatible settings for the process manager and PMI library.
You can:

7) try mpirun after unsetting SLURM's PMI library, if it is set by default in your case, via:
unset I_MPI_PMI_LIBRARY

With this option you might also need to set:
I_MPI_HYDRA_BOOTSTRAP=slurm
I_MPI_HYDRA_BOOTSTRAP_EXEC=srun
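Option 7 written out as job-script lines, a sketch under the assumption that MRCC is started exactly as before; the output file and the installation path are hypothetical, and I_MPI_DEBUG 5 is kept for the extra diagnostic output.

```bash
#!/bin/bash
# Option 7 sketch: Hydra (mpirun) launches its proxies through srun, while
# Intel MPI does not load Slurm's PMI library.
unset I_MPI_PMI_LIBRARY
export I_MPI_HYDRA_BOOTSTRAP=slurm
export I_MPI_HYDRA_BOOTSTRAP_EXEC=srun

MRCCPATH=${MRCCPATH:-/path/to/mrcc}   # hypothetical install path
OUT=${OUT:-$PWD/job.out}              # hypothetical output file

if [ -n "${SLURM_JOB_ID:-}" ]; then
    mpirun -np 1 -genv I_MPI_DEBUG 5 "$MRCCPATH/dmrcc_mpi" > "$OUT"
else
    echo "not inside a Slurm job; skipping mpirun"
fi
```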

8) try to make it work with srun:
> srun -n 1 --mpi=pmi? $MRCCPATH/dmrcc_mpi
where ? is 2, x, or whatever is available to you.

With this option, I_MPI_PMI_LIBRARY should point to SLURM's PMI library.
If only this works, you might need to tweak
I_MPI_FABRICS and other settings for optimal performance.
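Option 8 as a sketch. The PMI library path is a hypothetical example (ask your administrators where Slurm's PMI library is installed), pmi2 is only one possible plugin type, and `srun --mpi=list` shows which types your Slurm installation actually offers.

```bash
#!/bin/bash
# Option 8 sketch: srun launches dmrcc_mpi directly and Intel MPI uses Slurm's
# PMI. The library path is a hypothetical example, not a known location.
export I_MPI_PMI_LIBRARY=${I_MPI_PMI_LIBRARY:-/usr/lib64/libpmi2.so}
PMI_TYPE=${PMI_TYPE:-pmi2}            # or pmix, depending on `srun --mpi=list`
MRCCPATH=${MRCCPATH:-/path/to/mrcc}   # hypothetical install path
OUT=${OUT:-$PWD/job.out}              # hypothetical output file

if [ -n "${SLURM_JOB_ID:-}" ]; then
    srun --mpi=list                   # show the PMI plugin types available
    srun -n 1 --mpi="$PMI_TYPE" "$MRCCPATH/dmrcc_mpi" > "$OUT"
else
    echo "not inside a Slurm job; would run: srun -n 1 --mpi=$PMI_TYPE $MRCCPATH/dmrcc_mpi"
fi
```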

9) For additional details, please send the output of

mpirun -np 1 -genv I_MPI_DEBUG 5 -genv I_MPI_HYDRA_ENV all -genv I_MPI_HYDRA_DEBUG 1 -verbose dmrcc_mpi

Note:
If this is a production job, I would recommend using the full resources of your nodes, i.e.
OMP_NUM_THREADS=18
MKL_NUM_THREADS=18
with 2 MPI tasks per 36-core node. For larger jobs, you should also set the mem keyword to close to 50% of the total node memory when running 2 tasks per node.
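The production sizing above as a small arithmetic sketch. The node memory of 192 GB is a made-up example, taking 90% of an equal per-task share is just one way to stay "close to 50%" for 2 tasks, and the mem=...GB form of the MINP line is an assumption about the keyword syntax; please check it in the manual.

```bash
#!/bin/bash
# Hypothetical sizing for 36 physical cores and 2 MPI tasks per node.
CORES_PER_NODE=36
TASKS_PER_NODE=2
NODE_MEM_GB=${NODE_MEM_GB:-192}   # hypothetical; use your node's real memory

THREADS=$(( CORES_PER_NODE / TASKS_PER_NODE ))
# "Close to 50%" for 2 tasks: 90% of an equal share, leaving some headroom.
MEM_PER_TASK_GB=$(( NODE_MEM_GB * 9 / (10 * TASKS_PER_NODE) ))

export OMP_NUM_THREADS=$THREADS
export MKL_NUM_THREADS=$THREADS
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS MKL_NUM_THREADS=$MKL_NUM_THREADS"
echo "suggested MINP line (assumed format): mem=${MEM_PER_TASK_GB}GB"
```

With the example 192 GB node this gives 18 threads per task and mem=86GB.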

I hope some of the above helps; otherwise, you can still look at suggestions 3-6 from the previous post.
Bests,
Peter
Last edit: 6 months 4 days ago by nagypeter.
