Memory and number of processors

1 week 6 days ago #706 by ONVentura
Hi there!
Perhaps this is a little off-topic for this forum, but maybe some people had the same problem and can help.

I am invoking MRCC from CFOUR, and in general it runs fine. However, I am not able to pass the memory and processor requirements from my PBS script file to it. It always takes 32 processors and 576.3 MB of memory (and, by the way, I cannot tell whether that is the total memory or the memory per processor).

I suspect that the memory limit is causing the program to abort when it tries to calculate CCSDT(Q), even with a small basis set (cc-pCVTZ) on Cl.

Can anyone help?

Thanks

1 week 6 days ago #707 by kallay
The number of CPU cores used by mrcc can be controlled with the OMP_NUM_THREADS environment variable, e.g.:
export OMP_NUM_THREADS=8
The amount of allocatable memory is received from cfour, so if you set the memory in the cfour input file, it should be passed on to mrcc.
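
As an illustration only, here is a minimal sketch of how the export might sit in a PBS job script so that the thread count matches the cores requested from the scheduler. The `nodes=1:ppn=8` directive assumes Torque-style syntax (PBS Pro uses a different `select` form), `NCORES` is a hypothetical helper variable, and the launch line is left commented out because it depends on the local installation:

```shell
#!/bin/bash
#PBS -l nodes=1:ppn=8

# NCORES is a hypothetical helper variable; keep it equal to the ppn value above.
NCORES=8

# mrcc reads this variable to decide how many OpenMP threads to start.
export OMP_NUM_THREADS="$NCORES"
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"

# Launch line is installation-specific, e.g.:
# xcfour > output.log
```

Without the export, the observed behaviour (all 32 cores being used) is what one would expect from an OpenMP program that defaults to every core it can see.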

Best regards,
Mihaly Kallay

1 week 6 days ago #708 by ONVentura
Hello Mihaly, thanks for the reply. I exported the variable and it worked. Perhaps you can help me with the other problem, which I guess is a question of memory.

I am running a very simple CFOUR input:

Chlorine Atom CCSDT pCVTZ
Cl

*CFOUR(CC_PROG=MRCC,MEM=16,MEM_UNIT=GB,CALC=CCSDT,BASIS=cc-pCVQZ,MULT=2)

So, just a CCSDT/cc-pCVQZ single-point calculation on a chlorine atom. However, the program stops like this:

Starting CC iteration...
======================================================================
Norm of residual vector: 6.63797861
CPU time [min]: 34.638 Wall time [min]: 7.457

Iteration 1 CC energy: -460.06087320 Energy decrease: 0.57164190
======================================================================
Norm of residual vector: 0.54438456
CPU time [min]: 67.965 Wall time [min]: 13.467

Iteration 2 CC energy: -460.07490276 Energy decrease: 0.01402956
======================================================================

Child process recieved SIG# 15, exit

Parent process recieves SIG# 31
19219 mrcc
19242 sh

Signal handler function received SIG# 15
pgrep -l -P 19181
Sending signal 15 to child processes (last)
pkill -15 -P 19181
Sending SIGKILL to child processes (last)
pkill -9 -P 19181

I checked the memory and it is passed correctly (2 GB/CPU). Can you give me any hint of where to look for this error?

All the best. Oscar

1 week 6 days ago #709 by nagypeter
Dear Oscar!

It seems that the mrcc process received a termination signal from the OS or the scheduler, which often happens when a job exceeds its allowed memory (either that of the entire node or that of the scheduled job).

Are you using a scheduler? If so, which one? How much memory is available to the node/job?
Perhaps CFOUR is using up some of the memory. For that job I do not see a reason to use the interface. Can you try running the same job with standalone MRCC?
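
One rough way to answer the memory questions above is to compare the requested memory with what the node physically has, and to ask the scheduler what it recorded for the job. The sketch below assumes a Linux node (it reads /proc/meminfo) and a PBS-type scheduler that provides `qstat` and sets `PBS_JOBID` inside a job; `REQUESTED_GB` and `NODE_GB` are hypothetical variable names, and 16 is simply the MEM value from the input posted earlier:

```shell
#!/bin/bash
# Memory requested in the CFOUR input (MEM=16, MEM_UNIT=GB).
REQUESTED_GB=16

# Physical memory of this node, truncated to whole GiB (Linux only).
if [ -r /proc/meminfo ]; then
    NODE_GB=$(awk '/^MemTotal:/ {printf "%d", $2 / 1024 / 1024}' /proc/meminfo)
else
    NODE_GB=0   # unknown on this system
fi

echo "requested: ${REQUESTED_GB} GB, node total: ${NODE_GB} GiB"
if [ "$NODE_GB" -gt 0 ] && [ "$REQUESTED_GB" -ge "$NODE_GB" ]; then
    echo "warning: the request meets or exceeds the physical memory of the node"
fi

# Inside a PBS job, list the memory-related limits and usage known to the scheduler.
if command -v qstat >/dev/null 2>&1 && [ -n "${PBS_JOBID:-}" ]; then
    qstat -f "$PBS_JOBID" | grep -i mem
fi
```

If the scheduler's limit is smaller than the memory requested in the input plus whatever the calling program holds on to, a SIGTERM part-way through the iterations is a plausible outcome.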

Hope some of this helps,
Peter

1 week 6 days ago #710 by ONVentura
Hello Peter!

Yes! It definitely did. I simply got rid of the middleman and ran MRCC directly instead of going through CFOUR. It ran without a glitch!

Thanks a lot!
