If you have problems during the execution of MRCC, please attach the output with an adequate description of your case as well as the following:
  • the way mrcc was invoked
  • the way build.mrcc was invoked
  • the output of build.mrcc
  • compiler version (for example: ifort -V, gfortran -v)
  • blas/lapack versions
  • gcc and glibc versions

This information really helps us during troubleshooting :)

Problems requesting large memory for CCSDT(Q)

7 years 7 months ago #288 by Nike
Dear Mihaly,
Thank you very much for the prompt reply.
Code:
which integ

gives:
Code:
~/MRCC/integ,

But your first suggestion is probably what happened.

There is something strange going on:

I started using the stand-alone MRCC because with the MOLPRO interface, the job ran for 30 days and the end of the output file is still the same:
Code:
MRCC Input:
4 1 0 0 0 0 0 1 0 1 1 1 0 0 0 7 0 0 0.000 0 67200
ex.lev,nsing,ntrip, rest,method,dens,conver,symm, diag, CS ,spatial, HF, ndoub,nacto,nactv, tol, maxex, sacc, freq, dboc, mem
2 2 2 2 2 2 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
MRCC Input end
Variable memory released

I also opened the file /mrcc_9011/iface, and it just says:
Code:
#property method sym st mul value CPU(sec) Wall(sec)

but no entries filled in after 30 days.

In CFOUR I got a bit further:
Code:
OpenMP parallel version is running.
Number of CPUs: 24
Allocation of64533.8 Mbytes of memory...

followed by the iterations, but it took 5 days of wall time for the first iteration.

Maybe it was slow because MRCC really wanted 1TB of RAM:
Code:
Memory requirements /Mbyte/:
              Minimal        Optimal
Real*8:  1059696.3996   1059696.3996

But since I only allocated 98GB in the CFOUR ZMAT, it kept running many iterations of goldstone/xmrcc, reporting smaller and smaller RAM requirements until it got down to 64GB, which ran successfully.

Maybe I could have tried to allocate 700GB in CFOUR, but I decided to just try with the stand-alone MRCC binaries. What I got was amazing:

At the FIRST iteration of xmrcc, it only needs 23GB (in CFOUR it needed 1TB and then kept doing iterations of goldstone and xmrcc to bring it down to 64GB):
Code:
Memory requirements /Mbyte/:
              Minimal        Optimal
Real*8:    23390.7539     23390.7539

The iterations are taking only 17 minutes now!

So with CFOUR it first requires 1TB of RAM, then 64GB of RAM, and each iteration takes 5 days.

With standalone MRCC it first requires 23GB of RAM, only 1 execution of goldstone and xmrcc, no memory reduction, and only 17 minutes per iteration.

Does using CFOUR/MOLPRO really have that much overhead?

7 years 7 months ago #289 by kallay
Dear Nike,
Did you use a different number of CPU cores in the two calculations? If so, that explains the problem. Unfortunately, the iterative CC part requires a couple of big arrays in as many copies as there are CPUs, so you should also increase the memory when increasing the number of cores.
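As a rough illustration of that scaling, here is a minimal Python sketch. It assumes the total memory can be split into a part that is shared and a part that is replicated once per core; that split, the function name `estimate_total_mb`, and all numbers are made-up placeholders, not values taken from MRCC.

```python
def estimate_total_mb(shared_mb: float, per_core_mb: float, n_cores: int) -> float:
    """Toy model: memory shared by all threads plus one copy of the
    replicated arrays per core. Both inputs are hypothetical."""
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    return shared_mb + per_core_mb * n_cores


# Placeholder sizes, chosen only to show the linear growth with core count.
for cores in (1, 12, 24):
    total = estimate_total_mb(shared_mb=1000.0, per_core_mb=500.0, n_cores=cores)
    print(f"{cores:2d} cores -> {total:.0f} MB")
```

In this toy model, going from 12 to 24 cores roughly doubles the replicated part, which is why the memory setting may need to grow along with the core count.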

Best regards,
Mihaly Kallay

7 years 7 months ago #290 by Nike
Dear Mihaly,
Thank you very much for your reply.

When using stand-alone MRCC, after just 1 iteration of goldstone and xmrcc, the output says:
Code:
OpenMP parallel version is running.
Number of CPUs: 24
Allocation of23390.8 Mbytes of memory...

When using MRCC inside CFOUR, goldstone first wants 270,564 MB, but then after several iterations of goldstone and xmrcc, the main code starts and says:
Code:
OpenMP parallel version is running.
Number of CPUs: 24
Allocation of64533.8 Mbytes of memory...

That's more RAM than stand-alone MRCC, but same number of cores.
I know the stand-alone MRCC case is indeed using 24 cores because I see:
Code:
CPU time [min]: 33155.161 Wall time [min]: 1614.613
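That inference can be checked with a little arithmetic: dividing the accumulated CPU time by the wall time gives the average number of busy cores. The sketch below uses the two numbers quoted above; the helper name `effective_cores` is just for illustration.

```python
def effective_cores(cpu_minutes: float, wall_minutes: float) -> float:
    """Average number of busy cores implied by CPU time / wall time."""
    if wall_minutes <= 0:
        raise ValueError("wall time must be positive")
    return cpu_minutes / wall_minutes


# Values taken from the timing line quoted above.
ratio = effective_cores(33155.161, 1614.613)
print(f"average busy cores: {ratio:.1f} of 24")
```

A ratio well above 1 and close to the core count is what one would expect from a run that is really multithreaded.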

I have attached the files MRCCinCFOUR.txt and MRCCstandAlone.txt.

Is there any temporary solution to the memory allocation problem?

We keep getting the error below:
Code:
Executing integ...
Allocation of 700.0 Gbytes of memory...
Fatal error in which dmrcc > mrccjunk1.
Program will stop.
Fatal error in echo " ************************ "`date +"%F %T"`" *************************".
Program will stop.
echo " ************************ "`date +"%F %T"`" *************************".
Program will stop.
echo " ************************ "`date +"%F %T"`" *************************".
Program will stop.
echo " ************************ "`date +"%F %T"`" *************************".
Program will stop.

where the last 2 lines repeat several thousand times before we have the chance to kill the job.
We find that by allocating much less memory than the amount on the node, we can get MRCC to work, but this is after a lot of trial and error, and production of these huge output files.
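Until the underlying problem is understood, one possible way to avoid the huge output files is to run the job under a small watchdog that stops it once the error message has repeated a few times. The following Python sketch is only an assumption about how that could be done: `run_with_watchdog`, its defaults, and the example command are hypothetical and not part of MRCC.

```python
import subprocess


def run_with_watchdog(cmd, log_path, marker="Program will stop.", max_hits=3):
    """Run cmd, copy its combined stdout/stderr to log_path, and kill the
    process once `marker` has appeared max_hits times.
    Returns (return_code, hits)."""
    hits = 0
    with open(log_path, "w") as log:
        proc = subprocess.Popen(
            cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
        )
        try:
            for line in proc.stdout:
                log.write(line)
                if marker in line:
                    hits += 1
                    if hits >= max_hits:
                        proc.kill()  # stop the runaway job
                        break
        finally:
            proc.stdout.close()
            proc.wait()
    return proc.returncode, hits


# Hypothetical usage (assumes the driver is started as plain `dmrcc`):
# code, hits = run_with_watchdog(["dmrcc"], "mrcc.out")
```

Note that `proc.kill()` only terminates the process that was started directly; any programs launched by it may have to be stopped separately.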

With best wishes,
Nike Dattani

7 years 7 months ago #294 by kallay
Dear Nike,
Please note that in the standalone MRCC run you used the cc-pVDZ basis with 42 orbitals, while in the CFOUR-MRCC run you had aug-cc-pVDZ with 69 orbitals, so the memory requirement is higher for the latter calculation.
Concerning the memory allocation problem on the big machine, we have no idea. We can allocate up to 128GB without any problem (unfortunately this is the largest machine we have here). Which compiler did you use to compile mrcc, or did you use the binary version?

Best regards,
Mihaly Kallay
