If you have problems during the execution of MRCC, please attach the output with an adequate description of your case, as well as the following:
  • the way mrcc was invoked
  • the way build.mrcc was invoked
  • the output of build.mrcc
  • compiler version (for example: ifort -V, gfortran -v)
  • blas/lapack versions
  • gcc and glibc versions

This information really helps us during troubleshooting :)

Restarts during perturbative corrections?

5 months 1 week ago #678 by Nike
Dear all,
I understand that if a job crashes during the perturbative corrections, then using rest=1 will mean MRCC goes back to the iterative step for 1 iteration, then starts the perturbative correction all the way from the beginning (spin case 1).

I have just had a job crash on "spin case 19" after about 22 days (32000 minutes) of calculations to do spin cases 1-18:
Perturbative corrections are calculated...
 ======================================================================
 Spin case  1   Alpha:  2   Beta:  4
 Number of excitations:        9674854
 CPU time [min]: 18814.564                   Wall time [min]:  9849.788
 ======================================================================
 Spin case  2   Alpha:  3   Beta:  3
 Number of excitations:        53660880
 CPU time [min]: 20286.653                   Wall time [min]: 10593.980
 ======================================================================
 Spin case  3   Alpha:  4   Beta:  2
 Number of excitations:        29030306
 CPU time [min]: 21454.478                   Wall time [min]: 11180.785
 ======================================================================
 Spin case  4   Alpha:  5   Beta:  1
 Number of excitations:        1377072
 CPU time [min]: 21644.188                   Wall time [min]: 11276.145
 ======================================================================
 Spin case  5   Alpha:  2   Beta:  4
 Number of excitations:        18634890
 CPU time [min]: 22963.355                   Wall time [min]: 11946.315
 ======================================================================
 Spin case  6   Alpha:  2   Beta:  4
 Number of excitations:        41794692
 CPU time [min]: 25978.119                   Wall time [min]: 13466.476
 ======================================================================
 Spin case  7   Alpha:  3   Beta:  3
 Number of excitations:        160982640
 CPU time [min]: 31526.117                   Wall time [min]: 16290.082
 ======================================================================
 Spin case  8   Alpha:  3   Beta:  3
 Number of excitations:        167173416
 CPU time [min]: 36412.186                   Wall time [min]: 18741.087
 ======================================================================
 Spin case  9   Alpha:  4   Beta:  2
 Number of excitations:        120744756
 CPU time [min]: 41132.122                   Wall time [min]: 21124.516
 ======================================================================
 Spin case 10   Alpha:  4   Beta:  2
 Number of excitations:        58067420
 CPU time [min]: 43282.776                   Wall time [min]: 22209.112
 ======================================================================
 Spin case 11   Alpha:  5   Beta:  1
 Number of excitations:        7463460
 CPU time [min]: 44156.062                   Wall time [min]: 22646.983
 ======================================================================
 Spin case 12   Alpha:  5   Beta:  1
 Number of excitations:        1327932
 CPU time [min]: 44337.005                   Wall time [min]: 22737.989
 ======================================================================
 Spin case 13   Alpha:  2   Beta:  4
 Number of excitations:        8318486
 CPU time [min]: 44885.268                   Wall time [min]: 23013.016
 ======================================================================
 Spin case 14   Alpha:  2   Beta:  4
 Number of excitations:        80495544
 CPU time [min]: 51382.765                   Wall time [min]: 26288.737
 ======================================================================
 Spin case 15   Alpha:  2   Beta:  4
 Number of excitations:        62686156
 CPU time [min]: 55646.355                   Wall time [min]: 28444.712
 ======================================================================
 Spin case 16   Alpha:  3   Beta:  3
 Number of excitations:        149058000
 CPU time [min]: 60427.369                   Wall time [min]: 30840.410
 ======================================================================
 Spin case 17   Alpha:  3   Beta:  3
 Number of excitations:        501528416
 CPU time [min]: 77089.610                   Wall time [min]: 39213.258
 ======================================================================
 Spin case 18   Alpha:  3   Beta:  3
 Number of excitations:        160979568
 CPU time [min]: 82699.806                   Wall time [min]: 42054.491
 ======================================================================
 Spin case 19   Alpha:  4   Beta:  2
 Number of excitations:        174126680

 Fatal error in mrcc.
 Program will stop.

 ************************ 2019-02-04 08:03:39 *************************
                   Error at the termination of mrcc.
 **********************************************************************
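For reference, the time elapsed between consecutive timing lines can be pulled out of a log like the one above with a short script. This is a minimal sketch in Python and not part of MRCC: the function name `timing_increments` is hypothetical, and it assumes the timing lines keep the `CPU time [min]: ... Wall time [min]: ...` layout shown above and that the printed times are cumulative.

```python
import re

# Matches lines such as
#  " CPU time [min]: 18814.564                   Wall time [min]:  9849.788"
TIMING = re.compile(r"CPU time \[min\]:\s*([\d.]+)\s+Wall time \[min\]:\s*([\d.]+)")

def timing_increments(log_text):
    """Return (cpu, wall) differences between consecutive timing lines."""
    stamps = [(float(c), float(w)) for c, w in TIMING.findall(log_text)]
    return [(c2 - c1, w2 - w1)
            for (c1, w1), (c2, w2) in zip(stamps, stamps[1:])]

# First three timing lines of the log above.
sample = """
 CPU time [min]: 18814.564                   Wall time [min]:  9849.788
 CPU time [min]: 20286.653                   Wall time [min]: 10593.980
 CPU time [min]: 21454.478                   Wall time [min]: 11180.785
"""
for cpu, wall in timing_increments(sample):
    print(f"CPU +{cpu:10.3f} min   wall +{wall:10.3f} min")
```

Applied to the full log, the wall time between the first and the last timing line is 42054.491 - 9849.788 = 32204.703 minutes, or roughly 22.4 days, which is where the "about 22 days" figure comes from.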

I wonder if it's possible to make it so that we can save the results from spin cases 1-18, and then when we restart, just start with spin-case 19 immediately?

Also, I wonder if the energy contributions for each spin-case can be printed when the spin-case finishes?

The reason is that I don't want to wait another 22 days to redo spin cases 1-18, and when there are only 19 spin cases in total, an estimate based on spin cases 1-18 seems good enough. (Ideally we would do all 19 spin cases, but if the alternative is restarting from spin case 1 and waiting 22 days, we would rather use the energy estimate from the first 18 of the 19 spin cases.)

Finally, is there an equation we can refer to in order to understand the different spin cases? I have looked at:

M. Kállay and J. Gauss (2008) Approximate treatment of higher excitations in coupled-cluster theory. II. Extension to general single-determinant reference functions and improved approaches for the canonical Hartree–Fock case. J. Chem. Phys. 129, pp. 144101, in the context of the notation in:

M. Kállay and J. Gauss (2005) Approximate treatment of higher excitations in coupled-cluster theory. J. Chem. Phys. 123, pp. 214105.

and:
Y. J. Bomble, J. F. Stanton, M. Kállay and J. Gauss (2005) Coupled cluster methods including non-iterative approximate quadruple excitation corrections. J. Chem. Phys. 123, pp. 054101.

I understand that usually there are only 3 spin cases, and that the number grows when there isn't enough RAM to treat the 3 spin cases in full, but what are the new spin cases and how are they constructed?

With best wishes!
Nike

5 months 1 week ago #679 by kallay
Dear Nike,
Unfortunately, it is not possible to restart the perturbative corrections.

Best regards,
Mihaly Kallay

4 months 3 weeks ago #681 by Nike
Dear Mihaly,
Is it at least possible to print out the energy contribution from each spin case? I have a case where 125 spin cases out of 127 completed before the crash. The calculation took about 100 days. I have started the calculation again, but if it crashes after 125 or 126 spin cases, I'd like to be able to add up the energy contributions from spin cases 1-125 as an "estimate" of what I would get with 127 spin cases.

With best wishes,
Nike

4 months 3 weeks ago #685 by kallay
Dear Nike,
Please edit the file pert.f, print out the variables corr1 and corr2 after line 601, and recompile the code. This will write out the cumulative perturbative corrections.

Best regards,
Mihaly Kallay

3 months 3 weeks ago #687 by Nike
Dear Mihaly,
Thanks for the suggestion.
I have added the two WRITE statements below:
          corr1=corr1+fct*sum1
          corr2=corr2+fct*sum2
         else
          corr2=corr2+sum2
         endif
       endif
        write(iout,"(' corr1 contribution:    ',f18.12)") corr1
        write(iout,"(' corr2 contribution:    ',f18.12)") corr2
       enddo !while

and re-compiled successfully but did not see any new output when I ran the program.


Also, I can see why one would want to parallelize within each spin case if there are only 1 or 2 spin cases, but when the number of spin cases is about the same as (or greater than) the number of cores, should we parallelize over the spin cases rather than within them? Here is an example with 2 cores:
Perturbative corrections are calculated...
 ======================================================================
 Spin case  1   Alpha:  3   Beta:  2
 Number of excitations:        464494585
 CPU time [min]: 10291.275                   Wall time [min]:  5419.409
 ======================================================================
 Spin case  2   Alpha:  4   Beta:  1
 Number of excitations:        113869560
 CPU time [min]: 12329.565                   Wall time [min]:  6445.038
 ======================================================================
 Spin case  3   Alpha:  3   Beta:  2
 Number of excitations:        1117504190
 CPU time [min]: 24282.021                   Wall time [min]: 12538.886
 ======================================================================
 Spin case  4   Alpha:  3   Beta:  2
 Number of excitations:        755944353
 CPU time [min]: 31961.789                   Wall time [min]: 16453.636
 ======================================================================
 Spin case  5   Alpha:  4   Beta:  1
 Number of excitations:        368830759
 CPU time [min]: 39072.617                   Wall time [min]: 20057.350
 ======================================================================
 Spin case  6   Alpha:  4   Beta:  1
 Number of excitations:        91754365
 CPU time [min]: 40736.924                   Wall time [min]: 20900.330
 ======================================================================
 Spin case  7   Alpha:  3   Beta:  2
 Number of excitations:        876483136
 CPU time [min]: 49811.214                   Wall time [min]: 25510.303
 ======================================================================
 Spin case  8   Alpha:  3   Beta:  2
 Number of excitations:        1818685774
 CPU time [min]: 69571.559                   Wall time [min]: 35771.351
 ======================================================================
 Spin case  9   Alpha:  3   Beta:  2
 Number of excitations:        300908380
 CPU time [min]: 72551.242                   Wall time [min]: 37269.068
 ======================================================================
 Spin case 10   Alpha:  4   Beta:  1
 Number of excitations:        438212395
 CPU time [min]: 80907.590                   Wall time [min]: 41475.333
 ======================================================================
 Spin case 11   Alpha:  4   Beta:  1
 Number of excitations:        297219400

Fatal error in mrcc.
 Program will stop.

 ************************ 2019-03-18 18:24:40 *************************
                   Error at the termination of mrcc.
 **********************************************************************

The speed-up, obtained by dividing CPU time by wall time, is only about 1.9 (10291.275/5419.409 for the first spin case). Wouldn't it be a perfect 2x speed-up if we did each spin case on a different core (perhaps load balanced so that each core handles roughly the same total number of excitations)?
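The load-balancing idea can be illustrated with a greedy "largest task first" assignment. This is only a sketch of the proposal in Python, not of how MRCC distributes work: it assumes that the cost of a spin case is proportional to its number of excitations and that spin cases could be run independently, neither of which is confirmed here. The function name `balance` is hypothetical.

```python
import heapq

def balance(costs, n_cores):
    """Greedy LPT: give each task, largest first, to the least-loaded core.

    costs maps a task label to its cost.
    Returns one (load, labels) pair per core.
    """
    # Heap entries are (current load, core index).
    heap = [(0, core) for core in range(n_cores)]
    heapq.heapify(heap)
    assigned = [[] for _ in range(n_cores)]
    for label, cost in sorted(costs.items(), key=lambda kv: -kv[1]):
        load, core = heapq.heappop(heap)
        assigned[core].append(label)
        heapq.heappush(heap, (load + cost, core))
    loads = {core: load for load, core in heap}
    return [(loads[core], assigned[core]) for core in range(n_cores)]

# Excitation counts of spin cases 1-11 from the 2-core log above.
excitations = {
    1: 464494585, 2: 113869560, 3: 1117504190, 4: 755944353,
    5: 368830759, 6: 91754365, 7: 876483136, 8: 1818685774,
    9: 300908380, 10: 438212395, 11: 297219400,
}
for load, cases in balance(excitations, 2):
    print(f"{load:>12d} excitations: spin cases {cases}")
```

The greedy rule does not guarantee an optimal split, but the difference between the most and least loaded core never exceeds the cost of the largest single task.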

The situation is much more pronounced when we have more cores. Here I used 11 cores:

Perturbative corrections are calculated...
 ======================================================================
 Spin case  1   Alpha:  3   Beta:  2
 Number of excitations:        1539990258
 CPU time [min]:110395.983                   Wall time [min]: 14322.505
 ======================================================================
 Spin case  2   Alpha:  3   Beta:  2
 Number of excitations:        1540225714
 CPU time [min]:137626.477                   Wall time [min]: 16946.664
 ======================================================================
 Spin case  3   Alpha:  4   Beta:  1
 Number of excitations:        759732452
 CPU time [min]:153466.773                   Wall time [min]: 18457.198
 ======================================================================
 Spin case  4   Alpha:  3   Beta:  2
 Number of excitations:        3901717692
 CPU time [min]:226253.656                   Wall time [min]: 25559.211
 ======================================================================
 Spin case  5   Alpha:  3   Beta:  2
 Number of excitations:        2625110920
 CPU time [min]:273343.943                   Wall time [min]: 30067.927
 ======================================================================
 Spin case  6   Alpha:  3   Beta:  2
 Number of excitations:        3902968798
 CPU time [min]:342327.476                   Wall time [min]: 37179.708
 ======================================================================
 Spin case  7   Alpha:  3   Beta:  2
 Number of excitations:        2625652424
 CPU time [min]:388852.180                   Wall time [min]: 41649.681
 ======================================================================
 Spin case  8   Alpha:  4   Beta:  1
 Number of excitations:        2583772872
 CPU time [min]:443175.096                   Wall time [min]: 46691.254
 ======================================================================
 Spin case  9   Alpha:  4   Beta:  1
 Number of excitations:        643242402
 CPU time [min]:456705.232                   Wall time [min]: 47999.934
 ======================================================================
 Spin case 10   Alpha:  3   Beta:  2
 Number of excitations:        3247203320

 Fatal error in mrcc.
 Program will stop.

 ************************ 2019-01-28 11:48:09 *************************
                   Error at the termination of mrcc.
 **********************************************************************

We can see that the ratio of CPU time to wall time for the first spin case corresponds to only a 7.7x speed-up (110395.983/14322.505 = 7.7), whereas if we did 11 spin cases on 11 different cores, we would get a perfect 11x speed-up. In this case there were 22 spin cases in total, so we could easily have had each core doing a separate spin case.
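The comparison between the two runs can be put on the same footing by dividing the speed-up by the number of cores. A minimal sketch in Python, using the first timing line of each log above; the function names are hypothetical, and it assumes that the reported CPU time is summed over all cores, so that CPU time divided by wall time approximates the speed-up.

```python
def speedup(cpu_min, wall_min):
    """Ratio of accumulated CPU time to wall time."""
    return cpu_min / wall_min

def efficiency(cpu_min, wall_min, n_cores):
    """Speed-up divided by the number of cores (1.0 would be ideal)."""
    return speedup(cpu_min, wall_min) / n_cores

# (label, CPU time [min], wall time [min], cores),
# taken from the first timing line of each log above.
runs = [
    ("2 cores", 10291.275, 5419.409, 2),
    ("11 cores", 110395.983, 14322.505, 11),
]
for label, cpu, wall, cores in runs:
    print(f"{label}: speed-up {speedup(cpu, wall):.2f}, "
          f"efficiency {efficiency(cpu, wall, cores):.2f}")
```

On these numbers the parallel efficiency drops from roughly 95% with 2 cores to roughly 70% with 11 cores.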

With best wishes!
Nike

3 months 3 weeks ago #688 by kallay
Dear Nike,
What sort of calculation are you running? I have tested it for CCSDT(Q), and it works correctly.
Concerning the spin cases: please note that the division of the spin cases is related to the memory requirements and not to parallelization.

Best regards,
Mihaly Kallay
