If you have problems during the execution of MRCC, please attach the output with an adequate description of your case, as well as the following:
  • the way mrcc was invoked
  • the way build.mrcc was invoked
  • the output of build.mrcc
  • compiler version (for example: ifort -V, gfortran -v)
  • blas/lapack versions
  • gcc and glibc versions

This information really helps us during troubleshooting :)

Problems with CCSDT restart

5 years 1 month ago #143 by bakowies
Dear developers,
dear Mihaly,

Sorry to come back to you again so soon, but there is
another problem that has been bugging me from time to
time. When running CCSDT(Q) calculations, I often converge
just the CCSDT part using only a few CPU cores, abort the calculation, and
then restart using 16 or more cores. I do this because queue
time limits force me to split calculations, and also because
I have found that the (Q) part seems to benefit significantly
more from parallel execution than the preceding CCSDT iterations.
I typically abort the initial calculation once it has printed
the timing for "Spin case 1", because experience has shown
that I can estimate the full (Q) time from the time needed
for this initial step alone and the scaling with the number of
cores used.

What I always do is just copy over the "fort.16" into the new
work directory on scratch (identity checked with md5sum) and
then restart the calculation with an appropriately modified
input file (restart option, increased memory for more cores)
and of course the same basis set file.
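
The copy-and-verify step described above could be scripted along the following lines. This is only a minimal sketch: the function names `md5sum` and `stage_restart_file` and the directory arguments are hypothetical, and it covers just the file staging with a checksum comparison, not the MRCC input changes or the job submission.

```python
import hashlib
import shutil
from pathlib import Path


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def stage_restart_file(old_workdir, new_workdir, name="fort.16"):
    """Copy the restart file into the new work directory and verify the copy.

    The directory names are assumptions; adapt them to the scratch layout
    of the cluster in question.
    """
    source = Path(old_workdir) / name
    target = Path(new_workdir) / name
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(source, target)
    # Compare checksums of source and copy, as done by hand with md5sum.
    if md5sum(source) != md5sum(target):
        raise IOError(f"checksum mismatch for {target}")
    return target
```

A matching checksum only shows that the file was transferred intact; it says nothing about whether the amplitudes stored in it are interpreted the same way by the restarted run.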

This often works fine: after the SCF and integral transformation,
I see the CCSDT converged (or almost so) in the first CCSDT
cycle, and the perturbative (Q) part starts right
after that.

In a number of cases, however, this setup has failed. The output
does confirm that start vectors have been read from fort.16,
but the calculation then seems to start from vectors very far from the solution.
The initial energy and residual norm differ from the
initial energy and residual norm of the first calculation, but
usually not by much.

I attach outputs for the last example where I saw the mentioned
problem. The same binary (2014 version, build.mrcc Intel -OMP
with Intel version 2014 (14.0.1) and MKL libraries) has been used
for both runs (on different nodes of the same cluster).

What I note is the following:

a) The initial run reports, as the energy change for the first CCSDT
iteration, the difference with respect to the HF solution.
The follow-up run, however, reports an energy change that does
not seem to correspond to either the HF solution or any number
printed in either run.

b) There is a tiny difference between the runs: only the initial run
prints a "date" line immediately after the statement that
integrals are written to fort.55.

Any idea of what might be going wrong? For this particular case
I still have the fort.16 that was written during the initial run
and was current when I aborted it, so I could test any
suggestion.

Thanks a lot.

Best regards,

Dirk Bakowies

5 years 1 month ago #144 by bakowies
For some reason I do not see the files that I tried to attach.
I shall try to attach them again....
Dirk

5 years 1 month ago #145 by kallay
Replied by kallay on topic Problems with CCSDT restart
Dear Dirk,
I would suggest running a CCSDT calculation first, then restarting the CCSDT(Q) calculation from the converged CCSDT amplitudes. That is safe.

Best regards,
Mihaly Kallay
