"Fatal error in ovirt" during big CCSDT(Q)

1 year 7 months ago - 1 year 7 months ago #430 by Nike
I am getting an error with no description when running CCSDT(Q) on a system with 754 spatial orbitals. It took 7 days to do the AO to MO transformation, and then it crashed with no information about what went wrong:
RETURNING FROM SCF ALGORITHM
 ======================================================================

 ************************ 2017-04-02 04:31:15 *************************
 Executing ovirt...

 ovirt, the routine of orbital optimization and integral transformation
Sun Apr  2 04:31:15 EDT 2017
 Allocated memory:                  716800  Mb
 UHF calculation!
 integral transformation: AOs --- MOs (alpha-alpha)
 # of basis functions, # of int. blocks   754     4
 25 %
 50 %
 75 %
100 %
 second part
 25 %
 50 %
 75 %
100 %
 integral transformation is completed!
 integral transformation: AOs ------ MOs (beta-beta)
 # of basis functions, # of int. blocks   754     4
 25 %
 50 %
 75 %
100 %
 second part
 25 %
 50 %
 75 %
100 %
 integral transformation is completed!
 integral transformation: AOs ----- MOs (alpha-beta)
 size                2274064                     1
# of basis functions, # of int. blocks   754     4
 integral transformation
========================================

 Fatal error in ovirt.
 Program will stop.

 ************************ 2017-04-09 21:18:17 *************************
                   Error at the termination of mrcc.
 **********************************************************************

Is there any way to determine what the cause of the error was?
I suspect it might just be too big of a problem, since this error did not occur when doing just CCSDT.
However, if you already know of a limit (for example, somewhere beyond 1000 spin-orbitals) caused by something like int32 being used instead of int64, I would be interested to understand what is going on.
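To put a number on the int32 question, here is a rough counting sketch. It assumes real orbitals with the usual 8-fold permutational symmetry of the two-electron integrals and says nothing about how MRCC actually indexes or packs them; the function name is made up for illustration.

```python
# Rough counting only: how MRCC indexes or stores integrals is not stated
# in this thread, so this only shows which counts would not fit in int32.
INT32_MAX = 2**31 - 1


def unique_two_electron_integrals(n: int) -> int:
    """Count symmetry-unique (pq|rs) integrals for n real orbitals."""
    pairs = n * (n + 1) // 2          # unique (pq) index pairs
    return pairs * (pairs + 1) // 2   # unique pairs of pairs


n_basis = 754                         # from the ovirt output above
n_pairs = n_basis * (n_basis + 1) // 2
n_unique = unique_two_electron_integrals(n_basis)
n_full = n_basis**4                   # no symmetry exploited at all

print("orbital pairs:      ", n_pairs, "fits in int32:", n_pairs <= INT32_MAX)
print("unique integrals:   ", n_unique, "fits in int32:", n_unique <= INT32_MAX)
print("full N^4 array size:", n_full, "fits in int32:", n_full <= INT32_MAX)
```

So a pair index is harmless, but any counter or offset that runs over the whole integral list for 754 orbitals would need 64-bit integers. Whether ovirt has such a counter is exactly the open question.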

My MINP and GENBAS are below:
# TITLE
basis=aug-cc-pCV7Zi
uncontract=off
calc=CCSDT(Q)
mem=700GB
core=corr
cctol=9
ccmaxit=999
scfmaxit=9999
scftype=ROHF
scfiguess=restart
rohftype=semicanonical
mult=3
rest=2
geom
Li
Li 1 R

R=4.1700

unit=angstroms

LI:aug-cc-pCV7Zi
k,Li,0.3320,0.1719

 14
    0    0    1    1    2    2    3    3    4    4    5    5    6    6
    9    6    8    6    7    5    6    4    5    3    4    2    3    1
   19    6   11    6    7    5    6    4    5    3    4    2    3    1

          245172. 35782.1 7999.22 2229.05 715.097
          254.300 98.2097 40.4968 17.5830 7.94333
          3.68942 1.74673 0.839107 0.399736 0.120210
          0.0680452 0.0348770 0.0174420 0.0054

0.00000130 0.00000020 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.00001044 0.00000163 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.00005612 0.00000877 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.00024246 0.00003788 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.00090369 0.00014136 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.00299226 0.00046826 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.00894604 0.00140646 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.02436390 0.00385493 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.05971695 0.00962266 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.12761085 0.02118297 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.22760455 0.04028382 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.31591578 0.06437613 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0        0.0        1.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0        0.0        0.0 1.0 0.0 0.0 0.0 0.0 0.0
0.0        0.0        0.0 0.0 1.0 0.0 0.0 0.0 0.0
0.0        0.0        0.0 0.0 0.0 1.0 0.0 0.0 0.0
0.0        0.0        0.0 0.0 0.0 0.0 1.0 0.0 0.0
0.0        0.0        0.0 0.0 0.0 0.0 0.0 1.0 0.0
0.0        0.0        0.0 0.0 0.0 0.0 0.0 0.0 1.0

          72.0571 39.6354 21.8016 11.9921 6.5963 3.6283

1.0 0.0 0.0 0.0 0.0 0.0
0.0 1.0 0.0 0.0 0.0 0.0
0.0 0.0 1.0 0.0 0.0 0.0
0.0 0.0 0.0 1.0 0.0 0.0
0.0 0.0 0.0 0.0 1.0 0.0
0.0 0.0 0.0 0.0 0.0 1.0

          57.1550 13.5136 4.32567 1.58025 0.610225
         0.250245 0.110681 0.0519838 0.0253946 0.0124506
         0.0039

0.00008775 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.00073321 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.00353008 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.01153326 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.00000000 1.0 0.0 0.0 0.0 0.0 0.0 0.0
0.00000000 0.0 1.0 0.0 0.0 0.0 0.0 0.0
0.00000000 0.0 0.0 1.0 0.0 0.0 0.0 0.0
0.00000000 0.0 0.0 0.0 1.0 0.0 0.0 0.0
0.00000000 0.0 0.0 0.0 0.0 1.0 0.0 0.0
0.00000000 0.0 0.0 0.0 0.0 0.0 1.0 0.0
0.00000000 0.0 0.0 0.0 0.0 0.0 0.0 1.0

         78.7240 41.3309 21.6991 11.3923 5.9811 3.1401

1.0 0.0 0.0 0.0 0.0 0.0
0.0 1.0 0.0 0.0 0.0 0.0
0.0 0.0 1.0 0.0 0.0 0.0
0.0 0.0 0.0 1.0 0.0 0.0
0.0 0.0 0.0 0.0 1.0 0.0
0.0 0.0 0.0 0.0 0.0 1.0

         0.9900 0.5445 0.2995 0.1648 0.0906 0.0498 0.0274

1.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 1.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 1.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 1.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 1.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 1.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 1.0

         51.9421 25.1052 12.1341 5.8648 2.8346

1.0 0.0 0.0 0.0 0.0
0.0 1.0 0.0 0.0 0.0
0.0 0.0 1.0 0.0 0.0
0.0 0.0 0.0 1.0 0.0
0.0 0.0 0.0 0.0 1.0

         0.8899 0.4895 0.2692 0.1481 0.0815 0.0448

1.0 0.0 0.0 0.0 0.0 0.0
0.0 1.0 0.0 0.0 0.0 0.0
0.0 0.0 1.0 0.0 0.0 0.0
0.0 0.0 0.0 1.0 0.0 0.0
0.0 0.0 0.0 0.0 1.0 0.0
0.0 0.0 0.0 0.0 0.0 1.0

         36.4855,16.4805,7.4442,3.3626

1.0 0.0 0.0 0.0
0.0 1.0 0.0 0.0
0.0 0.0 1.0 0.0
0.0 0.0 0.0 1.0

         0.6957 0.3827 0.2105 0.1158 0.0639

1.0 0.0 0.0 0.0 0.0
0.0 1.0 0.0 0.0 0.0
0.0 0.0 1.0 0.0 0.0
0.0 0.0 0.0 1.0 0.0
0.0 0.0 0.0 0.0 1.0

         25.5554 10.4854 4.3022

1.0 0.0 0.0
0.0 1.0 0.0
0.0 0.0 1.0

         0.4960 0.2728 0.1501 0.0825

1.0 0.0 0.0 0.0
0.0 1.0 0.0 0.0
0.0 0.0 1.0 0.0
0.0 0.0 0.0 1.0

        18.4523 6.8097

1.0 0.0
0.0 1.0

         0.3922 0.2157 0.1187

1.0 0.0 0.0
0.0 1.0 0.0
0.0 0.0 1.0

       14.8820

1.0

If you want to test and need access to a node with sufficient RAM, I would be happy to set you up.

With best wishes!!
Nike Dattani
Last edit: 1 year 7 months ago by Nike.

1 year 7 months ago #431 by rolik
Dear Nike,

The ovirt program performs the integral transformation, and it should work in exactly the same way for CCSDT and CCSDT(Q) calculations. Since it worked for CCSDT, either there is a bug causing a non-deterministic error or ovirt simply ran out of disk space.
Can you check this second option?
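One quick way to check is to compare the free space in the run directory with a rough size estimate. The sketch below assumes 8-byte reals and a single symmetry-packed integral list; the real fort.* files may be smaller (point-group symmetry) or larger (several spin blocks, intermediate files), so treat the number as an order of magnitude only. Both helper names are hypothetical.

```python
import shutil

BYTES_PER_DOUBLE = 8  # assumption: integrals stored as 8-byte reals


def unique_integral_bytes(n_basis: int) -> int:
    """Bytes for one symmetry-packed list of (pq|rs) over n_basis orbitals."""
    pairs = n_basis * (n_basis + 1) // 2
    return pairs * (pairs + 1) // 2 * BYTES_PER_DOUBLE


def has_room(path: str, needed_bytes: int) -> bool:
    """True if the filesystem holding `path` has at least needed_bytes free."""
    return shutil.disk_usage(path).free >= needed_bytes


needed = unique_integral_bytes(754)
print(f"one packed integral list: about {needed / 1e9:.0f} GB")
print("enough free space in the current directory:", has_room(".", needed))
```

Run it from the directory that holds the MINP file, since that is where the files end up.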

Best wishes, Zoltan Rolik

1 year 7 months ago #434 by Nike
Dear Zoltan,
Thank you very much for your reply!
I have more disk space in the /scratch/ directory, but the files are written in my main directory.
Is there a way to change the path where the fort.* files are written? I searched for "scratch" and "directory" in the MRCC documentation but could not find anything.
With best wishes!
Nike Dattani

1 year 7 months ago #442 by jcsontos
Dear Nike,

Each and every file generated during an MRCC run is written into the directory where the MINP file is located (this is the folder where you started the job). Therefore, if you want them in your scratch drive you should start the MRCC job there.
If you have a job queuing system, please make sure that you copy the output back into your home directory before the scratch is purged by the system.
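A minimal sketch of that workflow is below. The function name, the `mrcc.out` file name and the directory layout are assumptions for illustration, not part of MRCC; the only thing taken from the above is that the job must be started in the directory that contains MINP.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def run_in_scratch(minp, scratch_root, results_dir, command, extra_inputs=()):
    """Run `command` inside a fresh scratch directory and copy its output back."""
    results_dir = Path(results_dir)
    results_dir.mkdir(parents=True, exist_ok=True)

    # A unique working directory keeps concurrent jobs from sharing fort.* files.
    workdir = Path(tempfile.mkdtemp(prefix="mrcc_", dir=scratch_root))

    # The input must be called MINP and sit in the directory where the job starts.
    shutil.copy(minp, workdir / "MINP")
    for extra in extra_inputs:  # e.g. a GENBAS file, if your setup expects one there
        shutil.copy(extra, workdir / Path(extra).name)

    out_path = workdir / "mrcc.out"  # hypothetical output file name
    with open(out_path, "w") as out:
        result = subprocess.run(
            command, cwd=workdir, stdout=out, stderr=subprocess.STDOUT
        )

    # Copy the small text output home before the scratch area is purged;
    # the large fort.* files are deliberately left behind in scratch.
    saved = results_dir / f"{workdir.name}.out"
    shutil.copy(out_path, saved)
    return result.returncode, saved
```

For example, assuming the MRCC driver `dmrcc` is on your PATH and `/scratch/nike` is a placeholder for your scratch area: `run_in_scratch("MINP", "/scratch/nike", "results", ["dmrcc"], extra_inputs=["GENBAS"])`.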

Best,
Jozsef

1 year 6 months ago #445 by Nike
Dear Jozsef,
Thank you very much for the reply!
I just got back from a conference and am now catching up on emails.

Many programs, such as MOLPRO and DIRAC, let us choose a directory for temporary files that is different from where the input, output and data (basis set) files are. That would be convenient in MRCC as well, since the fort.* files can get huge.

However, I'm happy for now to run the jobs in /scratch/ and then move the final results to /home/ later.

I still believe there is a problem with huge basis sets that more disk space will not fix, but I will try everything again to be completely sure before writing again.

With best wishes!
Nike
