If you have problems during the execution of MRCC, please attach the output with an adequate description of your case, as well as the following:
  • the way mrcc was invoked
  • the way build.mrcc was invoked
  • the output of build.mrcc
  • compiler version (for example: ifort -V, gfortran -v)
  • blas/lapack versions
  • gcc and glibc versions

This information really helps us during troubleshooting :)

Threading-related segfaults in higher-order CC(n)(n+1) calculations

2 months 3 weeks ago #1118 by MXvo5e35
I've run into consistent segfaults when running some CC(n)(n+1) calculations with "many" threads (the errors occur most persistently with OMP_NUM_THREADS > 16). Truncated output of a sample run with OMP_NUM_THREADS=32 follows (see the bottom of this post for the MINP file):
 ************************ 2021-07-25 22:37:40 *************************
 Executing xmrcc...

 **********************************************************************
 CC(5)(6) calculation 
 
 
 Allocation of 100.0 Mbytes of memory...
 Number of spinorbitals:                    26
 Number of alpha electrons:                        5
 Number of beta electrons:                         5
 Spin multiplicity:                     1
 z-component of spin:  0.0
 Spatial symmetry is not used.
 Convergence criterion:  1.0E-06
 Construction of occupation graphs...
 Number of                     0 -fold excitations:                      1
 Number of                     1 -fold excitations:                     80
 Number of                     2 -fold excitations:                   2160
 Number of                     3 -fold excitations:                  23520
 Number of                     4 -fold excitations:                 123900
 Number of                     5 -fold excitations:                 341712
 Total number of configurations:                 491373
 Calculation of coupling coefficients...
 Length of intermediate file (Mbytes):      60.5
 
 ======================================================================
 
 Spin case  1   Alpha:  1   Beta:  5
 Number of excitations:                   2240
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line        Source             
xmrcc              0000000001505BD3  Unknown               Unknown  Unknown
xmrcc              00000000014F5A70  Unknown               Unknown  Unknown
xmrcc              0000000000442431  Unknown               Unknown  Unknown
xmrcc              000000000043CD2C  Unknown               Unknown  Unknown
xmrcc              000000000041CF83  Unknown               Unknown  Unknown
xmrcc              00000000004074D2  Unknown               Unknown  Unknown
xmrcc              0000000000401E52  Unknown               Unknown  Unknown
xmrcc              00000000015BD146  Unknown               Unknown  Unknown
xmrcc              00000000015BD735  Unknown               Unknown  Unknown
xmrcc              0000000000401D29  Unknown               Unknown  Unknown
 
 Fatal error in exec xmrcc.
 Program will stop.
 
 ************************ 2021-07-25 22:37:40 *************************
                   Error at the termination of mrcc.
 **********************************************************************

The issue doesn't seem to be related to bad compilation, as I can reproduce it with the pre-built binaries as well as a hand-compiled version of MRCC (using various recent versions of the Intel suite). I've also tried it on multiple machines (Ubuntu and CentOS), and have tested both the patched binaries and the patched source files. The error persists consistently.

I notice that for lower numbers of threads -- where these runs don't segfault -- the output explicitly indicates that OpenMP is being used. That line is absent in the failing runs. Also, in the failing runs, the "allocated memory" seems (at least from a quick check) to always be a round 100 MB, which doesn't match the values I see for runs that execute correctly. For example, with OMP_NUM_THREADS=8:
 ************************ 2021-07-25 22:42:37 *************************
 Executing mrcc...

 **********************************************************************
 CC(5)(6) calculation                                                   
 
 
 OpenMP parallel version is running.
 Number of CPU cores:   8
 Allocation of   82.2 Mbytes of memory...
 Number of spinorbitals:  26
 Number of alpha electrons:  5
 Number of beta  electrons:  5
 Spin multiplicity: 1
 z-component of spin:  0.0
 Spatial symmetry is not used.
 Convergence criterion:  1.0E-06
 Construction of occupation graphs...
...
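
The two symptoms described here (no OpenMP banner, and an allocation of exactly 100.0 Mbytes) can be checked mechanically across many runs. The following is a minimal Python sketch, assuming the output lines look exactly like the excerpts in this post; the function name `diagnose` and the shortened sample strings are hypothetical and not part of MRCC.

```python
import re

def diagnose(log_text):
    """Return (openmp_banner_present, allocated_mbytes or None) for one log."""
    # The banner string is copied from the working 8-thread output above.
    openmp = "OpenMP parallel version is running." in log_text
    # Tolerates variable spacing, e.g. "Allocation of   82.2 Mbytes of memory..."
    match = re.search(r"Allocation of\s+([0-9.]+)\s+Mbytes of memory", log_text)
    mbytes = float(match.group(1)) if match else None
    return openmp, mbytes

# Shortened stand-ins for the two outputs quoted in this post.
failing = """ CC(5)(6) calculation
 Allocation of 100.0 Mbytes of memory...
"""
working = """ OpenMP parallel version is running.
 Number of CPU cores:   8
 Allocation of   82.2 Mbytes of memory...
"""

print(diagnose(failing))
print(diagnose(working))
```

Running it over the saved output for each OMP_NUM_THREADS value would show at which thread count the banner disappears.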

I'm a brand-new MRCC user so it is possible (likely!) that I'm making a configuration error somewhere. If so, I'd be really grateful to know. With that in mind, here is an MINP file that can reproduce the issue. When run with OMP_NUM_THREADS=32, this breaks on every machine I've tried, from a two-core laptop up to a 32-core cluster node.
basis=6-31g
calc=cc(5)(6)
mem=4000MB

gauss=spher
symm=off
core=corr

unit=angstrom
geom=xyz
3

O 0.0         0.0 0.0
H +0.75695206 0.0 0.58588362
H -0.75695206 0.0 0.58588362

2 months 3 weeks ago #1119 by kallay
Please replace the following lines in goldstone.f (lines 3085-3086):

      write(gfile,*) op,nvirt,nocc,min(max(maxmem,100.d0),
     $dble(memory)),ihf

with

      write(gfile,*) op,nvirt,nocc,dble(memory),ihf

and recompile the code. That will fix your problem.

Best regards,
Mihaly Kallay
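
For readers wondering what the removed expression does numerically: it raises `maxmem` to at least 100 and then caps the result at `memory`, whereas the replacement writes `memory` unchanged. Below is a small Python sketch of that arithmetic only. The input values are made up, the thread does not say what `maxmem` holds or in which units, and no claim is made here about why the old expression leads to the segfault.

```python
def old_value(maxmem, memory):
    # Mirrors min(max(maxmem,100.d0),dble(memory)) from the removed line.
    return min(max(maxmem, 100.0), float(memory))

def new_value(memory):
    # Mirrors dble(memory) from the replacement line.
    return float(memory)

memory = 4000  # hypothetical; borrowed from mem=4000MB in the MINP above
for maxmem in (60.5, 100.0, 2500.0, 9000.0):  # made-up values
    print(maxmem, old_value(maxmem, memory), new_value(memory))
```

With any `maxmem` at or below 100, the old expression yields exactly 100.0, while the new one always yields the full `memory` value.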

2 months 3 weeks ago #1120 by MXvo5e35
Thank you for the quick response!

That does indeed seem to fix the issue. Very much appreciated.

2 months 3 weeks ago #1121 by Nike
Will this fix be automatically present in the next version, or is it something we should all note down to change ourselves when we download the next version? Has there been any consideration to make available a Git repository so that each user can sync their own branch to the Master repository? I have quite a few custom-made changes in my version (many of them were suggestions made here in the MRCC forum for when I was having a problem with something very specific).

2 months 3 weeks ago #1122 by kallay
Yes, it will be included in the next release.

Best regards,
Mihaly Kallay
