If you have problems during the execution of MRCC, please attach the output with an adequate description of your case as well as the following:
  • the way mrcc was invoked
  • the way build.mrcc was invoked
  • the output of build.mrcc
  • compiler version (for example: ifort -V, gfortran -v)
  • blas/lapack versions
  • gcc and glibc versions

This information really helps us during troubleshooting :)
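
For anyone who wants to gather the compiler and C library details in one go, here is a minimal sketch. It is not part of MRCC. The function name `collect_versions` and the `DEFAULT_COMMANDS` table are made up for illustration, and it only covers the compiler commands named in the list above plus `gcc --version`. BLAS/LAPACK versions and the mrcc/build.mrcc invocations still need to be added by hand.

```python
import platform
import shutil
import subprocess


def collect_versions(commands):
    """Run each version command and return {label: captured output}."""
    report = {}
    for label, argv in commands.items():
        # Skip tools that are not on PATH instead of raising.
        if shutil.which(argv[0]) is None:
            report[label] = "not found"
            continue
        try:
            done = subprocess.run(
                argv,
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,  # some compilers print version info to stderr
                text=True,
                timeout=30,
            )
        except (OSError, subprocess.TimeoutExpired) as exc:
            report[label] = f"failed: {exc}"
            continue
        report[label] = done.stdout.strip()
    return report


# Hypothetical default table; adjust to the compilers you actually used.
DEFAULT_COMMANDS = {
    "ifort": ["ifort", "-V"],
    "gfortran": ["gfortran", "-v"],
    "gcc": ["gcc", "--version"],
}

if __name__ == "__main__":
    for label, text in collect_versions(DEFAULT_COMMANDS).items():
        print(f"== {label} ==\n{text}\n")
    # platform.libc_ver() reports the C library the Python interpreter is linked against.
    print("libc:", " ".join(platform.libc_ver()))
```

The printed report can be pasted into a post as is. Tools that are not installed show up as "not found" rather than aborting the script.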

Threading-related segfaults in higher-order CC(n)(n+1) calculations

3 years 4 months ago #1118 by MXvo5e35
I'm running into consistent segfaults when running some CC(n)(n+1) calculations using "many" threads (the errors occur most persistently with OMP_NUM_THREADS > 16). Truncated output of a sample run with OMP_NUM_THREADS=32 follows (see the bottom of this post for the MINP):
Code:
 ************************ 2021-07-25 22:37:40 *************************
 Executing xmrcc...
 **********************************************************************
 CC(5)(6) calculation

 Allocation of 100.0 Mbytes of memory...
 Number of spinorbitals:                    26
 Number of alpha electrons:                        5
 Number of beta electrons:                         5
 Spin multiplicity:                     1
 z-component of spin:  0.0
 Spatial symmetry is not used.
 Convergence criterion:  1.0E-06
 Construction of occupation graphs...
 Number of                     0 -fold excitations:                      1
 Number of                     1 -fold excitations:                     80
 Number of                     2 -fold excitations:                   2160
 Number of                     3 -fold excitations:                  23520
 Number of                     4 -fold excitations:                 123900
 Number of                     5 -fold excitations:                 341712
 Total number of configurations:                 491373
 Calculation of coupling coefficients...
 Length of intermediate file (Mbytes):      60.5

 ======================================================================

 Spin case  1   Alpha:  1   Beta:  5
 Number of excitations:                   2240
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line        Source
xmrcc              0000000001505BD3  Unknown               Unknown  Unknown
xmrcc              00000000014F5A70  Unknown               Unknown  Unknown
xmrcc              0000000000442431  Unknown               Unknown  Unknown
xmrcc              000000000043CD2C  Unknown               Unknown  Unknown
xmrcc              000000000041CF83  Unknown               Unknown  Unknown
xmrcc              00000000004074D2  Unknown               Unknown  Unknown
xmrcc              0000000000401E52  Unknown               Unknown  Unknown
xmrcc              00000000015BD146  Unknown               Unknown  Unknown
xmrcc              00000000015BD735  Unknown               Unknown  Unknown
xmrcc              0000000000401D29  Unknown               Unknown  Unknown

 Fatal error in exec xmrcc.
 Program will stop.

 ************************ 2021-07-25 22:37:40 *************************
                   Error at the termination of mrcc.
 **********************************************************************

The issue doesn't seem to be related to bad compilation, as I can reproduce it with the pre-built binaries as well as a hand-compiled version of MRCC (using various recent versions of the Intel suite). I've also tried it on multiple machines (Ubuntu and CentOS), and have tested both the patched binaries and the patched source files. The error persists consistently.

I notice that for lower numbers of threads -- where these runs don't segfault -- the output explicitly indicates that OpenMP is being used. This is absent in a failure case. Also, in a failure case, the "allocated memory" seems to be (at least from a quick check) always a round 100MB, which doesn't match the patterns I see for correctly executing cases. For example, with OMP_NUM_THREADS=8:
Code:
 ************************ 2021-07-25 22:42:37 *************************
 Executing mrcc...
 **********************************************************************
 CC(5)(6) calculation

 OpenMP parallel version is running.
 Number of CPU cores:   8
 Allocation of   82.2 Mbytes of memory...
 Number of spinorbitals:  26
 Number of alpha electrons:  5
 Number of beta  electrons:  5
 Spin multiplicity: 1
 z-component of spin:  0.0
 Spatial symmetry is not used.
 Convergence criterion:  1.0E-06
 Construction of occupation graphs...
 ...

I'm a brand-new MRCC user so it is possible (likely!) that I'm making a configuration error somewhere. If so, I'd be really grateful to know. With that in mind, here is an MINP file that can reproduce the issue. When run with OMP_NUM_THREADS=32, this breaks on every machine I've tried, from a two-core laptop up to a 32-core cluster node.
Code:
basis=6-31g
calc=cc(5)(6)
mem=4000MB
gauss=spher
symm=off
core=corr
unit=angstrom
geom=xyz
3

O 0.0         0.0 0.0
H +0.75695206 0.0 0.58588362
H -0.75695206 0.0 0.58588362
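
To check at which thread count the failure sets in, the comparison described in this post can be scripted. The following is only a rough sketch under some assumptions: the names `summarize_output` and `sweep` are made up here, the command to launch MRCC is passed in by the caller rather than hard-coded, and the script assumes the output contains the same "Allocation of ... Mbytes" and "OpenMP parallel version is running" lines quoted above.

```python
import os
import re
import subprocess

# Patterns taken from the output quoted in the post.
ALLOC_RE = re.compile(r"Allocation of\s+([0-9.]+)\s+Mbytes")
OPENMP_MARKER = "OpenMP parallel version is running"


def summarize_output(text):
    """Extract the symptoms described in the post from MRCC output text."""
    match = ALLOC_RE.search(text)
    return {
        "openmp_reported": OPENMP_MARKER in text,
        "allocated_mbytes": float(match.group(1)) if match else None,
        "segfault": "SIGSEGV" in text,
    }


def sweep(command, thread_counts, workdir="."):
    """Run `command` once per thread count with OMP_NUM_THREADS set.

    `command` is whatever argv list starts MRCC on your system
    (an assumption of this sketch, not something the thread specifies).
    """
    results = {}
    for n in thread_counts:
        env = dict(os.environ, OMP_NUM_THREADS=str(n))
        done = subprocess.run(
            command,
            cwd=workdir,
            env=env,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
        )
        summary = summarize_output(done.stdout)
        summary["returncode"] = done.returncode
        results[n] = summary
    return results
```

Calling `sweep(your_mrcc_command, [8, 16, 32], workdir=...)` from the directory that holds the MINP returns one summary per thread count, so the run where the OpenMP line disappears and the allocation changes is easy to spot.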

3 years 4 months ago #1119 by kallay
Please replace the following lines in goldstone.f (lines 3085-3086):

      write(gfile,*) op,nvirt,nocc,min(max(maxmem,100.d0),
     $dble(memory)),ihf

with

      write(gfile,*) op,nvirt,nocc,dble(memory),ihf

and recompile the code. That will fix your problem.

Best regards,
Mihaly Kallay
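
For readers who want to see what the replaced expression does numerically, here is a small sketch of the arithmetic only. It is not MRCC code. The helper names `old_value` and `new_value` are made up, the `maxmem` values are invented, and using 4000 for `memory` (from `mem=4000MB` in the MINP) is an assumption about units. The thread does not explain why the change removes the segfault, so nothing here should be read as a diagnosis.

```python
def old_value(maxmem, memory):
    # Mirrors min(max(maxmem,100.d0),dble(memory)) from the original lines:
    # a floor of 100 on maxmem, then a cap at memory.
    return min(max(float(maxmem), 100.0), float(memory))


def new_value(memory):
    # Mirrors dble(memory) from the replacement line.
    return float(memory)


# memory=4000 is assumed from mem=4000MB; the maxmem values are invented.
for maxmem in (50.0, 82.2, 2500.0, 9000.0):
    print(maxmem, old_value(maxmem, 4000), new_value(4000))
```

With the old expression, any `maxmem` below 100 is raised to 100, values between 100 and `memory` pass through unchanged, and larger values are capped at `memory`. The replacement always writes `memory` itself.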

3 years 4 months ago #1120 by MXvo5e35
Thank you for the quick response!

That does indeed seem to fix the issue. Very much appreciated.

3 years 4 months ago #1121 by Nike
Will this fix be automatically present in the next version, or is it something we should all note down to change ourselves when we download the next version? Has there been any consideration to make available a Git repository so that each user can sync their own branch to the Master repository? I have quite a few custom-made changes in my version (many of them were suggestions made here in the MRCC forum for when I was having a problem with something very specific).

3 years 4 months ago #1122 by kallay
Yes, it will be included in the next release.

Best regards,
Mihaly Kallay
