If you have problems during the execution of MRCC, please attach the output with an adequate description of your case as well as the following:
- the way mrcc was invoked
- the way build.mrcc was invoked
- the output of build.mrcc
- compiler version (for example: ifort -V, gfortran -v)
- blas/lapack versions
- as well as gcc and glibc versions
This information really helps us during troubleshooting
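The version details in the checklist above can be gathered in one go. The following is a minimal sketch, not an official MRCC tool: the function name `collect_versions` and the command table are assumptions for illustration, `ldd --version` is used as one common way to query glibc, and BLAS/LAPACK versions are not covered because how to query them depends on the library in use.

```python
import shutil
import subprocess

# Version commands named in the checklist (ifort -V, gfortran -v), plus
# assumed ways of querying gcc and glibc. Adjust to your own toolchain.
VERSION_COMMANDS = {
    "ifort": ["ifort", "-V"],
    "gfortran": ["gfortran", "-v"],
    "gcc": ["gcc", "--version"],
    "glibc": ["ldd", "--version"],
}


def collect_versions(commands=VERSION_COMMANDS):
    """Run each version command and return {label: output or a short note}."""
    report = {}
    for label, argv in commands.items():
        if shutil.which(argv[0]) is None:
            report[label] = "not found on PATH"
            continue
        try:
            done = subprocess.run(
                argv,
                stdin=subprocess.DEVNULL,
                capture_output=True,
                text=True,
                timeout=30,
            )
        except (OSError, subprocess.SubprocessError) as exc:
            report[label] = f"failed to run: {exc}"
            continue
        # Some compilers print their banner to stderr, so keep both streams.
        report[label] = (done.stdout + done.stderr).strip()
    return report
```

Printing the returned dictionary and pasting it into the post, together with the mrcc and build.mrcc command lines, covers most of the list.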
Threading-related segfaults in higher-order CC(n)(n+1) calculations
3 years 4 months ago #1118
by MXvo5e35
Threading-related segfaults in higher-order CC(n)(n+1) calculations was created by MXvo5e35
I'm running into consistent segfaults when running some CC(n)(n+1) calculations using "many" threads (the errors occur most persistently with OMP_NUM_THREADS > 16). Truncated output of a sample run with OMP_NUM_THREADS=32 follows (see bottom of post for MINP):
Code:
************************ 2021-07-25 22:37:40 *************************
Executing xmrcc...
**********************************************************************
CC(5)(6) calculation
Allocation of 100.0 Mbytes of memory...
Number of spinorbitals: 26
Number of alpha electrons: 5
Number of beta electrons: 5
Spin multiplicity: 1
z-component of spin: 0.0
Spatial symmetry is not used.
Convergence criterion: 1.0E-06
Construction of occupation graphs...
Number of 0 -fold excitations: 1
Number of 1 -fold excitations: 80
Number of 2 -fold excitations: 2160
Number of 3 -fold excitations: 23520
Number of 4 -fold excitations: 123900
Number of 5 -fold excitations: 341712
Total number of configurations: 491373
Calculation of coupling coefficients...
Length of intermediate file (Mbytes): 60.5
======================================================================
Spin case 1 Alpha: 1 Beta: 5
Number of excitations: 2240
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
xmrcc 0000000001505BD3 Unknown Unknown Unknown
xmrcc 00000000014F5A70 Unknown Unknown Unknown
xmrcc 0000000000442431 Unknown Unknown Unknown
xmrcc 000000000043CD2C Unknown Unknown Unknown
xmrcc 000000000041CF83 Unknown Unknown Unknown
xmrcc 00000000004074D2 Unknown Unknown Unknown
xmrcc 0000000000401E52 Unknown Unknown Unknown
xmrcc 00000000015BD146 Unknown Unknown Unknown
xmrcc 00000000015BD735 Unknown Unknown Unknown
xmrcc 0000000000401D29 Unknown Unknown Unknown
Fatal error in exec xmrcc.
Program will stop.
************************ 2021-07-25 22:37:40 *************************
Error at the termination of mrcc.
**********************************************************************
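As a sanity check that the run got through setup correctly before crashing, the excitation counts in the output above can be reproduced by simple counting. This is a sketch under stated assumptions, not MRCC's own algorithm: it assumes no spatial symmetry (as the log says), equal numbers of alpha and beta spinorbitals, and that an n-fold excitation is any choice of occupied and virtual spinorbitals that keeps the numbers of alpha and beta electrons fixed. The function name `count_excitations` is made up for this example.

```python
from math import comb


def count_excitations(n_alpha, n_beta, n_spinorbitals, max_level):
    """Count n-fold excitations (n = 0..max_level) that conserve the
    numbers of alpha and beta electrons, ignoring spatial symmetry."""
    n_spatial = n_spinorbitals // 2
    virt_alpha = n_spatial - n_alpha
    virt_beta = n_spatial - n_beta
    counts = []
    for level in range(max_level + 1):
        total = 0
        # Split the excitation level into ka alpha and kb beta excitations.
        for ka in range(level + 1):
            kb = level - ka
            total += (comb(n_alpha, ka) * comb(virt_alpha, ka)
                      * comb(n_beta, kb) * comb(virt_beta, kb))
        counts.append(total)
    return counts


counts = count_excitations(n_alpha=5, n_beta=5, n_spinorbitals=26, max_level=5)
print(counts)       # [1, 80, 2160, 23520, 123900, 341712]
print(sum(counts))  # 491373
```

These values match the "Number of n -fold excitations" and "Total number of configurations" lines in the log.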
The issue doesn't seem to be related to bad compilation, as I can reproduce it with the pre-built binaries as well as a hand-compiled version of MRCC (using various recent versions of the Intel suite). I've also tried it on multiple machines (Ubuntu and CentOS), and have tested both the patched binaries and the patched source files. The error persists consistently.
I notice that for lower numbers of threads -- where these runs don't segfault -- the output explicitly indicates that OpenMP is being used. This is absent in a failure case. Also, in a failure case, the "allocated memory" seems to be (at least from a quick check) always a round 100MB, which doesn't match the patterns I see for correctly executing cases. For example, with OMP_NUM_THREADS=8:
Code:
************************ 2021-07-25 22:42:37 *************************
Executing mrcc...
**********************************************************************
CC(5)(6) calculation
OpenMP parallel version is running.
Number of CPU cores: 8
Allocation of 82.2 Mbytes of memory...
Number of spinorbitals: 26
Number of alpha electrons: 5
Number of beta electrons: 5
Spin multiplicity: 1
z-component of spin: 0.0
Spatial symmetry is not used.
Convergence criterion: 1.0E-06
Construction of occupation graphs...
...
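When comparing many runs at different thread counts, the three markers mentioned above (the OpenMP banner, the allocated memory, and the segfault message) can be pulled out of each output automatically. This is a small sketch that assumes the output format is exactly as in the two excerpts in this post; `summarize_run` is a hypothetical helper name.

```python
import re

OPENMP_BANNER = "OpenMP parallel version is running."
ALLOC_RE = re.compile(r"Allocation of\s+([0-9.]+)\s+Mbytes of memory")


def summarize_run(log_text):
    """Extract the OpenMP banner, allocated memory and segfault markers."""
    match = ALLOC_RE.search(log_text)
    return {
        "openmp_banner": OPENMP_BANNER in log_text,
        "allocated_mbytes": float(match.group(1)) if match else None,
        "segfault": "SIGSEGV" in log_text,
    }


failing = """CC(5)(6) calculation
Allocation of 100.0 Mbytes of memory...
forrtl: severe (174): SIGSEGV, segmentation fault occurred
"""
working = """CC(5)(6) calculation
OpenMP parallel version is running.
Number of CPU cores: 8
Allocation of 82.2 Mbytes of memory...
"""
print(summarize_run(failing))
print(summarize_run(working))
```

Applied to the excerpts here, the failing run shows no banner, 100.0 Mbytes and a segfault, while the 8-thread run shows the banner and 82.2 Mbytes.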
I'm a brand-new MRCC user so it is possible (likely!) that I'm making a configuration error somewhere. If so, I'd be really grateful to know. With that in mind, here is an MINP file that can reproduce the issue. When run with OMP_NUM_THREADS=32, this breaks on every machine I've tried, from a two-core laptop up to a 32-core cluster node.
Code:
basis=6-31g
calc=cc(5)(6)
mem=4000MB
gauss=spher
symm=off
core=corr
unit=angstrom
geom=xyz
3
O 0.0 0.0 0.0
H +0.75695206 0.0 0.58588362
H -0.75695206 0.0 0.58588362
3 years 4 months ago #1119
by kallay
Replied by kallay on topic Threading-related segfaults in higher-order CC(n)(n+1) calculations
Please replace the following lines in goldstone.f (lines 3085-3086):
      write(gfile,*) op,nvirt,nocc,min(max(maxmem,100.d0),
     $dble(memory)),ihf
with
      write(gfile,*) op,nvirt,nocc,dble(memory),ihf
and recompile the code. That will fix your problem.
Best regards,
Mihaly Kallay
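To see what the patch in the reply above changes numerically, the old and new expressions can be mirrored in a few lines. This is only an illustration under assumptions: what `maxmem` represents inside goldstone.f is not stated in this thread, the sample `maxmem` values are hypothetical, and `memory` is taken to be in the same units, set to the 4000 from `mem=4000MB` in the MINP.

```python
def clamped_memory(maxmem, memory, floor=100.0):
    """Python mirror of the original min(max(maxmem,100.d0),dble(memory))."""
    return min(max(maxmem, floor), float(memory))


def patched_memory(memory):
    """Python mirror of the replacement expression dble(memory)."""
    return float(memory)


memory = 4000  # from mem=4000MB in the MINP of the first post
for maxmem in (50.0, 82.2, 100.0, 250.0):  # hypothetical values
    print(maxmem, clamped_memory(maxmem, memory), patched_memory(memory))
```

With the original expression, any `maxmem` at or below 100 is written out as exactly 100.0, which may be related to the round 100 Mbytes allocation seen in the failing runs. After the patch the full `memory` value is always written.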
3 years 4 months ago #1120
by MXvo5e35
Replied by MXvo5e35 on topic Threading-related segfaults in higher-order CC(n)(n+1) calculations
Thank you for the quick response!
That does indeed seem to fix the issue. Very much appreciated.
3 years 4 months ago #1121
by Nike
Replied by Nike on topic Threading-related segfaults in higher-order CC(n)(n+1) calculations
Will this fix be automatically present in the next version, or is it something we should all note down to change ourselves when we download the next version? Has there been any consideration of making a Git repository available so that each user can sync their own branch with the master repository? I have quite a few custom changes in my version (many of them were suggestions made here in the MRCC forum when I was having a problem with something very specific).
3 years 4 months ago #1122
by kallay
Replied by kallay on topic Threading-related segfaults in higher-order CC(n)(n+1) calculations
Yes, it will be included in the next release.
Best regards,
Mihaly Kallay