
Best practices for parallel performance/scaling

MXvo5e35 • Topic Author • New Member
3 years 2 months ago #1132 by MXvo5e35
OK, thanks for the clarification. I do appreciate the pointers!

Your idea re: the use of DF for extrapolation is indeed interesting. In fact, the basic setting of my problem revolves around calibration of a composite scheme, so I'm already considering these approaches. As I mentioned, DF adds some more caveats to the extrapolation process, and I have to admit that I'm not entirely up to speed on the theory. Do you possibly have a reference investigating the accuracy of e.g. CBS extrapolation schemes using DF vs. those without?
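
For reference, the kind of extrapolation I'm calibrating is the standard two-point inverse-cube formula for the correlation energy, E(X) = E_CBS + A·X^(-3); a minimal sketch, with made-up energies purely for illustration:

    # Two-point CBS extrapolation of the correlation energy, assuming the
    # Helgaker-type form E(X) = E_CBS + A * X**(-3); the HF part converges
    # faster and is usually taken from the largest basis.

    def cbs_two_point(e_x, e_y, x, y):
        """Extrapolate correlation energies for cardinal numbers x < y
        (e.g. 3 for cc-pVTZ, 4 for cc-pVQZ)."""
        return (y**3 * e_y - x**3 * e_x) / (y**3 - x**3)

    e_tz, e_qz = -0.275, -0.295            # hypothetical correlation energies (hartree)
    print(cbs_two_point(e_tz, e_qz, 3, 4)) # CBS-limit estimate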

(Also, a very beginner question... Does the use of DF adjust the overall scaling of the various schemes? For example, CCSD(T) goes as O(nocc^3 nvirt^4) -- does DF change this somehow? I would guess not...?)
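
To make my guess concrete, a toy operation count (the sizes are made up):

    # Toy check of the (T) scaling, cost ~ nocc**3 * nvirt**4: doubling the
    # virtual space multiplies the cost by 2**4 = 16. My guess is that DF
    # lowers the prefactor and storage, not these exponents.

    def t_cost(nocc, nvirt):
        return nocc**3 * nvirt**4          # relative operation count, arbitrary units

    print(t_cost(5, 200) / t_cost(5, 100)) # -> 16.0 (hypothetical sizes)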

Re: CCSD and CCSD(T) scaling. I've been running some more test jobs. The optimised (conventional, disk-based) CCSD code is certainly competitive in a shared-memory setting, but disk I/O is indeed the bottleneck, even with a fast local SSD for storage. As you suggested, the on-disk size of the various integral files becomes prohibitively large at around 650 basis functions, and at that point performance suffers relative to less disk-intensive CCSD(T) implementations in other codes such as NWChem. (Not a complaint, just an observation.)
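
For context, my rough estimate of the integral file size at that point, assuming double precision and 8-fold permutational symmetry (the actual MRCC file layout may well differ):

    # Back-of-envelope size of the two-electron integral file that a
    # disk-based CCSD(T) has to read back repeatedly.

    def eri_gib(nbf):
        n_unique = nbf**4 / 8          # unique (pq|rs) under 8-fold symmetry
        return n_unique * 8 / 1024**3  # 8 bytes per element -> GiB

    print(f"{eri_gib(650):.0f} GiB")   # ~166 GiB, before transformed intermediates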

For the higher-order calculations, I've been able to run CCSDT(Q) with 200+ basis functions (water with cc-pV5Z, so relatively few occupied orbitals) without a problem. There seems to be more room to scale up here, too.
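
In case anyone wants to reproduce this kind of job, a minimal MINP along these lines should work (the memory setting and geometry are only illustrative; please check the MRCC manual for the exact keywords and your own resources):

    basis=cc-pV5Z
    calc=CCSDT(Q)
    mem=16GB

    unit=angs
    geom=xyz
    3

    O  0.00000000  0.00000000  0.11749920
    H  0.00000000  0.75695410 -0.46999680
    H  0.00000000 -0.75695410 -0.46999680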

Again, thanks for the info!


nagypeter • Premium Member • MRCC developer
3 years 2 months ago #1133 by nagypeter
Replied by nagypeter on topic Best practices for parallel performance/scaling
There are a number of studies that also assess the accuracy of DF-CCSD(T); you can start, e.g., with these:
pubs.acs.org/doi/10.1021/ct400250u
aip.scitation.org/doi/10.1063/1.4820484
aip.scitation.org/doi/10.1063/1.4905005


DF-CCSD(T) still scales the same as conventional CCSD(T).
The prefactor of some DF-CCSD steps is a bit lower, but the main benefit comes from the much smaller storage requirement of the integrals. In our implementation the I/O is essentially eliminated (meaning you will hit an operation-count or memory bottleneck much sooner).
Consequently, the parallel scaling is also very good and is not limited by I/O or network speed, so 1000-1500 orbitals become reachable with your hardware; a back-of-envelope storage comparison is sketched after the links below. Many more details about our code are given here:
pubs.acs.org/doi/abs/10.1021/acs.jctc.9b00957
pubs.acs.org/doi/abs/10.1021/acs.jctc.0c01077
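
To make the storage argument concrete, here is the back-of-envelope comparison mentioned above. It assumes double precision, 8-fold permutational symmetry for the 4-index integrals, and naux ≈ 3·nbf (a typical size for correlation-fitting auxiliary basis sets); the actual file sizes in MRCC will of course differ:

    # Rough storage comparison: conventional 4-index (pq|rs) integrals vs the
    # 3-index DF factorization (pq|P); double precision, naux ~ 3*nbf assumed.

    def conventional_gib(nbf):
        return (nbf**4 / 8) * 8 / 1024**3      # 8-fold symmetry, 8 B/element

    def df_gib(nbf, aux_factor=3):
        naux = aux_factor * nbf
        return (nbf * (nbf + 1) // 2) * naux * 8 / 1024**3  # symmetric pq pairs

    for n in (650, 1000, 1500):
        print(f"nbf={n}: 4-index ~{conventional_gib(n):6.0f} GiB, DF ~{df_gib(n):5.1f} GiB")

The 3-index tensor still fits in memory or on a local disk at 1000-1500 basis functions, which is exactly why those system sizes become reachable.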

