# Speeding up the reading of data with HDF5

11 months 3 weeks ago #714 by Nike
Greetings!
I'm doing a frozen-core H2O calculation in cc-pV8Z, which has 1384 spin orbitals and a fort.55 that is 2.4 TB in size. I am currently at this part:
```
Reading integral list from unit 55...
Warning! Executing out-of-core algorithm!
Reading integral list: cycle  1 of  6
```

We can't avoid the out-of-core algorithm because the node doesn't have 2.4 TB of RAM, so the integral list is read in 6 separate cycles, and this takes an extremely long time. The only executable running right now is "mrcc", which I presume is the one reading the integrals, but it is using only one core!

I regularly read 2.4 TB checkpoint files when doing FCIQMC calculations, and reading and writing files of that size does not need to take long at all with Parallel-HDF5, an MPI-based build of the HDF5 library that lets hundreds of cores perform I/O on the same file concurrently. What currently takes MRCC days to read should be possible in just a few minutes.
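To illustrate the access pattern I have in mind, here is a minimal Python sketch using h5py. The file name, dataset name, and sizes are hypothetical, and the per-rank reads are simulated serially here; in a real Parallel-HDF5 run each MPI rank would open the file with `h5py.File(fname, "r", driver="mpio", comm=MPI.COMM_WORLD)` (via mpi4py and an MPI-enabled h5py build) and the same hyperslab reads would then execute concurrently:

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file and dataset names, for illustration only.
fname = os.path.join(tempfile.mkdtemp(), "integrals.h5")
n = 1000  # stand-in for the (much larger) integral list length

# Write a dummy "integral list" so the read below has something to open.
with h5py.File(fname, "w") as f:
    f.create_dataset("integral_list", data=np.arange(n, dtype=np.float64))

# Each MPI rank would read only its own contiguous hyperslab of the
# dataset; simulated serially here with a loop over "ranks".
nranks = 4
chunks = []
with h5py.File(fname, "r") as f:
    dset = f["integral_list"]
    for rank in range(nranks):
        lo = rank * n // nranks
        hi = (rank + 1) * n // nranks
        chunks.append(dset[lo:hi])  # this rank's slice of the integrals

# Stitched back together, the slices cover the whole dataset exactly once.
total = np.concatenate(chunks)
assert np.array_equal(total, np.arange(n, dtype=np.float64))
```

With the `mpio` driver the loop disappears: every rank issues just its own slice read, and the MPI-IO layer underneath aggregates the requests, which is what makes reading terabyte-scale files fast on a parallel file system.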

I would be keen to have the I/O in MRCC use Parallel-HDF5. This would help not only with reading the integrals, but also with the I/O for fort.30, fort.12, fort.13, fort.18, fort.19 and fort.16, which can be very large and can take up almost all of the program's wall time.

Is it going to be possible to have someone work on this?
I would also be happy to help with this, but I believe it would be difficult to coordinate remotely, so it might require me to spend another week or two in Budapest.

With best wishes,
Nike