Speeding up the reading of data with HDF5

2 months 2 weeks ago #714 by Nike
Greetings!
I'm doing frozen-core H2O in cc-pV8Z, which has 1384 spin orbitals and a fort.55 file that is 2.4 TB in size. The calculation is currently at this stage:
Reading integral list from unit 55...
 Warning! Executing out-of-core algorithm!
 Reading integral list: cycle  1 of  6

We can't avoid the out-of-core algorithm because the node doesn't have 2.4 TB of RAM, so the integral list is being read in 6 separate cycles. This takes an extremely long time. The only executable running right now is "mrcc", which I presume is the one reading the integrals, but it is using only one core!

I regularly read 2.4 TB checkpoint files when doing FCIQMC calculations, and reading and writing files of that size does not need to take long at all with Parallel-HDF5, an MPI-based library that allows I/O to be done in parallel across hundreds of cores. What currently takes MRCC days to read should be possible in just a few minutes.

I would be keen to have the I/O in MRCC use Parallel-HDF5. This would help not only with reading the integrals, but also with the I/O for fort.30, fort.12, fort.13, fort.18, fort.19, and fort.16, which can be very large and can take up almost all of the program's wall time.
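To make the idea concrete, here is a minimal sketch of the read pattern I mean. The file name "integrals.h5" and the dataset name "integrals" are hypothetical (MRCC's real integral layout will differ); this version uses plain serial h5py, and with a parallel HDF5 build each MPI rank would open the file with driver="mpio" and read only its own slice.

```python
# Minimal sketch of a block-decomposed HDF5 read (hypothetical file and
# dataset names; MRCC's real integral layout will differ).  With a parallel
# HDF5 build, each MPI rank would instead open the file as
#   f = h5py.File("integrals.h5", "r", driver="mpio", comm=MPI.COMM_WORLD)
# and read only its own slice, so the read scales with the number of ranks.
import numpy as np
import h5py

# Create a small stand-in "integrals" file (random doubles in place of the
# real two-electron integrals).
n = 1000
with h5py.File("integrals.h5", "w") as f:
    f.create_dataset("integrals", data=np.random.rand(n), chunks=(100,))

# Each of `nranks` readers takes one contiguous slice of the dataset; here
# the slices are read in a serial loop just to show the decomposition.
nranks = 4
with h5py.File("integrals.h5", "r") as f:
    dset = f["integrals"]
    parts = [dset[r * n // nranks:(r + 1) * n // nranks] for r in range(nranks)]
data = np.concatenate(parts)
print(data.shape)  # (1000,)
```

With the mpio driver the same slicing becomes a collective read, which is exactly what makes hundreds of cores effective on one large file.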

Would it be possible to have someone work on this?
I would also be happy to help, but I believe it would be difficult to coordinate remotely, so it might require me to spend another week or two in Budapest.

With best wishes,
Nike

2 months 2 weeks ago #715 by Nike
I would also like to point out that while the fort.55 ASCII file is 2.4 TB, the corresponding CFOUR binary file MOABCD is only 418 GB. So we could certainly save a lot of space (and a huge amount of reading and writing time) by switching to HDF5, which would be even more compact than the 418 GB binary file and faster to read and write with multiple cores. I suspect the HDF5 integrals file would be about 300 GB and would take less than a minute to read. Reading cycle 1/6 of the 2.4 TB fort.55 took 16 hours (with a single core at 100% CPU), and the subsequent sorting took 1 hour (also on a single core at 100% CPU). This means it will take almost 6 days to read fort.55, even though with HDF5 it would likely take less than a minute.
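A rough illustration of where the ASCII-versus-binary size gap comes from; the record widths below are my guesses at representative formats, not the actual fort.55 or MOABCD layouts.

```python
# One two-electron integral record is a value plus four orbital indices.
# The field widths below are illustrative guesses, not the real file formats.
import struct

value, p, q, r, s = 0.123456789012345678, 12, 34, 56, 78

# ASCII record: a wide exponential-format value, four 4-character indices,
# and a newline.
ascii_record = f"{value:28.20E}{p:4d}{q:4d}{r:4d}{s:4d}\n"

# Binary record: one 8-byte double plus four 2-byte integer indices.
binary_record = struct.pack("<d4h", value, p, q, r, s)

print(len(ascii_record), len(binary_record))  # 45 16
```

That is already roughly a 2.8x saving per record; the observed fort.55-to-MOABCD ratio (2.4 TB to 418 GB, about 5.7x) is larger still, and HDF5 with chunked storage could be more compact again.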

1 month 3 weeks ago #718 by Nike
Update: the 2.4 TB fort.55 for H2O/cc-pV8Z, which took 16 hours × 6 cycles to read, grew to 8.5 TB at cc-pV9Z. Cycle 1/30 alone took 53 hours! I will be lucky if the CCSD iterations can start within 66 days (assuming the cluster doesn't have to be shut down for maintenance and the job doesn't crash). The situation is not as bad in CFOUR, where the integrals are stored in binary format and take up much less space, but in both cases the integrals are read with only one CPU. With the Parallel-HDF5 MPI library the integrals could be read with hundreds of cores spread across many nodes; even the 40 cores on my single node might bring the 66 days down to about 2 days.
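The estimates above are just this arithmetic, assuming ideal I/O scaling over the cores (which real parallel file systems will not quite reach):

```python
# Back-of-the-envelope check of the numbers in this post.
hours_per_cycle = 53            # measured time for cycle 1/30 at cc-pV9Z
cycles = 30
serial_days = hours_per_cycle * cycles / 24
print(round(serial_days))       # 66 -- days just to read fort.55 serially

cores = 40                      # one node; assumes ideal I/O scaling
parallel_days = serial_days / cores
print(round(parallel_days, 1))  # 1.7 -- i.e. "down to 2 days"
```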

Powered by Kunena Forum