CalculiX with Intel's MPI cluster sparse solver

I have completed the integration of Intel's cluster sparse solver into CalculiX. To install it, download and run the install script as shown below, then select option 3:

wget https://feacluster.com/install/install
perl install

The prerequisite is that you have Intel's oneAPI Base and HPC toolkits installed. These are now free for everyone and do not require root privileges to install.
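If the Intel compilers and MPI runtime are not already on your PATH, you can load them by sourcing oneAPI's setvars.sh script first. The path below assumes the default system-wide install location; a user-level install typically lands in ~/intel/oneapi instead:

source /opt/intel/oneapi/setvars.sh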

This graph shows the speed-up you can expect:

[speed-up graph]

All verification models converged except for one ( oneel201fi3.inp ). I believe this is due to round-off error, as the factorization of this model seems to produce very small numbers.

Example commands to run on two hosts, where each host has 15 CPUs:

export OMP_NUM_THREADS=15
mpirun -np 2 -ppn 1 -hosts=host1,host2 ./ccx_2.18_MPI input_deck
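If you would rather not list hosts on the command line, Intel MPI also accepts a host file via -f. Here hosts.txt is a hypothetical file containing one hostname per line (host1 and host2):

mpirun -np 2 -ppn 1 -f hosts.txt ./ccx_2.18_MPI input_deck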

There are some things to be aware of when using it. For best results, keep the number of MPI processes ( -np ) a power of 2, i.e. 2, 4, or 8. Using -np 3 can be slower than -np 2 or -np 4.
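If you are scripting runs across a varying number of hosts, a small hypothetical helper like this rounds the host count down to the nearest power of 2 before picking -np:

# round nhosts down to the nearest power of 2
nhosts=3
np=1
while [ $(( np * 2 )) -le $nhosts ]; do np=$(( np * 2 )); done
echo "use -np $np"    # prints: use -np 2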

Also, you can experiment with running more than one MPI process per host and reducing OMP_NUM_THREADS per process accordingly.

For example, on a 40-CPU machine you can try:

export OMP_NUM_THREADS=10
mpirun -np 4 ./ccx_2.18_MPI input_deck
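Since the best split between processes and threads is model-dependent, one option is to time a few combinations and keep the fastest. A minimal sketch, assuming a 40-CPU node and the executable and input deck names from above:

# try several np/thread splits that together use all 40 CPUs
for np in 1 2 4 8; do
    export OMP_NUM_THREADS=$(( 40 / np ))
    echo "trying -np $np with $OMP_NUM_THREADS threads per process"
    time mpirun -np $np ./ccx_2.18_MPI input_deck
done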

Some models may crash due to a known bug in Intel's cluster sparse solver. If that happens, try setting this environment variable:

export PARDISO_MPI_MATCHING=1
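When running across several hosts, you can also pass the variable explicitly to every rank with Intel MPI's -genv option:

mpirun -genv PARDISO_MPI_MATCHING 1 -np 2 -ppn 1 -hosts=host1,host2 ./ccx_2.18_MPI input_deck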
