Can CalculiX run across multiple nodes?

I work in the Scientific Consulting Group at NAS. I need to install CalculiX locally for one of my users, who wants to use it on our HPC cluster. Can we run CalculiX across nodes? I don’t see any MPI calls in the source. How does this work?

I have not yet tried it, but I believe preCICE should help with that: https://precice.org/fundamentals-literature-guide.html#parallel-and-high-performance-computing

Excellent! Many thanks! Now all I need is a small example that does not require the GUI. I am not a CalculiX user myself; I just build and optimize it for my users.

CalculiX can use MPI via a version I wrote that uses Intel’s Cluster Sparse Solver. One can expect a 10-30% speedup when running on two compute nodes vs. one node. See:

https://www.feacluster.com/calculix.php#4
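
Under SLURM, a two-node run looks roughly like the sketch below; the binary name ccx_mpi and the input deck beam are placeholders, and the exact build and launch steps are described on the page linked above:

#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=1        # one MPI rank per node for the cluster sparse solver
#SBATCH --cpus-per-task=16         # OpenMP threads per rank; adjust to your hardware

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
mpirun -np $SLURM_NTASKS ./ccx_mpi -i beam > log.ccx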

Thanks so much. What processors are you running on?

Dear All,

I’ve been trying to run coupled conjugate heat transfer (CHT) cases using CalculiX, OpenFOAM, and PreCICE on an HPC system. My question concerns running CCX efficiently on an HPC, both on a single node and across multiple nodes, for optimal performance.

For additional details, please check my post on the PreCICE forum: How to run CCX on HPC.

I successfully installed precice_ccx with its dependencies (along with SPOOLES, SPOOLES-MT) on the HPC—some via building from source and others using Spack. I ensured that everything is compiled with the same GCC and MPI versions. I was able to run coupled simulations using SPOOLES with the following SLURM script:

cd solid
# 50 threads for ccx and the SPOOLES-MT equation solver
export OMP_NUM_THREADS=50
export CCX_NPROC_EQUATION_SOLVER=50
# pin the coupled CalculiX run to cores 64-127
taskset -c 64-127 ccx_preCICE -i solid -precice-participant Solid > log.ccx
cd ..

Although it reports running on 50 cores, the speed-up is lower than expected. (Please see my preCICE post for the complete SLURM script.)

My questions:

  1. How can I properly run CCX on an HPC using a SLURM script?
  2. What is the best configuration or setup to achieve maximum speed-up (scalability) for CCX on an HPC?
  3. How can I report execution time or clock time at each timestep?

I would greatly appreciate any guidance, suggestions, or examples from those who have tackled similar issues.

Sincerely,
Umut

Increasing the number of cores beyond 4 or 8 does not significantly reduce the elapsed time of ccx jobs any further. This has been reported by several forum users, independent of the operating system (Windows or Linux), so it likely applies to your HPC environment as well. OMP_NUM_THREADS=4 or OMP_NUM_THREADS=8 may therefore be a good choice to make the most efficient use of the HPC hardware.
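
As a rough illustration (job name and core count are placeholders; the ccx_preCICE call is taken from your post), a batch script along these lines keeps the thread count in step with what SLURM actually allocates:

#!/bin/bash
#SBATCH --job-name=ccx_solid       # placeholder name
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8          # 4 or 8 cores is usually sufficient for ccx

cd solid
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK           # follow the SLURM allocation
export CCX_NPROC_EQUATION_SOLVER=$SLURM_CPUS_PER_TASK
ccx_preCICE -i solid -precice-participant Solid > log.ccx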

These are my best recommendations:

  1. Based on your setup and test runs, it looks like you’re correctly configuring CCX with SLURM or PBS. As @johanngil mentioned, beyond a certain number of cores, the performance gains tend to plateau.

  2. The optimal setup depends heavily on your hardware. That said, SPOOLES wouldn’t be my first choice for CCX. You might see significantly better performance using Intel’s Pardiso (available via the MKL library / oneAPI); see the environment sketch after this list. Alternatively, PaStiX is a newer solver that may also provide advantages, though it can be challenging to compile. If you’re interested, this guide might help: guide.

  3. To report execution or clock time at each timestep, you can modify the non-linear geometry function (nonlingeo.c). Specifically, look around line 1480, where the loop for increments is handled. Adding timing functions there should allow you to log execution time per timestep.
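
On point 2: if you do switch to a Pardiso-linked build, the solver takes its thread count from the OpenMP/MKL environment rather than from the SPOOLES settings. A minimal sketch of the run-time setup, assuming a oneAPI installation under /opt/intel/oneapi (adjust to your site's modules) and 8 threads as an example:

# assumed oneAPI location; on many clusters loading an MKL/oneAPI module does the same
source /opt/intel/oneapi/setvars.sh

# Pardiso reads the thread count from the OpenMP/MKL environment
export OMP_NUM_THREADS=8
export MKL_NUM_THREADS=8
export CCX_NPROC_EQUATION_SOLVER=8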

Best of luck!

See my post above regarding the MPI version of CalculiX I wrote.

Dear all,

Thank you for the valuable insights you’ve shared! I’ve provided a more detailed explanation of my case in this discussion: Running CalculiX and OpenFOAM on HPC.

@johanngil, thank you for your reply. I also suspected that 4–8 cores might be the best I can achieve, but I’m not fully convinced yet. There are still many aspects I haven’t explored. For example, I have failed to run ccx_preCICE on one node and OpenFOAM on another due to my limited understanding of SLURM script structuring and hostfile usage (a rough sketch of what I have in mind follows below). Additionally, I haven’t tested Pardiso or PaStiX yet as alternatives to SPOOLES. Therefore, I plan to dedicate more effort to refining my setup.
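
For the two-node case, what I have in mind is roughly the sketch below, where the core counts and the OpenFOAM solver (buoyantSimpleFoam) are placeholders on my side, and the fluid case is assumed to be already decomposed with decomposePar:

#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=32       # placeholder core count per node

nodes=($(scontrol show hostnames "$SLURM_JOB_NODELIST"))

# fluid participant on the first node
(cd fluid && srun --nodes=1 --ntasks=32 --nodelist="${nodes[0]}" buoyantSimpleFoam -parallel > log.of 2>&1) &

# solid participant on the second node, multithreaded rather than MPI-parallel
(cd solid && OMP_NUM_THREADS=8 CCX_NPROC_EQUATION_SOLVER=8 srun --nodes=1 --ntasks=1 --cpus-per-task=8 --nodelist="${nodes[1]}" ccx_preCICE -i solid -precice-participant Solid > log.ccx 2>&1) &

wait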

@jbr, I appreciate the guide you shared—I will look into it. Regarding comprehensive profiling, @fsimonis in the preCICE forum also suggested an alternative method, which I’ll explore alongside your recommendation.

@feacluster, thank you for your suggestion. I found the link you mentioned and briefly reviewed the content. If I understand correctly, the script installs CCX with the necessary libraries to run it with MPI on multiple nodes, but not ccx_precice specifically. It also appears to require CCX version 2.22 and the removal of related dependencies (ARPACK, Spooles, etc.) before installation. Is this correct? (Apologies if I misunderstood!)

From what I gather, to run a coupled simulation using OpenFOAM, CCX, and preCICE, I need the ccx_precice adapter, which currently only supports CCX 2.20 or earlier versions.

I’m not sure if this is the right thing to ask, but it might be more appropriate to continue discussing this in my initial post here: Running CalculiX and OpenFOAM on HPC.

Looking forward to your thoughts!

Best regards,
Umut

Mostly correct, except my MPI version only works with 2.18, not 2.22… ARPACK is not removed, though.
