Calculix build with Pardiso ILP64 for large models

Hello CalculiX community!

First, I just want to express how much I appreciate this discourse and its contributors, as they have helped me immensely. So, thank you all.

I am trying to run linear elastic models with a relatively high element count (>1M elements) and am hitting what seems to be a common issue: the analysis runs without throwing an error, but writes all zeros to the results file where nonzero results are expected. Lower-element-count versions of the same model produce more realistic, if unrefined, results.

I linked similar threads below, but I cannot seem to find a definitive resolution.

Following suggestions in those discussions, I compiled CalculiX 2.22 with Intel MKL PARDISO on Ubuntu/WSL2 using Intel oneAPI 2025.3, based on the build instructions from @feacluster’s site. My goal is to obtain the most stable direct solver configuration rather than the fastest one, so I am currently running single-threaded with ILP64 MKL PARDISO.

Unfortunately, I cannot significantly reduce the mesh size, as this level of refinement is required for these problems. The analysis itself is relatively simple:

  • Linear elastic
  • Static direct steps
  • Dirichlet symmetry boundary conditions
  • Initial stress field applied

There are two analysis steps, and both remove elements to simulate stages of stress relaxation. A substantial number of elements are removed in the second step, which makes me suspect this may be related to PARDISO memory usage or out-of-core behavior rather than a modeling error.

The runtime environment variables I am using are:

export OMP_NUM_THREADS=1
export MKL_NUM_THREADS=1
export MKL_DYNAMIC=FALSE

export MKL_PARDISO_OOC_PATH=/home/…/temp_solver
export MKL_PARDISO_OOC_MAX_CORE_SIZE=16000
export MKL_PARDISO_OOC_MAX_SWAP_SIZE=250000
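As a side note on the OOC settings above: per Intel's documentation, the same variables can alternatively be placed in a pardiso_ooc.cfg file in the working directory, MKL_PARDISO_OOC_MAX_CORE_SIZE is interpreted in megabytes (so 16000 requests up to ~16 GB in core), and whether OOC mode is active at all is governed by PARDISO's iparm(60). Intel's reference also describes MKL_PARDISO_OOC_PATH as a path plus a generic file name, so a file prefix after the directory may be needed. A sketch mirroring the exports above:

```text
MKL_PARDISO_OOC_PATH = /home/…/temp_solver
MKL_PARDISO_OOC_MAX_CORE_SIZE = 16000
MKL_PARDISO_OOC_MAX_SWAP_SIZE = 250000
```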

I also configured the WSL environment to allow sufficient memory for the analysis.

The solver runs to completion without reporting an error, but the resulting displacement field is entirely zero. Smaller models behave correctly.

For reference, the executable is linked against MKL ILP64:

libmkl_intel_ilp64
libmkl_core
libmkl_intel_thread

This is the first time I have built any project from source in Linux, so I apologize if I have overlooked something obvious.

If anyone has encountered similar behavior with large models or has suggestions for debugging this issue, I would greatly appreciate any guidance.

Thank you again for your help.

The threads I have referenced:

If you can share the model, I can test it out, as there could be one of many things going on (you mentioned WSL, Intel, PARDISO, out-of-core, etc.). Each one can be a problem…

Way back I had a similar problem. It turned out that compilation needed the I8 flag set for larger models with MKL PARDISO: an array somewhere in that version of the solver was not set to I8, and it craps out somewhere over 500,000 nodes. I8 is slower and requires more memory than the I4 default. I used virtual memory too, but at about 1,500,000 nodes it was unusably slow, so the I8 solution is appropriate only for a limited range of model sizes. Your problems, versions, results, and system vary, of course.

Maybe also relevant: RESTART does not write results with ccx_i8 - Analysis issues

pardiso_64() should be called in that case. We’ll address this in the future.

Thank you for the help @feacluster, and sorry for my delayed response.

Here is a link to generic analyses with coarse and finer meshes. Please excuse the poor mesh refinement, but the finer mesh yields the results described above.

It is interesting that the issue is present in both step 1 (the relaxation step) and step 2 (removal of elements, Elset = HoleElems). I have also tried reordering the steps, since it is a linear analysis, but the problem persists. This raises a question: does CalculiX physically remove elements from the global stiffness matrix when *MODEL CHANGE, REMOVE is called, or does it push the element stiffness to zero for the corresponding elements?

Thank you @MichaelPE for the reply. I do not see anything with an I4 flag in the Makefile or pardiso.c at first glance, but I will keep looking.

These are my current flags in the Makefile:

CFLAGS = -w -D_POSIX_C_SOURCE=199309L -O3 -std=c90 -fopenmp -DARCH="Linux" -DINTSIZE64 -DMKL_ILP64 -DPARDISO -DLONGLONG -DARPACK -DMATRIXSTORAGE -DUSE_MT=1
FFLAGS = -w -O2 -fopenmp -i8

Thank you @Durbul. Sorry I am a little confused. Are you suggesting pardiso should be called in pardiso.c as:

FORTRAN(pardiso_64,

where as it is currently called as:

FORTRAN(pardiso,

?

Yes, according to Intel’s developer reference, this is the consistent function to call (see pardiso_64). I cannot guarantee that it will fix your specific issue (in fact I haven’t experienced any issues so far with the standard pardiso call), but it’s worth a try and something I wanted to check anyway.

Okay, thank you. I will try it.

I built everything with ILP64 flags end-to-end, so based on Intel’s documentation it does not seem necessary to call PARDISO as pardiso_64.

I don’t know. Someone else (3rav?) on the MecWay discourse recompiled it for me and the info is there (CCX_2.18_Pastix_static_i8). Note I have 64 GB of RAM. Something about an overflow of a matrix index beyond I4, perhaps in one of the DLLs linked for PaStiX. I have not been running large FEM for quite a while, so I am not on the latest version of CalculiX. I mostly used this when testing how to get the most from large models. Since it works well only for a limited range of model sizes, I just use the latest standard version of CalculiX as the default solver and let big models run during dinner or overnight.

I discovered a few things, though. Best results with only 4 or 6 cores (not 5!). Speed for larger problems depends mostly on memory bandwidth, so two-channel memory machines can be slower. Memory used can be somewhat larger than memory available, due to different parts of the program or other programs being in memory at nearly the same time, but once your active arrays and the part of the code operating on them get bigger than virtual memory size, the system approaches thrashing and progress slows drastically. A faster SSD does not seem to help much; probably the system overhead with the paging, PCIe, or SATA is not good.

Okay, thank you @MichaelPE

Any updates @feacluster? I am in the process of obtaining more memory. I will report back.

The latest I have found and tested is ccx_static.exe, a PaStiX-version executable for ccx 2.23, compiled Fri Oct 24 20:12:25. If I recall, it was on GitHub in the latest Calculix2.23_4Win zip. It seems to have been compiled for computational models. It runs fine on my AMD 3700X, seems to converge better than ccx 2.18, and is a bit faster than DP mode was in the past. Its banner reports:
CPU: AMD Opteron 6180 - Intel MKL
GPU: Nvidia K40 GK1108L - CUDA 8.0
Low rank parameters:
Strategy No compression
