CalculiX solver terminates without an error for models larger than 250k elements

Hi all,

I am using ccx to solve biomechanical problems.
We have generated INP files for geometries of different sizes and are running static FE analyses on them.
This results in element counts between ~100,000 and 400,000 (quadratic tets).
We use PaStiX as our standard solver and most models run smoothly, but the larger ones fail strangely.

The models that fail all have more than 250,000 elements; smaller models run smoothly. All models have the same boundary conditions and contact formulations.
So I suspect the problem is related to the number of elements?

Normally I start ccx from a Python routine. There it returns
ccx_static.exe: returned non-zero exit status 3221225725
(→ 0xC00000FD, stack overflow/exhaustion. This error can indicate a bug in the executed software that causes a stack overflow, leading to abnormal termination.)
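For reference, the Python launcher is roughly the following (the job name "model" is made up here); printing the exit status in hex makes the Windows NTSTATUS code easy to recognize:

import subprocess

# Hypothetical job name; ccx reads model.inp when started with "-i model".
result = subprocess.run(["ccx_static.exe", "-i", "model"])

# On Windows, abnormal terminations surface as NTSTATUS codes in the
# exit status, e.g. 3221225725 == 0xC00000FD == STATUS_STACK_OVERFLOW.
print(f"exit status {result.returncode} = 0x{result.returncode & 0xFFFFFFFF:08X}")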

If I start ccx directly in a PowerShell, CalculiX and the solver start normally, and then the solver simply terminates without any feedback or error (see the log of one example attached below).

What I tested so far:
Using SPOOLES, PASTIX, PARDISO or ITERATIVE CHOLESKY makes no difference; all fail with no error at all.
Changing the number of threads between 1 and 16 does not affect the crash.
Switching between ccx 2.20 and ccx 2.21 did not change anything.
Starting ccx with “-i” did not change anything.

My test machine runs Windows and has 16 cores, 128 GB RAM and enough free space on an SSD.

So far no search has helped me out of this problem. Does anyone have an idea what could cause these crashes?

Thanks and best regards from southern Germany, Lucas

…


CalculiX Version 2.21, Copyright(C) 1998-2023 Guido Dhondt
CalculiX comes with ABSOLUTELY NO WARRANTY. This is free
software, and you are welcome to redistribute it under
certain conditions, see gpl.htm


You are using an executable made on Sat Jul 29 17:18:34 2023

The numbers below are estimated upper bounds

number of:

nodes: 605775
elements: 373698
one-dimensional elements: 0
two-dimensional elements: 0
integration points per element: 4
degrees of freedom per node: 3
layers per element: 1

distributed facial loads: 0
distributed volumetric loads: 0
concentrated loads: 947
single point constraints: 4050
multiple point constraints: 1
terms in all multiple point constraints: 1
tie constraints: 53
dependent nodes tied by cyclic constraints: 0
dependent nodes in pre-tension constraints: 0

sets: 134
terms in all sets: 1661060

materials: 2
constants per material and temperature: 8
temperature points per material: 1
plastic data points per material: 0

orientations: 0
amplitudes: 2
data points in all amplitudes: 2
print requests: 0
transformations: 0
property cards: 0
Decascading the MPC’s

Determining the structure of the matrix:
Using up to 8 cpu(s) for setting up the structure of the matrix.
number of equations
1666587
number of nonzero lower triangular matrix elements
83208279

increment 1 attempt 1
increment size= 1.000000e-02
sum of previous increments=0.000000e+00
actual step time=1.000000e-02
actual total time=1.000000e-02

iteration 1

Number of contact spring elements=570693

Determining the structure of the matrix:
maximal possible contact elements =
570693

Using up to 8 cpu(s) for setting up the structure of the matrix.
number of equations
1666587
number of nonzero lower triangular matrix elements
84612387

Using up to 8 cpu(s) for the stress calculation.

Using up to 8 cpu(s) for the symmetric stiffness/mass contributions.

Not reusing csc.
+-------------------------------------------------+
+     PaStiX : Parallel Sparse matriX package     +
+-------------------------------------------------+
Version: 6.0.1
Schedulers:
sequential: Enabled
thread static: Started
thread dynamic: Disabled
PaRSEC: Disabled
StarPU: Disabled
Number of MPI processes: 1
Number of threads per process: 8
Number of GPUs: 0
MPI communication support: Disabled
Distribution level: 2D( 128)
Blocking size (min/max): 1024 / 2048

Matrix type: General
Arithmetic: Float
Format: CSC
N: 1666587
nnz: 170891361

+-------------------------------------------------+
Ordering step :
Ordering method is: Scotch
Time to compute ordering: 2.8737
+-------------------------------------------------+
Symbolic factorization step:
Symbol factorization using: Fax Direct
Number of nonzeroes in L structure: -1298756719
Fill-in of L: -7.599897
Time to compute symbol matrix: 0.6063
+-------------------------------------------------+
Reordering step:
Split level: 0
Stoping criteria: -1
Time for reordering: 3.7589
+-------------------------------------------------+
Analyse step:
Number of non-zeroes in blocked L: 1697453858
Fill-in: 9.932941
Number of operations in full-rank LU : 39.42 TFlops
Prediction:
Model: AMD 6180 MKL
Time to factorize: 712.6018
Time for analyze: 0.0812
+-------------------------------------------------+
Factorization step:
Factorization used: LU
Time to initialize internal csc: 2.7088

After trying other search terms, I found this post:

but I still have to test whether this also helps in static calculations.

Using the i8-compiled version of ccx 2.18 from the bconverged Windows compilation did not help (CalculiX and PaStiX solver Windows version - #42 by rafal.brzegowy).

The updated version with the “new frd.c” (Steady state dynamics randomly terminates without error - #43 by rafal.brzegowy) also did not change anything.
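A side note on the logs above: the negative values reported by PaStiX (“Number of nonzeroes in L structure: -1298756719”, negative fill-in) look like a signed 32-bit integer overflow, which is exactly what an i8 (64-bit integer) build is supposed to avoid. A quick check of the arithmetic in Python:

# PaStiX reported: "Number of nonzeroes in L structure: -1298756719".
# If the count is stored in a signed 32-bit integer, anything above
# 2**31 - 1 = 2147483647 wraps around into a negative number.
reported = -1298756719
true_count = reported + 2**32   # undo the two's-complement wrap
print(true_count)               # 2996210577, well above the int32 limit
assert true_count > 2**31 - 1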

Try:
[image]


Thanks rafal for your suggestion!
However, I do not see anything going wrong in the allocation log:

using ccx 2.20 (same for ccx 2.21):

number of equations
1666587
number of nonzero lower triangular matrix elements
84612387

FREEING of variable next, file mastruct.c, line=869: oldaddress= 691068992
FREEING of variable ipkontot, file remastruct.c, line=106: oldaddress= 192761920
FREEING of variable kontot, file remastruct.c, line=106: oldaddress= 737931328
FREEING of variable lakontot, file remastruct.c, line=106: oldaddress= 803225664
FREEING of variable ipointer, file remastruct.c, line=119: oldaddress= 728141888
FREEING of variable mast1, file remastruct.c, line=119: oldaddress= -991297472
REALLOCATION of variable irow, file remastruct.c, line=120: size(bytes)=338449548, oldaddress= 244887616,address= 244887616
REALLOCATION of variable f, file remastruct.c, line=126: size(bytes)=13332696, oldaddress= 466546752,address= 466546752
REALLOCATION of variable fext, file remastruct.c, line=127: size(bytes)=13332696, oldaddress= 527761472,address= 527761472
REALLOCATION of variable b, file remastruct.c, line=128: size(bytes)=13332696, oldaddress= -691277760,address= -691277760
REALLOCATION of variable fini, file remastruct.c, line=129: size(bytes)=13332696, oldaddress= -658472896,address= -658472896
FREEING of variable nactdofinv, file nonlingeo.c, line=1986: oldaddress= 0
ALLOCATION of variable nactdofinv, file nonlingeo.c, line=1987, num=2423100, size=4, address= 728137792
ALLOCATION of variable nodorig, file nonlingeo.c, line=1988, size=2423100, address= 165703744
FREEING of variable nodorig, file nonlingeo.c, line=1991: oldaddress= 165703744
ALLOCATION of variable v, file nonlingeo.c, line=2017, num=2423100, size=8, address= 737902656
ALLOCATION of variable fn, file nonlingeo.c, line=2029, size=19384800, address= 757391424
ALLOCATION of variable stx, file nonlingeo.c, line=2030, num=22665384, size=8, address= 2119213120
ALLOCATION of variable neapar, file setpardou.c, line=46, num=8, size=4, address= 1175620960
ALLOCATION of variable nebpar, file setpardou.c, line=47, num=8, size=4, address= 1175621008
ALLOCATION of variable ithread, file setpardou.c, line=66, num=8, size=4, address= 1175621344
FREEING of variable ithread, file setpardou.c, line=74: oldaddress= 1175621344
FREEING of variable neapar, file setpardou.c, line=74: oldaddress= 1175620960
FREEING of variable nebpar, file setpardou.c, line=74: oldaddress= 1175621008
ALLOCATION of variable neapar, file setpardou.c, line=46, num=8, size=4, address= 1175620960
ALLOCATION of variable nebpar, file setpardou.c, line=47, num=8, size=4, address= 1175621344
ALLOCATION of variable ithread, file setpardou.c, line=66, num=8, size=4, address= 1175621008
FREEING of variable ithread, file setpardou.c, line=74: oldaddress= 1175621008
FREEING of variable neapar, file setpardou.c, line=74: oldaddress= 1175620960
FREEING of variable nebpar, file setpardou.c, line=74: oldaddress= 1175621344
ALLOCATION of variable neapar, file results.c, line=202, num=8, size=4, address= 1175620960
ALLOCATION of variable nebpar, file results.c, line=203, num=8, size=4, address= 1175621008
ALLOCATION of variable ipar, file elementcpuload.c, line=33, num=944391, size=4, address= 192802880
FREEING of variable ipar, file elementcpuload.c, line=62: oldaddress= 192802880
ALLOCATION of variable fn1, file results.c, line=206, num=19384800, size=8, address= -1994391488
ALLOCATION of variable qa1, file results.c, line=207, num=32, size=8, address= 1176009408
ALLOCATION of variable nal, file results.c, line=208, num=8, size=4, address= 1175621104
ALLOCATION of variable energysms1, file results.c, line=209, num=8, size=8, address= 548265152
Using up to 8 cpu(s) for the stress calculation.

ALLOCATION of variable ithread, file results.c, line=240, num=8, size=4, address= 1175621152
FREEING of variable fn1, file results.c, line=256: oldaddress= -1994391488
FREEING of variable ithread, file results.c, line=256: oldaddress= 1175621152
FREEING of variable neapar, file results.c, line=256: oldaddress= 1175620960
FREEING of variable nebpar, file results.c, line=256: oldaddress= 1175621008
FREEING of variable qa1, file results.c, line=295: oldaddress= 1176009408
FREEING of variable nal, file results.c, line=306: oldaddress= 1175621104
FREEING of variable energysms1, file results.c, line=316: oldaddress= 548265152
ALLOCATION of variable nkapar, file forparll.c, line=44, num=8, size=4, address= 1175620960
ALLOCATION of variable nkbpar, file forparll.c, line=45, num=8, size=4, address= 1175621008
ALLOCATION of variable ithread, file forparll.c, line=66, num=8, size=4, address= 1175621104
FREEING of variable ithread, file forparll.c, line=74: oldaddress= 1175621104
FREEING of variable nkapar, file forparll.c, line=74: oldaddress= 1175620960
FREEING of variable nkbpar, file forparll.c, line=74: oldaddress= 1175621008
ALLOCATION of variable neapar, file cpypardou.c, line=47, num=8, size=4, address= 1175621344
ALLOCATION of variable nebpar, file cpypardou.c, line=48, num=8, size=4, address= 1175621104
ALLOCATION of variable ithread, file cpypardou.c, line=67, num=8, size=4, address= 1175621152
FREEING of variable ithread, file cpypardou.c, line=75: oldaddress= 1175621152
FREEING of variable neapar, file cpypardou.c, line=75: oldaddress= 1175621344
FREEING of variable nebpar, file cpypardou.c, line=75: oldaddress= 1175621104
ALLOCATION of variable neapar, file cpypardou.c, line=47, num=8, size=4, address= 1175620960
ALLOCATION of variable nebpar, file cpypardou.c, line=48, num=8, size=4, address= 1175621008
ALLOCATION of variable ithread, file cpypardou.c, line=67, num=8, size=4, address= 1175621104
FREEING of variable ithread, file cpypardou.c, line=75: oldaddress= 1175621104
FREEING of variable neapar, file cpypardou.c, line=75: oldaddress= 1175620960
FREEING of variable nebpar, file cpypardou.c, line=75: oldaddress= 1175621008
FREEING of variable fn, file nonlingeo.c, line=2147: oldaddress= 757391424
FREEING of variable v, file nonlingeo.c, line=2147: oldaddress= 737902656
FREEING of variable stx, file nonlingeo.c, line=2148: oldaddress= 2119213120
ALLOCATION of variable resold, file nonlingeo.c, line=2167, num=1666587, size=8, address= 737886272
ALLOCATION of variable ad, file nonlingeo.c, line=2437, num=1666587, size=8, address= 751267904
ALLOCATION of variable au, file nonlingeo.c, line=2438, num=84612387, size=8, address= 2119213120
ALLOCATION of variable neapar, file mafillsmmain.c, line=150, num=8, size=4, address= 1175620960
ALLOCATION of variable nebpar, file mafillsmmain.c, line=151, num=8, size=4, address= 1175621008
ALLOCATION of variable ipar, file elementcpuload.c, line=33, num=944391, size=4, address= 192761920
FREEING of variable ipar, file elementcpuload.c, line=62: oldaddress= 192761920
ALLOCATION of variable ad1, file mafillsmmain.c, line=182, num=13332696, size=8, address= -1498804160
ALLOCATION of variable au1, file mafillsmmain.c, line=183, num=676899096, size=8, address= 1785344064
ALLOCATION of variable fext1, file mafillsmmain.c, line=186, num=13332696, size=8, address= -1392091072
ALLOCATION of variable nmethod1, file mafillsmmain.c, line=202, num=8, size=4, address= 1175621104
Using up to 8 cpu(s) for the symmetric stiffness/mass contributions.

ALLOCATION of variable ithread, file mafillsmmain.c, line=252, num=8, size=4, address= 1175621152
FREEING of variable ithread, file mafillsmmain.c, line=260: oldaddress= 1175621152
FREEING of variable neapar, file mafillsmmain.c, line=260: oldaddress= 1175620960
FREEING of variable nebpar, file mafillsmmain.c, line=260: oldaddress= 1175621008
FREEING of variable ad1, file mafillsmmain.c, line=286: oldaddress= -1498804160
FREEING of variable au1, file mafillsmmain.c, line=309: oldaddress= 1785344064
FREEING of variable fext1, file mafillsmmain.c, line=320: oldaddress= -1392091072
FREEING of variable nmethod1, file mafillsmmain.c, line=392: oldaddress= 1175621104
ALLOCATION of variable b_backup, file pastix.c, line=1015, num=1666587, size=8, address= 764653632
Not reusing csc.
ALLOCATION of variable icolPrev, file pastix.c, line=489, num=1666587, size=4, address= 778063936
ALLOCATION of variable irowPrev, file pastix.c, line=490, num=84612387, size=4, address= -1498779584
ALLOCATION of variable jqPrev, file pastix.c, line=491, num=1666588, size=4, address= 784797760
ALLOCATION of variable irowpastix, file pastix.c, line=526, num=170891361, size=4, address= -1005662144
ALLOCATION of variable icolpastix, file pastix.c, line=533, num=1666588, size=4, address= 791519296
ALLOCATION of variable irowacc, file pastix.c, line=539, num=1666587, size=4, address= 798265408
ALLOCATION of variable irowPrediction, file pastix.c, line=544, num=84612387, size=4, address= -1160257472
+-------------------------------------------------+
+     PaStiX : Parallel Sparse matriX package     +
+-------------------------------------------------+
Version: 6.0.1
Schedulers:
sequential: Enabled
thread static: Started
thread dynamic: Disabled
PaRSEC: Disabled
StarPU: Disabled
Number of MPI processes: 1
Number of threads per process: 8
Number of GPUs: 0
MPI communication support: Disabled
Distribution level: 2D( 128)
Blocking size (min/max): 1024 / 2048

Matrix type: General
Arithmetic: Float
Format: CSC
N: 1666587
nnz: 170891361

+-------------------------------------------------+
Ordering step :
Ordering method is: Scotch
Time to compute ordering: 2.6951
+-------------------------------------------------+
Symbolic factorization step:
Symbol factorization using: Fax Direct
Number of nonzeroes in L structure: -1318890835
Fill-in of L: -7.717715
Time to compute symbol matrix: 0.5433
+-------------------------------------------------+
Reordering step:
Split level: 0
Stoping criteria: -1
Time for reordering: 4.3406
+-------------------------------------------------+
Analyse step:
Number of non-zeroes in blocked L: 1657185626
Fill-in: 9.697305
Number of operations in full-rank LU : 38.58 TFlops
Prediction:
Model: AMD 6180 MKL
Time to factorize: 666.8213
Time for analyze: 0.0723
+-------------------------------------------------+
Factorization step:
Factorization used: LU
Time to initialize internal csc: 2.8220

Can you provide an input file that causes this, so we can investigate further?

Is it possible to reduce your element/node count?

Hi jbr, no, the model size cannot be reduced any further.

I do not want to share the model in public, but if someone besides Lucas wants to give it a try, just tell me.
Thank you all!

Based on what you are describing and the known issues with node/element counts, I would say that you are above the limits the solver can handle. If you want to debug this further, you may want to build ccx with debugging symbols and use gdb to find out where things break.
But since smaller models run fine, this is most likely a memory allocation/access issue. Good luck!

Could I take a look? I also have an AMD processor, but I compiled my own ccx executable.

I think this element count is too small for model size to be the root cause; I would consider exploring some other possibilities.

Thank you all for your help!

Lucas and Lucas investigated and found:

  • The main problem seems to be the factorization step: the matrix is simply too large for the PaStiX solver to handle, and the preallocated memory seems to be too large.
    As written above, I already had this concern and tested the other solvers.
    BUT I had always used ccx_static. With ccx_dynamic it works when using PARDISO :)
    (tested with CalculiX 2.20 and 2.21)

With a bit more searching I could have found this post:

Anyway, it runs with ccx_dynamic and the PARDISO solver:

Decascading the MPC’s

Determining the structure of the matrix:
Using up to 16 cpu(s) for setting up the structure of the matrix.
number of equations
1910259
number of nonzero lower triangular matrix elements
95283540

Using up to 16 cpu(s) for the stress calculation.

Using up to 16 cpu(s) for the symmetric stiffness/mass contributions.

Factoring the system of equations using the symmetric pardiso solver
number of threads = 16

Using up to 16 cpu(s) for the stress calculation.

Job finished


Total CalculiX Time: 693.189322


Sounds about right :)

Maybe try (test PaStiX):
set PASTIX_MIXED_PRECISION=0

and separately:
set PASTIX_SCHEDULER=1
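For reference, a minimal sketch of passing these variables to ccx when launching it from Python (executable and job name are placeholders; test one variable at a time):

import os
import subprocess

env = os.environ.copy()
env["PASTIX_MIXED_PRECISION"] = "0"  # force full double precision in PaStiX
# env["PASTIX_SCHEDULER"] = "1"      # try this separately, as suggested

subprocess.run(["ccx_static.exe", "-i", "model"], env=env, check=True)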

Info(?):

“Looking at the top of the thread, the best performance should be with dynamic (with the fix included in 6.3.1) or StarPU if your matrix is big enough. For small ones, static should probably be better.”


I investigated more, as some models still do not run and crash without an error.
My observations:

The more cores I use (by setting OMP_NUM_THREADS and OPENBLAS_NUM_THREADS), the more likely a model is to run through (see the sweep sketch after this list).
ccx 2.21 with PaStiX crashes more often than the i8-compiled ccx 2.18 by rafal.
Setting PASTIX_MIXED_PRECISION=0 or PASTIX_SCHEDULER=1 did not influence the behavior.
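A rough sketch of such a sweep, launching the same job with different thread counts (the executable and job name are placeholders for illustration):

import os
import subprocess

# Run the same job with increasing thread counts and record which runs
# finish; exit status 0 means ccx completed normally.
for n in (1, 2, 4, 8, 16):
    env = os.environ.copy()
    env["OMP_NUM_THREADS"] = str(n)
    env["OPENBLAS_NUM_THREADS"] = str(n)
    result = subprocess.run(["ccx_static.exe", "-i", "model"], env=env)
    print(f"{n:2d} threads -> exit status {result.returncode}")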

My issue is that my models are all below 500,000 elements,
but I use a lot of different material definitions and contact pairs.

Running the models either without most material definitions (i.e. with only one material for all elements)
or without contact pairs leads to a “solvable model” for any solver.

But with my full material and contact definitions, only models below 250,000 elements are solved robustly.

At the moment I am trying to find a combination that works, or alternatively to set up CalculiX with PARDISO's out-of-core (OOC) mode.

Any other suggestions?
Thank you all!

Not really. Try the OoC mode with PARDISO and see what you find out. I tried that route before and it did not work for my case, so I instead found a way to reduce my model size and complexity, and it still gave me the correct answer. Good luck!
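For reference: Intel MKL's PARDISO out-of-core mode is configured through MKL_PARDISO_OOC_* settings (via environment variables or a pardiso_ooc.cfg file), but they only take effect if the ccx build enables OOC internally (iparm(60) != 0), so check your build's notes. A minimal, hypothetical launch sketch:

import os
import subprocess

# Sketch only: whether ccx honors these MKL settings depends on the build.
env = os.environ.copy()
env["MKL_PARDISO_OOC_PATH"] = r"D:\scratch\ccx_ooc"  # hypothetical scratch location
env["MKL_PARDISO_OOC_MAX_CORE_SIZE"] = "65536"       # in-core limit in MB
env["MKL_PARDISO_OOC_KEEP_FILE"] = "0"               # delete temp files afterwards

subprocess.run(["ccx_dynamic.exe", "-i", "model"], env=env, check=True)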