Trouble compiling and running CalculiX with PaStiX on Ubuntu 24.04

I had CalculiX 2.20 with PaStiX compiled and running on Ubuntu 20.04 + CUDA 11.1 at one point. However, this time it was like pulling teeth trying to compile the stock code of CalculiX 2.22 and PaStiX (along with CUDA 12.6, PaRSEC, Scotch, OpenBLAS etc.) on Ubuntu 24.04. :sweat_smile:

Finally, after all the stars lined up, I was able to compile ccx, but I had no luck running the executable on a simple model file (beamp.inp). :disappointed: I would really appreciate it if someone could look at the snippet from the run and help me identify the main culprit, or suggest what I could try diagnosing. It seems the ccx solver starts but crashes when it calls PaStiX. The executable works fine and runs to completion with SPOOLES and Pardiso.

I am trying to journal my findings and makefiles in case it helps someone else. Please keep in mind that the library paths may differ on each system; check them before you use them.

Here is my make_parsec.sh file:

#!/bin/bash

if ! [[ -d build ]]; then
    mkdir build
fi
cd build

INSTALLPATH="/usr/local/PaStiX/parsec_i8"

umask 022

# fixes (patch the source; the path is relative to the build directory we just entered)
sed -i '/-1 == cpu/i return cpu;' ../parsec/bindthread.c

cmake \
    -DCMAKE_CXX_COMPILER=g++ \
    -DCMAKE_C_COMPILER=gcc \
    -DCMAKE_Fortran_COMPILER=gfortran \
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_INSTALL_PREFIX=${INSTALLPATH} \
    -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-12.6 \
    -DCUDA_DIR=/usr/local/cuda-12.6 \
    -DCUDA_USE_STATIC_CUDA_RUNTIME=ON \
    -DCMAKE_CUDA_HOST_COMPILER=gcc \
    -DPARSEC_GPU_WITH_CUDA=ON \
    -DHWLOC_DIR=/usr/local/PaStiX/hwloc_i8 \
    ..

make -j8

rm -rf ${INSTALLPATH}
make install

Here is my make_pastix.sh file:

#!/bin/bash
if ! [[ -d build ]]; then
    mkdir build
fi
cd build

INSTALLPATH="/usr/local/PaStiX/pastix_i8"
#CUDADIR="/usr/lib/cuda"
PARSECDIR="/usr/local/PaStiX/parsec_i8"
SCOTCHDIR="/usr/local/PaStiX/scotch_i8"
HWLOCDIR="/usr/local/PaStiX/hwloc_i8"
BLASDIR="/usr/local/PaStiX/OpenBLAS_i8"

cmake   \
    -DBLAS_DIR=/usr/local/OpenBLAS_i8 \
    -DHwloc_DIR=/usr/local/PaStiX/hwloc_i8 \
    -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-12.6 \
    -DCUDA_DIR=/usr/local/cuda-12.6 \
    -DCMAKE_INSTALL_PREFIX=/usr/local/PaStiX/pastix_i8 \
    -DCMAKE_BUILD_TYPE=Release \
    -DPASTIX_WITH_PARSEC=ON \
    -DPASTIX_WITH_CUDA=ON \
    -DPASTIX_INT64=ON \
    -DPARSEC_DIR=/usr/local/PaStiX/parsec_i8 \
    -DTHREADS_PREFER_PTHREAD_FLAG=ON \
    -DCMAKE_C_FLAGS="-fopenmp" \
    -DCMAKE_EXE_LINKER_FLAGS="-L/usr/lib/x86_64-linux-gnu/openmpi/lib -lmpi -pthread" \
    -DCMAKE_C_COMPILER=mpicc \
    -DCMAKE_CXX_COMPILER=mpicxx \
    -DCMAKE_Fortran_COMPILER=gfortran \
    -DCMAKE_THREAD_LIBS_INIT="-lpthread" \
    -DCMAKE_USE_PTHREADS_INIT=TRUE \
    -DMPI_INCLUDE_DIRS=/usr/lib/x86_64-linux-gnu/openmpi/include \
    -DMPI_LIBRARIES=/usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so \
    -DMPI_EXECUTABLE=/usr/bin/mpiexec \
    -DMPI_C_COMPILER=/usr/bin/mpicc \
    -DPASTIX_WITH_MPI=ON \
    -DSCOTCH_DIR=/usr/local/PaStiX/scotch_i8 \
    ..

make -j8
make install

And here is the Makefile_i8 file that I used to compile ccx:


CFLAGS = -Wall -O2 -fopenmp -fpic -I ../../../SPOOLES.2.2 -I$(PASTIX_INCLUDE) -DARCH="Linux" -DSPOOLES -DARPACK -DMATRIXSTORAGE -DINTSIZE64 -DPASTIX -DPASTIX_FP32 -DPASTIX_GPU
FFLAGS = -Wall -O2 -fopenmp -fpic -fdefault-integer-8

CC=cc
FC=gfortran

.c.o :
	$(CC) $(CFLAGS) -c $<
.f.o :
	$(FC) $(FFLAGS) -c $<

include Makefile.inc

SCCXMAIN = ccx_2.22.c

OCCXF = $(SCCXF:.f=.o)
OCCXC = $(SCCXC:.c=.o)
OCCXMAIN = $(SCCXMAIN:.c=.o)

DIR=../../../SPOOLES.2.2

PASTIX_INCLUDE = ../../../PaStiX/pastix_i8/include

PASTIX_LIBS = ../../../PaStiX/hwloc_i8/lib/libhwloc.so \
  ../../../OpenBLAS_i8/lib/libopenblas.a \
  ../../../PaStiX/pastix_i8/lib/libpastix.a ../../../OpenBLAS_i8/lib/libopenblas.a -lpthread -lm ../../../PaStiX/pastix_i8/lib/libspm.a \
  ../../../PaStiX/pastix_i8/lib/libpastix_parsec.a ../../../PaStiX/pastix_i8/lib/libpastix_kernels.a ../../../OpenBLAS_i8/lib/libopenblas.a -lrt \
  ../../../PaStiX/pastix_i8/lib/libpastix_kernels_cuda.a ../../../PaStiX/parsec_i8/lib/libparsec.so \
  ../../../PaStiX/scotch_i8/lib/libscotch.a ../../../PaStiX/scotch_i8/lib/libscotcherrexit.a -lpthread -lz -lm /usr/lib/x86_64-linux-gnu/libstdc++.so.6 /usr/lib/x86_64-linux-gnu/libgomp.so.1 -lhwloc -lmpi -lmpi_cxx \
  -lpthread -ldl -lrt \
  /usr/local/cuda-12.6/lib64/libcudart_static.a \
  /usr/lib/x86_64-linux-gnu/libcublas_static.a \
  /usr/local/cuda-12.6/lib64/libculibos.a \
  /usr/lib/x86_64-linux-gnu/libcublasLt_static.a \
  /usr/lib/x86_64-linux-gnu/libcuda.so \

LIBS = \
     $(DIR)/spooles.a \
     ../../../ARPACK_i8/libarpack_INTEL_i8.a \
     $(PASTIX_LIBS) \
     -lpthread -lm -lc

ccx_2.22_i8: $(OCCXMAIN) ccx_2.22.a $(LIBS)
	./date.pl; $(CC) $(CFLAGS) -c ccx_2.22.c; $(FC) -Wall -O2 -o $@ \
	$(OCCXMAIN) ccx_2.22.a $(LIBS)

ccx_2.22.a: $(OCCXF) $(OCCXC)
	ar vr $@ $?
                                                                               

To paint an accurate picture of the situation, please document the exact versions of the components you used.

Also, make sure that you are actually using the components that you built, and not the versions that might already be on your Ubuntu install.
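As a quick sanity check, something like this shows which shared libraries the executable actually picks up (a rough sketch; adjust the executable name, and note that statically linked libraries will not show up here):

ldd ./ccx_2.22_i8 | grep -Ei 'scotch|pastix|parsec|openblas|hwloc'
# any path under /usr/lib instead of your own install prefixes means a system copy is being used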

It looks like the crash happens during the ordering of the matrix with Scotch, since it prints the ordering method but not the expected next line, which is how long the ordering takes.
The system also prints a warning that a buffer overflow was detected.

Is your system set up to generate core dumps?
If so, you could use a debugger like gdb to analyze what led to the crash.
E.g., call gdb <executable> <core-file>.
Then at the gdb prompt, use the bt (backtrace) command to show the sequence of function calls that led to the crash.
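If no core file is written, you can also start the solver directly under gdb; a minimal sketch, assuming the executable is called ccx_2.22_i8 and the job is beamp:

ulimit -c unlimited               # optional: allow core dumps in this shell
gdb --args ./ccx_2.22_i8 beamp
# (gdb) run                       # reproduce the crash
# (gdb) bt                        # print the backtrace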

When I was building CalculiX with PaStiX (but without CUDA) I ran into trouble with a misconfigured OpenBLAS, as documented.


@rsmith, you are correct, I should provide more details. Thank you for your help, I appreciate it!

These are the steps I took on my native Ubuntu 24.04 LTS system with an 8-core Intel CPU and a GeForce 1050 GPU:

  1. I downloaded CUDA 12.6 and installed it for the Ubuntu platform in /usr/local/cuda-12.6, following the NVIDIA command-line instructions
  2. I installed the ARPACK libraries in /usr/local/Arpack_i8 by downloading and building the files from here
  3. I installed Scotch in /usr/local/PaStiX/scotch_i8 by downloading and building scotch-7.0.6 from here. I made sure to add -DINTSIZE=64 to the build command
  4. I installed hwloc in /usr/local/PaStiX/hwloc_i8 by downloading and building hwloc-2.11.2
  5. I cloned PaRSEC from Bitbucket and installed it in /usr/local/PaStiX/parsec_i8, using the make_parsec.sh file given in my original post above. I had hiccups along the way: the make process threw an error about not finding pbt2ptt.so. I had to go to the /usr/local/PaStiX/parsec/build/tools/profiling/python folder and run
python setup.py build_ext --inplace

and rename the .so file created to pbt2ptt.so before proceeding with the make process.

  6. I cloned PaStiX from GitHub (Dhondtguido/PaStiX4CalculiX), renamed the folder to pastix_src and compiled it using the make_pastix.sh file above. This also had issues with the make process not finding the zone_malloc.h file. I had to do:
cp /usr/local/PaStiX/parsec/parsec/utils/zone_malloc.h /usr/local/PaStiX/parsec_i8/include/parsec/utils/

before doing a clean build of PaStiX installed to the /usr/local/PaStiX/pastix_i8 folder.
  7. Finally, I went to the /usr/local/CalculiX/ccx_2.22/src folder and compiled ccx using the Makefile_i8 file listed above.

@rsmith, thanks for suggesting the use of gdb. I added -g to the CFLAGS line in the Makefile_i8 file for ccx. Running my ccxi8 (a symlink to my ccx executable) on the beamp.inp file shows the following backtrace. I understand something is happening in Scotch, but what should I investigate?

W@00000 Oversubscription on core 0 detected
W@00000 Oversubscription on core 1 detected
[New Thread 0x7fffc5600000 (LWP 49662)]
[New Thread 0x7fffc4c00000 (LWP 49663)]
[New Thread 0x7fffbfe00000 (LWP 49664)]
[New Thread 0x7fffbf400000 (LWP 49665)]
[New Thread 0x7fffbea00000 (LWP 49666)]
+-------------------------------------------------+
+     PaStiX : Parallel Sparse matriX package     +
+-------------------------------------------------+
  Version:                                   6.0.1
  Schedulers:
    sequential:                            Enabled
    thread static:                         Started
    thread dynamic:                       Disabled
    PaRSEC:                                Started
    StarPU:                               Disabled
  Number of MPI processes:                       1
  Number of threads per process:                 6
  Number of GPUs:                                0
  MPI communication support:              Funneled
  Distribution level:                     2D( 128)
  Blocking size (min/max):             1024 / 2048

  Matrix type:  General
  Arithmetic:   Float
  Format:       CSC
  N:            720
  nnz:          75636

+-------------------------------------------------+
  Ordering step :
    Ordering method is: Scotch
*** buffer overflow detected ***: terminated

Thread 1 "ccxi8" received signal SIGABRT, Aborted.
Download failed: Invalid argument.  Continuing without source file ./nptl/./nptl/pthread_kill.c.
__pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimized out>)
    at ./nptl/pthread_kill.c:44
warning: 44	./nptl/pthread_kill.c: No such file or directory
(gdb) backtrace
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimized out>)
    at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=<optimized out>)
    at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=<optimized out>, signo=signo@entry=6)
    at ./nptl/pthread_kill.c:89
#3  0x00007ffff6c4526e in __GI_raise (sig=sig@entry=6)
    at ../sysdeps/posix/raise.c:26
#4  0x00007ffff6c288ff in __GI_abort () at ./stdlib/abort.c:79
#5  0x00007ffff6c297b6 in __libc_message_impl (
    fmt=fmt@entry=0x7ffff6dce765 "*** %s ***: terminated\n")
    at ../sysdeps/posix/libc_fatal.c:132
#6  0x00007ffff6d36c19 in __GI___fortify_fail (
    msg=msg@entry=0x7ffff6dce74c "buffer overflow detected")
    at ./debug/fortify_fail.c:24
#7  0x00007ffff6d365d4 in __GI___chk_fail () at ./debug/chk_fail.c:28
#8  0x00005555563e7400 in _SCOTCHvgraphSeparateGg ()
#9  0x00005555563e832f in vgraphSeparateMl2 ()
#10 0x00005555563e861b in _SCOTCHvgraphSeparateMl ()
#11 0x00005555563cb1be in _SCOTCHvgraphSeparateSt ()
#12 0x00005555563cb2fc in _SCOTCHvgraphSeparateSt ()
#13 0x00005555563d7269 in _SCOTCHhgraphOrderNd ()
#14 0x00005555563d5980 in _SCOTCHhgraphOrderCp ()
--Type <RET> for more, q to quit, c to continue without paging--c
#15 0x00005555563bc059 in SCOTCH_graphOrderComputeList ()
#16 0x00005555563bc5d6 in SCOTCH_graphOrderList ()
#17 0x0000555555f51725 in pastixOrderComputeScotch ()
#18 0x0000555555f2c808 in pastix_subtask_order ()
#19 0x0000555555f094ea in pastix_task_analyze ()
#20 0x0000555555b2d757 in pastix_init (ad=<optimized out>, au=<optimized out>, 
    adb=<optimized out>, aub=<optimized out>, sigma=<optimized out>, 
    icol=<optimized out>, irow=0x7ffff6f9a010, neq=0x7fffffffd080, 
    nzs=0x7fffffffd0a0, symmetryflag=0x7fffffffb998, 
    inputformat=0x7fffffffb9a0, jq=0x555565a94d20, nzs3=0x7fffffffd0b0)
    at pastix.c:292
#21 0x0000555555b2fba3 in pastix_main_generic (ad=<optimized out>, 
    au=<optimized out>, adb=<optimized out>, aub=<optimized out>, 
    sigma=<optimized out>, b=<optimized out>, icol=<optimized out>, 
    irow=<optimized out>, neq=<optimized out>, nzs=<optimized out>, 
    symmetryflag=<optimized out>, inputformat=<optimized out>, 
    jq=<optimized out>, nzs3=<optimized out>, nrhs=<optimized out>)
    at pastix.c:1083
#22 0x0000555555b05c2d in linstatic (co=0x555565a8d780, 
    nk=nk@entry=0x7fffffffca78, konp=konp@entry=0x7fffffffc8a8, 
    ipkonp=ipkonp@entry=0x7fffffffc9b0, lakonp=lakonp@entry=0x7fffffffc858, 
    ne=ne@entry=0x7fffffffca80, nodeboun=0x555565a8c7e0, 
    ndirboun=0x555565a8c9e0, xboun=0x555565a8ce30, nboun=0x7fffffffca88, 
    ipompc=0x0, nodempc=<optimized out>, coefmpc=<optimized out>, 
    labmpc=0x555565a8d490 "", nmpc=0x7fffffffca90, nodeforc=0x555565a90410, 
    ndirforc=0x555565a8c500, xforc=0x555565a90550, nforc=0x7fffffffca98, 
    nelemload=0x555565a9e6e0, sideload=0x555565a9e700 " ;\340\366\377\177", 
    xload=0x555565a9e720, nload=0x7fffffffcaa0, nactdof=0x555565a99630, 
    icolp=0x7fffffffc8f0, jq=0x555565a94d20, irowp=0x7fffffffc910, 
    neq=0x7fffffffd080, nzl=0x7fffffffcad0, nmethod=0x7fffffffcac0, ikmpc=0x0, 
    ilmpc=0x0, ikboun=0x555565a8d030, ilboun=0x555565a8d230, 
    elcon=0x555565a8d510, nelcon=0x555565a90b60, rhcon=0x555565a90b80, 
    nrhcon=0x555565a90ba0, alcon=0x555565a90c10, nalcon=0x555565a90c50, 
    alzero=0x555565a90c70, ielmatp=0x7fffffffc958, ielorienp=0x7fffffffc960, 
    norien=0x7fffffffcb70, orab=0x555565a90d50, ntmat_=0x7fffffffcb68, 
    t0=0x555565a92ce0, t1=0x555565a93510, t1old=0x0, ithermal=0x7fffffffcfe0, 
    prestr=0x555565a94570, iprestr=0x7fffffffcb78, vold=0x555565a97580, 
    iperturb=0x7fffffffcfd0, sti=0x555565a9f1e0, nzs=0x7fffffffd0a0, 
    kode=0x7fffffffcb80, 
    filab=0x555565a9b850 "U    G", ' ' <repeats 194 times>..., 
    eme=0x555565aa21f0, iexpl=0x7fffffffcbc0, plicon=0x0, nplicon=0x0, 
    plkcon=0x0, nplkcon=0x0, xstatep=0x7fffffffcef8, npmat_=0x7fffffffcbe0, 
    matname=0x555565a9b7f0 "EL", ' ' <repeats 78 times>, 
    isolver=0x7fffffffcb88, mi=0x7fffffffd0c0, ncmat_=0x7fffffffcde0, 
    nstate_=0x7fffffffcdd8, cs=0x0, mcs=0x7fffffffcc68, nkon=0x7fffffffcbb0, 
    enerp=0x7fffffffcf08, xbounold=0x555565a9e490, xforcold=0x555565a9e690, 
    xloadold=0x0, amname=0x555565a92ad0 "0\325\332eUU", amta=0x555565a92c20, 
    namta=0x555565a92c70, nam=0x7fffffffcad8, iamforc=0x555565a904b0, 
    iamload=0x0, iamt1=0x555565a93d40, iamboun=0x555565a8cc30, 
    ttime=0x7fffffffcfc8, output=0x7fffffffd42b "asc ", 
    set=0x555565a8beb0 "EALLE", ' ' <repeats 76 times>, "FIXN", ' ' <repeats 77 times>, "LOADN", ' ' <repeats 33 times>..., nset=0x7fffffffcab0, 
    istartset=0x555565a90790, iendset=0x555565a907d0, ialset=0x555565a90810, 
    nprint=0x7fffffffcaa8, prlab=0x555565a8c550 "SOF  GSOF  G", 
    prset=0x555565a90640 "S1T", ' ' <repeats 78 times>, "S2T", ' ' <repeats 78 times>, nener=0x7fffffffcc38, trab=0x0, inotr=0x555565a91a70, 
    ntrans=0x7fffffffcbe8, fmpc=0x0, ipobody=0x0, ibody=0x555565a9e760, 
    xbody=0x555565a9e780, nbody=0x7fffffffcaf8, xbodyold=0x555565a9e7a0, 
    timepar=0x7fffffffd1b0, thicke=0x0, jobnamec=0x7fffffffd550 "beamp", 
    tieset=0x0, ntie=0x7fffffffcc58, istep=0x7fffffffcb40, 
    nmat=0x7fffffffcb60, ielprop=0x0, prop=0x0, 
    typeboun=0x555565a8cbe0 'B' <repeats 63 times>, mortar=0x7fffffffcba0, 
    mpcinfo=0x7fffffffd120, tietol=0x0, ics=0x0, orname=0x0, 
    itempuser=0x7fffffffd040, t0g=0x0, t1g=0x0, jmax=0x7fffffffd000)
    at linstatic.c:732
#23 0x00005555557dc2ed in main (argc=<optimized out>, argv=<optimized out>)
    at ccx_2.22.c:1233

Tomorrow I’ll be back home after a brief holiday in Greece and I can share my experience. I was able to compile CalculiX 2.22 using the PaStiX solver and CUDA. I remember some issues with Scotch and the i8 option that I was finally able to solve.
Unfortunately, in all my tests I didn’t see clear usage of the GPU. The CPU-vs-GPU log at execution time always shows something vs 0, and although the nvidia-smi command correctly lists the ccx_with_cuda process, the GPU usage was zero.


Thanks for the information @kmallick.

Since I don’t have an Nvidia GPU, I don’t use CUDA, so I cannot comment on that, and I’ve used a fork of the original PaStiX4CalculiX.

My build sequence is a little different:

  1. SPOOLES 2.2
  2. OpenBlas 0.3.28
  3. arpack-ng 3.9.1
  4. hwloc 2.11.2
  5. mfaverge-parsec-b580d208094e
  6. scotch 7.0.6
  7. PaStiX4CalculiX (cudaless branch from GitHub - Kabbone/PaStiX4CalculiX)
  8. CalculiX 2.22

(You can find my build scripts and related patches on GitHub. Note that I’m on FreeBSD, not Linux; some of the patches and compilation flags I use are specific to FreeBSD. Feel free to clone this repo and adapt the clone to your needs.)

I should mention that I built all these libraries as static libraries only.
Mostly because other programs I use are dynamically linked to different configurations of the same libraries.

Even if you use PaStiX, SPOOLES is still better for eigenfrequency calculations, so you should include it.

With OpenBLAS it is important to configure it with both USE_THREAD=0 and USE_LOCKING=1. CalculiX calls single-threaded BLAS from multiple OpenMP threads, so locking is needed.
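On Linux that boils down to something like this (an illustrative sketch; INTERFACE64=1 is only needed for the i8 build, and the prefix is just an example):

make USE_THREAD=0 USE_LOCKING=1 INTERFACE64=1
make PREFIX=/usr/local/OpenBLAS_i8 install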

arpack-ng is a still-maintained fork of the original ARPACK. I configured it to use OpenBLAS.

As for hwloc, I had to explicitly link it with libpciaccess and libexecinfo. I also configured it without graphical and XML output and without plugin support. CalculiX doesn’t use that functionality, and it cuts down on the number of dependencies.
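For illustration, a configure line along these lines (flag names from memory, so check ./configure --help; the FreeBSD-specific linking is left out):

./configure --prefix=/usr/local/PaStiX/hwloc_i8 --enable-static --disable-cairo --disable-libxml2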

With regard to PaRSEC, I’ve been using the repo at commit b580d20 from the mfaverge/parsec repo, because it is tagged as being for PaStiX 6.0.1, which is what PaStiX4CalculiX is based on. PaRSEC also needed to be linked with libpciaccess and libexecinfo.
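Pinning that revision is straightforward, for example:

git clone https://bitbucket.org/mfaverge/parsec.git
cd parsec
git checkout b580d20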

Scotch comes with Makefile.inc for different systems. I just picked the one for x86-64 and FreeBSD.

Note that building PaStiX4CalculiX requires Python 2.7 for code generation. If you use Python 3 the resulting code will not work.
I used -DCMAKE_C_FLAGS='-fopenmp -lpciaccess -lm -Wno-unused-parameter' when invoking cmake to build it.
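If python on your system is Python 3, you can point cmake at a Python 2.7 interpreter explicitly; a sketch (the interpreter path is just an example):

cmake -DPYTHON_EXECUTABLE=/opt/python2.7/bin/python2.7 <other options> ..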

When building ccx I explicitly set -I and -L to point to the locations of the custom built libraries mentioned before.
I also used -ffixed-form -std=legacy -fallow-argument-mismatch for building the FORTRAN code; this markedly reduces the number of warnings generated during compilation.

So the problem starts in pastixOrderComputeScotch which calls SCOTCH_graphOrderList.

Looking into that, SCOTCH_graphOrderList is called with the third parameter listtab (a pointer to an array) set to NULL, but with the second parameter listnbr (the length of the array) set to a value that might not be 0. This then calls graphOrderComputeList, where that NULL pointer is dereferenced further down in the function without being checked.
That could very well be the cause of the problem.
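One way to check that on your side (assuming Scotch was built with debug info, otherwise the arguments are not visible):

gdb --args ./ccx_2.22_i8 beamp
# (gdb) break SCOTCH_graphOrderList
# (gdb) run
# (gdb) info args        # is listtab NULL while listnbr is non-zero?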

What I don’t understand yet is why I don’t have the same problem.


I had forgotten to add that:

  • I do have OpenBLAS installed in the /usr/local/OpenBLAS_i8 folder. I did use USE_THREAD=0 and USE_LOCKING=1 when building it
  • I made sure to use commit b580d20 of PaRSEC from the mfaverge/parsec repo
  • I was able to compile PaStiX4CalculiX using Python 3.12, apart from a minor challenge with the missing zone_malloc.h file as mentioned above
  • I am aware of the cudaless branch from GitHub - Kabbone/PaStiX4CalculiX. I tried compiling that version of PaStiX but I ended up with:
CMake Error at test/CMakeLists.txt:43 (add_library):
  No SOURCES given to target: bcsc_test


CMake Error at test/CMakeLists.txt:122 (add_library):
  No SOURCES given to target: pastix_tests

I do not need to use CUDA or the GPU. I was only following the instructions for using CalculiX with PaStiX.

BTW, I have a version of CalculiX 2.22 with SPOOLES running perfectly. I was only compiling a standalone version of CalculiX with PaStiX for my current project.

Since I understand my issue has to do with Scotch, I have been looking into it more closely. It does have problems finishing some of the tests recommended during the install, but it compiles and installs fine otherwise.


The Python 2 scripts that generate code for different precisions use the imp module, which was removed in Python 3.12, so those scripts should have failed with an error.

When using Python 3.11 as python (not just python3), I get lots of warnings like these:

-- Generate precision dependencies in /home/rsmith/src/calculix-build/source/pastix4calculix/spm
/home/rsmith/src/calculix-build/source/pastix4calculix/cmake_modules/morse_cmake/modules/precision_generator/genDependencies.py:22: DeprecationWarning: the imp module is deprecated in favour of importlib and slated for removal in Python 3.12; see the module's documentation for alternative uses
  import imp;

And later on the build fails because of faulty generated Makefiles:

sopalin/parsec/CMakeFiles/parsec_headers_tgt.dir/build.make:82: *** missing separator.  Stop.
CMakeFiles/sopalin_headers.dir/build.make:81: *** missing separator.  Stop.
gmake[1]: *** [CMakeFiles/Makefile2:1494: sopalin/parsec/CMakeFiles/parsec_headers_tgt.dir/all] Error 2
gmake[1]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:389: CMakeFiles/sopalin_headers.dir/all] Error 2
CMakeFiles/bcsc_headers_tgt.dir/build.make:75: *** missing separator.  Stop.
kernels/CMakeFiles/pastix_kernels.dir/build.make:73: *** missing separator.  Stop.
gmake[1]: *** [CMakeFiles/Makefile2:363: CMakeFiles/bcsc_headers_tgt.dir/all] Error 2
gmake[1]: *** [CMakeFiles/Makefile2:1442: kernels/CMakeFiles/pastix_kernels.dir/all] Error 2
gmake: *** [Makefile:146: all] Error 2

What version do you see when you run python --version in a terminal? I suspect you still have Python 2.7 installed as python.

I don’t remember seeing that.
An easy fix would be to edit CMakeLists.txt and remove the following lines:

## Executable and tests
enable_testing()
include(CTest)
# Examples executables
add_subdirectory(example)
# Testing executables
add_subdirectory(test)
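Or scripted, a one-off sketch run in the PaStiX4CalculiX source directory (it only comments out the example/test subdirectories, which is the part that matters):

sed -i -e 's/^add_subdirectory(example)/#&/' -e 's/^add_subdirectory(test)/#&/' CMakeLists.txt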

I did patch the CMakeLists.txt to remove building the documentation.
And added two small patches to address missing include files and prototypes. You can find them in the repo I linked to earlier.

It certainly looks that way, although to be honest I do not understand why I do not also get the same error. The way SCOTCH_graphOrderList is called in PaStiX seems like a bug to me, but the current version of PaStiX still has it.

Recently I switched my build from scotch-v6.0.8 to scotch-v7.0.6. Both worked fine AFAICT.

This is my script in order to compile scotch with i8:

*start****************************
* procedure for scotch           *
**********************************

export CFLAGS="-DINTSIZE64"
export FCFLAGS="-fdefault-integer-8 -fdefault-real-8 -fdefault-double-8"
** You must use this and only this one!!!
tar -zxvf scotch-master.tar.gz
mv scotch-master scotch_i8
cd scotch_i8
cd src
ln -sf Make.inc/Makefile.inc.x86-64_pc_linux2 Makefile.inc
sed -i '/CFLAGS/ s/$/ -DINTSIZE64/' Makefile.inc
** Search your system for where mpi.h is defined (typically I use OpenMPI)
** and put that information in the CFLAGS and LDFLAGS of the Makefile.inc file.
** For example, on my system:
CFLAGS		= -O3 -fPIC -DCOMMON_FILE_COMPRESS_GZ -DCOMMON_PTHREAD -DCOMMON_PTHREAD_AFFINITY_LINUX -DCOMMON_RANDOM_FIXED_SEED -DSCOTCH_MPI_ASYNC_COLL -DSCOTCH_PTHREAD -DSCOTCH_PTHREAD_MPI -DSCOTCH_RENAME -Drestrict=__restrict -DIDXSIZE64 -DINTSIZE64 -I/usr/local/openmpi-3.1.6/install/include
LDFLAGS		= -O3 -fdefault-integer-8 -lz -lm -lrt -pthread -I/usr/local/openmpi-3.1.6/install/lib
** in the ../scotch_i8/src/Makefile.inc file.
** Then make some changes in the file ./src/check/test_libmetis_dual_f.f90.in
sed -i 's/integer(c_int), value, intent(in) :: val/integer(8), value, intent(in) :: val/g' check/test_libmetis_dual_f.f90.in
sed -i 's/call exit_c (1)/call exit_c (1_8)/g' check/test_libmetis_dual_f.f90.in
** in order to meet the compulsory integer*8 requirement

cmake	-DCMAKE_INSTALL_PREFIX=/usr/local/PaStiX/scotch_i8/install \
	-DCMAKE_C_FLAGS="-m64" \
	-DCMAKE_Fortran_FLAGS=-fdefault-integer-8 \
	..

make -j 16 scotch
make -j 16 esmumps
make install


**********************************
* procedure for scotch           *
******************************end*
#which python
/usr/bin/python
#/usr/bin/python --version
Python 3.12.3

I do recall replacing

import imp

with

import importlib

in one of the Python scripts. I wonder if the Python errors are system-specific.

I will try compiling the cudaless version by modifying the CMakeLists file soon.

@teofil75, thank you so much for your help!
I think I followed your instructions to the letter. But just for the purpose of my journaling, here is what I did:

  1. I downloaded Scotch
  2. I went into the scotch-master directory
cd src
ln -sf Make.inc/Makefile.inc.x86-64_pc_linux2 Makefile.inc

I made sure the Makefile.inc (a symlink) in the src folder looks like:

EXE		=
LIB		= .a
OBJ		= .o

MAKE		= make
AR		= ar
ARFLAGS		= -ruv
CAT		= cat
CCS		= gcc
CCP		= mpicc
CCD		= gcc
CFLAGS		= -O3 -fPIC -DCOMMON_FILE_COMPRESS_GZ -DCOMMON_PTHREAD -DCOMMON_PTHREAD_AFFINITY_LINUX -DCOMMON_RANDOM_FIXED_SEED -DSCOTCH_MPI_ASYNC_COLL -DSCOTCH_PTHREAD -DSCOTCH_PTHREAD_MPI -DSCOTCH_RENAME -Drestrict=__restrict -DIDXSIZE64 -DINTSIZE64 -I/usr/lib/x86_64-linux-gnu/openmpi/include
CLIBFLAGS	=
FCFLAGS=-fdefault-integer-8 -fdefault-real-8 -fdefault-double-8
LDFLAGS		= -O3 -fdefault-integer-8 -lz -lm -lrt -pthread -I/usr/lib/x86_64-linux-gnu/openmpi/lib
CP		= cp
FLEX		= flex
LN		= ln
MKDIR		= mkdir -p
MV		= mv
RANLIB		= ranlib
BISON		= bison

I issued:

sed -i 's/call exit_c (1)/call exit_c (1_8)/g' check/test_libmetis_dual_f.f90.in
sed -i 's/integer(c_int), value, intent(in) :: val/integer(8), value, intent(in) :: val/g' check/test_libmetis_dual_f.f90.in
export CFLAGS="-DINTSIZE64"
export FCFLAGS="-fdefault-integer-8 -fdefault-real-8 -fdefault-double-8"
cmake -DCMAKE_INSTALL_PREFIX="/usr/local/PaStiX/scotch_i8" -DCMAKE_C_FLAGS="-m64" -DCMAKE_Fortran_FLAGS="-fdefault-integer-8" -DINTSIZE=64 ..
make -j8 scotch
make -j8 esmumps
make install

This installed Scotch in the /usr/local/PaStiX/scotch_i8 folder.

Subsequently, I re-compiled the CalculiX ccx executable using the Makefile provided in my original post. Everything went smoothly.

But I am still getting the same error running ccx on beamp. Here is my backtrace.

[New Thread 0x7fffed600000 (LWP 133055)]                                        
[New Thread 0x7ffff6000000 (LWP 133056)]
[New Thread 0x7fffc7400000 (LWP 133057)]
[New Thread 0x7fffc6a00000 (LWP 133058)]
[New Thread 0x7fffc6000000 (LWP 133059)]
W@00000 Oversubscription on core 0 detected
W@00000 Oversubscription on core 1 detected
[New Thread 0x7fffc5600000 (LWP 133060)]
[New Thread 0x7fffc4c00000 (LWP 133061)]
[New Thread 0x7fffbfe00000 (LWP 133062)]
[New Thread 0x7fffbf400000 (LWP 133063)]
[New Thread 0x7fffbea00000 (LWP 133064)]
+-------------------------------------------------+
+     PaStiX : Parallel Sparse matriX package     +
+-------------------------------------------------+
  Version:                                   6.0.1
  Schedulers:
    sequential:                            Enabled
    thread static:                         Started
    thread dynamic:                       Disabled
    PaRSEC:                                Started
    StarPU:                               Disabled
  Number of MPI processes:                       1
  Number of threads per process:                 6
  Number of GPUs:                                0
  MPI communication support:              Funneled
  Distribution level:                     2D( 128)
  Blocking size (min/max):             1024 / 2048

  Matrix type:  General
  Arithmetic:   Float
  Format:       CSC
  N:            720
  nnz:          75636

+-------------------------------------------------+
  Ordering step :
    Ordering method is: Scotch
munmap_chunk(): invalid pointer

Thread 1 "ccx222i8" received signal SIGABRT, Aborted.
Download failed: Invalid argument.  Continuing without source file ./nptl/./nptl/pthread_kill.c.
__pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimized out>)
    at ./nptl/pthread_kill.c:44
warning: 44	./nptl/pthread_kill.c: No such file or directory
(gdb) backtrace
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimized out>)
    at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=<optimized out>)
    at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=<optimized out>, signo=signo@entry=6)
    at ./nptl/pthread_kill.c:89
#3  0x00007ffff6c4526e in __GI_raise (sig=sig@entry=6)
    at ../sysdeps/posix/raise.c:26
#4  0x00007ffff6c288ff in __GI_abort () at ./stdlib/abort.c:79
#5  0x00007ffff6c297b6 in __libc_message_impl (
    fmt=fmt@entry=0x7ffff6dce8d7 "%s\n") at ../sysdeps/posix/libc_fatal.c:132
#6  0x00007ffff6ca8fe5 in malloc_printerr (
    str=str@entry=0x7ffff6dd1520 "munmap_chunk(): invalid pointer")
    at ./malloc/malloc.c:5772
#7  0x00007ffff6ca946c in munmap_chunk (p=<optimized out>)
    at ./malloc/malloc.c:3040
#8  0x00007ffff6caddea in __GI___libc_free (mem=0x555566143460)
    at ./malloc/malloc.c:3388
#9  0x00005555563fdf07 in graphCoarsen3 ()
#10 0x00005555563c42f6 in _SCOTCHthreadLaunch ()
#11 0x00005555563fe277 in graphCoarsen2 ()
#12 0x00005555563fe44f in _SCOTCHgraphCoarsen ()
#13 0x00005555563f307d in vgraphSeparateMlCoarsen ()
#14 0x00005555563f33c8 in vgraphSeparateMl2 ()
#15 0x00005555563f353e in _SCOTCHvgraphSeparateMl ()
#16 0x00005555563cc13b in _SCOTCHvgraphSeparateSt ()
#17 0x00005555563cbfcb in _SCOTCHvgraphSeparateSt ()
#18 0x00005555563cbef1 in _SCOTCHvgraphSeparateSt ()
#19 0x00005555563db665 in _SCOTCHhgraphOrderNd ()
#20 0x00005555563c75dc in _SCOTCHhgraphOrderSt ()
#21 0x00005555563d7ee6 in _SCOTCHhgraphOrderCp ()
#22 0x00005555563c75dc in _SCOTCHhgraphOrderSt ()
#23 0x00005555563b912f in SCOTCH_graphOrderComputeList ()
#24 0x00005555563b9452 in SCOTCH_graphOrderList ()
#25 0x0000555555f4ea45 in pastixOrderComputeScotch ()
#26 0x0000555555f29b28 in pastix_subtask_order ()

Success at last! Stars do have to line up. :rofl:

I had missed the step of recompiling PaStiX between compiling Scotch (per @teofil75) and CalculiX. I also had to add the -DINTSIZE=64 flag to the cmake command for Scotch, otherwise PaStiX threw me an error about Scotch:

PASTIX_INT64 is enabled and provided Scotch is not compiled with int64
  support.
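A quick way to check whether the installed Scotch headers really use 64-bit integers (path per my install prefix) is:

grep -n "SCOTCH_Num" /usr/local/PaStiX/scotch_i8/include/scotch.h
# for an i8 build this should be typedef'd to int64_t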

I have edited my post above and modified the cmake command for the Scotch compilation accordingly.

After recompiling Scotch, PaStiX and then ccx, my executable runs to completion on beamp without any issue. :raised_hands:t4: But I still get this strange oversubscription warning. Any idea what could be causing it?

ccx222i8 beamp

************************************************************

CalculiX Version 2.22 i8, Copyright(C) 1998-2024 Guido Dhondt
CalculiX comes with ABSOLUTELY NO WARRANTY. This is free
software, and you are welcome to redistribute it under
certain conditions, see gpl.htm

************************************************************

You are using an executable made on Sat Jan  4 11:57:48 AM MST 2025

  The numbers below are estimated upper bounds

  number of:

   nodes:                   261
   elements:                    32
   one-dimensional elements:                     0
   two-dimensional elements:                     0
   integration points per element:                     8
   degrees of freedom per node:                     3
   layers per element:                     1

   distributed facial loads:                     0
   distributed volumetric loads:                     0
   concentrated loads:                     9
   single point constraints:                    63
   multiple point constraints:                     1
   terms in all multiple point constraints:                     1
   tie constraints:                     0
   dependent nodes tied by cyclic constraints:                     0
   dependent nodes in pre-tension constraints:                     0

   sets:                     6
   terms in all sets:                   105

   materials:                     1
   constants per material and temperature:                     2
   temperature points per material:                     1
   plastic data points per material:                     0

   orientations:                     0
   amplitudes:                     4
   data points in all amplitudes:                     4
   print requests:                     4
   transformations:                     0
   property cards:                     0


 STEP                     1

 Static analysis was selected

 Decascading the MPC's

 Determining the structure of the matrix:
 Using up to 6 cpu(s) for setting up the structure of the matrix.
 number of equations
 720
 number of nonzero lower triangular matrix elements
 37458

 Using up to 6 cpu(s) for the stress calculation.

 Using up to 6 cpu(s) for the symmetric stiffness/mass contributions.

Not reusing csc.
W@00000 Oversubscription on core 0 detected
W@00000 Oversubscription on core 1 detected
+-------------------------------------------------+
+     PaStiX : Parallel Sparse matriX package     +
+-------------------------------------------------+
  Version:                                   6.0.1
  Schedulers:
    sequential:                            Enabled
    thread static:                         Started
    thread dynamic:                       Disabled
    PaRSEC:                                Started
    StarPU:                               Disabled
  Number of MPI processes:                       1
  Number of threads per process:                 6
  Number of GPUs:                                0
  MPI communication support:              Funneled
  Distribution level:                     2D( 128)
  Blocking size (min/max):             1024 / 2048

  Matrix type:  General
  Arithmetic:   Float
  Format:       CSC
  N:            720
  nnz:          75636

+-------------------------------------------------+
  Ordering step :
    Ordering method is: Scotch
    Time to compute ordering:              0.0117 
+-------------------------------------------------+
  Symbolic factorization step:
    Symbol factorization using: Fax Direct
    Number of nonzeroes in L structure:      64548
    Fill-in of L:                         0.853403
    Time to compute symbol matrix:        0.0026 
+-------------------------------------------------+
  Reordering step:
    Split level:                                 0
    Stoping criteria:                           -1
    Time for reordering:                  0.0015 
+-------------------------------------------------+
  Analyse step:
    Number of non-zeroes in blocked L:      129096
    Fill-in:                              1.706806
    Number of operations in full-rank LU   :    12.72 MFlops
    Prediction:
      Model:                             AMD 6180  MKL
      Time to factorize:                  0.0035 
    Time for analyze:                     0.0001 
+-------------------------------------------------+
  Factorization step:
    Factorization used: LU
    Time to initialize internal csc:      0.0031 
    Time to initialize coeftab:           0.0012 
    Time to factorize:                    1.0309  (12.34 MFlop/s)
    Number of operations:                      12.72 MFlops
    Number of static pivots:                     0
CPU vs GPU CBLK GEMMS -> 143 vs 0
CPU vs GPU BLK GEMMS -> 0 vs 0
CPU vs GPU TRSM -> 0 vs 0
    Time to solve:                        0.0203 
    - iteration 1 :
         total iteration time                   0.0204 
         error                                  1.3067e-06
    - iteration 2 :
         total iteration time                   0.00154 
         error                                  1.3059e-09
    - iteration 3 :
         total iteration time                   0.00162 
         error                                  1.505e-12
    - iteration 4 :
         total iteration time                   0.00169 
         error                                  7.2393e-16
    Time for refinement:                  0.0311 
    - iteration 1 :
         total iteration time                   0.00133 
         error                                  1.1255e-14
    Time for refinement:                  0.0020 
________________________________________

CSC Conversion Time: 0.023120
Init Time: 0.331662
Factorize Time: 1.035309
Solve Time: 0.053605
Clean up Time: 0.000000
---------------------------------
Sum: 1.443696

Total PaStiX Time: 1.443696
CCX without PaStiX Time: 0.038895
Share of PaStiX Time: 0.973766
Total Time: 1.482591
Reusability: 0 : 1 
________________________________________

 Using up to 6 cpu(s) for the stress calculation.


 Job finished

________________________________________

Total CalculiX Time: 1.488083
________________________________________


I never experienced such an error, but in the end my builds avoid using MPI in both PaRSEC and PaStiX.
Moreover, in my tests I always saw more than 120 MFlop/s; your value is ten times lower:

Time to factorize:                    1.0309  (12.34 MFlop/s)

Here is one of my test:

Not reusing csc.
+-------------------------------------------------+
+     PaStiX : Parallel Sparse matriX package     +
+-------------------------------------------------+
  Version:                                   6.0.1
  Schedulers:
    sequential:                            Enabled
    thread static:                         Started
    thread dynamic:                       Disabled
    PaRSEC:                                Started
    StarPU:                               Disabled
  Number of MPI processes:                       1
  Number of threads per process:                 2
  Number of GPUs:                                1
  MPI communication support:              Disabled
  Distribution level:                     2D( 128)
  Blocking size (min/max):             1024 / 2048

  Matrix type:  General
  Arithmetic:   Float
  Format:       CSC
  N:            202412
  nnz:          13171430

+-------------------------------------------------+
  Ordering step :
    Ordering method is: Scotch
    Time to compute ordering:              0.5019 
+-------------------------------------------------+
  Symbolic factorization step:
    Symbol factorization using: Fax Direct
    Number of nonzeroes in L structure:   67009949
    Fill-in of L:                         5.087523
    Time to compute symbol matrix:        0.0587 
+-------------------------------------------------+
  Reordering step:
    Split level:                                 0
    Stoping criteria:                           -1
    Time for reordering:                  0.1837 
+-------------------------------------------------+
  Analyse step:
    Number of non-zeroes in blocked L:    134019898
    Fill-in:                              10.175045
    Number of operations in full-rank LU   :    96.72 GFlops
    Prediction:
      Model:                             AMD 6180  MKL
      Time to factorize:                  12.1503 
    Time for analyze:                     0.0130 
+-------------------------------------------------+
  Factorization step:
    Factorization used: LU
    Time to initialize internal csc:      0.3696 
    Time to initialize coeftab:           0.1878 
    Time to factorize:                    0.8375  (115.49 GFlop/s)
    Number of operations:                      96.72 GFlops
    Number of static pivots:                     0
CPU vs GPU CBLK GEMMS -> 71919 vs 0
CPU vs GPU BLK GEMMS -> 10370 vs 0
CPU vs GPU TRSM -> 4144 vs 0
    Time to solve:                        0.0354 
    - iteration 1 :
         total iteration time                   0.0478 
         error                                  8.2322e-08
    - iteration 2 :
         total iteration time                   0.0471 
         error                                  4.3618e-09
    - iteration 3 :
         total iteration time                   0.0479 
         error                                  3.0351e-10
    - iteration 4 :
         total iteration time                   0.048 
         error                                  1.1094e-11
    - iteration 5 :
         total iteration time                   0.0476 
         error                                  1.6898e-13
    Time for refinement:                  0.2626 
    Time for refinement:                  0.0231 
________________________________________

CSC Conversion Time: 0.192505
Init Time: 0.891372
Factorize Time: 1.395825
Solve Time: 0.326749
Clean up Time: 0.000000
---------------------------------
Sum: 2.806453

Total PaStiX Time: 2.806453
CCX without PaStiX Time: 2.863384
Share of PaStiX Time: 0.494979
Total Time: 5.669837
Reusability: 0 : 1 
________________________________________

@teofil75 I see that your run shows:

Number of GPUs: 1

I wonder why mine is still showing 0.


You must set the env var:

export PASTIX_GPU=1    # 0 = CPU, 1 = GPU
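For example (executable name as in the earlier post; nvidia-smi in a second terminal should then list the ccx process):

export PASTIX_GPU=1
ccx222i8 beamp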

Here is my complete procedure (it is easier to read if you view this text in a Courier/monospace font):

** The installation folder is chosen as:		/usr/local
** The sources download folder is assumed to be:	~/download/fem/CalculiX

   ~/download/fem/CalculiX
                  +-- ccx_2.22.README.INSTALL
                  +-- ccx_2.22.SPOOLEScorrection.tar.bz2
                  +-- ccx_2.22.src.tar.bz2
                      +-- dependencies
                          +-- ARPACK
                              +-- arpack96.tar.Z
                              +-- patch.tar.Z
                          +-- lapack
                          MUMPS_5.7.1.tar.gz
                          +-- pardiso
                              +-- intel-onemkl-2025.0.0.940_offline.sh
                          +-- PaStiX
                              +-- hwloc-2.11.2.tar.gz
                              +-- scotch-master.tar.gz
                              +-- starpu-1.4.7.tar.gz
                              +-- PaStiX4CalculiX
                          +-- SPOOLES
                          +-- spooles.2.2.tgz

** The CalculiX i8 build assumes a directory structure for its
** solvers like this:
   /usr/local
        +-- ARPACK
        +-- OpenBLAS_i8
        +-- PaStiX
            +-- hwloc_i8
            +-- parsec_i8
            +-- scotch_i8
            +-- PaStiX4CalculiX
        +-- SPOOLES.2.2
        +-- cuda-12.4
        +-- mpich-4.1
        +-- openmpi-3.1.6
  
** Moreover, the location of the Pardiso solver integrated in
** Intel oneAPI MKL is:
** /opt/intel/oneapi/mkl/2025.0/lib/libmkl_rt.so

*start****************************
* procedure for ARPACK           *
**********************************

cd /usr/local
tar -zxvf arpack96.tar.Z
tar -zxvf patch.tar.Z
cd ARPACK/
nano UTIL/second.f
** comment line #24: EXTERNAL           ETIME
nano ARmake.inc
** define: home = /usr/local/ARPACK
** define: PLAT = INTEL
** define: FC      = gfortran
** define: FFLAGS  = -O3
** define: MAKE    = /usr/bin/make
make lib
** verify the presence of: libarpack_INTEL.a
make clean

** after that modify:
nano ARmake.inc
** define: PLAT = INTEL_i8
nano ARmake.inc
** define: FFLAGS  = -O3 -fdefault-integer-8
export CFLAGS="-DINTSIZE64"
export FCFLAGS="-fdefault-integer-8 -fdefault-real-8 -fdefault-double-8"
make lib
** verify the presence of: libarpack_INTEL_i8.a

**********************************
* procedure for ARPACK           *
******************************end*
* ------------------------------ *
*start****************************
* procedure for SPOOLES          *
**********************************

mkdir SPOOLES.2.2
cd SPOOLES.2.2/
tar -zxvf spooles.2.2.tgz
nano Make.inc
** uncomment line: CC = gcc
** comment line: #  CC = /usr/lang-4.0/bin/cc
** uncomment line:  OPTLEVEL = -O3
** comment line: #  OPTLEVEL = -O
** uncomment line:  RANLIB = ranlib
** comment line: #  RANLIB = echo
nano makefile
** uncomment line:         cd MT/src             ; make -f makeGlobalLib
** for the version with MT
** modify the file I2Ohash/src/util.c to apply the ccx_2.15.SPOOLEScorrection.tar.bz2 fix
** comment line 42: /*loc  = (loc1*loc2) % hashtable->nlist ;*/
** and add:
** long int loc3  = (long int)loc1*(long int)loc2 % hashtable->nlist ;
** loc=(int)loc3;
** Then check for the file drawTree.c in the folder ./Tree/src
** and make sure that Tree/src/makefile refers to: $(OBJ).a(drawTree.o) \
** and that the file Tree/src/makeGlobalLib also refers to the file:
**  drawTree.c \
make global
** After that the library spooles.a will be created.
** After that, to create the MT library, enter the folder ../MT/src
** and do:
make
** it will create the library: ../MT/src/spoolesMT.a


**********************************
* procedure for SPOOLES          *
******************************end*
* ------------------------------ *
*start****************************
* procedure for INTEL oneAPI MKL *
**********************************

wget https://registrationcenter-download.intel.com/akdlm/IRC_NAS/79153e0f-74d7-45af-b8c2-258941adf58a/intel-onemkl-2025.0.0.940_offline.sh
** Download from: https://www.intel.com/content/www/us/en/developer/tools/oneapi/onemkl-download.html?operatingsystem=linux&linux-install=offline
** the offline file: intel-onemkl-2025.0.0.940_offline.sh (about 497 MB).
** After that execute the .sh script and follow the on-screen instructions:
sudo sh ./intel-onemkl-2025.0.0.940_offline.sh
** During the installation process, you will be given several options.
** You can choose to install to a default directory or specify your own.
** You will also be asked to accept the terms and conditions.
** After installation, you may need to set environment variables in your .bashrc (or .zshrc if you use Zsh).
** Add the following line:
echo "source /opt/intel/oneapi/setvars.sh" >> ~/.bashrc
** and then:
source ~/.bashrc
** then check the installation:
ls /opt/intel/oneapi/mkl/latest/lib/intel64/
find /opt/intel/oneapi/mkl/ -name "*pardiso*"
** 
** There were problems with redefinitions of the envMKL variable, which I solved by declaring it
** extern in pardiso.h (last line): extern char envMKL[32];
** and by verifying that pardiso.c contains only one definition of
** that variable: char envMKL[32]; /* moved to pardiso.h */
** With that, ccx finally compiled correctly (this goes back to ccx_2.17).
** So when compiling CalculiX with the Pardiso solver you must change the variable declaration to extern
** in pardiso.h (last row): extern char envMKL[32];
** In the ccx src folder execute:
sed -i 's/char envMKL\[32\];/extern char envMKL[32];/' pardiso.h
** and verify that the pardiso.c file contains only one definition of the variable: char envMKL[32]; /* moved to pardiso.h */

**********************************
* procedure for INTEL oneAPI MKL *
******************************end*
* ------------------------------ *
*start****************************
* procedure for OpenBLAS         *
**********************************

git clone https://github.com/xianyi/OpenBLAS.git
mv OpenBLAS OpenBLAS_i8
cd OpenBLAS_i8/
make INTERFACE64=1 USE_OPENMP=1 LAPACK=1 LAPACKE=1
make PREFIX=/usr/local/OpenBLAS_i8/install install
** If LAPACKE is not compiled in correctly with the previous commands,
** then compile it in separately:
git clone https://github.com/Reference-LAPACK/lapack.git
cd lapack
cp make.inc.example make.inc
cd LAPACKE
make VERBOSE=1 lapacke
** that will create the library: liblapacke.a
** and then copy it into: ../OpenBLAS_i8/install/lib
cp ../liblapacke.a ../../install/lib/.
** Then clean up:
make clean
** After that the header files and the libraries are left in ../include and ../lib

**********************************
* procedure for OpenBLAS         *
******************************end*
* ------------------------------ *
*start****************************
* procedure for hwloc            *
**********************************

** All the dependencies must be compiled using integer*8 support:
export CFLAGS="-DINTSIZE64"
export FCFLAGS="-fdefault-integer-8 -fdefault-real-8 -fdefault-double-8"

umask 022
./configure CC=gcc CXX=g++ \
	--prefix=/usr/local/PaStiX/hwloc_i8/install \
	--enable-static \
	--enable-shared \
	--enable-cuda \
	--with-cuda=/usr/local/cuda-12.4

make -j8
make install

**********************************
* procedure for hwloc            *
******************************end*
* ------------------------------ *
*start****************************
* procedure for scotch           *
**********************************

export CFLAGS="-DINTSIZE64"
export FCFLAGS="-fdefault-integer-8 -fdefault-real-8 -fdefault-double-8"
** You must use this and only this one!!!
tar -zxvf scotch-master.tar.gz
mv scotch-master scotch_i8
cd scotch_i8
cd src
ln -sf Make.inc/Makefile.inc.x86-64_pc_linux2 Makefile.inc
sed -i '/CFLAGS/ s/$/ -DINTSIZE64/' Makefile.inc
** Search your system for where mpi.h is defined (typically I use OpenMPI)
** and put that information in the CFLAGS and LDFLAGS of the Makefile.inc file.
** For example, on my system:
CFLAGS		= -O3 -fPIC -DCOMMON_FILE_COMPRESS_GZ -DCOMMON_PTHREAD -DCOMMON_PTHREAD_AFFINITY_LINUX -DCOMMON_RANDOM_FIXED_SEED -DSCOTCH_MPI_ASYNC_COLL -DSCOTCH_PTHREAD -DSCOTCH_PTHREAD_MPI -DSCOTCH_RENAME -Drestrict=__restrict -DIDXSIZE64 -DINTSIZE64 -I/usr/local/openmpi-3.1.6/install/include
LDFLAGS		= -O3 -fdefault-integer-8 -lz -lm -lrt -pthread -I/usr/local/openmpi-3.1.6/install/lib
** in the ../scotch_i8/src/Makefile.inc file.
** Then make some changes in the file ./src/check/test_libmetis_dual_f.f90.in
sed -i 's/integer(c_int), value, intent(in) :: val/integer(8), value, intent(in) :: val/g' check/test_libmetis_dual_f.f90.in
sed -i 's/call exit_c (1)/call exit_c (1_8)/g' check/test_libmetis_dual_f.f90.in
** in order to meet the compulsory integer*8 requirement

cmake	-DCMAKE_INSTALL_PREFIX=/usr/local/PaStiX/scotch_i8/install \
	-DCMAKE_C_FLAGS="-m64" \
	-DCMAKE_Fortran_FLAGS=-fdefault-integer-8 \
	..

make -j 16 scotch
make -j 16 esmumps
make install


**********************************
* procedure for scotch           *
******************************end*
* ------------------------------ *
*start****************************
* procedure for starPU           *
**********************************

** To install pip: your Python environment is managed by the system package manager
** (in this case, apt on Debian/Ubuntu), so you cannot install Python packages
** globally with pip without using a virtual environment.
** This behaviour was introduced to avoid conflicts with system packages.
** The recommended solution is to create a virtual environment to manage Python packages
** separately from the system:
sudo apt install python3-venv
** Create a folder for your virtual environment, for example venv, in the project directory:
python3 -m venv venv
** Activate the virtual environment:
source venv/bin/activate
** Install setuptools in the virtual environment:
** now that you are in the virtual environment, you can easily install setuptools with pip:
pip install --upgrade setuptools
** Now you can go back to your project directory and run make or other build commands,
** using the virtual environment.

tar -zxvf starpu-1.4.7.tar.gz
cd starpu-1.4.7/
mkdir build
cd build
../configure \
	--prefix=/usr/local/PaStiX/starpu-1.4.7/install \
	--enable-cuda

make -j 16
make install
make clean

**********************************
* procedure for starPU           *
******************************end*
* ------------------------------ *
*start****************************
* procedure for parsec           *
**********************************

export PATH="/home/alex/downloads/Python-2.7.18:$PATH"
git clone https://bitbucket.org/mfaverge/parsec.git

if ! [[ -d build ]]; then
    mkdir build
fi
cd build

INSTALLPATH="/usr/local/PaStiX/parsec_i8/install"

umask 022

# fixes
sed -i '/-1 == cpu/i return cpu;' ../parsec/bindthread.c
** minimum cmake version needed to compile PaRSEC:
export PATH=/opt/cmake-3.21/bin:$PATH
** Latest version: but avoid using it!!!
#export PATH=/opt/cmake-3.31.1/bin:$PATH
export PATH="/home/alex/downloads/Python-2.7.18:$PATH"

** for ccx_2.22_pastix
** all tests passed:
cmake	\
	-DCMAKE_CXX_COMPILER=g++ \
	-DCMAKE_C_COMPILER=gcc \
	-DCMAKE_Fortran_COMPILER=gfortran \
	-DCMAKE_BUILD_TYPE=Release \
	-DCMAKE_INSTALL_PREFIX=/usr/local/PaStiX/parsec_i8/install_noCUDA \
	-DPARSEC_GPU_WITH_CUDA=OFF \
	-DHWLOC_DIR=/usr/local/PaStiX/hwloc_i8/install \
	-DPYTHON_EXECUTABLE=/opt/python2.7/bin/python2.7 \
	-DPARSEC_DIST_WITH_MPI=OFF \
	-DBUILD_TESTING=OFF \
	..

** for ccx_2.22_cuda
** all tests passed:
cmake	\
	-DCMAKE_CXX_COMPILER=g++ \
	-DCMAKE_C_COMPILER=gcc \
	-DCMAKE_Fortran_COMPILER=gfortran \
	-DCMAKE_BUILD_TYPE=Release \
	-DCMAKE_INSTALL_PREFIX=/usr/local/PaStiX/parsec_i8/install \
	-DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-12.4 \
	-DCUDA_USE_STATIC_CUDA_RUNTIME=ON \
	-DCMAKE_CUDA_HOST_COMPILER=gcc \
	-DPARSEC_GPU_WITH_CUDA=ON \
	-DHWLOC_DIR=/usr/local/PaStiX/hwloc_i8/install \
	-DPYTHON_EXECUTABLE=/opt/python2.7/bin/python2.7 \
	-DPARSEC_DIST_WITH_MPI=OFF \
	-DBUILD_TESTING=OFF \
	..

make -j 16
make install
make test
make clean

cd install/bin/
ln -s parsec-ptgpp parsec_ptgpp

cp ../../parsec_i8/parsec/utils/zone_malloc.h ../../parsec_i8/install/include/parsec/utils/zone_malloc.h

**********************************
* procedure for parsec           *
******************************end*
* ------------------------------ *
*start****************************
* procedure for PaStiX4CalculiX  *
**********************************

export PATH="/home/alex/downloads/Python-2.7.18:$PATH"
git clone https://github.com/Dhondtguido/PaStiX4CalculiX.git

if ! [[ -d build ]]; then
    mkdir build
fi
cd build

** the last time, on the HP notebook, I no longer used these!
** but I do have to load:
#export PATH=/usr/local/PaStiX/parsec_i8/install/bin:$PATH
#export LD_LIBRARY_PATH=/usr/local/PaStiX/parsec_i8/install/lib:$LD_LIBRARY_PATH
#export PKG_CONFIG_PATH=/usr/local/PaStiX/parsec_i8/install/lib/pkgconfig:$PKG_CONFIG_PATH
export C_INCLUDE_PATH=/usr/local/PaStiX/parsec_i8/install/include:$C_INCLUDE_PATH
export CPLUS_INCLUDE_PATH=/usr/local/PaStiX/parsec_i8/install/include:$CPLUS_INCLUDE_PATH
#export C_INCLUDE_PATH=/usr/local/PaStiX/starpu-1.4.7/install/include/starpu/1.4:$C_INCLUDE_PATH
#export CPLUS_INCLUDE_PATH=/usr/local/PaStiX/starpu-1.4.7/install/starpu/1.4:$CPLUS_INCLUDE_PATH

sed -i 's/^coeftab_fct_memory_t coeftabMemory\[4\];/extern coeftab_fct_memory_t coeftabMemory[4];/' ../sopalin/coeftab.h

** comment out the tests in PaStiX4CalculiX's CMakeLists.txt, line 879...
cp ../../parsec_i8/parsec/utils/zone_malloc.h ../../parsec_i8/install/include/parsec/utils/.
cp ../../parsec_i8/parsec/utils/zone_malloc.h ../../parsec_i8/install_noCUDA/include/parsec/utils/.

** for ccx_2.22_pastix
cmake	\
	-DBLAS_DIR=/usr/local/OpenBLAS_i8/install \
	-DHWLOC_DIR=/usr/local/PaStiX/hwloc_i8/install \
	-DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-12.4 \
	-DCMAKE_INSTALL_PREFIX=/usr/local/PaStiX/PaStiX4CalculiX_K/install \
	-DCMAKE_BUILD_TYPE=Release \
	-DPASTIX_WITH_STARPU=OFF \
	-DPASTIX_WITH_PARSEC=ON \
	-DPARSEC_DIR=/usr/local/PaStiX/parsec_i8/install_noCUDA \
	-DPARSEC_INCLUDE_DIRS=/usr/local/PaStiX/parsec_i8/install_noCUDA/include \
	-DPARSEC_LIBRARIES=/usr/local/PaStiX/parsec_i8/install_noCUDA/lib \
	-DPASTIX_ORDERING_SCOTCH=ON \
	-DSCOTCH_DIR=/usr/local/PaStiX/scotch_i8/install \
	-DPASTIX_WITH_CUDA=OFF \
	-DCMAKE_C_COMPILER=gcc \
	-DCMAKE_CXX_COMPILER=g++ \
	-DCMAKE_Fortran_COMPILER=gfortran \
	-DCMAKE_C_FLAGS="-fopenmp" \
	-DPASTIX_WITH_MPI=OFF \
	-DBUILD_TESTING=OFF \
	-DPYTHON_EXECUTABLE=/opt/python2.7/bin/python2.7 \
	..

	-DPASTIX_WITH_MPI=ON \
	-DMPI_CXX_COMPILER=/usr/local/openmpi-3.1.6/install/bin/mpicxx \
	-DMPI_C_COMPILER=/usr/local/openmpi-3.1.6/install/bin/mpicc \

** for ccx_2.22_cuda
cmake	\
	-DBLAS_DIR=/usr/local/OpenBLAS_i8/install \
	-DHWLOC_DIR=/usr/local/PaStiX/hwloc_i8/install \
	-DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-12.4 \
	-DCMAKE_INSTALL_PREFIX=/usr/local/PaStiX/PaStiX4CalculiX/install \
	-DCMAKE_BUILD_TYPE=Release \
	-DPASTIX_WITH_STARPU=OFF \
	-DPASTIX_WITH_PARSEC=ON \
	-DPARSEC_DIR=/usr/local/PaStiX/parsec_i8/install \
	-DPARSEC_INCLUDE_DIRS=/usr/local/PaStiX/parsec_i8/install/include \
	-DPARSEC_LIBRARIES=/usr/local/PaStiX/parsec_i8/install/lib \
	-DPASTIX_ORDERING_SCOTCH=ON \
	-DSCOTCH_DIR=/usr/local/PaStiX/scotch_i8/install \
	-DPASTIX_WITH_CUDA=ON \
	-DPASTIX_INT64=ON \
	-DCMAKE_C_COMPILER=gcc \
	-DCMAKE_CXX_COMPILER=g++ \
	-DCMAKE_Fortran_COMPILER=gfortran \
	-DCMAKE_C_FLAGS="-fopenmp" \
	-DPASTIX_WITH_MPI=OFF \
	-DBUILD_TESTING=OFF \
	-DPYTHON_EXECUTABLE=/opt/python2.7/bin/python2.7 \
	..

	-DPASTIX_WITH_MPI=ON \
	-DMPI_CXX_COMPILER=/usr/local/openmpi-3.1.6/install/bin/mpicxx \
	-DMPI_C_COMPILER=/usr/local/openmpi-3.1.6/install/bin/mpicc \

** NO LONGER NEEDED! JUST COPY THE FILES...
** for parsec with all tests passed:
sed -i 's|#include "parsec/utils/zone_malloc.h"|#include "/usr/local/PaStiX/parsec/parsec/utils/zone_malloc.h"|' ../bcsc/bvec.c
sed -i 's|#include "parsec/utils/zone_malloc.h"|#include "/usr/local/PaStiX/parsec/parsec/utils/zone_malloc.h"|' ../sopalin/pastix_task_solve.c
sed -i 's|#include "parsec/utils/zone_malloc.h"|#include "/usr/local/PaStiX/parsec/parsec/utils/zone_malloc.h"|' ../refinement/pastix_task_refine.c

grep -r "Threads::Threads" .

make -j 16
make install
make clean

To fix up CUDA: substitution template: sed -i 's///g' ../
1) First, figure out which graphics card you have:
lspci | grep -i nvidia
then ask ChatGPT; here is an example:
+-----------------------------------------------+-------------------------------+-----------------------+
|	GPU model				|	CUDA architecture	|	sm_XX code	|
+-----------------------------------------------+-------------------------------+-----------------------+
|	GTX 750 / 750 Ti			|	Maxwell			|	sm_50		|
|	GTX 960 / 970 / 980			|	Maxwell			|	sm_52		|
|	GTX 10xx (e.g. GTX 1060, 1070)		|	Pascal			|	sm_61		|
|	GTX 1650 / 1650 Mobile / Max-Q		|	Turing			|	sm_75 <-- ASUS	|
|	RTX 20xx (e.g. RTX 2060, 2080)		|	Turing			|	sm_75 <-- HP	|
|	RTX 30xx (e.g. RTX 3060, 3090)		|	Ampere			|	sm_86		|
|	RTX 40xx (e.g. RTX 4060, 4090)		|	Ada			|	sm_89		|
+-----------------------------------------------+-------------------------------+-----------------------+
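
On reasonably recent NVIDIA drivers you can also query the compute capability directly (7.5 maps to sm_75, and so on):

nvidia-smi --query-gpu=name,compute_cap --format=csv,noheader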

sed -i 's/-arch=sm_35/-arch=sm_75/g' ../CMakeLists.txt

2) the file SpMVCSR.h to modify: its pre-Volta warp shuffles must become the _sync variants, i.e. __shfl(var, ...) turns into __shfl_sync(0xFFFFFFFF, var, ...):
sed -i 's/row = __shfl(row, /row = __shfl_sync(0xFFFFFFFF, row, /g' ../kernels/gpus/LightSpMV-1.0/src/SpMVCSR.h
sed -i 's/sum += __shfl_down(sum, /sum += __shfl_down_sync(0xFFFFFFFF, sum, /g' ../kernels/gpus/LightSpMV-1.0/src/SpMVCSR.h
sed -i 's/a.x = __shfl_down(a.x, /a.x = __shfl_down_sync(0xFFFFFFFF, a.x, /g' ../kernels/gpus/LightSpMV-1.0/src/SpMVCSR.h
sed -i 's/a.y = __shfl_down(a.y, /a.y = __shfl_down_sync(0xFFFFFFFF, a.y, /g' ../kernels/gpus/LightSpMV-1.0/src/SpMVCSR.h
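
To verify that the substitutions took effect, the following grep should print nothing once all the legacy shuffle calls have been converted:

grep -nE '__shfl\(|__shfl_down\(' ../kernels/gpus/LightSpMV-1.0/src/SpMVCSR.h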


**********************************
* procedure for PaStiX4CalculiX  *
******************************end*
* ------------------------------ *
*start****************************
* procedure for CalculiX: ccx_i8 *
**********************************

** Go into the src folder and type make -j8
** The Makefile is the following:

CFLAGS = -Wall -O3 -fopenmp -fpic -I ../../../SPOOLES.2.2 -I$(PASTIX_INCLUDE) -DARCH="Linux" -DSPOOLES -DARPACK -DMATRIXSTORAGE -DINTSIZE64 -DPASTIX -DPASTIX_FP32
FFLAGS = -Wall -O3 -fopenmp -fpic -fdefault-integer-8

CC=cc
FC=gfortran

.c.o :
	$(CC) $(CFLAGS) -c $<
.f.o :
	$(FC) $(FFLAGS) -c $<

include Makefile.inc

SCCXMAIN = ccx_2.22.c

OCCXF = $(SCCXF:.f=.o)
OCCXC = $(SCCXC:.c=.o)
OCCXMAIN = $(SCCXMAIN:.c=.o)

DIR=../../../SPOOLES.2.2

PASTIX_INCLUDE = ../../../PaStiX/PaStiX4CalculiX/install/include
PASTIX_LIBS = ../../../PaStiX/hwloc_i8/install/lib/libhwloc.a \
  ../../../OpenBLAS_i8/install/lib/libopenblas.a \
  ../../../PaStiX/PaStiX4CalculiX/install/lib/libpastix.a \
  ../../../OpenBLAS_i8/install/lib/libopenblas.a -lpthread -lm \
  ../../../PaStiX/PaStiX4CalculiX/install/lib/libspm.a \
  ../../../PaStiX/PaStiX4CalculiX/install/lib/libpastix_parsec.a \
  ../../../PaStiX/PaStiX4CalculiX/install/lib/libpastix_kernels.a \
  ../../../OpenBLAS_i8/install/lib/libopenblas.a -lrt \
  ../../../PaStiX/parsec_i8/install/lib/libparsec.so \
  -lpthread -ldl -lrt \
  ../../../PaStiX/scotch_i8/install/lib/libscotch.a \
  ../../../PaStiX/scotch_i8/install/lib/libscotcherrexit.a -lpthread -lz -lm \
  /usr/lib/x86_64-linux-gnu/libstdc++.so.6 /usr/lib/x86_64-linux-gnu/libgomp.so.1 -lhwloc

LIBS = \
     $(DIR)/spoolesMT.a \
     ../../../ARPACK/libarpack_INTEL_i8.a \
     $(PASTIX_LIBS) \
     -lpthread -lm -lc

ccx_2.22_pastix: $(OCCXMAIN) ccx_2.22.a $(LIBS)
	./date.pl; $(CC) $(CFLAGS) -c ccx_2.22.c; $(FC) -Wall -O3 -o $@ \
	$(OCCXMAIN) ccx_2.22.a $(LIBS) -L/usr/local/PaStiX/hwloc_i8/install/lib -lhwloc -fstack-protector-strong

ccx_2.22.a: $(OCCXF) $(OCCXC)
	ar vr $@ $?
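
** After the build finishes, a quick smoke test on beamp.inp. This is only a sketch: the library path assumes libparsec.so ended up under /usr/local/PaStiX/parsec_i8/install/lib as in the Makefile above, and OMP_NUM_THREADS should match your machine.

export LD_LIBRARY_PATH=/usr/local/PaStiX/parsec_i8/install/lib:$LD_LIBRARY_PATH
export OMP_NUM_THREADS=4
./ccx_2.22_pastix beamp    # ccx appends .inp to the job name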


**********************************
* procedure for CalculiX: ccx_i8 *
******************************end*

these are my tests (run times for each solver):

_____________________________________________________________________________________________________________________________
   inputFile    | size  | #cpu | spooles | pardiso | PaStiX |  cuda  | Cholesky | Description                               |
----------------+-------+------+---------+---------+--------+--------+----------+-------------------------------------------+
basetta         |   29k |   1  |     1.2 |     0.9 |    0.9 |    2.4 |          |      *Temp    *Solid c3d8                 |
tappo           |   38k |   2  |     8.3 |     5.6 |   11.5 |   11.8 |          |      *Plastic *Solid c3d10                |
tappo3d         |  117k |   4  |    fail |   188.6 |  182.0 |  135.0 |          |      *Plastic *Solid c3d10                |
tenutaTappo     |   48k |   2  |   149.0 |    69.3 |   41.9 |   41.3 |          |      *Plastic *Solid c3d6/c3d8            |
vessel          |  326k |   6  |    fail |  1316.7 | 1768.1 | 1291.6 |          |      *Plastic *Shell s3/s4                |
morsettoEsa2    |  545k |   8  |    fail |   387.1 |  414.5 |  301.0 |          |      *Plastic *Solid c3d6/c3d8            |
cylHeadLin232k  |  232k |   6  |   757.5 |   271.2 |  239.1 |  154.7 |          |      *Plastic *Solid c3d4                 |
F9Qt_modal      |   11k |   1  |    30.0 |   841.6 |  748.6 |   fail |          |      *Modal   *Solid c3d6/c3d8            |
flyRiv          |   11k |   1  |  1417.0 |    fail |  835.9 |  384.5 |          | *cnt *Plastic *Solid c3d6/c3d8            |
serbaExpanMod16 |   45k |   2  |    38.6 |    25.3 |   24.4 |   25.5 |    366.2 |      *Elastic *Shell s6/s8                |
serbaExpan      |   24k |   2  |   100.3 |    61.6 |   44.6 |   42.9 |          |      *Plastic *Shell s6/s8                |
grassoSpina     |   47k |   2  |    43.1 |    22.8 |   16.7 |   16.5 |    137.2 |      *Plastic *Solid c3d10                |
slider          |  141k |   4  |  1954.2 |   747.6 |  419.3 |  325.2 |          | *cnt *Plastic *Solid c3d10/c3d6/c3d8      |
piegato         |   49k |   2  |   346.0 |   253.7 |  243.1 |  222.3 |          |      *Elastic *Solid c3d8                 |
quarter         |   28k |   1  |   286.7 |   191.0 |  163.8 |  153.0 |          | *cnt *Rubber  *Solid c3d4/c3d6/c3d8       |
piegaF          |    4k |   1  |   104.5 |    89.1 |   89.4 |   86.6 |          | *cnt *Plastic *Solid c3d6/c3d8            |
levaCnt         |  141k |   4  | 10152.2 |  6238.8 | 4798.3 | 4393.9 |          | *cnt *Plastic *Solid c3d6/c3d8I      *Dyn |
micro_M5_cnt    |   15k |   1  |   140.5 |   101.3 |   37.6 |   72.8 |          | *cnt *Plastic *Solid c3d10/c3d6/c3d8      |
new_2nd rast2.5 |  260k |   6  |  3774.9 |  1634.4 | 1262.9 |  608.9 |          | *cnt *Plastic *Solid c3d10/c3d6/c3d8I     |
micro_cnt 6sigm |   12k |   1  |    55.0 |    45.8 |   44.8 |   42.5 |          | *cnt *Plastic *Solid c3d10/c3d6/c3d8I     |
fabbro          |   28k |   2  |  3794.4 |  2665.6 | 2390.8 | 3821.1 |          |      *Plastic *Solid c3d8I                |
glison          |   19k |   1  |   52.1  |    64.7 |  185.6 |   fail |          |      *Modal   *Solid c3d6/c3d8I           |
torsoSpring     |    1k |   1  |   16.0  |    14.7 |   15.3 |   16.7 |          | *cnt *Plastic *Solid c3d6/c3d8I *Dyn      |
guida           |   49k |   2  |  3733.1 |  1774.8 |   fail | 1796.1 |          | *cnt *Rubber  *Solid c3d4/c3d6/c3d8I      |
welding         |    7k |   1  |  1514.7 |   804.9 |  574.9 |  546.9 |          | *cnt *Plastic *Solid c3d8                 |
voluta_3d       |   58k |   2  |  7680.8 |  3777.8 | 2544.4 | 3227.1 |          | *cnt *Elastic *Solid c3d4/c3d6/c3d8I      |
overM_cnt       |   36k |   2  |   283.7 |   119.6 |   71.0 |   66.7 |          | *cnt *Plastic *Solid c3d4/c3d6/c3d8I      |
conio3D         |   14k |   1  |   552.6 |   261.7 |  237.0 |  214.0 |          | *cnt *Plastic *Solid c3d4/c3d6/c3d8I      |
parklock        |  217k |   8  |  2787.4 |   628.0 |  734.7 |  282.3 |          |      *Plastic *Solid c3d10                |
----------------+-------+------+---------+---------+--------+--------+----------+-------------------------------------------+

@teofil75 thank you for your continued help. Much appreciated! :pray:t4:

The most important thing I got from your message was to turn off MPI when compiling both parsec and PaStiX. To round things up, here is what worked for me in the end (as opposed to what I had posted in the very first post):

My make_parsec.sh file :

#!/bin/bash

if ! [[ -d build ]]; then
    mkdir build
fi
cd build

INSTALLPATH="/usr/local/PaStiX/parsec_i8"

umask 022

# fixes
sed -i '/-1 == cpu/i return cpu;' parsec/bindthread.c

cmake \
    -DCMAKE_CXX_COMPILER=g++ \
    -DCMAKE_C_COMPILER=gcc \
    -DCMAKE_Fortran_COMPILER=gfortran \
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_INSTALL_PREFIX=${INSTALLPATH} \
    -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-12.6 \
    -DCUDA_DIR=/usr/local/cuda-12.6 \
    -DCUDA_USE_STATIC_CUDA_RUNTIME=ON \
    -DCMAKE_CUDA_HOST_COMPILER=gcc \
    -DPARSEC_GPU_WITH_CUDA=ON \
    -DHWLOC_DIR=/usr/local/PaStiX/hwloc_i8 \
    -DPARSEC_DIST_WITH_MPI=OFF \
    -DBUILD_TESTING=OFF \
    ..

make -j8

rm -rf ${INSTALLPATH}
make install

And my make_pastix.sh file:

#!/bin/bash
if ! [[ -d build ]]; then
    mkdir build
fi
cd build


cmake   \
    -DBLAS_DIR=/usr/local/OpenBLAS_i8 \
    -DHWLOC_DIR=/usr/local/PaStiX/hwloc_i8 \
    -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-12.6 \
    -DCMAKE_INSTALL_PREFIX=/usr/local/PaStiX/pastix_i8 \
    -DCMAKE_BUILD_TYPE=Release \
    -DPASTIX_WITH_PARSEC=ON \
    -DPARSEC_DIR=/usr/local/PaStiX/parsec_i8 \
    -DSCOTCH_DIR=/usr/local/PaStiX/scotch_i8 \
    -DPASTIX_WITH_CUDA=ON \
    -DPASTIX_ORDERING_SCOTCH=ON \
    -DCMAKE_C_COMPILER=gcc \
    -DCMAKE_CXX_COMPILER=g++ \
    -DCMAKE_Fortran_COMPILER=gfortran \
    -DCMAKE_C_FLAGS="-fopenmp" \
    -DPASTIX_WITH_MPI=OFF \
    -DBUILD_TESTING=OFF \
    -DPYTHON_EXECUTABLE=~/anaconda3/envs/py27env/bin/python \
    ..

make -j8
make install

The newly compiled ccx 2.22 executable works well for me.
I realized that the warning messages I was getting:

W@00000 Oversubscription on core 0 detected
W@00000 Oversubscription on core 1 detected

only appear when I set

export OMP_NUM_THREADS=6

I don’t get those warnings with OMP_NUM_THREADS=4. Not sure why that happens, but I am happy that it all works now.
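
If I had to guess, the warning shows up once the requested thread count exceeds the number of physical cores PaRSEC can bind to; the core layout is easy to check:

lscpu | grep -E 'Socket\(s\)|Core\(s\) per socket|Thread\(s\) per core'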

On my laptop I receive the same warning when I use more than 8 cores.
But what about GPU usage? Are you able to use the GPU?
Anyway, PaStiX is twice as fast as Pardiso on highly nonlinear models.

I have not tested whether ccx is using the GPU or not. How are you testing?
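
One simple check, I suppose, would be to keep an eye on GPU utilization while the job runs:

watch -n 1 nvidia-smi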

All the hard work paid off since it meets the objective I had in mind. For the analyses I need to run, which involve a lot of contact and material nonlinearity, PaStiX seems to provide the most consistent run times and iteration counts. Pardiso comes close but still shows some variability from one run to the next. Spooles fares miserably: sometimes it runs to completion and sometimes it does not converge, and even when it does finish it can take a long time.

I am happy that Pastix can provide consistent convergence when run on multiple cores. More tests to follow.

I changed the pastix.c source code a little bit so that I can set these two parameters:

IPARM_MAX_BLOCKSIZE
and
IPARM_MIN_BLOCKSIZE

In this way it is possible to speed up the factorization:

+-------------------------------------------------+
  Factorization step:
    Factorization used: LU
    Time to initialize internal csc:      0.6602 
    Time to initialize coeftab:           0.6432 
    Time to factorize:                    1.3929  (442.04 GFlop/s)
    Number of operations:                      615.70 GFlops
    Number of static pivots:                     0
CPU vs GPU CBLK GEMMS -> 94770 vs 0
CPU vs GPU BLK GEMMS -> 44100 vs 0
CPU vs GPU TRSM -> 14780 vs 0 

Here are my env vars:

export MAX_BLOCKSIZE=1024
export RATIO_BLOCKSIZE=4

In this way I reached 442.04 GFlop/s instead of 350 GFlop/s.
In pastix.c, change the lines:

piparm[IPARM_MIN_BLOCKSIZE] = 1024;
piparm[IPARM_MAX_BLOCKSIZE] = 2048;

with:

	const char *env_value = getenv("MAX_BLOCKSIZE");	// read the MAX_BLOCKSIZE environment variable
	int max_blocksize = (env_value && atoi(env_value) != 0) ? atoi(env_value) : 2048;
	piparm[IPARM_MAX_BLOCKSIZE] = max_blocksize;

	const char *env_fraction = getenv("RATIO_BLOCKSIZE");	// read the RATIO_BLOCKSIZE environment variable
	int ratio_blocksize = (env_fraction && atoi(env_fraction) != 0) ? atoi(env_fraction) : 2;