CalculiX and PaStiX solver Windows version

Hi Rafal,

I am trying to compile CalculiX using mingw64 and Intel PARDISO, but I get an undefined reference for pardiso.

Do you have a working Makefile for mingw64 and Intel PARDISO?

Modify Makefile_MT, for example by adding:

CFLAGS = -Wall -O2 -fopenmp -I ../../../SPOOLES.2.2 -DARCH="Linux" -DSPOOLES -DARPACK -DMATRIXSTORAGE -DUSE_MT=1 -I$(MKL)/include

MKL=/usr/local/mkl/2023.0.0

LIBS = \
       $(MKL)/lib/intel64/mkl_rt.lib \
       $(DIR)/MT/src/spoolesMT.a \
       $(DIR)/spooles.a \
       ../../../ARPACK/libarpack_INTEL.a \
       -lpthread -lm

CalculiX PARDISO + exodus v2.20 (GitHub - gustafson/CalculiX at exodus):
Dropbox - ccx_2.20_exo_pardiso.7z

Hi @rafal.brzegowy, have you been able to use PaStiX with CUDA successfully? If so, can you share the executable with us? If not, is it possible to use CUDA with PaStiX on Windows, and can you guide me on this?

Thanks in Advance!

Unfortunately no. The environment I use (msys2/mingw) does not have an nvcc compiler, and clang generates a lot of errors.

https://llvm.org/docs/CompileCudaWithLLVM.html

ccx 2.22 + exodus + PARDISO + i8

https://www.dropbox.com/scl/fi/pz3rbbtctm1sott0iolj2/ccx-2.22-exodus-int32_int64.7z?rlkey=asvn3jslais733s9sdczyapmu&st=rsq2sbaj&dl=0

Thank you @rafal.brzegowy !!

Hello Mr. Brzegowy,

I was wondering if your CCX 2.17 version using PaStiX has the GPU CUDA library linked, and if so, how can I test it? How can I run an example test? Are there any environment variables that need to be set?

This version is based on the cudaless branch (no CUDA): GitHub - Kabbone/PaStiX4CalculiX at cudaless

Test version of ccx 2.22 based on PaStiX 6.4.0 (raw, Windows, no CUDA):
https://www.dropbox.com/scl/fi/xz9leyv99uc0ixdveu432/ccx_2.22_pastix_6.4.0.rar?rlkey=4t8iocv4mzpfkc45z2d8nxvmt&st=4z3qs5d2&dl=0

pastix.c from:

Thank you very much for all your work on compiling CCX with the PaStiX solver. I wonder if it would be possible to compile it on Ubuntu 22.04 Linux as well, and whether it could also use a CUDA GPU.
Thanks in advance

P.S. When I try to compile the Kabbone version of PaStiX4CalculiX, I always get:
CMake Error at cmake_modules/morse_cmake/modules/find/LibrariesAbsolutePath.cmake:59 (message):
Dependency of CUDA_LIBRARIES ‘Threads::Threads’ NOT FOUND
Call Stack (most recent call first):
cmake_modules/morse_cmake/modules/find/FindPARSEC.cmake:211 (libraries_absolute_path)
CMakeLists.txt:365 (find_package)

Maybe How to Install CUDA on Ubuntu 22.04 is of use to you.

The Kabbone repositories are mostly useful for those who don’t have CUDA-capable GPUs. If you have CUDA installed and working, try the original PaStiX4CalculiX. (Note that this uses an older version of PaStiX, which requires Python 2 to build; the scripts used in the build fail with Python 3.)

Thank you for the clarification, but every time I run cmake I get this error about Threads::Threads:

-- Looking for hwloc_topology_init - found
CMake Error at cmake_modules/morse_cmake/modules/find/LibrariesAbsolutePath.cmake:59 (message):
Dependency of CUDA_LIBRARIES ‘Threads::Threads’ NOT FOUND
Call Stack (most recent call first):
cmake_modules/morse_cmake/modules/find/FindPARSEC.cmake:211 (libraries_absolute_path)
CMakeLists.txt:365 (find_package)

-- Configuring incomplete, errors occurred!

I don’t understand whether this error comes from CUDA or from PaRSEC. Do you have any advice?

It looks like a CUDA error. The file FindPARSEC.cmake is installed by PaRSEC, I believe.

There is an option for building PaRSEC without thread support: -DPARSEC_DIST_THREAD=OFF. Maybe that helps?

Finally I was able to compile and install the PaStiX solver for CalculiX v2.22 on native Linux (Ubuntu 24.04).
I ran a lot of tests and compared the run time of SPOOLES, Pardiso from MKL oneAPI, and PaStiX. I also compared the same solvers on Windows 11, using Rafal’s executables, against my own build on Linux.
The Linux build is always twice as fast as the corresponding Windows version.
With Rafal’s executables, PaStiX is comparable to Pardiso, though slightly faster. On Linux, PaStiX uses twice as much memory as Pardiso, and Pardiso is faster than PaStiX.
Finally, I compiled a ccx version using the new PANUA solver. It is slower than MKL Pardiso too!
As the last step I need to compile PaStiX with CUDA support, but unfortunately I always get the same error about Threads::Threads. There seems to be some issue when PaRSEC is used together with CUDA.

I’m curious about the Panua solver - how much does it cost? And how much slower is it than MKL Pardiso? I was expecting it to be significantly faster based on the marketing materials on their website…

This might be filesystem related. It is well known that Linux performs better here than ms-windows.

If you want to properly compare versions, they should be built the same as far as possible; e.g. both using 4-byte integers.

How many different problems did you use to make the comparison?
In the tests that I did, PaStiX is not always faster than SPOOLES.
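
A comparison along these lines can be scripted so that every solver build sees the same input deck, thread count, and number of repeats. The following is only a minimal sketch: the function names are made up for illustration, the command in the example is a placeholder, and it assumes each solver is selected either by the executable or inside the input deck. `OMP_NUM_THREADS` is the thread-count variable that CalculiX reads.

```python
import os
import statistics
import subprocess
import sys
import time


def time_command(cmd, repeats=3, threads=None):
    """Run cmd several times and return the median wall-clock time in seconds."""
    env = dict(os.environ)
    if threads is not None:
        # Same thread count for every solver being compared.
        env["OMP_NUM_THREADS"] = str(threads)
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(cmd, env=env, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def speedup(baseline_seconds, candidate_seconds):
    """Return how many times faster the candidate is than the baseline."""
    return baseline_seconds / candidate_seconds


if __name__ == "__main__":
    # Placeholder command so the sketch runs anywhere. For a real comparison,
    # replace it with the ccx executable and job name of each build.
    t = time_command([sys.executable, "-c", "pass"], repeats=3, threads=4)
    print(f"median wall time: {t:.3f} s")
```

Using the median of several runs over several different models reduces the influence of disk caching and one-off outliers on the result.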

The PANUA solver is very expensive, costing 8,000€/year. Moreover, something strange happened: the solver ran properly, but no FRD file was generated.

Finally, I was able to successfully compile CalculiX v2.22 with the PaStiX solver (the modified version PaStiX4CalculiX v6.0.1) including CUDA support v12.4 and PaRSEC. I tested it on an NVIDIA RTX 6000 ADA with 48GB of memory. Unfortunately, the results were disappointing: while the nvidia-smi tool recognized the ccx executable, the GPU utilization percentage remained low, as did the power consumption.
I set the correct PaStiX environment variable, but I cannot explain the performance.
Does anyone have any advice?

Alex
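
One way to see whether the GPU is busy at any point of the solve is to poll `nvidia-smi` while ccx runs and log the values over time. The sketch below is an illustration only: the helper names are hypothetical, and it relies on the `--query-gpu` and `--format=csv,noheader,nounits` options of `nvidia-smi`.

```python
import subprocess

QUERY_FIELDS = ["utilization.gpu", "power.draw", "memory.used"]


def _to_float(text):
    """Convert a field to float; return None for values such as '[N/A]'."""
    try:
        return float(text)
    except ValueError:
        return None


def parse_smi_line(line):
    """Parse one CSV line of nvidia-smi output into a dict keyed by field name."""
    values = [_to_float(v.strip()) for v in line.split(",")]
    if len(values) != len(QUERY_FIELDS):
        raise ValueError(f"expected {len(QUERY_FIELDS)} fields, got {len(values)}")
    return dict(zip(QUERY_FIELDS, values))


def sample_gpu(index=0):
    """Query one GPU once; requires nvidia-smi on the PATH."""
    out = subprocess.run(
        ["nvidia-smi", f"--id={index}",
         "--query-gpu=" + ",".join(QUERY_FIELDS),
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_smi_line(out.strip().splitlines()[0])


if __name__ == "__main__":
    # Made-up sample line, only to show the parsed structure.
    print(parse_smi_line("7, 68.42, 1530"))
```

Calling `sample_gpu()` once per second during the factorization phase and plotting the result shows whether utilization is low throughout or only outside the numerical factorization.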

I seem to remember a paper from a dozen or more years ago that broke down the performance issues and choke points. As I recall, whether CUDA and a GPU help or not depends on the relative performance of the CPU and the GPU. The GPU is limited by its memory size, geometry, double-precision (FP64) speed, and the speed of communication with memory. The CPU is limited by FP64 speed, memory size, and memory throughput. The paper reported tests on about 7 problems, only some of which showed a significant speed improvement with a GPU.

In your case the GPU performs poorly in FP64, which runs at 1/64 the throughput of FP32. The performance relative to the CPU might be especially bad if your CPU is relatively good, with more memory channels and AVX-512 (which speeds up FP64 a lot by doing several FP64 calculations at a time).

In general, Nvidia consumer-grade cards may have the same processor as the cards with higher FP64 performance, but the card is deliberately crippled to a lower FP64 multiplier. AMD is less restrictive about FP64 on its consumer-grade cards, but has no CUDA. The best bet is a used Nvidia A100 80GB or something similar like an A30, V100, or K80. These are all similar in FP64 performance, but graphics and AI performance fall as you go down this list. K80s don’t do graphics at all.

What duffers like us need is an HPC guru to do for AMD products (with their higher FP64 performance in consumer products) what we have for Nvidia CUDA cards: a usable solver to link to CalculiX.
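
The arithmetic behind this argument can be sketched as follows. The numbers in the example are illustrative placeholders, not measurements of any particular card or CPU, and the CPU estimate assumes two FMA units per core, which not every AVX-512 processor has. Both functions give theoretical peaks only; sustained solver performance is usually well below them.

```python
def gpu_fp64_peak_tflops(fp32_peak_tflops, fp64_ratio):
    """Theoretical FP64 peak from the FP32 peak and the FP64:FP32 rate (e.g. 1/64)."""
    return fp32_peak_tflops * fp64_ratio


def cpu_fp64_peak_gflops(cores, ghz, fma_units=2, vector_bits=512):
    """Theoretical FP64 peak of a CPU with FMA-capable vector units."""
    lanes = vector_bits // 64      # FP64 values per vector register
    flops_per_fma = 2              # one multiply plus one add
    return cores * ghz * fma_units * lanes * flops_per_fma


if __name__ == "__main__":
    # Illustrative values only: a GPU with 64 TFLOPS FP32 at a 1/64 FP64 rate,
    # and a 16-core CPU at 3.0 GHz with AVX-512.
    gpu = gpu_fp64_peak_tflops(64.0, 1 / 64)
    cpu = cpu_fp64_peak_gflops(16, 3.0) / 1000.0
    print(f"GPU FP64 peak: {gpu:.2f} TFLOPS, CPU FP64 peak: {cpu:.2f} TFLOPS")
```

With these placeholder inputs the CPU peak comes out higher than the GPU peak, which is the situation described above: a 1/64 FP64 rate can leave a GPU with no advantage over a modern multi-core CPU for double-precision factorization.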