PaStiX solver performance

Does anyone have experience with the PaStiX solver on a GPU, both with and without CUDA? I managed to compile CalculiX using the PaStiX4CalculiX library with CUDA and the PaRSEC scheduler. The program runs correctly, but the performance is worse than with MKL oneAPI Pardiso.
I set all the environment variables correctly, nvidia-smi correctly detects the CUDA job, and the GPU is correctly detected in the PaStiX output, but the CPU vs GPU comparison reveals 0 usage for the GPU.
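One way to confirm the "0 usage" observation is to sample GPU utilization while the solver is running. The following is a minimal sketch, assuming `nvidia-smi` is on the PATH; the helper names `parse_gpu_csv` and `sample_gpus` are made up for this illustration and are not part of CalculiX or PaStiX.

```python
import subprocess

# Real nvidia-smi query flags: one CSV line per GPU, without header or units.
QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,memory.used",
    "--format=csv,noheader,nounits",
]


def _to_int(field):
    """Convert a CSV field to int; return None for values such as '[N/A]'."""
    try:
        return int(field.strip())
    except ValueError:
        return None


def parse_gpu_csv(text):
    """Turn nvidia-smi CSV output into (utilization %, memory used MiB) per GPU."""
    samples = []
    for line in text.strip().splitlines():
        util, mem = line.split(",")
        samples.append((_to_int(util), _to_int(mem)))
    return samples


def sample_gpus():
    """Run nvidia-smi once and return the parsed sample (hypothetical helper)."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_csv(result.stdout)
```

Calling `sample_gpus()` once per second in a second terminal during the factorization phase shows whether utilization ever rises above zero, or whether only memory is being allocated on the card.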

Alex

Hey Alex - I have the same question. I haven’t had time to compile PaStiX but I do want to benchmark it against MKL Pardiso.

What compiler settings were you using (optimization level, march, etc)?

For larger problems PaStiX may be significantly slower than Pardiso. For mid-sized problems I think it could be a little faster. For smaller ones it can be quite a bit faster. CUDA does not help with all problems, and it requires a GPU with high double-precision throughput and a lot of memory to help at all.

Commercial solvers are quite a bit faster, even than MKL. Any ideas what they do to achieve the much higher calculation speed?

Sort of. The developers of Pardiso***, I think, have a much, much faster version (it costs a lot) than the one Intel provides. Also, the way the matrices are broken down into sub-parts of the problem can be made to fit the internal computational structure and memory of a GPU. GPU use is slow because the GPU accesses RAM very slowly. CalculiX was written as a standard serial application for a CPU, with a little bit of parallel operation here and there. The GPU is mostly used for access to its double-precision compute units. Most of these are crippled on consumer-grade GPUs, and even on some of the modern ones optimized for AI, which can do many teraflops, up to nearly a petaflop, on low-precision numbers. The basics of CalculiX were written before the potential for numeric calculation on GPUs was developed to today’s extent, i.e. CalculiX is a 2004 product that has not taken advantage of most proprietary numerical development since. If someone like Guido and friends would like to devote a few dozen man-years to an open-source update of CalculiX using GPUs, it could catch up with where FEM software is now in a decade… but the commercial software would then be 10 years ahead of where it is now.

Something like that was happening with LS-DYNA (which started from a public domain product), with a lot of development by government research and grad students. It was bought up by Ansys in 2019, at about the time I was looking to purchase an FEM program for my use. (It would have been well suited to what I do.)

Something like that may happen, as there is a lot of non-proprietary HPC work going on, not on the proprietary Nvidia side of things but on the open-source AMD/Apple/etc. side. There probably won’t be an inexpensive product like CalculiX/Mecway based on this in my lifetime, but a lot of the tools are there now if you’ve got the team.


https://panua.ch/pardiso/benchmarks

So, what is written in the CalculiX README:

Finally, CalculiX has recently been linked with PaStiX. PaStiX is a very
fast freeware solver able to use the Graphical Processing Unit
(GPU). Benchmark tests have revealed a speed-up of a factor up to 8 for
static calculations with contact. This, however, assumes that you have a
high-end Graphical card with at least 32 GB of memory on it. Still, even
without using the GPU speed-ups of up to a factor of 4 were observed. This
applies to medium to big models in the range between 1 and 5 million
degrees of freedom.

is not really true?

My experience with PaStiX is that it gets slower compared to Pardiso for problems over 1 million nodes (not degrees of freedom), and it needs more memory to run than Pardiso. I agree the GPU needs to have a lot of memory and be fast in double precision. I think the original testing with CUDA was done with K80 GPUs.

In my tests I had the RTX 6000 Ada, which is better than the K80.
Moreover, all PaRSEC tests passed. Are there any environment variables that need to be set in order to use the GPU?
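To make setups comparable between posters, it can help to print the relevant variables in one block before launching the solver. This is only a sketch: `env_report` is a hypothetical helper, and the default list contains just two generic variables (`CUDA_VISIBLE_DEVICES` from CUDA and `OMP_NUM_THREADS` from OpenMP). Any solver-specific variable names are not known from this thread and would have to be added from your own build's documentation.

```python
import os

# Generic variables only; solver-specific names are deliberately left out
# because this thread does not state them.
DEFAULT_NAMES = ("CUDA_VISIBLE_DEVICES", "OMP_NUM_THREADS")


def env_report(names=DEFAULT_NAMES, environ=None):
    """Return one 'NAME=value' line per variable, marking unset ones."""
    environ = os.environ if environ is None else environ
    return "\n".join(
        f"{name}={environ.get(name, '<unset>')}" for name in names
    )
```

Running `print(env_report())` in the same shell that starts the job gives a snapshot that can be pasted into a reply.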

CalculiX uses the GPU mostly for double-precision math, which is crippled on this particular GPU to 1/64 of single precision. Otherwise the specs on this GPU are very good. Cards with good DP compute performance have DP at 1/2 that of single precision.
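The effect of those two ratios can be put into numbers. The sketch below uses a placeholder single-precision figure of 64 TFLOPS, chosen only to keep the arithmetic clean. It is not a vendor specification for any card mentioned here.

```python
def fp64_tflops(fp32_tflops, fp64_ratio):
    """Peak double-precision throughput implied by an FP32 peak and a DP:SP ratio."""
    return fp32_tflops * fp64_ratio


# Placeholder value for illustration only, not a real card's spec.
ILLUSTRATIVE_FP32_TFLOPS = 64.0

crippled_dp = fp64_tflops(ILLUSTRATIVE_FP32_TFLOPS, 1 / 64)  # DP at 1/64 of SP
full_dp = fp64_tflops(ILLUSTRATIVE_FP32_TFLOPS, 1 / 2)       # DP at 1/2 of SP

# With the same SP peak, the 1/2-ratio card has 32 times the DP peak.
dp_advantage = full_dp / crippled_dp
```

For the same single-precision peak, a card with a 1/2 ratio has 32 times the double-precision peak of a card with a 1/64 ratio, which is why a newer card is not automatically faster for this workload.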

But it’s not similar to SAP, which started as an open-source FE code and later used the TAUCS matrix solver. The decision not to use a commercial solver was probably evaluated in depth.

In my experience with models of medium node count but complex analyses, such as large multipart contact analysis including plasticity, PaStiX was found to converge significantly faster than MKL.

My experience with medium-size non-linear problems is similar. It uses much faster single precision (SP) at first, then switches to DP for the last iteration or two as it closes in on convergence. Complex and non-linear problems can take many more iterations to converge.
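The SP-then-DP behaviour described here resembles the general technique of mixed-precision iterative refinement: factorize once in single precision, then correct the solution using residuals computed in double precision. Below is a small pure-Python sketch of that general idea, not PaStiX's actual implementation. Single precision is emulated by rounding through `struct`, and all function names are made up for the example.

```python
import struct


def f32(x):
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def lu_factor_f32(a):
    """LU factorization without pivoting, each operation rounded to SP."""
    n = len(a)
    lu = [[f32(v) for v in row] for row in a]
    for k in range(n):
        for i in range(k + 1, n):
            lu[i][k] = f32(lu[i][k] / lu[k][k])
            for j in range(k + 1, n):
                lu[i][j] = f32(lu[i][j] - f32(lu[i][k] * lu[k][j]))
    return lu


def lu_solve_f32(lu, b):
    """Forward and back substitution in emulated SP using the combined LU."""
    n = len(lu)
    x = [f32(v) for v in b]
    for i in range(n):                      # solve L y = b (unit diagonal)
        for j in range(i):
            x[i] = f32(x[i] - f32(lu[i][j] * x[j]))
    for i in reversed(range(n)):            # solve U x = y
        for j in range(i + 1, n):
            x[i] = f32(x[i] - f32(lu[i][j] * x[j]))
        x[i] = f32(x[i] / lu[i][i])
    return x


def residual(a, x, b):
    """r = b - A x, computed in double precision."""
    return [bi - sum(aij * xj for aij, xj in zip(row, x))
            for row, bi in zip(a, b)]


def refine(a, b, tol=1e-12, max_iter=20):
    """Solve A x = b with an SP factorization and DP residual corrections."""
    lu = lu_factor_f32(a)                   # expensive step, done once in SP
    x = lu_solve_f32(lu, b)
    for iteration in range(max_iter):
        r = residual(a, x, b)
        if max(abs(v) for v in r) <= tol:
            return x, iteration
        d = lu_solve_f32(lu, r)             # cheap correction reusing the SP factors
        x = [xi + di for xi, di in zip(x, d)]
    return x, max_iter


# Small diagonally dominant system, so no pivoting is needed.
A = [[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]]
B = [1.0, 2.0, 3.0]
solution, iterations = refine(A, B)
```

The design point is that the costly factorization runs at SP speed while the final answer still reaches DP-level accuracy. On hardware with weak DP throughput, that is where most of the benefit would come from.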

I’m conducting benchmarks of various solvers using different models. In my opinion, there is a significant difference between the performance of linear and nonlinear models:

As Xyont writes: PaStiX is significantly faster than Pardiso on highly non-linear models.

Interesting report. Pardiso failed some tests in contact and plasticity; PaStiX was almost three times faster in some cases but also failed at rubber contact. Can the environment variables be shared? Thank you.