CalculiX with PaStiX and CUDA

teofil75 · December 10, 2024, 10:27pm

Finally, I was able to compile CalculiX v2.22 with PaStiX and CUDA 12.4 on Kubuntu 24.04. The runtime scheduler I used was PaRSEC. I ran some tests, but with small models (fewer than 300,000 nodes) the speed-up from CUDA is not noticeable. How large does the model need to be before the GPU gives a significant advantage?
For my tests I only have a laptop with 64 GB of RAM and an NVIDIA RTX 3000 graphics card with 6 GB of memory. nvidia-smi shows only marginal usage of the graphics card: 15 W out of a maximum of 65 W. Which environment variable do I need to set? Is export PASTIX_GPU=1 enough?

Alex

rsmith · December 11, 2024, 5:25pm

Even though it doesn’t directly answer your question, the post “The GPU is not always faster” might be of interest.

TL;DR: The GPU is limited by the bandwidth of the PCI Express bus. This does not affect, e.g., multithreaded AVX code on the CPU. So even for “embarrassingly parallel” calculations, the CPU might be the better choice.
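The bandwidth argument can be made concrete with a rough, roofline-style estimate for a single dense matrix product (GEMM), the kind of kernel a sparse direct solver can offload to the GPU. This is a minimal sketch under loudly stated assumptions: 16 GB/s of PCIe throughput and hypothetical GPU/CPU throughputs of 5000 and 500 GFLOP/s. It also ignores overlap of transfers with computation, so it only illustrates the trend, not real timings.

```python
"""Back-of-the-envelope check: is offloading one GEMM to the GPU worth it?

All hardware figures are ASSUMED placeholders, not measured values.
"""


def gemm_offload_times(m, n, k, bytes_per_value=8,
                       pcie_gb_s=16.0, gpu_gflop_s=5000.0, cpu_gflop_s=500.0):
    """Return (gpu_total_s, cpu_s) for C = A @ B, with A: m x k and B: k x n.

    GPU time = PCIe transfer (A and B in, C out) + compute on the GPU.
    CPU time = compute only, since the data already lives in host memory.
    """
    flops = 2.0 * m * n * k  # one multiply and one add per inner-product term
    bytes_moved = bytes_per_value * (m * k + k * n + m * n)  # A, B in; C out
    transfer_s = bytes_moved / (pcie_gb_s * 1e9)
    gpu_compute_s = flops / (gpu_gflop_s * 1e9)
    cpu_s = flops / (cpu_gflop_s * 1e9)
    return transfer_s + gpu_compute_s, cpu_s


if __name__ == "__main__":
    for size in (64, 256, 1024, 4096):
        gpu_s, cpu_s = gemm_offload_times(size, size, size)
        winner = "GPU" if gpu_s < cpu_s else "CPU"
        print(f"{size:5d}^3 GEMM: GPU {gpu_s * 1e3:9.4f} ms, "
              f"CPU {cpu_s * 1e3:9.4f} ms -> {winner}")
```

Transfer cost grows with the square of the block size while compute grows with the cube, so with these assumed numbers small products stay on the CPU and only large ones (here, above roughly 420 per side) pay off on the GPU.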

teofil75 · December 11, 2024, 6:21pm

So I don’t need to set any environment variables to display the number of GPUs for the CalculiX run.

teofil75 · December 15, 2024, 1:17pm

Now I’m quite sure that the GPUs are not used at all. Even though I was able to compile CalculiX and all the necessary dependencies, this is the output regarding the PaStiX solver and GPU usage:
…
…
Not reusing csc.

±------------------------------------------------+

PaStiX : Parallel Sparse matriX package     +

±------------------------------------------------+
Version: 6.0.1
Schedulers:
sequential: Enabled
thread static: Started
thread dynamic: Disabled
PaRSEC: Started
StarPU: Disabled
Number of MPI processes: 1
Number of threads per process: 1
Number of GPUs: 1
MPI communication support: Disabled
Distribution level: 2D( 128)
Blocking size (min/max): 1024 / 2048

Matrix type: General
Arithmetic: Float
Format: CSC
N: 514302
nnz: 31369950
…

and after:
…
…
Number of operations: 149.49 GFlops
Number of static pivots: 0
CPU vs GPU CBLK GEMMS → 218533 vs 0
CPU vs GPU BLK GEMMS → 17784 vs 0
CPU vs GPU TRSM → 7556 vs 0
Time to solve: 0.1248
- iteration 1 :
…
…
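To check the offload counters quickly on every run instead of reading the log by eye, the "CPU vs GPU" summary lines quoted above can be parsed. A minimal sketch: the line pattern is taken only from the output pasted in this post, and the arrow is matched as either "->" or "→" because it may be rendered differently depending on where the log was copied from.

```python
import re

# Matches summary lines such as "CPU vs GPU CBLK GEMMS → 218533 vs 0".
STAT_RE = re.compile(r"CPU vs GPU (.+?)\s*(?:->|→)\s*(\d+)\s+vs\s+(\d+)")

SAMPLE_LOG = """\
CPU vs GPU CBLK GEMMS → 218533 vs 0
CPU vs GPU BLK GEMMS → 17784 vs 0
CPU vs GPU TRSM → 7556 vs 0
"""


def gpu_offload_report(log_text):
    """Return {kernel: (cpu_count, gpu_count, gpu_fraction)} from solver output."""
    report = {}
    for match in STAT_RE.finditer(log_text):
        kernel = match.group(1).strip()
        cpu, gpu = int(match.group(2)), int(match.group(3))
        total = cpu + gpu
        report[kernel] = (cpu, gpu, gpu / total if total else 0.0)
    return report


if __name__ == "__main__":
    for kernel, (cpu, gpu, frac) in gpu_offload_report(SAMPLE_LOG).items():
        print(f"{kernel:12s} CPU {cpu:7d}  GPU {gpu:7d}  GPU share {frac:.1%}")
```

Applied to the output above, every kernel reports a GPU share of 0.0%, which confirms that no GEMM or TRSM was executed on the GPU even though one GPU was detected.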
Moreover, the PaRSEC compilation and installation went well, but some tests failed:
Running tests…

Test project /usr/local/PaStiX/parsec_i8/build
Start 1: launcher_shm
1/34 Test #1: launcher_shm … Passed 0.00 sec
Start 2: launcher_mpi
2/34 Test #2: launcher_mpi …***Failed 0.01 sec
Start 3: launcher_gpu
3/34 Test #3: launcher_gpu …***Failed 0.00 sec
Start 4: unit_startup1_shm
4/34 Test #4: unit_startup1_shm … Passed 0.66 sec
Start 5: unit_startup2_shm
5/34 Test #5: unit_startup2_shm … Passed 0.41 sec
…
…
The nvidia-smi report:
Sun Dec 15 08:23:22 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 565.57.01 Driver Version: 565.57.01 CUDA Version: 12.7 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA RTX 6000 Ada Gene… On | 00000000:01:00.0 Off | Off |
| 30% 45C P0 64W / 300W | 462MiB / 49140MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 1 NVIDIA GeForce RTX 4060 Ti On | 00000000:05:00.0 Off | N/A |
| 0% 35C P8 8W / 160W | 191MiB / 8188MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 2939 G /usr/lib/xorg/Xorg 4MiB |
| 0 N/A N/A 1088548 C ccx_2.22_cuda 436MiB |
| 1 N/A N/A 2939 G /usr/lib/xorg/Xorg 57MiB |
| 1 N/A N/A 3217 G /usr/bin/gnome-shell 110MiB |
+-----------------------------------------------------------------------------------------+
This clearly shows that the ccx_2.22_cuda process is started, but GPU utilization is 0% and the process holds only 436 MiB of GPU memory!

I am worried about test #3: 3/34 Test #3: launcher_gpu …***Failed
Would you recommend using CUDA 10.2 instead of 12.4?
