Finally, I was able to compile CalculiX v. 2.22 with PaStiX and CUDA 12.4 on Kubuntu 24.04, using PaRSEC as the runtime scheduler. I ran some tests, but with small models (fewer than 300,000 nodes) the speed-up from CUDA is not noticeable. How large does the model need to be to see a significant advantage from using the GPU?
For my tests I only have a laptop with 64 GB of RAM and an NVIDIA RTX 3000 graphics card with 6 GB of memory. nvidia-smi shows only marginal usage of the graphics card: 15 W out of a maximum of 65 W. Which environment variable do I need to set? Is this enough: export PASTIX_GPU=1
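To watch utilization and power draw while the solver runs, rather than reading a single nvidia-smi snapshot, something like the following sketch may help. It assumes nvidia-smi is on the PATH and uses its CSV query mode; the helper names (`parse_gpu_csv`, `sample_gpus`) are made up for this example and are not part of CalculiX or PaStiX.

```python
import csv
import io
import shutil
import subprocess

# Fields requested from nvidia-smi's --query-gpu interface.
QUERY = "utilization.gpu,power.draw,memory.used"


def _to_float(field):
    """Convert a CSV field to float; nvidia-smi prints e.g. [N/A] for unsupported values."""
    try:
        return float(field)
    except ValueError:
        return None


def parse_gpu_csv(text):
    """Parse `--format=csv,noheader,nounits` output into one dict per GPU."""
    rows = []
    for fields in csv.reader(io.StringIO(text), skipinitialspace=True):
        if len(fields) != 3:
            continue  # skip blank or unexpected lines
        util, power, mem = (_to_float(f.strip()) for f in fields)
        rows.append({"util_pct": util, "power_w": power, "mem_mib": mem})
    return rows


def sample_gpus():
    """Query nvidia-smi once; returns [] if the tool is not installed."""
    if shutil.which("nvidia-smi") is None:
        return []
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_csv(out)
```

Calling `sample_gpus()` in a loop (for example once per second) during the factorization phase should show whether utilization ever rises above idle.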
TL;DR: The GPU is limited by the bandwidth of the PCI Express bus. This does not affect e.g. multithreaded AVX code. So even for “embarrassingly parallel” calculations, the CPU might be a better choice.
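A rough back-of-the-envelope sketch of that trade-off, for a single dense n×n double-precision matrix multiply. Every throughput figure below is an assumed round number chosen for illustration, not a measurement of any particular card or CPU, and the model ignores transfer latency and any overlap of transfer with compute.

```python
# Assumed, illustrative figures (not measurements):
ASSUMED_PCIE_BYTES_PER_S = 16e9   # roughly a x16 link, order of magnitude only
ASSUMED_GPU_FLOPS = 300e9         # hypothetical sustained FP64 rate on the GPU
ASSUMED_CPU_FLOPS = 100e9         # hypothetical sustained FP64 rate on the CPU


def gemm_offload_estimate(n, pcie_bytes_per_s, gpu_flops, cpu_flops):
    """Return (t_cpu, t_gpu) in seconds for one n x n x n GEMM.

    The GPU time includes moving two input blocks to the device and one
    result block back over the bus (3 * n^2 doubles in total).
    """
    flops = 2.0 * n**3                 # multiply-adds of a dense GEMM
    bytes_moved = 3 * n * n * 8        # 8 bytes per double
    t_transfer = bytes_moved / pcie_bytes_per_s
    t_gpu = t_transfer + flops / gpu_flops
    t_cpu = flops / cpu_flops
    return t_cpu, t_gpu


def offload_pays_off(n):
    """True if, under the assumed figures, the GPU path is faster."""
    t_cpu, t_gpu = gemm_offload_estimate(
        n, ASSUMED_PCIE_BYTES_PER_S, ASSUMED_GPU_FLOPS, ASSUMED_CPU_FLOPS
    )
    return t_gpu < t_cpu
```

Because the work grows as n³ while the data moved grows as n², small blocks are dominated by the bus transfer and only large blocks amortize it. With the assumed numbers above, a 64×64 block is faster on the CPU and a 2048×2048 block is faster on the GPU; the real break-even point depends on the actual hardware.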
Now I’m quite sure that the GPU is not used at all. Even though I was able to compile CalculiX and all the necessary dependencies, this is the output regarding the PaStiX solver and GPU usage:
…
…
Not reusing csc.
+-------------------------------------------------+
Version: 6.0.1
Schedulers:
sequential: Enabled
thread static: Started
thread dynamic: Disabled
PaRSEC: Started
StarPU: Disabled
Number of MPI processes: 1
Number of threads per process: 1
Number of GPUs: 1
MPI communication support: Disabled
Distribution level: 2D( 128)
Blocking size (min/max): 1024 / 2048
and after:
…
…
Number of operations: 149.49 GFlops
Number of static pivots: 0
CPU vs GPU CBLK GEMMS → 218533 vs 0
CPU vs GPU BLK GEMMS → 17784 vs 0
CPU vs GPU TRSM → 7556 vs 0
Time to solve: 0.1248
- iteration 1 :
…
…
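To check this across several runs without reading the log by eye, a small parser along these lines could be used. It is only a sketch: it assumes the "CPU vs GPU ..." lines look like the ones pasted above (with either "→" or "->" as the separator), and `gpu_kernel_counts` is a hypothetical helper name.

```python
import re

# Matches lines such as "CPU vs GPU CBLK GEMMS → 218533 vs 0".
_KERNEL_LINE = re.compile(
    r"CPU vs GPU\s+(.+?)\s*(?:→|->)\s*(\d+)\s+vs\s+(\d+)"
)


def gpu_kernel_counts(log_text):
    """Return {kernel_name: (cpu_count, gpu_count)} from the solver log."""
    counts = {}
    for line in log_text.splitlines():
        match = _KERNEL_LINE.search(line)
        if match:
            name, cpu, gpu = match.groups()
            counts[name] = (int(cpu), int(gpu))
    return counts


def gpu_was_used(log_text):
    """True if at least one kernel type reports a non-zero GPU count."""
    return any(gpu > 0 for _, gpu in gpu_kernel_counts(log_text).values())
```

On the output above, every GPU count is 0, so `gpu_was_used` returns False: all GEMM and TRSM kernels ran on the CPU.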
Moreover, the PaRSEC compilation and installation went well, but some tests failed:
Running tests…
Test project /usr/local/PaStiX/parsec_i8/build
Start 1: launcher_shm
1/34 Test #1: launcher_shm … Passed 0.00 sec
Start 2: launcher_mpi
2/34 Test #2: launcher_mpi …***Failed 0.01 sec
Start 3: launcher_gpu
3/34 Test #3: launcher_gpu …***Failed 0.00 sec
Start 4: unit_startup1_shm
4/34 Test #4: unit_startup1_shm … Passed 0.66 sec
Start 5: unit_startup2_shm
5/34 Test #5: unit_startup2_shm … Passed 0.41 sec
…
…
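To pull just the failing tests out of a long ctest run, a short sketch like this might be handy. It assumes the usual "N/M Test #K: name ...***Failed" line shape seen above; `failed_tests` is a made-up helper name.

```python
import re

# Matches e.g. "2/34 Test #2: launcher_mpi …***Failed 0.01 sec".
_FAILED_LINE = re.compile(r"Test\s+#(\d+):\s+(\S+)\s+.*?\*\*\*Failed")


def failed_tests(ctest_output):
    """Return a list of (test_number, test_name) for every failed test."""
    return [
        (int(number), name)
        for number, name in _FAILED_LINE.findall(ctest_output)
    ]
```

Each name it returns can then be rerun on its own from the build directory with ctest's `-R <name>` filter together with `--output-on-failure`, which prints the failing test's own output instead of only the summary line.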
The nvidia-smi report:
Sun Dec 15 08:23:22 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 565.57.01 Driver Version: 565.57.01 CUDA Version: 12.7 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA RTX 6000 Ada Gene… On | 00000000:01:00.0 Off | Off |
| 30% 45C P0 64W / 300W | 462MiB / 49140MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 1 NVIDIA GeForce RTX 4060 Ti On | 00000000:05:00.0 Off | N/A |
| 0% 35C P8 8W / 160W | 191MiB / 8188MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 2939 G /usr/lib/xorg/Xorg 4MiB |
| 0 N/A N/A 1088548 C ccx_2.22_cuda 436MiB |
| 1 N/A N/A 2939 G /usr/lib/xorg/Xorg 57MiB |
| 1 N/A N/A 3217 G /usr/bin/gnome-shell 110MiB |
+-----------------------------------------------------------------------------------------+
clearly shows that the ccx_2.22_cuda process has started, but the GPU utilization is 0%, and the GPU memory is hardly used either!
I am worried about test #3: 3/34 Test #3: launcher_gpu …***Failed
Do you recommend using CUDA 10.2 instead of 12.4?