Problems of PaStiX with CUDA

Hi all, I compiled CalculiX with a CUDA-enabled PaStiX:
[user@node01 FEM]$ ldd ccx_2.19_i8
=> (0x00007ffcc7b65000)
=> /lib64/ (0x00002b0086210000)
=> /home/user/SCILIB/lib/ (0x00002b008645f000)
=> /lib64/ (0x00002b00866af000)
=> /lib64/ (0x00002b00868cb000)
=> /lib64/ (0x00002b0086bed000)
=> /lib64/ (0x00002b0086eef000)
=> /home/user/SCILIB/lib/ (0x00002b00870f7000)
=> /lib64/ (0x00002b0087377000)
=> /home/user/FEM/PaStiX/ompi3/lib/ (0x00002b008757b000)
=> /home/user/FEM/PaStiX/ompi3/lib/ (0x00002b0087877000)
=> /lib64/ (0x00002b0087a91000)
=> /lib64/ (0x00002b0087ca7000)
=> /lib64/ (0x00002b0087fae000)
=> /lib64/ (0x00002b00881d4000)
/lib64/ (0x00002b0085fec000)
=> /lib64/ (0x00002b00885a2000)
=> /lib64/ (0x00002b00887b8000)
=> /usr/lib64/atlas/ (0x00002b00889f4000)
=> /lib64/ (0x00002b008965c000)
=> /lib64/ (0x00002b0089872000)
=> /usr/local/cuda-10.0/lib64/ (0x00002b0089bdc000)
=> /home/user/FEM/PaStiX/ompi3/lib/ (0x00002b0089e56000)
=> /home/user/FEM/PaStiX/ompi3/lib/ (0x00002b008a10c000)
=> /lib64/ (0x00002b008a416000)
=> /lib64/ (0x00002b008a621000)
=> /lib64/ (0x00002b008a824000)
=> /lib64/ (0x00002b008aa29000)
=> /lib64/ (0x00002b008ac7a000)
=> /lib64/ (0x00002b008aea0000)
=> /lib64/ (0x00002b008b0a5000)
=> /lib64/ (0x00002b008b2bd000)
However, when I run it, CUDA does not seem to be used:
[user@node01 FEM]$ ./ccx_2.19_i8 ball

Using up to 1 cpu(s) for the energy calculation.

Using up to 1 cpu(s) for the symmetric stiffness/mass contributions.

Not reusing csc.

+     PaStiX : Parallel Sparse matriX package     +

Version: 6.0.1
sequential: Enabled
thread static: Started
thread dynamic: Disabled
PaRSEC: Started
StarPU: Disabled
Number of MPI processes: 1
Number of threads per process: 1
Number of GPUs: 0 <============================
MPI communication support: Funneled
Distribution level: 2D( 128)
Blocking size (min/max): 1024 / 2048

Matrix type: General
Arithmetic: Float
Format: CSC
N: 3059
nnz: 205489

Ordering step :
Ordering method is: Scotch
Time to compute ordering: 0.0252
Symbolic factorization step:
Symbol factorization using: Fax Direct
Number of nonzeroes in L structure: 615448
Fill-in of L: 2.995041
Time to compute symbol matrix: 0.0035
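Since the telling line above is "Number of GPUs: 0", it can help to pull such fields out of a solver log automatically when comparing builds. The following is a minimal C sketch, not part of PaStiX or CalculiX: the struct and function names (log_summary, parse_solver_log, fill_in_ratio) are hypothetical, and it only assumes the field labels exactly as printed in the log above. The ratio nnz(L)/nnz(A) = 615448/205489 ≈ 2.995041 reproduces the "Fill-in of L" value printed above, so that appears to be how the figure is defined here.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* A few fields scraped from a PaStiX-style log (-1 means "not found"). */
typedef struct {
    long gpus;   /* "Number of GPUs:" */
    long nnz_a;  /* "nnz:" of the input matrix */
    long nnz_l;  /* "Number of nonzeroes in L structure:" */
} log_summary;

/* If `line` (after leading blanks) starts with `key` followed by an integer
 * on the same line, store that integer in *out. */
static void match_field(const char *line, const char *key, long *out)
{
    while (*line == ' ' || *line == '\t')
        line++;
    size_t n = strlen(key);
    if (strncmp(line, key, n) != 0)
        return;
    const char *p = line + n;
    while (*p == ' ' || *p == '\t')
        p++;
    if (isdigit((unsigned char)*p))
        *out = strtol(p, NULL, 10);
}

/* Scan a NUL-terminated, newline-separated log for the fields above. */
log_summary parse_solver_log(const char *text)
{
    log_summary s = { -1, -1, -1 };
    const char *line = text;
    for (;;) {
        match_field(line, "Number of GPUs:", &s.gpus);
        match_field(line, "nnz:", &s.nnz_a);
        match_field(line, "Number of nonzeroes in L structure:", &s.nnz_l);
        const char *nl = strchr(line, '\n');
        if (nl == NULL)
            break;
        line = nl + 1;
    }
    return s;
}

/* Fill-in ratio nnz(L) / nnz(A), or -1.0 if either count is missing. */
double fill_in_ratio(const log_summary *s)
{
    if (s->nnz_a <= 0 || s->nnz_l < 0)
        return -1.0;
    return (double)s->nnz_l / (double)s->nnz_a;
}
```

Fed the log above, this should report gpus == 0, i.e. the factorization was scheduled without any GPU regardless of how the library was built.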

The system and toolchain are:
CentOS 7 + CUDA-10.0 + gcc-4.8.5 + ompi-3.1.6

What is the output of nvidia-smi?


nvidia-smi does not list ccx at all; ccx runs only on the CPU.

That is quite an impressive machine! I had a lot of trouble getting PaStiX and CalculiX to work together. See if you can get PaStiX alone installed and working, without CalculiX. PaStiX provides a simple.c example, which I modified to read a matrix that CalculiX had written out to text files. Once that works with the GPU, I would move on to figuring out what is wrong with the CalculiX integration… There is more discussion on this here:
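Before feeding a CalculiX matrix into a modified simple.c, it may be worth checking that the exported compressed sparse column (CSC) arrays are consistent, since a bad column pointer or row index makes any solver read out of bounds. A minimal C sketch follows; csc_check is a hypothetical helper, not a PaStiX function, and the int64_t index type is an assumption (the _i8 binary name suggests 8-byte integers).

```c
#include <stdint.h>

/* Hypothetical helper: check that (colptr, rowind) describe a valid n x n
 * CSC matrix with nnz entries and index base 0 (C) or 1 (Fortran).
 * Returns 0 if valid, or a negative code for the first problem found. */
int csc_check(int64_t n, int64_t nnz, const int64_t *colptr,
              const int64_t *rowind, int64_t base)
{
    if (n < 0 || nnz < 0 || (base != 0 && base != 1))
        return -1;                 /* bad arguments */
    if (colptr[0] != base)
        return -2;                 /* first column must start at base */
    for (int64_t j = 0; j < n; j++)
        if (colptr[j + 1] < colptr[j])
            return -3;             /* column pointers must not decrease */
    if (colptr[n] - base != nnz)
        return -4;                 /* last pointer must match nnz */
    /* Only now is it safe to walk rowind[0 .. nnz-1]. */
    for (int64_t j = 0; j < n; j++) {
        for (int64_t k = colptr[j] - base; k < colptr[j + 1] - base; k++) {
            int64_t i = rowind[k] - base;   /* 0-based row index */
            if (i < 0 || i >= n)
                return -5;         /* row index out of range */
        }
    }
    return 0;
}
```

The column pointers are validated in full before any row index is read, so the check itself never touches memory outside the arrays it was given.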

Thank you, that is a good reminder.
When I ran make test on the PaStiX examples, there really were errors: only a few tests passed.

That looks like a memory issue… What you need is to check that PaStiX can solve a CalculiX matrix and use the available GPUs…
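One way to confirm that PaStiX really solved a CalculiX matrix is to compute the relative residual ||b - Ax|| / ||b|| from the original CSC arrays after the solve. The sketch below uses the max norm to stay dependency-free; csc_rel_residual is a hypothetical helper, and it assumes the matrix is stored in full (general) form, as in the "Matrix type: General" run earlier in the thread. A symmetric matrix stored as one triangle would need the mirrored entries added.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: relative residual ||b - A x||_inf / ||b||_inf for an
 * n x n matrix A in CSC form (colptr/rowind/values, index base 0 or 1).
 * Returns -1.0 on allocation failure or if b is zero. */
double csc_rel_residual(int64_t n, const int64_t *colptr,
                        const int64_t *rowind, const double *values,
                        int64_t base, const double *x, const double *b)
{
    double *r = malloc((size_t)n * sizeof *r);
    if (r == NULL && n > 0)
        return -1.0;
    for (int64_t i = 0; i < n; i++)
        r[i] = b[i];
    /* r = b - A x, accumulated column by column (natural order for CSC). */
    for (int64_t j = 0; j < n; j++)
        for (int64_t k = colptr[j] - base; k < colptr[j + 1] - base; k++)
            r[rowind[k] - base] -= values[k] * x[j];
    double rmax = 0.0, bmax = 0.0;
    for (int64_t i = 0; i < n; i++) {
        double ri = r[i] < 0.0 ? -r[i] : r[i];
        double bi = b[i] < 0.0 ? -b[i] : b[i];
        if (ri > rmax) rmax = ri;
        if (bi > bmax) bmax = bi;
    }
    free(r);
    return bmax > 0.0 ? rmax / bmax : -1.0;
}
```

Comparing this value between a CPU-only run and a GPU run on the same exported matrix gives a quick check that offloading did not change the answer.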

See these two files:

Thank you. I’ll test this.
I found that the problem is caused by hwloc: different versions bring different bugs.