Problems with PaStiX and CUDA

Hi all, I compiled CalculiX with a CUDA-enabled PaStiX.
[user@node01 FEM]$ ldd ccx_2.19_i8
linux-vdso.so.1 => (0x00007ffcc7b65000)
libarpack.so.2 => /lib64/libarpack.so.2 (0x00002b0086210000)
libhwloc.so.15 => /home/user/SCILIB/lib/libhwloc.so.15 (0x00002b008645f000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00002b00866af000)
libgfortran.so.3 => /lib64/libgfortran.so.3 (0x00002b00868cb000)
libm.so.6 => /lib64/libm.so.6 (0x00002b0086bed000)
librt.so.1 => /lib64/librt.so.1 (0x00002b0086eef000)
libparsec.so.2 => /home/user/SCILIB/lib/libparsec.so.2 (0x00002b00870f7000)
libdl.so.2 => /lib64/libdl.so.2 (0x00002b0087377000)
libmpi.so.40 => /home/user/FEM/PaStiX/ompi3/lib/libmpi.so.40 (0x00002b008757b000)
libmpi_cxx.so.40 => /home/user/FEM/PaStiX/ompi3/lib/libmpi_cxx.so.40 (0x00002b0087877000)
libz.so.1 => /lib64/libz.so.1 (0x00002b0087a91000)
libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00002b0087ca7000)
libgomp.so.1 => /lib64/libgomp.so.1 (0x00002b0087fae000)
libc.so.6 => /lib64/libc.so.6 (0x00002b00881d4000)
/lib64/ld-linux-x86-64.so.2 (0x00002b0085fec000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00002b00885a2000)
libquadmath.so.0 => /lib64/libquadmath.so.0 (0x00002b00887b8000)
libtatlas.so.3 => /usr/lib64/atlas/libtatlas.so.3 (0x00002b00889f4000)
libudev.so.1 => /lib64/libudev.so.1 (0x00002b008965c000)
libxml2.so.2 => /lib64/libxml2.so.2 (0x00002b0089872000)
libcudart.so.10.0 => /usr/local/cuda-10.0/lib64/libcudart.so.10.0 (0x00002b0089bdc000)
libopen-rte.so.40 => /home/user/FEM/PaStiX/ompi3/lib/libopen-rte.so.40 (0x00002b0089e56000)
libopen-pal.so.40 => /home/user/FEM/PaStiX/ompi3/lib/libopen-pal.so.40 (0x00002b008a10c000)
libnuma.so.1 => /lib64/libnuma.so.1 (0x00002b008a416000)
libutil.so.1 => /lib64/libutil.so.1 (0x00002b008a621000)
libcap.so.2 => /lib64/libcap.so.2 (0x00002b008a824000)
libdw.so.1 => /lib64/libdw.so.1 (0x00002b008aa29000)
liblzma.so.5 => /lib64/liblzma.so.5 (0x00002b008ac7a000)
libattr.so.1 => /lib64/libattr.so.1 (0x00002b008aea0000)
libelf.so.1 => /lib64/libelf.so.1 (0x00002b008b0a5000)
libbz2.so.1 => /lib64/libbz2.so.1 (0x00002b008b2bd000)
But at run time, CUDA does not seem to be used:
[user@node01 FEM]$ ./ccx_2.19_i8 ball

Using up to 1 cpu(s) for the energy calculation.

Using up to 1 cpu(s) for the symmetric stiffness/mass contributions.

Not reusing csc.
+-------------------------------------------------+
+     PaStiX : Parallel Sparse matriX package     +
+-------------------------------------------------+
Version: 6.0.1
Schedulers:
sequential: Enabled
thread static: Started
thread dynamic: Disabled
PaRSEC: Started
StarPU: Disabled
Number of MPI processes: 1
Number of threads per process: 1
Number of GPUs: 0 <============================
MPI communication support: Funneled
Distribution level: 2D( 128)
Blocking size (min/max): 1024 / 2048

Matrix type: General
Arithmetic: Float
Format: CSC
N: 3059
nnz: 205489

+-------------------------------------------------+
Ordering step :
Ordering method is: Scotch
Time to compute ordering: 0.0252
+-------------------------------------------------+
Symbolic factorization step:
Symbol factorization using: Fax Direct
Number of nonzeroes in L structure: 615448
Fill-in of L: 2.995041
Time to compute symbol matrix: 0.0035
+-------------------------------------------------+

The compiler and system are:
CentOS 7 + CUDA-10.0 + gcc-4.8.5 + ompi-3.1.6

What is the output of nvidia-smi?

nvidia-smi shows no ccx process; ccx runs only on the CPU.

That is quite an impressive machine! I had lots of trouble getting PaStiX and CalculiX to work together. See if you can get PaStiX alone installed and working, without CalculiX. PaStiX provides a simple.c example file, which I modified to read a matrix that CalculiX had written out to text files. Once that works with the GPU, I would move on to finding out what is wrong with the CalculiX integration… Some more discussion on this is here:

Thank you, that's a good reminder.
When I ran make test on the PaStiX examples, something really was wrong: only a few of the tests passed.

That looks like a memory issue… What you need to test is whether PaStiX can solve a CalculiX matrix and use the available GPUs…
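One way to check the "can solve a CalculiX matrix" half, independently of what PaStiX prints, is to recompute the residual of the returned solution. The sketch below is hypothetical: it assumes the matrix is held in 0-based CSC arrays with both triangles stored (the run above reports Format: CSC and Matrix type: General), that x and b are double arrays, and the name csc_relative_residual is made up for illustration. It says nothing about GPU use; that still has to be read from the Number of GPUs line of the PaStiX banner and from nvidia-smi.

```c
#include <stdint.h>
#include <stdlib.h>

/* Absolute value without <math.h>, so no -lm is needed. */
static double abs_d(double v) { return v < 0.0 ? -v : v; }

/* Hypothetical helper: relative residual ||b - A x||_inf / ||b||_inf for a
   square matrix in CSC form with 0-based indices and BOTH triangles stored
   (general storage). Returns -1.0 if the work vector cannot be allocated. */
double csc_relative_residual(int64_t n, const int64_t *colptr,
                             const int64_t *rowind, const double *values,
                             const double *x, const double *b)
{
    double *r = malloc((size_t)(n > 0 ? n : 1) * sizeof *r);
    double rmax = 0.0, bmax = 0.0;
    int64_t i, c, k;

    if (!r)
        return -1.0;
    for (i = 0; i < n; i++) /* r = b */
        r[i] = b[i];
    for (c = 0; c < n; c++) /* r -= A x, one column at a time */
        for (k = colptr[c]; k < colptr[c + 1]; k++)
            r[rowind[k]] -= values[k] * x[c];
    for (i = 0; i < n; i++) { /* infinity norms of r and b */
        if (abs_d(r[i]) > rmax) rmax = abs_d(r[i]);
        if (abs_d(b[i]) > bmax) bmax = abs_d(b[i]);
    }
    free(r);
    return bmax > 0.0 ? rmax / bmax : rmax; /* absolute residual if b == 0 */
}
```

For the 2×2 matrix [[4, -1], [-1, 4]] with x = (1, 1), it returns 0 for b = (3, 3) and 0.25 for b = (3, 4). If the solver's indices are 1-based, subtract 1 from colptr and rowind before calling it; and since the run above uses Arithmetic: Float, a correct solve should not be expected to reach double-precision-level residuals.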

See these two files:

https://www.feacluster.com/code/simple.c
https://www.feacluster.com/code/spooles.c._for_ijv
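The file name spooles.c._for_ijv suggests the CalculiX matrix is dumped as IJV (row, column, value) triplets, which then have to be turned into the CSC layout that PaStiX reports using (Format: CSC). I have not checked the exact layout those files produce, so the sketch below simply assumes one "i j value" triplet per line with 1-based indices; the csc_matrix type and ijv_read_csc function are hypothetical names, not part of PaStiX or CalculiX. It uses 64-bit indices, presumably matching the integer width of the ccx_2.19_i8 build.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical CSC container for illustration; not a PaStiX or CalculiX type. */
typedef struct {
    int64_t n;       /* dimension, inferred from the largest index read */
    int64_t nnz;     /* number of stored entries */
    int64_t *colptr; /* n + 1 entries: 0-based start of each column */
    int64_t *rowind; /* nnz entries: 0-based row index of each value */
    double *values;  /* nnz entries */
} csc_matrix;

/* Reads "i j value" triplets with 1-based indices (an ASSUMED file layout)
   until end of file or the first malformed line, then builds CSC with a
   counting sort on the column index. Duplicate (i, j) pairs are kept as
   separate entries and rows inside a column keep file order; sort or merge
   them if the solver requires it. Returns 0 on success, -1 on error. */
int ijv_read_csc(FILE *fp, csc_matrix *A)
{
    int64_t cap = 1024, cnt = 0, n = 0, i, j, k, c;
    double v;
    int64_t *ti = malloc(cap * sizeof *ti);
    int64_t *tj = malloc(cap * sizeof *tj);
    double *tv = malloc(cap * sizeof *tv);
    int64_t *next = NULL;
    int rc = -1;

    A->n = A->nnz = 0;
    A->colptr = A->rowind = NULL;
    A->values = NULL;
    if (!ti || !tj || !tv)
        goto done;

    while (fscanf(fp, "%" SCNd64 " %" SCNd64 " %lf", &i, &j, &v) == 3) {
        if (i < 1 || j < 1)
            goto done; /* not 1-based: reject the file */
        if (cnt == cap) { /* grow the triplet buffers */
            int64_t *ni, *nj;
            double *nv;
            if (!(ni = realloc(ti, 2 * cap * sizeof *ti))) goto done;
            ti = ni;
            if (!(nj = realloc(tj, 2 * cap * sizeof *tj))) goto done;
            tj = nj;
            if (!(nv = realloc(tv, 2 * cap * sizeof *tv))) goto done;
            tv = nv;
            cap *= 2;
        }
        ti[cnt] = i - 1;
        tj[cnt] = j - 1;
        tv[cnt] = v;
        cnt++;
        if (i > n) n = i;
        if (j > n) n = j;
    }

    A->colptr = calloc(n + 1, sizeof *A->colptr);
    A->rowind = malloc((cnt + 1) * sizeof *A->rowind);
    A->values = malloc((cnt + 1) * sizeof *A->values);
    next = malloc((n + 1) * sizeof *next);
    if (!A->colptr || !A->rowind || !A->values || !next)
        goto done;

    for (k = 0; k < cnt; k++) /* count entries per column */
        A->colptr[tj[k] + 1]++;
    for (c = 0; c < n; c++)   /* prefix sum gives column starts */
        A->colptr[c + 1] += A->colptr[c];
    for (c = 0; c < n; c++)
        next[c] = A->colptr[c];
    for (k = 0; k < cnt; k++) { /* scatter into column slots */
        int64_t pos = next[tj[k]]++;
        A->rowind[pos] = ti[k];
        A->values[pos] = tv[k];
    }
    A->n = n;
    A->nnz = cnt;
    rc = 0;

done:
    free(ti);
    free(tj);
    free(tv);
    free(next);
    if (rc != 0) { /* leave A empty on failure */
        free(A->colptr);
        free(A->rowind);
        free(A->values);
        A->colptr = A->rowind = NULL;
        A->values = NULL;
    }
    return rc;
}
```

A caller would fopen() the dumped file, call ijv_read_csc(), and pass colptr, rowind and values on to whatever solver setup the modified simple.c uses. If that setup expects 1-based (Fortran-style) indices, add 1 to every entry of colptr and rowind first. The counting sort keeps the conversion linear in the number of triplets, which matters for matrices much larger than the 3059-row test case above.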

Thank you. I’ll test this.
I found that the problem is caused by hwloc: different versions bring different bugs.