Thanks for your reply. I would expect the same, but I have not tried the Intel MKL Pardiso yet, because at the moment I am not sure how to set it up. I was just surprised, because everyone says that SPOOLES is slower, whereas in my case that only holds if I do not use the fully multithreaded SPOOLES.
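If I understand it correctly, switching to MKL Pardiso would mainly mean changing the link line, roughly like the sketch below (untested; $(MKLROOT) would come from Intel's environment script, and I do not know whether ccx's pardiso.c can be used with MKL completely unchanged):

# sketch: MKL Pardiso link flags for GNU compilers, replacing the -L../../../pardiso line in LIBS below
# (MKL also ships its own BLAS/LAPACK, so -llapack would probably not be needed then)
-L$(MKLROOT)/lib/intel64 -lmkl_gf_lp64 -lmkl_gnu_thread -lmkl_core -lgomp -lpthread -lm -ldl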
For now I use the library libpardiso600-GNU800-X86-64.so from https://www.pardiso-project.org/ and the following makefile for ccx:
CFLAGS = -Wall -g -O2 -fopenmp -I ../../../SPOOLES.2.2 -DARCH="Linux" -DSPOOLES -DPARDISO -DARPACK -DMATRIXSTORAGE -DUSE_MT=8
FFLAGS = -Wall -g -O2 -fopenmp
CC=cc
FC=gfortran
.c.o :
	$(CC) $(CFLAGS) -c $<
.f.o :
	$(FC) $(FFLAGS) -c $<
include Makefile.inc
SCCXMAIN = ccx_2.17.c
OCCXF = $(SCCXF:.f=.o)
OCCXC = $(SCCXC:.c=.o)
OCCXMAIN = $(SCCXMAIN:.c=.o)
DIR=../../../SPOOLES.2.2
LIBS = \
$(DIR)/MT/src/spoolesMT.a \
$(DIR)/spooles.a \
../../../ARPACK/libarpack_INTEL.a \
-L../../../pardiso -lpardiso600-GNU800-X86-64 -lpthread -lm -llapack -lc
ccx_2.17_MT: $(OCCXMAIN) ccx_2.17_MT.a
	./date.pl; $(CC) $(CFLAGS) -c ccx_2.17.c; $(FC) -fopenmp -Wall -O2 -g -o $@ $(OCCXMAIN) ccx_2.17_MT.a $(LIBS)
ccx_2.17_MT.a: $(OCCXF) $(OCCXC)
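Since libpardiso600-GNU800-X86-64.so is linked as a shared library, it also has to be found at run time. A quick sanity check would be something like this (the path just mirrors the -L path from the makefile and is relative to the build directory):

export LD_LIBRARY_PATH=../../../pardiso:$LD_LIBRARY_PATH
ldd ./ccx_2.17_MT | grep -i pardiso    # should list libpardiso600-GNU800-X86-64.so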
While Pardiso is solving the equation system, htop shows all threads working at 100%, so I am quite sure I am using the multithreaded version.
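For reference, these are the thread-related environment variables I am aware of; the names are taken from the ccx and Pardiso Project documentation as far as I know, and the license path is just a placeholder:

export OMP_NUM_THREADS=8                # OpenMP threads, read by ccx and by both solvers
export CCX_NPROC_EQUATION_SOLVER=8      # ccx-specific override for the solver thread count
export PARDISO_LIC_PATH=/path/to/lic    # directory containing pardiso.lic (Pardiso Project only, placeholder path)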
Maybe the bottleneck is the standard LAPACK package, which the Pardiso Project library needs?
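One thing I could still check is which BLAS/LAPACK the binary actually picks up, and whether an optimized one (e.g. OpenBLAS) changes anything; something like this (the OpenBLAS path is just a placeholder):

ldd ./ccx_2.17_MT | grep -i -E 'blas|lapack'    # which BLAS/LAPACK is resolved at run time
# possible test: replace -llapack in LIBS with an optimized BLAS/LAPACK, e.g.
#   -L/opt/OpenBLAS/lib -lopenblas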
At the moment I have no explanation for this behaviour, and for quite small input files the Pardiso Project solver is actually faster (e.g. 7 seconds compared to 11 with SPOOLES).
I had a more detailed look at the runtimes (all times in seconds):
PARDISO:
total_time;304.65253
readinput;0.66694
fort_allocation;11.36228
fort_calinput;39.87290
init_var;0.00001
descascade;0.00002
det_struct_mat;1.69151
linstatic_total;251.01715
linstatic_stress1;0.08033
linstatic_stiffness;1.01056
linstatic_spooles;0.00000
linstatic_pardiso;247.17648
linstatic_stress2;1.21279
spooles_factoring;0.00000
spooles_solve;0.00000
spooles_cleanup;0.00000
pardiso_factoring;243.78792
pardiso_solve;3.28830
pardiso_cleanup;0.10025
SPOOLES full multithreading:
total_time;194.28114
readinput;0.63942
fort_allocation;11.48573
fort_calinput;40.45663
init_var;0.00001
descascade;0.00000
det_struct_mat;1.80309
linstatic_total;139.84573
linstatic_stress1;0.08460
linstatic_stiffness;1.03885
linstatic_spooles;135.93945
linstatic_pardiso;0.00000
linstatic_stress2;1.23636
spooles_factoring;134.12712 (8 Cores)
spooles_solve;1.20269 (8 Cores)
spooles_cleanup;0.60964
pardiso_factoring;0.00000
pardiso_solve;0.00000
pardiso_cleanup;0.00000
SPOOLES partial multithreading:
total_time;399.39059
readinput;0.62447
fort_allocation;14.16187
fort_calinput;44.83856
init_var;0.00001
descascade;0.00000
det_struct_mat;1.75905
linstatic_total;337.99054
linstatic_stress1;0.08328
linstatic_stiffness;1.03335
linstatic_spooles;334.12296
linstatic_pardiso;0.00000
linstatic_stress2;1.20208
spooles_factoring;331.44077 (1 Core)
spooles_solve;2.15284 (8 Cores)
spooles_cleanup;0.52933
pardiso_factoring;0.00000
pardiso_solve;0.00000
pardiso_cleanup;0.00000
Only the times with spooles or pardiso in the variable name are of interest here; the other times are essentially the same in all three runs, because they belong to the ccx part.