Hi all,
I was able to compile the Windows version of ccx with the PaStiX solver.
But currently this version does not work; an error pops up immediately (ccx crashes).
I have the first working version of ccx on Windows!
My benchmark:
ccx 2.17 PaStiX, metis, no parsec, no starpu: 590s
ccx 2.17 PaStiX, scotch, no parsec, no starpu: 493s
ccx 2.17 Intel PARDISO: 649s
ccx 2.17 SPOOLES: 825s
Version for tests (i4 build, Metis):
http://s000.tinyupload.com/index.php?file_id=87740980074093979332
password: nCG4W48PgG
New version based on Scotch and linked against MKL PARDISO (PARDISO libraries required!):
https://gofile.io/d/klndR2
password: yCxFxNAkFW
With this version you can directly compare PaStiX vs. PARDISO (Windows version).
@rafal.brzegowy, thanks for taking the time to compile and share these versions. I've been using your PARDISO versions for months and they work perfectly! Will test this one as soon as possible.
Best regards
Hey @rafal.brzegowy, could you share the makefile that you used to compile PaStiX without Parsec? I am encountering an error as spm keeps looking for parsec.h
Hi,
Did you use PaStiX cudaless version from?:
My Makefile (alpha version):
OPT = -O2 -m64
#Specify where to store the generated .o files
OBJDIR = Multi_v2
CFLAGS = -Wall $(OPT) -fopenmp -posix -fpic -I$(PASTIX_INCLUDE) -I$(HWLOC_INCLUDE) -I$(STARPU_INCLUDE) -DARCH="Linux" -DPARDISO -DMATRIXSTORAGE -DUSE_MT=1 -DNETWORKOUT -DCALCULIX_EXTERNAL_BEHAVIOURS_SUPPORT -DPASTIX -DPASTIX_FP32 -fcommon
FFLAGS = -Wall $(OPT) -fopenmp -posix -fpic -fallow-argument-mismatch
#-DPASTIX_GPU
#ARPACK
CFLAGS+= -I /usr/local/ARPACK_OpenBLAS -DARPACK
#SPOOLES
CFLAGS+= -I /usr/local/SPOOLES.2.2 -DSPOOLES
CC=gcc
FC=gfortran
#Source files in this folder and in the adapter directory
$(OBJDIR)/%.o : %.c
	$(CC) $(CFLAGS) -c $< -o $@
$(OBJDIR)/%.o : %.f
	$(FC) $(FFLAGS) -c $< -o $@
include Makefile.inc
SCCXMAIN = ccx_2.17.c
OCCXF = $(SCCXF:%.f=$(OBJDIR)/%.o)
OCCXC = $(SCCXC:%.c=$(OBJDIR)/%.o)
OCCXMAIN = $(SCCXMAIN:%.c=$(OBJDIR)/%.o)
DIR1=/usr/local/SPOOLES.2.2
DIR2=/usr/local/ARPACK_OpenBLAS
MKL=/usr/local/MKL2020U2
PASTIX_INCLUDE = /usr/local/PaStiX/pastix_i4/include
HWLOC_INCLUDE = /usr/local/PaStiX/hwloc_i4/include
STARPU_INCLUDE = /usr/local/PaStiX/starpu_i8/include
PASTIX_LIBS = \
/usr/local/PaStiX/hwloc_i4/lib64/libhwloc-15.dll \
/usr/local/PaStiX/pastix_i4/lib/libpastix.a \
/usr/local/PaStiX/pastix_i4/lib/libspm.a \
/usr/local/PaStiX/pastix_i4/lib/libpastix_kernels.a
LIBS = \
$(DIR1)/MT/src/spoolesMT.a \
$(DIR1)/spooles.a \
$(DIR2)/libarpack_x64.a \
/mingw64/lib/libopenblas.a \
$(MKL)/mkl_core.dll \
$(MKL)/mkl_intel_thread.dll \
$(MKL)/mkl_intel_lp64_dll.lib \
$(PASTIX_LIBS)
$(OBJDIR)/ccx_PASTIX.exe: $(OBJDIR) $(OCCXMAIN) $(OBJDIR)/ccx_2.17_MT.a $(LIBS)
	./date.pl; $(CC) $(CFLAGS) -static-libgcc -static-libgfortran -static-libstdc++ \
	-Wl,-Bstatic -lm -lcrypt -lpthread -lwinpthread -lgomp -lquadmath -lstdc++ -ldl -c ccx_2.17.c;
	$(FC) $(FFLAGS) -static-libgcc -static-libgfortran -static-libstdc++ \
	-Wl,-Bstatic -lm -lpthread -lstdc++ \
	-Wl,-Bstatic,--whole-archive -lwinpthread -lgomp -lquadmath \
	-Wl,--no-whole-archive -o $@ $(OCCXMAIN) $(OBJDIR)/ccx_2.17_MT.a $(LIBS) \
	-L/mingw64/x86_64-w64-mingw32/lib -lopenblas -lmetis -lscotch -lscotcherrexit -lstdc++
$(OBJDIR)/ccx_2.17_MT.a: $(OCCXF) $(OCCXC)
	ar vr $@ $?
$(OBJDIR):
	mkdir -p $(OBJDIR)
clean:
	rm -f $(OBJDIR)/*.o $(OBJDIR)/ccx_2.17_MT.a $(OBJDIR)/ccx_PASTIX.exe
Additional information / errors:
Thanks for the resources! I am actually trying to build PaRSEC with CUDA and use that to build PaStiX. Have you been successful in doing that?
I have tried but without success.
Hi,
are these executable files dynamically linked with the MFront libraries?
Thank you,
Hi,
Yes, you need these (the build enables -DCALCULIX_EXTERNAL_BEHAVIOURS_SUPPORT):
libCALCULIXBEHAVIOUR.dll
libCalculiXInterface.dll
libNHU2.dll
libstdc++-6.dll
libTFELException.dll
libTFELMaterial.dll
libTFELMath.dll
libTFELNUMODIS.dll
libTFELUtilities.dll
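If ccx crashes right at startup, one quick way to check that these DLLs resolve from your PATH is a small probe like the sketch below. LoadLibraryA/FreeLibrary are the standard Win32 calls; the probe itself is just an illustration I am adding here, not part of the CalculiX build:
/* Diagnostic sketch: probe whether the MFront runtime DLLs listed above
   can be resolved from the current PATH. Illustration only. */
#include <windows.h>
#include <stdio.h>

int main(void)
{
    const char *dlls[] = {
        "libCalculiXInterface.dll", "libCALCULIXBEHAVIOUR.dll",
        "libTFELMaterial.dll", "libTFELMath.dll", "libTFELUtilities.dll",
    };
    for (size_t i = 0; i < sizeof(dlls) / sizeof(dlls[0]); ++i) {
        HMODULE h = LoadLibraryA(dlls[i]);
        printf("%-28s %s\n", dlls[i], h ? "OK" : "NOT FOUND");
        if (h) FreeLibrary(h);
    }
    return 0;
}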
PS. There is progress with (the original) PaRSEC and mingw64/cygwin:
Hi,
Important: for the best possible performance (in all cases, with or without PaRSEC), set:
Please try with these settings.
Hi,
I ran a simple test using the Mazars material model; the present CCX executable stopped running and displayed a fatal error while reading the input deck. Why does this problem occur, even though the number of constants (8) and depvars (3) is the same as given in the example files?
I tried using a previous version of CCX (2.13); it still won't run, but it gives a hint about the error in the user material: the anisotropic definition is not complete.
P.S. Which MFront version has been integrated and compiled, since it has no DruckerPragerCap material model?
Thank you,
Hi,
If you have 8 constants, try adding the temperature on a new line (9 constants in total). CalculiX reads at most eight entries per line of a *USER MATERIAL card and expects the temperature as the last entry, so with eight material constants the temperature becomes the ninth value, on a line of its own:
*User Material, constants=8
<YoungModulus>, <PoissonRatio>, <Ac>, <At>, <Bc>, <Bt>, <k>, <ed0>
<temp>
Thank you for the guidance; it runs well now, but it seems to take too long to finish. There is a significant difference compared to the Modified MC material model: from several seconds to minutes, and it was about 95% complete when I stopped the calculation. Not too surprising, since Mazars/MFront is a brittle-damage material model.
There's an update on the official MFront website; the latest version has a DruckerPragerCap material model. It looks like a great deal for both accuracy and computational time compared to Modified MC and Mazars.
Can you kindly share an updated version of the MFront/CalculiX integration? Many thanks for your time and effort.
Previous links have expired, so I am posting new ones; there are two versions in the archive:
As part of the tests, both of these versions let you choose the ordering and the scheduler (the settings can be added to cmdStartup.bat from bConverged):
set PASTIX_ORDERING=0
0 - Scotch, 1 - Metis
and:
set PASTIX_SCHEDULER=1
0 - Static, 1 - StarPU, 2 - PaRSEC (not working yet), 3 - Sequential
For example, PASTIX_ORDERING=1 together with PASTIX_SCHEDULER=1 selects Metis ordering with the StarPU scheduler.
My patch of pastix.c (with a guard for unset variables, since getenv() returns NULL in that case and atoi(NULL) would crash; the defaults are Scotch and Static):
// Set the best PaStiX parameters for CalculiX usage.
const char* pastix_ordering = getenv("PASTIX_ORDERING");
if (pastix_ordering != NULL && atoi(pastix_ordering) == 1) {
    iparm[IPARM_ORDERING] = PastixOrderMetis;
}
else {
    iparm[IPARM_ORDERING] = PastixOrderScotch;  // default
}
if (mode == AS) {
    iparm[IPARM_SCHEDULER] = PastixSchedStatic;
}
else {
    const char* pastix_scheduler = getenv("PASTIX_SCHEDULER");
    int scheduler = (pastix_scheduler != NULL) ? atoi(pastix_scheduler) : 0;
    if (scheduler == 1) {
        iparm[IPARM_SCHEDULER] = PastixSchedStarPU;
    }
    else if (scheduler == 2) {
        iparm[IPARM_SCHEDULER] = PastixSchedParsec;  // not working yet
    }
    else if (scheduler == 3) {
        iparm[IPARM_SCHEDULER] = PastixSchedSequential;
    }
    else {
        iparm[IPARM_SCHEDULER] = PastixSchedStatic;  // default
    }
}
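For context, this is roughly where such overrides sit in a PaStiX 6.x call sequence. A minimal sketch using the stock PaStiX entry points (pastixInitParam/pastixInit are PaStiX's own API; setup_solver and apply_env_overrides are hypothetical names for illustration, not the actual CalculiX wrapper):
#include <stdlib.h>
#include <pastix.h>

/* Hypothetical helper wrapping the ordering part of the patch above. */
static void apply_env_overrides(pastix_int_t *iparm)
{
    const char *ord = getenv("PASTIX_ORDERING");
    iparm[IPARM_ORDERING] = (ord != NULL && atoi(ord) == 1)
                          ? PastixOrderMetis : PastixOrderScotch;
}

/* Defaults are filled by pastixInitParam(); overrides must land after
   that call and before pastixInit() creates the solver instance.
   PaStiX's headers provide an MPI_COMM_WORLD fallback when built
   without MPI, as in this Windows build. */
void setup_solver(pastix_data_t **pastix_data,
                  pastix_int_t *iparm, double *dparm)
{
    pastixInitParam(iparm, dparm);
    apply_env_overrides(iparm);
    pastixInit(pastix_data, MPI_COMM_WORLD, iparm, dparm);
}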
password: 4VERsW9m8h
My very simple benchmark:
Hey Rafa,
Have you seen any issues with the time it takes to factorize the matrix in the problems that you have run? I was trying to run my own PaStiX build on Linux and it seems to take a long time to factorize the matrix.
+-------------------------------------------------+
+ PaStiX : Parallel Sparse matriX package +
+-------------------------------------------------+
Version: 6.0.1
Schedulers:
sequential: Enabled
thread static: Started
thread dynamic: Disabled
PaRSEC: Disabled
StarPU: Disabled
Number of MPI processes: 1
Number of threads per process: 24
Number of GPUs: 0
MPI communication support: Disabled
Distribution level: 2D( 128)
Blocking size (min/max): 1024 / 2048
Matrix type: General
Arithmetic: Float
Format: CSC
N: 1021086
nnz: 42850334
+-------------------------------------------------+
Ordering step :
Ordering method is: Scotch
Time to compute ordering: 6.7668
+-------------------------------------------------+
Symbolic factorization step:
Symbol factorization using: Fax Direct
Number of nonzeroes in L structure: 940155805
Fill-in of L: 21.940455
Time to compute symbol matrix: 0.4936
+-------------------------------------------------+
Reordering step:
Split level: 0
Stoping criteria: -1
Time for reordering: 1.1074
+-------------------------------------------------+
Analyse step:
Number of non-zeroes in blocked L: 1880311610
Fill-in: 43.880909
Number of operations in full-rank LU : 5.52 TFlops
Prediction:
Model: AMD 6180 MKL
Time to factorize: 113.1075
Time for analyze: 0.1228
+-------------------------------------------------+
Factorization step:
Factorization used: LU
Time to initialize internal csc: 0.8904
Time to initialize coeftab: 0.7960
Time to factorize: 100.6078 (56.19 GFlop/s)
Number of operations: 5.52 TFlops
Number of static pivots: 0
Time to solve: 10.5152
- iteration 1 :
total iteration time 6.35
error 0.00026925
- iteration 2 :
total iteration time 6.71
error 1.8872e-06
- iteration 3 :
total iteration time 7.77
error 4.9107e-08
- iteration 4 :
total iteration time 7.04
error 3.1283e-10
- iteration 5 :
total iteration time 7.2
error 1.4821e-12
- iteration 6 :
total iteration time 6.77
error 2.6222e-15
Time for refinement: 42.4008
________________________________________
CSC Conversion Time: 0.409282
Init Time: 8.907896
Factorize Time: 102.307430
Solve Time: 53.063267
Clean up Time: 0.000001
---------------------------------
Sum: 164.687876
Total PaStiX Time: 164.687876
CCX without PaStiX Time: 19.732016
Share of PaStiX Time: 0.893005
Total Time: 184.419891
Reusability: 0 : 1
________________________________________
Please take a look:
My performance of factorization: 130.29 GFlop/s if I have "set OPENBLAS_NUM_THREADS=1".
My performance of factorization: 19.61 GFlop/s if I have "set OPENBLAS_NUM_THREADS=8".
If you don't set "OPENBLAS_NUM_THREADS" at all, it takes the value from "OMP_NUM_THREADS".
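As an aside, if you would rather enforce this from code than from the environment, OpenBLAS exposes a runtime call for it. A minimal sketch (openblas_set_num_threads is OpenBLAS's public API, but wiring it into CalculiX's pastix.c is only an assumption for illustration):
/* Sketch: force single-threaded OpenBLAS from C instead of relying on
   the OPENBLAS_NUM_THREADS environment variable. Calling this from
   CalculiX is an assumption for illustration, not part of this build. */
extern void openblas_set_num_threads(int num_threads);

static void pin_blas_threads(void)
{
    /* PaStiX already runs one worker per core, so nested BLAS-level
       threading oversubscribes the machine; the 130 vs. 20 GFlop/s
       numbers above show the effect. */
    openblas_set_num_threads(1);
}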
Hello,
Thanks for the help! I was able to improve the factorization performance. Do you mind sharing your INP file for the model that you ran? I want to test it with my Linux version.
Not reusing csc.
+-------------------------------------------------+
+ PaStiX : Parallel Sparse matriX package +
+-------------------------------------------------+
Version: 6.0.1
Schedulers:
sequential: Enabled
thread static: Started
thread dynamic: Disabled
PaRSEC: Disabled
StarPU: Disabled
Number of MPI processes: 1
Number of threads per process: 24
Number of GPUs: 0
MPI communication support: Disabled
Distribution level: 2D( 128)
Blocking size (min/max): 1024 / 2048
Matrix type: General
Arithmetic: Float
Format: CSC
N: 1021086
nnz: 42850334
+-------------------------------------------------+
Ordering step :
Ordering method is: Scotch
Time to compute ordering: 6.8583
+-------------------------------------------------+
Symbolic factorization step:
Symbol factorization using: Fax Direct
Number of nonzeroes in L structure: 940155805
Fill-in of L: 21.940455
Time to compute symbol matrix: 0.5008
+-------------------------------------------------+
Reordering step:
Split level: 0
Stoping criteria: -1
Time for reordering: 1.0693
+-------------------------------------------------+
Analyse step:
Number of non-zeroes in blocked L: 1880311610
Fill-in: 43.880909
Number of operations in full-rank LU : 5.52 TFlops
Prediction:
Model: AMD 6180 MKL
Time to factorize: 113.1075
Time for analyze: 0.1085
+-------------------------------------------------+
Factorization step:
Factorization used: LU
Time to initialize internal csc: 1.0117
Time to initialize coeftab: 0.7879
Time to factorize: 10.5443 (536.11 GFlop/s)
Number of operations: 5.52 TFlops
Number of static pivots: 0
Time to solve: 0.8367
- iteration 1 :
total iteration time 0.137
error 0.00040009
- iteration 2 :
total iteration time 0.129
error 3.3839e-06
- iteration 3 :
total iteration time 0.13
error 6.304e-08
- iteration 4 :
total iteration time 0.134
error 3.3557e-10
- iteration 5 :
total iteration time 0.162
error 1.4103e-12
- iteration 6 :
total iteration time 0.141
error 3.5867e-15
Time for refinement: 1.1423
________________________________________
CSC Conversion Time: 0.398161
Init Time: 8.956861
Factorize Time: 12.361112
Solve Time: 2.058843
Clean up Time: 0.000000
---------------------------------
Sum: 23.774978
Total PaStiX Time: 23.774978
CCX without PaStiX Time: 19.864284
Share of PaStiX Time: 0.544807
Total Time: 43.639262
Reusability: 0 : 1
I managed to compile a basic version of PaRSEC in the mingw64 environment; a package for mingw64 should be available soon.
Info from George Bosilca (Bitbucket):
I was not aware that Mathieu maintained his own fork of PaRSEC. I see that he last updated it 04/2019, so that version is clearly quite old, and I do expect some inconsistencies with the current PaRSEC API. But you should give it a try; as long as PaStiX only uses JDFs, it remains highly possible that it will work. When you have some progress on this, please let me know; I would be curious to know.
Hi @rafal.brzegowy, could you please upload the PaStiX version of the CalculiX executables again? It looks like the file is unavailable now.
Best regards, and thanks in advance.