arpack-ng can be built with 64-bit integers (set the environment variable INTERFACE64=1 when calling configure).
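For reference, a minimal sketch of the autotools route (the --enable-icb flag and the INTERFACE64 variable are taken from the full configure line further down in this thread; the prefix is just an example, and you still need an ILP64 BLAS/LAPACK to link against):

# hypothetical minimal ILP64 build of arpack-ng with autotools
INTERFACE64=1 ./configure --enable-icb --prefix=/opt/arpack-ng-ilp64
make
make check    # optional self-tests
make install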
@feacluster A static analysis with the same mesh runs without problems.
@rsmith I did a new build of arpack-ng using these cmake options:
cmake -D EXAMPLES=ON -D MPI=OFF -D BUILD_SHARED_LIBS=ON -D BLA_VENDOR=Intel10_64ilp -D INTERFACE64=ON -D ITF64SUFFIX="ILP64" -D CMAKE_INSTALL_PREFIX=/opt/arpack-ng-3.9.1 ..
But this build already crashes at the “Frequency analysis was selected” step.
I was able to build arpack-ng along with OpenBLAS using the commands at the bottom.
I tested a large model for frequency analysis. Initially it crashed, but after I set ulimit -s unlimited it ran OK. I also tested the old arpack96 and it also ran fine as long as ulimit -s unlimited was set (see the sketch after the build commands below). Are you running a larger model than this:
number of equations
1219161
number of nonzero lower triangular matrix elements
45468750
OpenBLAS:
make FC=ifort INTERFACE64=1 BINARY=64 USE_THREAD=0 USE_LOCKING=0 PREFIX=/home/feacluster/OpenBLAS/build TARGET=SAPPHIRERAPIDS -j 2
arpack-ng:
INTERFACE64="1" CXX=icx CC=icx F77=ifort FC=ifort ./configure --with-blas=/home/feacluster/arpack-ng/libopenblas.a --with-lapack=/home/feacluster/arpack-ng/libopenblas.a --enable-icb --enable-static --disable-shared --prefix=/home/feacluster/arpack-ng/build
@feacluster your suggestion to set ulimit -s unlimited was the solution to the problem. My build does not include OpenBLAS because arpack-ng can be compiled against Intel MKL. Since Intel MKL is already included in the HPC Toolkit docker container and arpack-ng detects it correctly, I will stick with this solution. Here is the output of a successful frequency analysis run, which allocated around 90 GB of memory:
************************************************************
CalculiX Version 2.21 i8, Copyright(C) 1998-2023 Guido Dhondt
CalculiX comes with ABSOLUTELY NO WARRANTY. This is free
software, and you are welcome to redistribute it under
certain conditions, see gpl.htm
************************************************************
You are using an executable made on Mon May 6 07:51:57 UTC 2024
The numbers below are estimated upper bounds
number of:
nodes: 1357408
elements: 1303101
one-dimensional elements: 0
two-dimensional elements: 0
integration points per element: 8
degrees of freedom per node: 3
layers per element: 1
distributed facial loads: 0
distributed volumetric loads: 0
concentrated loads: 0
single point constraints: 0
multiple point constraints: 1
terms in all multiple point constraints: 1
tie constraints: 0
dependent nodes tied by cyclic constraints: 0
dependent nodes in pre-tension constraints: 0
sets: 2
terms in all sets: 3909303
materials: 1
constants per material and temperature: 2
temperature points per material: 1
plastic data points per material: 0
orientations: 0
amplitudes: 1
data points in all amplitudes: 1
print requests: 0
transformations: 0
property cards: 0
STEP 1
Frequency analysis was selected
Decascading the MPC's
Determining the structure of the matrix:
Using up to 8 cpu(s) for setting up the structure of the matrix.
number of equations
4072224
number of nonzero lower triangular matrix elements
158473776
Using up to 0 cpu(s) for setting up the structure of the matrix.
Using up to 8 cpu(s) for the stress calculation.
Using up to 8 cpu(s) for the symmetric stiffness/mass contributions.
Factoring the system of equations using the symmetric pardiso solver
number of threads = 8
Calculating the eigenvalues and the eigenmodes
U^T*M*U=1.000000 for eigenmode 1
U^T*M*U=1.000000 for eigenmode 2
U^T*M*U=1.000000 for eigenmode 3
U^T*M*U=1.000000 for eigenmode 4
U^T*M*U=1.000000 for eigenmode 5
U^T*M*U=1.000000 for eigenmode 6
U^T*M*U=1.000000 for eigenmode 7
U^T*M*U=1.000000 for eigenmode 8
U^T*M*U=1.000000 for eigenmode 9
U^T*M*U=1.000000 for eigenmode 10
U^T*M*U=1.000000 for eigenmode 11
U^T*M*U=1.000000 for eigenmode 12
U^T*M*U=1.000000 for eigenmode 13
U^T*M*U=1.000000 for eigenmode 14
U^T*M*U=1.000000 for eigenmode 15
U^T*M*U=1.000000 for eigenmode 16
U^T*M*U=1.000000 for eigenmode 17
U^T*M*U=1.000000 for eigenmode 18
U^T*M*U=1.000000 for eigenmode 19
U^T*M*U=1.000000 for eigenmode 20
U^T*M*U=1.000000 for eigenmode 21
U^T*M*U=1.000000 for eigenmode 22
U^T*M*U=1.000000 for eigenmode 23
U^T*M*U=1.000000 for eigenmode 24
U^T*M*U=1.000000 for eigenmode 25
U^T*M*U=1.000000 for eigenmode 26
U^T*M*U=1.000000 for eigenmode 27
U^T*M*U=1.000000 for eigenmode 28
U^T*M*U=1.000000 for eigenmode 29
U^T*M*U=1.000000 for eigenmode 30
U^T*M*U=1.000000 for eigenmode 31
U^T*M*U=1.000000 for eigenmode 32
U^T*M*U=1.000000 for eigenmode 33
U^T*M*U=1.000000 for eigenmode 34
U^T*M*U=1.000000 for eigenmode 35
U^T*M*U=1.000000 for eigenmode 36
U^T*M*U=1.000000 for eigenmode 37
U^T*M*U=1.000000 for eigenmode 38
U^T*M*U=1.000000 for eigenmode 39
U^T*M*U=1.000000 for eigenmode 40
Using up to 8 cpu(s) for the stress calculation.
Using up to 8 cpu(s) for the stress calculation.
Using up to 8 cpu(s) for the stress calculation.
Using up to 8 cpu(s) for the stress calculation.
Using up to 8 cpu(s) for the stress calculation.
Using up to 8 cpu(s) for the stress calculation.
Using up to 8 cpu(s) for the stress calculation.
Using up to 8 cpu(s) for the stress calculation.
Using up to 8 cpu(s) for the stress calculation.
Using up to 8 cpu(s) for the stress calculation.
Using up to 8 cpu(s) for the stress calculation.
Using up to 8 cpu(s) for the stress calculation.
Using up to 8 cpu(s) for the stress calculation.
Using up to 8 cpu(s) for the stress calculation.
Using up to 8 cpu(s) for the stress calculation.
Using up to 8 cpu(s) for the stress calculation.
Using up to 8 cpu(s) for the stress calculation.
Using up to 8 cpu(s) for the stress calculation.
Using up to 8 cpu(s) for the stress calculation.
Using up to 8 cpu(s) for the stress calculation.
Using up to 8 cpu(s) for the stress calculation.
Using up to 8 cpu(s) for the stress calculation.
Using up to 8 cpu(s) for the stress calculation.
Using up to 8 cpu(s) for the stress calculation.
Using up to 8 cpu(s) for the stress calculation.
Using up to 8 cpu(s) for the stress calculation.
Using up to 8 cpu(s) for the stress calculation.
Using up to 8 cpu(s) for the stress calculation.
Using up to 8 cpu(s) for the stress calculation.
Using up to 8 cpu(s) for the stress calculation.
Using up to 8 cpu(s) for the stress calculation.
Using up to 8 cpu(s) for the stress calculation.
Using up to 8 cpu(s) for the stress calculation.
Using up to 8 cpu(s) for the stress calculation.
Using up to 8 cpu(s) for the stress calculation.
Using up to 8 cpu(s) for the stress calculation.
Using up to 8 cpu(s) for the stress calculation.
Using up to 8 cpu(s) for the stress calculation.
Using up to 8 cpu(s) for the stress calculation.
Using up to 8 cpu(s) for the stress calculation.
Job finished
________________________________________
Total CalculiX Time: 2427.739749
________________________________________
Now I am curious what will happen with a job that does not fit in memory. I hope that it is also possible to run PARDISO out-of-core. Do you have any experience with the memory management of PARDISO?
I have not tried the out-of-core feature. I've seen some threads here about it, though.
I think my MPI version splits the memory across the nodes.
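For what it's worth, MKL PARDISO documents an out-of-core mode that is selected via iparm(60) (1 = switch to OOC only if the factors do not fit in memory, 2 = always OOC) and configured through environment variables or a pardiso_ooc.cfg file; whether the CalculiX pardiso interface exposes iparm(60) would have to be checked in pardiso.c. A rough sketch of the environment side, with placeholder values:

export MKL_PARDISO_OOC_PATH=/scratch/ccx_ooc        # path prefix for the temporary factor files
export MKL_PARDISO_OOC_MAX_CORE_SIZE=65536          # in-core memory budget in MB (example value)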
Note that if you build OpenBLAS for a single target, it might not work properly on a machine with another type of CPU. If you use DYNAMIC_ARCH=1, it will build the specialized code for all CPU variants and choose the correct one at runtime. This build takes longer and results in a bigger library, but it is probably the one you want if you intend to distribute the resulting binary.
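For example, the single-target OpenBLAS command from earlier in this thread could be changed to something like this (a sketch; paths and -j are unchanged placeholders, and TARGET can optionally still be given as the minimum baseline):

make FC=ifort INTERFACE64=1 BINARY=64 USE_THREAD=0 USE_LOCKING=0 DYNAMIC_ARCH=1 PREFIX=/home/feacluster/OpenBLAS/build -j 2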
To get an idea of how you created the ccx MPI version, I reran your build script in my VM. Is it enough to adapt the Makefiles of ccx and arpack and then just compile everything? I am asking because your script downloads an archive named pardiso_mpi.tar.gz. The content of this archive looks like a copy of the ccx source files. Did you apply any patches to the ccx sources? Thanks for clarifying!
Yes, it is a copy of all the source files with the patches already applied. Hopefully just editing the Makefile and arpack will be enough…
Ah, OK, so the MPI version is your own development. The compilation of arpack succeeds when I choose option (3) in your script, but the build of ccx 2.18 fails with this message:
mpiicc -std=gnu99 -w -O3 -qopenmp -DARCH="Linux" -DMATRIXSTORAGE -DUSE_MT=1 -DARPACK -DPARDISO_MPI -DMKL_ILP64 -DLONGLONG -c ccx_2.18.c
/opt/intel/oneapi/mpi/2021.12/bin/mpiicx: 1: eval: icc: not found
make: *** [Makefile:10: ccx_2.18.o] Error 127
I tried to fix this by changing the C compiler selection line in the Makefile from:
CC=mpiicc
to:
CC=mpiicx
But the result is then:
mpiicx -std=gnu99 -w -O3 -qopenmp -DARCH="Linux" -DMATRIXSTORAGE -DUSE_MT=1 -DARPACK -DPARDISO_MPI -DMKL_ILP64 -DLONGLONG -c ccx_2.18.c
In file included from ccx_2.18.c:37:
./CalculiX.h:1405:6: error: conflicting types for 'dgesv_'
1405 | void FORTRAN(dgesv,(ITG *nteq,ITG *nhrs,double *ac,ITG *lda,ITG *ipiv,
| ^
./CalculiX.h:26:22: note: expanded from macro 'FORTRAN'
26 | #define FORTRAN(A,B) A##_ B
| ^
<scratch space>:101:1: note: expanded from here
101 | dgesv_
| ^
/opt/intel/oneapi/mkl/2024.1/include/mkl_lapack.h:11922:6: note: previous declaration is here
11922 | void dgesv_( const MKL_INT* n, const MKL_INT* nrhs, double* a,
| ^
In file included from ccx_2.18.c:37:
./CalculiX.h:1408:6: error: conflicting types for 'dgetrs_'
1408 | void FORTRAN(dgetrs,(char *trans,ITG *nteq,ITG *nrhs,double *ac,ITG *lda,
| ^
./CalculiX.h:26:22: note: expanded from macro 'FORTRAN'
26 | #define FORTRAN(A,B) A##_ B
| ^
<scratch space>:102:1: note: expanded from here
102 | dgetrs_
| ^
/opt/intel/oneapi/mkl/2024.1/include/mkl_lapack.h:12028:6: note: previous declaration is here
12028 | void dgetrs_( const char* trans, const MKL_INT* n, const MKL_INT* nrhs,
| ^
ccx_2.18.c:48:3: error: call to undeclared function 'mpi_calculix'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
48 | mpi_calculix();
| ^
ccx_2.18.c:1892:6: error: conflicting types for 'mpi_calculix'
1892 | void mpi_calculix() {
| ^
ccx_2.18.c:48:3: note: previous implicit declaration is here
48 | mpi_calculix();
| ^
ccx_2.18.c:1897:26: error: expected expression
1897 | mpi_stat = MPI_Init( '', 1 );
| ^
5 errors generated.
make: *** [Makefile:10: ccx_2.18.o] Error 1
I guess that something is wrong with the CFLAGS in the first line of the Makefile?
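In case it helps narrow things down: the last errors look like plain C issues that icx (being clang-based) treats as errors rather than warnings. mpi_calculix() is called before any declaration, and MPI_Init is passed '' and 1 instead of pointers; the dgesv_/dgetrs_ conflicts presumably come from CalculiX.h declaring the LAPACK prototypes with ITG while mkl_lapack.h declares them with MKL_INT. A minimal sketch of what the compiler would accept for the MPI part (names taken from the error messages, not from the actual patched source):

#include <mpi.h>

void mpi_calculix(void);   /* forward declaration so the call is no longer implicit */

void mpi_calculix(void) {
    int mpi_stat;
    /* MPI_Init takes int* argc and char*** argv; since MPI-2 both may be NULL */
    mpi_stat = MPI_Init(NULL, NULL);
    (void)mpi_stat;
}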
Try downloading the install script again and running it from a fresh empty directory. I had to make some minor changes to the 2.18 code in order for the 2024 mpiicx and mpiifort to compile it OK.
I tested the latest version of your script, but it stops during the compilation of ccx. The reason is the path to the libarpack_INTEL.a library in the ccx Makefile: you defined an absolute path, and on my system the path to the library is different. I changed the path in the Makefile, but then the compilation stops with this message:
Can't open ccx_2.18step.c: No such file or directory at ./date.pl line 18.
Good catch about the hardcoded path. I think that error can be ignored. What happens when you type make in the src folder? Can you send any output after that ccx_2.18step.c line?
Here is the full output when I run make in the ccx directory:
user@vm06:~/feacluster/CalculiX/ccx_2.18/src$ make
mpiicx -w -D_POSIX_C_SOURCE=199309L -O2 -std=c90 -fopenmp -DARCH="Linux" -DINTSIZE64 -DMKL_ILP64 -DPARDISO_MPI -DLONGLONG -DARPACK -DMATRIXSTORAGE -DUSE_MT=1 -c frd.c
ar vr ccx_2.18_MPI.a frd.o
r - frd.o
./date.pl; mpiicx -w -D_POSIX_C_SOURCE=199309L -O2 -std=c90 -fopenmp -DARCH="Linux" -DINTSIZE64 -DMKL_ILP64 -DPARDISO_MPI -DLONGLONG -DARPACK -DMATRIXSTORAGE -DUSE_MT=1 -c ccx_2.18.c; mpiifort -qopenmp -nofor-main -o ccx_2.18_MPI ccx_2.18.o ccx_2.18_MPI.a /home/user/feacluster/ARPACK/libarpack_INTEL.a -lpthread -lm -L/opt/intel/oneapi/mkl/2024.1/lib/intel64 -lmkl_intel_ilp64 -lmkl_core -lmkl_intel_thread -lmkl_blacs_intelmpi_ilp64
Can't open ccx_2.18step.c: No such file or directory at ./date.pl line 18.
ifort: remark #10448: Intel(R) Fortran Compiler Classic (ifort) is now deprecated and will be discontinued late 2024. Intel recommends that customers transition now to using the LLVM-based Intel(R) Fortran Compiler (ifx) for continued Windows* and Linux* support, new language support, new language features, and optimizations. Use '-diag-disable=10448' to disable this message.
user@vm06:~/feacluster/CalculiX/ccx_2.18/src$
Are you sure it didn't create the executable? After typing make, what is the output of
ls -lrt | tail -5
Oh yes, you were right. I found the executable. Next I will try to create the MPI executable in my Apptainer container. Thanks for your help.
I would first verify it works on your two hosts before trying the container approach. So install the executable on the second host and set up some shared file system between the two hosts. Then try running your large model example.
Because I don't have a local installation of Intel MPI on the hosts, I did the build inside the container. I can now run the MPI version of ccx on a single host using these commands:
ulimit -s unlimited
export OMP_NUM_THREADS=2
mpirun -np 4 /path/to/ccx_2.18_MPI -i ccx_input_file
For the multi-host setup with the Apptainer container, I found some information in the documentation on the Intel website. When I have time, I will try this setup on my two hosts.
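As a rough sketch of what I expect the multi-host launch to look like (hostnames, the image name ccx_mpi.sif, and the process counts are placeholders; the exact Intel MPI/Apptainer integration still has to be verified against the Intel documentation):

ulimit -s unlimited
export OMP_NUM_THREADS=2
# 2 MPI ranks per host on two hosts, each rank started inside the container
mpirun -np 4 -ppn 2 -hosts host1,host2 \
    apptainer exec ccx_mpi.sif /path/to/ccx_2.18_MPI -i ccx_input_file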