Using the feacluster.com ccx install script with Intel HPC Toolkit v2024.1.0

arpack-ng can be built with 64-bit integers (use environment variable INTERFACE64=1 when calling configure).
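For reference, a minimal sketch of that configure route against MKL’s ILP64 layer (the install prefix and the explicit MKL link line are assumptions; arpack-ng may also pick up MKL on its own once the oneAPI environment is sourced):

 source /opt/intel/oneapi/setvars.sh
 # ILP64 (64-bit integer) arpack-ng build via autotools; paths are placeholders
 INTERFACE64=1 CC=icx F77=ifort FC=ifort ./configure --enable-icb \
   --with-blas="-L${MKLROOT}/lib/intel64 -lmkl_intel_ilp64 -lmkl_sequential -lmkl_core -lpthread -lm -ldl" \
   --with-lapack="-L${MKLROOT}/lib/intel64 -lmkl_intel_ilp64 -lmkl_sequential -lmkl_core -lpthread -lm -ldl" \
   --prefix=/opt/arpack-ng-ilp64
 make -j$(nproc) && make install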


@feacluster A static analysis with the same mesh runs without problems.
@rsmith I did a new build of arpack-ng with these cmake options:

cmake -D EXAMPLES=ON -D MPI=OFF -D BUILD_SHARED_LIBS=ON -D BLA_VENDOR=Intel10_64ilp -D INTERFACE64=ON -D ITF64SUFFIX="ILP64" -D CMAKE_INSTALL_PREFIX=/opt/arpack-ng-3.9.1 ..

But this build already crashes at the “Frequency analysis was selected” step.

I was able to get arpack-ng to build along with OpenBLAS using the commands at the bottom.

I tested a large model for frequency analysis. Initially it crashed, but after I set ulimit -s unlimited it ran OK. I also tested the old arpack96, and it likewise ran fine as long as ulimit -s unlimited was set… Are you running a larger model than this:

 number of equations
 1219161
 number of nonzero lower triangular matrix elements
 45468750

OpenBLAS

 make FC=ifort INTERFACE64=1 BINARY=64 USE_THREAD=0 USE_LOCKING=0 PREFIX=/home/feacluster/OpenBLAS/build TARGET=SAPPHIRERAPIDS -j 2

arpack-ng

INTERFACE64="1" CXX=icx CC=icx F77=ifort FC=ifort ./configure --with-blas=/home/feacluster/arpack-ng/libopenblas.a --with-lapack=/home/feacluster/arpack-ng/libopenblas.a --enable-icb --enable-static --disable-shared --prefix=/home/feacluster/arpack-ng/build

@feacluster your suggestion to set “ulimit -s unlimited” was the solution to the problem. My build does not include OpenBLAS because arpack-ng can be compiled against the Intel MKL. Since the Intel MKL is already included in the HPC Toolkit Docker container and arpack-ng detects it correctly, I will stick with this solution. Here is the output of a successful frequency analysis run, which allocated around 90 GB of memory:

************************************************************

CalculiX Version 2.21 i8, Copyright(C) 1998-2023 Guido Dhondt
CalculiX comes with ABSOLUTELY NO WARRANTY. This is free
software, and you are welcome to redistribute it under
certain conditions, see gpl.htm

************************************************************

You are using an executable made on Mon May  6 07:51:57 UTC 2024
 
  The numbers below are estimated upper bounds
 
  number of:
 
   nodes:                1357408
   elements:                1303101
   one-dimensional elements:                      0
   two-dimensional elements:                      0
   integration points per element:                      8
   degrees of freedom per node:                      3
   layers per element:                      1
 
   distributed facial loads:                      0
   distributed volumetric loads:                      0
   concentrated loads:                      0
   single point constraints:                      0
   multiple point constraints:                      1
   terms in all multiple point constraints:                      1
   tie constraints:                      0
   dependent nodes tied by cyclic constraints:                      0
   dependent nodes in pre-tension constraints:                      0
 
   sets:                      2
   terms in all sets:                3909303
 
   materials:                      1
   constants per material and temperature:                      2
   temperature points per material:                      1
   plastic data points per material:                      0
 
   orientations:                      0
   amplitudes:                      1
   data points in all amplitudes:                      1
   print requests:                      0
   transformations:                      0
   property cards:                      0
 
 
 STEP                      1
 
 Frequency analysis was selected
 
 Decascading the MPC's

 Determining the structure of the matrix:
 Using up to 8 cpu(s) for setting up the structure of the matrix.
 number of equations
 4072224
 number of nonzero lower triangular matrix elements
 158473776

 Using up to 0 cpu(s) for setting up the structure of the matrix.
 Using up to 8 cpu(s) for the stress calculation.

 Using up to 8 cpu(s) for the symmetric stiffness/mass contributions.

 Factoring the system of equations using the symmetric pardiso solver
 number of threads = 8

 Calculating the eigenvalues and the eigenmodes

U^T*M*U=1.000000 for eigenmode 1
U^T*M*U=1.000000 for eigenmode 2
U^T*M*U=1.000000 for eigenmode 3
[... identical lines for eigenmodes 4 through 39 ...]
U^T*M*U=1.000000 for eigenmode 40
 Using up to 8 cpu(s) for the stress calculation.

 [... the line above is repeated 40 times in total ...]
 
 Job finished
 
________________________________________

Total CalculiX Time: 2427.739749
________________________________________

Now I am curious what will happen with a job that does not fit in memory :smile:. I hope it is also possible to run PARDISO out-of-core. Do you have any experience with the memory management of PARDISO?
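For what it’s worth, MKL PARDISO’s out-of-core mode is switched on via iparm(60) inside the solver call, so whether a given ccx build uses it depends on how its pardiso.c sets that parameter (not verified here). When OOC is active, MKL reads these documented environment variables; the path and size below are placeholders:

 # hypothetical OOC tuning, only effective if the solver runs with iparm(60) != 0
 export MKL_PARDISO_OOC_PATH=/scratch/pardiso_ooc    # prefix for the out-of-core scratch files
 export MKL_PARDISO_OOC_MAX_CORE_SIZE=65536          # in-core memory limit in MB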


Have not tried the out of core feature. I’ve seen some threads here about it though.

I think my MPI version splits the memory across the nodes.

Note that if you build OpenBLAS for a single target, it might not work properly on a machine with another type of CPU.

If you use DYNAMIC_ARCH=1, it will build the specialized code for all CPU variants, and choose the correct one at runtime. This build will take longer and result in a bigger library but it is probably the one you want to use in case you want to distribute the resulting binary.
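As a concrete example, the OpenBLAS make line from earlier in the thread would then look something like this (same options, with the fixed TARGET replaced by DYNAMIC_ARCH; treat it as a sketch):

 make FC=ifort INTERFACE64=1 BINARY=64 USE_THREAD=0 USE_LOCKING=0 \
   DYNAMIC_ARCH=1 PREFIX=/home/feacluster/OpenBLAS/build -j 2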


To get an idea of how you created the ccx MPI version, I reran your build script in my VM. Is it enough to adapt the Makefiles of ccx and arpack and then just compile everything? I am asking because your script downloads an archive named pardiso_mpi.tar.gz, whose contents look like a copy of the ccx source files. Did you apply any patches to the ccx sources? Thanks for the clarification!

Yes, it is a copy of all the source files with the patches already applied. Hopefully just editing the Makefile and arpack will be enough…

Ah, OK, so the MPI version is your own development. The compilation of arpack succeeds when I choose option (3) in your script, but the build of ccx 2.18 fails with this message:

mpiicc -std=gnu99 -w -O3 -qopenmp -DARCH="Linux" -DMATRIXSTORAGE -DUSE_MT=1 -DARPACK -DPARDISO_MPI -DMKL_ILP64 -DLONGLONG -c ccx_2.18.c
/opt/intel/oneapi/mpi/2021.12/bin/mpiicx: 1: eval: icc: not found
make: *** [Makefile:10: ccx_2.18.o] Error 127

I tried to fix this by changing the C compiler selection line in the Makefile from:

CC=mpiicc

to:

CC=mpiicx

But the result is then:

mpiicx -std=gnu99 -w -O3 -qopenmp -DARCH="Linux" -DMATRIXSTORAGE -DUSE_MT=1 -DARPACK -DPARDISO_MPI -DMKL_ILP64 -DLONGLONG -c ccx_2.18.c
In file included from ccx_2.18.c:37:
./CalculiX.h:1405:6: error: conflicting types for 'dgesv_'
 1405 | void FORTRAN(dgesv,(ITG *nteq,ITG *nhrs,double *ac,ITG *lda,ITG *ipiv,
      |      ^
./CalculiX.h:26:22: note: expanded from macro 'FORTRAN'
   26 | #define FORTRAN(A,B) A##_  B
      |                      ^
<scratch space>:101:1: note: expanded from here
  101 | dgesv_
      | ^
/opt/intel/oneapi/mkl/2024.1/include/mkl_lapack.h:11922:6: note: previous declaration is here
 11922 | void dgesv_( const MKL_INT* n, const MKL_INT* nrhs, double* a,
       |      ^
In file included from ccx_2.18.c:37:
./CalculiX.h:1408:6: error: conflicting types for 'dgetrs_'
 1408 | void FORTRAN(dgetrs,(char *trans,ITG *nteq,ITG *nrhs,double *ac,ITG *lda,
      |      ^
./CalculiX.h:26:22: note: expanded from macro 'FORTRAN'
   26 | #define FORTRAN(A,B) A##_  B
      |                      ^
<scratch space>:102:1: note: expanded from here
  102 | dgetrs_
      | ^
/opt/intel/oneapi/mkl/2024.1/include/mkl_lapack.h:12028:6: note: previous declaration is here
 12028 | void dgetrs_( const char* trans, const MKL_INT* n, const MKL_INT* nrhs,
       |      ^
ccx_2.18.c:48:3: error: call to undeclared function 'mpi_calculix'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
   48 |   mpi_calculix();
      |   ^
ccx_2.18.c:1892:6: error: conflicting types for 'mpi_calculix'
 1892 | void mpi_calculix() {
      |      ^
ccx_2.18.c:48:3: note: previous implicit declaration is here
   48 |   mpi_calculix();
      |   ^
ccx_2.18.c:1897:26: error: expected expression
 1897 |     mpi_stat = MPI_Init( '', 1 );
      |                          ^
5 errors generated.
make: *** [Makefile:10: ccx_2.18.o] Error 1

I guess that something is wrong with the CFLAGS in the first line of the Makefile?


Try downloading the install script again and running it from a fresh, empty directory. I had to make some minor changes to the 2.18 code for the 2024 mpiicx and mpiifort to compile it OK.


I tested the latest version of your script, but it stops during the compilation of ccx. The reason is the hardcoded path to the libarpack_INTEL.a library in the ccx Makefile: you defined an absolute path, and on my system the path to the library is different. I changed the path in the Makefile, but then the compilation stops with this message:

Can't open ccx_2.18step.c: No such file or directory at ./date.pl line 18.

Good catch about the hardcoded path. I think that error can be ignored. What happens when you type make in the src folder? Can you send any output after that ccx_2.18step.c line?


Here is the full output when I run make in the ccx directory:

user@vm06:~/feacluster/CalculiX/ccx_2.18/src$ make
mpiicx -w -D_POSIX_C_SOURCE=199309L -O2 -std=c90 -fopenmp -DARCH="Linux" -DINTSIZE64 -DMKL_ILP64 -DPARDISO_MPI -DLONGLONG -DARPACK -DMATRIXSTORAGE -DUSE_MT=1 -c frd.c
ar vr ccx_2.18_MPI.a frd.o
r - frd.o
./date.pl; mpiicx -w -D_POSIX_C_SOURCE=199309L -O2 -std=c90 -fopenmp -DARCH="Linux" -DINTSIZE64 -DMKL_ILP64 -DPARDISO_MPI -DLONGLONG -DARPACK -DMATRIXSTORAGE -DUSE_MT=1 -c ccx_2.18.c; mpiifort -qopenmp -nofor-main -o ccx_2.18_MPI ccx_2.18.o ccx_2.18_MPI.a /home/user/feacluster/ARPACK/libarpack_INTEL.a -lpthread -lm -L/opt/intel/oneapi/mkl/2024.1/lib/intel64 -lmkl_intel_ilp64 -lmkl_core -lmkl_intel_thread -lmkl_blacs_intelmpi_ilp64
Can't open ccx_2.18step.c: No such file or directory at ./date.pl line 18.
ifort: remark #10448: Intel(R) Fortran Compiler Classic (ifort) is now deprecated and will be discontinued late 2024. Intel recommends that customers transition now to using the LLVM-based Intel(R) Fortran Compiler (ifx) for continued Windows* and Linux* support, new language support, new language features, and optimizations. Use '-diag-disable=10448' to disable this message.
user@vm06:~/feacluster/CalculiX/ccx_2.18/src$

Are you sure it didn’t create the executable? After typing make, what is the output of

ls -lrt | tail -5

Oh yes, you were right, I have found the executable. Next I will try to create the MPI executable in my Apptainer container. Thanks for your help.

I would first verify that it works on your two hosts before trying the container approach. So install the executable on the second host and set up a shared file system between the two hosts. Then try running your large model example.
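The shared file system can be something as simple as an NFS export; a minimal sketch, with hypothetical host names and paths, assuming a Debian/Ubuntu-style nfs-kernel-server on the first host:

 # on host1: export a common work directory
 sudo apt install nfs-kernel-server
 echo "/shared host2(rw,sync,no_subtree_check)" | sudo tee -a /etc/exports
 sudo exportfs -ra
 # on host2: mount it under the same path so both hosts see identical file names
 sudo mkdir -p /shared
 sudo mount -t nfs host1:/shared /shared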

Because I don’t have a local installation of Intel MPI on the hosts, I have done the build inside the container. I can now run the MPI version of ccx on a single host using these commands:

ulimit -s unlimited
export OMP_NUM_THREADS=2
mpirun -np 4 /path/to/ccx_2.18_MPI -i ccx_input_file

For the multi-host setup with the Apptainer container I have found some information in the documentation on the Intel website. When I have time, I will try this setup on my two hosts.
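For reference, a multi-host launch might look roughly like the sketch below, assuming an Intel MPI launcher is reachable on both hosts, the image and model sit on the shared file system, and host1/host2 plus all paths are placeholders:

 ulimit -s unlimited
 export OMP_NUM_THREADS=2
 # 4 ranks per host across two hosts; apptainer starts the containerized ccx for each rank
 mpirun -np 8 -ppn 4 -hosts host1,host2 \
   apptainer exec /shared/ccx_mpi.sif /path/to/ccx_2.18_MPI -i ccx_input_file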
