Using the feacluster.com ccx install script with Intel HPC Toolkit v2024.1.0

Hello Calculix Community,

I tried to run the CCX Perl install script from https://www.feacluster.com/install/install to compile ccx 2.21 with the PARDISO solver. Unfortunately, the script stops because it cannot find the icc compiler, which was replaced by icx in the latest version of the Intel HPC Toolkit (v2024.1.0). The same happens with the ccx v2.18 MPI build option.

Was anyone successful in compiling ccx 2.21 with PARDISO and MPI using the Intel HPC Toolkit v2024.1.0? @feacluster, do you have any plans to update your fantastic build script, or can you give some tips on what needs to be changed to switch from icc to icx?
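
My guess is that it is mainly the compiler variables in the Makefiles the script generates that need updating, roughly like this (untested, and I do not know which files the script actually patches):

    # in the CalculiX Makefile (and similarly in ARPACK's ARmake.inc)
    CC = icx        # was: icc
    FC = ifx        # was: ifort; ifx is the new Fortran compiler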

Thanks for helping and best regards

Benny

Unfortunately, I haven’t done anything with that project in years as there has not been much interest. Do you need the MPI version of Pardiso or just the regular Pardiso that runs on a single machine (with many cores)?

Since I have two identical machines (both running Fedora 39), the MPI version is also interesting. But as a start, an up-to-date build for running on a single machine would probably be easier to create?

I spent some time and tried to update the script for the 2024 Intel compiler. The script now works, but CalculiX will hang if you run any frequency/modal calculations. There is some problem in the ARPACK library even though I compiled it with the same flags… Will need some deeper investigation to find the root cause.


Thanks for working on this topic. In case you need someone to test your new script, please tell me. I have a virtual machine running Ubuntu 22.04 with the HPC Toolkit available.

It could also be your BLAS library, since ARPACK uses that.

Recently I had quite some trouble getting PaStiX to work without CUDA. The problem in that case was that I had to build OpenBLAS without threading but with locking.

However, when I tried the same OpenBLAS configuration for multithreaded ccx with just SPOOLES/ARPACK I got segfaults. There I had to build without threads and without locking, if memory serves.
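
For reference, the OpenBLAS switches I mean are the USE_THREAD and USE_LOCKING make variables; roughly like this (a sketch from memory, the install prefixes are just placeholders):

    # single-threaded BLAS that is still safe to call from threaded code (what PaStiX needed)
    make USE_THREAD=0 USE_LOCKING=1
    make PREFIX=/opt/openblas install

    # plain single-threaded build without locking (what worked for SPOOLES/ARPACK)
    make clean
    make USE_THREAD=0
    make PREFIX=/opt/openblas-st install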


Give it a try now. I reverted to using ifort instead of ifx. It seems ifort is still supported with the 2024 Intel compiler. The MPI one won’t work, but let’s see if the normal Pardiso one installs and runs ok…

The updated script is working. I was able to compile the ccx_2.21_MT executable (ccx without MPI). But when I run the tests at the end of the script, ccx hangs at the test case beam10psmooth.rfn, and some of the tests do not deliver the same results as the reference values. In addition, when I copy the executable from the virtual machine to the host, I am not able to run a job since libmkl_intel_ilp64.so.2 is missing. Since I don’t have the HPC Toolkit on the host, this error seems expected. To solve this problem, is it possible to compile ccx_2.21 with static libs like your ccx_2.19 build?
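
For completeness, the missing dependency shows up when I check the binary on the host with a plain ldd (output abbreviated):

    ldd ccx_2.21_MT | grep "not found"
    #   libmkl_intel_ilp64.so.2 => not found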

Regarding the static build, it might be easier to just copy the /opt/intel/oneapi folder to the other host?
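
After copying, sourcing the oneAPI environment script (or pointing LD_LIBRARY_PATH at the MKL libraries) should make the shared libraries visible again; roughly like this (exact paths depend on the toolkit version):

    # on the host that received the copy of /opt/intel/oneapi
    source /opt/intel/oneapi/setvars.sh
    # or, more minimal:
    export LD_LIBRARY_PATH=/opt/intel/oneapi/mkl/latest/lib/intel64:$LD_LIBRARY_PATH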

In the compare script in the test folder, try adding this check at the top where you see the other ones defined for the .rfn files:

        if [ $i = beam10psmooth.rfn.inp ]
        then
            continue
        fi

I don’t believe the rfn files are meant to be run by themselves.

There will probably be a dozen test cases with different results. See this post for why that happens:

Where are you getting your ARPACK and BLAS from? I am using the one from 1996 called “arpack96.tar.gz”… It includes BLAS when you untar it.

I’ve been using arpack-ng and OpenBLAS. OpenBLAS supplies both BLAS and LAPACK(E).

My build scripts (for UNIX-like systems) can be found here.
I should add that I’m using the FreeBSD port patches for SPOOLES, which include the “large input” patch.
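
In case it helps, the arpack-ng part is a standard CMake configure against OpenBLAS; roughly like this (a sketch, not my actual script; the prefixes are placeholders):

    # arpack-ng linked against a previously installed OpenBLAS
    cmake -S . -B build \
          -DCMAKE_INSTALL_PREFIX=/opt/arpack-ng \
          -DBLAS_LIBRARIES=/opt/openblas/lib/libopenblas.a \
          -DLAPACK_LIBRARIES=/opt/openblas/lib/libopenblas.a \
          -DEXAMPLES=OFF -DMPI=OFF
    cmake --build build
    cmake --install build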


Yes, I can copy the complete folder, it is 14 GB :grin:. Your install script only compiles ARPACK and CalculiX? SPOOLES is not needed because it is replaced by the Pardiso solver? So I would guess that all the information to do the build with the Intel compilers can be found in the file “Makefile” for CalculiX and “ARmake.inc” for ARPACK?

Yes, that is correct. Here is the Makefile I used to build the 2.19 version with static libraries:

# for libiomp, first link dynamically then do ldd to find path to it. Static will be in the same folder

#CFLAGS = -march=cascadelake -std=gnu99 -w -O2 -std=gnu99 -fopenmp -DARCH="Linux" -DMKL_ILP64 -I"${MKLROOT}/include" -DPARDISO -DLONGLONG -DARPACK -DMATRIXSTORAGE -DUSE_MT=1
#FFLAGS = -march=cascadelake -w -O2 -fopenmp -i8
CFLAGS = -w -O2 -std=gnu99 -fopenmp -DARCH="Linux" -DMKL_ILP64 -DPARDISO -DLONGLONG -DARPACK -DMATRIXSTORAGE -DUSE_MT=1
FFLAGS = -w -O2 -fopenmp -i8

CC=icc
FC=ifort

.c.o :
        $(CC) $(CFLAGS) -c $<
.f.o :
        $(FC) $(FFLAGS) -c $<

include Makefile.inc

SCCXMAIN = ccx_2.19.c

OCCXF = $(SCCXF:.f=.o)
OCCXC = $(SCCXC:.c=.o)
OCCXMAIN = $(SCCXMAIN:.c=.o)

LIBS = \
       ../../../ARPACK/libarpack_INTEL.a \
       -lpthread -lm

ccx_2.19_MT: $(OCCXMAIN) ccx_2.19_MT.a  $(LIBS)
        ./date.pl; $(CC) $(CFLAGS) -c ccx_2.19.c; $(FC) -nofor-main -o $@ $(OCCXMAIN) ccx_2.19_MT.a $(LIBS) \
        -Wl,--start-group ${MKLROOT}/lib/intel64/libmkl_intel_ilp64.a \
        ${MKLROOT}/lib/intel64/libmkl_intel_thread.a ${MKLROOT}/lib/intel64/libmkl_core.a \
        /glob/development-tools/versions/oneapi/2022.1.1/oneapi/compiler/2022.0.1/linux/compiler/lib/intel64_lin/libiomp5.a \
        -Wl,--end-group -lpthread -lm -ldl
ccx_2.19_MT.a: $(OCCXF) $(OCCXC)
        ar vr $@ $?

Interesting, will look into these later when I get some time. But out of curiosity, why are you trying to use PaStiX when Pardiso has been free for many years now? I tried using it on Linux but found it slow, as it factorizes the matrix twice for some reason. That was some years ago. Maybe it has changed by now?

Intel Pardiso is available only for MS Windows and Linux, not FreeBSD.

And Panua Pardiso requires you to log in and get a license. And it says: “Panua-Pardiso licenses are available for Linux, Windows, and Mac”.

You mean like this:

  Factorization step:
    Factorization used: LU
    Time to initialize internal csc:      0.0840 
    Time to initialize coeftab:           0.0189 
    Time to factorize:                    0.0916  (134.68 GFlop/s)
    Number of operations:                      12.34 GFlops
    Number of static pivots:                     0
    Time to solve:                        0.0120 
    - iteration 1 :
         total iteration time                   0.0147 
         error                                  3.9474e-07
    - iteration 2 :
         total iteration time                   0.0148 
         error                                  6.5724e-10
    - iteration 3 :
         total iteration time                   0.0149 
         error                                  7.3421e-13
    Time for refinement:                  0.0503 
    - iteration 1 :
         total iteration time                   0.0148 
         error                                  2.0178e-16
    Time for refinement:                  0.0207 

I agree that it looks like something is done twice. But that seems to apply to “refinement”, because it happens after “solve”.

CalculiX uses a patched version of PaStiX that’s from 2017/2018 IIRC, so it won’t have changed much. People have been looking at using the unchanged upstream version of PaStiX with CalculiX, but that is a work in progress.

But now I’m curious. :grin: When I can find the time I’ll have a look at the source code to see if I can understand what’s going on.


Ah ok. I just assumed Intel oneAPI works on all Linux flavors. Maybe this can help with FreeBSD:

https://briancallahan.net/blog/


Yesterday I created an Apptainer container based on the official HPC Toolkit Docker files (a minimal definition sketch is at the end of this post). This makes everything more portable and repeatable. I used the Makefiles for CalculiX and ARPACK which I found in the temporary directory of your build script. The compilation in this container succeeds, but when I run an analysis using more than one CPU, the run hangs (or crashes):

************************************************************

CalculiX Version 2.21 i8, Copyright(C) 1998-2023 Guido Dhondt
CalculiX comes with ABSOLUTELY NO WARRANTY. This is free
software, and you are welcome to redistribute it under
certain conditions, see gpl.htm

************************************************************

You are using an executable made on Wed May  1 19:03:38 UTC 2024
 
  The numbers below are estimated upper bounds
 
  number of:
 
   nodes:                   2638
   elements:                   2005
   one-dimensional elements:                      0
   two-dimensional elements:                      0
   integration points per element:                      4
   degrees of freedom per node:                      3
   layers per element:                      1
 
   distributed facial loads:                      0
   distributed volumetric loads:                      0
   concentrated loads:                      1
   single point constraints:                    926
   multiple point constraints:                      4
   terms in all multiple point constraints:                   1114
   tie constraints:                      0
   dependent nodes tied by cyclic constraints:                      0
   dependent nodes in pre-tension constraints:                      0
 
   sets:                     23
   terms in all sets:                   4357
 
   materials:                      1
   constants per material and temperature:                      2
   temperature points per material:                      1
   plastic data points per material:                      0
 
   orientations:                      0
   amplitudes:                      6
   data points in all amplitudes:                      6
   print requests:                      0
   transformations:                      0
   property cards:                      0
 
 
 STEP                      1
 
 Static analysis was selected
 
 Decascading the MPC's

 Determining the structure of the matrix:
 Using up to 8 cpu(s) for setting up the structure of the matrix.
Segmentation fault (core dumped)

Here is my Makefile for ccx:

CFLAGS = -w -D_POSIX_C_SOURCE=199309L -O2 -std=c90 -fiopenmp -DARCH="Linux" -DINTSIZE64 -DMKL_ILP64 -DPARDISO -DLONGLONG -DARPACK -DMATRIXSTORAGE -DUSE_MT=1
FFLAGS = -w -O2 -fiopenmp -i8

CC=icx
FC=ifort

.c.o :
	$(CC) $(CFLAGS) -c $<
.f.o :
	$(FC) $(FFLAGS) -c $<

include Makefile.inc

SCCXMAIN = ccx_2.21.c

OCCXF = $(SCCXF:.f=.o)
OCCXC = $(SCCXC:.c=.o)
OCCXMAIN = $(SCCXMAIN:.c=.o)


LIBS = \
	/src/arpack-v9.6/libarpack_INTEL.a \
       -lpthread -lm

ccx_2.21_MT: $(OCCXMAIN) ccx_2.21_MT.a  $(LIBS)
	./date.pl; $(CC) $(CFLAGS) -c ccx_2.21.c; $(FC) -qopenmp -nofor-main -o $@ $(OCCXMAIN) ccx_2.21_MT.a $(LIBS) -L${MKLROOT}/lib/intel64 -lmkl_intel_ilp64 -lmkl_core -lmkl_intel_thread

ccx_2.21_MT.a: $(OCCXF) $(OCCXC)
	ar vr $@ $?

and my ARmake.inc for ARPACK:

###########################################################################
#
#  Program:         ARPACK
#
#  Module:          ARmake.inc
#
#  Purpose:         Top-level Definitions
#
#  Creation date:   February 22, 1996
#
#  Modified:
#
#  Send bug reports, comments or suggestions to arpack@caam.rice.edu
#
############################################################################
#
# %---------------------------------%
# |  SECTION 1: PATHS AND LIBRARIES |
# %---------------------------------%
#
#
# %--------------------------------------%
# | You should change the definition of  |
# | home if ARPACK is built some place   | 
# | other than your home directory.      |
# %--------------------------------------%
#
home = /src/arpack-v9.6
#
#  %--------------------------------------%
#  | The platform identifier to suffix to |
#  | the end of library names             |
#  %--------------------------------------%
#
PLAT = INTEL
#
#  %------------------------------------------------------%
#  | The directories to find the various pieces of ARPACK |
#  %------------------------------------------------------%
#
BLASdir      = $(home)/BLAS
LAPACKdir    = $(home)/LAPACK
UTILdir      = $(home)/UTIL
SRCdir       = $(home)/SRC
#
DIRS        = $(BLASdir) $(LAPACKdir) $(UTILdir) $(SRCdir)
#
# %-------------------------------------------------------------------%
# | Comment out the previous line and uncomment the following         |
# | if you already have the BLAS and LAPACK installed on your system. |
# | NOTE: ARPACK assumes the use of LAPACK version 2 codes.           |
# %-------------------------------------------------------------------%
#
#DIRS         = $(UTILdir) $(SRCdir)
#
# %---------------------------------------------------%
# | The name of the libraries to be created/linked to |
# %---------------------------------------------------%
#
ARPACKLIB  = $(home)/libarpack_$(PLAT).a
LAPACKLIB = 
BLASLIB = 
#
ALIBS =  $(ARPACKLIB) $(LAPACKLIB) $(BLASLIB) 
#
# 
# %---------------------------------------------------------%
# |                  SECTION 2: COMPILERS                   |
# |                                                         |
# | The following macros specify compilers, linker/loaders, |
# | the archiver, and their options.  You need to make sure |
# | these are correct for your system.                      |
# %---------------------------------------------------------%
#
#
# %------------------------------%
# | Make our own suffixes' list. |
# %------------------------------%
#
.SUFFIXES:
.SUFFIXES:	.f	.o
#
# %------------------%
# | Default command. |
# %------------------%
#
.DEFAULT:
	@$(ECHO) "Unknown target $@, try:  make help"
#
# %-------------------------------------------%
# |  Command to build .o files from .f files. |
# %-------------------------------------------%
#
.f.o:
	@$(ECHO) Making $@ from $<
	@$(FC) -c $(FFLAGS) $<
#
# %-----------------------------------------%
# | Various compilation programs and flags. |
# | You need to make sure these are correct |
# | for your system.                        |
# %-----------------------------------------%
#
FC      = ifort
FFLAGS	= -O -i8

LDFLAGS = 
CD      = cd

ECHO    = echo

LN      = ln
LNFLAGS = -s

MAKE    = /usr/bin/make

RM      = rm
RMFLAGS = -f

SHELL   = /bin/sh
#
#  %----------------------------------------------------------------%
#  | The archiver and the flag(s) to use when building an archive   |
#  | (library).  Also the ranlib routine.  If your system has no    |
#  | ranlib, set RANLIB = touch.                                    |
#  %----------------------------------------------------------------%
#
AR = ar 
ARFLAGS = rv
#RANLIB  = touch
RANLIB   = ranlib
#
# %----------------------------------%
# | This is the general help target. |
# %----------------------------------%
#
help:
	@$(ECHO) "usage: make ?"

and the changed second.f file in the UTIL directory of ARPACK:

      SUBROUTINE SECOND( T )
*
      REAL       T
*
*  -- LAPACK auxiliary routine (preliminary version) --
*     Univ. of Tennessee, Univ. of California Berkeley, NAG Ltd.,
*     Courant Institute, Argonne National Lab, and Rice University
*     July 26, 1991
*
*  Purpose
*  =======
*
*  SECOND returns the user time for a process in seconds.
*  This version gets the time from the system function ETIME.
*
*     .. Local Scalars ..
      REAL               T1
*     ..
*     .. Local Arrays ..
      REAL               TARRAY( 2 )
*     ..
*     .. External Functions ..
      REAL               ETIME
*      EXTERNAL           ETIME
*     ..
*     .. Executable Statements ..
*

      T1 = ETIME( TARRAY )
      T  = TARRAY( 1 )

      RETURN
*
*     End of SECOND
*
      END

By the way, the executable which I created with your build script inside my Ubuntu VM shows exactly the same problem. @feacluster, do you have the same issue?
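
For reference, the container definition is roughly along these lines (a trimmed sketch; the intel/oneapi-hpckit base image and the /src paths are assumptions here, and the actual build steps are just the Makefiles shown above):

    Bootstrap: docker
    From: intel/oneapi-hpckit:latest

    %post
        # oneAPI compilers and MKL come from the base image
        # (belt and braces; the image may already export the environment)
        . /opt/intel/oneapi/setvars.sh
        # sources are assumed to be staged into /src beforehand (e.g. via %files)
        cd /src/arpack-v9.6 && make lib
        cd /src/CalculiX/ccx_2.21/src && make

Built with something like "apptainer build ccx.sif ccx.def".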

Just noticed the same issue as well. But it went away when I changed the two occurrences of -fiopenmp to -fopenmp in the Makefile and recompiled.
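
In other words, the two flag lines in the Makefile above become:

    CFLAGS = -w -D_POSIX_C_SOURCE=199309L -O2 -std=c90 -fopenmp -DARCH="Linux" -DINTSIZE64 -DMKL_ILP64 -DPARDISO -DLONGLONG -DARPACK -DMATRIXSTORAGE -DUSE_MT=1
    FFLAGS = -w -O2 -fopenmp -i8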
