CalculiX and PaStiX solver Windows version

Please test this beta version of ccx 2.18 i8:

password: test

1 Like

Thank you. A test is in progress on a 1871515-node problem with PaStiX. It is running quite a bit slower than Pardiso (but running), apparently due to much heavier use of the page file than Pardiso. Later I will try a 1248281-node file, which should mostly fit in core. I will also increase the CPU count, as the different methodology might work better with more. Previous Pardiso runs did not benefit much from more than 4 cores.

1 Like

Try with set PASTIX_MIXED_PRECISION=1; in my case it reduced the RAM used from 31 GB to 20 GB and sped up the process.
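For reference, a minimal Windows cmd sketch of using that setting ahead of a run (the executable name is only an example; use whatever your ccx build is called):

:: enable PaStiX mixed precision for this console session, then solve jobname.inp
set PASTIX_MIXED_PRECISION=1
ccx_static.exe jobname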

I have been using that setting. I think it is just not quite as memory efficient as Pardiso. I am currently running a problem with 1248551 nodes, upped my CPUs to 6, and it is progressing well. What I don't know is at what problem sizes it is faster than Pardiso. I agree the mixed precision helps a lot, but I have used it since I started using PaStiX. For the smaller problems it worked on, it took about 2/3 the running time of Pardiso.

My system is an AMD 3700X with 64 GB of memory and an NVMe PCIe 4.0 SSD for the page file.

What I have so far:

CCX_2.18_Static Beta.exe Pastix

1871515 nodes - 4 CPUs - ~7:30:00 - 161 GB committed
1248281 nodes - 4 CPUs -  1:23:13 - 115 GB committed
1248281 nodes - 6 CPUs -  1:34:26 - 119 GB committed

I will try a 489035 node problem in the morning and then run the Pardiso solver in your beta.exe with these same problems. I may compare with a CCX.exe I compiled with the latest MKL extensions.

Note the physical RAM used by CalculiX has been running at 61 GB, plus whatever is paged to the SSD.

Michael L. McMullen PE

720-318-8158

Michael_McMullen@Mindspring.com

1 Like

Well, I ran a bunch of similar problems with Mecway 14 and CalculiX. My system is a Ryzen 3700X with 64 GB of memory and a first-generation PCIe 4.0 NVMe SSD holding the page file, at stock speeds.

 489035 nodes - 6 CPUs -  0:07:32 -  39 GB committed - CCX 2.17 Pastix (8/31/21)
 489035 nodes - 6 CPUs -  0:07:52 -  45 GB committed - CCX 2.18 Static Beta Pastix i8
 489035 nodes - 6 CPUs -  0:07:55 -  41 GB committed - CCX 2.18 Dynamic Pastix
 489035 nodes - 6 CPUs -  0:11:08 -  31 GB committed - CCX 2.18 Dynamic Pardiso
 489035 nodes - 6 CPUs -  0:12:42 -  29 GB committed - CCX 2.17 Pardiso (7/03/21)
 822105 nodes - 4 CPUs -  0:08:51 -  46 GB committed - CCX 2.18 Static Beta Pastix i8
 822105 nodes - 4 CPUs -  0:10:56 -  34 GB committed - CCX 2.18 Dynamic Pardiso
1051596 nodes - 4 CPUs -  0:24:36 -  71 GB committed - CCX 2.18 Static Beta Pastix i8
1051596 nodes - 4 CPUs -  0:39:12 -  57 GB committed - CCX 2.18 Dynamic Pardiso
1248281 nodes - 6 CPUs -  0:45:39 -  72 GB committed - CCX 2.18 Dynamic Pardiso
1248281 nodes - 4 CPUs -  1:23:13 - 115 GB committed - CCX 2.18 Static Beta Pastix i8
1248281 nodes - 6 CPUs -  1:34:26 - 119 GB committed - CCX 2.18 Static Beta Pastix i8
1871515 nodes - 6 CPUs -  2:15:42 - 108 GB committed - CCX 2.18 Dynamic Pardiso
1871515 nodes - 4 CPUs -  2:21:28 - 103 GB committed - CCX 2.18 Dynamic Pardiso
1871515 nodes - 4 CPUs - ~7:30:00 - 161 GB committed - CCX 2.18 Static Beta Pastix i8

Observations:

PaStiX uses at least 35% more memory than Pardiso.

6 CPUs use a few more GB of memory than 4 CPUs.

Total problem time increases rapidly with the memory required once paging becomes significant.

PaStiX is faster than Pardiso for smaller problems, but the size where they cross probably depends on keeping the problem mostly in core; on my system, at perhaps around 1,100,000 nodes, the PaStiX and Pardiso solvers are about the same speed.

Compiling with 8-byte integers eliminates the program stop with PaStiX that occurred at about 500,000 nodes.

The SSD has used 1% of its life after 8900 hours (a year) of on time.

These problems are nearly identical but differ a bit in the strength of the material, so the number of iterations varies a little between problem sizes. For my system, 8 CPUs is always slower than 4 or 6; 6 is usually slightly faster than 4.

Problem time started to increase rapidly once committed memory exceeded about 166% of physical RAM (see the monitoring snippet after this list).

The last problem, which ran over 7.5 hours, was probably on the borderline of thrashing.
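For anyone wanting to reproduce the "GB committed" numbers in these posts, a minimal sketch using the built-in Windows typeperf counters (the 5-second interval and the log file name are arbitrary choices):

:: log the system-wide commit charge every 5 seconds while a job runs
typeperf "\Memory\Committed Bytes" "\Memory\% Committed Bytes In Use" -si 5 -o commit_log.csv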

Michael L. McMullen PE

720-318-8158

Michael_McMullen@Mindspring.com

3 Likes

Strange that 8 CPUs is slower than 4 on your machine. Perhaps hyperthreading is enabled and the machine only has four real CPUs?

Hyperthreading is enabled, and the machine has 8 real CPUs (16 virtual). I think the recommendation to turn hyperthreading off may not be relevant to my machine (AMD Ryzen 3700X), though I have not tested it. Memory bandwidth may be the restraining factor, as the mid-level, reasonably priced AMDs are only 2-channel. Automatic core load leveling may be a factor too. I could crank up the memory speed (I'm running 2133 MHz nominal on memory rated at 3600), but memory cooling is only so-so when all slots are full, and there is no ECC.

"8 cores and no hyperthreading" is probably true for the high-end Intel machines, which have larger registers, more cache, and better memory bandwidth, but possibly not for newer ones, as a lot of users on the forum also find an optimum of about 6 cores. The new Ryzen 5700X and up might behave closer to the Intels. Also, the compiles are probably not optimized for AMD, whose main current virtue is more cores per dollar; MKL certainly is not.

Basically, I have a mid to upper-mid range home tower with a bit more memory and SSD than typical, and I squeeze out what I can. Other programs (video editing, n-body problems) that parallelize better use all the cores effectively, and even the GPU.

One nice thing about using only 4 or 6 cores is that, if the memory demand is OK, I can run two or more problems or other heavy-duty applications simultaneously, and the cores can run at turbo all day.

An interesting thing with CalculiX is that assigning more cores uses them fully in the highly parallel parts of the program, yet the overall solve time is no faster. Newer versions might be better.
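For anyone repeating these core-count comparisons, a minimal sketch of pinning the count per run, assuming this build honors the usual CalculiX environment variables (OMP_NUM_THREADS for the OpenMP parts, CCX_NPROC_EQUATION_SOLVER for the solver):

:: set thread counts for this console session, then solve jobname.inp (Windows cmd)
set OMP_NUM_THREADS=6
set CCX_NPROC_EQUATION_SOLVER=6
ccx_dynamic.exe jobname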

My previous computer (2010 vintage) did have 4 cores plus hyperthreading and worked reasonably well, but it took damage from a power surge in 2016: the hard drive was dead, the UPS was fried, and some motherboard functions no longer worked or were erratic. It was also time to upgrade the OS, so I upgraded (a user build) with an eye to doing more FEM, but also to keeping the cost reasonable (~$1000).

Also strange that 5 cores is slower than either 4 or 6. Note that when running 4 cores, the machine runs between 25% and 38% CPU load.

And more than 4 cores may actually be in use: executive and I/O work other than the solver is going on on other cores, and perhaps on hyperthreads.

Michael L. McMullen PE

720-318-8158

Michael_McMullen@Mindspring.com

2 Likes

Note: I tried to run my memory at its rated 3600 MHz after this post, but the machine didn't boot. Thinking about this more, cache speed and size are probably also an issue.

New: 2.19 + exodus (source code from: GitHub - gustafson/CalculiX at exodus)

This version needs the Intel MKL library.

PS. New info (example):

 Data writen to the .exo file
	- Number of nodes: 20000
	- Number of elements: 9801
	- Number of element blocks: 17
	- Number of node sets: 4
	- Number of element sets: 0
	- Element, and surface sets not written to .exo.
	  Affected sets are:
		- EALL
	- Inactive nodes (unused or due to shell and beam expansion):
    - Writing .exo file time period 1 at time=1.000000
2 Likes

New: 2.19 + exodus (fixes a bug with 2D elements):

(Screenshots: the old version rendered the 2D elements as points only; the new version renders them correctly, with further examples.)

3 Likes

@rafal.brzegowy it looks like there is a missing library: mkl_rt.1.dll?

Like before:

This version of ccx is based on Intel PARDISO and therefore requires the MKL .dll files, which the user must provide themselves (it is the equivalent of ccx_dynamic).
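If the Intel oneAPI MKL redistributables are installed, one way to make those .dlls visible is to put the redist directory on PATH before launching; the path below is the default oneAPI install location on Windows, so adjust it to your setup:

:: make the MKL runtime DLLs (mkl_rt and friends) findable by ccx_dynamic
set "PATH=C:\Program Files (x86)\Intel\oneAPI\mkl\latest\redist\intel64;%PATH%"
ccx_dynamic.exe jobname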

1 Like

Hello @rafal.brzegowy,

Can CalculiX be built to use PaStiX out of the box from the MSYS2 distribution? If not, what modifications are required? I am testing a few UMATs, which requires me to build the executable on Windows, but I have never been able to get the PaStiX solver to build. Any hints you can share will be greatly appreciated.

Many thanks,
JW

1 Like

Hi,

You don't need to compile ccx to use a new UMAT; you can use the ABAQUS interface for that. All you need to do is build a .dll library from the UMAT file and have ccx load it.

Short guide:

a. Compile your UMAT, e.g. xyz.f, in mingw64:

gfortran -c xyz.f

gfortran -shared -o libxyz.dll xyz.o

b. Take the resulting file (xyz.dll) and put it where ccx can load it.

c. In the .inp file, refer to this library via the material name:

*MATERIAL, NAME=@ABAQUS_XYZ
*USER MATERIAL, CONSTANTS=6
210000.,.3,1.,1.,1.,1.
*SOLID SECTION, ELSET=Eall, MATERIAL=@ABAQUS_XYZ

@ABAQUS for linear materials

@ABAQUSNL for nonlinear materials

PS. You can also load an MFront material.

1 Like

Hello @rafal.brzegowy,

Many thanks for the detailed explanation. Unfortunately, I tried building a shared library, but for some strange reason the material breaks when it hits a Fortran STOP inside the DO loops. It also bails out when it hits any WRITE in the UMAT code. Through partial compilation, I found the exact points where ccx would just exit.
I also tried MFront, but it refused to load the library, my only working option being to compile my own version of ccx.
So that brings me back to trying to build with PaStiX on my machine. I would appreciate any help you can give me on getting it up and running.

Many thanks,
JW

See this (Files changed):

Many thanks @rafal.brzegowy. I will have a go this weekend and let you know how I get on. Do I get the new package using pacman -S mingw-w64-pastix4calculix?

Best wishes,
JW

If the compilation is successful, you will have to install this package locally, e.g.:
pacman -U package
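For completeness, a sketch of the whole MSYS2 round trip, assuming a PKGBUILD built from the linked changes and the standard makepkg tooling (the package filename is illustrative; use whatever makepkg actually produces):

# in an MSYS2 shell, from the directory containing the PKGBUILD
pacman -S --needed base-devel mingw-w64-x86_64-toolchain
makepkg-mingw -sLf
pacman -U mingw-w64-x86_64-pastix4calculix-*.pkg.tar.zst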

Good news: pkg-config --static (mingw/msys2) (#57) · Issues · solverstack / PaStiX · GitLab

Many thanks @rafal.brzegowy.

Best wishes,
JW