Steady state dynamics randomly terminates without error

Hello,

I was unable to reproduce the error. Both models completed normally.
For testing I’m using my ccx 2.20 build on Windows 7 (I work on an old Windows, not 10),
and this ccx build is not multithreaded.

With best regards,
Prool

Running your ccx.exe under Windows 10 (22H2), I can reproduce Matej’s issue: model-1 produces a cygwin_exception while model-2 does not.

The same happens on macOS Catalina with ccx 2.20:

model-1 crashes
model-2 does not

With best regards,
Prool

Hi, I tried running both input files on my Win10 machine (ccx_dynamic/pardiso). It threw some warning messages:

*WARNING in calinput: PU, PHS, MAXU or MAXS was selected for a static, a non-cyclic-symmetric frequency, a buckling or a modal dynamic calculation; the output option is removed.

*WARNING in steadystate: too many modal damping coefficients applied damping coefficients corresponding to nonexisting eigenvalues are ignored

*edited
Similar problems occur with Prool’s ccx distribution; adding the messages:

model-1.inp

0 [main] ccx 310 cygwin_exception::open_stackdumpfile: Dumping stack trace to ccx.exe.stackdump

model-2.inp

Job finished

When I move the boundary definition outside the step, the behaviour reverses: model-1 finishes but model-2 does not.

This problem may be similar; moving the boundary definition outside the step makes it work.

I modified the input file (SSD_problem.inp) and re-ran it several times; it finished consistently using Prool’s CCX distribution…

beamdy8.inp places the boundary outside the step; your model does not.

Here is my experience so far from debugging and testing.

I don’t want to be the judge of whether there are errors in ccx_2.20 or errors in the input files SSD_problems.inp, model-1.inp and model-2.inp, but I do believe it would be nice if an error message could be added in future versions of ccx.

Compiling and running the untouched original code of ccx_2.20 with the 3 datasets produces a segmentation fault in the module steadystatedynamicss, or in other locations in the code where the cs array is used, because the cs array is never allocated for these 3 datasets.

My claim can be verified by inserting the code shown at line 582 of ccx_2.20.c, where the array should at least be allocated to avoid segmentation faults.

Perhaps with this knowledge others can contribute possible workarounds.

Same here; I don’t really know whether it is related to all dynamic problems or not, but for *Base motion used with *Modal dynamic and *Steady state dynamic calculations the manual emphasizes that the boundary should be placed outside/before the *Step.

Of the dynamic analysis examples, twenty or more, almost all place the boundary outside the step. Only a few place it inside the step, maybe for testing purposes. So I guess this recommendation may apply to dynamic analysis in general.

This is my walkthrough for now, without any warranty.

Running the untouched ccx_2.20 caused a segmentation fault in the module steadystatedynamicss due to the unallocated cs array.

I fixed this simply by allocating a minimal cs array in ccx_2.20.c in case it had not already been allocated.
[Screenshot: ccx_2.20]

Then I ran into the next segmentation fault, in the errorestimator module, caused by a loop that was started from the end, which doesn’t make much sense. The errorestimator module is called from the frd module in frd.c. Looking at the call for the thermal error estimator, where the first half of the array is treated as real and the second half as imaginary, it makes sense to replace the number 6 with the number 3 in lines 1895 and 1896. This fixes the segmentation fault in the errorestimator module.

During my walkthrough I also hit a few irregularities, which I changed simply because the change seemed the most logical. In the arpack module, line 827, I swapped the int and float values so they match the printf format. In line 856 I also added a = to the / so that it reads /=, since without it the statement doesn’t do anything. I’m sorry for the missing = in the picture.
[Screenshot: arpack2]

Further, in the arpack module, line 549, I replaced the sti array with the stx array so that all calls to the frd module in arpack look identical, and because it doesn’t make much sense to pass the same array in 2 different places in the same call.
[Screenshot: arpack]

The modified code seems to run without errors on Debian Linux, but it behaves very strangely on Windows 10 and 7: when ccx_2.20 runs in the debugger, model-1.inp and model-2.inp run to "Job finished" without any errors at all, but when run normally the program commits some unknown violation against the OS that makes it terminate before the end. I suspect this failure lies somewhere in the multithreading, since it doesn’t occur when the program runs in the debugger.

Please let me know if you think I have done something totally mad :) or if you have anything to contribute.

My personal final conclusion on this issue is that something conflicts in the multithreading implementation of ccx, possibly depending on the individual OS and configuration. When I succeeded with the debugger, I found that step 2 was starting before step 1 had finished writing to the frd file. This caused a segmentation fault at line 44 in the frdvector module, because the arpack module and the steadystate module both accessed frdvector in turn.

For my configuration, the only way I succeeded every time with the data files SSD_problems.inp, model-1.inp and model-2.inp was to split the steps into separate files, using:

*restart, write
*step

OS rename xx.rout xx.rin

*restart, read
*step
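
The rename between the two runs in the split-step workflow above could be scripted. The following is only a sketch under assumptions: the job names `step1` and `step2` and the helper `hand_over_restart` are hypothetical, and starting ccx itself is left to the caller since the executable name differs between builds. It relies only on what is described above, namely that the first run leaves a `.rout` file and the second run expects a `.rin` file.

```python
from pathlib import Path


def hand_over_restart(workdir, prev_job, next_job):
    """Rename <prev_job>.rout to <next_job>.rin so the next step can read it.

    Job names are hypothetical; ccx must be run by the caller
    before and after this call.
    """
    workdir = Path(workdir)
    rout = workdir / f"{prev_job}.rout"
    rin = workdir / f"{next_job}.rin"
    if not rout.is_file():
        # The previous run did not leave a restart file behind.
        raise FileNotFoundError(f"missing restart file: {rout}")
    # replace() also overwrites a stale .rin from an earlier attempt.
    rout.replace(rin)
    return rin
```

Raising on a missing `.rout` file keeps the second step from silently starting with an old restart file when the first step terminated early.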

Have fun :)

@fgr

I think you have identified for the developers a series of very concise points to start working on solving the problem.
Thank you very much for your time and dedication.

Let’s make the developers aware of these findings. Does anybody know who is compiling the official Windows CalculiX?

My results from March 4th and 6th about the unallocated cs array should probably be ignored, as the change conflicts with running a static analysis. Realizing this, together with having a safe way of running the job as separate input files via "restart write/read", convinced me to shelve the debugging, since I believe others are probably more competent at this job than I am.

I couldn’t live with a randomly running program, so I had to solve this issue. I have to admit it has been uphill, with 3 steps forward and nearly 3 steps back, maybe because I searched for and expected to find a floating variable. After hours of debugging with different tools, and messages about various random unallocated memory and out-of-bounds arrays, I got the idea to give all allocated memory a little extra space, and this actually seems to be a quick hotfix that solves the issue.

CCX uses multithreading to manipulate arrays, splitting them across multiple threads depending on the setting of OMP_NUM_THREADS. With simple splitting code, either a remainder is left over (not acceptable) or an out-of-bounds access can occur if the memory allocation is too tight.
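
To illustrate the kind of splitting described above, here is a minimal sketch. It is not taken from the ccx source, and the function name `split_range` is made up for illustration. With a rounded-up chunk size, the last thread's range has to be clamped to the array length; otherwise it reaches past the end of the allocation.

```python
def split_range(n_items, n_threads):
    """Split indices 0..n_items-1 into per-thread half-open ranges.

    Illustrative only; not the ccx implementation.
    """
    # Round up so that no remainder is left unprocessed.
    chunk = (n_items + n_threads - 1) // n_threads
    ranges = []
    for t in range(n_threads):
        start = min(t * chunk, n_items)
        # Without this clamp the last range could end beyond n_items,
        # which in C would be an out-of-bounds access.
        end = min(start + chunk, n_items)
        ranges.append((start, end))
    return ranges
```

For 10 items and 4 threads the chunk size is 3, so an unclamped last range would run from index 9 to 12, two entries past the array. Padding every allocation hides that overrun; clamping the range avoids it.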

So my hotfix for this issue has simply been to add a few bytes to every dynamic memory allocation. I have tested the fix on different machines with 1 to 8 threads and haven’t been able to provoke errors with the datasets from this thread (SSD_Problem.inp, model-1.inp and model-2.inp), so I believe it’s a reliable hotfix.
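
Because the failure is intermittent, a fix like this is best checked by repeated runs over several thread counts. A rough sketch of such a harness follows; the helper name `count_failures` is hypothetical, and the command that starts ccx is supplied by the caller. It assumes, as in the output quoted earlier in this thread, that a successful run prints "Job finished".

```python
import os
import subprocess


def count_failures(cmd, runs, threads, marker="Job finished"):
    """Run cmd `runs` times with OMP_NUM_THREADS set to `threads`.

    Returns how many runs did not print the success marker.
    """
    env = dict(os.environ, OMP_NUM_THREADS=str(threads))
    failures = 0
    for _ in range(runs):
        result = subprocess.run(cmd, env=env, capture_output=True, text=True)
        # A run that terminates early ends before the marker is printed.
        if marker not in result.stdout:
            failures += 1
    return failures
```

Here `cmd` is the argument list that starts ccx on your machine for one input file. Checking for the marker rather than the exit code matters because the program may terminate early without reporting an error.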

For those interested, I have prepared this Dropbox package with a patched version of ccx_2.20, including the changed source code files. I don’t have a running Pastix library, so this version will only run with the Pardiso and Spooles solvers. Also, this build of Spooles gives eigenvalues matching those from Pardiso.

I’m not interested in maintaining CCX, but please let me know if this hotfix does not solve this specific issue.

Have Fun

@fgr That’s really great. The changes were made specifically to the Windows build, so Guido can’t implement them in the general source code repository and someone who makes builds for Windows would have to do it, right? It would be best to implement your fix in future versions.

@Calc_em, although this issue doesn’t show up very often in the Linux/Unix build, I believe someone would find it worthwhile to take a look at my last changes, since it actually is something in the general code that causes the exit before finishing.
When I told my son about my hotfix he yelled at me and called me an old engineer scrap coder, which I couldn’t let stand, and that is why I needed to find the exact problem.
For those interested, I have made some small notes in the source; just search for the word “fgr-mod”.

Have fun

Thank you for the explanation, I will ask Guido to take a look at it then. This issue is really nasty as it makes SSD analyses unreliable, so your fix seems to be of great importance. I’m glad that you shared it since, once implemented, it can make SSD simulations usable again.

Dear all,

Jakub Michalski sent me his file SSD…inp. Under Linux no segfault occurs, independent of the number of cpu’s and how many times I run the problem (in this case 100 times). However, valgrind pointed me to a memory problem in frd.c related to the imaginary parts of RF, ERR and similar fields. I think I solved this problem now (no messages from valgrind any more) and sent a new version of frd.c to Jakub. Maybe someone can try this version under Windows and check that the problem is really gone. Sorry about that.

If somebody tells me how, I can also upload the routine here.

Thanks for the feedback.

Guido

Thank you so much Guido!
The best way to upload anything here is by sharing a link to a file, maybe a link to the GitHub repository? It looks like it was updated a few hours ago:
GitHub CalculiX: removed an error in frd.c (imaginary part for steady-state calculations) - commit 26201a3
It looks like only the “frd.c” and “steadystate.c” files need to be updated, is that correct?

Here’s the file (I’ll try to test it too but I’ve never compiled ccx on Windows before):

From what I understand, you only have to replace frd.c with the new file.

There were additional changes to steadystate.c, particularly to correct the prescribed boundary conditions; these are mostly removals from the prior version. Take a look at the GitHub link above and see the diff.