CPU Performance and L2/L3 Cache

Hi, I’m looking to choose between two AMD processors for a new workstation build: a Ryzen 9 9950X and a Ryzen 9 7950X3D (see screenshot).

  • Both are 16-core processors; nominally the 9950X runs at 4.3 GHz and the 7950X3D runs at 4.2 GHz
  • Both have 16MB L2 cache
  • The 7950 has 128MB L3 cache while the 9950 has 64MB
  • The 9950 is approximately $110 cheaper at the moment

Which will translate to better real-world FEA performance, assuming all else is equal? Does L3 cache have a significant effect on FEA performance? Does this change with single versus multicore processing?

Thank you!

It’s highly dependent on the solver and problem (element types, analysis features and so on) but you can use some benchmarks for commercial software as a reference.

The working set of memory for a substantial FEA model will be much larger than the size of either cache. That said, I don’t expect a larger cache to hurt.
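As a rough illustration of why the problem dwarfs the cache, here is a back-of-the-envelope estimate in Python. The nonzeros per row and the fill-in factor are made-up numbers chosen only to show the order of magnitude, not measurements from any particular solver:

```python
# Rough working-set estimate for a direct sparse solve (illustrative numbers only).
L3_CACHE_MB = 128  # the larger of the two CPUs

def working_set_mb(n_dof, nnz_per_row, fill_in_factor=20):
    """Very crude estimate: factorized stiffness matrix in double precision."""
    nnz_factor = n_dof * nnz_per_row * fill_in_factor  # assumed fill-in during factorization
    bytes_total = nnz_factor * 8                       # 8 bytes per double (values only)
    return bytes_total / 1e6

for n_dof in (100_000, 1_000_000, 5_000_000):
    ws = working_set_mb(n_dof, nnz_per_row=80)
    print(f"{n_dof:>9} DOF: ~{ws:,.0f} MB working set "
          f"({ws / L3_CACHE_MB:,.0f}x the 128 MB L3)")
```

Even the small 100k-DOF case comes out around ten times larger than the 128 MB L3, so most of the data traffic goes to main memory either way.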

Probably more important is the amount of RAM the motherboard can handle. Get as much as possible. :slight_smile:

The 7950X3D apparently gives off a lot less heat, which makes cooling easier and is another point in its favor.

In my tests hyperthreading made performance worse, so you might want to disable it. And after you get to four cores, the gains of adding more cores tend to taper off.
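If you want to check the scaling on your own hardware, a quick sweep over thread counts might look like the sketch below. It assumes a `ccx` binary on the PATH and an input deck `job.inp`, and sets both OMP_NUM_THREADS and CCX_NPROC_EQUATION_SOLVER so it covers the common solver builds; adjust for whatever your installation actually reads:

```python
import os
import subprocess
import time

# Hypothetical scaling sweep: run the same CalculiX job with 1..16 threads
# and record wall-clock time. Assumes "ccx" is on the PATH and "job.inp" exists.
JOB = "job"  # ccx expects the deck name without the .inp extension

for threads in (1, 2, 4, 8, 16):
    env = dict(os.environ,
               OMP_NUM_THREADS=str(threads),            # read by PARDISO/PaStiX builds
               CCX_NPROC_EQUATION_SOLVER=str(threads))  # read by the SPOOLES solver
    start = time.perf_counter()
    subprocess.run(["ccx", JOB], env=env, check=True,
                   stdout=subprocess.DEVNULL)
    elapsed = time.perf_counter() - start
    print(f"{threads:2d} threads: {elapsed:7.1f} s")
```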


Excellent, thanks for the great answer.

The core scaling - is that with CalculiX only, or with commercial software too? I haven’t done a ton of testing because I typically only have access to 6 or 8 cores. I have run jobs on 96-core machines, but never enough to get good benchmark numbers on scalability.

For memory, I wanted to get 2x 64 GB for a total of 128 GB, but it seems that kits like that are in short supply and expensive… I was reading that running 4x DIMMs instead of 2x would reduce the speed of the memory modules. Do you know anything about this?

For reference, the AMD product page says this about the 9950X -

Only with CalculiX. Most commercial FEA software seems to be Windows-only these days, and I’m running FreeBSD UNIX.

Every multithreading/multiprocessing method has some overhead. The more cores you use, the larger the overhead of starting and synchronizing threads/processes becomes.
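A minimal sketch of that effect, using Amdahl’s law plus a made-up per-thread synchronization cost (the 5% serial fraction and the overhead constant are assumptions, not measurements of any real solver):

```python
# Amdahl's law with a crude per-thread overhead term (illustrative constants only).
SERIAL_FRACTION = 0.05      # assumed fraction of the solve that cannot be parallelized
OVERHEAD_PER_THREAD = 0.01  # assumed relative cost of starting/synchronizing each extra thread

def speedup(n_threads):
    parallel_time = SERIAL_FRACTION + (1 - SERIAL_FRACTION) / n_threads
    total_time = parallel_time + OVERHEAD_PER_THREAD * (n_threads - 1)
    return 1.0 / total_time

for n in (1, 2, 4, 8, 16, 32):
    print(f"{n:2d} threads -> {speedup(n):4.1f}x")
```

With these numbers the speedup peaks around 8 threads and then declines, which is the same tapering behavior seen in practice once the per-thread overhead outweighs the extra parallel work.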

The product page clearly shows a higher speed for “2x” than for “4x”.
This makes sense given that the CPU only has two memory channels.
So when you’re using four DIMMs, there are two DIMMs connected to each memory channel. Each channel can only read from one DIMM at a time.
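A quick way to see what the lower rated speed costs in raw bandwidth is below. The DDR5-5600 and DDR5-3600 figures are just assumed example values for a 2-DIMM versus 4-DIMM configuration; check the actual product page for the rated speeds of your CPU and memory:

```python
# Theoretical peak bandwidth = channels * bytes per transfer * transfer rate.
# The rated speeds below are assumed example values for 2-DIMM vs 4-DIMM setups.
CHANNELS = 2            # Ryzen desktop CPUs have two memory channels
BYTES_PER_TRANSFER = 8  # 64-bit data path per channel

def peak_gb_s(mt_per_s):
    return CHANNELS * BYTES_PER_TRANSFER * mt_per_s * 1e6 / 1e9

print(f"2 DIMMs @ DDR5-5600: {peak_gb_s(5600):5.1f} GB/s")
print(f"4 DIMMs @ DDR5-3600: {peak_gb_s(3600):5.1f} GB/s")
```

Since sparse solvers tend to be memory-bandwidth bound, a drop like that can matter more for FEA run times than a small difference in clock speed.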

Is this the same for the TAUCS solver as well? I haven’t tried to make a comparison yet since the binary distribution has been lost.