Quite spectacular! I'm curious what times people with those 24-core M2 Ultra chips would get.

Ultra Fractal author

Hi Frederik,
Yes, but it's very expensive. :) By the way, I tried again just now and the render time is still around 1:34, even with several programs open.
I'm very happy, because I chose the i7-14700K precisely to speed up fractal calculation: I have a 4K monitor, so I wanted a fast processor without compromises.
I assembled it all myself. :D I bought the components and built it. The case isn't really compatible, because the cooler's radiator hits the RAM, so I had to move the front fans to the top and put the radiator at the front. Unfortunately I bought the case years ago, and the Viper Venom RAM modules are tall. This is the configuration:

CPU: i7-14700K, 3.4 GHz up to 5.6 GHz, Raptor Lake Refresh, 20 cores / 28 threads
Motherboard: MSI Z790-A MAX PRO WIFI-7 DDR5
Video card: Asus GeForce RTX 4070 Dual EVO OC 12GB
RAM: Patriot Viper Venom 32GB (2×16GB) DDR5-6800
SSD 1: Ediloca EN870 SSD 1TB PCIe Gen4, NVMe M.2 2280, up to 7450 MB/s, dynamic SLC cache
SSD 2: Ediloca EN870 SSD 2TB PCIe Gen4, NVMe M.2 2280, up to 7450 MB/s, dynamic SLC cache
Liquid cooler: ARCTIC Liquid Freezer II 240 A-RGB
Power supply: Seasonic G12-GC 750W 80+ Gold
Case: Gamemax Draco NEW, 4 ARGB fans (I had to remove one fan to fit the radiator at the front)

Today's rendering of the benchmark test:

15/02/2024 19:17:36: Starting job Benchmark1.
15/02/2024 19:17:36: Anti-aliasing off.
15/02/2024 19:17:36: Motion blur off.
15/02/2024 19:17:36: Rendering in PNG format.
15/02/2024 19:17:36: Calculating C:\Users\spino\OneDrive\Documenti\Ultra Fractal 6\Fractals\Benchmark.ufr.
15/02/2024 19:19:10: Finished calculation. Time: 0:01:34.27.
15/02/2024 19:19:10: Saving to C:\Users\spino\Documents\Ultra Fractal 6\immagini2024\Benchmark1.png.
15/02/2024 19:19:13: Job finished.

Andrea Spinozzi

https://fractalcosmo.com

Hello everyone, longtime UF user here. I wanted to share my benchmark results, since I finally replaced my ancient i7-6700 a couple of weeks ago.

3/6/2024 7:34:04 PM: Starting job Benchmark.
3/6/2024 7:34:04 PM: Anti-aliasing off.
3/6/2024 7:34:04 PM: Motion blur off.
3/6/2024 7:34:04 PM: Rendering in PNG format.
3/6/2024 7:34:04 PM: Calculating Benchmark.
3/6/2024 7:37:23 PM: Finished calculation. Time: 0:03:18.78.
3/6/2024 7:37:23 PM: Saving to C:\Users\Sycai\OneDrive\Documents\Fractals\Benchmark.png.
3/6/2024 7:37:26 PM: Job finished.

I had my browser open to Twitch and Discord running, but that's it. The CPU temperature never went above 82.4 °C and it was audible, but I'm impressed by what this little machine can do, so I can't complain.

My new PC is a mini PC, the Minisforum UM780 XTX.
CPU: Ryzen 7 7840HS (8 cores/16 threads, 5.1 GHz boost)
GPU: Radeon 780M iGPU (8GB VRAM)
Memory: Crucial 32GB DDR5-5600
Storage: Kingston 1TB PCIe 4.0 NVMe

I forgot to test my old PC, but I know it would have taken at least an hour to render the benchmark and sounded like a jet trying to take off. :)

https://solo.to/sycaid

CPU: Ryzen 7 7840HS (8/16, 5.1 GHz boost) + GPU: Radeon 780M iGPU (8GB VRAM)
edited Mar 7 at 4:14 am

Hi,
An update: the i7-14700K is a very good CPU and I'm satisfied with the purchase, but it has one problem: it runs very hot. Twenty cores and 28 threads produce a lot of heat, and under load during rendering the CPU easily reaches 97 °C.
I still need to tidy up the airflow, because two fans have to be turned around, but with this configuration I've already gained 7 degrees. However, the calculation time increases by 6 seconds.
So the i7-14700K offers excellent performance and value for money, but its temperature under load absolutely must be taken into account, because Intel has released a CPU that runs as hot as hellfire. :D 8)

This is the new test with the improved temperatures:

08/03/2024 20:57:01: Starting job Benchmark4.
08/03/2024 20:57:01: Anti-aliasing off.
08/03/2024 20:57:01: Motion blur off.
08/03/2024 20:57:01: Rendering in PNG format.
08/03/2024 20:57:01: Calculating C:\Users\spino\OneDrive\Documenti\Ultra Fractal 6\Fractals\Benchmark.ufr.
08/03/2024 20:58:42: Finished calculation. Time: 0:01:40.78.
08/03/2024 20:58:42: Saving to C:\Users\spino\Documents\Ultra Fractal 6\immagini2024\Benchmark4.png.
08/03/2024 20:58:46: Job finished.

Ciao

Andrea Spinozzi

https://fractalcosmo.com

I ran the test again and it took 1:36 this time, so performance seems consistently stable; it varies between about 1:33 and 1:40.
I lowered the i7-14700K's temperatures simply by selecting BOXED COOLER in the BIOS. The temperatures dropped, but the calculation time rose from about 1:32/1:34 to 1:36/1:40.

Ciao

08/03/2024 21:44:33: Starting job Benchmark5.
08/03/2024 21:44:33: Anti-aliasing off.
08/03/2024 21:44:33: Motion blur off.
08/03/2024 21:44:33: Rendering in PNG format.
08/03/2024 21:44:33: Calculating C:\Users\spino\OneDrive\Documenti\Ultra Fractal 6\Fractals\Benchmark.ufr.
08/03/2024 21:46:10: Finished calculation. Time: 0:01:36.11.
08/03/2024 21:46:10: Saving to C:\Users\spino\Documents\Ultra Fractal 6\immagini2024\Benchmark5.png.
08/03/2024 21:46:12: Job finished.
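To compare runs like these without reading timestamps by eye, the "Finished calculation. Time: H:MM:SS.ss." lines can be converted to seconds. A minimal sketch in Python, assuming the log format shown in this thread; the regular expression and the function name `render_seconds` are illustrative, not part of Ultra Fractal:

```python
import re
from statistics import mean

# Matches lines like "...: Finished calculation. Time: 0:01:34.27."
# (format taken from the logs in this thread: hours:minutes:seconds.hundredths)
TIME_RE = re.compile(r"Finished calculation\. Time: (\d+):(\d{2}):(\d{2}(?:\.\d+)?)")


def render_seconds(log_line: str) -> float:
    """Return the calculation time in one UF log line, in seconds."""
    match = TIME_RE.search(log_line)
    if match is None:
        raise ValueError(f"no calculation time in: {log_line!r}")
    hours, minutes, seconds = match.groups()
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


# The four i7-14700K runs posted in this thread (Benchmark1, 4, 5 and 6).
log_lines = [
    "15/02/2024 19:19:10: Finished calculation. Time: 0:01:34.27.",
    "08/03/2024 20:58:42: Finished calculation. Time: 0:01:40.78.",
    "08/03/2024 21:46:10: Finished calculation. Time: 0:01:36.11.",
    "09/03/2024 16:45:21: Finished calculation. Time: 0:01:32.66.",
]
times = [render_seconds(line) for line in log_lines]
spread_pct = (max(times) - min(times)) / min(times) * 100
print(f"mean {mean(times):.2f} s, fastest {min(times):.2f} s, spread {spread_pct:.1f} %")
```

For these four runs the spread works out to just under 9%, which mixes ordinary run-to-run noise with the BIOS and fan changes described in the posts.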

Andrea Spinozzi

https://fractalcosmo.com

Last update: I turned the radiator fans around to improve the airflow and set the BIOS to boxed cooler instead of water cooler. I ran the test again today after turning the fans, and performance seems the same, but temperatures are much better: under load only 2 cores reached 85 °C, and almost all the others stay around 70 °C. Now it's perfect.
Hello everyone, and have a good weekend!

09/03/2024 16:43:48: Starting job Benchmark6.
09/03/2024 16:43:48: Anti-aliasing off.
09/03/2024 16:43:48: Motion blur off.
09/03/2024 16:43:48: Rendering in PNG format.
09/03/2024 16:43:48: Calculating C:\Users\spino\OneDrive\Documenti\Ultra Fractal 6\Fractals\Benchmark.ufr.
09/03/2024 16:45:21: Finished calculation. Time: 0:01:32.66.
09/03/2024 16:45:21: Saving to C:\Users\spino\Documents\Ultra Fractal 6\immagini2024\Benchmark6.png.
09/03/2024 16:45:24: Job finished.
[Attached image: 65ec89c7d3cd2.png]

Andrea Spinozzi

https://fractalcosmo.com

I'm really curious what the newest many-core Apple Silicon CPUs would do with this benchmark.

Ultra Fractal author

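> Edit: For easier updates, I created this spreadsheet: https://docs.google.com/spreadsheets/d/1pho68LvdApjV2BKoFFzDCNBiupP5FCAXNfEL1haJDFo/edit?usp=sharing
>
> Thanks! Looks like we have a good selection by now:
>
> Threadr. Pro 5975WX 2×16 @ 4.3 GHz: 1m14s (10 Tops, later addition)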
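The "10 Tops" figure in the quoted row appears consistent with cores × clock × render time (32 cores × 4.3 GHz × 74 s ≈ 10.2 × 10^12 cycles). Assuming that is how the column was computed (the spreadsheet formula isn't shown here), a short Python sketch shows why the choice of clock matters: the result scales directly with whichever clock is plugged in, independent of the measured time. Hybrid CPUs like the 14700K complicate this further, since their P-cores and E-cores run at different clocks. The function name and the 3.5 GHz value below are hypothetical, for illustration only:

```python
def core_cycles_t(cores: int, clock_ghz: float, seconds: float) -> float:
    """Cores x clock x render time, in units of 10^12 cycles.

    Assumption: this is how the table's 'Tops' column was derived.
    """
    return cores * clock_ghz * 1e9 * seconds / 1e12


# Threadripper Pro 5975WX: 32 cores, 4.3 GHz as listed, 1m14s = 74 s.
tr_pro = core_cycles_t(32, 4.3, 74)
print(f"5975WX: {tr_pro:.1f} T")

# Swapping in a different clock rescales the result by the clock ratio alone,
# even though the measured render time did not change.
hypothetical_clock_ghz = 3.5  # illustrative value only, not a spec figure
ratio = core_cycles_t(32, hypothetical_clock_ghz, 74) / tr_pro
print(f"ratio vs. 4.3 GHz: {ratio:.3f}")
```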

Thanks for making this chart, but I think it needs a lot of adjustment. Everything should probably be normalized to full boost clock. That isn't entirely realistic, but base clocks are intentionally set very low on some models and give misleadingly low operation counts, when in reality the total number of operations required to calculate a fractal is essentially invariant and operations per second are all that matter.

For example, the 14700K, which shows the highest efficiency in the spreadsheet, has these actual PassMark scores (with a ridiculous power draw, higher than a 5000-series Threadripper, which explains those temperatures along with the much smaller surface area to dissipate heat over):
Integer Math: 184,923 MOps/sec
Floating Point Math: 135,930 MOps/sec

My processor has PassMark scores of:
Integer Math: 362,264 MOps/sec
Floating Point Math: 202,373 MOps/sec

Using measured ops/s numbers from a source like PassMark to calculate some kind of efficiency metric might be nicer, especially if power draw were used instead of processor cost (which is a sunk cost in this case). You have to assume that people won't buy a TR Pro system unless they know why they need it (populating 8-channel memory with enough capacity that individual cores won't be starved, whatever your application, alone costs as much as a consumer desktop). The flip side is that my system draws nearly 90W less power than a 14700K when running something like this, because of Intel's insane power limits, and it can handle running far more at once, so it's not active for as long. By that metric it has already paid for itself in power savings.

I could encode two 1080p HEVC files in software on my old system at slightly less than real time; on this one I can encode four at once at 1.5-2x real time. The power draw for that is higher, but the i7-6950X had very large spikes in draw and a high constant draw when running at boost, probably over 200W. The reduced running time at that kind of power use makes up for it.
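As a rough illustration of the ops/s-based metric suggested above: if the operation count for the benchmark is fixed, PassMark throughput × render time should come out roughly equal across CPUs, to the extent the synthetic test tracks UF's workload. A Python sketch using only numbers posted in this thread (the pairing of scores with these particular runs is my choice; the function name is hypothetical):

```python
# PassMark "Floating Point Math" scores quoted in this post, in MOps/s.
PASSMARK_FP_MOPS = {
    "i7-14700K": 135_930,
    "Threadripper Pro 5975WX": 202_373,
}

# Render times reported in this thread, in seconds.
RENDER_SECONDS = {
    "i7-14700K": 92.66,                # Benchmark6 run (0:01:32.66)
    "Threadripper Pro 5975WX": 73.00,  # baseline run in this post (1:13.00)
}


def implied_tops(cpu: str) -> float:
    """PassMark FP throughput x render time, in units of 10^12 operations."""
    return PASSMARK_FP_MOPS[cpu] * 1e6 * RENDER_SECONDS[cpu] / 1e12


for cpu in PASSMARK_FP_MOPS:
    print(f"{cpu}: {implied_tops(cpu):.1f} T")
```

The products come out at roughly 12.6 and 14.8 × 10^12. If the operation count really is invariant, that gap suggests PassMark's FP score is only a rough proxy for UF's throughput; an energy-based version (average watts × seconds) would additionally need measured power figures for each run.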

UF also stalls quite often during the calculation, when the UI thread spawns the 64 calculation threads for the next set of pixels or does some other operation that pretty much halts everything else.

My old machine is an i7-6950X, which can normally do this benchmark in about 3:20; you can add that to the list too if you want. It has a 3 GHz base clock and 10 cores/20 threads. Because of pipelining differences and downclocking behavior, I think I found that it's fastest at 14-16 threads rather than the full 20.

Finally, since this processor has 32MB of L3 per CCX, I did some experiments with tiling. The idea: the full image here is ~47MB, and UF isn't NUMA-aware (I'm running NPS1 right now anyway, so that shouldn't matter), so the best case is that all data for a given tile fits in the L3 of a single CCX. That way, cache misses on other CCXs only incur the cross-CCX Infinity Fabric latency, and only once, rather than a potential cross-memory-controller penalty (the 8-channel memory is organized into 2 channels per CCX) plus a cross-CCX penalty whenever data is accessed for a new pixel. That's because the image is probably stored more or less linearly in memory, and NPS1 stripes it across all 8 channels, which gives higher overall memory latency but also higher memory bandwidth. The difference isn't huge, but it's visible.
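The tile-fits-in-L3 reasoning can be checked with simple arithmetic. A Python sketch, assuming the tile footprint scales with its share of pixels and ignoring everything else that competes for the same L3 (code, per-thread state, the OS); the 47 MB and 32 MB figures are the ones quoted above, and `tile_mb` is a hypothetical helper:

```python
IMAGE_MB = 47.0        # approximate full-image footprint quoted above
L3_PER_CCX_MB = 32.0   # L3 per CCX on the poster's 5975WX


def tile_mb(grid: int) -> float:
    """Footprint of one tile when the image is split into grid x grid tiles.

    Assumes the footprint scales with pixel area and ignores everything else
    competing for the same L3 (code, per-thread state, the OS).
    """
    return IMAGE_MB / (grid * grid)


for grid in (1, 2, 4):
    size = tile_mb(grid)
    print(f"{grid}x{grid}: {size:.2f} MB per tile, fits in one L3: {size <= L3_PER_CCX_MB}")
```

A 2x2 split gives about 11.75 MB per tile, in line with the ~11MB per tile reported for the fastest run, while a 4x4 split drops below 3 MB per tile.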

A baseline normal render took 1:13.00 this time and showed the usual pattern of occasional large dips, every couple of seconds, to barely any cores being used.

A 16-tile render was about a second faster at 1:12.01, but only maintained ~92% CPU utilization, with more frequent but smaller utilization dips. I used .bmp output so saving images was cheap, but there was still some unknown overhead on each tile. The cost of initializing and saving a given file is likely higher than the time it takes to actually write the data to disk, since the target NVMe manages ~7GB/s sequential writes. There's also the cost of initializing memory if it isn't simply reused, and the data then has to be re-cached by the CPU. Tile render times varied from 3.74s to 5.24s, since some tiles hit low-complexity areas in the corners.
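A back-of-the-envelope check of the claim that raw disk writes are cheap here: at the quoted ~7 GB/s, the pure transfer time for one tile is tiny. A Python sketch, assuming decimal MB/GB and ignoring file creation, headers and filesystem metadata (the per-file overhead the post suspects dominates); `ideal_write_ms` is a hypothetical helper:

```python
IMAGE_MB = 47.0          # approximate full-image size quoted above
NVME_WRITE_GB_S = 7.0    # quoted sequential write speed of the target NVMe


def ideal_write_ms(megabytes: float, gb_per_s: float = NVME_WRITE_GB_S) -> float:
    """Pure sequential-transfer time in milliseconds (decimal MB and GB).

    Ignores file creation, headers, filesystem metadata and caching.
    """
    seconds = megabytes / (gb_per_s * 1000.0)
    return seconds * 1000.0


tiles = 16
per_tile_ms = ideal_write_ms(IMAGE_MB / tiles)
print(f"{per_tile_ms:.2f} ms per tile, {per_tile_ms * tiles:.1f} ms for all {tiles} tiles")
```

That is about 0.42 ms per tile and under 7 ms in total, roughly four orders of magnitude below the 3.74s to 5.24s tile render times, so any measurable per-tile overhead would have to come from elsewhere.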

For 2x2 tiling (4 images), each tile was ~11MB, which fits nicely in a CCX's L3 with some room to spare, while the extra image-saving overhead is minimal. This was the fastest case, at 1:09.65. CPU utilization was ~97%, with dips somewhere between the other two cases. There was very little variance in render time between tiles, since each contained a quarter of the most complex part.
Of course, this neglects the time to stitch the images afterwards, but it's something to consider, and on the processors where this actually matters, memory bandwidth is so high that stitching time is probably negligible. On multi-CCX processors like the newer Xeons and all of AMD's server and workstation lines, it might be optimal to proactively split the image into tiles (strips would probably be more efficient), render one at a time, copy each into one place in memory, and then stitch them before saving. That's easier to program than NUMA support (which I know you don't want to mess with, from our emails) and may give an even larger speedup when done without saving multiple files.
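The render-strips-into-one-buffer idea can be sketched as follows, in Python with a toy Mandelbrot renderer standing in for UF. All names here are hypothetical, and this is not how UF is structured internally. Because of CPython's GIL the threads won't actually compute pixels in parallel, so this shows the control flow (one strip at a time, each copied to its offset in a single buffer), not the cache effect or the speedup:

```python
from concurrent.futures import ThreadPoolExecutor

WIDTH, HEIGHT, MAX_ITER = 64, 48, 50  # toy size; the real benchmark is far larger


def escape_time(x: int, y: int) -> int:
    """Iterations before the Mandelbrot orbit for pixel (x, y) leaves |z| <= 2."""
    c = complex(-2.5 + 3.5 * x / WIDTH, -1.25 + 2.5 * y / HEIGHT)
    z = 0j
    for i in range(MAX_ITER):
        z = z * z + c
        if abs(z) > 2.0:
            return i
    return MAX_ITER


def render_row(y: int) -> list[int]:
    return [escape_time(x, y) for x in range(WIDTH)]


def render_in_strips(n_strips: int, workers: int = 4) -> list[list[int]]:
    """Render horizontal strips one after another into a single shared buffer."""
    image = [[0] * WIDTH for _ in range(HEIGHT)]  # the one final image buffer
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for k in range(n_strips):
            y0 = HEIGHT * k // n_strips
            y1 = HEIGHT * (k + 1) // n_strips
            # All workers share this strip before the next one starts, so only
            # one strip's pixels are being worked on at any moment.
            image[y0:y1] = pool.map(render_row, range(y0, y1))
    return image  # already "stitched": each strip was copied to its offset


full = render_in_strips(1)
strips = render_in_strips(6)
print(full == strips)
```

Writing each finished strip straight to its row offset makes the stitch step free; in a native implementation the strip height would be chosen so one strip's data fits in a CCX's L3.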

I've been playing with SideFX Houdini more lately so I haven't really messed with UF as much as I planned but thought I'd drop by and mention that.


Thanks for the insights! You're certainly correct about the table being misleadingly superficial, not least in terms of power consumption being ignored, which is especially a cost factor when living in a country with inflated electricity prices due to a bad energy policy (which I do).

As for the clock speeds, I assumed that the boost can't be maintained on all cores. But the spreadsheet isn't well maintained anyway, so feel free to supplant it.
