Marvell Cranks Up Cores and Clocks with “Triton” ThunderX3 (nextplatform.com)
65 points by rbanffy on March 18, 2020 | 28 comments


This announcement is lacking a lot of specifics, which suggests it's worse than N1 / Ampere and the rest of the competition.

TNP always annoys me with its wordiness and lack of tables and numbers. Its signal-to-noise ratio is incredibly low compared to Ars and AnandTech.


Here's the Anandtech version: https://www.anandtech.com/show/15621/marvell-announces-thund...

> If we use the TX2 figures we have at hand, this would mean the new chip would land slightly ahead of Neoverse-N1 systems such as the Graviton2, and match more aggressively clocked designs such as the Ampere Altra.


Without HBM or more memory channels, the top SKUs will be rather hard to feed, considering the claimed ~3-5x increase in instructions/s/socket while memory bandwidth only increases by ~20%.


Maybe they mitigated that by increasing cache sizes, but there's no information about that yet. I'd expect so, since they support 4 threads per core, which may drive up cache pressure (even though one thread can do useful work while another waits on a cache miss).


Has there been any mention of MSRPs for these new ARM chips? As a home server enthusiast, I'd love to have a serious ARM-driven system to play with...


The ThunderX2 model isn't in stock, but the Ampere model is.

A base config with the Ampere eMAG 8180 (32 cores, 2.8 GHz, 3 MB L3) is about $3,000:

https://store.avantek.co.uk/ampere-emag-64bit-arm-workstatio...


(That's not the new generation, which isn't released yet; there's no price info for it yet.)


Has the ThunderX family shipped in "mere mortal" hardware, or is it all supercomputers and custom FANG servers and the like? I seem to remember that when ThunderX first launched, there was some noise about "ARM servers" being a market that exists, and the company I worked for at the time was looking into using it for a new product but gave up on it for some reason.



I think I've used one of these Gigabyte rack servers:

https://www.gigabyte.com/us/ARM-Server

I'm not sure offhand what the pricing is, but my guess would be less than $10,000, depending on the configuration.


So much more than an equivalent x86 box.


There aren't many 384-thread x86 boxes around, certainly none below US$ 10K


384 threads at which SMT level again?

What's the clock speed, and what's the IPC compared to, say, a 64-core/128-thread Threadripper? You'll find it won't compare favourably for 99.9% of workloads.


240W, 96 quite beefy ARM cores, I'm not sure about SMT. All that in 1U. The density is becoming mad.


That sounds like a shrink of a single ThunderX2 socket. I'd say expect two of these in a 1U system, or maybe rather two separate 2S boards in a 2U case with shared power/cooling.


It’s SMT4, so up to 384 threads per socket.


We'll need to update htop to use the Unicode 2x2 mosaics so we can cram 2 vCores per line and use foreground and background colors cleverly...
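A toy sketch of the idea (not htop's actual code): the upper-half block character '▀' takes its foreground color from one core and its background color from another, so a single terminal cell can display two per-core load levels. The three-step green/yellow/red color mapping here is a hypothetical choice.

```python
# Pack two vCore load readings into one terminal cell using U+2580 '▀':
# foreground color = top core's load, background color = bottom core's load.
def cell(load_top, load_bottom):
    """Map two loads in [0, 1] to a single ANSI 256-color half-block cell."""
    def shade(load):
        # Hypothetical 3-step ramp: green (46), yellow (226), red (196).
        return 46 if load < 0.5 else (226 if load < 0.8 else 196)
    return (f"\x1b[38;5;{shade(load_top)}m"    # foreground: top core
            f"\x1b[48;5;{shade(load_bottom)}m" # background: bottom core
            "\u2580\x1b[0m")                   # half block, then reset

# Two cells show four cores' loads on one line.
row = "".join(cell(t, b) for t, b in [(0.2, 0.9), (0.6, 0.1)])
print(row)
```

On a 256-color terminal this renders 384 vCores in 192 columns, one row.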


Geeeeee! They said in the article that the previous-gen chip was, but didn't explicitly confirm it for the new one, so I didn't dare hope. Thanks for the confirmation!


> The Triton chip will have eight memory controllers supporting memory running at 3.2 GHz, which is the same number of controllers in the Vulcan chip, which maxxed out at 2.67 GHz memory speeds. That’s a 20 percent increase in memory bandwidth, and the question is how that will balance out against the high core counts in some of the Triton SKUs.

With 96 cores, it seems like it would be tricky to keep them all busy given such a modest memory-throughput increase.
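A back-of-envelope sketch of the feeding problem, assuming the standard 64-bit (8-byte) DDR4 channel width and the figures quoted above (8 controllers at 3200 MT/s, 96 cores):

```python
# Rough per-socket and per-core memory bandwidth for the assumed top SKU.
channels = 8                 # memory controllers (from the article)
transfer_rate_mts = 3200     # DDR4-3200: mega-transfers per second
bytes_per_transfer = 8       # 64-bit channel width (standard DDR4, assumed)
cores = 96

socket_bw_gbs = channels * transfer_rate_mts * bytes_per_transfer / 1000
per_core_gbs = socket_bw_gbs / cores

print(f"socket: {socket_bw_gbs:.1f} GB/s, per core: {per_core_gbs:.2f} GB/s")
```

Roughly 205 GB/s per socket works out to only about 2 GB/s per core, before the 4 SMT threads per core even enter the picture.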


For certain workloads, having SMT4 will keep the cores busy while they're waiting on main memory.


Do they really mean memory running at 3.2 GHz? What memory would be running that fast? Isn't this HBM2E and its 3.2 Gb/s per-pin rate?


DDR4 easily runs at 3200 MT/s (a 1600 MHz clock, double data rate), though I don't know if that's common yet for ECC RDIMMs.
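The "3.2 GHz" in the article is really the transfer rate, not the clock. A quick sketch of the DDR4-3200 numbers (assuming the standard 64-bit DIMM data bus):

```python
# DDR4-3200: the I/O bus is clocked at 1600 MHz, but data moves on both
# clock edges ("double data rate"), giving 3200 mega-transfers per second.
io_clock_mhz = 1600
mts = io_clock_mhz * 2            # transfers per second, in millions
per_dimm_gbs = mts * 8 / 1000     # 64-bit (8-byte) data bus per channel

print(f"{mts} MT/s, {per_dimm_gbs} GB/s per channel")
```

So each of the eight channels contributes about 25.6 GB/s of peak bandwidth.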


Doh! yes, thanks, that makes a LOT more sense!


That's a comparable configuration to an AMD EPYC "Rome" with 48c/96t and 8 memory controllers.


384 threads (vCPUs) in a single socket, or 768 vCPUs per 1U in a dual-socket config. That's roughly $4K per month in revenue for a cloud vendor.

Unfortunately, DRAM, NAND, and bandwidth unit costs haven't dropped at all compared to vCPU unit cost. Along with the baseline price of rent and electricity, that means cloud vendors' unit costs aren't that much better.

I expect this would only translate to a 10 to 20% price drop.
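A sketch of where a ~$4K/month figure could come from. The per-vCPU-hour rate below is an assumption for illustration (roughly in line with entry-level ARM instance pricing at the time), not a number from the thread:

```python
# Hypothetical revenue estimate for one dual-socket 1U box sold as vCPUs.
cores_per_socket = 96
smt = 4
sockets = 2
vcpus = cores_per_socket * smt * sockets   # 768 vCPUs per 1U

rate_per_vcpu_hour = 0.0072   # assumed on-demand $/vCPU-hour
hours_per_month = 730

revenue = vcpus * rate_per_vcpu_hour * hours_per_month
print(f"~${revenue:,.0f}/month")
```

That lands near $4K/month at full, continuous utilization, which real fleets never achieve, so the commenter's figure is an upper bound.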


If they continue what they were doing for TX2 [1], you might get dual sockets in 1U.

Meaning 192 cores per U.

[1]: https://www.gigabyte.com/ARM-Server/R181-T90-rev-100


The chip has a CCPI (Cache Coherent Processor Interface) with 24 lanes @ 25Gb/s for 2-socket NUMA interconnect.

This technology came from Cavium (which bought the XLS/XLR/XLP/Vulcan designs from Avago; they had passed from Raza to NetLogic to Broadcom, and very briefly to Avago, before Cavium itself was bought by Marvell). I don't recall it being in their MIPS64-based Octeon II or III designs; I thought the XLP had some minimal-glue NUMA support, but I can't find anything with a quick search.

Minimal info at Wikichip: https://en.wikichip.org/wiki/cavium/ccpi


Reminds me of this article from 2011 that blew my mind: 12,500 cores for Pixar's Cars 2.

https://www.cnet.com/news/new-technology-revs-up-pixars-cars...



