Hacker News
Windows 10 vs. Linux Performance On AMD Threadripper 2990WX (phoronix.com)
234 points by vkazanov on Aug 14, 2018 | 98 comments



"Not only Youtubers test on Windows, virtually the whole tech reviewer community tested on Windows exclusively. And many concluded that Threadripper problems were due to memory problems, and only few blamed Windows for it."

https://www.phoronix.com/forums/forum/phoronix/latest-phoron...

Still, an over-2x increase in performance on Linux over Windows seems insane.


Windows is the first thing I blame ;-)

In this case, it looks like the thread scheduler is trying to do something clever that doesn't quite work well with this specific CPU. Maybe it's improperly localizing threads to cores or causing more cache evictions than it should. Without seeing performance counters it's hard to guess.


My guess would simply be the scheduler doesn't have knowledge of the NUMA architecture of the processor, and cannot therefore efficiently schedule tasks to avoid large memory access costs.


The firmware has long made the NUMA topology available to the OS through ACPI tables.
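
If you want to see what the kernel actually picked up from those tables, a minimal libnuma sketch along these lines prints the per-node memory sizes (this assumes the numactl/libnuma headers are installed; on a 2990WX two of the four nodes should report no local memory):

    /* Minimal sketch: print the NUMA topology the kernel parsed from the
       ACPI SRAT/SLIT tables. Build with: gcc numa_topo.c -lnuma */
    #include <numa.h>
    #include <stdio.h>

    int main(void) {
        if (numa_available() < 0) {
            puts("kernel reports no NUMA support");
            return 1;
        }
        for (int n = 0; n <= numa_max_node(); n++) {
            long long free_bytes;
            long long size = numa_node_size64(n, &free_bytes);
            printf("node %d: %lld MiB local memory\n", n, size >> 20);
        }
        return 0;
    }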


Has a NUMA topology like this ever been used with Windows? They could be misinterpreting it. They have quite a reputation for partial, buggy implementations of ACPI-related stuff that end up becoming de facto standards.


I think it's less a matter of misinterpreting the topology, and more that the Windows scheduler isn't prepared to handle the somewhat exotic topology; whereas the Linux scheduler seems to be more intelligent about handling the memory-less NUMA nodes.


Memory-less NUMA nodes are present on the Xeon Phi (all non-HBM is on that node), so that looks like a good explanation. I don't think Windows can handle it.


This sounds plausible, and one could confirm it by profiling the benchmark tools on Windows and Linux, looking specifically at remote page loads and the like.
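
On the Linux side, the cheapest first look is the per-node counters the kernel already keeps under /sys/devices/system/node. They count where pages were allocated rather than actual access traffic (for real remote-load counts you'd want hardware perf events), but dumping them before and after a run, roughly like this, already tells you something:

    /* Rough sketch: dump each node's NUMA allocation counters
       (numa_hit, numa_miss, numa_foreign, local_node, other_node, ...). */
    #include <stdio.h>

    int main(void) {
        for (int n = 0; ; n++) {
            char path[64];
            snprintf(path, sizeof(path),
                     "/sys/devices/system/node/node%d/numastat", n);
            FILE *f = fopen(path, "r");
            if (!f)
                break;              /* no more nodes */
            printf("--- node %d ---\n", n);
            char line[128];
            while (fgets(line, sizeof(line), f))
                fputs(line, stdout);
            fclose(f);
        }
        return 0;
    }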


Windows Server is designed to deal with NUMA systems and Windows 10, being a consumer OS, is not, so the comparison is outright faulty. The Phoronix reviewer even acknowledges that Windows Server benchmarks are planned for the future.


IIRC, they run on the same kernel with different licensing enforcement. And this is not a server CPU, in the same sense that Windows 10 is not a server OS: it's not marketed as such.


AMD engineers are the first thing that I blame. How did they not notice the bottleneck? Also, I am pretty sure that top engineers at companies like AMD and Nvidia have access to Windows source code so they can test performance and submit fixes/patches. AMD has less market share, so it's reasonable for Microsoft to focus their code on Intel CPUs unless AMD reports issues.

Considering that their terrible GPU drivers got me off AMD GPUs and into the waiting arms of Nvidia, color me not surprised.


I don't necessarily think so. Why does Linux work better? Does Linux do some magic stuff? Or is there some bug in the Windows kernel/scheduler? If so, then it's not AMD that's to blame.


Assuming this issue is because Windows wasn't really expecting to deal with memory-less NUMA nodes, I gotta say, I agree with you.

These companies have working business relationships with each other. I'm sure MS is fairly responsive on these sorts of things. AMD should probably have caught this in their internal development and getting Windows on board seems obvious?


There are reasons why it is called the Wintel alliance. Fortunately Linux dominates the server market now; hopefully this will give AMD some breathing space.


Never ascribe to conspiracy what can equally well be explained by stupidity... or in this case a broken scheduler.


Broken or optimized for a different set of processors?

Don't forget that AMD can submit changes directly to the Linux Kernel but has to request or help Microsoft to make changes to Windows.


We should also keep in mind that when it comes to kernel-level improvements, Linux is by far more agile than Windows. The Windows process scheduler hasn't had fairness for ages now. When was CFS added to Linux? A decade ago?


Linux isn't free of performance regressions with new architectures either. There were patches last year to better support Epyc's NUMA.

https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-n...

It just seems that Linux avoided any regressions this year but Microsoft didn't.


Oh I'm sure they're optimized for Intel, but that's easily explained by Intel's near-monopoly for the past 10 years. Now that AMD is gaining market share again we'll probably see this change. MS doesn't want to look bad in benchmarks.


We saw a similar situation when AMD released the Bulldozer line of CPUs, and it took a minute for Microsoft to release a hotfix to address the issue.

https://www.anandtech.com/show/5448/the-bulldozer-scheduling...

The reality is that Windows doesn't recognize the new CPU, so it's not using a scheduler optimized for it. Whereas AMD could have upstreamed any necessary changes for Linux to fully support it months ago.


I wouldn't call it a conspiracy. But why would M$ optimise for NUMA, or for AMD, when 90% of their use cases and customers have no use for it?


Because 10% of their user base is about the same size as a first-world country.


There's some additional interesting stuff in the comments. edwaleni points out that the Phoronix Test Suite uses a very outdated 7-Zip (16.02) and the current Windows client (18.05) includes a lot of changes since then.

https://www.phoronix.com/forums/forum/hardware/processors-me...


What an odd assortment of distributions to test. I would have expected the “Leading Distributions” according to LWN to appear: https://lwn.net/Distributions/#lead

But no Fedora, no Slackware, and, perhaps most perplexing, no Debian.


I can guess why he chose what he did. Ubuntu is still pretty much "the lead" distro for your average Linux Desktop user. Antergos and Tumbleweed are both Rolling distros and likely to be "bleeding edge" as far as any kernel changes, making them more likely to work with the recent AMD hardware.

Clear Linux is the odd one out. Larabel uses it a lot in his performance-testing articles. The distro is maintained by Intel, and they do some specific performance tuning to the kernel to work better with their chips. As a result, it usually edges out the others on some tests[0]. That's probably why he included it.

[0] https://www.phoronix.com/scan.php?page=article&item=ryzen-li...


Interesting to see OpenSUSE performing so well. It was one of my first distros way back in the late 90s (when it was just SuSE Linux) but it seems to have fallen by the wayside somewhat and I certainly haven't looked at it for a long time - I'm a full time Fedora user now.

I'm going to have to revisit it soon.


Used openSUSE for ~5 months and finally switched to Fedora.

- SUSE was sold a month ago and the current owner isn't in the IT business, so it will surely be sold again
- many small bugs
- the Red Hat community is much bigger and more active


> the current owner isn't in the IT business

What do you mean by that? I see several tech companies here: https://eqtventures.com/companies/


I wouldn't bother. It's still quite horrid.


I run openSUSE on three desktop machines at home (2x Tumbleweed, 1x Leap), and I am mostly very happy with it. Hardware support is relatively good, and the system is fairly stable. Tumbleweed breaks things sometimes, but I would say that is to be expected on a rolling release distro. Thanks to snapshots, I can always roll back to a working state if something does break.

To get a fully usable multimedia-capable desktop, one has to add a few package repositories, but that happens on other distros, too.


At least explain why you think it's horrid.


Personally, I tried it and had issues with package availability. It reminds me of RHEL without EPEL. Granted, there might be an equivalent that I didn't know about.

EDIT/PS: this was particularly unfortunate, because ironically I really liked the package manager itself and the rest of the system was pretty decent.


Have you looked at Packman (one of the more well-known unofficial repos), or OBS (https://build.opensuse.org/)?


I thought it has quite a bit of software but splits it into more repositories (and handles many repositories better):

https://software.opensuse.org/search?utf8=%E2%9C%93&q=steam


So WHY is Windows slower? What effect can the OS conceivably have on CPU benchmarks? Off the top of my head I can think of preemptive multitasking and memory allocation implementations (the latter shouldn't influence pure CPU benchmarks much, but it can influence 7-Zip results). Any better guesses?


There was a very good post about that a few years ago. It probably doesn't explain this specific case, but it is a fascinating read nonetheless.

>I Contribute to the Windows Kernel. We Are Slower Than Other Operating Systems. Here Is Why

http://blog.zorinaq.com/i-contribute-to-the-windows-kernel-w...


There's also the burden on Microsoft that nothing can break. Meaning: anything that used to work before (in an earlier incarnation of Windows) must continue to work in later versions. I don't think Linux has this burden. They are more free to break stuff and so more free to innovate.


I'm quite sure Linux has the same burden. Here is what Linus Torvalds has to say: https://lkml.org/lkml/2012/12/23/75


In Linux kernel development there is one really important rule: don't break userspace!

It's like the Windows burden you mentioned, but much more reasonable imho.


Thanks, that's really interesting, even though the author toned much of his initial satire down in a later update.


Curiously, nowadays (the article/original comment is from 2013) some Google employees voice similar criticism about their own engineering culture: building new things is rewarded, maintaining old things is not.


I guess it's an industry-wide problem. It would be illuminating to learn of any counter examples, where a big software company manages NOT to fall into this trap.


Just a hunch, but maybe you could try searching outside of SV.


This deserves its own post.



Scheduling threads is probably a big one, I imagine. If Windows schedules two threads from the same process that are working on a shared dataset across two different NUMA nodes, then memory latency can shoot up, from what I understand.

I've also noticed that Windows is very fond of passing hard-working threads around all the cores in turn, rather than letting them hog certain cores. Pretty sure that doesn't improve cache behavior etc.
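
For what it's worth, the usual workaround on Linux is to take that decision away from the scheduler and pin the hot threads yourself. A rough sketch with libnuma (Windows has analogous SetThreadGroupAffinity / VirtualAllocExNuma calls):

    /* Rough sketch: keep the calling thread on the CPUs of one NUMA node,
       so the scheduler can't bounce it to a node far from its memory.
       Build with: gcc pin.c -lnuma -lpthread */
    #define _GNU_SOURCE
    #include <numa.h>
    #include <pthread.h>
    #include <sched.h>

    static int pin_to_node(int node) {
        struct bitmask *cpus = numa_allocate_cpumask();
        int rc = numa_node_to_cpus(node, cpus);
        if (rc == 0) {
            cpu_set_t set;
            CPU_ZERO(&set);
            for (unsigned i = 0; i < cpus->size; i++)
                if (numa_bitmask_isbitset(cpus, i))
                    CPU_SET(i, &set);
            numa_set_preferred(node);  /* prefer this node's memory too */
            rc = pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
        }
        numa_free_cpumask(cpus);
        return rc;
    }

    int main(void) {
        if (numa_available() >= 0)
            pin_to_node(0);
        /* ... memory-bound work now stays local to node 0 ... */
        return 0;
    }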


You're running dozens of threads over 16+ cores that don't all have the same properties when accessing and sharing memory (speed, invalidation, permissions, ...).

The OS is in charge of deciding who runs where, and who shares a core with who.

So ... A lot.


I wonder whether Windows Server SKUs do differently, perhaps due to different scheduler optimization choices.


There is no difference, in fact they might be slower as they are on older kernels.


A combination of things (which probably also explains why some distros are good at one benchmark but less good at others)? Scheduler, no architecture-specific kernel code, etc.

Plus it's never just the OS: there's always platform-specific source code and the compiler. Non-optimal optimization options for the latter and you're screwed as well.


It might be blindly applying all the Intel brokenness (Spectre/Meltdown) patches to AMD CPUs.


I had to register to say this, but for anyone who is looking for a tech YouTuber who does Linux performance and compatibility testing and has good technical knowledge of CS, I cannot recommend "Level 1 Techs" enough. Their content is very high quality, and their host, Wendell, has in-depth knowledge of many aspects of CS, hardware, software, etc.

Their website[0] has links to their various Youtube[1] channels and their forum, which has a really neat community.

[0] - https://level1techs.com/

[1] - https://www.youtube.com/channel/UC4w1YQAJMWOz4qtxinq55LQ/


Interesting results. Anyone know why the difference between the different Linux distros? (Ubuntu in particular)


I believe the main impact comes from Ubuntu using Linux 4.15 and GCC 7.3, while the other distros tested used Linux 4.17 and GCC 8.1+.


Different Linux kernel config options, probably (things like NUMA balancing, the preemption model, timer frequency). Different options can drastically affect performance.


Compiler flags mostly.


Also different compiler versions; I would expect gcc 7 vs 8.1 vs 8.2 to make a difference.


I'm actually considering an update to my developer workstation. The 2950X with 16 cores would be at a nice price point, with 64 GB of RAM and NixOS with a modern kernel to drive that beast.

The only thing that bothers me is the energy usage. At 180W it costs quite a lot to run this, especially in Germany, where electricity is not cheap.


It only consumes 180W^w 319W at full load; when idling, the whole system draws 62W.

https://www.computerbase.de/2018-08/amd-ryzen-threadripper-2...


If you actually use all the threads, then the power usage is fairly efficient. You wouldn't say a 90W processor used a huge amount of energy. This is two processors on the same chip with Infinity Fabric tying them together. For what you get, I think it's pretty decent power consumption.


Yeah, German electricity prices are ridiculous.

At a more reasonable 10c per kWh that's 1.8 cents per hour and $157 a year assuming 100% usage 24/7, vs $1799 for the CPU. Hardly going to break the bank; looking at real-world costs vs another similar CPU, you might hit $100 worth of electricity over a 5-year lifespan.
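
(Spelling the arithmetic out: 180 W is 0.18 kW, so at $0.10/kWh that's 0.18 × 0.10 = $0.018 per hour, and 0.018 × 24 × 365 ≈ $158 per year at constant full load.)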


In Germany we care about sustainable life on this planet.


And that's why I'm considering the energy usage. If it costs too much, that in itself is a message to me to use less energy. For example, I was able to build my NAS to use only 20W, precisely because prices are so high.

Hopefully the investments in solar and wind will pay for themselves at some point and the price of energy will plummet.


It goes well beyond renewables. Even in Germany, wind and solar are just not that expensive; it's a combination of subsidies and failing to put aside money for decommissioning nuclear power plants.


I pay about 0.38€ per kWh and I'm thinking of 8 hours of usage per day: compiling heavy Rust and C++ projects, gaming at night. I could get my electricity price down to 0.24€ per kWh if I spent some time switching providers every year.

The 1950X (16-core) is now about 760€ here in Europe, and the upcoming 2950X (also 16-core) seems to land at the same or a slightly lower price point. At that level the price of electricity is already a much larger percentage of the purchase price, but the convenience of cutting compilation time might be worth it.

Of course, with so many cores it is a good idea to get 64 GB of RAM, which adds to the energy bill. A template-heavy C++ project will eat RAM for breakfast; with `make -j8` and 16 GB you will definitely swap.
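
(Rough numbers for the electricity, assuming the 180 W figure and 8 hours a day: 0.18 kW × 8 h × 365 ≈ 525 kWh per year, which is about 200€ at 0.38€/kWh or about 125€ at 0.24€/kWh; with the 319 W full-load figure mentioned above it's closer to 350€ a year.)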


Spending 4x as much on electricity for well under 1/3 as long means it's still well under $200 per year, before considering what your other CPU is going to cost you.

IMO the base price is a much larger consideration, as again you don't draw 180W in non-100% use situations, which is going to be 90+% of the time for personal coding use.


Why is ffmpeg faster on Windows? (With the exception of Clear Linux.)


Not only that, why is there such a huge difference between Clear Linux and Ubuntu/openSUSE?

Most video encoding, I assume, happens on Linux machines these days. If switching a distro can result in such drastic improvements, it can result in massive savings for the big players, like YouTube and Twitch.


If it's related to the thread scheduler, yes. If it's related to compiler flags, then the big sites probably have their own optimized builds of the encoders already.


Hard question to answer without knowing what compiler was used to build the binary, what version of the code was built, what build settings were used, etc. Video and audio encoding tools are the perfect example of software that will have wildly different performance characteristics depending on compiler optimizations, making comparisons between different operating systems a really unscientific process unless you use the same toolchain across all of them.


Are you reading the graph wrong? It says "Seconds, Less is Better".


Yes. Windows 10 took 6.27 seconds; with the exception of Clear Linux, all the Linuxes took 8+ seconds.


I seem to recall some similar results back when AMD first released their APUs with the shared FPU setup, and Windows barfed because of its scheduler. Everyone still blamed AMD for making a crap CPU design...


Strangely the article is basically useless on my iPhone. All the graphs go off the side of the screen so unless you rotate into landscape you can’t tell any difference in performance.

Very odd error for a benchmarking site.


Really? It looks great on my s9.


It looks fine other than the graphs being cut off.

Just some quirk of Safari vs Chrome I bet, but I’m surprised they hadn’t noticed/tested that case.


We had this debate with all the Ryzen CPUs already. This quirk of Windows isn't particular to Threadripper by any means; we debated this with the 1700/1800 CPUs.


I wonder what the performance would look like if someone got macOS running on it... Personally I'm not looking forward to another expensive Xeon-based Mac. Sure, it's not overpriced based on the hardware they're using. But there are much cheaper options to get the same performance, and only a small fraction of Mac Pro/iMac Pro users even need ECC RAM. The iMac design also has some inherent thermal issues which make it not the best choice for a high-performance desktop system to begin with.

So, all that considered, I'll probably be building a Hackintosh even if I have to pay the premium for an i9 instead of something Ryzen-based. Although I believe people have gotten first-gen Ryzen chips working, so it seems like a TR Hackintosh should be possible. I just can't sit around waiting multiple minutes every time I go to compile a decently sized mobile app (iOS dev being the only reason I have a Mac to begin with).


Sadly, no compiler benchmarks.


How about testing on Windows Server? Still the same issues?


TL;DR

It seems that Windows doesn't like NUMA architectures all that much... Unlike Linux where the new Threadripper rocks.


Did this ever change for macOS? When they still sold dual-socket Mac Pros, macOS was ~30% slower in some benchmarks compared to Windows on a similar machine, due to the lack of NUMA support.

After that, the Mac Pro was `innovated`, and since then Macs have only had uniform memory access, so I wonder if their developers ever solved this issue.

After the long neglect of their file system, this might be another part of Darwin that stays behind the competition for the next couple of years.


I think that would be the reason why Apple hasn't chosen AMD yet.

On the other hand, it has taken years, if not over a decade, of HPC and server work for Linux to get this fully tested and tuned. I wonder if Windows or Mac will ever be on that level.

Anyone know how good NUMA support is on FreeBSD, OpenBSD and DragonFly BSD?


For the state of NUMA on FreeBSD, it seems like drewg123 (and HN comments in general) is a good source: https://hn.algolia.com/?query=freebsd%20numa&sort=byPopulari...


Looks like the BSDs' current NUMA situation isn't very good at all.


The Threadrippers are still only single-socket. Honestly, with the way processors are going right now, I think multi-socket may become a thing of the past.


They are single-socket, but the arrangement of their CCXs means that we have NUMA even on a single-socket system. AnandTech has some info: https://www.anandtech.com/show/11697/the-amd-ryzen-threadrip...

This results in a gaming mode which disables some cores to achieve higher fps in games that are not NUMA-aware. AFAIR, this is even worse with the new 32-core CPUs.


Multiple cores connected with Infinity Fabric means actual multiple CPUs. NUMA is real, and no, you don't need multiple sockets for multiple CPUs.


I think HN hammered the site. It's down now.


Up for me; Reddit linked to it too.


Looks like Microsoft has been wasting time on stupid UI changes and not enough on multi-core performance.


Forget the 32-core processor; these guys installed arch? Hardcore.


I know you're joking, but they used Antergos which is a snap to install.


Excepting laptops w/ Nvidia Optimus stuff & power management. Easier than stock, but still arch. Thank goodness for the arch wiki.


TBF, Optimus and power stuff aren't too much easier on a distro like Ubuntu (granted, they have come a long way). I usually end up having to look it up in the Arch wiki anyhow. Thank god for that site.


I don't have a Windows machine. But I think Threadripper must have been targeted at Windows, since most desktop users are Windows users after all.


The CPU in question--the 2990WX--has the added "W" for "workstation", whereas the lower-core-count parts (which don't seem to be affected by the Windows scheduling issue) do not have the "W" designation.

Most TR users I know are running Linux or hypervisors.


Definitely. Not a gaming box; more for pros.



