Hacker News
Windows 10 vs. Linux Performance On AMD Threadripper 2990WX (phoronix.com)
234 points by vkazanov on Aug 14, 2018 | 98 comments



"Not only Youtubers test on Windows, virtually the whole tech reviewer community tested on Windows exclusively. And many concluded that Threadripper problems were due to memory problems, and only few blamed Windows for it."

https://www.phoronix.com/forums/forum/phoronix/latest-phoron...

Still, an over-2x increase in performance on Linux over Windows seems insane.


Windows is the first thing I blame ;-)

In this case, it looks like the thread scheduler is trying to do something clever that doesn't quite work well with this specific CPU. Maybe it's improperly localizing threads to cores or causing more cache evictions than it should. Without seeing performance counters it's hard to guess.


My guess would simply be the scheduler doesn't have knowledge of the NUMA architecture of the processor, and cannot therefore efficiently schedule tasks to avoid large memory access costs.


The firmware has long made the NUMA topology available to the OS through ACPI tables.
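
If you want to see what the kernel actually picked up from those tables, a minimal libnuma sketch along these lines prints the per-node memory sizes (this assumes the numactl/libnuma headers are installed; on a 2990WX two of the four nodes should report no local memory):

    /* Minimal sketch: print the NUMA topology the kernel parsed from the
       ACPI SRAT/SLIT tables. Build with: gcc numa_topo.c -lnuma */
    #include <numa.h>
    #include <stdio.h>

    int main(void) {
        if (numa_available() < 0) {
            puts("kernel reports no NUMA support");
            return 1;
        }
        for (int n = 0; n <= numa_max_node(); n++) {
            long long free_bytes;
            long long size = numa_node_size64(n, &free_bytes);
            printf("node %d: %lld MiB local memory\n", n, size >> 20);
        }
        return 0;
    }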


Has a NUMA topology like this ever been used with Windows? They could be misinterpreting it. They have quite a reputation for partial, buggy implementations of ACPI-related stuff that end up becoming de facto standards.


I think it's less a matter of misinterpreting the topology, and more that the Windows scheduler isn't prepared to handle the somewhat exotic topology; whereas the Linux scheduler seems to be more intelligent about handling the memory-less NUMA nodes.


Memory-less NUMA nodes are present on the Xeon Phi (all non-HBM is on that node), so that looks like a good explanation. I don't think Windows can handle it.


This sounds plausible, and one could confirm it by profiling the benchmark tools on Windows and Linux, looking specifically at remote page loads and the like.
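
On the Linux side, the cheapest first look is the per-node counters the kernel already keeps under /sys/devices/system/node. They count where pages were allocated rather than actual access traffic (for real remote-load counts you'd want hardware perf events), but dumping them before and after a run, roughly like this, already tells you something:

    /* Rough sketch: dump each node's NUMA allocation counters
       (numa_hit, numa_miss, numa_foreign, local_node, other_node, ...). */
    #include <stdio.h>

    int main(void) {
        for (int n = 0; ; n++) {
            char path[64];
            snprintf(path, sizeof(path),
                     "/sys/devices/system/node/node%d/numastat", n);
            FILE *f = fopen(path, "r");
            if (!f)
                break;              /* no more nodes */
            printf("--- node %d ---\n", n);
            char line[128];
            while (fgets(line, sizeof(line), f))
                fputs(line, stdout);
            fclose(f);
        }
        return 0;
    }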


Windows Server is designed to deal with NUMA systems and Windows 10, being a consumer OS, is not, so the comparison is outright faulty. The Phoronix reviewer even acknowledges that Windows Server benchmarks are planned for the future.


IIRC, they run on the same kernel with different licensing enforcement. And this is not a server CPU, in the same sense that Windows 10 is not a server OS: it's not marketed as such.


AMD engineers are the first thing that I blame. How did they not notice the bottleneck? Also, I am pretty sure that top engineers at companies like AMD and Nvidia have access to Windows source code so they can test performance and submit fixes/patches. AMD has less market share, so it's reasonable for Microsoft to focus their code on Intel CPUs unless AMD reports issues.

Considering that their terrible GPU drivers got me off AMD GPUs and into the waiting arms of Nvidia, color me not surprised.


I don't necessarily think so. Why does Linux work better? Does Linux do some magic stuff? Or is there some bug in the Windows kernel/scheduler? If so, then it's not AMD that's to blame.


Assuming this issue is because Windows wasn't really expecting to deal with memory-less NUMA nodes, I gotta say, I agree with you.

These companies have working business relationships with each other. I'm sure MS is fairly responsive on these sorts of things. AMD should probably have caught this in their internal development and getting Windows on board seems obvious?


There are reasons why it is called the Wintel alliance. Fortunately Linux dominates the server market now; hopefully this will give AMD some breathing space.


Never ascribe to conspiracy what can equally well be explained by stupidity... or in this case a broken scheduler.


Broken or optimized for a different set of processors?

Don't forget that AMD can submit changes directly to the Linux Kernel but has to request or help Microsoft to make changes to Windows.


We should also keep in mind that when it comes to kernel-level improvements, Linux is by far more agile than Windows. The Windows process scheduler hasn't had fairness for ages now. When was CFS added to Linux? A decade ago?


Linux isn't free of performance regressions with new architectures either. There were patches last year to better support Epyc's NUMA.

https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-n...

It just seems that Linux avoided any regressions this year but Microsoft didn't.


Oh I'm sure they're optimized for Intel, but that's easily explained by Intel's near-monopoly for the past 10 years. Now that AMD is gaining market share again we'll probably see this change. MS doesn't want to look bad in benchmarks.


We saw a similar situation when AMD released the Bulldozer line of CPUs, and it took a minute for Microsoft to release a hotfix to address the issue.

https://www.anandtech.com/show/5448/the-bulldozer-scheduling...

The reality is that Windows doesn't recognize the new CPU, so it's not using a scheduler optimized for it. Whereas AMD could have upstreamed any necessary changes for Linux to fully support it months ago.


I wouldn't call it a conspiracy. But why would M$ optimise for NUMA, or for AMD, when 90% of their use cases and customers have no use for it?


Because 10% of their user base is about the same size as a first-world country.


There's some additional interesting stuff in the comments. edwaleni points out that the Phoronix Test Suite uses a very outdated 7-Zip (16.02) and the current Windows client (18.05) includes a lot of changes since then.

https://www.phoronix.com/forums/forum/hardware/processors-me...


What an odd assortment of distributions to test. I would have expected the “Leading Distributions” according to LWN to appear: https://lwn.net/Distributions/#lead

But no Fedora, no Slackware, and, perhaps most perplexing, no Debian.


I can guess why he chose what he did. Ubuntu is still pretty much "the lead" distro for your average Linux Desktop user. Antergos and Tumbleweed are both Rolling distros and likely to be "bleeding edge" as far as any kernel changes, making them more likely to work with the recent AMD hardware.

Clear Linux is the odd one out. Larabel uses it a lot in his performance-testing articles. The distro is maintained by Intel, and they do some specific performance tuning to the kernel to work better with their chips. As a result, it usually edges out the others on some tests[0]. That's probably why he included it.

[0] https://www.phoronix.com/scan.php?page=article&item=ryzen-li...


Interesting to see OpenSUSE performing so well. It was one of my first distros way back in the late 90s (when it was just SuSE Linux) but it seems to have fallen by the wayside somewhat and I certainly haven't looked at it for a long time - I'm a full time Fedora user now.

I'm going to have to revisit it soon.


Used openSUSE for ~5 months and finally switched to Fedora.

- SUSE was sold a month ago and the current owner isn't in the IT business, so it will surely be sold again
- many small bugs
- the Red Hat community is much bigger and more active


> the current owner isn't in the IT business

What do you mean by that? I see several tech companies here: https://eqtventures.com/companies/


I wouldn't bother. It's still quite horrid.


I run openSUSE on three desktop machines at home (2x Tumbleweed, 1x Leap), and I am mostly very happy with it. Hardware support is relatively good, and the system is fairly stable. Tumbleweed breaks things sometimes, but I would say that is to be expected on a rolling release distro. Thanks to snapshots, I can always roll back to a working state if something does break.

To get a fully usable multimedia-capable desktop, one has to add a few package repositories, but that happens on other distros, too.


At least explain why you think it's horrid.


Personally, I tried it and had issues with package availability. It reminds me of RHEL without EPEL. Granted, there might be an equivalent that I didn't know about.

EDIT/PS: this was particularly unfortunate, because ironically I really liked the package manager itself and the rest of the system was pretty decent.


Have you looked at Packman (one of the more well-known unofficial repos), or OBS (https://build.opensuse.org/)?


I thought it has quite a bit of software but splits it into more repositories (and handles many repositories better):

https://software.opensuse.org/search?utf8=%E2%9C%93&q=steam


So WHY is Windows slower? What effect can the OS conceivably have on CPU benchmarks? Off the top of my head I can think of preemptive multitasking and memory allocation implementations (the latter shouldn't influence pure CPU benchmarks much, but it can influence 7-Zip results). Any better guesses?


There was a very good post about that a few years ago. It probably doesn't explain this specific case, but it is a fascinating read nonetheless.

>I Contribute to the Windows Kernel. We Are Slower Than Other Operating Systems. Here Is Why

http://blog.zorinaq.com/i-contribute-to-the-windows-kernel-w...


There's also the burden on Microsoft that nothing can break. Meaning: anything that used to work before (in an earlier incarnation of Windows) must continue to work in later versions. I don't think Linux has this burden. They are more free to break stuff and so more free to innovate.


I'm quite sure Linux has the same burden. Here is what Linus Torvalds has to say: https://lkml.org/lkml/2012/12/23/75


In Linux kernel development there is one really important rule: don't break userspace!

It's like the Windows burden you mentioned, but much more reasonable imho.


Thanks, that's really interesting, even though the author toned much of his initial satire down in a later update.


Curiously, nowadays (the article/original comment is from 2013) some Google employees voice similar criticism about their own engineering culture: building new things is rewarded, maintaining old things is not.


I guess it's an industry-wide problem. It would be illuminating to learn of any counter examples, where a big software company manages NOT to fall into this trap.


Just a hunch, but maybe you could try searching outside of SV.


This deserves its own post.



Scheduling threads is probably a big one, I imagine. If Windows schedules two threads from the same process that are working on a shared dataset across two different NUMA nodes, then memory latency can shoot up, from what I understand.

I've also noticed that Windows is very fond of passing hard-working threads around all the cores in turn, rather than letting them hog certain cores. Pretty sure that doesn't improve cache behavior etc.
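
For what it's worth, the usual workaround on Linux is to take that decision away from the scheduler and pin the hot threads yourself. A rough sketch with libnuma (Windows has analogous SetThreadGroupAffinity / VirtualAllocExNuma calls):

    /* Rough sketch: keep the calling thread on the CPUs of one NUMA node,
       so the scheduler can't bounce it to a node far from its memory.
       Build with: gcc pin.c -lnuma -lpthread */
    #define _GNU_SOURCE
    #include <numa.h>
    #include <pthread.h>
    #include <sched.h>

    static int pin_to_node(int node) {
        struct bitmask *cpus = numa_allocate_cpumask();
        int rc = numa_node_to_cpus(node, cpus);
        if (rc == 0) {
            cpu_set_t set;
            CPU_ZERO(&set);
            for (unsigned i = 0; i < cpus->size; i++)
                if (numa_bitmask_isbitset(cpus, i))
                    CPU_SET(i, &set);
            numa_set_preferred(node);  /* prefer this node's memory too */
            rc = pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
        }
        numa_free_cpumask(cpus);
        return rc;
    }

    int main(void) {
        if (numa_available() >= 0)
            pin_to_node(0);
        /* ... memory-bound work now stays local to node 0 ... */
        return 0;
    }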


You're running dozens of threads over 16+ cores that don't all have the same properties when accessing and sharing memory (speed, invalidation, permissions, ...).

The OS is in charge of deciding who runs where, and who shares a core with who.

So ... A lot.


I wonder whether Windows Server SKUs do differently, perhaps due to different scheduler optimization choices.


There is no difference, in fact they might be slower as they are on older kernels.


A combination of things (which probably also explains why some distros are good at one benchmark but less good at others)? Scheduler, no architecture-specific kernel code, etc.

Plus it's never just the OS: there's always platform-specific source code and the compiler. Non-optimal optimization options for the latter and you're screwed as well.


It might be blindly applying all the Intel brokenness (Spectre/Meltdown) patches to AMD CPUs.


I had to register to say this, but for anyone who is looking for a tech YouTuber who does Linux performance and compatibility testing and has good technical knowledge of CS, I cannot recommend "Level 1 Techs" enough. Their content is very high quality, and their host, Wendell, has in-depth knowledge of many aspects of CS, hardware, software, etc.

Their website[0] has links to their various Youtube[1] channels and their forum, which has a really neat community.

[0] - https://level1techs.com/

[1] - https://www.youtube.com/channel/UC4w1YQAJMWOz4qtxinq55LQ/


Interesting results. Anyone know why the difference between the different Linux distros? (Ubuntu in particular)


I believe the main impact comes from Ubuntu using Linux 4.15 and GCC 7.3, while the other distros tested used Linux 4.17 and GCC 8.1+.


Different Linux kernel config options, probably (things like NUMA balancing, the preemption model, timer frequency). Different options can drastically affect performance.


Compiler flags mostly.


Also different compiler versions; I would expect gcc 7 vs 8.1 vs 8.2 to make a difference.


I'm actually considering an update to my developer workstation. The 2950X with 16 cores would be at a nice price point, with 64 GB of RAM and NixOS with a modern kernel to drive that beast.

The only thing that bothers me is the energy usage. At 180W it costs quite a lot to run this, especially in Germany, where electricity is not cheap.


It only consumes 180W^w 319W at full load; when idling, the whole system draws 62W.

https://www.computerbase.de/2018-08/amd-ryzen-threadripper-2...


If you actually use all the threads, then the power usage is fairly efficient. You wouldn't say a 90W processor used a huge amount of energy. This is two processors on the same chip with Infinity Fabric tying them together. For what you get, I think it's pretty decent power consumption.


Yeah, German electricity prices are ridiculous.

At a more reasonable 10c per kWh that's 1.8 cents per hour and $157 a year assuming 100% usage 24/7, vs $1799 for the CPU. Hardly going to break the bank; looking at real-world costs vs another similar CPU, you might hit $100 worth of electricity over a 5-year lifespan.
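
(Spelling the arithmetic out: 180 W is 0.18 kW, so at $0.10/kWh that's 0.18 × 0.10 = $0.018 per hour, and 0.018 × 24 × 365 ≈ $158 per year at constant full load.)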


In Germany we care about sustainable life on this planet.


And that's why I'm considering the energy usage. If it costs too much, that in itself is a message to me to use less energy. For example, I was able to build my NAS to use only 20W, precisely because prices are so high.

Hopefully the investments in solar and wind will pay for themselves at some point and the price of energy will plummet.


It goes well beyond renewables. Even in Germany, wind and solar are just not that expensive; it's a combination of subsidies and failing to put aside money for decommissioning nuclear power plants.


I pay about 0.38€ per kWh and I'm thinking of 8 hours of usage per day: compiling heavy Rust and C++ projects, gaming at night. I could get my electricity price down to 0.24€ per kWh if I spent some time switching providers every year.

The 1950X (16-core) is now about 760€ here in Europe, and the upcoming 2950X (also 16-core) seems to land at the same or a slightly lower price point. At that level the price of electricity is already a much larger percentage of the purchase price, but the convenience of cutting compilation time might be worth it.

Of course, with so many cores it is a good idea to get 64 GB of RAM, which adds to the energy bill. A template-heavy C++ project will eat RAM for breakfast; with `make -j8` and 16 GB you will definitely swap.
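
(Rough numbers for the electricity, assuming the 180 W figure and 8 hours a day: 0.18 kW × 8 h × 365 ≈ 525 kWh per year, which is about 200€ at 0.38€/kWh or about 125€ at 0.24€/kWh; with the 319 W full-load figure mentioned above it's closer to 350€ a year.)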


Spending 4x as much on electricity for well under 1/3 as long means it's still well under $200 per year, before considering what your other CPU is going to cost you.

IMO the base price is a much larger consideration, as again you don't draw 180W in non-100% use situations, which is going to be 90+% of the time for personal coding use.


Why is ffmpeg faster on Windows? (With the exception of Clear Linux.)


Not only that, why is there such a huge difference between Clear Linux and Ubuntu/openSUSE?

Most video encoding, I assume, happens on Linux machines these days. If switching a distro can result in such drastic improvements, it can result in massive savings for the big players, like YouTube and Twitch.


If it's related to the thread scheduler, yes. If it's related to compiler flags, then the big sites probably have their own optimized builds of the encoders already.


Hard question to answer without knowing what compiler was used to build the binary, what version of the code was built, what build settings were used, etc. Video and audio encoding tools are the perfect example of software that will have wildly different performance characteristics depending on compiler optimizations, making comparisons between different operating systems a really unscientific process unless you use the same toolchain across all of them.


Are you reading the graph wrong? It says "Seconds, Less is Better".


Yes. Windows 10 took 6.27 seconds; with the exception of Clear Linux, all the Linuxes took 8+ seconds.


I seem to recall some similar results back when AMD first released their APUs with the shared FPU setup, and Windows barfed because of its scheduler. Everyone still blamed AMD for making a crap CPU design...


Strangely the article is basically useless on my iPhone. All the graphs go off the side of the screen so unless you rotate into landscape you can’t tell any difference in performance.

Very odd error for a benchmarking site.


Really? It looks great on my s9.


It looks fine other than the graphs being cut off.

Just some quirk of Safari vs Chrome I bet, but I’m surprised they hadn’t noticed/tested that case.


We had this debate with all the Ryzen CPUs already. This quirk of Windows isn't particular to Threadripper by any means; we debated this with the 1700/1800 CPUs.


I wonder what the performance would look like if someone got macOS running on it... Personally I'm not looking forward to another expensive Xeon-based Mac. Sure, it's not overpriced based on the hardware they're using. But there are much cheaper options to get the same performance, and only a small fraction of Mac Pro/iMac Pro users even need ECC RAM. The iMac design also has some inherent thermal issues which make it not the best choice for a high-performance desktop system to begin with.

So, all that considered, I'll probably be building a Hackintosh even if I have to pay the premium for an i9 instead of something Ryzen-based. Although I believe people have gotten first-gen Ryzen chips working, so it seems like a TR Hackintosh should be possible. I just can't sit around waiting multiple minutes every time I go to compile a decently sized mobile app (iOS dev being the only reason I have a Mac to begin with).


Sadly, no compiler benchmarks.


How about testing on Windows Server? Still the same issues?


TL;DR

It seems that Windows doesn't like NUMA architectures all that much... Unlike Linux where the new Threadripper rocks.


Did this ever change for macOS? When they still sold dual-socket Mac Pros, macOS was ~30% slower in some benchmarks compared to Windows on a similar machine, due to the lack of NUMA support.

After that, the Mac Pro was `innovated`, and since then Macs have only had uniform memory access, so I wonder if their developers ever solved this issue.

After the long neglect of their file system, this might be another part of Darwin that stays behind the competition for the next couple of years.


I think that would be the reason why Apple hasn't chosen AMD yet.

On the other hand, it has taken years, if not over a decade, of HPC and server work for Linux to get this fully tested and tuned. I wonder if Windows or Mac will ever be on that level.

Anyone know how good NUMA support is on FreeBSD, OpenBSD and DragonFly BSD?


For the state of NUMA on FreeBSD, it seems like drewg123 (and HN comments in general) is a good source: https://hn.algolia.com/?query=freebsd%20numa&sort=byPopulari...


Looks like the BSDs' current NUMA situation isn't very good at all.


The Threadrippers are still only single-socket. Honestly, with the way processors are going right now, I think multi-socket may become a thing of the past.


They are single-socket, but the arrangement of their CCXs means that we have NUMA even on a single-socket system. AnandTech has some info: https://www.anandtech.com/show/11697/the-amd-ryzen-threadrip...

This results in a gaming mode which disables some cores to achieve higher fps in games that are not NUMA-aware. AFAIR, this is even worse with the new 32-core CPUs.


Multiple cores connected with Infinity Fabric means actual multiple CPUs. NUMA is real, and no, you don't need multiple sockets for multiple CPUs.


I think HN hammered the site. It's down now.


Up for me; Reddit linked to it too.


Looks like Microsoft has been wasting time on stupid UI changes and not enough on multi-core performance.


Forget the 32-core processor; these guys installed arch? Hardcore.


I know you're joking, but they used Antergos which is a snap to install.


Excepting laptops w/ Nvidia Optimus stuff & power management. Easier than stock, but still arch. Thank goodness for the arch wiki.


TBF, Optimus and power stuff aren't too much easier on a distro like Ubuntu (granted, they have come a long way). I usually end up having to look it up in the Arch wiki anyhow. Thank god for that site.


I don't have a Windows machine. But I think Threadripper must have been targeted at Windows, since most desktop users are Windows users after all.


The CPU in question--the 2990WX--has the added "W" for "workstation", whereas the lower-core-count parts (which don't seem to be affected by the Windows scheduling issue) do not have the "W" designation.

Most TR users I know are running Linux or hypervisors.


Definitely. Not a gaming box; more for pros.



