The AMD Radeon Graphics Driver Makes Up Roughly 10.5% of the Linux Kernel (phoronix.com)
707 points by akvadrako on Oct 11, 2020 | 462 comments



My two cents as a kernel developer: the driver is pretty abominable compared to the code quality of most of the rest of the kernel.

However, having a GPU driver that is not just open source but in the upstream Linux kernel is a gigantic deal. Kernel development takes a long time; we have millions of lines in the amdgpu driver, and if every one of those had gone through the usual lengthy review process, the driver would never have made it into the tree.

So it's a necessary evil. I do wish they would clean it up, though. I once sent a fix to amdgpu that was the same change applied to 3 different files that were largely duplicates of each other. That kind of thing wouldn't fly anywhere else in the kernel.


I would also mention that GPUs are a GIANT abstraction, and since they are rev'd faster than arguably any other hardware in a system, there are abstractions layered on top of that for the families and models of GPUs too.

Another way of looking at it is -- I started playing with openwrt for a relatively small router, with 5 ports plus wifi.

I was amazed at not only the amount of openwrt code required to support the different router families and the different router models, but at the sheer amount of stuff turned on by default in the kernel just in case I might need to load a module for some obscure feature or package. I assume the same goes for a gpu driver both at the source level and in the kernel.


Yeah, AMD/NV typically do 2-3 chips per year. That complexity adds up pretty quickly when backwards/forwards compatibility is fairly strict and the inputs/abstractions are not particularly well defined or behaved.


I'm wondering at which point it would make sense to split the driver into multiple device family drivers instead of lumping it all together into a mess of unmaintainable abstractions.


At the point where people are making jokes about the Linux kernel making up X% of the GPU driver.


The opposite. The lack of abstractions - sheer industrial copy-pasta code - is the problem.


I’m not saying it’s true here, but duplication is always better than the wrong abstraction. Seems to me that if each graphics card is different enough, there’s probably lots to duplicate.


In software development absolutes are always dangerous.


Apart from absolutes about absolutes.


Nonsense. This duplication is not maintainable and needs massive amounts of memory. Proper abstraction adds a few if/else branches in the data and is about 20x smaller. You can even read and understand that, e.g. what changed with this HW upgrade. There's no chance of that with duplicated blobs of structs and enums.
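To make the data-driven argument concrete, here's a minimal hypothetical sketch in C (every name and field below is invented, not taken from amdgpu): the per-generation differences live in one table, and the handful of genuinely divergent steps become a conditional instead of a forked copy of a file.

    /* Hypothetical sketch: per-generation differences captured as data
     * rather than as duplicated source files. All names are invented. */
    #include <stdbool.h>
    #include <stdint.h>

    struct gpu_gen_params {
        const char *name;
        uint32_t    max_shader_engines;
        uint32_t    vram_bus_width;
        bool        has_doorbell_v2;   /* one flag instead of a forked file */
    };

    static const struct gpu_gen_params gen_table[] = {
        { "gen6", 2, 256, false },
        { "gen7", 4, 384, false },
        { "gen8", 4, 512, true  },
    };

    /* One shared init path: the few real differences become a table
     * lookup plus an if/else, instead of three near-identical copies. */
    static void init_gpu(const struct gpu_gen_params *p)
    {
        /* ... common setup for every generation ... */
        if (p->has_doorbell_v2) {
            /* the one genuinely different step for newer parts */
        }
    }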


> I’m not saying it’s true

Yes, I did caveat my suggestion. Why not submit a fix if it's so simple ;-)


If the Radeon driver is in the kernel now, then maybe at some point someone other than AMD will pick it up and start cleaning up excessive copy paste code.

Assuming they play nice with the community, it could be a huge benefit to AMD in the long run.

Still, 2 million lines is a massive amount of code to start working on.


Unlikely, unless that entity has all the supported hardware at hand and ready for automated tests. Refactoring without a thorough test harness is ref*toring.


Isn't that what Google are hoping to do with Fuchsia, to make a next generation of Android that's not dependent on device drivers in the kernel?


I can't see how they'd be able to achieve that, unless you mean simply that the device drivers would run in userland.


Reinventing QNX will always be cutting edge


Are there still workarounds for specific games/programs inside the driver?


You can see all the workarounds used in Mesa here: https://gitlab.freedesktop.org/mesa/mesa/-/blob/master/src/u...

Proprietary drivers (especially Nvidia's) most likely have lots of similar game-specific workarounds and optimizations (even going as far as overriding shaders in games with better ones they wrote). [1]

1: https://www.gamedev.net/forums/topic/666419-what-are-your-op...


Yes. In fact, that's essentially what the drivers are.


Leaving aside arguments about "what the drivers are", the kernel driver being discussed here generally doesn't have or need that kind of thing. The user-space drivers which talk to the kernel drivers are under the Mesa umbrella as part of Gallium for OpenGL and Direct3D support (e.g. https://github.com/mesa3d/mesa/tree/master/src/gallium/drive...) or as a standalone driver for Vulkan support (https://github.com/mesa3d/mesa/tree/master/src/amd/vulkan). That said, I haven't seen many app-specific hacks in the open source drivers, even in the user-space code.

If anyone wants to learn more about lower-level aspects of GPUs, the Vulkan driver code I linked is one of the best places to start. It directly implements the Vulkan API on one end and talks to the kernel drivers on the other end, so it's relatively easy to follow if you're a systems programmer with an API-level understanding of graphics. Just pick a Vulkan function of your choice and start tracing through the code, e.g. vkCmdDraw: https://github.com/mesa3d/mesa/blob/master/src/amd/vulkan/ra.... The Vulkan driver calls into some of the low-level radeonsi code I linked from the Gallium tree but it isn't a Gallium-based driver, so you don't have to deal with those extra layers of abstraction.


> That said, I haven't seen many app-specific hacks in the open source drivers, even in the user-space code.

They are enabled via driconf [0]. Not nearly as many as what I imagine you'd find in the proprietary Windows drivers though.

[0] https://github.com/mesa3d/mesa/blob/master/src/util/00-mesa-...



I understand how big a deal this is and want to buy an AMD card for my next PC, just to support them, but is the driver actually good? Ie, is support for AMD cards on par with Windows?

The Nvidia driver is crappy, doesn't support Optimus, etc, but at least I haven't had any problems with it for as long as I've used it.


I bought an AMD GPU specifically for use with my Linux workstation and haven't regretted it. Perhaps I simply had bad luck with specific nVidia cards, but the AMD driver is stable in a way the nVidia driver simply never was, especially w/ respect to GPU accelerated desktop environments and screen capture utilities. The only change I made was to switch Arch over to the LTS kernel, as the upstream kernel in Arch isn't quite as battle hardened, and did occasionally require a rollback. That's not something that's likely to affect any other distro though, it's a side effect of Arch's bleeding edge nature.

Anecdotal data and all that. I'm on a Radeon VII, pretty darn solid, will probably continue to choose AMD cards in the future. Wish the Windows driver were a bit more stable, and it's... frankly weird to be saying that in comparison to the Linux driver for the same card.


I have had a similar experience with my laptop that uses a Ryzen 5 Pro 2500U: it would crash once every couple of days under Windows, but no such issue crops up using mainline drivers on Debian.


Yeah, from what I understand the difference in driver stability between Nvidia and AMD on Linux is exactly the reverse of their relationship on Windows.


>Wish the Windows driver were a bit more stable

I have an AMD 5700XT in my Windows games machine, and the driver is an absolute travesty. And looking over the installed files is a horror show. Qt5WebEngineCore.dll and avcodec-58.dll, because a browser engine and ffmpeg are essential in a device driver. And why does FacebookClient.exe exist? Fuck knows.


It's even worse with NVIDIA - they ship an entire custom NodeJS with their drivers.

Also don't forget that these are user-space apps that are simply bundled with the driver but not necessarily part of it. Qt5WebEngineCore.dll is most likely used by the UI portion of the driver (settings dialogs, radeon software etc.), same with the ffmpeg dll and the facebook client.

NVIDIA does the exact same btw. - see [1]

[1] https://www.ghacks.net/2020/03/13/nv-updater-nvidia-driver-u...


I think that crap is only included if you install "Geforce Experience". Nvidia parted off their garbage into an optional component, while AMD forces you to install it.

For anyone using Nvidia on Windows, here's a useful tool to carve out most of the trash from the driver prior to installing.

https://www.techpowerup.com/download/techpowerup-nvcleanstal...


Still nothing compared to typical RGB control software, which on Windows only runs while you're actively logged in (it stops when locking the screen/desktop) instead of being a tight/light service that uses the "last known" config and gets updates from the desktop GUI. Let alone the painfully missing technical docs or support for Linux.


Does AMD make you sign in to open their locally-installed driver utility? Nvidia seems to have thought it was a good idea and went ahead and did that a year or so ago...


No, and neither does Nvidia. "Geforce Experience" is not the driver. It's just bloatware nobody needs.


I have the same card and have issues with locking up and crashing in both Windows and Linux. It looks like the kernel in Ubuntu 20.10 might have a fix for some of the issues.


I remember back when AMD had both closed and open drivers and, having trouble with the proprietary drivers, I switched to the OSS version. It was NIGHT AND DAY. Games that would crash and had weird oddities now ran smooth with higher frame rates. A lot of weird video issues, especially those that come from running more than one X server or Xnest, all went away.

I would only use AMD cards in my Linux boxes. The nvidia drivers/cards pale in comparison.


At least on par. That said, the AMD driver on Windows is notoriously crap compared to Nvidia.

I'm hoping AMD's next gen turns out to be competitive with the RTX3000 series for my next GPU for the same reason.


That is really interesting, I haven't had an AMD card for >10 years now and would really like to be rid of Nvidia due to the closed source drivers. How is suspend/resume? Thinking about canceling my pre-order queue with EVGA and getting a 6xxx card.


For system integration stuff like suspend/resume, display hotplug and resolution changes, etc, the open-source radeon driver is good, and probably the best option on linux. The 3D accel is not bad (but not as good as nvidia).

However, don't expect a new Radeon GPU to be well supported on day of release, expect 1 kernel release cycle until it basically works, and one more until it has most of the bugs ironed out, and then wait until your favorite distro gets that kernel. So you're looking at 3 to 9 months depending on what distro you use.

I'm personally going to be looking for people selling their RX 5700, to replace my RX 480 ...


Agree mostly, but the best option under Linux seems to be Intel graphics (or at least it was until a few years ago) - arguably not beefy enough for some things, but regarding supported features, stability and power consumption it's the best-supported mainstream GPU in the Linux kernel.

Intel simply has no closed source driver for Linux. New hardware is often supported/merged before it is even sold. AMD is trying the same, but not there yet.


The Intel i915 driver STILL crashes my system regularly even on the latest kernels... I have a skylake i5-2600k and the iGPU is absolute dogshit. Not sure if it's a hardware or driver issue but it still hasn't been sorted out after all these years.

Typically the entire system will freeze (speakers will continue to play whatever was in the short audio buffer - pretty awful) for 10-15s, then the driver will detect the hang and reboot the iGPU. Happens much more frequently (every ~15m) when using more graphically intense programs. I can't use blender because sometimes when it hangs it won't reset and requires a full reboot.

There are dozens of issues about it and related problems in Intel's drm fork of the kernel [0]. I (finally) posted a bug report about it months ago since it seemed to have gotten worse after 5.4 but never heard back from them.

All this to say - be wary of Intel graphics on linux.

[0] https://gitlab.freedesktop.org/drm/intel/-/issues


Ever since kernel 5.7 was released my i7-5500 will not boot. (Well it will boot with “nomodeset” option but then X doesn’t work so not very useful.) It’s still not fixed in 5.9.


Wouldn't even say that, I've experienced regressions/bugs on intel drivers for laptops a few times.

In general, it's kind of a crapshoot no matter which way you go, and expect pain if the gpu chipset is less than a year old.


> The 3D accel is not bad (but not as good as nvidia).

How is that? I think Mesa provides state of the art OpenGL and Vulkan support, especially with work on ACO. Nvidia doesn't have any edge in that anymore. They did a few years ago still, but not today.


Last time I checked (which was about a couple of months ago), Mesa had very primitive support for display lists (most of the time you get a simple command playback, though if you only submit vertex commands it gets converted to a VBO - and I think that was added relatively recently), whereas Nvidia's driver performs optimizations in background threads to convert lists into the best GPU format, splits them into the minimum number of calls, and performs culling on the full list before rendering. AMD's Windows drivers also do some of that stuff (though not all).
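For anyone who hasn't touched this part of OpenGL: a display list records a batch of immediate-mode commands once so the driver can replay (and, ideally, optimize) them on every later call, which is exactly where the driver-quality differences above show up. A minimal sketch using the classic GL 1.x API (assumes a current GL context; not tied to any particular driver):

    #include <GL/gl.h>

    /* Record a small batch of immediate-mode geometry into a display list.
     * A clever driver can turn this into an internal VBO, reorder it, or
     * cull it when the list is replayed later. */
    GLuint build_triangle_list(void)
    {
        GLuint list = glGenLists(1);
        glNewList(list, GL_COMPILE);
        glBegin(GL_TRIANGLES);
        glVertex3f(-1.0f, -1.0f, 0.0f);
        glVertex3f( 1.0f, -1.0f, 0.0f);
        glVertex3f( 0.0f,  1.0f, 0.0f);
        glEnd();
        glEndList();
        return list;
    }

    /* Replay it each frame; the optimization opportunity lives here. */
    void draw_frame(GLuint list)
    {
        glCallList(list);
    }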

Mesa does implement a lot of stuff, but they do not take much advantage of what the higher-level parts of the API allow them to do to optimize rendering. From what I remember, until AMD pushed some devs onto it, they didn't care about supporting the entire API at all.

Vulkan support is most likely good though.

(EDIT: yes, "display lists are deprecated", but this is irrelevant - the API is there, available, and works great on Nvidia and still very well on the AMD Windows driver, and a lot of applications use it. Khronos splitting the API into core/compatibility profiles was a mistake that made everything more complicated than necessary; if they wanted a clean API, what they should have done was make something new, like they eventually did with Vulkan, and avoid messing up OpenGL.)


> Mesa does implement a lot of stuff, but they do not take much advantage of what the higher-level parts of the API allow them to do to optimize rendering.

There is always more that could be optimized, especially when it comes to niche use cases, but generally Mesa/radeonsi do a decent job of making things fast.

> yes, "display lists are deprecated", but this is irrelevant - the API is there, available, and works great on Nvidia and still very well on the AMD Windows driver, and a lot of applications use it

By "lot of applications" you mean some workstation applications that refuse to upgrade their code. You can still use AMD's closed source driver on Linux if you need optimizations for those. If you don't (and most people won't) then Mesa works extremely well.

> Khronos splitting the API to core/compatibility was a mistake that made everything more complicated than necessary when what they should have done if they wanted a clean API would be to make something new like they eventually did with Vulkan and avoid messing up OpenGL

You could argue for drivers not providing newer features in the compatibility profile (and Mesa did that until recently) but as long as there are customers demanding support for newer features while refusing to move off the older APIs, this is what you will get. I don't think having OpenGL Core and OpenGL Compat sharing some of the API hurt anything here.


> There is always more that could be optimized, especially when it comes to niche use cases, but generally Mesa/radeonsi do a decent job of making things fast.

Sure, I didn't dispute that; what I wrote was that Nvidia's drivers are faster in some cases, based on code I've actually seen. And they used to be slower in that case too until not too long ago, so it isn't like they aren't improving. But Nvidia's implementation is still faster.

> By "lot of applications" you mean some workstation applications that refuse to upgrade their code. You can still use AMD's closed source driver on Linux if you need optimizations for those. If you don't (and most people won't) then Mesa works extremely well.

I mean games, applications and tools, not workstation applications. Not every application uses the latest and - rarely - greatest version of everything out there, nor are all applications always updated - or even under development (especially games). Those that are may have other priorities too.

But why an application uses some API is irrelevant; the important part is that the API is being used and one implementation is faster than another, showing that the other implementation has room for improvement.

> You could argue for drivers not providing newer features in the compatibility profile (and Mesa did that until recently) but as long as there are customers demanding support for newer features while refusing to move off the older APIs, this is what you will get. I don't think having OpenGL Core and OpenGL Compat sharing some of the API hurt anything here.

My point was that the split itself was a mistake (it isn't like splitting OpenGL into Core and Compatibility was a mandate from heaven - or hell - it was something Khronos came up with), and the harm was that it made things complicated for a lot of people (e.g. not everyone cares about having the best performance out there - some applications are tools that won't even come close to using 1% of a GPU's power, but they'd still prefer to rely only on open APIs instead of some proprietary one or some library that may be abandoned next year - code written for OpenGL 1.x 25 years ago can still work fine on modern PCs, after all) and split the OpenGL community into two "camps".

This created issues like libraries and tools only supporting one version or the other, tons of bugs and wasted time "integrating" with Core (or supporting both Compatibility and Core), and the invalidation of a ton of existing knowledge and books (OpenGL being backwards compatible down to 1.0 is very helpful, since you can always start at the beginning with something proven and work your way towards more modern functionality on an as-needed basis). And in the end all of that was a huge waste of time, since everyone outside Apple decided that Compatibility is necessary - and Apple decided that splitting OpenGL in two halves wasn't enough, so they made everyone's life even harder and came up with a proprietary API all on their own.


ACO developers will work on OpenGL at some point too. OpenGL in general isn't something I worry about, as long as it performs sufficiently well. All modern things should be using Vulkan anyway, especially if something requires a focus on performance.

And deprecated features? I think there are better things to focus on first, optimization-wise.


Well, the original comparison was with Nvidia's driver and Nvidia has a much more optimized driver.

Also it is much more practical (and realistic) to have a few devs optimize a handful of API implementations than to expect the thousands of devs who work on thousands of applications to do that (which is also why OpenGL etc. isn't going anywhere).


> Well, the original comparison was with Nvidia's driver and Nvidia has a much more optimized driver.

I wouldn't say that. In all common cases they don't. And as above, deprecated features are the last thing I'd start comparing on. If you use something deprecated, performance shouldn't be your worry; rewriting your code should be.


That sounds just like sour grapes :-P "Mesa is as fast as Nvidia" "But they are slower in these cases" "That doesn't count".


At least on the hardware that I've had, it's basically rock-solid in practice. I use it with high-refresh-rate monitors, I've tried FreeSync and that works, it works with all my displays, and recently the older of my GPUs (Radeon Pro WX 7100) finally got audio output over DisplayPort, as did the newer one (Radeon VII), though I never really had any use for that feature.

As for acceleration, particularly with RadeonSI and RADV: the RADV developers (independents, Valve, and some smaller companies I wish I remembered the names of) have been making massive improvements on the shader compiler side. RADV's own shader compiler (ACO) is noticeably better than the first-party AMD LLVM stack, and RADV is substantially faster than any of the first-party AMD Vulkan drivers for both graphics and compute workloads. I hope ACO in RadeonSI becomes a thing; I think it will be a major improvement.

Message to anyone listening from AMD: maybe look into making ACO your primary target rather than LLVM, it is clearly a better design for your GPUs, it has substantially less overhead, and there's no legal reason it can't be a part of all of your drivers.

As for kernel support, it is often same-day or at least it can access the displays on launch day, provided you have the latest stable kernel. ArchLinux is rarely that far behind a new stable kernel release, so on ArchLinux, same-day support of one form or another, and full support that day or some day soon, is the norm.


Suspend / resume works fine with my Sapphire Pulse RX 5700 XT.


Is it really crap? I have it and it feels stable and the Crimson UI seems well made. It feels way better than the Catalyst days.


It is crap enough for me (an RX 5700 XT user) to keep a backup of the last few known-good drivers, so that when one inevitably breaks things I can roll back to a previous driver.

Some issues I've had with a variety of AMD drivers on my current PC, off the top of my head: turning on the monitor before the PC would cause the GPU to not realize there is a monitor attached; letting the monitor go into power-save mode would also cause the GPU to think the monitor was lost; settings for display scaling would be lost after every full reboot (full = real reboot, not the fast hibernate-based one Win10 does most of the time - you get a full reboot after updates, some installs, etc); random full system hangs when trying to play GPU-accelerated video (which is pretty much most video on the web, as well as some applications like Microsoft's new Xbox Games app); random reboots too; etc.

So I tend to be careful with updating the drivers. The last issue I had wasn't as bad as the random hangs/reboots (which fortunately haven't happened recently), but I simply couldn't launch the Crimson UI at all. I had to do a full reset and reinstall of the drivers for it to appear again.

In comparison, updating to the latest Nvidia driver when I had an Nvidia GPU (which was from the early 2000s until ~2 years ago) was basically a non-issue: I wouldn't even think twice about it, as I never had any issue.

And FWIW it was the same on Linux too: I never had issues with Nvidia's drivers there either, and performance was more or less the same (at least for OpenGL stuff). But note that I avoid stuff like Wayland, hybrid GPUs, etc like the plague.


> turning on the monitor before the PC would cause the GPU to not realize there is a monitor attached

I have a similar issue with a Dell display attached to an AMD card. After suspending the PC, the monitor does not detect the PC at the other end of the DP cable, except for Amazon Basic cables which work for some reason. Digital standards are weird.


I've had all the same problems with my recently bought DisplayPort Monitor (previous ones were all HDMI and worked flawlessly).

The fix for me was switching from Xorg to Wayland. Haven't had a problem since, apart from Steam not liking it all that much.


Interesting you mention this standby issue. I just moved a monitor from an nvidia setup where it had zero issues.

Now when I turn the laptop (with Radeon gfx) on, I have to turn the monitor off and on before it is recognised.


This back and forth in this thread about nuisances like that is one of the reasons I am definitely sticking to Intel integrated GPUs when running Linux. It's 2020 and stuff like that should be much smoother :-(.


Note that in my comment above I was referring to the Windows AMD driver. I haven't used Linux much with this machine (though when I did, it had a 50/50 chance of completely hanging the system, but I think this was an issue with the kernel and the then-new Zen APUs that was quickly fixed).


I have Lexa PRO in my workstation (Fedora) - Suspend/Resume works so far.

I have an issue though where switching off the monitor for a few days might make the AMD card disable the outputs and not recognize the monitor afterwards (I think it is related to the order in which I try to "wake" the monitor) - which I cannot recover from without rebooting the machine.

But this is with a machine never going into suspend or any sleep state - and I can't say if this would be the same with the NVIDIA card. I do not use the NVIDIA card for video output because the proprietary driver would regularly stop showing my desktop - or suddenly stop showing any output at all after a reboot.

The integrated Intel GPU on my laptop is mostly without issues whatsoever.

On laptops I would still recommend Intel GPUs anyway for power consumption reasons - although AMD APUs are quite interesting and I don't have recent knowledge about how well they compare. The CPU and its ability to lower power consumption under sleep is also relevant there, and this was way better under Intel so far. Unless you need the increase in performance an AMD GPU/APU would offer...


I have a similar issue with Nvidia on Linux. My larger display is slow to start, so I have to rerun xrandr after suspend in order to get it working.


I remember the Catalyst days. I used to work for a company which included a pc in the price when selling its software. We unofficially supported people who would run it on their own PC but eventually had to put our foot down and explicitly state that we wouldn't support AMD cards.


Hmm, that's a low bar, huh. Is AMD on Linux anywhere close to Nvidia on Windows?


The thing is, Nvidia also has issues, but their PR game is historically better. Many graphics developers have had experiences with Nvidia support where they run into a strange bug and are instructed to set a magic value to enable a driver hack. AMD drivers have had good and bad periods and hacks of their own, but are usually better behaved in this respect. But it's actually Intel that gets the most praise for adhering to spec, and therefore being a useful baseline.

So user perceptions and dev perceptions diverge on what makes the drivers good, and this has shifted with the different generations of APIs too; as we've gone towards a lower-level access model, the basic driver functionality has become less focused on performance hacks, but there is a lot of legacy code in there to support old games.

We're long past the worst period for Radeon on Linux which was back in the 2000's with "fglrx" - a driver that I never managed to get working. The new stuff will run with some competence.


I recently bought a RX 5700 XT, and installed it on a computer that first ran Linux and then Windows 10.

In Linux, the driver (including audio) seemed very robust, but I didn't find anything like a detailed control panel for the card's graphics features.

On Windows, the AMD-supplied control panel has plenty of knobs and buttons, but the driver itself seems less robust, particularly w.r.t. audio-over-HDMI.


That's very informative, thanks. I wonder if there's a cli utility on Linux instead...



Thanks for the links; I will definitely test those out


Sure. For some reason they aren't packaged yet in common distros, so that makes them not well known.


Maybe have a look at corectrl, which aims to create a beautiful control panel for graphics cards.


What about: is AMD on Linux anywhere close to Intel on Linux? No games, just 3D acceleration for the desktop, bug-free suspend and resume, etc.


Maybe I'm just lucky, but I have not had a single issue on Windows with my RX560. I know AMD/ATI drivers used to be horrible on Windows back in the day, but I really think they've gotten a lot better, I'd say on par with Nvidia's.


It is not. My 5500 XT is unusable when using 2 monitors. https://gitlab.freedesktop.org/drm/amd/-/issues/929

Apparently AMD doesn't have the resources to debug these millions of lines of code, since this has been open for a year now.

Yet people still say NVIDIA on Linux has issues. They don't support Wayland and tend to lag behind with Linux-only tech in general, but the driver itself is top notch. I haven't had an nv driver crash on Linux in 10 years. It's just the same echo chamber born of the famous moment of Linus flipping NVIDIA the bird.


My experience is 180° opposite to what you describe.

I never had a mentionable issue with AMD cards since switching to the open source driver approx. a decade ago. I have an NVIDIA 1060 card in my workstation for CUDA - every single time I put it into a running state again, I have a realistic chance of completely borking my system.

In fact I had an AMD card installed after the first two incidents, simply to have at least a chance of having working video output when the NVIDIA driver once again doesn't want to talk to the kernel.

That, and the whole practical implications and idealistic differences of having a (mostly) open source driver vs. a (mostly) closed source driver (I think we can agree that the open source NVIDIA driver is out of the discussion).

Obviously you might run into problems if you try to run very recent hardware right after availability. Kernel driver development is not ideal for cutting-edge hardware; some things might break, and it might take some time for your distro to ship the newest kernel/driver.


The Nvidia driver started supporting proper Optimus at the beginning of 2020 (it can run apps on the integrated and dedicated cards simultaneously). I use it regularly on my XPS 15 (to play Kerbal Space Program). It's called "DRI PRIME". You have to set an environment variable when starting an application saying which GPU you want it to run on.

I am, however, very much looking forward to the new AMD GPUs. Hopefully the RX 6000 series will be near a 3080 in more than the 3 hand picked games in their teaser. Would love to use Wayland on my desktop.


That's interesting, thanks, I tried to use Optimus on my XPS years ago but it wouldn't work. I'll try it now, thanks!


Search for “amdgpu ring gfx timeout”. There seem to be a whole class of bugs that have been open for years which not only haven’t been fixed, but there isn’t even any clear indication of what the root cause(s) is/are.


I tried a couple of different AMD cards, and my machine crashes on resume if I try to use either of them (but the Intel iGPU works fine).

Searching for amdgpu bug reports leads to:

https://amdgpu-install.readthedocs.io/en/latest/install-bugr...

which links to a page saying "Bugzilla is no longer in use" :-(

This is under Qubes/Xen, though, so maybe that causes extra problems. If any devs are reading, I did report it here in the end:

https://github.com/QubesOS/qubes-issues/issues/5459


It could be misbehaved applications. While AMDGPU and Mesa are much faster than the AMD proprietary driver (on some OpenGL workloads I have seen a 2x improvement compared to AMDGPU-PRO or the Windows driver) and are normally stable, I had several issues where bad shaders brought down the whole GPU (with "ring gfx timeout"). Things like out-of-bounds access or division by zero.


I upgraded from a Geforce 460GTX to a Radeon RX560, and I ran into two issues. Nothing major, and I've had worse issues with the Nvidia drivers, but they are still something to be aware of.

The first was that my distro (KDE Neon based on Ubuntu 18.04) shipped an older version of Mesa at the time, which was too old for the AMDGPU driver, so I had to add a PPA with an updated version. Since Neon updated to a 20.04 base, it works straight from a clean install. It also worked with no issues when I switched to openSUSE Leap 15.2.

The second was that DVI output was limited to single-link instead of dual-link. My monitor at the time only supported full 1440p through dual-link DVI or displayport, and the old GPU didn't have displayport. Buying a displayport cable was a quick fix, and I believe the DVI issue is fixed in the driver now.

Aside from those two minor hurdles, it has been smooth sailing, very good OpenGL performance in the games I play.


Not sure if this is a driver problem, but there's a LOT of general usability issues on AMDGPU + Linux. The default thermal control being an absolute catastrophe, for one.


How is it a catastrophe? I game every day on AMD on Linux and have no issues. 99.9% of consumers don’t care about overclocking so if that’s what you’re referring to I think it’s a non-issue.


It runs 75C at idle because the fan curves are wonky.


AMD has caught up to Intel but still lags behind nVidia (on Windows at least). I'm just not sure they can fight a two front war. Something has to give.


If we're talking about CPUs wrt Intel, and GPUs wrt nVidia, I think they'll do fine - IIRC, they're both separate internal groups with the same overall leader (Dr. Su).


Wait a few months after a new GPU comes out, maybe until the next major version cycle (like if you want a new card that comes out in November, wait until the 21.04 Ubuntu/PopOS release).

I bought my RX 5700XT shortly after release, and was using alpha/beta kernel releases and downloading extra files manually for several months after just to get it running; even then, an upgrade/update could turn into a blank screen on boot for me. It also broke out-of-the-box support for running full VMs, which was pretty painful for me as well, and I wasn't going down that rabbit hole to try and build it myself.

YMMV of course.. but that's just my take on it.. I bought specifically for Linux support, but took a few months to shake out.


Have you tried Nvidia on-demand option for Optimus?


It is good enough. I'd say overall Nvidia's driver is worse.


I have returned 2 Radeons that I bought for the specs, because the drivers were bad enough that I couldn’t get the same-clock performance as Nvidia or, worse, dealt with driver crashes and system reboots - note that this was between 10 and 20 years ago. I am seriously considering trying again at EOM when they announce the new cards, but the fact that it’s a Radeon is still a downside to me.

Most of the Linux community has a historical hatred of Nvidia because of the driver issue, so there’s a lot of relative love out in forums, but just “stable” would be a step up for me for Radeons on Windows.


I recently built a system based around a Ryzen 5 3600 CPU and Radeon RX 5600 XT GPU, and in both Windows 10 and Linux with a 5.4+ kernel it's rock solid. Gaming in Windows is simply amazing and it pairs well with my 1440p monitor. On Linux gaming is also extremely good, with only a couple of "Windows only" titles acting buggy under Proton/Steam. Considering Proton itself is in its infancy, that's to be expected.

With native performance on official Linux games on par with or better than the Windows equivalent, and more and more games getting Linux ports due to Vulkan, I just about have no need to boot into Windows at home anymore apart from Fusion 360.

As a workstation in Windows, since I don't overclock I don't see any stability issues. Fusion 360 is fast and fluid unlike my 8 year old Sandy Bridge dinosaur at work, even after adding a GT 1030. Good quality Crucial RAM and a no-frills AsRock B450 board make for a rock-solid build. Ditto on Linux as a workstation, everything just works and works well, and it's superb for 3D modeling and music creation (two of my main hobbies).


Good to hear that things have gotten better! Will be watching the oct 27 reveal of the new cards :)


I'm also very interested in giving my 2080 Ti to my partner (a Windows user) and getting the fastest next gen Radeon for myself.


It is not on par at all - my 5600 got annihilated by driver issues.

AMD has incredible CPUs, but just buy an Nvidia GPU - especially if you are using linux.


Nvidia has subpar support for Wayland on Linux because it uses its own EGLStreams buffer API instead of the standard GBM buffer API, which is better-supported. Both AMD and Intel use GBM.

Also, the open source driver for Nvidia (nouveau) has incredibly poor performance compared to Nvidia's proprietary driver, and lacks essential features such as reclocking for recent hardware generations:

https://nouveau.freedesktop.org/PowerManagement.html

AMD's and Intel's open source drivers are their primary offerings on Linux and have good performance across all hardware generations.


Intel has actually gone downhill lately, especially for prior generations. I've had to live with 5 or so years of tearing with multi-monitor support on Ivy Bridge, and even single monitor tears inexplicably with some software (that shouldn't). The Intel Xorg driver is unmaintained and the generic modesetting driver doesn't work quite as well. When I first got my Ivy Bridge system, triple head mode didn't work for a while either, so it's not like they have great support when the hardware is current either.

I've switched to AMD now and things are much better. Go with AMD.


The Xorg modesetting driver works quite reliably on Intel in my experience.

The SNA acceleration architecture in the Intel Xorg driver was a disaster in terms of correctness and stability. When SNA appeared as an option it initially seemed quite fast, but didn't take long to reveal it was also quite broken vs. UXA.

I used to explicitly use UXA but for the last 5-10 years simply using modesetting has been the way to go.

Personally I think you're conflating Xorg and kernel driver issues. Xorg is basically unmaintained in general now and unfortunately SNA was the last major development in that context for the Intel driver, and it was not good.


This doesn't apply if you want to run CUDA-dependent software. I've generally gone for Nvidia for my personal machine since Torch has behaved oddly on AMD cards in the past.

It's true that Nvidia doesn't support Wayland properly, but that's not really an issue in my opinion. Wayland still has its own problems that mean switching from X11 isn't viable yet.


Although your argument is valid, are we talking about CUDA? Obviously CUDA is an NVIDIA thing under all platforms, right? I don't think anyone would buy AMD with the intention of running CUDA.

Regarding GPUs and how good they work under Linux, computing on GPUs is only a part of the discussion I would argue...


What issues have you had with Wayland? Switching to it has given me a tear-free experience on both AMD & Intel laptops; besides that, it performs similarly to X11.


> tear free

> 5 or so years of tearing

I know what people are referring to, but a less geeky person might come away from this thinking people get very emotional about bad Linux graphics drivers.


My main problem with it is limited software support. Xmonad isn't available and as far as I can tell what support exists for screen recording and screenshots is half-baked at best. I haven't seen anywhere near enough problems with X11 to make switching window managers worth it, and the screen recording thing would be a massive pain to work around.


I'm still on an Intel system (skylake) and my experience is similar to yours. 5+ years of bugs and crashes, tearing, multi-monitor headaches and general instability.

Eagerly awaiting the new AMD hardware.


I've found the wayland server to be a great experience with intel - the only weird bits I've seen are full-screen noise on firefox and poor support for high dpi, the latter of which is even shittier under X11. The server is really very usable nowadays.

AMD's ok if you have the room for the discrete card, but I wish they would invest more in integrated on-board chips.


Modern AMD GPUs work better on linux than nvidia. No tearing, multi-monitor works, and vulkan is very smooth. Nvidia is actually less stable, and has some peculiar quirks, such as needing composite manager running to get rid of tearing, spotty multi-monitor support, etc..


You are dismissing people saying they ARE having issues with AMD on Linux. In fact my AMD card does not do multi-monitor, and in this thread I'm not the only one that has multi-monitor issues on AMD.


Which card are you using? I'm aware the older cards are still bad, especially if you still need fglrx. In my personal experience, modern AMD GPUs on linux are the first time graphics have worked reasonably well on linux for me. Even intel drivers are riddled with bugs and instability (not to mention they still don't even do gallium), the GMA 3650 (PowerVR SGX based) being the most infamous, worst driver ever.


A 5500 XT bought in June, so not old at all. I've heard the opposing argument, that since it's a relatively new card (out since Dec 2019?) I should expect some bugs, which is insane one year later. It's actually unusable, I have to log into my machine via SSH to restart it, or force reboot. It might break after 30 minutes or 3 days, when idle or busy.

https://gitlab.freedesktop.org/drm/amd/-/issues/929

AMD developers in that thread are chasing their tails and still haven't figured out why so many cards are having issues, and why others aren't, but as a consumer, that's really not inspiring at all.


Funny, I have 5600 XT (Sapphire Pulse) and it runs like dream. The out of box experience with Linux has been very good. Note that some of the aftermarket cards are actually bad and the instability might not be software related. Before 5600 XT, I used R9 290, and while it did require some tweaks to enable all features (due to being older card), it still ran relatively stable and in general was better experience than any nvidia card I had used in past.


This guy is having the same issues I'm having with a 5600. Multi-monitor, entirely new computer built a couple months ago.

Randomly locks up, random black screen, random rainbow colors all over my monitors.

With my new Nvidia 2060 which I bought to replace it; nothing. No issues. Works just fine on Manjaro.

For whatever reason, the AMD cards just get clapped on Linux.


My experience with linux is that the nvidia drivers and support are the worst of the bunch, and if I had a nickel for every time I could trace a kernel panic through their driver I'd get a very nice lunch. Their popularity seems to be driven primarily by exclusive access to CUDA APIs and windows gaming. Nouveau is OK for accelerated 2d but is hardly in the same ballpark as the AMD drivers.

That said I just picked up a quadro (not my choice, came with a prebuilt NUC) and I've been pleased to find that it "just works" on freebsd (I use it to realtime transcode video), so clearly great experiences are possible and I don't want to be needlessly harsh.

Personally, I'm dying for a discrete intel card. I can't recall any hiccups with intel chipsets, ever, and that matters WAY more to me than raw performance.


> the driver is pretty abominable compared to the code quality of most of the rest of the kernel.

Could you say more about what specifically makes the driver abominable? Is it just those files with largely duplicated code?


duplication 3 times with small differences in between is a good case to keep them separate imo.

abstraction is one of the main sources of code complexity.

you start with one function used in 3 places, then add boolean args to it to get slightly different functionality at each place, and eventually it becomes a mess of complexity
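A hedged illustration of that failure mode (all names below are invented, not from any real driver): the shared helper quietly grows one flag per caller until no single caller's path is readable anymore.

    #include <stdbool.h>

    /* What started as one shared setup function tends to drift into this
     * as each of the three call sites needs a slight variation. */
    static void setup_ring(int ring_id,
                           bool is_compute,
                           bool skip_doorbell,
                           bool legacy_wptr)
    {
        (void)ring_id;
        /* every flag adds another branch, and each branch only makes
         * sense to the one caller that introduced it */
        if (is_compute && !legacy_wptr) { /* ... */ }
        if (skip_doorbell)              { /* ... */ }
    }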


I think that's very subjective and situational.

The amdgpu driver has duplicated files for different versions of things, so it'll have thing_v6.c and thing_v7.c and thing_v8.c with a lot of duplicated functions.

The more common way of doing something like this would be to have structs of function pointers that get populated based on what version of GPU you have. You have one file with all the common functions that they can share; in the definitions for each GPU version you set the majority of the function pointers to the common version they all share, and for the ones that have to be different, you set them to their unique version. That way you can define all the common functions once and point to them in the structs for each version.
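A minimal sketch of that ops-table pattern (the struct and function names here are invented for illustration, not the real amdgpu ones):

    /* Shared hooks are defined exactly once... */
    struct ring_funcs {
        int  (*init)(void *dev);
        void (*emit_fence)(void *dev, unsigned seq);
    };

    static int common_ring_init(void *dev)
    {
        (void)dev;
        return 0;
    }

    /* ...and only hooks that genuinely differ get a per-generation version. */
    static void gen7_emit_fence(void *dev, unsigned seq) { (void)dev; (void)seq; }
    static void gen8_emit_fence(void *dev, unsigned seq) { (void)dev; (void)seq; }

    static const struct ring_funcs gen7_ring_funcs = {
        .init       = common_ring_init,   /* shared */
        .emit_fence = gen7_emit_fence,
    };

    static const struct ring_funcs gen8_ring_funcs = {
        .init       = common_ring_init,   /* shared */
        .emit_fence = gen8_emit_fence,    /* the only hook that differs */
    };

In this sketch the driver would pick the right table at probe time based on the detected GPU generation, and all the common code stays in one place.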

Having a quick flick through the code now, they do use structs of function pointers in each version for common operations but they still don't abstract out the ones that are either identical or have very few differences that you could special case.

Refactoring such a giant driver for no performance gain is going to be extremely low on AMD's todo list, so it'll probably stay like that. It just doesn't look like anything else in the kernel.


This is literally what everyone does in embedded C land. The repetitive definitions are generally intended to be used with macros and are typically generated from the same definitions as the chip registers themselves. Some places also auto-generate embedded C/C++ structs or classes, which imo is better. But I have gotten quite a bit of pushback for doing it.

A big issue is also the use of bitfields, as much as reg duplication. Bitfields in C/C++ are a minefield if you don't lock down a known-good compiler version, because there's just so much about them that's technically unspecified. Oftentimes you'll also have issues where certain register fields exist for some registers of a series and not the next, or where the functionality/sizing/interpretation is context-dependent, or where certain locks or write orders are needed for correct access; these are often handled with presence-checking macros.

IMO, if we want better driver code, it's time for GCC/Clang to nail down the bitfield layouts for the embedded use cases. This has been broken for far too long.
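A sketch of the kind of register overlay being discussed and why it's fragile (the register and field names are made up): the C standard leaves bit-field allocation order and padding implementation-defined, so the same struct can map to different bits under a different compiler or ABI, which is why so much code falls back to explicit shift/mask defines.

    #include <stdint.h>

    /* Hypothetical 32-bit register overlay; field names are invented.
     * Which end 'enable' lands on, and how fields pack into storage
     * units, is implementation-defined in C - hence the need to pin a
     * known-good compiler when this style is used for real hardware. */
    union clk_ctrl_reg {
        uint32_t raw;
        struct {
            uint32_t enable     : 1;
            uint32_t divider    : 6;
            uint32_t reserved_0 : 9;
            uint32_t source_sel : 4;
            uint32_t reserved_1 : 12;
        } f;
    };

    /* The portable fallback: explicit shift/mask macros, which is
     * exactly the verbose "reg duplication" style being criticized. */
    #define CLK_CTRL_ENABLE_MASK   0x00000001u
    #define CLK_CTRL_DIVIDER_SHIFT 1
    #define CLK_CTRL_DIVIDER_MASK  0x0000007Eu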


Sounds like an excellent way for someone looking for something to contribute to get their code into the kernel, though.


It would be very difficult to get accepted. You'd have to get the AMDGPU driver maintainers on board, and you'd probably have to do a lot of it at once to justify the change. It would also take some discussion, and you're talking about refactoring a lot of stuff which probably moves underneath you during this, so you have to keep iterating to keep up with the changes, all without knowing if they'll even end up taking it...

Changes like this are probably a good way to get started but I would guess the AMDGPU driver is one of the worst places to get started as a beginner.


I mean, each new version is separate, correct? So the only change that can happen under you is when something is backported. How often does that happen for a gpu driver, and how far back does that go?


Or you duplicate code in 3 places, and apply the same fixes or updates in 3 places for all of eternity. There are pros and cons to both methods and each have their places, no need to start this constant debate here.


That's why this approach can tend to be a positive for driver versions matched to hardware iterations: a given fix may or may not apply to a given hardware config, and likely has to be tested against each config separately.

It's one of the unusual circumstances where, unfortunately, abstraction can decrease flexibility and increase development time.


Proverb: "A little copying is better than a little dependency." (Rob Pike)

That is, it's better to have duplications than the wrong abstraction. This may also be in reference to C compilation, in that loading header files and dependencies costs more than inlined code. That's one of the goals that the Go language sought to resolve, anyway.


> Though as reported previously, much of the AMDGPU driver code base is so large because of auto-generated header files for GPU registers, etc. In fact, 1.79 million lines as of Linux 5.9 for AMDGPU is simply header files that are predominantly auto-generated. It's 366k lines of the 2.71 million lines of code that is actual C code.


Why not generate it during the build? Is there a good reason not to do that?


It was generated by the hardware division. These are the registers that are authorized for disclosure in the open-source driver by the AMD employed open-source driver developers.

So, it includes many times more register definitions than are ever used (consider that there are 8x more register definition lines than actual code lines that could use them), and it includes many sets of 16 or 64 definitions where a software developer would have made one parameterized definition (all the same except for _00, _01, _02, _03, etc). But this is exactly what the hardware guys generated for public release, and it is to be used as-is.
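To illustrate the difference in style (the register names and offsets below are made up, not real AMD values): the generated headers spell out every instance of a repeated register, where a driver developer would normally write one parameterized macro.

    /* Auto-generated style: one line per instance (values invented). */
    #define mmFOO_ENGINE0_CTRL 0x1000
    #define mmFOO_ENGINE1_CTRL 0x1010
    #define mmFOO_ENGINE2_CTRL 0x1020
    #define mmFOO_ENGINE3_CTRL 0x1030
    /* ... repeated for however many instances the hardware has ... */

    /* Hand-written equivalent: one base, one stride, one accessor. */
    #define mmFOO_CTRL_BASE   0x1000
    #define mmFOO_CTRL_STRIDE 0x0010
    #define mmFOO_CTRL(n)     (mmFOO_CTRL_BASE + (n) * mmFOO_CTRL_STRIDE)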

IMHO it's kinda annoying and sad. The rest of the kernel is held to a higher standard; that's why all the other non-trivial multi-arch, multi-family, multi-generation code in the Linux kernel is much more concise / less sloppy. It takes a lot of effort to make it that way, and commercial companies pretty much never bother, except when required by the Linux maintainers.

But, modern graphics drivers are way too complex and way too much work, and most people do want some proper modern GPU support in the kernel, so compromises have to be made. It's not too bad, just a bunch of inert header lines, git and the compiler handle them just fine I guess.


Isn't that the beauty of open source, when if someone has severe OCD they could just spend their time tidying up the kernel driver instead of watching mind numbing telly?


I'm not sure what the score is - if these things were tidied up, would AMD still be able to upstream their own changes, or do they take back fixes from kernel devs? Seems like a complex political process...


> It was generated by the hardware division. These are the registers that are authorized for disclosure in the open-source driver by the AMD employed open-source driver developers.

...which is arguably not compatible with the GPL:

"The source code for a work means the preferred form of the work for making modifications to it."


It's not applicable, in practice.

This is the published hardware interface for the driver, the formal public contract. You can't change it without changing the hardware itself.

If you really want to run the generator... well, the preferred form for modification is open to interpretation and if it's some proprietary tool then just getting the output is preferable to a dependency. Sometimes the rabbit hole is too deep, and we have to draw a line.


> ...which is arguably not compatible with the GPL:

From what I can tell, most if not all of the driver is licensed with an MIT-style license. But even if it was GPL, AMD would be the licensor, so it gets to decide the “preferred form of the work”.


"Preferred form for modification" is a form that is suitable for a skilled stranger to modify it with little exposure.


What I meant is that the copyright owner is not bound by the terms of a GPL license he grants to others. Similarly, a licensee who receives software from the copyright owner under a GPL license cannot compel the copyright owner to do anything.


An author that licenses the software under the GPL but does not release the source code in that form cannot legally incorporate outside contributions into his GPL'd work, as he would be in a position of infringing the derivative work author's rights.

> a licensee who receives ... GPL license cannot compel the copyright owner to do anything.

Unless the licensee in question has also contributed to a published revision of the original licensor's code. And for that to work (remember the wording "preferred form for modification"), you need a form suitable for modification by a skilled stranger with little prior exposure to said work. You would otherwise get a different preferred form of modification for each contributor, which is unworkable.


> you need a form suitable for modification by a skilled stranger with little prior exposure to said work

That’s a nice idea, but it’s not a condition of the GPL. GPL v2 and v3 both only state, “The ‘source code’ for a work means the preferred form of the work for making modifications to it.” That definition exists because without it a licensee might try to argue that distribution of modified and then obfuscated code satisfies the source code offer condition.

Regarding a project licensed to others under the GPL, if the project owner accepts contributions under the GPL, then he becomes a licensee of the contributions. So, as you pointed out, he would need to meet the “preferred form” clause and other terms, at least as regards to the contributed portions. As you might expect, for a substantial project with many contributors, this could become very complicated. Therefore, many projects require contributions be made under a more liberal license (or even a copyright assignment) that allows the contribution to be sub-licensed to others without conditions.


> Therefore, many projects require contributions be made under a more liberal license (or even a copyright assignment) that allows the contribution to be sub-licensed to others without conditions.

Most, but not all, European jurisdictions have a legal stipulation that all copyright assignments are either void or revocable even if the assigner says otherwise, except for work-for-hire. You therefore cannot release yourself from the preferred-form requirement even by requiring a copyright assignment; otherwise you will get stuck in the case where any further published modifications to your work - not only the contributions, but any parts that those modifications interact with so much that they are inseparable, even by the original licensor - may become illegal overnight. As the GPL does not say "the form deemed preferred for modifications by the licensor(s)" but "preferred form ... for modifications", you need to apply the objective definition I stated above. It would be nice if they had explicitly stated it that way, though, relieving a lot of load from judges in resolving a possible dispute over which forms are preferable for modification and which are not.


It may help to think about who can sue whom. Generally only a copyright owner can sue an infringer. A license operates as a defense against a claim of infringement. If a licensee fails to meet a condition, then the license is invalid.

So, in the case of the project owner who (1) starts out owning all of the rights to the project, (2) incorporates code licensed from a contributor and, (3) distributes the combination, the only person who could possibly sue the project owner for copyright infringement is the contributor. The claim would only pertain to the contributor's code, because that is the only part he owns the copyright to. The project owner/defendant would raise the license as a defense and the key question would shift to whether the owner/defendant violated any of the conditions of the license.

Where the license is the GPL, one of the conditions is partially affected by the "preferred form" definition of source code. The court would look at what the owner/defendant did and whether he met that condition. Importantly, the condition and "preferred form" definition would only be considered in relation to the plaintiff's code; the owner/defendant's code wouldn't be relevant.

Regarding the contributor's code being "inseparable", that will not be the case for one very simple reason: If the contributor sues the project owner, then he must identify which portion of the code he is suing about. If he can't do that or can't show ownership of it, then he will lose.


> license operates as a defense against a claim of infringement

It works like that in fully assignable IP jurisdictions (like USA), but it works like a contract of adhesion in the author's compulsory rights jurisdictions (like Germany and Czechia).

What I meant by an inseparable contribution was a significant contribution that, when eliminated, would make the entire work no longer resemble its current state; i.e. the line that tells a derivative work apart from near-equal co-authorship (the two are treated similarly in fully assignable IP jurisdictions, yet have entirely different regimes in the compulsory rights jurisdictions). Not the entirety of the work, indeed.

> the condition and "preferred form" definition would only be considered in relation to the plaintiff's code; the owner/defendant's code wouldn't be relevant.

It would, in a compulsory rights jurisdiction, because all copyright assignments are either void or revocable at will in such jurisdictions.


> It would, in a compulsory rights jurisdiction, because all copyright assignments are either void or revocable at will in such jurisdictions.

I didn't believe this, so I looked at a study of EU copyright law[0]. Rights of authors are split into moral rights and economic rights. Economic rights are transferable as property. Moral rights, however, inure to the author and are inalienable. In some countries, the moral rights include the right to withdraw the work from circulation. This right to withdraw is probably what you are referring to when you say that copyright assignments are void or revocable.

The right to withdraw a work from circulation, however, does not come for free. In Spain it is only, "after indemnification of the holders of exploitation rights for damages and prejudice."[1] In Estonia, "The rights ... shall be exercised at the expense of the author and the author is required to compensate for damage caused to the person who used the work."[2] In France, "... he may only exercise that right on the condition that he indemnify the assignee beforehand for any prejudice the reconsideration or withdrawal may cause him."[3] In Romania, the right is "subject to indemnification of any holder of exploitation rights who might be prejudiced by the exercise of the said withdrawal right."[4]

In all of the examples I could find, the withdrawal right essentially extinguishes an assignment of the economic rights. So, in a sense you are correct that an assignment is revocable. Practically, however, the author who exercises that right would be liable for damages to the assignee, which could be significant, and the author would not be able to exercise the right if he could not pay for the economic harm.

Anyway, this has been interesting and I learned something about European copyright regimes. Thanks.

[0] https://www.europarl.europa.eu/RegData/etudes/STUD/2018/6251...

[1] Id. at 134.

[2] Id. at 93.

[3] Id. at 173.

[4] Id. at 301.


What modifications would you make that might be useful? The (proprietary) hardware isn't going to change.


If the generated code is a representation of certain unchangeable data about the hardware, you might still want to

1) represent it more compactly;

2) represent it in a form that can more easily be read and transformed to handle future use-cases for the data;

3) after some future restructuring of the driver, represent the data in a form that better fits with that structure.

If you have to regenerate the code using the proprietary tool in order to restructure the driver, the generated code is not "the preferred form of the work for making modifications".
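
To make that concrete, here is a minimal, purely hypothetical sketch in C of the data-plus-generator arrangement being assumed here: a small, human-editable table of register descriptions and a generator that prints the header the driver actually compiles against. The names and offsets (FOO_CTRL and friends, the mmHYPOTHETICAL_ prefix) are made up for illustration, not real amdgpu definitions.

    /* Hypothetical generator: a compact, editable table of register
     * descriptions, emitted as the big header the driver includes. */
    #include <stdio.h>

    struct reg_desc {
        const char *name;    /* register name */
        unsigned    offset;  /* byte offset from the block base */
    };

    /* The arguably "preferred form": the thing a human actually edits. */
    static const struct reg_desc regs[] = {
        { "FOO_CTRL",   0x0000 },
        { "FOO_STATUS", 0x0004 },
        { "FOO_FIFO",   0x0010 },
    };

    int main(void)
    {
        for (size_t i = 0; i < sizeof(regs) / sizeof(regs[0]); i++)
            printf("#define mmHYPOTHETICAL_%s 0x%04x\n",
                   regs[i].name, regs[i].offset);
        return 0;
    }

Renaming everything, adding a prefix, or emitting the data in another language's syntax means touching the table or the printf format string, not hand-editing thousands of emitted #define lines - which is the crux of the "preferred form" argument.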


All you're going to end up doing is changing the names. And for that, in my view, a big long list of defines (or whatever), autogenerated or not, is as good a form of the work as any other.

And, besides, there is an excellent chance that you will never end up changing the names.


You might want to port it to a new language, in which case having the hardware description and a generator tool is easier and better than converting the C headers.

And yeah, sure, pragmatically it might not make much of a difference in this specific case, but if the AMD devs were to port their driver to a new language they wouldn't edit the C headers; they would certainly just update their generator. So the preferred form for modification is clearly not the generated C headers.

Not to mention that if all you wanted to do was change the names, maybe prefix them with something, editing the generator is _still_ clearly the preferred form for making that change.


But the GPL applies to the driver they released, not some hypothetical driver that you or somebody else might create in the future. You're already going to have to rewrite it all in this proposed alternative language... this header is the least of your worries.

Strikes me that AMD have supplied everything required: all the driver code in the preferred form for modification of the driver, i.e., a bunch of C files.

Some of these C files are a big long list of slightly opaque magic number defines that relate to the hardware, perhaps generated by some unreleased tool, who can say - it's all speculation at this point - but that's OK! The hardware is not the bit you're going to modify. As far as the people modifying the drivers are concerned, those numbers are never going to change. This portion of the driver is fixed.


You fundamentally can't, because the defining code is usually hardcore proprietary or a proprietary toolchain artifact from Cadence/Synopsys. We're talking something like a memory map of the entire system, or 1MB+ XML blobs.

Honestly lifting it from a header file manually is going to be easier for everyone.


What do you think AMD would do if they decided to port their driver to a new language? Would they update their generator or would they copy and edit the existing header?

They sure as shit didn't type out these header files by hand, so clearly these are not the "preferred form" for modification.


Remove some bugs, or improve its performance. Hardware drivers get updated all the time even when the hardware remains the same.

I'm not an open-source absolutist: I think the pragmatic solution Linux went with is good here. But it's silly to suggest that the driver couldn't be improved if it were more open.


The topic is not the driver - it’s the definition of the lowest level hardware interface.

It’s lists of registers and stuff like that; not things that can really be fixed by external devs.


We can say that it's generated from the "hardware schematics". AMD hardware isn't open-source hardware.


These are the registers that are authorized for disclosure in the open-source driver by the AMD employed open-source driver developers.

In other words, there's more functionality that they're keeping secret? Sounds like a challenge...

Edit: so the hacker spirit is not welcome here...?


Modern high-end processors have a lot of undocumented features. This is rather widely known, though of course not universally known. These have existed for a long time - https://en.wikipedia.org/wiki/Illegal_opcode .

And ... you know this. Checking your comment history https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que... I easily found https://news.ycombinator.com/item?id=8834863

> Intel CPUs have had undocumented features since their introduction; it's not hard to imagine their chipsets do too.

Before I did the search I thought you were one of the 10,000 (https://xkcd.com/1053/ ), surprised that others didn't share your enthusiasm. Now I don't understand your surprise.

You must surely know your comment about "the hacker spirit is not welcome here" comes across like snobbish gatekeeping, yes? At the very least, the implied lack of knowledge about undocumented features makes it seem that you aren't one to judge what the hacker spirit might be. ... which cannot be correct given your posting history.


What you're describing sounds like a binary blob masquerading as C header files.


It's just a list of register names, offsets, field names, and bit assignments. It is nothing like a binary blob. It is GPU documentation in the form of C header files.

Now I happen to know that the vast majority of it is shared between GPU generations to some extent, so someone could abstract things out manually to remove duplication, but it's a huge task.
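
For anyone who hasn't looked at these files, they read roughly like the following (made-up block name, offsets and masks, not actual amdgpu definitions): a register offset per name, plus shift/mask pairs for each field within the register.

    /* Illustrative only - not real amdgpu register definitions. */
    #define mmEXAMPLE_BLOCK_CTRL                 0x01a0
    #define mmEXAMPLE_BLOCK_STATUS               0x01a4

    /* Field layout of EXAMPLE_BLOCK_CTRL */
    #define EXAMPLE_BLOCK_CTRL__ENABLE__SHIFT    0x0
    #define EXAMPLE_BLOCK_CTRL__ENABLE_MASK      0x00000001
    #define EXAMPLE_BLOCK_CTRL__MODE__SHIFT      0x1
    #define EXAMPLE_BLOCK_CTRL__MODE_MASK        0x00000006

Multiply that shape by every IP block and every GPU generation and you get most of the header line count - plain data, nothing executable hiding in it.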


> Why not generate it during the build?

If those headers aren't expected to change then, with regards to accountability, it's far better to have the code checked into the version control and processed as is.

More importantly, if the code is already generated then there's no need to make the build system more brittle by adding a non-standard build target that depends on custom/third-party tools.


The Linux kernel doesn't play the Firefox game of requiring Ruby, Python3, Python2, NodeJS and Rust to even be able to build the thing.


Not for the build itself, but there are Perl and Python scripts in the kernel source, which are referenced by the kernel's main Makefile.


If you don’t want to use languages that enable code reuse, then you can’t complain about repeated code


>Why not generate it during the build? Is there a good reason not to do that?

In many hardware shops the C definitions for the visible registers are generated automatically from the hardware's source code


and I should add ... chances are the linux kernel is not the primary user of these addresses, more likely it's the internal DV ('Design Verification' - hardware QA/testing) teams who need access to all those internal debug/setup/etc registers that are not normally architecturally visible to the downstream software teams (like Linux/etc)


Wait hardware has source code? I’m just a web dev so I’m not aware of this. What does the code look like?


Yes, there is "source code" for describing hardware. Here are two you can take a look at:

VHDL: https://en.wikipedia.org/wiki/Vhdl

Verilog: https://en.wikipedia.org/wiki/Verilog


Depends on what you're building. Boards tend to be done as netlists (essentially a list of components and wires and how they are connected), but digital chips of more than a few dozen gates are normally written in a high-level language (linked to by other posters here), which can be both compiled into machine code that can be simulated and synthesised into gates and wires (a netlist) that can be laid out onto a chip.


I'll tell you more, hardware is often simulated in software from those source descriptions before going to any sort of production. Which is probably one of the reasons for the existence of these definition languages.


If it never changes, generating the headers during the build is just wasted time for whoever/whatever is running that build.


The real shame is that Nvidia is still doing binary blob drivers 15 years after I started caring about Linux. Are they really that afraid of someone taking their Lucky Charms?


My new theory is the Nvidia driver can't be GPL and in the Linux kernel, because then they couldn't ban datacenter usage of their GeForce cards by not licensing the driver for datacenter use. The upcharge on the Tesla series of cards is huge compared to GeForce for mostly the same chips. (For those not aware, see if you can find a GTX 2080 or 3080 from a cloud provider. It's not a thing. This is actually a huge deal for the machine learning industry, massively increasing costs. I doubt Google would have made the TPU if not for this.)

Also, their driver is very complex, and they are constantly improving their hardware. They don't want to be dependent on getting new features and performance improvements upstreamed.


Don't forget the fact that most of their silicon is basically the same and you can easily change it with some hardware/software mods[1] --- I think they have tried to lock that down a bit more, but ultimately it's a cat-and-mouse game and the only ones who win are those willing to ignore the insanity of Imaginary Property laws and take matters into their own hands.

[1] https://www.eevblog.com/forum/general-computing/hacking-nvid...


AMD does the same thing, and so does Intel; this applies to CPUs too. Silicon yield means there's a probability that some transistors won't work, so they disable those cores and create lower-end models. Sometimes, to meet demand, they simply disable working cores, as it's also cheaper to have one process. Tesla does the same thing with their cars, funnily enough.


IBM did it for their mainframes back in the day.


They still do to this day.


I've spent a lot of time trying to come up with a better term for these laws, and I think your "Imaginary Property" phrase here is better than anything I've come up with. Thanks!


> not licensing the driver for datacenter use.

Why are these kinds of licenses even allowed? If I buy a product, surely I can do with it as I please?

Also, why doesn't TSMC slap a license on every IC that leaves their fab, taking (say) a 30% profit from every application in which their ICs are being used?


Sure, you can do anything with the hardware which you actually have bought!

The problem is in the software (the driver) which you never can buy, only license under a long list of conditions which prohibit specific uses.

If e.g. Nouveau could implement the interfaces needed for CUDA, you could probably try to use a 3050 in a datacenter. I bet NVidia has provisions against this turn of events, too.


> The problem is in the software (the driver) which you never can buy, only license under a long list of conditions which prohibit specific uses.

Ok, so who gave software a special status over hardware? Is this desirable? Can we reverse it?


> Ok, so who gave software a special status over hardware?

Software is rarely sold (outside of bespoke development). All the off the shelf software is essentially rented.

Software itself has no legal value - the copyright is what is considered to be property. That property can be leased or sold. This is why copyright infringement is called infringement and not theft.

When you “buy” software, you are actually entering into a lease contract to use the software (sometimes perpetual, but increasingly only temporary) which can have various terms and conditions (that you really should read, but never do). But that lease doesn’t grant you the copyright.


I think that's misleading then, because when I buy a GPU, they make me believe I own it, when, apparently, in reality I don't.

I don't think this way of selling (or as you say renting) stuff should be considered legal.


I think your idea is agreeable but if we did treat hardware this way it changes almost everything. Apple/Nintendo/Sony/etc would all be required to give users root access to the software and remove their ToS.

And then it gets even more complex when you get into online services. Game consoles are going online-only next gen. If you buy the PS5 Digital Edition, mod your OS, and Sony bans you from their servers, your console is now a brick. But in many cases it's fair to be banned, such as banning cheaters.


You own it, and can talk to it the same way you can talk to a brick, lol :D

(I avoid nvidia whenever possible)


What's the point of that? We do the same thing in software all the time. You get basic functionality for one price, and pay for a key to unlock extra features. Why should hardware be any different? So the law would somehow require any feature on a hardware product to have some physical difference and not be purely a software limitation? What is the advantage of that? Just increases cost to the manufacturer (which will get passed down), then also precludes any possibility of upgrades by purchasing a software patch.


It should be clearly communicated and never be misleading.


What's misleading? You get the functionality you pay for. The fact that software controls that is an implementation detail.


As mr_toad says, the software is essentially rented.

This means I'm not buying but renting, which is not how it is advertised.


> I think that's misleading then, because when I buy a GPU, they make me believe I own it, when, apparently, in reality I don't.

If you buy a GPU you own it and the copy of the software it came with. You are free to use that combination as you choose, forever.

It’s not renting because you don’t have to pay rent to continue to use it. There may be software license restrictions, typically against modifying or reverse-engineering the software. However, it is an error to say that those license restrictions convert your ownership into anything like a rental agreement.

Some digital activists say that we don’t really own the devices that we buy because of license restrictions or restricted device firmware. It’s hyperbole. We do own our devices and the copies of the software they came with, even if they came with artificial limitations.


Let's test this idea of ownership: my phone auto-updates, and the manufacturer prevents me from reverting updates. One update has removed my ability to record my calls.

Does that sound like ownership? Can a BMW employee pop over to your garage one day and remove some bits of the car he thinks you shouldn't have any more?


The problem of features being changed or removed by a software update is real and the owner can be harmed, as you were. As the owner, however, if you are harmed in that way then you may have a claim against the manufacturer. For example, in a recent class action case by PlayStation 3 owners against Sony over the removal of the Linux OS feature, the court seemed to agree that owners were entitled to damages because Sony ended up paying millions of dollars to class members in a settlement. If you or the PlayStation 3 owners were not owners, then you wouldn’t have a good claim.


By the sounds of it, PlayStation 'owners' were paid compensation but could not get the Linux feature back; in other words, they were not made whole. They don't control what is happening to their property, and without Sony's agreement they cannot repair damage done by Sony to it.

That does not sound like ownership to me - again, think back to car ownership. Firstly, tampering with your car would have been criminal damage.

Secondly, BMW does not get a say in how you use your car. They can't stop you going over the speed limit. You could get your car fixed without having to involve BMW or going to court to force their hand.

In my view this Sony case looks like compensation for breach of a lease-like contract.


> By the sounds of it playstation 'owners' were paid compensation, but could not get the Linux feature back, in other words they were not made whole.

Members of the class could opt out of the settlement and sue Sony individually. A court could theoretically enjoin Sony to restore the feature for those individual plaintiffs, but the plaintiffs would have to show that monetary damages would be insufficient. Generally courts don’t like to force defendants to do things when paying money would be an acceptable outcome.

> In my view this Sony case looks like compensation for breach of a lease-like contract.

I haven’t read the complaint in that case but the plaintiffs probably alleged a breach of the implied covenant of good faith and fair dealing. So, yes, possibly a breach of contract claim but not a lease. (Note: A lease is a specific form of contract in which a lessor transfers possession of property to a lessee, but retains a future interest in the property after the contract term ends.)


Your idea of ownership is way too primitive and doesn't reflect reality.

You do NOT own the software that comes with your GPU!

Ownership implies the ability to transfer, modify, and resell, none of which are within the rights granted by the license of said software.

It's not "rental" either - it's licensing. You don't have to become a lawyer, but knowing and understanding the difference between proprietorship (ownership) and possession is a good start. Same goes for renting vs. licensing vs. ownership.

TL;DR you do not have ownership of any software that came with any device you bought and it's not hyperbole at all.


> You do NOT own the software that comes with your GPU!

When you purchase a consumer GPU that comes with software, you acquire the GPU, the copy of the software it came with, and a license to use the software subject to particular terms and conditions. That is what you own, no more, no less.


You do own it. And you're free to use the open source drivers if you want.


> When you “buy” software, you are actually entering into a lease contract to use the software

This is inaccurate, at least as to purchased software. A license is not a contract, because the licensee is not required to do anything. A license can have conditions (restrictions), but not covenants (promises to do something). A license basically functions as a defense against a claim of infringement.

Note: For purchased software there is a contract for the sale of the software subject to the license, but that shouldn’t be confused with the license itself.


> For purchased software there is a contract for the sale of the software subject to the license, but that shouldn’t be confused with the license itself.

That's simply not true. You are indeed making a contract for the sale of the license itself. Otherwise subscription models wouldn't work, and you would even be legally allowed to share and resell the software, which you aren't (i.e. just because it's possible to resell an acquired license while keeping a working copy doesn't make it legal to do so).


I agree with you. My earlier point was that a license is not a contract, and shouldn’t be confused with one. My note at the end was that there is also a contract when you acquire a license through a purchase. The contract is typically of the form “you pay us money, we give you license”. That contract too shouldn't be confused with the license acquired.

As you correctly point out, one who sells his only license to a piece of software no longer has a license. If he kept a copy of the software and continues to use it, he is committing an act of infringement. That is the same whether the license is for a term (subscription) or perpetual.


Keep in mind it's the same special status that allows the GPL to have the condition that you must release your source code if you distribute something that includes GPL code. So, "reversing" it would also reverse the GPL.


Not exactly. The GPL's special status generally comes from the fundamentals of copyright law: it attaches conditions to the duplication, modification, and distribution of a work. If not for the GPL, you'd have no right to distribute something containing the copyrighted code.

The datacenter-versus-personal conditions of NVidia drivers attach instead to the use of the copyrighted work. These restrictions are based on the idea of an end user license agreement as an enforceable contract, either agreed-upon when the driver is downloaded or through a theory that copyright attaches to the temporary (in-memory) copy of the driver necessary to run it.


Yes, maybe, but step #1 is to sue Oracle.

See if you can convince them to "let you" publish a benchmark of their database management system.

Start there.


Amusingly, Oracle is known as the slowest major DB despite their heavy handed tactics. So, actual benchmarks might actually help their sales rather than people simply assuming it’s unacceptably slow.


Did this change recently? I remember my database professor in college was adamant that when they talk about databases, I am to assume some things as a given (going by memory, am probably not completely accurate):

that the data set is large enough that it cannot fit in memory

that storage is orders of magnitude slower than memory and memory is orders of magnitude slower than processor cache

Oracle has the “best implementation” given these constraints.

Is that not the case?


It's worth noting that in current conditions the assumptions may be unwarranted.

First, while storage used to be orders of magnitude slower than memory, now SSD storage is just a single order of magnitude slower;

Second, in many domains now it's often practical to ensure that your data set can fit in memory. For example, if your system is for storing financial transactions (which is a prime market for Oracle), then your enterprise has to be quite large to get a terabyte of transactions and you can put a terabyte (or much more) of RAM in a database system if you choose to.


That's precisely the point, Oracle forbids benchmarking and comparisons in its licensing.

So how would anyone (legally) know?


Well, you cannot legally publish a benchmark, but you can set up your own for your private uses. It is not like Oracle DB detects it is being benchmarked and shuts off itself.


It's not a special status, anyone has the right to deny you a hardware product as well. I don't have to sell cars to anyone if I don't want to. If I do want to sell someone a car, I can specify a contract or license that they must follow if they buy my car. Ferrari famously only sells exclusive models to customers who have been pre-approved, i.e. they have a certain amount of income and own 5+ Ferraris already. I also cannot walk into a Lockheed Martin dealership and tell them to sell me a F-22, even if I can afford it, even if my country has permissive laws regarding the ownership of fighter aircraft.

As for software, well, EA has the right to ban me from their servers if I hack their games, even if I did pay for the product, and this makes sense because it ruins everyone else's experience. I don't pay for HN but if I did they still would have a right to ban my account if I start posting slurs or other abusive content.

Is it desirable? Of course it's desirable; imagine having no control over your own creations and having to deal with the consequences of other people abusing it.


None of those examples are equivalent. The hardware examples are ones where companies refuse to sell a product (so you never own the product to begin with), whereas the EA example is one where you’ve been kicked off online services (you still have the capability to play the game offline; you just can’t access their servers, but you don’t buy their servers when you buy the game) and the HN example is a termination of a subscription. Neither of those examples demonstrates legal limitations on software usage with a product you own (though the EA one at least comes close from a superficial perspective).


> you still have the capability to play the game offline

EA famously uses online-only DRM in many of their modern titles; if you get banned from, say, SimCity, you can't run the game at all. There is no "offline mode".


And is this desirable? How does hacking your copy of a single player game harm anyone?

Also, SimCity eventually got an offline mode.


It's not a single player game I'm pretty sure - there are leaderboards and achievements that allow you to compete with your friends. Obviously these features are moot if the top 10,000 players have a score of MAX_INT. It would have been nice to "disconnect" your city from the leaderboards if you wanted to go crazy, but unfortunately this mode was not added.

For the record I am against always on DRM so I did not buy this game nor any other game that uses it. I don't believe we need to codify laws banning the practice or any such thing that requires software developers to build things they don't want to build (with the exception of critical fields such as healthcare and aviation).

It's desirable in that a one time purchase does not entitle a customer to a lifetime of server resources; they paid for the game and they can certainly keep the game, but they don't have a right to the services required by the game (those are recurring costs). This makes sense since the alternative is forcing EA to pay to host servers for people that violated their terms of service.

You are correct that it got an offline mode eventually, I overlooked this. But this demonstrates that the market corrected this problem: Enough consumers complained to force a change. Therefore, is there need for external intervention? The simple solution to always-on DRM seems to be to just avoid buying any products that use it.


>If I do want to sell someone a car, I can specify a contract or license that they must follow if they buy my car.

Only in a limited form e.g. exhaustion doctrine prevents you from restricting resale. If someone wants to resell their exclusive Ferrari, there's nothing Ferrari can legally do (though this'll probably get you blacklisted from ever receiving an exclusive vehicle).

In general, terms can't go against existing laws and have to be 'conscionable' to be enforceable (i.e. they can't be obviously 'unfair').


The examples of software you list aren't close to equivalent. They are all services and you can get banned from a service if you misbehave. But a piece of software such as a driver is not a service.


Software updates aren't a service? Doesn't Nvidia provide updates to its drivers over time? They can choose to cut anyone off from those, including the very first initial driver download. Sure, you can own and do whatever you want with the hardware - but good luck getting it to do anything useful if you can't access Nvidia's driver download service.


Yes, software updates can be a service. Nvidia, however, doesn't provide a working version of their driver with a card. Just like buying a game is not a service, but receiving updates can be.


Sure, but a reseller could resell it under the first sale doctrine. https://en.wikipedia.org/wiki/First-sale_doctrine


> Ferrari famously only sells exclusive models to customers who have been pre-approved, i.e. they have a certain amount of income and own 5+ Ferraris already.

Sounds like discrimination to me, and not desirable.


Only a few very specific types of discrimination (religion, sex, ethnicity/race, etc) are prohibited, every other discrimination is fair game and often desired. For example, it's quite desirable to discriminate potential developer hires according to their programming ability, to discriminate potential borrowers according to their ability to repay the loan, etc.


And discrimination is only discrimination if it's illegal where PeterisP lives?


Discrimination is discrimination everywhere, as it applies to all activities where people make distinctions and treat some things, people or activities differently.

But I'm arguing that when you hear "X is discrimination", it's wrong to automatically conclude that X is bad or that X should be changed. There's a narrow subset of discrimination that's immoral and should be avoided, and a narrow subset that's illegal discrimination (there's some overlap between these two subsets but they are not exactly the same, of course), but most discrimination - and certainly the default situation - is just the reasonable human activity of applying common sense and acting according to the specific situation, instead of blindly acting the same no matter what, like robots would. It's completely normal to adapt to the specific person and act differently to suit them, making adjustments and custom approaches for different individuals; that definitely is discrimination, but there's nothing a priori wrong with it. For example, custom pricing is one form of discrimination: offering a discount for students or senior people is certainly discrimination, but we generally consider it entirely appropriate.

And in certain cases a lack of discrimination would be completely immoral. For example, the concept of "reasonable accommodations" is a requirement for discrimination. A policy that forbids electronic devices in an exam does not discriminate in any way and applies equally to everyone (in colloquial language one might call it a "discriminatory policy", but that's wrong; perhaps I'm nitpicking, but it's a misuse of words to mean their exact opposite). Yet as it forbids hearing aids for people who need them, that non-discrimination is bad; and simply allowing all devices equally would be bad for other reasons, so the ADA and equivalent laws require discriminating and applying different rules to people with different abilities.

So if you see a practice that seems definitely bad and harmful, then "is it discrimination?" is the wrong question to ask, since it may well be harmful but not discrimination, or discrimination with nothing wrong about it; these aren't edge cases, the overlap is just partial. The proper questions to ask are whether the criteria of the discrimination are fair (whether the up-thread issue of discriminating on wealth is acceptable certainly is debatable) and whether the results of that discrimination are appropriate.


But building a brand by only selling to rich people?


Sure, everything that's not explicitly prohibited is permitted, and wealth is not one of those very few things prohibited for discrimination. You're free to have a club that only admits billionaires or offer a discount that applies only to people below a certain amount of income.

The example on ability to repay is closely related to discrimination by pure wealth, but there are businesses with even more straightforward criteria, e.g. financial services that are offered only to individuals with net worth above a certain (quite large) amount, and having less money than that automatically disqualifies you from that service even if you were able and willing to pay the involved fees.


> Sure, everything that's not explicitly prohibited is permitted

That was not the issue. The question was whether it is desirable.

Personally it leaves a bad taste. It reminds me of a fashion brand that doesn't sell to obese people (can't remember the name but it was in a documentary).


So don't buy one


> Sounds like discrimination to me, and not desirable.

Are you allowing everybody who wants to have sex with you to have sex with you, or are you discriminating in favour of a select few / a unique person?

Discrimination is part of human nature.


> Discrimination is part of human nature.

Glad to see someone also came to this conclusion!


That's the point of luxury models. Not everyone can have them.


The difference between hardware and software is that copying is free for software. You can own the hardware and do whatever you want with it, because for you to reproduce it would require you to effectively be Nvidia. For software, you can't give a user ownership of exactly 1 copy of the software. If the purchaser had all of the rights of ownership, they would have the right to distribute copies for free, which obviously makes selling the same software impossible. Software is copied and hardware is moved; they're fundamentally different, so they have to be treated differently.


> Ok, so who gave software a special status over hardware?

Some american politician extended copyright protection towards software. The rest of the world eventually did the same.

> Is this desirable?

No.

> Can we reverse it?

Sure. We just need to have billions of dollars just like the copyright industry. That money can buy a lot of influence.



> Some american politician extended copyright protection towards software. The rest of the world eventually did the same.

>> Is this desirable?

> No.

So I'm sure you'd be happy if I just took the software for whatever great startup idea you'd been slaving away on for the last two years, slapped better marketing on it, and undercut you by 50% since I didn't have to employ all those pesky overpaid engineers.


If someone purchased a product from me, and then used it however they want, that's fine.

It is their product at that point, because they purchased it.

You should not have the rights to control someone else's product, such as a graphics card, or whatever, after they have purchased it. It's theirs now.


You're free to write your own graphics driver for the hardware you own, just as Nvidia is free to not help you.


Nvidia is free to not give out any graphics driver; however, that would make their graphics cards nonfunctional and hard to sell.

However, if Nvidia has sold me a functional graphics card, including the driver as an unalienable part of the package that I purchased (since the driver being functional is part of the card being 'fit for purpose' of the sale), I should be free to use the driver without any unreasonable restrictions. I have legally bought [a copy] of it, and it's not copyright infringement for me to run it on a computer - even if it resides in a datacenter.


> your own graphics driver

My whole point is that there need to be more efforts to hack and modify these things, and that this would be "more desirable".

And that orgs should be using their power to cause this to happen more. For example, if open source orgs can weaponize licensing agreements against Nvidia in order to force them to do this, then they should, and this would be desirable.


If only! The hardware is made so that only software written by nvidia can drive the graphics card.


Buy a Radeon. I'm not seeing the problem. You have options? Nobody is forcing you to buy Nvidia hardware.


> You have options? Nobody is forcing you to buy Nvidia hardware.

Laptops are generally an all or nothing proposition. I wanted a laptop with a high performance CPU and the nvidia GPU just came along with it. Couldn't even disable the thing in firmware since hardware video decoding with the Intel GPU caused kernel panics.


If you claim it matters, but not enough to impact the purchasing decision, did it actually matter?

Like you showed your disapproval of Nvidia by giving them your money anyway. So... They're right - people care enough to complain, but not buy something else, so it doesn't really matter.


> So... They're right - people care enough to complain, but not buy something else

Are you aware of the concept of market power, switching costs, barriers to entry, and market lock in?

If so, then that should enlighten you as to the explanation for this.

> did it actually matter?

Yes. It matters, yet still did not change consumer behavior, due to the concept of market power.


How can you take software that was never published in the first place? There's a reason everything is a service these days. So what's the point of these protections?


There's nothing stopping nvidia from theoretically offering a gpu that is only rented out rather than sold. It's just not really considered acceptable for hardware (at the moment).


That's only because the hardware is a useless dongle without the software.

Sure, in theory you could run an open source driver, and in practice sometimes the driver won't crash, but there's no point, because you could get an equally good open-source-driver video card for the same price, since you can't get the fancy card's peak performance from the open source driver.


Isn't that what IAAS is in essence?


rms, amongst others, long maintained that firmware is different to software.


If the hardware is effectively useless without the software driver, it could be argued the whole thing is a bit of a fraud / misrepresentation. But I guess nobody wants to sue somebody with pockets as deep as nVidia to change the status quo.


GTX/RTX cards are sold as gaming cards, so it's not misrepresentation. If it was, then every bit of hardware with locked-down software would be fraud.


“Locking down” is not the problem - the problem is that you are told you’re buying goods when you are actually buying goods which require access to rented software in order to function at all.

It’s like buying a blender and then finding out that you’re not allowed to blend anything unless someone in the manufacturers’ operation approves of it.


It's not fraud; you can return the purchase if you don't like the software license.


"The problem is in the software (the driver) which you never can buy, only license under a long list of conditions which prohibit specific uses."

Well, buying and licensing are not so different in Europe (first sale doctrine). The company cannot forbid you from reselling a license (exhaustion of intellectual property rights) in Europe.


The US respects first sale doctrine as well. See Vernor v. Autodesk, Inc. 555 F. Supp. 2d 1164. [0]

However, the ability to resell a license doesn’t remove other conditions, e.g., restrictions on data center usage.

[0] https://en.m.wikipedia.org/wiki/Vernor_v._Autodesk,_Inc.


lol. When was the last time anyone actually bought a copy of software? You own nothing. You are a party to a contract written by Nvidia and signed by you when you installed their driver. You can do only what they allow, and they can yank their permission whenever they see fit.

Not happy? That is what makes FOSS so appealing.


> When was the last time anyone actually bought a copy of software?

Satya Nadella bought Skyrim recently. All of it.


Skyrim includes licensed code such as Bink Video.


Microsoft did, even satya is perhaps not rich enough to spend 7.5 B on his own.


Have enough details about TSMC contracts ever been released/leaked to know they don't do this?

I don't follow the semiconductor industry closely enough to know anything about TSMC's business practices, but these kind of contracts are far from unheard of in other sectors.


Maybe you could just use it for the datacenter anyway. NVIDIA doesn't have a right to know how you are using it.

What are they going to do, call you and ask how you are using the GPUs? Don't answer. Message you on Facebook? Don't answer. Visit you? Don't publish your address.

Alternatively, just don't call it a datacenter. Just call it a private internet gaming cafe or something of that sort. NVIDIA doesn't have a right to know what's actually inside.


I suppose you may get raided by an organization like the BSA upon suspicion (e.g. when you get reported by a disgruntled employee).

https://en.wikipedia.org/wiki/BSA_(The_Software_Alliance)


Build your dreams in a country whose government won't give a damn about enforcing it then. I can think of several where you can safely do so, and the government will just laugh it off as a waste of time if someone tried to file a suit about something like this.

The US will fall behind in tech if it insists on enforceability of things like this.


Most companies would rather just buy the enterprise card than go through all of this hassle. It's not even a rip-off when you consider that the enterprise cards pay for the research and development on CUDA, which puts enterprise-grade tools in the hands of students and hobbyists.

AMD's version of this is simply not supporting their version of CUDA (ROCm) on consumer cards (the Navi ones, anyway).


> It's not even a rip-off when you consider that the enterprise cards pay for the research and development on CUDA, which puts enterprise-grade tools in the hands of students and hobbyists.

Monopolists can do anything with your money including sitting on their hands. Also, supporting students and hobbyists may be noble, but education is something we all pay tax for.

Also, hobbyists would be better served if they could develop their own version of cuda.


All of the BSA members are US tech companies. One could argue the enforceability of US Copyright law is to protect this industry.


I suppose the idea is that you can't use the driver...


Let's say I do this in Russia or Morocco or whatever.

How are they going to prevent me from doing so?


How are you going to sell said cloud service to companies doing business in the US or EU?


Using the internet and a payment processor, of course. The true hardware would be hidden from the client in one way or another to protect from judgements, and any inspection would be respectfully denied.


> If I buy a product, surely I can do with it as I please?

We no longer buy products these days. We license them. Another form of rent that allows the true owner to maintain control. Somehow this became the norm.


There's nearly no market for people willing to pay to own.


That's because most people don't know they don't really own the stuff they "buy" these days.


The EULA / driver license may prevent you from reverse engineering the driver to enable these features, but that is only legal protection. nVidia sells these cards saying they don't provide feature X; nVidia also sells some cards, which do provide feature X (at a different price point). There is imho nothing per se wrong with this practice. The silicon being the same in both products is an implementation detail.


If you sold pork with different price on the condition of eating it in a wood vs. a stone house, some people would consider it market segmentation or maximizing profits. Others might call it illegal price discrimination.


Who calls it illegal?


> nothing per se wrong with this practice.

There is nothing wrong with someone doing whatever they want with a product that they now own.

If someone wants to modify their own hardware, that is their right.


It's cheaper than designing, manufacturing, and stocking more chip models. It's also cheaper than designing and manufacturing 1 model and physically disabling the pieces after.

You could try to regulate that what is manufactured is not gimped on its way to the consumer for ideological reasons, but in the end you'd just end up paying more for a separate physical model, because the profit margins on these advanced use cases are simply what drives GPU design.

As for the royalty licensing, TSMC is ahead in abilities and has captured an enormous portion of the market, but it's not so far ahead that it can eat as far into customer income streams as it wants. Other manufacturers still exist and get deals; Nvidia is using Samsung 8nm for the latest round of GPUs, for example. If TSMC continues to increase its lead then we may see that type of agreement grow, though.


> Also, why doesn't TSMC slap a license on every IC that leaves their fab, taking (say) a 30% profit from every application in which their ICs are being used?

Because companies would stop using TSMC chips...?

Not to mention the logistical problems to attribute "profit" to any chip in particular.


I don't think not using TSMC is a viable option anymore. There's a (slightly worse?) Samsung alternative, but they can't satisfy the demand.


Also, if TSMC introduces the license, then Samsung may do the same.

But perhaps I'm too much thinking from the perspective of what large US based businesses would do.


US trusts are nothing compared to Asian trusts.

en.wikipedia.org/wiki/Keiretsu

en.m.wikipedia.org/wiki/Chaebol


Not about trusts, but about there being too much capitalism in the US (like the pharmaceutical industry?)


For GPUs, I agree that you currently have to use TSMC. But if TSMC were to charge 30% of profits, you would almost certainly see a migration to other fabs which would harm TSMC's long-term profitability.


> For GPUs, I agree that you currently have to use TSMC.

RTX 30 is manufactured by Samsung.


Not the Quadros used in datacenters. GA100 is on TSMC 7nm according to Nvidia themselves: https://developer.nvidia.com/blog/nvidia-ampere-architecture...


Give it 10 years and a lot of engineering time, maybe half a trillion USD and you'll get the equivalent in the mainland US. Until then, it's more convenient to use the US military resource to protect Taiwan from PRoC.


This is called a royalty and is a pretty common business arrangement when licensing e.g. a patent for your product or some stock footage for your movie.


>Why are these kind of licenses even allowed.

They're not in (some parts of) Europe.


This sounds dubious, isn't it easier/more reliable to just block parts of the hardware using fuses?


To differentiate products for the consumer and enterprise markets, Intel disables ECC RAM support for the Core i5 series and above and enables it for the Xeon E series (i3 and below are sold to both markets, so it's not disabled there).

NVIDIA reduces the number of FP64 calculation units (they're actually reduced on the die) and disables ECC RAM support for GeForce (except some Titans) so that they won't be used in datacenters. Previously this worked because most scientific calculations require FP64 and reliability matters.

But now, in the deep learning era, FP64 isn't needed and a rare RAM error doesn't matter much. So they must enforce the EULA to keep dirt-cheap GeForce cards from being used in datacenters for deep learning.


I don't think the latest i3s have ECC anymore, but they are releasing Xeon Ws which are the same chips as i7s with ECC enabled.

The price premium is like ~10%, which is fair.


I've caught up on the latest SKUs, thanks. Rebranding Xeon E to Xeon W doesn't look meaningful.


No, because they want students to have CUDA to learn with, and home devs to have it so they develop tools for it. Then, when enterprises use it for profit, they have to pay for the development of the platform.


It's basically the Adobe route.


How? That’s like saying an Intel i7 can’t be used in a datacentre.


Meanwhile people say capitalism drives innovation.

How much is nvidia single handedly holding back innovation and new discoveries?


That viewpoint is adorably naïve. The Computer History Museum in Mountain View pretty clearly falls into three categories: government projects, genuinely innovative ideas from the private sector that failed, and the people who ripped off those ideas and made a killing. There is very little overlap between the last two categories.


> The Computer History Museum in Mountain View pretty clearly falls into three categories: government projects, genuinely innovative ideas from the private sector that failed, and the people who ripped off those ideas and made a killing. There is very little overlap between the last two categories.

This is ignoring two very important things.

The first is the number of government-funded projects that burned a mountain of cash and led to nothing. Unfortunately this is the rule rather than the exception in modern times because modern government has been captured by interest groups that divert money from where it's supposed to be going to themselves, which makes everything cost ten times more than it did when the government was funding the Apollo program and ARPANET. So you can't just say "government fund more stuff" without fixing that first.

And the second is that private companies inventing stuff only to see somebody else successfully commercialize it is still causing it to be invented. And the overlap between invention and commercial success can be very little and still cause people to do it, because the reward when it happens is very large.


> Meanwhile people say capitalism drives innovation.

The saying is really that free market competition drives innovation.

Obviously patents and copyrights are government-issued monopolies, and monopolies are by definition lacking in competition.

The theory is that by granting the monopolies we get more innovation. Often the theory is wrong.

Especially when we allow the company to leverage the monopoly on the thing they actually invented into a monopoly on ancillary things that are only used in combination with that class of product.


I mean, Nvidia is just cashing in on their innovation advantage. AMD's stack was worse forever, and OSS is their white flag / hope that someone else picks up the ball and creates an ecosystem to leverage their HW.


Your second sentence doesn't contradict the first sentence. Capitalism (or more precisely, IP law) can simultaneously drive innovation and hold back innovation. The more worthwhile question is whether capitalism drives more innovation overall, but that's hard to prove either way with snarky 1 liner.


"The more worthwhile question is whether capitalism drives more innovation overall, but that's hard to prove either way with snarky 1 liner."

Sent from my iPhone.


I think you proved the parent's point.


Maybe they can't open-source it because they don't own all the IP? That's very likely the case for Windows as well, for example, Microsoft just didn't licence all the code they used for releasing the source, and now you can't go back to 1000 different IP owners and negotiate anything reasonable.


Didn't they have to manually prepare a binary patch for a security issue in the Word Equation Editor, because they either lost or could not compile the source code anymore?


The original Equation Editor was licensed from a third party (Design Science), and it is possible that Microsoft never had the source code. Maybe the third party vendor lost the source code, but I think it is more likely that getting the third party vendor to fix the bug would have required negotiation with that vendor, and maybe Microsoft and that vendor were having trouble agreeing. (This is speculation on my part, I have no inside info.)


Or equally likely, that vendor no longer exists.


Likely, perhaps, but not true, since they still exist: https://www.dessci.com/en/

Microsoft probably started wondering internally why they don't just write their own equation editor, but didn't have time, so decided to do a crazy patch to this one and then start on a rewrite.


I think that Microsoft can open-source most of the Windows sources. Nobody would care too much about a few binary blobs, and I don't believe that they don't own the license for a significant portion of the OS.


This is exactly what happened with Solaris, and it turned out to be a rather massive problem because it meant that the community couldn't actually functionally produce a derivative distribution because the original released source code didn't actually represent the entire distribution. And a project that the community can't build will always be critically undermined by that flaw.


I think that the momentum behind an open-source Windows would be immense, so the community would overcome any problems. I mean, people are making Windows distributions right now, with all sources closed, and they're doing amazing work if you ask me, with all those reverse-engineered knobs and whistles. Solaris is a niche OS after all, unlike Windows.


It was a solvable problem though, right? A bunch of different OpenSolaris distributions exist now.


I dunno if I'd call it "solved" or not... Illumos reduced the binary blobs, but to this day you have to download a bundle of them when you build it. The whole issue also added significant friction early on, which I personally think stunted the project's growth, but I'm not sure that's really knowable.

But yes, it did eventually get mitigated.


Well, at the very least, they could allow loading unsigned firmware, or allow their firmware to be redistributed, then.

This is the number one usability issue with nouveau: no firmware means no re-clocking, which means bad perf.


Here's a kernel engineer from Microsoft answering the question: What do you think about open sourcing windows and getting rid of the licensing code? [0]

[0] https://www.quora.com/What-do-you-think-about-open-sourcing-...


The last assertion made in that answer is unfounded and false - "Even if the entire OS code was made public tomorrow morning, it would take years before someone figures out how to build it, the complexity of the build system itself is mind boggling."

is contradicted by the fact that just recently a version of the Windows source (old, but still) was leaked, and people did manage to successfully build and boot the leaked Windows (XP and Server 2003 IIRC) code within days of that source becoming available.


That's what kept Solaris from being open sourced for years.

There's a talk by Bryan Cantrill about that.

They basically could not provide a fully functional OS because some marginal yet used-everywhere parts were licensed and proprietary (Bryan cites the internationalization library as an example).


Second-hand information, but apparently the reason they can't is that the driver contains code licensed from other companies, and they can't open-source that.

https://www.reddit.com/r/hardware/comments/j217oo/gamers_nex...

while obviously not an official source, that isn't particularly surprising either.

as an additional relatively-well-known-but-possibly-incorrect bit of internet lore, right now their Linux driver is basically a wrapper around their Windows driver, so that explanation makes a lot of sense. They would have to go through and disentangle what parts they own and what needs to be stripped out / replaced for the linux version at an absolute minimum.


Why is it a shame? They had been providing quality Linux drivers for years, when nobody else cared about high-end graphics for Linux. Remember fglrx?

Now AMD is open source? Great! However, it's still very far from perfect. You only have to take a look at the list of AMDGPU issues at freedesktop[1]... because being open source is easy, but working reliably in a stable manner is another matter.

[1]https://gitlab.freedesktop.org/drm/amd/-/issues


It's shame, because they hinder the progress of Linux desktop and prevent Nouveau from reclocking properly.

And what about AMD bug tracker? It's open, so you can see the bugs. That's a plus, not a minus. Nvidia blob has all the bugs hidden somewhere, so you don't see them. It doesn't mean the blob doesn't have them.


> It's shame, because they hinder the progress of Linux desktop and prevent Nouveau from reclocking properly.

I think it's the opposite. Some years ago, Nvidia was your only chance to have accelerated graphics on Linux. ATI/AMD didn't care about it at all, and Intel cards were not for gaming. So Nvidia made it possible to do things in Linux when nobody else allowed you to... how's that hindering the progress of anything? Specially when nobody forces you to get an Nvidia card.

> And what about AMD bug tracker? It's open, so you can see the bugs. That's a plus, not a minus. Nvidia blob has all the bugs hidden somewhere, so you don't see them. It doesn't mean the blob doesn't have them.

Yes, I didn't say Nvidia was bug free. I just said AMD drivers for Linux are, at the moment, far from perfect, despite being opensource. I'd say, for newer cards, they're worse than Nvidia's. I value opensource, but if I have to choose between having an opensource desktop crashing twice a day, vs. the Nvidia blob, of course I'd go for the latter, as much as I'd love to have a fully opensource OS.


I was Nvidia user for a long time due to the above, but today they aren't worth bothering with. AMD can be slower to fix bugs or have more of them on release day due to having smaller teams, but they are gradually ramping that up, and their current level of support already doesn't bother me, while they are providing a proper open source driver. Nvidia don't and have no plans to. I'd take AMD over Nvidia today any time.

Regarding slowing down progress, I was talking about the modern desktop, like Wayland compositors and so on. Nvidia was hindering it for years. And their attitude towards Nouveau is disgusting.


Well, I've been using Nvidia cards for years, and the last time I built a new computer (some months ago) I had a hard time deciding whether to stick with Nvidia or switch to AMD. Eventually, I chose to stay with Nvidia, because getting a new AMD card (apart from the fact that there seem to be no budget AMD cards...) seemed like a lottery in terms of having a stable Linux desktop environment, my best bet being an older generation card (RX 570 or RX 580) that had little availability and was overpriced here.

As for slowing down Linux desktop progress, I think it's not Nvidia's fault: you could always get a card from another vendor, although the alternatives were not as good. Well, maybe those other vendors are to blame, and not Nvidia...


I'd say it's their fault, since due to the above situation there were a lot of Linux users with Nvidia cards. Nvidia didn't care to upstream things, and that left Wayland and many other modern use cases unsupported for years.

Today it's less relevant, since Nvidia usage on Linux is gradually dropping, so the damage to progress is diminishing as well. Wayland compositors' developers can simply say "we don't support the blob and don't plan to" and be done with it. In the past it was much harder, since so many Linux users still had Nvidia while the alternatives were far less viable.


Providing support for a marginal platform is also much harder when you’re <0.1 times the size.


Are they really that afraid of someone taking their Lucky Charms?

They're afraid of patent trolls.


That's my dream... or, on a shorter time frame, at least Wayland support.


Well, in their defense, I generally haven't run into any issues with their driver, and it's also pretty easy to package/install as an end user.


I have. Like keeping a copy of the last page you viewed in Chrome and overlaying it on the screen after exiting the browser.

That one is fun.


I encountered a really fun bug where a game on Linux crashed and a ghost of the game remained on the monitor, even after it was connected to a different computer and power cycled; it remained for days. Some really interesting cascading bugs there.


That's not a software bug, that's an image retention issue with your monitor. If it was that severe then it probably got into a state where it was sending severely invalid timings to the TFT LCD array, DC biasing it, which causes long-term retention and may even cause permanent damage if done for too long.

Software isn't supposed to be able to cause that. That's on your monitor.


I _think_ it was every other frame, and I am fairly sure that it was a software/driver issue. It had never happened before and has not happened since; it started immediately when I launched the game, the symptoms got progressively worse, and the mild symptoms were triggered every time I started the game.

What I convinced myself of, after a few minutes of making sure I wasn't hallucinating, was that the graphics driver was pushing out malformed data in some way or other, which was triggering bugs in the monitor's hardware/firmware - bugs that are easy to believe are plentiful. It would be an interesting project to try to track down and replicate the bug.


That reminds me of the spookiest bug I've ever encountered: once, when resuming a Dell laptop from suspend at work, it showed a Windows desktop. Said laptop had been running Linux exclusively for several months (but it had previously been used with Windows). Interacting with the laptop made the expected xscreensaver unlock screen appear, and everything worked normally afterwards. The only explanation I could come up with was that, somehow, a snapshot of the Windows screen had survived intact in a corner of the framebuffer which the Linux driver didn't touch, even after months of power off/on and suspend/resume cycles, and a bizarre driver glitch made it visible in that particular resume cycle.


The said Windows desktop ghost screen didn't have the date and time on the start menu, did it?


If I recall correctly, the Windows XP default was to show only the time, so the date probably wasn't visible.


I get this on Macbook Pros pretty often when connecting external displays. I think the nvidia drivers are universally bad.


I'd agree, if it weren't for the fact that they give zero shits about Wayland support. I'd be totally fine with them staying closed source as long as they kept up with the standards.


What are you talking about? Wayland is supported on DEs that wanted to support nVidia chips.

Meanwhile projects like Sway have a direct "Go to hell if you use nVidia, we won't let you run this code." It's bizarre that you blame nVidia for this.


It's possible to create a Wayland compositor that works with the proprietary nVidia drivers, but it requires using nVidia-specific interfaces because nVidia refuses to support the same interfaces for non-GLX, non-X11 hardware acceleration provided by every other Linux graphics driver.

It's hardly surprising that a lot of Wayland compositor developers would rather not put in a ton of extra effort to add a special case for one particular set of proprietary drivers, which they would then need to maintain and support separately from the common code path.


To tell the whole story, the NVIDIA argument is that they want cross platform standard interfaces (EGLStream), which would use the same code the Windows driver uses, but the Linux world is pushing for Linux-only interfaces (EGL)


That may be, but the fact remains that nVidia is pushing an interface that no other Linux drivers currently support, for reasons that really only benefit them. The Linux kernel team has never been particularly supportive of middleware layers designed to promote common drivers between Linux and other operating systems, and for good reason—it impedes the development of optimized, native Linux drivers.

The only way I see nVidia succeeding here is if they clearly demonstrate that EGLStreams is a technically superior alternative to GBM, not just for their own hardware but in general, and also contribute the changes needed to support EGLStreams for all the other graphics drivers currently using GBM so that applications don't need to deal with both systems. As long as the EGLStreams code path can only be exercised in combination with the proprietary nVidia drivers it will remain a second-class citizen and projects would be well-advised to avoid it. (Drew DeVault goes into more detail[1] in his objection to the inclusion of EGLStreams support in KWin, which I agree with 100%.)

Or they could just acknowledge that this is a Linux driver, not a Windows driver, and implement the standard Linux GBM interfaces like everyone else even if that means less shared code.
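
For what it's worth, GBM is a pretty small surface from the application's point of view. A rough sketch of the allocation path (error handling omitted, device path assumed, not taken from any particular compositor):

  #include <fcntl.h>
  #include <unistd.h>
  #include <gbm.h>

  int main(void)
  {
      /* Open a DRM device and allocate a scanout-capable buffer through
         GBM, the generic buffer API that every in-tree driver exposes. */
      int fd = open("/dev/dri/card0", O_RDWR);
      struct gbm_device *gbm = gbm_create_device(fd);
      struct gbm_bo *bo = gbm_bo_create(gbm, 1920, 1080,
                                        GBM_FORMAT_XRGB8888,
                                        GBM_BO_USE_SCANOUT | GBM_BO_USE_RENDERING);

      /* A compositor would now import the buffer into EGL for rendering
         and hand it to KMS for display; the Nvidia blob instead expects
         you to go through EGLStreams for the same job. */

      gbm_bo_destroy(bo);
      gbm_device_destroy(gbm);
      close(fd);
      return 0;
  }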

[1] https://lists.sr.ht/~sircmpwn/public-inbox/%3C20190220154143...


Typo, by Linux-only interface I meant GBM.


It’s “””supported”””. It’s apparently very buggy and very difficult to debug. Sway lets you run it after you set a flag making it very clear that if something is broken you may not report a bug since the developers are unable to reasonably fix it.

Most Linux distros will also prevent you from submitting a bug report for a kernel issue if you have a tainted kernel.


Wayland doesn't work at all with Nvidia, so if you have a 4K monitor and a non-4K monitor and an Nvidia card, you're basically just fucked, because you can't selectively scale things.


I have. I needed a prerelease kernel for a new driver but nvidia had not released a binary for the new kernel yet so I was unable to use anything but the open source nvidia driver.


You could just as easily blame the author of the driver that only works on a prerelease kernel.


I still haven't been able to get my laptop's 1660 Ti to display anything other than glitch art, with either the proprietary driver or nouveau.


Is it? This is the same driver that makes me shut down X and fucks up xorg.conf every time I need to update?


Not to mention that nvidia uses proprietary configuration options even in Xorg.conf. A multi-monitor configuration which works fine in nouveau (or really any other driver) refuses to work with nvidia, because if you use the binary driver you have to set bizarre metamode options to make it work.


Most of it is dumb automatic header generation, as the article points out, but looking at the source[0], there seems to be a lot of code duplication, e.g. 5 different vcn_*.c files that seem to be of significant size and largely identical at a glance.

That frankly seems like horrible software design.

[0] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/lin...


How do you figure? VCN 1.0[0] and VCN 2.0[1] differ within the first few lines and diverge further the more you compare them. The files seem to correspond to each version of AMD's Video Core Next hardware[2], and have nothing to do with autogeneration.

[0] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/lin...

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/lin...

[2] https://en.wikipedia.org/wiki/Video_Core_Next


Actually, there's a lot of commonality between those two files. Just look at all the "is_idle", "get_wptr", "set_wptr" type functions. They're either literally identical or the v2_0 version is a superset of functionality of the v1_0 version.

This is a case where code is identical not by random chance but due to an underlying logical pattern (in this case: ring buffers for communication between CPU and GPU seem to continue working the same way). Hiding that via code duplication is going to make the code overall less maintainable, because the code lacks implicit knowledge of these patterns.
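
To illustrate, the offset-only differences could live in a per-generation table while the shared logic is written once. A purely hypothetical sketch (made-up names and offsets, not the actual amdgpu code):

  #include <stdint.h>

  /* Hypothetical per-generation register table: each VCN revision only
     supplies its own offsets, while the ring-buffer logic exists once. */
  struct vcn_ring_regs {
      uint32_t rptr_reg;
      uint32_t wptr_reg;
  };

  static const struct vcn_ring_regs vcn_v1_regs = { 0x1000, 0x1004 };
  static const struct vcn_ring_regs vcn_v2_regs = { 0x2000, 0x2004 };

  /* Stand-in for the MMIO read the real driver would perform. */
  static uint32_t mmio_read(uint32_t reg) { (void)reg; return 0; }

  /* One get_wptr for all generations, parameterized by the table. */
  static uint32_t vcn_ring_get_wptr(const struct vcn_ring_regs *regs)
  {
      return mmio_read(regs->wptr_reg);
  }

  int main(void)
  {
      return (int)(vcn_ring_get_wptr(&vcn_v1_regs) +
                   vcn_ring_get_wptr(&vcn_v2_regs));
  }

Where the behaviour genuinely differs between generations (not just the offsets), per-generation functions still make sense, of course.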

P.S.: It's interesting how many people missed that my complaint was specifically not about the auto-generated part, but about the part that humans copy&paste.


Code duplication sometimes makes sense to better decouple different parts of the code base; however, that often only becomes apparent long after the "code cleanup" to remove duplicates has been done.


Code duplication this way (versions of code for different hw) is common in Linux though and makes a certain kind of sense.


IDK it seems pretty common in the embedded world to add support for a new piece of hardware by copying the entire driver and then changing the parts that the new hardware supports instead of trying to come up with a driver that supports all generations of the hardware. Pretty sad if you ask me and very unmaintainable, but, at least if you ask the companies, people should just buy newer generations of their hardware if they want those bugfixes :).


Wild speculation from someone who doesn't do hardware, but I'd guess that it's not so much writing the software as testing it that motivates that. It'd be hard to keep the testing matrix of apps X hardware X driver version from exploding out of control.

Your customers will be hella pissed at you if their favorite game suddenly doesn't work on their system after a driver update. So you need to test a representative sample of previously-working software across all hardware that the updated driver applies to.

That sounds like a nightmare, and a capital-intensive one because you need to keep machines up with all of these cards, both in continuous integration and for interactive debugging.

I can understand the impulse to keep the list of hardware that new driver versions support small.


This is the real reason in my experience, combined with the bringup/support cycle that comes with hardware development. If you fork the driver for bringup, you buy yourself a lot of freedom to change things. During support phase, when the hardware is already in the hands of the customer, changing things is more risky and so you have to be conservative.

For example, maybe you had a chip bug in the previous hardware generation which caused the system to hang after several days of stress testing. You found a software workaround for the bug, but every time you touch the code you need to re-verify the workaround, which takes days.

Of course, the downsides of forking the code are also very apparent...


Another example of this in the kernel is filesystems: in theory, the ext4 driver can read/write ext2/ext3 filesystems fine. And yet the kernel for years had ext2, ext3, and ext4 implementations, each later version derived from the previous. The ext3 driver was eventually removed in 2015, but ext2 and ext4 survive.


Yeah, exactly. Every new driver at AMD and NVIDIA has to fill a massive spreadsheet with most major game benchmarks from the last 5 years, along with a collection of particularly nasty games.


The thing is, DRY has some downsides. If you've pretty much nailed down the functionality of the old thing, especially if it involves some kind of manual testing/open beta test with users, the last thing you want to do is touch the old code.

I'm not defending them, but sometimes something is good enough and you have to move on.


In the context of the Linux kernel, you've never truly nailed down the functionality of the old thing. Interfaces external to the driver evolve over time (which bleeds into changes internal to the driver), requirements change, people want to add new features that old hardware ought to support -- all this adds up and puts evolutionary pressure on the drivers. Code duplication really hurts here.

It's possible that the AMD developers are from a Windows culture where they don't have to worry about this because Microsoft gives them a stable interface. Though even there, I find it questionable whether it's a win. Surely the occasional bug will surface that affects drivers for older hardware as well?


This doesn't sound like an issue with DRY as much as an issue of monolithic design.

If driver version X worked but X+1 doesn't on your hardware, don't update the driver. But when the driver is in the kernel trunk, that option isn't that simple.


> IDK it seems pretty common in the embedded world to add support for a new piece of hardware by copying the entire driver and then changing the parts that the new hardware supports instead of trying to come up with a driver that supports all generations of the hardware.

The rule of three is not exclusive to the embedded world. It's far better to have duplicate code around than to have to deal with poor generalizations that affect multiple independent modules, which in this case means hardware support.

To put it in perspective, how many pieces of hardware would you require to re-test and validate if you decided to refactor bits of drivers that might share a common code path?


This might not exist in C (I've never written it), but something like this seems like a fairly clean pattern you could use:

  abstract class BaseDriver {
    abstract commonThing()
  }

  abstract class BaseDriverV1 extends BaseDriver {
    commonThing() {
      console.log('This is the common method implementation for all V1 drivers')
    }

    abstract doThingOnlyV1Does() 
  }

  class DriverV1_1 extends BaseDriverV1 {
    doThingOnlyV1Does() {
      console.log('This is how Driver V1.1 does the V1 thing')
    }
  }
This way you can use either an interface or an "abstract" definition that declares the base driver methods and the versioned driver methods, then provide version-dependent implementations where needed, or else share the common implementations by inheritance/extension.

Maybe this turns into spaghetti at scale and it actually is easier to just copy-paste all of it, who knows.


> This might not exist in C (I've never written it), but something like this seems like a fairly clean pattern you could use:

How sure are you?

I mean, if the code path of a driver is touched, that triggers all the verification and validation steps required to ensure that the pieces of hardware affected by it will continue to work as expected.

Furthermore, how many bugs are introduced on a daily basis because people like you and me push "fairly clean patterns" that end up triggering all sorts of unexpected consequences? I know I did, and still do, my share.


C doesn't provide any language support for this, including any notion of abstracts or classes. You can still do it manually and that's basically how drivers are implemented in Linux, but it doesn't address the combinatorial test matrix explosion.
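
The manual version typically ends up as an "ops" struct of function pointers, which is roughly the pattern used throughout the kernel. A sketch with made-up names (not actual kernel code):

  #include <stdio.h>

  /* Function-pointer table: the C equivalent of the abstract-class sketch
     above. Each hardware generation fills in its own table, sharing
     entries where the behaviour is common. */
  struct driver_ops {
      void (*common_thing)(void);
      void (*v1_only_thing)(void);
  };

  static void v1_common_thing(void) { printf("common V1 behaviour\n"); }
  static void v1_1_only_thing(void) { printf("V1.1-specific behaviour\n"); }

  /* "DriverV1_1" is just a particular filling of the table. */
  static const struct driver_ops driver_v1_1_ops = {
      .common_thing  = v1_common_thing,
      .v1_only_thing = v1_1_only_thing,
  };

  int main(void)
  {
      driver_v1_1_ops.common_thing();
      driver_v1_1_ops.v1_only_thing();
      return 0;
  }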


I don't know, to me this is a bit like taking two C programs that print "hello world" and "goodbye world", looking at the compiled assembly, and then remarking "pretty sad and very unmaintainable."

Point being, maintain the code generator and not the generated code.


Right. But here the generated code is checked into the source tree and the code generator is kept proprietary, which is why it's sad and unmaintainable.


The code generator probably runs off their Verilog/HDL source for the chips. It probably relies on (or is built on top of) proprietary EDA tools.

Without open sourcing their chips, it would probably be useless.

Even if you work for a chip maker, that is what you usually get from the hardware guys, so we are probably getting the same thing as AMD's own driver programmers.

\s Welcome to the wonderful world of driver development at the majority of hardware houses \s


They could still separate the extraction of the data (keeping it proprietary) from the code generation. The latter could work off of the extracted data kept in a more sensible format.


That's how we do things at work. If we need to change a class function, we outright copy/paste it and then change what we need. This is so everything is backwards compatible. A bug in one version might be expected or documented in a previous one. We can't fix the bug in an older version unless we get the OK from our clients, because they might be depending on/expecting it for their code to run correctly.

It's a good system. You shouldn't complain. And deleting code is easy since not everything depends on everything


> That frankly seems like horrible software design.

Why? If it's auto-generated and the generated code is never touched by humans, that's not too bad.

This information comes from some CAD/synthesis tool and just describes the hardware register layout.

Manually maintaining that would be much worse and error prone.


That part of the comment was specifically talking about the parts that are not auto-generated, but copy&pasted from one hardware generation to the next -- and then bugs need to be fixed and interface changes need to be applied in N different places forever.


Compile times increase superlinearly with program size. How many cycles and developer hours are wasted if this duplication is not necessary?


Looking at things like e.g. https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/lin... , this is very obviously specific to the underlying hardware's control model in all sorts of ways. If the hardware has a gazillion control registers, then the software will have to handle a gazillion registers and will therefore need several gazillion definitions in order to do it. The hardware won't get any simpler just because it takes a long time to compile the driver.
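
Concretely, those generated headers are mostly walls of defines like this (names made up here, just to show the shape they take):

  /* One offset define plus shift/mask defines for every field of every
     register, for every IP block, for every hardware generation. */
  #define mmEXAMPLE_RB_WPTR                 0x01ac
  #define EXAMPLE_RB_WPTR__VALUE__SHIFT     0x0
  #define EXAMPLE_RB_WPTR__VALUE_MASK       0x0000FFFFL

  #define mmEXAMPLE_RB_CNTL                 0x01ad
  #define EXAMPLE_RB_CNTL__ENABLE__SHIFT    0x0
  #define EXAMPLE_RB_CNTL__ENABLE_MASK      0x00000001L

Multiply that by thousands of registers and a couple of new chips per year, and the line count stops being surprising.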


> Compile times increase superlinearly with program size. How many cycles and developer hours are wasted if this duplication is not necessary?

Care to weigh the impact on developer hours and cycles of having to debug problems caused by how the autogenerated headers are created?

If the code is not expected to change and was already subjected to verification and validation, ensuring that it stays the same is a good way to avoid wasting time tracking bugs on code paths that assuredly were already verified.


"Compile times increase superlinearly with program size."

I've seen this said a couple of times lately. What is the source?


The other time that you saw it was also probably me. It's from this talk, which is about how a large amount of generated protocol buffer code at Google led to a quadratic increase in compile times: https://youtu.be/rHIkrotSwcc?t=720.

TL;DW: The reasoning is that if you use a distributed build system, then your compile time is gated by the file with the longest compile time (tail latency). The more files you have, the greater the chance that one of them takes a long time. When you generate source files, you tend to produce more files than if you didn't.

Most users don't use a distributed build system to compile the kernel, so on further thought, in that case compile times probably scale closer to linear with the number of translation units. But wasted cycles are still wasted cycles, and regardless of how exactly compile times scale, you should still consider the cost of longer compile times when you duplicate code.

With regard to link-time optimization: sophisticated analyses have superlinear complexity: https://cs.stackexchange.com/questions/22435/time-complexity....

Disclaimer: I work at Google.


Not knowing the context of the quote, I can guess a few causes:

* Compiler optimizations mostly work on a per-function basis, so that the 'n' in O(n) or O(n²) is the size of a function and not the size of the entire codebase. Not all optimization algorithms are linear, and while good superlinear optimizations should have code threshold cutoffs, in practice these may be omitted. Function sizes follow something like a power law distribution, which means larger code bases contain larger functions that are more expensive to optimize.

* For C/C++, you compile code by pasting all the header definitions into your translation unit, and feeding that to the actual lexer/parser/codegen. Larger projects are probably going to have correspondingly larger headers, and files will probably include more of them, so that the actual translation unit sizes are going to be larger even than the on-disk source files would indicate.

* This is more speculative, but larger projects are also probably likelier to blow out the various caches on your computer, so that the operations you figure are O(1) (such as hashtable lookup!) are actually more like O(lg N) in practice. And as you get larger projects that are less likely to fit any cache, the effect of the non-constant time becomes more apparent.


Internal compiler datastructures and the need to keep more things in memory?


This particular thought thread is about more translation units, not larger translation units. I can't see why those would not mostly scale linearly.


Tail latency. See my other comment (https://news.ycombinator.com/item?id=24749000).


Are you saying C compilers are O(n)?


> Are you saying C compilers are O(n)?

OP asked for sources that substantiated a technical claim. No assertion was given.

Unless you have a source that either supports or refutes the claim, trying to deflect the question does nothing to address the problem.


I very much expect O(n) compilation of stuff like array declarations, enums, CPP macro definitions and so forth. What else would it be?


They may well be, provided one stays away from bad patterns such as humongous complex functions?


Yes. Despite being an avid code minimalism and refactoring advocate, I have zero problems with auto generated code size, as long as the size of "the source that generated the code" is reasonable.

Here's how I count the size of the code:

So if a codebase is 1 million lines, but 900k was autogenerated using, let's say, a 10k line 'metaprogram' (which is otherwise not included in the source), the real size of the code is 100k + 10k = 110k.

Now, with the stupid vcn_* .c duplication, that I have serious problems with. But what you could do is take a diff of the two vcn_* .c files, keep those diffs as part of the source code, plus only one version of vcn_* .c file, but now add a small script that generates the remaining vcn_* .c files during the build process by using the diffs, e.g., using the patch command. Now the size of the diffs and the size of the patch script, plus the size of one vcn_* .c files, is a better measure of the size of the code.

Of course this is a short-term workaround. There's still a need to refactor the code properly.


That can be a useful way to calculate lines of code; however, it is only nominally correct.

In many cases it is more useful to calculate the lines of code according to what was checked into the build system, since those files can drift from the primary source. This is true, even if a million lines of checked-in source originated from 1k lines of meta-source.


I'm not saying a line-of-code calculating program like cloc should change its behavior.

I'm talking about the right way (IMO) of assessing the software complexity of the codebase.

And I'm also talking about the fact that it is trivial to make cloc match my definition of source code size "if" ... the developers are willing to do a small amount of work by moving code-gen into the build process and not committing pre-generated code into the source, but the metaprogram scripts instead.


> I'm talking about the right way (IMO) of assessing the software complexity of the codebase.

Yes, I get that, and that is what I meant, too. The reason is that complexity scales with the code that is checked into the build system, not just with the meta-code.

I've written a lot of code generators for various purposes, and despite the efforts to only get meta-code checked into the build system, what has happened in almost every case is the generated code has been checked into the vcs. Of course, this depends on the particular teams (or consumers) of the generator.

Eventually, the checked-in code gets changed. Maybe not right away, but it will most likely happen someday. That is why the complexity scales with the checked-in code and not with the original meta-code.

There are also other force-multipliers, so to speak. For example, if there is a vulnerability in the generated code that was checked-in, the actual attack surface of the company can be multiplied by the number of instances of generated code not by the meta code. Fixing one instance doesn't fix any other instance. Complexity and risk are inseparably entwined and should not be looked at separately.


Why would you check-in generated code and not the meta code?

> Eventually, the checked-in code gets changed.

Doesn't have to be. Especially with giant headers like in AMDGPU, it should be much easier to change the meta-code if there is a need to make modification in the generated code. Essentially you look at what needs to be changed, and work backwards to figure out what change in the meta-code will result in the same change in generated code.

I believe you might be referring to stuff like boilerplate code, the whole purpose of which is to be generated for further development. In which case I agree with you, but then boilerplate codes don't balloon the way AMDGPU header files did.


> Why would you check-in generated code and not the meta code?

I wouldn't. However, many teams do that. Perhaps they view it as an efficiency. In many cases people check in generated code in order to perform their own risk reduction, removing a dependency and the possibility of the generated code changing outside of their control.

> I believe you might be referring to stuff like boilerplate code, the whole purpose of which is to be generated for further development. In which case I agree with you, but then boilerplate codes don't balloon the way AMDGPU header files did.

No. I'm definitely not referring to boilerplate code.


I find it funny that developers (especially less experienced ones?) are so fascinated with the "big size" of the Linux kernel. I've seen this in various discussions, both online and offline.

By commercial standards, the Linux kernel is really small. The average enterprise application which has been in development for that long (almost 30 years) usually has at least several million lines of code, if not tens of millions or more.

User facing functionality is HUGE in size compared to tech functionality, usually.


The size is impressive relative to other kernels. A line of kernel code is much more expensive to develop than a line of application code. Of course, size is not the best metric by which to judge a project; it's always better to have more capability with less code if you can manage it. For its size, Linux supports quite a wide variety of architectures and add-on hardware. Also, a great deal of effort goes into keeping the kernel codebase maintainable, which is more likely to manifest as lines of code removed rather than added.


Linux is the only kernel in the world with wide-ranging hardware support (due to its development policy). Everything else is either orders of magnitude more niche and only supports a tiny subset of hardware, or has a stable ABI and relies on out of tree drivers almost exclusively (Windows). Nevermind architecture support.

No given person is ever actually running most kernel code. If you took only lines compiled into the core on an average system plus the line count of currently loaded modules, you'd come up with a much smaller number.


A Google dev once told me that every day they write as much code as the whole kernel contains.


Quick math tells us that every employee of Google (and that includes non-technical people) would have to write around 200 lines of code per day, which seems completely implausible in a company with a heavy software development process.


The order of magnitude is off but the scale is still much, much higher. I think they have billions of lines of code.


maybe if you also count tests, that's not very unreasonable.


And it still has a bug with residual cursor after wake from sleep on my desktop running Ubuntu 20.04...

https://gitlab.gnome.org/GNOME/mutter/-/issues/1108

Hopefully not for long though with this

https://cgit.freedesktop.org/~agd5f/linux/commit/?h=amd-stag...


Oh, let's not forget (to pile on) the GPU hangs on the 5000 series cards across many games and, for some people, many ordinary applications.

https://gitlab.freedesktop.org/drm/amd/-/issues/914


What I find worse is the lack of analog output support with DC. One of my monitors only has a VGA input and I have no intentions of replacing it.

However, the new Display Core does not support analog output and it will stay dark.

Fortunately amdgpu.dc=0 is still an option, but I dread the day when this code path is removed (or bit rots away).


Is it a really good VGA monitor or something?

You can buy relatively inexpensive active DVI/HDMI to VGA converters if that helps.


> Is it a really good VGA monitor or something?

Not at all. It's an old 19" with a resolution of 1280x1024, you can get those pretty much for free these days.

But hey, n + 1 screens > n screens :D

I usually just have my browser on it in full screen. And because of the resolution I don't have to worry about overly long lines on websites that aren't restricted in width (like Hacker News).

At work I picked up an old 19" TFT from the trash that connects neatly to the unused VGA port of the docking station. It has abysmal ghosting issues, so it's no use for the browser, but still good enough for a terminal.

With modern graphics cards supporting many outputs and monitors that you can literally get for free, you can have the luxury of dedicating a display to a single task or application, so that's pretty cool.


Buying an LCD monitor with no digital input was always a bad way to cut costs, but you can buy active DVI-to-VGA or DisplayPort-to-VGA adapters to keep these devices working with new hardware.

Moving rarely-used DAC hardware to an external dongle seems like a good design decision, though it makes more sense for driving a CRT, where the conversion is D->A rather than D->A->D->A. A CRT can be better than a modern LCD in some ways, whereas VGA-input LCDs are simply obsolete.


I have an early Dell IPS from circa 2005 which is 1280x1024 19" and it supports DVI input, fortunately.


So does my latest macOS with Intel video and a second monitor.


Yeah, the intel driver in Mojave on my 2015 13" pro has a bunch of issues.

I see random buffer garble when waking the external screen, sometimes the machine freezes with hw video decoding and the resolution switches to low-def on monitor sleep only to switch back on monitor wake. It's a shitshow.


AMDGPU alone was significantly larger than the entire OpenBSD kernel sources. The dev/pci/drm directory ballooned to around ~130M. As of 6.8/-current, it's now 191M. It was under ~20M before May of 2019, prior to it first being committed.

This driver is, I believe, part of the common codebase that AMD upstreamed to the Linux kernel; it's shared with its proprietary driver on other platforms.


What percentage of the Linux kernel is actually still there after you run it through the preprocessor on a modern system?


A rough rule of thumb is 20 bytes of object for 1 line of C.[1] So if you have a 5MB kernel you might have 250k source lines of code.

That said, the Linux kernel is not typical code. Great swaths of it are architectures and devices you will not need, so they are rightly ignored.

[1] https://www.leshatton.org/Documents/LOC2005.pdf


I think that’s the core of the question. How much of the kernel is left if you strip out all of the sections that would be stripped by the preprocessor when building for a current-generation 64-bit x86 target.

AWS had a blog post about this topic a few years back. They created a cut-down Linux variant for running Lambda functions that only supported exactly their server architecture, only contained exactly the drivers they actually use, and notably only had 1 keyboard driver for a keyboard with only 1 key, which was mapped to Halt.



It is another highly complex computer after all.


Honest question: why do these GPU drivers need so many lines of code? Can anybody with more background explain?


GPUs are complex computers in themselves, and the driver needs to provide compilers etc that target them (e.g. when it comes to shaders, applications hand the driver source code that is then compiled for the specific GPU architecture). Combine that with many different product generations and configurations being supported, code apparently being autogenerated and thus probably not optimized for size and you get to this.

EDIT: monocasa points out below that the compiler is not a correct example - true, that does live in e.g. mesa userspace code.


GPU drivers are essentially operating systems for the GPU.


And also ~real time compilers for any user code (shaders, etc) that will be running on the GPU.


The compiler isn't in the kernel though. That's in the user space half.


People complain about Nvidia's proprietary driver, but imagine if they opensourced all of their software stack and had it in the kernel. If AMD is 10.5%, Nvidia might be like 35% :)


lol, not really. 10.5% is ~190M. If Nvidia were 2x that, it'd be 380M of code, which would actually mean a smaller percentage, as the denominator expands significantly too, so their share falls as well.


According to the stats in the article, the kernel contains 225085 lines of assembly.

Just for comparison, the whole DOS 2.0 operating system, which was written in assembly entirely, has just 37955.


> Just for comparison, the whole DOS 2.0 operating system, which was written in assembly entirely, has just 37955.

Apples to oranges.

I mean, how many drivers were shipped with DOS 2.0?


I think you're reading the comment backwards from its intent. To me this comment is just pointing out that, while the vast majority of the Linux kernel is C, it supports so much stuff that it still has an order of magnitude more assembly than old operating systems that were 100% written in assembly.

Yes, apples to oranges... but interestingly so not as a knock that DOS was coded better or something.


Right, and how many distinct instruction set architectures did DOS 2.0 support? The fact that there are only 225k lines of assembly in the Linux kernel, across the 30 or so major architecture families Linux supports plus any number of variations within each family, is rather impressive. If you ported DOS 2.0 to one variant of each of those 30 architectures, assuming roughly equal code density, that would be over a million lines of assembly code altogether, for a far, far less capable system. That's considering only the OS core, not device drivers, which is reasonable since DOS 2.0 didn't really have any drivers worth speaking of by modern standards. Most of the real low-level I/O was handled by the BIOS.


> The fact that there are only 225k lines of assembly in the Linux kernel, across the 30 or so major architecture families Linux supports plus any number of variations within each family, is rather impressive.

Except that's not the intent that's conveyed by blindly throwing LoC numbers into the air. The intent is to suggest the exact opposite: bloat. Hence the blind comparisons that mean nothing at all.


But can't it run Crysis on Ultra at 120FPS?


Is the AMD Radeon Graphics Driver part of the Linux Kernel? Does that make sense? Aren't graphics drivers updated independently from the kernel? I'm aware that Linus isn't a fan of microkernels, but this seems a bit overly monolithic to me.


Can someone with more understanding of the Linux kernel arch/design describe why these drivers are in the kernel tree? Are they not part of a HAL, or are they more "integrated" than that?


Linux does not use a stable API to talk to drivers, so drivers are intended to be part of the kernel tree so they can change as the rest of the kernel evolves.


Here's some more HN discussion around this: https://news.ycombinator.com/item?id=14533398

And the relevant doc in the kernel source: https://github.com/torvalds/linux/blob/master/Documentation/...


Eh, that's not exactly true. Most of the API is pretty standard and stable. They don't have a fixed ABI, which is how closed source OSes handle driver compatibility.


There is a significant difference between having a supported defined interface boundary, and a semi-stable interface that doesn't necessarily guarantee backwards compatibility with prior interfaces.


> There is a significant difference between having a supported defined interface boundary

Which is literally the delineating mark between an API and an ABI.

A stable ABI means your binary interface will remain the same and your compiled drivers should work, despite any changes on the kernel level. A stable API means you'll probably have to recompile your drivers when the kernel changes, but that they'll require little (if any) modifications.
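
A toy example of the distinction (made-up struct, nothing to do with real kernel headers): a change can leave the API intact while still breaking the ABI.

  #include <stdio.h>
  #include <stddef.h>

  /* Hypothetical "kernel header", version 1. */
  struct dev_info {
      int id;
      int flags;
  };

  /* Hypothetical version 2: same field names and same function signatures
     (the API), but the inserted field moves 'flags' to a new offset, so
     the binary layout (the ABI) changed. */
  struct dev_info_v2 {
      int id;
      int irq;    /* new field */
      int flags;  /* now at a different offset */
  };

  int main(void)
  {
      /* A driver compiled against v1 bakes the old offset into its binary;
         run against a v2 kernel it would read 'irq' where it expects
         'flags'. Recompiling against the v2 header fixes it with zero
         source changes: stable API, broken ABI. */
      printf("v1 flags offset: %zu\n", offsetof(struct dev_info, flags));
      printf("v2 flags offset: %zu\n", offsetof(struct dev_info_v2, flags));
      return 0;
  }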


A stable API means you will definitely not have to make any modifications, and Linux provides no such guarantees. "Pretty" standard and stable is not the same at all.


So, by that logic, there has never been a stable driver API in the history of OSes. Windows 9x, Windows 2000/XP, Windows Vista/7/10 all had different graphic driver models and were incompatible. Same with various releases of OS X/macOS.

Of course I’m not implying that the driver interfaces have a guarantee or will not change, especially with the rolling release model of Linux. However, you can be pretty certain within a major point release that there won’t be significant breaking changes. Thus the “pretty stable”. It’s not a pro nor a con, it’s just the style that works for that open source project.


It's a matter of degree I guess. My understanding is that the kernel pretty much reserves the right to change these APIs as they feel fit, without much regard for the consequences for out-of-tree drivers, even if in practice the interface has settled and isn't in need of changing much currently.



IMHO a fairer comparison would be the amd drivers vs the nvidia drivers, but we don't have the nvidia sources (of course).


83k lines of YAML. Wow.


What percentage of loaded and running binary is it?


0% on mine (cause I'm on Intel and Nvidia HW) and have a custom kernel config.


While the down votes are understandable, this is actually a somewhat legitimate take; the kernel's pretty good about not loading stuff that it's not actually going to use on your actual hardware, so the impact of this is limited to people who are actually using that hardware, which by extension means they kinda do need the driver in some form.


This is what I was interested in. As I understand it, there are extra drivers in most installs and this is reconciled as the kernel boots.


Correct. A bare kernel to get to an initrd that loads the rest of the modules is the common distro pattern.

I build a kernel without an initrd, with only my necessary hardware and then only the modules I need.

It's pretty easy, boots fast, stable, etc.

But, I'm already a type-A for my work-box (Gentoo)


I mean, you can just run lsmod to see what modules are loaded; I don't think that lists anything that's statically compiled in, but most distros don't really do that AFAIK.


I really like that the AMD drivers are open source and upstreamed into the kernel. However, that doesn't solve all the use cases. If I want to use newer hardware on an older distribution (RHEL, Debian, Ubuntu LTS, etc.), the upstreamed drivers will be too old. Distributing the driver like Nvidia or Radeon Software for Linux does will still be necessary.


You can always install a newer kernel - thankfully Linux is notorious for never breaking the userspace as a general rule.


It is a reasonable thing to want to keep the kernel stable for custom kernel drivers. So stable kernel space is also important to me.


The "custom kernel drivers" are the problem. Whatever they are, they should be upstreamed too. Trying to keep a stable ABI just causes problems.


That might be an ok solution for consumer devices, but the kernel drivers I'm talking about are for custom setups that will only be used internally by me. There would never be any reason to upstream them. Constantly upgrading the kernel just creates a lot of work.


Either way, what you're looking for is a backport; open or proprietary doesn't change that. I think it's important to note that amdgpu-pro already requires amdgpu as well, so "will still be necessary" has already come to pass.


Isn't the implication that, to some extent, they are chasing their own tail: adding features on a wobbly foundation gives wobbly features.

Never looked at a single line of their code, but still presumptuous enough to suggest pruning and repotting before watering the beautiful rain of more features :)

-- An idea to be fertilized with several grains of salt


A kernel is supposed to be a small set of necessary code to boot and manage some hardware. It's host to drivers, applications and UIs. Thus the name, 'kernel'.

To compare a driver with the kernel proper seems very strange. There are other large codebases in the world too.


The title is misleading; there is a nontrivial distinction between the kernel and its source code.


Does something "being part of the kernel" mean it's part of every Linux distribution? Even on server distros?


"Being part of the kernel" is defined as being in the kernel's source code tree. Not all of the kernel's source code is included in a kernel binary, most pieces are compiled as modules that can be loaded at run-time. However many distribution do indeed ship the binaries for most kernel modules so some server distros like Ubuntu server probably do have GPU drivers installed, although they are never loaded.


If the Linux kernel can be sliced and diced, what's the point in putting non-essential code, that won't be used by the majority of running systems, in the kernel?

Why not offer the driver as an additional package / download?


Shouldn't a GPU driver have a significant representation in the Linux Kernel? It represents a lot of the computation.


In source code, or in compiled bytes?


According to the first line of the second paragraph, it is source code


It turns out that it's generated source code. So the actual source code may be substantially smaller.


Are they also open-sourcing the tool that generates the committed code? If not, then there's little difference between "generated" and "actual" source code.


Or just the huge output from cloc that is showing the count of lines of code.


109 lines of a "Windows Module Definition"? What's that about?


I found [this](https://old.reddit.com/r/linux/comments/2e255r/this_is_how_b...)

> False positives because def and config are pretty common suffixes:


What is the one sed file?


And it still doesn’t support 8K MST monitors.


And they still don't work.


Downvote all you want - AMD's open source drivers still do not work. Even the proprietary drivers do not work.

The driver has had issues for ages with AMD's new cards. Guaranteed, if you're downvoting this you have not used a recent AMD card on Linux.


As someone developing a cross-platform game engine, the AMD Linux drivers perform quite well, notably better than many proprietary Windows drivers do, both from a performance and stability perspective.


What doesn't work? I have a 5700XT and it seems to be fine to me. What am I missing?


While compiling the kernel,

    yes '' | make localmodconfig 
will create a config based on the current config and the loaded modules (lsmod). It disables any module option that is not needed for the loaded modules.


Hacker News: Nvidia is evil because they don't have an open source driver !

Also Hacker News: AMD is evil because they increase the size of the kernel by 10%.

Honestly, I think Nvidia made the right call by just focusing on ordinary users who want the card to work so they can focus on their own work.



