A lot of people think AMD should support these translation layers, but I think it's a bad idea. CUDA is not designed to be vendor-agnostic, and Nvidia can make things arbitrarily difficult both technically and legally. For example, I think it would be against the license agreements of cuDNN or cuBLAS to run them on such a layer. So those and other Nvidia libraries would become part of the API boundary that AMD would need to reimplement and support.
Chasing bug-for-bug compatibility is a fool's errand. The important users of CUDA are open source. AMD can implement support directly in the upstream projects like pytorch or llama.cpp. And once support is there it can be maintained by the community.
Are you aware of HIP? It's officially supported and, for code that avoids obscure features of CUDA like inline PTX, it's pretty much a find-and-replace to get a working build:
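To make the find-and-replace point concrete, here's roughly what a port looks like for a toy kernel (a minimal sketch, not from any real project): the device code compiles unchanged under hipcc, and only the host-side cuda* runtime calls get renamed to their hip* counterparts.

    // saxpy.cu -- toy example; the kernel body needs no changes for HIP.
    #include <cuda_runtime.h>                       // HIP: <hip/hip_runtime.h>

    __global__ void saxpy(int n, float a, const float* x, float* y) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) y[i] = a * x[i] + y[i];
    }

    int main() {
        const int n = 1 << 20;
        float *x, *y;
        cudaMalloc(&x, n * sizeof(float));              // HIP: hipMalloc
        cudaMalloc(&y, n * sizeof(float));              // HIP: hipMalloc
        saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, x, y); // launch syntax is identical
        cudaDeviceSynchronize();                        // HIP: hipDeviceSynchronize
        cudaFree(x);                                    // HIP: hipFree
        cudaFree(y);
        return 0;
    }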
if you're talking about building anything, that is already too hard for ML researchers.
you have to be able to pip install something and just have it work, reasonably fast, without crashing, and also it has to not interfere with 100 other weird poorly maintained ML library dependencies.
If your point is that HIP is not a zero-effort porting solution, that is correct. HIP is a low-effort solution, not a zero-effort solution. It targets users who already use and know CUDA, and minimizes the changes required to pre-existing CUDA code.
In the case of these abstraction layers, it would be the responsibility of the abstraction maintainers (or AMD) to port them. Obviously, someone who does not even use CUDA would not use HIP either.
To be honest, I have a hard time believing that a truly zero-effort solution exists, especially one that gets high performance. Once you start talking about the full stack, there are too many potholes and sharp edges to believe that it will really work. So I am highly skeptical of the original article. Not that I wouldn't want to be proved wrong. But what they're claiming to do is a big lift, even taking HIP as a starting point.
The easiest, fastest (for end users), highest-performance solution for ML will come when the ecosystem integrates it natively. HIP would be a way to get there faster, but it will take nonzero effort from CUDA-proficient engineers to get there.
As other commenters have pointed out, this is probably a good solution for HPC jobs where everyone is using C++ or Fortran anyway and you frequently write your own CUDA kernels.
From time to time I run into a decision maker who understandably wants to believe that AMD cards are now "ready" to be used for deep learning, and points to things like the fact that HIP mostly works pretty well. I was kind of reacting against that.
As someone doing a lot of work with CUDA in a big research organization: there are few of us. If you are working with CUDA, then you are not the type of person who waits for something that just works like you describe. CUDA itself is a battle with poorly documented stuff.
Don’t most orgs that are deep enough to run custom CUDA kernels have dedicated engineers for this stuff? I can’t imagine a person who can write raw CUDA not being able to handle things more difficult than pip install.
Engineers who are really, really good at CUDA are worth their weight in gold, so there are more projects for them than they have time for. Worth their weight in gold isn't figurative here – the one I know has a ski house more expensive than 180 lbs of gold (~$5,320,814).
The fact that "worth their weight in cold" typically means in the single-digit millions is fascinating to me (though I doubt I'll be able to get there myself, maybe someday). I looked it up though and I think this is undercounting the current value of gold per ounce/lb/etc.
It's worth noting that anyone with a ski house that expensive probably has a net worth well over twice the price of that ski house. I guess it's time to start learning CUDA!
> That growth for _gold_ of all things (up 71% in the last 5 years) is crazy to me.
For comparison: the S&P 500 grew about the same during that period (more than 100% from Jan 2019, about 70% from Dec 2019), so the rise in the price of gold did not outperform the growth of the general (financial) economy.
But that's still surprising performance, because the S&P generates income and pays dividends. Its increase reflects (at least, is supposed to!) expectations of future higher income. Gold doesn't even bear interest....
Gold is commonly seen as a hedge against inflation and a decently stable non-currency store of value. With many countries having/being perceived to have high inflation during this time, the price of gold is bound to rise as well. Pretty much any economic or sociopolitical tremor will bounce up the price of gold at least temporarily.
The S&P doesn't really pay much in the way of dividends does it? Last time I checked it was order-of-magnitude 1% which is a bit of a joke figure.
Anyway, there isn't a lot of evidence that the value of gold is going up. It seems to just be keeping pace with the M2. Both doubled-and-a-bit since 2010 (working in USD).
A working knowledge of C++, plus a bit of online reading about CUDA and the NVidia GPU architecture, plus studying the LCZero chess engine source code (the CUDA neural net part, I mean) seems like enough to get started. I did that and felt like I could contribute to that code, at least at a newbie level, given the hardware and build tools. At least in the pre-NNUE era, the code was pretty readable. I didn't pursue it though.
Of course becoming "really good" is a lot different and like anything else, it presumably takes a lot of callused fingertips (from typing) to get there.
Having dabbled in CUDA, but not worked on it professionally, it feels like a lot of the complexity isn't really in CUDA/C++, but in the algorithms you have to come up with to really take advantage of the hardware.
Optimizing something for SIMD execution often isn't straightforward, and it isn't something a lot of developers encounter outside a few small areas. There are also a lot of hardware architecture considerations you have to work with (memory transfer speed is a big one) to even come close to saturating the compute units.
The real challenge is probably getting your hands on a 4090 for a price you can pay before you are worth your weight in gold. Because an arm and a leg in gold is quite a lot.
You don't really need a 4090. An older board is plenty. The software is basically the same. I fooled around with what I think was a 1080 on Paperspace for something like 50 cents an hour, but it was mostly with some Pytorch models rather than CUDA directly.
Really old GPUs were different, but the 1080 is similar to later stuff with a few features missing. Half precision and "tensor cores" iirc. It could be that the very most recent stuff has changed more (I haven't paid attention), but I thought that the 4090 was just another evolutionary step.
Everyone, and I mean everyone, I know doing AI/ML work values VRAM above all. The absolute best bang for buck is buying used P40s, and if you actually want those cards to be usable for other stuff, used 3090s are the best deal around; they should be ~$700 right now.
Well, to give an example, 32GB of vram would be vastly more preferable to 24GB of higher bandwidth vram. You really need to be able to put the entire LLM in memory for best results, because otherwise you're bottlenecking on the speed of transfer between regular old system ram and the gpu.
You'll also note that M1/M2 Macs with large amounts of system memory are good at inference because the GPU has a very high speed interconnect between the soldered-on RAM modules and the on-die GPU. It's all about avoiding bottlenecks wherever possible.
Not really any paradigm shift since the introduction of Tensor Cores in NVIDIA archs. Anything Ampere or Lovelace will do to teach yourself CUDA, up to the crazy optimization techniques and the worst mind-warping libraries. You'll only miss out on HBM, which lets you cheat on memory bandwidth; amount of VRAM (teach yourself on smaller models...); double precision perf; and double precision tensor cores (go for an A30 then, and I'm not sure they'll keep those - either the x30 bin or DP tensor cores - ever since "DGEMM on Integer Matrix Multiplication Unit" - https://arxiv.org/html/2306.11975v4 ). FP4, DPX, TMA, GPUDirect are nice, but you must be pretty far out already for them to be mandatory...
I was looking into this recently and it seems like the cheapest AWS instance with a CUDA GPU is something on the order of $1/hr. It looks like an H100 instance might be $15/hr (although I’m not sure if I’m looking at a monthly price).
So yeah it’s not ideal if you’re on a budget, but it seems like there are some solutions that don’t involve massive capex.
Look on vast.ai instead of AWS; you can rent machines with older GPUs dirt cheap. I don't see how they even cover the electricity bills. A 4090 machine starts at about $.25/hour, though I didn't examine the configuration.
Thrashed? What type of damage could a mostly-solid-state device suffer? Fan problems? Worn PCIe connectors? Deteriorating thermal paste from repeated heat cycling?
Heat. A lot of components - and not just in computers but everything hardware - are spec'd for something called "duty cycles", basically how long a thing is active in a specific time frame.
Gaming cards/rigs, which many of the early miners were based on, rarely run at 100% all the time, the workload is burst-y (and distributed amongst different areas of the system). In comparison, a miner runs at 100% all the time.
On top of that, for silicon there is an effect called electromigration [1], where the literal movement of electrons erodes the material over time - made worse by ever shrinking feature sizes as well as, again, the chips being used in exactly the same way all the time.
When people were mining Ethereum (which was the last craze that GPUs were capable of playing in -- BTC has been off the GPU radar for a long time), profitable mining was fairly kind to cards compared to gaming.
Folks wanted their hardware to produce as much as possible, for as little as possible, before it became outdated.
The load was constant, so heat cycles weren't really a thing.
That heat was minimized; cards were clocked (and voltages tweaked) to optimize the ratio of crypto output to Watts input. For Ethereum, this meant undervolting and underclocking the GPU -- which are kind to it.
Fan speeds were kept both moderate and tightly controlled; too fast, and it would cost more (the fans themselves cost money to run, and money to replace). Too slow, and potential output was left on the table.
For Ethereum, RAM got hit hard. But RAM doesn't necessarily care about that; DRAM in general is more or less just an array of solid-state capacitors. And people needed that RAM to work reliably -- it's NFG to spend money producing bad blocks.
Power supplies tended to be stable, because good, cheap, stable, high-current, and stupidly-efficient are qualities that go hand-in-hand thanks to HP server PSUs being cheap as chips.
There were exceptions, of course: Some people did not mine smartly.
---
But this is broadly very different from how gamers treat hardware, wherein: heat cycles are real, overclocking everything to eke out an extra few FPS is real, pushing things a bit too far and producing glitches can be tolerated sometimes, fan speeds are whatever, and power supplies are picked based on what they look like instead of an actual price/performance comparison.
A card that was used for mining is not implicitly worse in any way than one that was used for gaming. Purchasing either thing involves non-zero risk.
> That heat was minimized; cards were clocked (and voltages tweaked) to optimize the ratio of crypto output to Watts input. For Ethereum, this meant undervolting and underclocking the GPU -- which are kind to it.
> Fan speeds were kept both moderate and tightly controlled; too fast, and it would cost more (the fans themselves cost money to run, and money to replace). Too slow, and potential output was left on the table.
In the ideal case, this is spot on. Annoyingly however, this hinges on the assumption of an awful lot of competence from top to bottom.
If I've learned anything in my considerable career, it's that reality is typically one of the first things tossed when situations and goals become complex.
The few successful crypto miners maybe did some of the optimizations you mention. The odds aren't good enough for me to want to purchase a Craigslist or FB marketplace card for only a 30% discount.
I kinda doubt it. Nobody paid me to do that though. I was just interested in LCZero. To get that $500k/year, I think you need up to date ML understanding and not just CUDA. CUDA is just another programming language while ML is a big area of active research. You could watch some of the fast.ai ML videos and then enter some Kaggle competitions if you want to go that route.
You're wrong. The people building the models don't write CUDA kernels. The people optimizing the models write CUDA kernels. And you don't need to know a bunch of ML bs to optimize kernels. Source: I optimize GPU kernels. I don't make 500k, but I'm not that far from it.
How much performance difference is there between writing a kernel in a high level language/framework like PyTorch (torch.compile) or Triton, and hand optimizing? Are you writing kernels in PTX?
What's your opinion on the future of writing optimized GPU code/kernels - how long before compilers are as good or better than (most) humans writing hand-optimized PTX?
Heh I'm in the wrong business then. Interesting. Used to be that game programmers spent lots of time optimizing non-ML CUDA code. They didn't make anything like 500k at that time. I wonder what the ML industry has done to game development, or for that matter to scientific programming. Wow.
That’s pretty funny. Good test of value across the millennia. I wonder if the best aqueduct engineers during the peak of Ancient Rome’s power had villas worth their body weight in gold.
Selection bias. I'm sure there are lots of people who are really good at CUDA and don't have those kind of assets. Not everyone knows how to sell their skills.
Right now, Nvidia's valuation has made a lot of people realize that their CUDA skills were being undervalued. Anyone with GPU or ML skills who hasn’t tried to get a pay raise in this market deserves exactly the life that they are living.
>> Don’t most orgs that are deep enough to run custom cuda kernels have dedicated engineers for this stuff. I can’t imagine a person who can write raw cuda not being able to handle things more difficult than pip install.
This seems to be a fairly common problem with software. The people who create software regularly deal with complex tool chains, dependency management, configuration files, and so on. As a result they think that if a solution "exists", everything is fine. Need to edit a config file for your particular setup? No problem. The thing is, I have been programming stuff for decades and I really hate having to do that stuff, and will avoid tools that make me do it. I have my own problems to solve, and don't want to deal with figuring out tools no matter how "simple" the author thinks that is to do.
A huge part of the reason commercial software exists today is probably because open source projects don't take things to this extreme. I look at some things that qualify as products and think they're really simplistic, but they take care of some minutiae that regular people are willing to pay for so they don't have to learn or deal with it. The same can be true for developers and ML researchers or whatever.
> if you're talking about building anything, that is already too hard for ML researchers.
I don't think so. I agree it is too hard for the ML researchers at the companies which will have their rear ends handed to them by the other companies whose ML researchers can be bothered to follow a blog post and prompt ChatGPT to resolve error messages.
I'm not really talking about companies here for the most part, I'm talking about academic ML researchers (or industry researchers whose role is primarily academic-style research). In companies there is more incentive for good software engineering practices.
I'm also speaking from personal experience: I once had to hand-write my own CUDA kernels (on official NVIDIA cards, not even this weird translation layer): it was useful and I figured it out, but everything was constantly breaking at first.
It was a drag on productivity and more importantly, it made it too difficult for other people to run my code (which means they are less likely to cite my work).
The target audience of interoperability technology is whoever is building, though. Ideally, interoperability technology can help software that supports only NVIDIA GPUs today go on to quickly add baseline support for Intel and AMD GPUs tomorrow.
(and for one data point, I believe Blender is actively using HIP for AMD GPU support in Cycles.)
There is a more glaring issue: ROCm doesn't even work well on most AMD devices nowadays, and HIP performance deteriorates on the same hardware compared to ROCm.
If you want to write a very efficient CUDA kernel for a modern datacenter NVIDIA GPU (read: H100), you need to write it with the hardware in mind (and preferably in hand; an H100 and an RTX 4090 behave very differently in practice). So I don't think the difference between AMD and NVIDIA is as big as everyone perceives.
Yeah, a lot of the newer accelerators are not even available without using inline PTX assembly. Even the ones that are have weird shapes that are not amenable to high-performance work.
For any compiler, "supporting" a certain CPU or GPU only means that they can generate correct translated code with that CPU or GPU as the execution target.
It does not mean that the compiler is able to generate code that has optimal performance, when that can be achieved by using certain instructions without a direct equivalent in a high-level language.
No compiler that supports the Intel-AMD ISA knows how to use all the instructions available in this ISA.
Sure, but I'm not sure if that is what the parent poster was saying (that nvcc generates poor quality PTX for newer devices).
It's been a while since I looked at CUDA, but it used to be that NVIDIA were continually extending cuDNN to add support for kernels needed by SOTA models, and I assume these kernels were all hand optimized.
I'm curious what kind of models people are writing where not only is there no optimized cuDNN support, but solutions like Triton or torch.compile, and even hand-optimized CUDA C kernels, are too slow. Are hand-written PTX kernels really that common?
Yes. Take a look at, say, CUTLASS: you'll see that they use PTX instructions because there are no intrinsics, much less automatic compiler lowering, for the accelerators they target.
Yes, but that's an NVIDIA project, so would be expected to be hand optimized, same as their cuDNN kernels.
I'm more curious about what types of model people in research or industry are developing, where NVIDIA support such as this is not enough, and they are developing their own PTX kernels.
Support this, reimplement that, support upstream efforts, don't really care. Any of those would cost a couple of million and be worth a trillion dollars to AMD shareholders.
Is it weird how the comments here are blaming AMD and not Nvidia? Sure, the obvious argument is that Nvidia has no practical motivation to build an open platform. But there are counterexamples that suggest otherwise (Android). And there is a compelling argument that long term, their proprietary firmware layer will become an insufficient moat to their hardware dominance.
Who’s the root cause? The company with the dominant platform that refuses to open it up, or the competitor who can’t catch up because they’re running so far behind? Even if AMD made their own version of CUDA that was better in every way, it still wouldn’t gain adoption because CUDA has become the standard. No matter what they do, they’ll need to have a compatibility layer. And in that case maybe it makes sense for them to invest in the best one that emerges from the community.
> Is it weird how the comments here are blaming AMD and not Nvidia?
Nvidia has put in the legwork and are reaping the rewards. They've worked closely with the people who are actually using their stuff, funding development and giving loads of support to researchers, teachers and so on, for probably a decade now. Why should they give all that away?
> But there are counterexamples that suggest otherwise (Android).
How is Android a counterexample? Google makes no money off of it, nor does anyone else. Google keeps Android open so that Apple can't move everyone onto their ad platform, so it's worth it for them as a strategic move, but Nvidia has no such motive.
> Even if AMD made their own version of CUDA that was better in every way, it still wouldn’t gain adoption because CUDA has become the standard.
Maybe. But again, that's because NVidia has been putting in the work to make something better for a decade or more. The best time for AMD to start actually trying was 10 years ago; the second-best time is today.
> Google makes no money off of it, nor does anyone else
Google makes no money off of Android? That seems like a really weird claim to make. Do you really think Google would be anywhere near as valuable of a company if iOS had all of the market share that the data vacuum that is Android has? I can't imagine that being the case.
Google makes a boatload off of Android, just like AMD would if they supported open GPGPU efforts aggressively.
Android is a complement to Google's business, which is when open source works. What would be the complement worth $1 Trillion to NVIDIA to build a truly open platform? There isn't one. That was his point.
There’s an entire derivative industry of GPUs, namely GenAI and LLM providers, that could be the “complement” to an open GPU platform. The exact design and interface between such a complement and platform is yet undefined, but I’m sure there are creative approaches to this problem.
And NVIDIA is playing in that game too. Why would they not play in higher level services as well? They already publish the source to their entire software stack. A comparison to Android is completely useless. Google is a multi-sided platform that does lots of things for free for some people (web users, Android users) so it can charge other people for their data (ad buyers). That isn't the chip business whatsoever. The original comment only makes sense if you know nothing about their respective business models.
Yes, so when the ground inevitably shifts below their feet (it might happen years from now, but it will happen – open platforms always emerge and eventually proliferate), wouldn’t it be better for them to own that platform?
On the other hand, they could always wait for the most viable threat to emerge and then pay a few billion dollars to acquire it and own its direction. Google didn’t invent Android, after all…
> Google is a multi-sided platform that does lots of things for free for some people… That isn't the chip business whatsoever.
This is a reductionist differentiation that overlooks the similarities between the platforms of “mobile” and “GPU” (and also mischaracterizes the business model of Google, who does in fact make money directly from Android sales, and even moved all the way down the stack to selling hardware). In fact there is even a potentially direct analogy between the two platforms: LLM is the top of the stack with GPU on the bottom, just like Advertising is the top of the stack with Mobile on the bottom.
Yes, Google’s top level money printer is advertising, and everything they do (including Android) is about controlling the maximum number of layers below that money printer. But that doesn’t mean there is no benefit to Nvidia doing the same. They might approach it differently, since they currently own the bottom layer whereas Google started from the top layer. But the end result of controlling the whole stack will lead to the same benefits.
And you even admit in your comment that Nvidia is investing in these higher levels. My argument is that they are jeopardizing the longevity of these high-level investments due to their reluctance to invest in an open platform at the bottom layer (not even the bottom, but one level above their hardware). This will leave them vulnerable to encroachment by a player that comes from a higher level, like OpenAI for example, who gets to define the open platform before Nvidia ever has a chance to own it.
> it might happen years from now, but it will happen – open platforms always emerge and eventually proliferate
30 years ago people were making the same argument that MS should have kept DirectX open or else they were going to lose to OpenGL. Look how that's worked out for them.
> Google, who does in fact make money directly from Android sales
They don't though. They have some amount of revenue from it, but it's a loss-making operation.
> In fact there is even a potentially direct analogy between the two platforms: LLM is the top of the stack with GPU on the bottom, just like Advertising is the top of the stack with Mobile on the bottom.
But which layer is the differentiator, and which layer is just commodity? Google gives away Android because it isn't better than iOS and isn't trying to be; "good enough" is fine for their business (if anything, being open is a way to stay relevant where they would otherwise fall behind). They don't give away the ad-tech, nor would they open up e.g. Maps data where they have a competitive advantage.
NVidia has no reason to open up CUDA; they have nothing to gain and a lot to lose by doing so. They make a lot of their money from hardware sales which they would open up to cannibalisation, and CUDA is already the industry standard that everyone builds on and stays compatible with. If there was ever a real competitive threat then that might change, but AMD has a long way to go to get there.
"Open up CUDA" - guys, its all open source. What do you want them to do? Do tech support to help their competitors compete against them? AMD is to blame for not building this project 10 years ago.
Google gave away the software platform - Android - to hardware vendors for free, vendors compete making the hardware into cheap, low-margin commodity items, and google makes boatloads of money from ads, tracking and the app store.
nvidia could give away the software platform - CUDA - to hardware vendors for free, making the hardware into cheap, low-margin commodity items. But how would they make boatloads of money when there's nowhere to put ads, tracking or an app store?
>Is it weird how the comments here are blaming AMD and not Nvidia?
It's not. Even as it is, I do not trust HIP or ROCm to be a viable alternative to CUDA. George Hotz did plenty of work trying to port various ML architectures to AMD and was met with countless driver bugs. The problem isn't that Nvidia won't build an open platform - the problem is that AMD won't invest in a competitive platform. 99% of ML engineers do not write CUDA. For the vast majority of workloads, there are probably 20 engineers at Meta who write the CUDA backend for PyTorch that every other engineer uses. Meta could hire another 20 engineers to support whatever AMD has (they did, and it's not as robust as CUDA).
Even if CUDA was open - do you expect nvidia to also write drivers for AMD? I don't believe 3rd parties will get anywhere writing "compatibility layers" because AMD's own GPU aren't optimized or tested for CUDA-like workloads.
Khronos, AMD and Intel have had 15 years to make something out of OpenCL that could rival CUDA.
Instead they managed 15 years of disappointment, with a standard stuck in C99 that adopted C++ and a polyglot bytecode too late to matter, and never produced an ecosystem of IDE tooling and GPU libraries.
Naturally CUDA became the standard, when NVIDIA provided what the GPU community cared about.
> Is it weird how the comments here are blaming AMD and not Nvidia?
Not even a little bit. It simply isn't Nvidia's job to provide competitive alternatives to Nvidia. Competing is something AMD must take responsibility for.
The only reason CUDA is such a big talking point is because AMD tripped over their own feet supporting accelerated BLAS on AMD GPUs. Realistically it probably is hard to implement (AMD have a lot of competent people on staff) but Nvidia hasn't done anything unfair apart from execute so well that they make all the alternatives look bad.
Huh? Why the sarcasm? You think it's a good thing that someone besides the person who owns the hardware has the final say on what the hardware is allowed to be used for?
That's not actually a thing? I specifically moved away from Nvidia because
1) they choose (chose?) not to support standard display protocols that Wayland compositors target with their drivers (annoying, but not the end of the world)
2) they cryptographically lock users out of writing their own drivers for their own graphics cards (which should be illegal and is exactly contradictory to "that's not actually a thing").
Again: look into why the Nouveau driver performance is limited.
This seems to be more about certain devices (consumer-grade GPUs) in certain settings (data centers), though I do question how enforceable it actually is. My guess is that it can only apply when you try to get discounts from bulk-ordering GPUs.
Also, was there any followup to this story? It seems a bit unnecessary because nVidia has already neutered consumer cards for many/most data center purposes by not using ECC and by providing so few FP64 units that double precision FLOPS is barely better than CPU SIMD.
it’s also not really a thing anymore because of the open kernel driver… at that point it’s just MIT licensed.
of course people continued to melt down about that for some reason too, in the customary “nothing is ever libre enough!” circular firing squad. Just like streamline etc.
There’s a really shitty strain of fanboy thought that wants libre software to be actively worsened (even stonewalled by the kernel team if necessary) so that they can continue to argue against nvidia as a bad actor that doesn’t play nicely with open source. You saw it with all these things but especially with the open kernel driver, people were really happy it didn’t get upstreamed. Shitty behavior all around.
You see it every time someone quotes Linus Torvalds on the issue. Some slight from 2006 is more important than users having good, open drivers upstreamed. Some petty brand preferences are legitimately far more important than working with and bringing that vendor into the fold long-term, for a large number of people. Most of whom don’t even consider themselves fanboys! They just say all the things a fanboy would say, and act all the ways a fanboy would act…
>Is it weird how the comments here are blaming AMD and not Nvidia?
Because it IS AMD/Apple/etcs fault for the position they're in right now. CUDA showed where the world was heading and where the gains in compute would be made well over a decade ago now.
They even had OpenCL, didn't put the right amount of effort into it, all the talent found CUDA easier to work with so built there. Then what did AMD, Apple do? Double down and try and make something better and compete? Nah they fragmented and went their own way, AMD with what feels like a fraction of the effort even Apple put in.
From the actions of the other teams in the game, it's not hard to imagine a world without CUDA being a world where this tech is running at a fraction of its potential.
It's always been on the straggler to catch up by cheating. That's just how the world works - even in open source. If AMD supported CUDA, it would have a bigger market share. That's a fact. Nvidia doesn't want that. That's a fact. But when Reddit started, it just scraped feeds from Digg, and when Facebook started, it let you link your MySpace credentials and scraped your MySpace account. Adversarial interoperability is nothing new.
Funnily, the one I blame the most for there not being real competition to CUDA is Apple. Of late, Apple has been really pushing vendor lock-in APIs rather than adopting open standards. The end result is you can get AMD and Intel on board with some standard, which is ultimately torpedoed by Apple. (See Apple departing from and rejecting everything that comes from the Khronos group.)
With the number of devs that use Apple silicon nowadays, I have to think that their support for Khronos initiatives like SYCL and OpenCL would have significantly accelerated progress and adoption in both.
We need an open standard that isn't just AMD specific to be successful in toppling CUDA.
Pretty much any modern NVIDIA GPU supports CUDA. You don't have to buy a datacenter-class unit to get your feet wet with CUDA programming. ROCm will count as "something" when the same is true for AMD GPUs.
ROCm supports current gen consumer GPUs officially and a decent chunk of recent gen consumer GPUs unofficially. Not all of them of course, but a decent chunk.
It's not ideal, but I'm pretty sure CUDA didn't support everything from day 1. And ROCm is AMD's vendor part of the Windows AI stack, so from the upcoming gen on out, basically anything that outputs video should support ROCm.
No, but CUDA at least supported the 8800 GT on release [1]. ROCm didn't support any consumer cards on release; looks like they didn't support any till last year? [2]
I don't think AMD needs to support 5+ year old GPUs personally. And all the recent generations are already practically supported.
AMD only claims support for a select few GPUs, but in my testing I find all the GPUs work fine if the architecture is supported. I've tested rx6600, rx6700xt for example and even though they aren't officially supported, they work fine on ROCm.
AMD had a big architecture switchover exactly 5 years ago, and the full launch wasn't over until 4.5 years ago. I think that generation should have full support. Especially because it's not like they're cutting support now. They didn't support it at launch, and they didn't support it after 1, 2, 3, 4 years either.
The other way to look at things, I'd say that for a mid to high tier GPU to be obsolete based on performance, the replacement model needs to be over twice as fast. 7700XT is just over 50% faster than 5700XT.
I'm on a 5+ year old GPU, because I don't trust AMD to offer a compelling GPU that actually works. An RX 570 is good enough for the little gaming I do. It mostly acts as an oversized iGPU that has good Linux drivers, but since AMD is not supporting ROCm on this GPU, there is no need to hurry on upgrading to a better GPU or to get my feet wet running things locally on the GPU like Stable Diffusion, LLMs, etc.
AMD's definition of "support" I think is different than what people expect, and pretty misleading - ROCm itself will run on almost anything, back as far as the RX 400/500 series:
There are out-of-bounds writes in the BLAS libraries for gfx803 GPUs (such as the RX 570). That hardware might work fine for your use case, but there's a lot of failures in the test suites.
I agree that the official support list is very conservative, but I wouldn't recommend pre-Vega GPUs for use with ROCm. Stick to gfx900 and newer, if you can.
The last time I checked, I was stuck with a pretty old kernel if I wanted to have the latest version of ROCm available for my RX 470. It was compatible at some point in time, but isn't kept compatible with recent kernels.
AMD should focus their efforts on competitive hardware offerings, because that is where the need and the money is. Sorry, I don't think the hobbyist should be a priority.
>Nvidia can make things arbitrarily difficult both technically and legally.
I disagree. AMD can simply not implement those APIs, similar to how game emulators implement the most-used APIs first and sometimes never bother implementing obscure ones. It would only matter that NVIDIA added e.g. patented APIs to CUDA if those APIs were useful, in which case AMD should have a way to do them anyway. Unless NVIDIA comes up with a new patented API which is both useful and impossible to implement in any other way, which would be bad for AMD in any event. On the other hand, if AMD starts supporting CUDA and people start using AMD cards, then developers will be hesitant to use APIs that only work on NVIDIA cards. Right now they are losing billions of dollars on this. Then again, they barely seem capable of supporting ROCm on their cards, much less CUDA.
You have a fair point in terms of cuDNN and cuBLAS but I don't know that that kind of ToS is actually binding.
You can patent the implementation. You can't patent the API name DecodeH265Video() but you can still sue someone for implementing that function correctly.
Agreed. Rather than making CUDA the standard, AMD should push/drive an open standard that can be run on any hardware.
We have seen this succeed multiple times: FreeSync vs GSync, DLSS vs FSR, (not AMD but) Vulkan vs DirectX & Metal.
All of the big tech companies are obsessed with ring-fencing developers behind the thin veil of "innovation" - where really it's just good for business (I swear it should be regulated because it's really bad for consumers).
A CUDA translation layer is okay for now but it does risk CUDA becoming the standard API. Personally, I am comfortable with waiting on an open standard to take over - ROCm has serviced my needs pretty well so far.
Just wish GPU sharing with VMs was as easy as CPU sharing.
> AMD should push/drive an open standard that can be run on any hardware.
AMD has always been notoriously bad at the software side, and they frequently abandon their projects when they're almost usable, so I won't hold my breath.
we actually also saw this historically with openGL.
openGL comes from an ancient company whispered about by the elderly programmers (30+ years old) known as SGI. Originally it was CLOSED SOURCE and SGI called it "IRIS GL", after its IRIS workstations, which were cool looking with bright popping color plastic and faux granite keyboards. Good guy SGI opened up IRIS GL to become what we call "openGL" (get it, now it's open), and then it stuck.
That's all to say NVIDIA could pull an SGI and open their stuff, but they're going more Sony-style and trying to monopolize. Oh, and SGI also shipped another ancient lore library known as the SGI "STL" (Standard Template Library), which is like the original boost template metaprogramming granddaddy.
STL started even earlier, obviously without using the name "STL", as a library of generic algorithms for the programming language Ada (David R. Musser & Alexander A. Stepanov, 1987).
OpenCL was released in 2009. AMD has had plenty of time to push and drive that standard. But OpenCL had a worse experience than CUDA, and AMD wasn't up to the task in terms of hardware, so it made no real sense to go for OpenCL.
I agree with aspects of this take. In my original post I think that "should" is a strong word.
Realistically, companies only have an obligation to make themselves profitable, so really they "should" only strive for profitability within the boundaries of the law, above all else.
AMD have no obligation to drive an open standard, it's at their discretion to choose that approach - and it might actually come at the cost of profitability as it opens them up to competitors.
In this case - I believe that hardware & platform software companies that distribute a closed platform, one which cannot be genuinely justified as anything other than intending to prevent consumers from using competitor products, "should" be moderated by regulator intervention, as it results in a slower rate of innovation and poor outcomes for consumers.
That said, dreaming for the regulation of American tech giants is a pipe dream, haha.
> against the license agreement of cuDNN or cuBLAS to run them on this
They don’t run either of them, they instead implement an equivalent API on top of something else. Here’s a quote: “Open-source wrapper libraries providing the "CUDA-X" APIs by delegating to the corresponding ROCm libraries. This is how libraries such as cuBLAS and cuSOLVER are handled.”
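For anyone wondering what "delegating to the corresponding ROCm libraries" might look like, here's a rough sketch of the idea (not SCALE's actual code; rocblas_sgemm is the real rocBLAS routine, but the handle/enum/status types are simply assumed to be layout-compatible here, which a real wrapper would have to translate properly):

    // Hypothetical "CUDA-X" shim: export the cuBLAS v2 entry point and
    // forward it to rocBLAS. A real wrapper maps handle, enum, and status
    // types between the two APIs; this sketch pretends they line up.
    #include <rocblas/rocblas.h>

    extern "C" int cublasSgemm_v2(rocblas_handle handle,
                                  rocblas_operation transa, rocblas_operation transb,
                                  int m, int n, int k,
                                  const float* alpha,
                                  const float* A, int lda,
                                  const float* B, int ldb,
                                  const float* beta,
                                  float* C, int ldc) {
        return static_cast<int>(rocblas_sgemm(handle, transa, transb, m, n, k,
                                              alpha, A, lda, B, ldb, beta, C, ldc));
    }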
No, it's stranger than that: SCOTUS did not rule on copyrightability of APIs at all, but simply ruled that even if they are copyrightable, what Google did (completely reimplement Sun/Oracle's public API) was still fair use.
It would have been nice to get a clear SCOTUS precedent on this. On the other hand, I also value a SCOTUS which rules minimally and narrowly by default (I also appreciate SCOTUS' return to stricter constitutional grounding in the past decade).
Incredibly loud laughing from the lawyers whose study of law is being thrown around willy nilly because of all the unprecedented joke decisions they are making right now.
We are stuck between a rock and a hard place politically. The real decisions should be coming from Congress not the courts. However, Congress is too disorganized and disconnected to answer the important questions, leaving the courts to either muddle along or else become semi-dictatorial. In most countries, this would cause a constitutional crisis, but the modern U.S. system seems to be a little too resilient to such otherwise concerning signals.
We're far past a constitutional crisis, and the courts taking power nobody wanted to give to them (who wasn't interested in a unitary executive at least) isn't a good solution.
What constitutional crisis has occurred that hasn't been resolved?
Constitutional crises involve fundamental breaks in the working of government that bring two or more of its elements into direct conflict that can't be reconciled through the normal means. The last of these by my accounting was over desegregation, which was resolved with the President ordering the Army to force the recalcitrant states to comply. Before that was a showdown between the New Deal Congress and the Supreme Court, which the former won by credibly threatening to pack the latter (which is IMO a much less severe crisis but still more substantial than anything happening today). However, that was almost a century ago, and Congress has not been that coherent lately.
I would think the latest one where SCOTUS ruled that the president was a king except in matters where the SCOTUS decides they aren't counts as a constitutional crisis.
Constitutional crises are not a matter of opinion but of occurrence, arising from an actual power conflict between arms of the government that is caused by a conflicted reading of the constitutional text. Basically, if the system just ticks on, it's not a constitutional crisis.
If "I think this is a very bad decision" was cause for a constitutional crisis, any state with more than three digit population would be in constitutional crisis perpetually.
> Constitutional crises are not a matter of opinion but of occurrence, arising from an actual power conflict between arms of the government that is caused by a conflicted reading of the constitutional text. Basically, if the system just ticks on, it's not a constitutional crisis.
This happened as recently as 2021-01-06; strong evidence that the military subverted the president to call the National Guard into Washington DC and secure the electoral count.
That's close. Both the excessively long lame duck period (2 months for Congress and 2.5 months for the President) and disunity between the President and the rest of the executive branch have also been fodder for crises in the past (Marbury v Madison, Andrew Johnson's impeachment).
That is how the SC used to work: they would decide cases on the narrowest possible grounds. If they don't have to decide a tough question, but they can finesse it with something simpler, good enough. More recently they have been willing to tear up decades of established law on a regular basis.
Yes, "used to". Now, in 2024, the same supreme court has decided that presidents have immunity in all official acts, from stealing documents, up to and including assassination attempts on their opponents. This is a radical shift in how the court operates.
This "opponent assassination" hypothetical gets bandied about a lot but I have not seen any evidence that any court considers that to be an "official act". Official acts are constrained to legitimate exercises of constitutional authority and are not merely anything a President (or especially, an ex-President) does.
the only thing radical is the opinions of people you are listening to if you believe SCOTUS enabled legally sanctioned assassinations. It was political hyperbole based on nothing, and it worked (with you). Think for yourself.
> CUDA is not designed to be vendor agnostic and Nvidia can make things arbitrarily difficult [...] technically.
(Let's put the legal questions aside for a moment.)
nVidia changes GPU architectures every generation / few generations, right? How does CUDA work across those—and how can it have forwards compatibility in the future—if it's not designed to be technologically agnostic?
PTX is meant to be portable across GPU microarchitectures. That said, Nvidia owns the entire spec, so they can just keep adding new instructions that their GPUs now support but AMD GPUs don't.
One way is to make sure the hardware team does certain things to support easy transition to new architectures, we have seen this with Apple Silicon for example!
Well, they kinda have it with their hipify tool, although this is for porting CUDA code to AMD's HIP, which supports both AMD and NVIDIA. This supports CUDA C code and libraries with AMD equivalents like cuDNN, cuBLAS, cuRAND, but doesn't support porting of CUDA C inline PTX assembler. AMD have their own inline GCN assembler, but seem to discourage its use.
There are also versions of PyTorch, TensorFlow and JAX with AMD support.
PyTorch's torch.compile can generate Triton (OpenAI's GPU compiler) kernels, with Triton also supporting AMD.
CUDA is the juice that built Nvidia in the AI space and allowed them to charge crazy money for their hardware. Being able to run CUDA on cost-effective AMD hardware would be a big leap forward, allowing more people to do research and breaking away from Nvidia's stranglehold over VRAM. Nvidia will never open source their own platform unless their hand is forced. I think we all should support this endeavor and contribute where possible.
Before starting, AMD signed an agreement with Intel that gave them an explicit license to x86. And x86 was a whole lot smaller and simpler back then in 1982. A completely different and incomparable situation.
Technically it was after starting - AMD was founded in 1969 as a second-sourcer for Fairchild and National Semiconductor, and had reverse-engineered the 8080 by 1975 and acquired a formal license to it by 1976.
The 1982 deal you speak of was actually pretty interesting: as a condition of the x86's use in the IBM PC, IBM requested a second source for x86 chips. AMD was that source, and so they cross-licensed the x86 in 1982 to allow the IBM PC project to proceed forward. This makes the Intel/AMD deal even more important for both companies: the PC market would never have developed without the cross-licensing, which would've been bad for all companies involved. This gave Intel an ongoing stake in AMD's success at least until the PC market consolidated on the x86 standard.
Was there a large entity steering the x86 spec alone, with a huge feature lead over their competition, free to steer the spec in any way they chose? Also, hardware is not open-source software: get big players on board and they will be able to implement the spec they want every gen; software has more moving parts and unaligned parties involved.
I hadn't considered that angle. Is your point that Intel was the creator of x86, but software chose to support it, so AMD had nothing else to do but play catch-up in x86 support to be part of the software target market? If so and factual (I've no idea), fair point, I didn't know.
And Intel named its licensed implementation of AMD64 "IA-32e", just to make it clear to everyone that it was based on the 32-bit Intel architecture with an extension.
Luckily they dropped that name a few years later.
Isn't cuDNN a much better case for reimplementing than CUDA? It has much more choice in how things actually happen and cuDNN itself chooses different implementations at runtime + does fusing. It seems way more generic and the reimplementation would allow using the best AMD-targeted kernel rather than one the original has.
And that thing has been left unreleased on Windows for almost a whole year for unknown reasons, even though there is activity on GitHub and builds are fixed frequently. There is just no .exe or .msi for you to download. In fact, ROCm for Linux is on a major 6 release (which includes MIOpen), but somehow Windows is still on major 5 (without MIOpen), for almost a whole year.
It almost makes me wonder: is there a shady deal somewhere asking AMD to never release the SDK for Windows, to keep the price of Nvidia cards higher? Why do they keep developing these without releasing them at all?
Since they cancelled the work on ZLUDA and absolutely failed to do anything about other options, I really believe there's some "don't do it or you'll get sued to hell and back" agreement. They can't be so dumb that they just missed it by accident.
I really hope they will do what you suggested. With some innovative product placement, GPUs with a lot of memory for example, they could dethrone nvidia if it doesn't change strategy.
That said, easier said than done. You need very specialized developers to build a CUDA equivalent and have people start using it. AMD could do it with a more open development process leveraging the open source community. I believe this will happen at some point anyway by AMD or someone else. The market just gets more attractive by the day and at some point the high entry barrier will not matter much.
So why should AMD skimp on their ambitions here? This would be a most sensible investment, few risks and high gains if successful.
That is why an open standard should be created that isn't locked to a particular piece of hardware, and that allows modular support for different hardware to interface through supported drivers.
Given AMD's prior lack of interest, I'll take whatever options there are. My daily driver has a Vega 10 GPU and it's been quite frustrating not to be able to easily leverage it for doing basic ML tasks, to the point that I've been looking at buying an external Nvidia GPU instead just to try out some of the popular Python libraries.
The US law is highly dependent on precedents. The Google-Oracle case has set one fortunately, so anything following it won't start from scratch. Fortunately we may not need a closer judge.
Google-Oracle side stepped the issue of API copyrightability by saying Google's particular implementation would fall under fair use. Whether APIs are copyrightable remains an open question.
Ya, honestly better to leave that to third parties who can dedicate themselves to it and maybe offer support or whatever. Let AMD work on good first party support first.
I don't really see how any code that depends heavily on the underlying hardware can "just work" on AMD. Most serious CUDA code is aware of register file and shared memory sizes, wgmma instructions, optimal tensor core memory & register layouts, tensor memory accelerator instructions, etc...
Presumably that stuff doesn't "just work" but they don't want to mention it?
A lot of our hw-aware bits are parameterized, where we fill in constants based on the available hw. Doable to port, same as we do whenever new Nvidia architectures come out.
But yeah, we have tricky bits that inline PTX, and... that will be more annoying to redo.
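On the parameterized-constants point, for anyone unfamiliar: that usually just means querying the device at startup instead of hard-coding per-architecture numbers. A minimal sketch (standard CUDA runtime attribute query; the halving heuristic is made up), and HIP exposes the same thing via hipDeviceGetAttribute:

    #include <cuda_runtime.h>

    // Pick a shared-memory tile budget from what the device actually offers,
    // rather than baking in a per-architecture constant.
    static size_t pick_tile_bytes(int device) {
        int smem_per_block = 0;
        cudaDeviceGetAttribute(&smem_per_block,
                               cudaDevAttrMaxSharedMemoryPerBlock, device);
        // Leave headroom for the kernel's other shared-memory usage.
        return static_cast<size_t>(smem_per_block) / 2;
    }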
Oh, that will be interesting to understand, as PTX gets more into trickier hw-arch-specific phenomena that different brands disagree on, like memory models. Neat!
Looks like the PTX translation is via another project ZLUDA, though how they bridge the differences in memory/consistency/etc models safely remains unclear to me...
SCALE does not use any part of ZLUDA. We have modified the clang frontend to convert inline PTX asm blocks to LLVM IR.
To put in a less compiler-engineer-ey way: for any given block of PTX, there exists a hypothetical sequence of C++/CUDA code you could have written to achieve the same effect, but on AMD (perhaps using funky __builtin_... functions if the code includes shuffles/ballots/other-weird-gpu-stuff). Our compiler effectively converts the PTX into that hypothetical C++.
Regarding memory consistency etc.: NVIDIA document the "CUDA memory consistency model" extremely thoroughly, and likewise, the consistency guarantees for PTX. It is therefore sufficient to ensure that we use operations at least as synchronising as those called for in the documented semantics of the language (be it CUDA or PTX, for each operation).
Differing consistency _between architectures_ is the AMDGPU backend's problem.
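As a toy illustration of the "hypothetical equivalent C++" idea (not taken from SCALE, which lowers to LLVM IR rather than source): a common inline-PTX idiom like reading the lane ID has a plain CUDA equivalent a frontend could emit instead.

    // What a lot of hand-written CUDA does today: read %laneid via inline PTX.
    __device__ unsigned lane_id_ptx() {
        unsigned lane;
        asm volatile("mov.u32 %0, %%laneid;" : "=r"(lane));
        return lane;
    }

    // A portable equivalent (for 1-D thread blocks) that needs no PTX at all;
    // the warpSize builtin hides the fact that it's 32 on NVIDIA and can be
    // 64 on AMD hardware.
    __device__ unsigned lane_id_portable() {
        return threadIdx.x % warpSize;
    }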
Again, I'm guessing there might be an equiv simpler program involving AMD's __ballot, but I'm unsure of the true equivalence wrt safety, and it seems like a tricky rewrite as it needs to (afaict) decompile to recover the higher-level abstraction. Normally it's easier to compile down or sideways (translate), and it's not clear to me these primitives are 1:1 for safely doing so.
===
FWIW, this is all pretty cool. We stay away from PTX -- most of our app code is higher-level, whether RAPIDS (GPU dataframes, GPU ML, etc. libs), minimal CUDA, or minimal OpenCL, with only small traces of inline PTX. So more realistically, if we had the motivation, we'd likely explore just #ifdef'ing it with something predictable.
It's supposed to be waiting for all threads to vote
I'm not familiar with AMD enough to know if additional synchronization is needed. ChatGPT recommended adding barriers beyond what that gave, but again, I'm not familiar with AMD commands.
Yeah I think, after this snippet was written, cuda added __all_sync as an intrinsic. The divergent code before this was plain-ish cuda, and this snippet ensures they wait on the comparison vote before recurring.
So in the AMD version, the compiler correctly realized the synchronization was on the comparison, so adds the AMD version right before it. That seems like a straightforward transform here.
It'd be interesting to understand the comparison of what Nvidia primitives map vs what doesn't. The above is a fairly simple barrier. We avoided PTX as much as we could and wrote it as simply as we could, I'd expect most of our PTX to port for similar reasons. The story is a bit diff for libraries we call. E.g., cudf probably has little compute-tier ptx directly, but will call nvidia libs, and use weird IO bits like cufile / gpu direct storage.
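For readers who haven't met the warp-vote intrinsics, the pattern being described is roughly this (a generic sketch, not the snippet under discussion; refine() is a made-up stand-in for the divergent per-lane work, and HIP exposes a similar __all() wavefront-wide vote):

    // Each lane does its own work, then the whole warp votes on whether
    // every lane has converged before anyone moves on.
    __device__ float refine(float* value) {
        float old = *value;
        *value = 0.5f * (*value + 1.0f / *value);   // toy Newton step
        return fabsf(*value - old);
    }

    __device__ void iterate_until_converged(float* value, float tol) {
        bool done = false;
        while (!done) {
            float delta = refine(value);
            // Warp-wide vote: true only once every lane's delta is below tol.
            done = __all_sync(0xffffffffu, delta < tol);
        }
    }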
Do you reverse it back into C++ that does the corresponding FMAs manually instead of using tensor hardware? Or are you able to convert it into a series of __builtin_amdgcn_mfma_CDFmt_MxNxKABFmt instructions that emulate the same behavior?
Rather awkwardly, you've asked about an instruction that isn't currently implemented. :D Support for wmma and friends is in development.
But in general the answer to your question is yes: we use AMD-specific builtins where available/efficient to make things work. Otherwise many things would be unrepresentable, not just slow!
Or we add one: it's trivial to add a compiler builtin to carry the instruction from the frontend to the backend, if an instruction exists and the backend knows about it.
If there's no instruction, either, you can write a C++ function to replicate the behaviour and codegen a call to it. Since the PTX blocks are expanded during initial IR generation, it all inlines nicely by the end. Of course, such software emulation is potentially suboptimal (depends on the situation).
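To make the "write a C++ function to replicate the behaviour" point concrete, here's a deliberately naive sketch of such a fallback (the name and the 16x16x16 shape are made up, and it ignores how real wmma/mfma fragments are distributed across the lanes of a warp; it only shows the arithmetic being replicated):

    // Naive emulation of a matrix-multiply-accumulate: D = A*B + C,
    // with all operands as flat 16x16 row-major tiles. A real lowering
    // would also have to reproduce the per-lane fragment layout.
    __device__ void emulate_mma_16x16x16(const float* A, const float* B,
                                         const float* C, float* D) {
        for (int i = 0; i < 16; ++i)
            for (int j = 0; j < 16; ++j) {
                float acc = C[i * 16 + j];
                for (int k = 0; k < 16; ++k)
                    acc += A[i * 16 + k] * B[k * 16 + j];
                D[i * 16 + j] = acc;
            }
    }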
It's speculation, but I think it's similar with processors: nobody guarantees the code will run exactly the way you set it up. You may want to use some specific register, but if the processor thinks it has another register that can fulfill the task, it'll use that while telling you your code executed as expected. Maybe the internal GPU processor of AMD can sufficiently simulate the behavior of Nvidia hardware so that higher abstractions are unaware that something different is happening under the hood.
Pretty much. Compilers can do a lot more than people give them credit for. At least AMD document their hardware, so it is actually possible to know low-level details. PTX can obfuscate that surprisingly badly for Nvidia targets.
Makes sense to expect this kind of thing to be open source. The whole point of providing improved compatibility is to make people’s lives easier, and open source is usually an important feature to ensure wide compatibility. It also means projects can live on after the creators move to other things, people can submit patches for important features or bug fixes, and generally makes the system much more useful.
I don't find it wrong for someone to attempt to make money back on their time and experience of doing the work. I don't mind people that offer that back as open source either. However, I do have a problem of people expecting everything to be open/free, especially those that then go on a crusade chastising those that do try to make money.
I'm really trying to keep this about the engineering features of a system rather than moral judgments. Open source systems are simply more flexible and adaptable than proprietary systems, which have their own benefits. In today's world, the engineering value of open source systems is becoming so important that people are looking for other ways to provide for the developers creating these systems. It can be surprising when a project creator builds something in an area that is usually all open source, but they choose a proprietary path. Just look at the problems created by NVIDIA for their use of proprietary software in CUDA and their GPUs. This software is an attempt to fix issues created by proprietary software with another piece of proprietary software, which is if nothing else an interesting decision.
UNIX wasn't free. Windows wasn't free. It wasn't until some knucklehead came along and did something abnormal and gave away their thing. Bakers don't give away their goods. Mechanics don't typically repair things for free. Builders don't build things for free. Gas stations don't give away gas.
Why do we think all software should be free, and then think that those that don't give it away are the abnormal ones?
Because software is information. It is closer to a scientific paper than a loaf of bread, and I do expect those to be free. I do not expect scientists to work for free, but the marginal cost of copying their output is 0 and the social benefit is huge.
Free software, like open science, clearly has something going for it pragmatically. The developer hours put into it have paid for themselves many times over. Megacorps hire people to work on free software. If you can't see the value, that's a you problem.
Free software is so important to society that I believe the most reasonable solution is to provide for all people without their needing to work for survival. Automate as much as possible so that work is not compulsory, and enough people will simply want something to do (and possibly additional pay, depending on how the system is arranged) that everything that needs to get done by people does get done.
For now that is fiction, but so is "if all software was free". I do think though that both would lead to a faster rate of innovation in society versus one where critical information is withheld from society to pay someone's rent and food bills.
Most software is free and makes no money - and that has always been the case. There are some very popular and widely-used non-free systems, but most software isn't that, and its developers still pay the bills.
This is somewhat analogous to music or books/literature. Most composers and performers and authors make no money from people copying and sharing their works. Some pay the bills working professionally for entities who want their product enough to pay for it; some do other things in life. Some indeed give up their work on music because they can't afford to not do more gainful work. And still, neither music nor books go away as copying them gets closer to being free.
If my current employer can't make any money from the code we write, then it would collapse faster than a soufflé taken out of the oven too early, and I would be out of a job
It will be interesting to see if this is the case in the long run, assuming "huge" has a positive connotation in your post, of course.
If AGI comes to pass and it winds up being a net negative for humanity, then the ethics of any practice which involves freely distributing information that can be endlessly copied for very little cost must be reevaluated.
Increasingly, I am not putting much weight in any predictions about whether this will happen in the way we think it will, or what it could possibly mean. We might as well be talking about the rapture.
Why do people return Windows laptops when they have to pay for a Windows License Activation? Because every single OEM pays for it; you don't expect to buy Windows because it is a failed B2C business model. Nobody wants it. Same goes for proprietary UNIX, and people wish it was the case for Nvidia drivers. I own CUDA hardware and lament the fact that cross-industry GPGPU died so FAANG could sell licensed AI SDKs. The only thing stopping AI from being "free" is the limitations OEMs impose on their hardware.
> that those that don't give it away are the abnormal ones?
They are. Admit it; the internet is the new normal, if your software isn't as "free" as opening a website, you're weird. If I have to pay to access your little forum, I won't use it. If I have to buy your app to see what it's like, I'll never know what you're offering. Part of what makes Nvidia's business model so successful is that they do "give away" CUDA to anyone that owns their hardware. There is no developer fee or mandatory licensing cost, it is plug-and-play with the hardware. Same goes for OpenAI, they'd have never succeeded if you had to buy "the ChatGPT App" from your App Store.
> Why do people return Windows laptops when they have to pay for a Windows License Activation?
The internet echo chamber strikes again. Exactly how many people are actually doing this? Not many, and those that do all hang out together. The rest of the world just blindly goes about their day using Windows while surfing the web using Chrome. Sometimes it's a good thing to get outside your bubble. It's a big world out there, and not everybody sees the world as you do.
> The rest of the world just blindly goes about their day using Windows while surfing the web using Chrome.
Paying for Windows? I think you missed my point. If your computer doesn't ship with an OS, paid or otherwise, people think it's a glitch. The average consumer will sooner return their laptop than buy a Windows license, create install media from their old device, and flash the new hardware with a purchased license. They'll get a Chromebook instead; people don't buy Windows today.
The internet has conditioned the majority of modern technology users to reject and habitually avoid non-free experiences. Ad-enabled free platforms and their pervasive success is all the evidence you need. Commercial software as it existed 20 or 30 years ago is a dead business. Free reigns supreme.
That is kind of his point: you don't pay for it, Windows is bundled with the laptop. Not that I agree with his points. Windows, for example, isn't open source in the remotest sense.
Dell offers laptops with a version of Linux preinstalled and supports them. System76, Lenovo, Purism as well to name a few. Apple also sells laptops without Windows on them. There are actually quite a few options that do this. If you don't want Windows, we have options now. Yes, historically, it was Windows or Apple's OS, but that's no longer true and not recognizing that just makes you look like you're pushing a false narrative on the situation for what purpose only you know.
> Commercial software as it existed 20 or 30 years ago is a dead business. Free reigns supreme.
What nonsense. Go into any business and you will find every single piece of software they use is bought and paid for with bells on. The 'Free World' you speak of is only there to get you, an individual, used to using the software so that businesses are made to purchase it. In the old days we called this 'demo' or 'shareware'. Now it's a 'free' or 'personal' tier subscription.
Go and ask any designer if their copy of Adobe Creative Cloud, 3ds Max, or AutoCAD is free. Ask any office worker about Microsoft Office (including Teams, SharePoint, etc.) or even Google Docs for business. The majority of developers are running paid versions of JetBrains tools. Running an online shop? Chances are you are paying for Shopify, or something like Zoho to manage your customers and orders.
'Free' as you put it is very much only in the online individual consumer world, a very small part of the software world.
The commercial software market is more alive and expensive than it has ever been.
> Bakers don't give away their goods. Mechanics don't typically repair things for free. Builders don't build things for free. Gas stations don't give away gas.
These all share the property of being scarce physical goods or services. Software is not scarce (though of course the labor to create it is), so this is a really bad comparison.
And again I did not say it should or should not be free, I said there are engineering benefits to open source software and more and more people recognize those benefits and choose to make things free because they see the value and are willing to recognize the tradeoffs. I never said what "should" be done. "Should" is kind of a nonsense term when used in this way as it hides a lot of assumptions, so I generally do not use it, and notably did not use it in my comment. I want to point out the peculiarity in your rather strong response to a word and concept I never used. I think you are having an argument with imagined people, not a discussion with me.
And for what it is worth, I am a robotics engineer and I am designing a completely open source solar powered farming robot designed to be made in a small shop in any city in the world (see my profile), funded by a wealthy robotics entrepreneur who recognizes the value in making this technology available to people all over the world.
So I am one of those engineers making this choice, not someone asking for things without offering the same from my own work. Everything I produce is open source, including personal projects and even my personal writing.
On the other hand, recipes and drawings are commonly available for free. So if you can do the work yourself, the cake or the engine repair is free. But if you need support, you can pay someone to bake or build for you.
> Makes sense to expect this kind of thing to be open source. The whole point of providing improved compatibility is to make people’s lives easier, and open source is usually an important feature to ensure wide compatibility. It also means projects can live on after the creator
AMD just bought a company working on similar things for more than $600M.
We're going to be publishing more details on later blog posts and documentation about how this works and how we've built it.
Yes, we're not open source, however our license is very permissive. It's both in the software distribution and viewable online at https://docs.scale-lang.com/licensing/
> I'm curious, for what reasons are you interested in the source code yourself?
I am the founder/editor of PLDB. So I try to do my best to help people "build the next great programming language".
We clone the git repos of over 1,000 compilers and interpreters and use cloc to determine which languages the people who build languages are using. The people who build languages are obviously the experts, so as they go, so goes the world.
We call this measurement "Foundation Score". A Foundation Score of 100 means 100 other languages use this language somehow in their primary implementation.
It is utterly dominated by open source languages, and the disparity is only getting more extreme.
Some that might have become irrelevant have gained a second wind after going open source.
But some keep falling further behind.
I look at Mathematica, a very powerful and amazing language, and it makes me sad to see so few other language designers using it, and the reason is that it's closed source. So they are not doing so hot, and that's a language from one of the world's smartest and most prolific thinkers that's been around for decades.
I don't see a way for a new language to catch on nowadays that is not open source.
We do believe in open source software and we do want to move the GPGPU market away from fully closed languages. The future is open for discussion but regardless, the status-quo at the moment is a proprietary and dominant implementation which only supports a single vendor.
> I don't see a way for a new language to catch on nowadays that is not open source.
I do note that CUDA is itself closed source -- while there's an open source implementation in the LLVM project, it is not as bleeding edge as NVIDIA's own.
And this is a good point. However, it also has a 17 year head start, and many of those years were spent developing before people realized what a huge market there was.
All it will take is one committed genius to create an open source alternative to CUDA to dethrone it.
But they would have to have some Mojo (hint hint) to pull that off.
I'm not the person you replied to, and I can't speak for them. But I can say that for myself, and a not small number of other people, it's an ideological issue. I simply do not use software that isn't F/OSS - to the greatest extent that that is possible. For me, I might use a VERY small amount of non F/OSS stuff, but it's very hard to get me to adopt something new if it isn't.
Now should you make business decisions based on that? Probably not. But while I don't claim to be a representative sample, I am pretty sure the number of people who share my beliefs in this regard is substantially "non zero". shrug
I'm a big fan of open source for most things, but if what you've got actually works, you could probably earn big money selling it. The biggest companies in the world are building / using this sort of thing.
Imagine the shift of capital if for example, Intel GPUS suddenly had the same ML software compatibility as Nvidia
This. Since Intel and AMD haven't been able to produce a good solution to Nvidia's moat yet, this should be worth serious money to them. No need to give it away for free.
On the other hand, if they want better adoption (which would drive sales of their hardware), then Intel / AMD should make a deal to release it as open source. Closed source will make some profit, but not that much. If this thing really means that everything can run on AMD GPU cards today, then this is a game changer and is worth a lot.
> I don't see a "buy now" button or a PCIe version anywhere here
"Buy now" buttons and online shopping carts are not generally how organizations looking to spend serious money on AI buy their hardware.
They have a long list of server hardware partners, and odds are you'd already have an existing relationship with one or more of them, and they'd provide a quote.
They even go one step further and show off some of their partners' solutions:
FWIW I believe Supermicro and Exxact actually do have web-based shopping carts these days, so maybe you could skip the quotation and buy directly if you were so motivated? Seems kind of weird at this price point.
They could break the trend and offer a "buy now" button instead of offering quotes and coffee chats. It's very likely that will kickstart the software snowball with early adopters.
Nobody is going to drop millions on an unproven platform.
> Seems kind of weird at this price point.
Yeah that $234K server is too much for people to do a trial. It has 8xMI300X GPUs along with a bunch of other shit.
Give me a single MI300X GPU in PCIe form factor for $20K and I'd very seriously consider. I'm sure there are many people who would help adapt the ecosystem if they were truly available.
> Give me a single MI300X GPU in PCIe form factor for $20K and I'd very seriously consider. I'm sure there are many people who would help adapt the ecosystem if they were truly available.
I know this isn't what you're looking for entirely, but my business, Hot Aisle, is working on making MI300x available for rental. Our pricing isn't too crazy given that the GPU has 192GB and one week minimum isn't too bad. We will add on-demand hourly pricing as soon as we technically can.
I'm also pushing hard on Dell and AMD to pre-purchase developer credits on our hardware, that we can then give away to people who want to "kick the tires".
Why would you be looking to dip your toe into the AMD ecosystem for the first time using an MI300X? It doesn't make any sense. It's not entry level hardware.
I'm not looking to enter the ecosystem, I'm already deep in it and want to fix the AMD problem so that I can build big projects around it and undercut everyone who's using Nvidia.
You can purchase H100 and A100 PCIe cards over the counter. They're great for compiling CUDA code, testing code before you launch a multi-node job into a cluster, and for running evaluations.
AMD has nothing of the sort, and it's hurting them.
I cannot blow 250K on an SMCI server, nor do I have the electricity setup for it. I can blow 20K on a PCIe GPU and start contributing to the ecosystem, or maybe prove out an idea on one GPU before trying to raise millions from a VC to build a more cost-effective datacenter that actually works.
The main cause of Nvidia's crazy valuation is AMD's unwillingness to invest in making its GPUs as useful as Nvidia's for ML.
Maybe AMD fears antitrust action, or maybe there is something about its underlying hardware approach that would limit competitiveness, but the company seems to have left billions of dollars on the table during the crypto mining GPU demand spike and now during the AI boom demand spike.
I like to watch YouTube retrospectives on old failed tech companies - LGR has some good ones.
When I think of AMD ignoring machine learning, I can't help imagine a future YouTuber's voiceover explaining how this caused their downfall.
There's a tendency sometimes to think "they know what they're doing, they must have good reasons". And sometimes that's right, and sometimes that's wrong. Perhaps there's some great technical, legal, or economic reason I'm just not aware of. But when you actually look into these things, it's surprising how often the answer is indeed just shortsightedness.
They could end up like BlackBerry, Blockbuster, Nokia, and Kodak. I guess it's not quite as severe, since they will still have a market in games and therefore may well continue to exist, but it will still be looked back on as a colossal mistake.
Same with Toyota ignoring electric cars.
I'm not an investor, but I still have stakes in the sense that Nvidia has no significant competition in the machine learning space, and that sucks. GPU prices are sky high and there's nobody else to turn to if there's something about Nvidia you just don't like or if they decide to screw us.
In fairness to AMD, they bet on crypto, and nvidia bet on AI. Crypto was the right short term bet.
Also, ignoring is a strong word: I’m staring at a little << $1000, silent 53 watt mini-PC with an AMD SoC. It has an NPU comparable to an M1. In a few months, with the ryzen 9000 series, NPUs for devices of its class will bump from 16 tops to 50 tops.
I’m pretty sure the linux taint bit is off, and everything just worked out of the box.
Toyota is extremely strong in the hybrid car market, and with ravenous competition for electric cars and slowing demand Toyota may have made the right decision after all
There's also just the idea of endeavour - Nvidia tried something, and it worked. Businesses (or rather their shareholders) take risks with their capital sometimes, and it doesn't always work. But in this case it did.
> I think this could be cultural differences, AMD's software department is underfunded and doing poorly for a long time now.
Rumor is that ML engineers (which AMD really needs) are expensive, and AMD doesn't want to give them more money than the rest of the SWEs they have (for fear of pissing off the existing SWEs). So AMD is caught in a bind: it can't pay to get top MLE talent and can't just sit by and watch NVDA eat its lunch.
> So AMD is caught in a bind: can't pay to get top MLE talent and can't just sit by and watch NVDA eat its lunch.
This isn't being caught in a bind. This is, if true, just making a poor decision. Nothing is really preventing them from paying more for specialized work.
I find this strange to believe. Every big company has levels, unless your existing L7+ IC is below market, you can just pull L7+ salaried ML engineers with some secret signing bonus like literally everyone else.
The dirty secret in the tech industry is that AMD, Intel, IBM, and historically Nvidia/Oracle (this changed post-2022) were the 2nd- or 3rd-tier tech companies. Staffed heavily by FAANG rejects, their people were still happy to have their 100-200K in their MCOL areas, but with no free food and a much more boring work culture. Intel's "great place to work" corporate propaganda was known as "great place to leetcode" while I worked there, as Intel was always seen as a stepping stone before you "made it" at a FAANG.
Culturally, none of these companies were happy to pay anyone except the tip, top "distinguished" engineers more than 300K. AMD seems to be stuck in this mentality, just as IBM is.
> AMD seems to be stuck in this mentality, just as IBM is.
And that's why creative destruction is essential for technological progress. It's common for organizations to get stuck in stable-but-suboptimal social equilibria: everyone knows there's a problem but nobody can fix it. The only way out is to make a new organization and let the old one die.
AMD fears anti-collusion action. Remember, the CEOs of the two companies are just barely distant enough in kinship not to be automatically considered colluding with each other.
The companies' CEO's are related. My conspiracy theory is that they don't want to step on each other's toes. Not sure if that works with fiduciary duty, though.
It does not conflict. Fiduciary duty for a for-profit organisation is not "profit at all costs", it's "you have to care about the company (care), you have to do good business (good faith) and you can't actively waste investors' and shareholders' money to intentionally lose out (loyalty)".
If they are found colluding due to nepotism, both will get a very swift revocation of business licence and a huge prison term. Remember they are just one step of kinship away from presumed collusion.
I worked for spectral compute a few years ago. Very smart and capable technical team.
At the time, not only did they target AMD (with less compatibility than they have now), but also outperformed the default LLVM ptx backend, and even NVCC, when compiling for Nvidia GPUs!
I don't understand how AMD has messed up so badly that I feel like celebrating a project like this. Features of my laptop are just physically there but not usable, particularly in Linux. So frustrating.
AMD hardware works fine, the problem is that the major research projects everyone copies are all developed specifically for Nvidia.
Now AMD is spinning up CUDA compatibility layer after CUDA compatibility layer. It's like trying to beat Windows by building another ReactOS/Wine. It's an approach doomed to fail unless AMD somehow manages to gain vastly more resources than the competition.
Apple's NPU may not be very powerful, but many models have been altered specifically to run on them, making their NPUs vastly more useful than most equivalently powerful iGPUs. AMD doesn't have that just yet, they're always catching up.
It'll be interesting to see what Qualcomm will do to get developers to make use of their NPUs on the new laptop chips.
I don't know if I would call it a mess up. AMD still has massive market in server chips, and their ARM stuff is on the horizon. We all assume that graphics cards are the way forward for ML, which may not be the case in the future.
Nvidia were just ahead in this particular category due to CUDA, so AMD may have just let them run with it for now.
Same boat, AMD CPU but nothing else. I feel like a moderate improvement of their FOSS support, drivers would open new hardware revenue - to say nothing about the AI channel.
It’s great that there is a page about current limitations [1], but I am afraid that what most people describe as “CUDA” is a small subset of the real CUDA functionality. Would be great to have a comparison table for advanced features like warp shuffles, atomics, DPX, TMA, MMA, etc. Ideally a table, mapping every PTX instruction to a direct RDNA counterpart or a list of instructions used to emulate it.
You're right that most people only use a small subset of CUDA: we prioritized support for features based on what was needed for various open-source projects, as a way to try to capture the most common things first.
A complete API comparison table is coming soon, I believe. :D
In a nutshell:
- DPX: Yes.
- Shuffles: Yes. Including the PTX versions, with all their weird/wacky/insane arguments.
- Atomics: yes, except the 128-bit atomics nvidia added very recently.
- MMA: in development, though of course we can't fix the fact that nvidia's hardware in this area is just better than AMD's, so don't expect performance to be as good in all cases.
- TMA: On the same branch as MMA, though it'll just be using AMD's async copy instructions.
> mapping every PTX instruction to a direct RDNA counterpart or a list of instructions used to emulate it.
We plan to publish a compatibility table of which instructions are supported, but a list of the instructions used to produce each PTX instruction is not in general meaningful. The inline PTX handler works by converting the PTX block to LLVM IR at the start of compilation (at the same time the rest of your code gets turned into IR), so it then "compiles forward" with the rest of the program. As a result, the actual instructions chosen vary on a case-by-case basis due to the whims of the optimiser. This design in principle produces better performance than a hypothetical solution that turned PTX asm into AMD asm, because it conveniently eliminates the optimisation barrier an asm block typically represents. Care, of course, is taken to handle the wacky memory consistency concerns that this implies!
We're documenting which ones are expected to perform worse than on NVIDIA, though!
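As a small, concrete example of the kind of inline PTX block this applies to (this is the standard lane-id pattern from NVIDIA's inline-PTX documentation, not anything SCALE-specific):

```cuda
// Inline PTX reading the %laneid special register.
__device__ unsigned lane_id() {
    unsigned id;
    asm volatile("mov.u32 %0, %%laneid;" : "=r"(id));
    return id;
}
// Once the asm block is converted to IR at the start of compilation, this is
// no different from having written, say, threadIdx.x & 31 for a 1-D block:
// it inlines, folds, and participates in optimisation like any other code.
```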
> You're right that most people only use a small subset of cuda
This is true first and foremost for the host-side API. From my StackOverflow and NVIDIA forums experience - I'm often the first and only person to ask about any number of nooks and crannies of the CUDA Driver API, with issues which nobody seems to have stumbled onto before; or at least - not stumbled and wrote anything in public about it.
Oh yes, we found all kinds of bugs in Nvidia's cuda implementation during this project :D.
There's a bunch of pretty obscure functions in the device side apis too: some esoteric math functions, old simd "intrinsics" that are mostly irrelevant with modern compilers, etc.
But I can't help but think: if something like this can be done to this extent, I wonder what went wrong / why it's such a struggle for OpenCL to unify the two fragmented communities. While this is very practical and has a significant impact for people who develop GPGPU/AI applications, for the heterogeneous computing community as a whole, relying on/promoting a proprietary interface/API/language to become THE interface to work with different GPUs sounds like bad news.
Can someone educate me on why OpenCL seems to be out of scene in the comments/any of the recent discussions related to this topic?
Opencl gives you the subset of capability that a lot of different companies were confident they could implement. That subset turns out to be intensely annoying to program in - it's just the compiler saying no over and over again.
Or you can compile as freestanding c++ with clang extensions and it works much like a CPU does. Or you can compile as cuda or openmp and most stuff you write actually turns into code, not a semantic error.
Currently cuda holds the lead position, but it should lose that place because it's horrible to work in (and, to a lesser extent, because more than one company knows how to make a GPU). Openmp is an interesting alternative - you need to be a little careful to get fast code out, but lots of things work somewhat intuitively.
Personally, I think raw C++ is going to win out and the many heterogeneous languages will ultimately be dropped as basically a bad idea. But time will tell. Opencl looks very DoA.
ZLUDA is pretty good, except that it lacks cuDNN which makes most PyTorch projects just not work. Not sure if this project does cover that? That could be a game changer, otherwise yeah ZLUDA is the better open-source option.
This is technically feasible so might be the real thing. Parsing inline ptx and mapping that onto amdgpu would be a huge pain.
Working from cuda source that doesn't use inline ptx to target amdgpu is roughly regex find and replace to get hip, which has implemented pretty much the same functionality.
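For readers who haven't seen it, the find-and-replace really is that mechanical for plain runtime-API code. A tiny hedged example (error handling omitted), with the hipify-style renames noted in comments:

```cuda
#include <cuda_runtime.h>   // HIP build would use <hip/hip_runtime.h>

__global__ void scale_kernel(float* data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;                    // device code: unchanged
}

void scale_on_gpu(float* host, int n) {
    float* dev = nullptr;
    cudaMalloc(&dev, n * sizeof(float));             // -> hipMalloc
    cudaMemcpy(dev, host, n * sizeof(float),
               cudaMemcpyHostToDevice);              // -> hipMemcpy, hipMemcpyHostToDevice
    scale_kernel<<<(n + 255) / 256, 256>>>(dev, 2.0f, n);  // launch syntax: unchanged
    cudaMemcpy(host, dev, n * sizeof(float),
               cudaMemcpyDeviceToHost);              // -> hipMemcpy, hipMemcpyDeviceToHost
    cudaFree(dev);                                   // -> hipFree
}
```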
Some of the details would be dubious, e.g. the atomic models probably don't match, and volta has a different instruction pointer model, but it could all be done correctly.
Amd won't do this. Cuda isn't a very nice thing in general and the legal team would have kittens. But other people totally could.
Mapping inline ptx to AMD machine code would indeed suck. Converting it to LLVM IR right at the start of compilation (when the initial IR is being generated) is much simpler, since it then gets "compiled forward" with the rest of the code. It's as if you wrote C++/intrinsics/whatever instead.
Note that nvcc accepts a different dialect of C++ from clang (and hence hipcc), so there is in fact more that separates CUDA from hip (at the language level) than just find/replace. We discuss this a little in [the manual](https://docs.scale-lang.com/manual/dialects/)
Handling differences between the atomic models is, indeed, "fun". But since CUDA is a programming language with documented semantics for its memory consistency (and so is PTX) it is entirely possible to arrange for the compiler to "play by NVIDIA's rules".
Huh. Inline assembly is strongly associated in my mind with writing things that can't be represented in LLVM IR, but in the specific case of PTX - you can only write things that ptxas understands, and that probably rules out wide classes of horrendous behaviour. Raw bytes being used for instructions and for data, ad hoc self modifying code and so forth.
I believe nvcc is roughly an antique clang build hacked out of all recognition. I remember it rejecting templates with 'I' as the type name and working when changing to 'T', nonsense like that. The HIP language probably corresponds pretty closely to clang's cuda implementation in terms of semantics (a lot of the control flow in clang treats them identically), but I don't believe an exact match to nvcc was considered particularly necessary for the clang -x cuda work.
The ptx to llvm IR approach is clever. I think upstream would be game for that, feel free to tag me on reviews if you want to get that divergence out of your local codebase.
I certainly would not attempt this feat with x86 `asm` blocks :D. PTX is indeed very pedestrian: it's more like IR than machine code, really. All the usual "machine-level craziness" that would otherwise make this impossible is just unrepresentable in PTX (though you do run into cases of "oopsie, AMD don't have hardware for this so we have to do something insane").
It's a beautiful answer to a deeply annoying language feature. I absolutely love it. Yes, inline asm containing PTX definitely should be burned off at the compiler front end, regardless of whether it ultimately codegens as PTX or something else.
I've spawned a thread on the LLVM Discourse asking if anyone else wants that as a feature https://discourse.llvm.org/t/fexpand-inline-ptx-as-a-feature... in the upstream. That doesn't feel great - you've done something clever in a proprietary compiler and I'm suggesting upstream reimplement it - so I hope that doesn't cause you any distress. AMD is relatively unlikely to greenlight me writing it, so it's probably just more marketing unless other people are keen to parse asm in string literals.
nvcc is nowhere near that bad these days, it supports most C++ code directly (for example, I've written kernels that include headers like <span> or <algorithm> and they work just fine).
NVCC is doing much better than before in terms of "broken C++". There was indeed a time when lots of modern C++ just didn't work.
Nowadays the issues are more subtle and nasty. Subtle differences in overload resolution. Subtle differences in lambda handling. Enough to break code in "spicy" ways when you try to port it over.
What do you think the source of this is? My understanding was that Nvidia is basically adopting the clang frontend wholesale now so I'm curious where it differs.
At my workplace, we were reluctant to choose between writing OpenCL and being AMD-compatible but missing out on CUDA features and tooling, and writing CUDA and being vendor-locked.
Our jerry-rigged solution for now is writing kernels that are the same source for both OpenCL and CUDA, with a few macros doing a bit of adaptation (e.g. the syntax for constructing a struct). This requires no special library or complicated runtime work - but it does have the downside of forcing our code to be C'ish rather than C++'ish, which is quite annoying if you want to write anything that's templated.
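A minimal sketch of what such shared-source adaptation macros can look like (the macro names here are invented for illustration, not the actual ones):

```cuda
// One kernel source that compiles as either CUDA or OpenCL C.
#ifdef __CUDACC__
  #define KERNEL      extern "C" __global__
  #define GLOBAL_MEM  /* CUDA pointers are global by default */
  #define GET_GID()   (blockIdx.x * blockDim.x + threadIdx.x)
#else   /* OpenCL C */
  #define KERNEL      __kernel
  #define GLOBAL_MEM  __global
  #define GET_GID()   ((int)get_global_id(0))
#endif

KERNEL void axpy(GLOBAL_MEM float* y, GLOBAL_MEM const float* x, float a, int n) {
    int i = GET_GID();
    if (i < n) y[i] = a * x[i] + y[i];   // C-style body works in both dialects
}
```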
Note that all of this regards device-side, not host-side, code. For the host-side, I would like, at some point, to take the modern-C++ CUDA API wrappers (https://github.com/eyalroz/cuda-api-wrappers/) and derive from them something which supports CUDA, OpenCL and maybe HIP/ROCm. Unfortunately, I don't have the free time to do this on my own, so if anyone is interested in collaborating on something like that, please drop me a line.
-----
You can find the OpenCL-that-is-also-CUDA mechanism at:
the real question here is whether anybody has gotten cheap, easily available AMD GPUs to run their AI workloads, and if we can predict more people will do so
I ported Karpathy's llm.c repo to AMD devices [1], and have trained GPT2 from scratch with 10B tokens of fineweb-edu on a 4x 7900XTX machine in just a few hours (about $2 worth of electricity) [2].
I've also trained the larger GPT2-XL model from scratch on bigger CDNA machines.
That's not important if the goal is to run existing CUDA code on AMD GPUs. All you have to do is write portable CUDA code in the future regardless of what Nvidia does if you want to keep writing CUDA.
I don't know the economics here, but if the AMD provides a significant cost saving, companies are going to make it work.
> Nvidia can always add things to make it difficult
Sounds like Microsoft embedding the browser in the OS. It's hard to see how doing something like that wouldn't trigger an antitrust case.
HIP works very similarly. Install rocm from your Linux distribution or from amd's repo, or build it from github.com/rocm. Has the nice feature of being pure userspace if you use the driver version that's already in your kernel.
How turn-key / happy an experience that is depends on how closely your system correlates with one of the documented/tested distro versions and what GPU you have. If it's one that doesn't have binary versions of rocblas etc in the binary blob, either build rocm from source or don't bother with rocblas.
OK, so I just stumbled on this problem: I tried out Whisper (from OpenAI), but only on my CPU, because I have no CUDA and the workarounds seem hacky. So the headline sounds good!
But can this help me directly? Or would OpenAI have to use this tool for me to benefit?
It is not immediately clear to me (but I am a beginner in this space).
It doesn't matter though. NVIDIA distributes tons of libraries built atop CUDA that you cannot distribute or use on AMD chips legally. Cutlass, CuBLAS, NCCL, etc.
SCALE doesn't use cuBlas and friends. For those APIs, it uses either its own implementations of the functions, or delegates to an existing AMD library (such as rocblas).
It wouldn't even be technically possible for SCALE to distribute and use cuBlas, since the source code is not available. I suppose you could maybe distribute cuBlas and run it through ZLUDA, but that would likely become legally troublesome.
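To illustrate the delegation idea (this is a hypothetical sketch, not SCALE's actual code), a cuBLAS-shaped sgemm entry point can simply forward to rocBLAS, since both expose the same column-major GEMM semantics. The header path and the assumption that the handle type maps directly to a rocblas_handle are mine:

```cuda
#include <rocblas/rocblas.h>   // header path varies across ROCm versions

// Assume the translation layer defines the cuBLAS handle in terms of a
// rocblas_handle; the forwarding is then essentially 1:1.
extern "C" int cublasSgemm_v2(rocblas_handle handle,
                              rocblas_operation transa, rocblas_operation transb,
                              int m, int n, int k,
                              const float* alpha,
                              const float* A, int lda,
                              const float* B, int ldb,
                              const float* beta,
                              float* C, int ldc) {
    rocblas_status s = rocblas_sgemm(handle, transa, transb, m, n, k,
                                     alpha, A, lda, B, ldb, beta, C, ldc);
    return (s == rocblas_status_success) ? 0 : 1;   // 0 == CUBLAS_STATUS_SUCCESS
}
```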
> SCALE doesn't use cuBlas and friends. For those APIs, it uses either its own implementations of the functions, or delegates to an existing AMD library (such as rocblas).
And this is the problem. I guarantee you NVIDIA has more engineers working on cuBLAS et al than AMD does.
The NVIDIA moat is not CUDA the language or CUDA the library. It's CUDA the ecosystem. That means things like all the high performance libraries; all the high performance libraries with clustering support (does AMD even have a clustering solution like NVLink -- everyone forgets that NVIDIA also does high speed networking); all the high perf appliances (everyone also forgets that NVIDIA sells entire systems, not GPUS); all the high perf servers (Triton inference server, etc). We can go on.
I commend the project volunteers for what they've done, but I would recommend getting VC money and competing directly with NVIDIA.
One question I always have about these sorts of translation layers is how they deal with the different warp sizes. I'd imagine a lot of CUDA code relies on 32-wide warps, while as far as I know AMD tends to have 64-wide warps. Is there some sort of emulation that needs to happen?
The older AMD GCN architecture had a 64-wide wavefront, but the newer "RDNA" GPUs support both 64- and 32-wide wavefronts, and this is configurable at runtime. It appears the narrower wavefronts are better suited to games in general.
Not sure what the situation is with "CDNA", the compute-oriented evolution of "GCN", i.e. whether CDNA is 64-wavefront only or dual like RDNA.
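To make the hazard concrete, here is a minimal sketch of the kind of code that bakes in the 32-lane assumption (note that the mask is also only 32 bits wide):

```cuda
// A warp reduction written against a hardcoded 32-lane warp. On a 64-wide
// wavefront, neither the loop bound nor the 32-bit full mask describes the
// whole "warp", so a translation layer has to decide how to map this.
__device__ float warp_sum(float v) {
    for (int offset = 16; offset > 0; offset >>= 1)
        v += __shfl_down_sync(0xffffffffu, v, offset);   // assumes width 32
    return v;
}
```

Using the built-in warpSize variable instead of the literal 32 helps, but plenty of existing CUDA hardcodes 32, which is presumably one reason a 32-wide wavefront mode (where the hardware offers it) is attractive for compatibility layers.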
The future is inference. Many inference stacks already support AMD although the kernels are less optimized. This will of course change over time, but if AMD can crack the inference demand, it will put NVDA under huge pressure.
It appears we implemented `--threads` but not `-t` for the compiler flag. Oops. In either case, the flag has no effect at present, since fatbinary support is still in development, and that's the only part of the process that could conceivably be parallelised.
That said: clang (and hence the SCALE compiler) tends to compile CUDA much faster than nvcc does, so this lack of the parallelism feature is less problematic than it might at first seem.
NVTX support (if you want more than just "no-ops to make the code compile") requires cooperation with the authors of profilers etc., which has not so far been available
bfloat16 is not properly supported by AMD anyway: the hardware doesn't do it, and HIP's implementation just lies and does the math in `float`. For that reason we haven't prioritised putting together the API.
Hi, why do you believe that bfloat16 is not supported? Can you please provide some references (specifically the part about the hardware "doesn't do it")?
For the hardware you are focussing on (gfx11), the reference manual [2] and the list of LLVM gfx11 instructions supported [1] describe the bfloat16 vdot & WMMA operations, and these are in fact implemented and working in various software such as composable kernels and rocBLAS, which I have used (and can guarantee they are not simply being run as float). I've also used these in the AMD fork of llm.c [3]
Outside of gfx11, I have also used bfloat16 in CDNA2 & 3 devices, and they are working and being supported.
Regarding cublasLt, what is your plan for support there? Pass everything through to hipblasLt (hipify style) or something else?
> Hi, why do you believe that bfloat16 is not supported?
Apologies, I appear to be talking nonsense. I conflated bfloat16 with nvidia's other wacky floating point formats. This is probably my cue to stop answering reddit/HN comments and go to bed. :D
So: ahem: bfloat16 support is basically just missing the fairly boring header.
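For context, the core of such a header is just bit manipulation, since bfloat16 is the top 16 bits of an IEEE float; a hedged host-side sketch of the conversion helpers (round-to-nearest-even, no special NaN handling; the real header would also need the arithmetic operators and device intrinsics):

```cuda
#include <cstdint>
#include <cstring>

static inline std::uint16_t float_to_bf16(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    std::uint32_t lsb = (bits >> 16) & 1u;   // ties-to-even adjustment
    bits += 0x7FFFu + lsb;
    return static_cast<std::uint16_t>(bits >> 16);
}

static inline float bf16_to_float(std::uint16_t h) {
    std::uint32_t bits = static_cast<std::uint32_t>(h) << 16;
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}
```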
> Regarding cublasLt, what is your plan for support there? Pass everything through to hipblasLt (hipify style) or something else?
Pretty much that, yes. Not much point reimplementing all the math libraries when AMD is doing that part of the legwork already.
OK, so in the case of llm.c, if you're just including the HIP headers, using hipblasLt, etc, what would be the benefit of using scale instead of hipify?
Really, really, really curious as to how they managed to pull this off, if their project works as well as they claim it does. If stuff as complex as paged/flash attention can "just work", this is really cool.
My understanding from chatting with them is that tensor core operations aren't supported yet, so FlashAttention likely won't work. I think it's on their to-do list though!
Nvidia actually has more and more capable matrix multiplication units, so even with a translation layer I wouldn't expect the same performance until AMD produces better ML cards.
Additionally, these kernels usually have high sensitivity to cache and smem sizes, so they might need to be retuned.
So the only part that anyone actually cares about, as usual, is not supported. Same story as it was in 2012 with AMD vs Nvidia (and likely much before that too!). The more things change, the more they stay the same.
Cuda is a programming language. You implement it like any other. The docs are a bit sparse but not awful. Targeting amdgpu is probably about as difficult as targeting x64, mostly changes the compiler runtime.
The inline ptx implementation is notable for being even more annoying to deal with than the cuda, but it's just bytes in / different bytes out. No magic.
CUDA has a couple of extra problems beyond just any other programming language:
- CUDA is more than a language: it's a giant library (for both CPU and GPU) for interacting with the GPU, and for writing the GPU code. This needed reimplementing. At least for the device-side stuff we can implement it in CUDA, so when we add support for other GPU vendors the code can (mostly) just be recompiled and work there :D.
- CUDA (the language) is not actually specified. It is, informally, "whatever nvcc does". This differs significantly from what Clang's CUDA support does (which is ultimately what the HIP compiler is derived from).
The openmp device runtime library was originally written in cuda. I ported that to hip for amdgpu, discovered the upstream hip compiler wasn't quite as solid as advertised, then ported it to openmp with some compiler intrinsics. The languages are all essentially C++ syntax with some spurious noise obfuscating llvm IR. The libc effort has gone with freestanding c++ based on that experience, and we've now mostly fixed the ways that goes wrong.
You might also find raw c++ for device libraries saner to deal with than cuda. In particular you don't need to jury rig the thing to not spuriously embed the GPU code in x64 elf objects and/or pull the binaries apart. Though if you're feeding the same device libraries to nvcc with #ifdef around the divergence your hands are tied.
> You might also find raw c++ for device libraries saner to deal with than cuda.
Actually, we just compile all the device libraries to LLVM bitcode and be done with it. Then we can write them using all the clang-dialect, not-nvcc-emulating, C++23 we feel like, and it'll still work when someone imports them into their c++98 CUDA project from hell. :D
Wondering if there's an ongoing effort to do the same with MPS/Metal as a backend. If anything given how many developers are on macs I think it could get immense traction.
Very clearly the business motive makes sense: go after Nvidia's GPU monopoly. Can someone help a layperson understand the pitfalls here that prevent this from being an intelligent venture?
It's technically non-trivial and deeply irritating to implement in places as people expect bugward compatibility with cuda.
Also nvidia might savage you with lawyers for threatening their revenue stream. Big companies can kill small ones by strangling them in the courts then paying the fine when they lose a decade later.
Heh, true. On the other hand, I bet companies are eager to challenge the wrath of a $3T company for a promise of "maybe it'll work, not all of it but at least it'll run worse, at least for now".
I don't think the terms of the Nvidia SDK can restrict running software without said SDK. Nvidia's libraries don't seem to be involved here. Their hardware isn't involved either. It's just some ascii in a bunch of text files being hacked around with before running on someone else's hardware.
People can be wildly hostile to changing their programs. The people who wrote it aren't here any more, the program was validated as-is, changing it tends to stop the magic thing working and so forth.
That changing the compiler is strongly equivalent to changing the source doesn't necessarily influence this pattern of thinking. Customer requests to keep the performance gains from a new compiler but not change the UB they were relying on with the old are definitely a thing.
Whatever IP related issues they’d want to sue over. Sorry I don’t know specifics about what this would specifically infringe but I’m sure expensive legal brains could come up with something
It works exactly as well as other AMDGPU-related software (HIP etc.) works inside Docker.
There are some delightful AMD driver issues that make certain models of GPU intermittently freeze the kernel when used from docker. That was great fun when building SCALE's CI system :D.