Hacker News | jmalicki's comments

On lifespan: AWS is still running a ton of T4 GPUs from 2018 that power a lot of computer vision models. A ton of these will have a long life; not all ML is about frontier LLMs.

How can it be economically viable to still run them?

You can get 100x the output with the same energy use.


While the 100× is, I think, rather hyperbolic, there is a real and large efficiency difference. It's still economically viable to run them because the supply of newer GPUs is insufficient to meet the demand for compute, so they can charge enough to cover costs for the old ones and a premium (relative to operating costs) for the newer ones.

It would be economically unviable to run the older ones if the supply of newer ones were unconstrained, but that’s not the world we live in.


Going by the stats on Wikipedia, T4 and B300 both do about one teraflop of half-precision math per watt? Where are the efficiency gains?

Edit: It looks like they replaced INT8 and INT4 with FP8 and FP4, with the same speedups of 2x and 4x relative to FP16. That's an improvement but not that big of an improvement.
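To make the per-watt arithmetic above concrete, here is a rough sketch. The T4 figures (65 half-precision tensor TFLOPS at a 70 W board power) are the commonly quoted spec-sheet numbers and should be treated as an assumption; the 2x and 4x multipliers are the ones mentioned in the comment. The helper name `tflops_per_watt` is made up for illustration.

```python
def tflops_per_watt(tflops, watts):
    """Peak throughput divided by board power (spec-sheet numbers, not measured)."""
    return tflops / watts


# Assumed T4 spec-sheet figures: 65 FP16 tensor TFLOPS at 70 W.
t4_fp16 = tflops_per_watt(65, 70)

# Lower-precision speedups relative to FP16, as stated in the comment.
precision_speedup = {"FP16": 1, "8-bit": 2, "4-bit": 4}

for name, multiplier in precision_speedup.items():
    print(f"{name}: {t4_fp16 * multiplier:.2f} tera-ops per watt")
# FP16: 0.93 tera-ops per watt
# 8-bit: 1.86 tera-ops per watt
# 4-bit: 3.71 tera-ops per watt
```

On these peak numbers, dropping precision buys at most 4x per watt, which is far from 100x. Real workloads rarely hit peak, so this is only an upper-bound comparison.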


As long as you have customers who are willing to pay more than it costs you, you are fine. And with AWS there are seemingly plenty of those. So the question isn't whether this is the most efficient way, but whether someone will pay a price above what new hardware could attain.

Presumably people using AWS are paying more than they cost to run, and AWS has finite bandwidth to upgrade things due to personnel, etc.

Good question!

Maybe the capabilities of newer GPUs allow AWS to charge higher margins for them? I don't actually know.


There has not been a "100x" in efficiency in the past 6-8 years.

I don't know about this specifically, but I've seen a lot of big data jobs where 99% of the CPU was spent in JSON ser/deser. This might be a reasonable chunk of it.

JSON ser deser is usually dominated by floats rather than ints, and they are more expensive to handle.

Though when we talk about JSON (5 bytes - 40 bits) the processing throughput is 20Gbps (this algorithm) vs 5Gbps (previous implementation, 4x slower).

I doubt CPU cycles are the problem in that case.
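One way to check the floats-versus-ints claim above on your own machine is a quick round-trip timing with the standard library. This is only a sketch using Python's built-in `json` module, not the optimized algorithm discussed in the thread; the helper name `roundtrip_seconds` and the list sizes are arbitrary choices.

```python
import json
import random
import timeit


def roundtrip_seconds(values, repeats=3):
    """Best-of-N wall time for json.dumps followed by json.loads."""
    timings = timeit.repeat(
        lambda: json.loads(json.dumps(values)), number=1, repeat=repeats
    )
    return min(timings)


random.seed(0)
ints = [random.randrange(10**9) for _ in range(20_000)]
floats = [random.random() * 1e9 for _ in range(20_000)]

print(f"ints:   {roundtrip_seconds(ints):.4f} s")
print(f"floats: {roundtrip_seconds(floats):.4f} s")
```

The absolute numbers depend heavily on the JSON library and the data, so treat the ratio as a hint rather than a benchmark.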


I agree - I was more talking about naive JSON ser/deser that Lemire was not involved in.

pretraining isn't unsupervised, it is self-supervised - meaning it is moderately more scale limited.

What would unsupervised mean? Would unsupervised be something like AlphaGo playing against itself trillions of times?

Whereas self-supervised allows learning without explicit annotation of data; but it doesn't matter if the model's already trained on the entire Internet, and it's not like a game where it can come up with effectively new training data for itself?


Unsupervised is basically clustering. AlphaGo is RL: winning or losing a game is a form of supervision.

Unsupervised is something where there is no intrinsic reward signal. In pretraining, predicting the next token and seeing that it matches is a reward signal, hence it is self-supervised.
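A toy sketch of what "the data supplies its own targets" means for next-token prediction. The function name `next_token_pairs` and the word-level tokens are made up for illustration; real pretraining uses subword tokens and a model, not a list comprehension.

```python
def next_token_pairs(tokens):
    """Build (context, target) pairs from a raw token sequence.

    Each target is simply the token that actually follows the context,
    so the labels come from the data itself with no human annotation.
    """
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


for context, target in next_token_pairs(["the", "cat", "sat", "down"]):
    print(context, "->", target)
# ['the'] -> cat
# ['the', 'cat'] -> sat
# ['the', 'cat', 'sat'] -> down
```

A clustering algorithm given the same sequence has no such target column to compare against, which is the distinction being drawn here.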


Fair point: OpenAI's original plan literally said "solve unsupervised learning". The self-supervised distinction wasn't really standard until after BERT/GPT popularized it.

I think it's an extremely important distinction because self-supervised learning has real inherent reward signals. Something like clustering does not.

A merely wet road (1mm of water) and one with 10cm can be hard to distinguish. If you avoid the former, you can't operate in rain at all.

It has lidar and radar, not just vision.

PyTorch dataloaders are often horribly inefficient; a lot of stuff there can benefit from Rust/C++.

If you have ECC memory, you can actually monitor this.

I've typically seen ~a dozen bitflips per year per machine when I looked at this on servers, except for the cases of a faulty RAM module.
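For anyone wanting to check this on their own box: on Linux with ECC memory and an EDAC driver loaded, the kernel exposes per-memory-controller error counters under sysfs. A minimal sketch, assuming the usual `/sys/devices/system/edac/mc/mc*/ce_count` and `ue_count` files are present (they will not be on machines without ECC or without the driver); the function name `read_edac_counts` is made up for illustration.

```python
from pathlib import Path


def read_edac_counts(root="/sys/devices/system/edac/mc"):
    """Return {controller: (corrected, uncorrected)} error counts from EDAC sysfs.

    Returns an empty dict if the directory or counter files are missing.
    """
    counts = {}
    for mc in sorted(Path(root).glob("mc[0-9]*")):
        try:
            corrected = int((mc / "ce_count").read_text().strip())
            uncorrected = int((mc / "ue_count").read_text().strip())
        except (OSError, ValueError):
            continue  # controller without readable counters; skip it
        counts[mc.name] = (corrected, uncorrected)
    return counts


for name, (corrected, uncorrected) in read_edac_counts().items():
    print(f"{name}: corrected={corrected} uncorrected={uncorrected}")
```

The counters reset on reboot (or driver reload), so a per-year rate needs them sampled periodically and logged somewhere.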

I am more worried about SSD corruption than RAM bitflips from data I've seen on my systems.


It benefited them in the past, that allowed them to build up their fortunes. Bill Gates, for example, is now a big holder of farmland. Science allows others to build up fortunes that challenge theirs, and hurts the stasis in which they become gilded aristocracy.

Lowering their taxes while burning everything to the ground benefits them now.


I'd argue that it doesn't actually benefit them now, since they have more access to comfort than they could ever conceivably consume in their lifetime. But I do absolutely agree that they think it benefits them, because people who have accumulated wealth to that degree are highly fixated on making the number go up.

A less just, less stable society is far more likely to demonize and destroy billionaires. If you have such a high level of wealth, the most rational action is charity that ensures the wealth of the people who surround you, to prevent instability and lower the chances you'll be the victim of a crime carried out due to desperation.


They want a less just, more stable society.

Allowing others to build wealth just makes society less stable from their point of view. Better to keep the poor poor.


> Kind of wild that you have to tell an LLM things like "do it right" and "make the code maintainable" and "don't make mistakes". Shouldn't that be the default?

It's not the default, because the training data is full of unmaintainable code done wrong with mistakes. People literally complain that LLMs write too many tests or add comments.

If instead of "do it right", you give it specific, actionable advice on how to write code, it does surprisingly well. Newer frontier models also do a great job of mimicking the style and rigor of the surrounding codebase without prompting, if you're working in an established codebase, for better or worse.


That's because it's law, and the IRS doesn't get to decide that, courts do.

There are parts of the tax code that mean different things in different parts of the US because of conflicting circuit court decisions that haven't made it to the Supreme Court.

