
jmalicki · 2026-05-24T19:28:29 1779650909

For lifespan, AWS is still running a ton of T4 GPUs from 2018 that power a lot of computer vision models. A lot of these will have a long life; not all ML is about frontier LLMs.

epolanski · 2026-05-24T23:00:21 1779663621

How can it be economically viable to still run them?

You can get 100x the output with the same energy use.

dragonwriter · 2026-05-25T02:47:35 1779677255

While the 100× is, I think, rather hyperbolic, there is a real and large efficiency difference. But it's economically viable to run the old ones because the supply of newer GPUs is insufficient to meet the demand for compute, so providers can charge enough to cover costs for the old ones and a premium (relative to operating costs) for the newer ones.

It would be economically unviable to run the older ones if the supply of newer ones were unconstrained, but that’s not the world we live in.
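A toy model of the argument above, as a minimal sketch with made-up numbers: once a GPU is bought, its purchase price is sunk, so it is worth keeping online as long as the rental price covers power and overhead. The function names and every figure here are hypothetical, not AWS data, and the supply constraint that keeps prices high is taken as given rather than modeled.

```python
# Hypothetical toy model: is an already-owned GPU worth keeping online?
# All inputs are illustrative assumptions, not real cloud-provider figures.

def hourly_opex(power_watts: float, usd_per_kwh: float, overhead_usd_per_hour: float) -> float:
    """Marginal cost of running one GPU for one hour (power plus fixed overhead)."""
    energy_kwh = power_watts / 1000.0  # watts sustained for one hour -> kWh
    return energy_kwh * usd_per_kwh + overhead_usd_per_hour


def worth_running(price_usd_per_hour: float, power_watts: float,
                  usd_per_kwh: float, overhead_usd_per_hour: float) -> bool:
    """Purchase cost is sunk, so only the operating cost is compared to the price."""
    return price_usd_per_hour > hourly_opex(power_watts, usd_per_kwh, overhead_usd_per_hour)


if __name__ == "__main__":
    # Made-up example: a 70 W card, $0.10/kWh, $0.05/h of other overhead.
    print(hourly_opex(70, 0.10, 0.05))          # roughly 0.057
    print(worth_running(0.50, 70, 0.10, 0.05))  # True
```

If newer GPUs were plentiful, competition would push the price toward the newer cards' much lower cost per unit of work, and the comparison would flip.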

Dylan16807 · 2026-05-25T03:21:46 1779679306

Going by the stats on Wikipedia, the T4 and B300 both do about one teraflop of half-precision math per watt? Where are the efficiency gains?

Edit: It looks like they replaced INT8 and INT4 with FP8 and FP4, with the same speedups of 2x and 4x relative to FP16. That's an improvement but not that big of an improvement.
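The back-of-envelope arithmetic above, sketched in Python. The T4 inputs (about 65 TFLOPS of FP16 tensor-core throughput at a 70 W TDP) are taken from NVIDIA's published T4 specs and are worth double-checking; the newer card is represented only by a hypothetical FP16 per-watt ratio, since the comment says it is roughly the same.

```python
# Back-of-envelope efficiency comparison (inputs are assumptions, see lead-in).

def tflops_per_watt(tflops: float, watts: float) -> float:
    return tflops / watts

# Assumed T4 spec-sheet figures: ~65 TFLOPS FP16 (tensor cores), 70 W TDP.
T4_FP16_PER_WATT = tflops_per_watt(65.0, 70.0)  # ~0.93, i.e. "about one teraflop per watt"

# Throughput multiplier of narrower formats relative to FP16, per the comment.
PRECISION_SPEEDUP = {"fp16": 1, "fp8": 2, "fp4": 4}

def per_watt_gain(fp16_ratio_new_vs_old: float, new_precision: str) -> float:
    """Ops-per-watt gain of a new card at new_precision over an old card at FP16.

    fp16_ratio_new_vs_old: hypothetical ratio of FP16 TFLOPS/W, new card / old card.
    """
    return fp16_ratio_new_vs_old * PRECISION_SPEEDUP[new_precision]

if __name__ == "__main__":
    print(round(T4_FP16_PER_WATT, 2))  # 0.93
    # If FP16 efficiency is roughly equal (ratio ~1.0), dropping to FP4 gives ~4x, not 100x.
    print(per_watt_gain(1.0, "fp4"))   # 4.0
```

And since the T4 already offered INT8 and INT4 at the same 2x and 4x, the like-for-like gain at low precision is smaller still.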

Ekaros · 2026-05-25T07:59:45 1779695985

As long as you have customers willing to pay more than it costs you, you're fine, and AWS seemingly has plenty of those. So the question isn't whether this is the most efficient way, but whether someone will pay a price above what new hardware could attain.

Marsymars · 2026-05-25T00:27:56 1779668876

Presumably people using AWS are paying more than the hardware costs to run, and AWS has limited bandwidth to upgrade things due to personnel constraints, etc.

jmalicki · 2026-05-25T00:08:14 1779667694

Good question!

Maybe the capabilities of newer GPUs allow AWS to charge higher margins for them? I don't actually know.

HDBaseT · 2026-05-25T02:14:10 1779675250

There has not been a "100x" in efficiency in the past 6-8 years.

jmalicki · 2026-05-24T14:03:19 1779631399

I don't know about this specifically, but I've seen a lot of big data jobs where 99% of the CPU time was spent in JSON serialization/deserialization. This might be a reasonable chunk of it.
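One rough way to check a claim like this on your own pipeline, sketched in Python: time the parse step separately from the per-record work. The helper name and the sample workload are hypothetical, and wall-clock timers around each call add their own overhead, so a profiler (cProfile, or a sampling profiler like py-spy) is the better tool for real jobs.

```python
import json
import time

def json_parse_share(lines, process):
    """Fraction of measured time spent in json.loads vs. the per-record processing."""
    parse_s = work_s = 0.0
    for line in lines:
        t0 = time.perf_counter()
        record = json.loads(line)
        t1 = time.perf_counter()
        process(record)
        t2 = time.perf_counter()
        parse_s += t1 - t0
        work_s += t2 - t1
    total = parse_s + work_s
    return parse_s / total if total > 0 else 0.0

if __name__ == "__main__":
    # Hypothetical workload: parse small records, do trivial work on each.
    lines = [json.dumps({"id": i, "score": i * 0.5}) for i in range(10_000)]
    share = json_parse_share(lines, lambda r: r["score"] * 2)
    print(f"{share:.0%} of measured time in JSON parsing")
```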

aardvark179 · 2026-05-24T19:08:48 1779649728

JSON serialization/deserialization is usually dominated by floats rather than ints, and floats are more expensive to handle.

xlii · 2026-05-24T20:45:07 1779655507

Though when we talk about JSON, with a 5-byte (40-bit) number, the processing throughput is 20 Gbps with this algorithm vs 5 Gbps with the previous implementation (4x slower).
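Turning those throughput figures into per-number costs, as a small sketch that takes the comment's numbers at face value (40 bits per number, 20 vs 5 Gbps); the function names are just for illustration.

```python
# Convert a parsing throughput in Gbps into numbers per second,
# assuming every number occupies bits_per_number bits of JSON text.

def numbers_per_second(gbps: float, bits_per_number: int) -> float:
    return gbps * 1e9 / bits_per_number

def ns_per_number(gbps: float, bits_per_number: int) -> float:
    return 1e9 / numbers_per_second(gbps, bits_per_number)

if __name__ == "__main__":
    for label, gbps in [("new algorithm", 20.0), ("previous", 5.0)]:
        rate = numbers_per_second(gbps, 40)  # 5-byte numbers = 40 bits
        print(f"{label}: {rate:.3g} numbers/s, {ns_per_number(gbps, 40):.1f} ns each")
```

That works out to 500 million vs 125 million numbers per second, or about 2 ns vs 8 ns per number.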

I doubt CPU cycles are the problem in that case.

jmalicki · 2026-05-24T21:26:30 1779657990

I agree - I was more talking about naive JSON ser/deser that Lemire was not involved in.

jmalicki · 2026-05-24T12:06:35 1779624395

Pretraining isn't unsupervised; it is self-supervised, meaning it is moderately more scale-limited.

sigbottle · 2026-05-24T14:04:46 1779631486

What would unsupervised mean? Would unsupervised be something like AlphaGo playing against itself trillions of times?

Whereas self-supervised allows learning without explicit annotation of data, but that doesn't help once the model has already trained on the entire Internet, since it's not like a game where it can come up with effectively new training data for itself?

jmalicki · 2026-05-24T18:05:42 1779645942

Unsupervised is basically clustering. AlphaGo is RL; winning or losing a game is a form of supervision.

Unsupervised is something where there is no intrinsic reward signal. In pretraining, predicting the next token and seeing whether it matches is a reward signal, hence it is self-supervised.
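To make the "intrinsic reward signal" concrete: in next-token prediction the targets are carved out of the raw text itself, with no human annotation. A minimal sketch with a toy bigram counter standing in for the model (a real LLM uses a neural network and cross-entropy over a vocabulary, but the labels come from the same place); all names here are hypothetical.

```python
import math
from collections import Counter, defaultdict

def next_token_pairs(tokens):
    """Self-supervision: each (context, target) pair comes straight from the text."""
    return [(tokens[i], tokens[i + 1]) for i in range(len(tokens) - 1)]

def fit_bigram(pairs):
    """Toy 'model': count which token follows which."""
    counts = defaultdict(Counter)
    for ctx, target in pairs:
        counts[ctx][target] += 1
    return counts

def avg_nll(model, pairs):
    """Average negative log-likelihood of the true next token (lower is better).

    Assumes every context in pairs was seen during fitting.
    """
    total = 0.0
    for ctx, target in pairs:
        followers = model[ctx]
        total -= math.log(followers[target] / sum(followers.values()))
    return total / len(pairs)

if __name__ == "__main__":
    tokens = "the cat sat on the mat".split()
    pairs = next_token_pairs(tokens)
    model = fit_bigram(pairs)
    print(pairs[:2])              # [('the', 'cat'), ('cat', 'sat')]
    print(avg_nll(model, pairs))  # 2*ln(2)/5, about 0.277
```

The loss is only nonzero where the data itself is ambiguous ("the" is followed by both "cat" and "mat"), which is exactly the signal the model gets to learn from.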

cold_harbor · 2026-05-24T14:10:27 1779631827

Fair point. OpenAI's original plan literally said "solve unsupervised learning". The self-supervised distinction wasn't really standard until after BERT/GPT popularized it.

jmalicki · 2026-05-24T18:08:07 1779646087

I think it's an extremely important distinction, because self-supervised learning has real inherent reward signals. Something like clustering does not.
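By contrast, a clustering method like k-means never compares its output to anything external. A minimal 1-D k-means (Lloyd's algorithm) sketch with hypothetical names and data: it just alternates between assigning points to the nearest center and moving each center to the mean of its points.

```python
def kmeans_1d(points, centers, iters=10):
    """Plain Lloyd's algorithm in one dimension. No labels or targets anywhere."""
    centers = list(centers)
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for x in points:
            nearest = min(range(len(centers)), key=lambda j: abs(x - centers[j]))
            clusters[nearest].append(x)
        # Move each center to the mean of its points; keep it if the cluster is empty.
        centers = [sum(c) / len(c) if c else centers[j] for j, c in enumerate(clusters)]
    return centers

if __name__ == "__main__":
    data = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
    print(kmeans_1d(data, [0.0, 6.0]))  # two centers, near 1.0 and 5.0
```

Nothing in the loop says whether two clusters, or this particular grouping, is correct; there is no ground truth to score it against, unlike the next-token case.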

jmalicki · 2026-05-23T17:26:34 1779557194

A merely wet road (1 mm of water) and one with 10 cm of standing water can be hard to distinguish. If you avoid the former, you can't operate in rain at all.

jmalicki · 2026-05-23T17:25:33 1779557133

It has lidar and radar, not just vision.

jmalicki · 2026-05-23T17:17:13 1779556633

PyTorch dataloaders are often horribly inefficient; a lot of the work there can benefit from Rust/C++.

jmalicki · 2026-05-23T16:47:11 1779554831

If you have ECC memory, you can actually monitor this.

I've typically seen about a dozen bitflips per year per machine when I looked at this on servers, except in cases with a faulty RAM module.

Based on the data I've seen on my systems, I am more worried about SSD corruption than RAM bitflips.
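On Linux, one way to do this is through the kernel's EDAC subsystem, which (when a driver for your memory controller is loaded) exposes corrected and uncorrected error counts as ce_count and ue_count files under /sys/devices/system/edac/mc/. A minimal sketch that reads them; the function name is hypothetical, and on machines without EDAC support it just returns an empty dict.

```python
from pathlib import Path

def edac_error_counts(root: str = "/sys/devices/system/edac/mc") -> dict:
    """Map each memory controller (mc0, mc1, ...) to (corrected, uncorrected) error counts."""
    base = Path(root)
    counts = {}
    if not base.is_dir():  # no EDAC driver loaded, or not Linux
        return counts
    for mc in sorted(base.glob("mc*")):
        ce_file, ue_file = mc / "ce_count", mc / "ue_count"
        if ce_file.is_file() and ue_file.is_file():
            counts[mc.name] = (int(ce_file.read_text().strip()),
                               int(ue_file.read_text().strip()))
    return counts

if __name__ == "__main__":
    for name, (ce, ue) in edac_error_counts().items():
        print(f"{name}: {ce} corrected, {ue} uncorrected")
```

The counters are cumulative, typically since the driver loaded, so to get a per-year rate you would sample them periodically rather than read them once.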

jmalicki · 2026-05-22T17:09:13 1779469753

It benefited them in the past; it allowed them to build up their fortunes. Bill Gates, for example, is now a big holder of farmland. Science allows others to build up fortunes that challenge theirs, and it threatens the stasis in which they become a gilded aristocracy.

Lowering their taxes while burning everything to the ground benefits them now.

munk-a · 2026-05-22T17:42:53 1779471773

I'd argue that it doesn't actually benefit them now, since they already have access to more comfort than they could conceivably consume in a lifetime, but I absolutely agree that they think it benefits them, because people who have accumulated wealth to that degree are highly fixated on making the number go up.

A less just, less stable society is far more likely to demonize and destroy billionaires. If you have such a high level of wealth, the most rational action is charity that ensures the wealth of the people around you, to prevent instability and lower the chances you'll be the victim of a crime driven by desperation.

jmalicki · 2026-05-22T20:11:13 1779480673

They want a less just, more stable society.

Allowing others to build wealth just makes society less stable from their point of view. Better to keep the poor poor.

jmalicki · 2026-05-22T17:04:07 1779469447

> Kind of wild that you have to tell an LLM things like "do it right" and "make the code maintainable" and "don't make mistakes". Shouldn't that be the default?

It's not the default, because the training data is full of unmaintainable code done wrong with mistakes. People literally complain that LLMs write too many tests or add comments.

If instead of "do it right" you give it specific, actionable advice on how to write code, it does surprisingly well. Newer frontier models also do a great job of mimicking the style and rigor of the surrounding codebase without prompting if you're working in an established codebase, for better or worse.

jmalicki · 2026-05-21T04:17:13 1779337033

That's because it's law, and the IRS doesn't get to decide that, courts do.

There are parts of the tax code that mean different things in different parts of the US because of conflicting circuit court decisions that haven't made it to the Supreme Court.