Non-trivial, IMHO - automating it sounds non-trivial, and doing it by hand is quite non-trivial too, right? We gotta go make hand edits line by line?

For a language like C, I would describe the formatting process as "straightforward" as opposed to "trivial". It's straightforward in that it feels very mechanical, but not trivial in the sense that it's difficult to fully automate. (The mechanical part is inserting spaces in the right places; the not-so-mechanical part is swapping tokens and rewriting expressions so that things line up.)
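
To make the "mechanical" half concrete, here is a toy sketch of the space-insertion step (not what the IOCCC entries actually use - the token list and row widths are invented for illustration, and the hard human part, rewriting expressions so tokens land on the right rows, isn't attempted):

    # Toy illustration of the "insert spaces" half of formatting C into a shape.
    # Assumes the source is already golfed into tokens that carry any required
    # whitespace themselves (e.g. "int "), so packing them back to back stays valid C.
    def reflow(tokens, row_widths):
        rows, i = [], 0
        for width in row_widths:
            row = ""
            # Greedily pack tokens, then pad with spaces to hit the row width.
            while i < len(tokens) and len(row) + len(tokens[i]) <= width:
                row += tokens[i]
                i += 1
            rows.append(row.center(width))
        rows.append("".join(tokens[i:]))  # leftovers spill onto a final row
        return "\n".join(rows)

    tokens = ["int ", "main", "(){", "puts(", '"hi"', ");", "}"]
    print(reflow(tokens, [8, 6, 10]))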

You can see an example recording here of how it's done:

https://www.ioccc.org/2019/yang/obfuscation.html

First half of it is writing and golfing, second half is the formatting bits.

For languages like Perl and Ruby, the formatting process is easily automated and mostly trivial.


That visualizer is incredible, appreciate it and the insight - and 4 hours! Wow.

Considering the language (C) and the size of the file, I wouldn't argue it's non-trivial. The key feature of donut.c is how small the file actually is.

For something larger or in other languages, or having text strings, certainly not trivial.


I've been deeply curious about the sort of speedup you get from doing in hardware what used to be done in software:

I know the chips haven't been delivered yet, but the statements at the beginning - re: we can expect a new frame every N nanoseconds - give me hope there's a rough heuristic for what speedup we'd expect in this particular case.

Do we have a rough* understanding of what the speedup will be?

* Within 2 OOMs


Speedup compared to what? You can’t compare it to doing it in software on the same machine. So, you need to pick a reference CPU.

Finding a somewhat comparable CPU would be challenging. It would need to run at around 48 MHz and have 16-bit integer addition. I think the fastest Z80 clone would be a candidate (https://en.wikipedia.org/wiki/Zilog_eZ80), but that’s a pipelined CPU. You may find that unfair.

If you don’t pick something comparable but just something cheap, given the 48MHz clock of this hardware and the limited amount of parallelism, it wouldn’t surprise me if the typical modern smartphone could do it faster, even without the use of a GPU.
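
As a purely illustrative back-of-the-envelope (every number below except the 48 MHz clock is an assumption, not a measurement), the comparison boils down to clock ratio times cycles-per-pixel ratio:

    # All figures are assumptions for illustration only, except the 48 MHz clock
    # mentioned above; the point is the shape of the comparison, not the result.
    hw_clock_hz = 48e6           # dedicated hardware clock (from the thread)
    hw_cycles_per_pixel = 1      # assume the pipeline emits one pixel per cycle

    cpu_clock_hz = 3e9           # assumed clock of a modern smartphone core
    cpu_cycles_per_pixel = 100   # assumed cost of the same math in software

    hw_px_per_s = hw_clock_hz / hw_cycles_per_pixel
    cpu_px_per_s = cpu_clock_hz / cpu_cycles_per_pixel
    print(f"hardware: {hw_px_per_s:.2e} px/s, cpu: {cpu_px_per_s:.2e} px/s")
    print(f"hardware/cpu ratio: {hw_px_per_s / cpu_px_per_s:.1f}x")

Under those made-up assumptions the two land within a factor of two of each other, which is the "a phone could plausibly keep up" intuition.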


New pixel. Not new frame.

I get what you're saying, but it's a dizzying problem - the proximate problem to HR in these scenarios isn't that people think HR is bad or worry they'll relay the complaint, it's that HR has no teeth.

At Google, I had an...unusual...de-facto manager for 2 years. After declining their offer to join their team, they spent six months engaging in behavior that violated company policies - though not in obvious ways like sexual harassment or violence.

Going to HR seemed futile. Even assuming they could keep anonymity, their role is simply to investigate and report to management three levels up. Without clear-cut violations, it becomes a credibility contest. Since upper management selected the middle managers, they're inherently biased toward defending their choices.

The situation came to a head between December 1st and January 1st: my performance rating plummeted from "O" (top 10%) to "MI" (bottom 10%), plus a technically separate formal warning. It blew up because they were nominally leading work that was a huge political warzone between organizations, and they didn't really want to do anything. After I politely declined their invitation to transfer, they hired a childhood friend as my replacement - they're still launching "new" features with the code I wrote two years ago.

I documented various incidents over those six months that violated Google's code of conduct. But even in the best-case scenario:

- HR confirms the violations.

- They report findings to my third-level manager.

- My second-level manager (who chose my de-facto manager) blames me.

- The third-level manager (who chose the second-level manager) defers to their choice.

I entered Google wondering how they handled professional disagreements. (I was a young CEO for the 5 years prior.) Seven years later, I realized they don't - disagreement itself is seen as inappropriate. This dynamic inevitably hurts those lowest in the hierarchy. It's not malicious; everyone believes they're acting appropriately. But once anyone decides to prioritize comfort over strict policy enforcement (a natural human tendency), standard human politics take over. I honestly, swear to god, saw this in abundance at Google in ways I never saw at a fast-food job.

What I once dismissed as complaints from underperformers now seems depressingly accurate.


I'm sorry, I don't understand what you mean. I checked the original article again too. As it stands, my understanding is you are claiming:

- blowing on a GPU (which I take to mean doing roughly nothing)

- gets roughly the same perf change

- as moving from fp16 to q4


Update - the Phi-4 team is working on adding all our fixes to the original model! https://huggingface.co/microsoft/phi-4/discussions/21

hey this is great work, I'm sorry I complained, I'm thankful for what you're doing here

No worries at all! :)

Are you referring to the finetuning part?

The multiple bug fixes are separate from the finetuning sections - Unsloth itself makes finetuning 2x faster and uses 70% less memory - the bug fixes are totally detached from finetuning - i.e. you can take the fixed version we uploaded at https://huggingface.co/unsloth/phi-4, and use it in any framework or inference engine.
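
Concretely (a minimal sketch - this is just the standard Hugging Face loading path, nothing Unsloth-specific is needed to consume the fixed weights):

    # Minimal sketch: the fixed upload is a regular HF repo, so any framework's
    # normal loading path should work. Assumes transformers, torch, accelerate.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "unsloth/phi-4"  # the fixed upload linked above
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype="auto", device_map="auto"
    )

    inputs = tokenizer("Hello, Phi-4!", return_tensors="pt").to(model.device)
    print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0]))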

Apologies, I'm a bit confused by the comment, sorry.

If you're questioning the credibility of the bug fixes - we fixed 8 bugs in Gemma https://x.com/danielhanchen/status/1765446273661075609, multiple bugs in Llama, Mistral, Qwen, a gradient accumulation bug https://x.com/danielhanchen/status/1846235913443262891 and much more


2x faster than what?

Oh 2x faster and uses >70% less memory than Hugging Face + Flash Attention 2! I did a CUDA / GPU Mode talk about it here: https://www.youtube.com/watch?v=hfb_AIhDYnA Also to the PyTorch team here: https://www.youtube.com/watch?v=MQwryfkydc0 and the PyTorch Conference here: https://www.youtube.com/watch?v=PdtKkc5jB4g

> Oh 2x faster and uses >70% less memory than Hugging Face + Flash Attention 2!

Is this doing the same type of fine-tuning, or are you comparing full bf16 fine-tuning in HF with 4-bit QLoRA in Unsloth (in which case it's not really an apples-to-apples comparison)? If it's the latter then do you have a comparison of the former?


Oh I compared 4bit QLoRA HF+FA2 with Unsloth 4bit QLoRA.

16-bit LoRA has similar boosts in performance!

Full bf16 finetuning is not yet supported, but it'll come out soon!
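
For reference, a rough sketch of what the 4-bit QLoRA setup looks like on the Unsloth side (hyperparameters here are illustrative placeholders, not the exact configuration benchmarked above):

    # Rough sketch of 4-bit QLoRA with Unsloth; hyperparameters are
    # illustrative placeholders, not the benchmarked configuration.
    from unsloth import FastLanguageModel

    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name="unsloth/phi-4",
        max_seq_length=2048,     # placeholder context length
        load_in_4bit=True,       # QLoRA: 4-bit quantized base weights
    )
    model = FastLanguageModel.get_peft_model(
        model,
        r=16,                    # LoRA rank (placeholder)
        lora_alpha=16,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                        "gate_proj", "up_proj", "down_proj"],
    )
    # ...then train with your usual trainer (e.g. TRL's SFTTrainer) on top.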


Because of that factor, I'm not quite sure what's going on with the article or comments here altogether.

If you gave it to me in a cleanroom and told me I had to share my honest opinion, I'd say it was repeating universally agreeable things and hitching them to some sort of solo endeavor to wed together a couple of old 3D engines, with a lack of technical clarity, or even prose clarity, beyond "I will be better than the others."

I assume given the other reactions that I'm missing something, because I don't know 3D engines, and it'd be odd to have universally positive responses just because it repeats old chestnuts.


Excellent point - I'm not sure people are aware, but these are straight-up lifted from standard IQ tests, so they're definitely not all trivially solvable by humans.

I needed an official one for medical reasons a few years back


Honestly, after that, I'm tuned out completely on him and ARC-AGI. Nice minor sidestory at one point in time.

He's right that this isn't solving all human-intelligence domain level problems.

But the whole stunt, this whole time, was that this was the ARC-AGI benchmark.

The conceit was that the fact LLMs couldn't do well on it proved they weren't intelligent, and that real researchers would step up to bench well on it, avoiding the ideological tarpit of LLMs, which could never be intelligent.

It's fine to turn around and say "My AGI benchmark says little about intelligence", but the level of conversation is decidedly more that of punters at the local stables than of rigorous analysis.


> Seriously, most car thefts go un investigated.

This isn't true

> Am I really being asked to believe a warrant and nighttime raid was ordered over a hedge trimmer?

Yes - in fact, right before that you say directly that you'd like us to assume so, and to assume that there was special attention on this case because someone important asked for it.

That seems reasonable.

The rest...not sure what you mean by "nighttime" or "raid" here.

Multiple cops don't imply what people would generally understand by "raid".

By default, due to safety protocols, there are multiple cops. The edge case is when there's one: you're extremely cooperative and it's a minor infraction. I've watched too many police body cam videos; if it's anything outside of that, they call for backup immediately.

I'd bet $1000 it was a search warrant served over a weed eater that was of particular interest to a powerful figure, yet that it was all standard behavior.

Occam's razor here is what I see over and over again on bodycams: police protocols are designed to keep cops safe, and the proliferation of handguns creates charged situations that don't need to exist.


> the proliferation of handguns creates charged situations that don't need to exist.

This is an excellent point. If there isn't the political will to end sovereign/qualified immunity, then we should reduce the immediate damage that it does by ending this proliferation of guns to municipal police. When the rare situation does end up escalating into armed resistance, the local police can call in a special state police unit. That state police unit will have better training on respect for firearms, containment, and deescalation. And a second chain of command will serve as an additional check on the situation.


The article directly calls it a raid. It also says the raid happened at 11:55pm.

Also only 10% of auto thefts result in an actual police investigation.


Nah, Mandela Effect: the article/source you read said 10% were solved, not that 10% were investigated.

To wit:

Title: Big Rise In US Car Thefts
Date: January 24th, 2024
Link: https://www.newsweek.com/car-theft-rise-fbi-council-criminal...

"The "clearance rate" or rate of solving car thefts, has dropped from 26 percent in 1964 to 9 percent in 2022.

In comparison, the 2022 clearance rate was 12 percent for larceny and 13 percent for burglary. The homicide clearance rate in 2022 was about 50 percent."


Boils down to "use Frida to find the arguments to the TensorFlow call beyond the model file"

The key here is that a binary model is just a bag of floats with primitively typed inputs and outputs.

It's ~impossible to write up more than what's here because either:

A) you understand reverse engineering and model basics, and thus the current content makes it clear you'd use Frida to figure out how the arguments are passed to TensorFlow

or

B) you don't understand this is a binary reverse engineering problem, even when shown Frida. If more content were provided, you'd see it as specific to a particular problem - which it has to be. You'd also need a hand-held walkthrough of batching, tokenization, and so on and so forth - too much for a write-up, and it'd be too confusing to follow for another model.

TL;DR: a request for more content is asking for a reverse engineering article to give you a full education on model inference
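
If it helps make "use Frida to find the arguments" concrete, here's the rough shape of such a hook (sketch only - the library name, export name, and package name below are placeholders you'd have to discover per app by enumerating the loaded modules and their exports):

    # Sketch only: library/export/package names are placeholders you must
    # discover per app; this just shows the shape of a Frida argument hook.
    import frida

    js = """
    // Hypothetical native entry point the app calls to run inference.
    var fn = Module.getExportByName("libtflite_example.so", "RunInferenceExample");
    Interceptor.attach(fn, {
        onEnter: function (args) {
            // Log the raw argument pointers; from here you inspect the buffers
            // to recover input shape, dtype, and pre/post-processing constants.
            console.log("inference called, arg0=" + args[0] + ", arg1=" + args[1]);
        }
    });
    """

    session = frida.get_usb_device().attach("com.example.targetapp")  # placeholder
    script = session.create_script(js)
    script.on("message", lambda message, data: print(message))
    script.load()
    input("Hooked; trigger inference in the app, then press Enter to detach.\n")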


> It's ~impossible to write up more than what's here

Except you just did - or at least you wrote an outline for it, which is 80% of the value already.


The more impolite version of this basically says "If you can't figure out that you're supposed to also use Frida to check the other arguments, you have no business trying." I agree, though, so I wrote a more polite version.

> TL;DR: a request for more content is asking for a reverse engineering article to give you a full education on model inference

I don't understand what you mean: I have no clue about anything related to reverse engineering, but I ported the mistral tokenizer to Rust and also wrote a basic CPU Llama training and inference implementation in Rust, so I definitely wouldn't need an intro to model inference…


You're also not the person I'm replying to, nor do you appear anywhere in this comment chain, so I've definitely not implied you need an intro to inference - so I'm even more confused than you :)

I share the sentiment of the person you're responding to, and I didn't understand your response, that's it.

This is a good comment, but only in the sense that it documents that a model file doesn't run the model by itself.

An analogous situation is seeing a blog post that purports to "show you code", where the code returns an object, and commenting "This is cool, but it doesn't show you how to turn a function return value into a human-readable format." More noise than signal.

The techniques in the article are trivially understood to also apply to discovering the input tokenization format, and Netron shows you the types of inputs and outputs.

Thanks for the article OP, really fascinating.


Just having the shape of the input and output is not sufficient; the image (in this example) needs to be normalized. It's presumably not difficult to find the exact numbers, but it is a source of errors when reverse engineering an ML model.

Right, you get it: it's a Frida problem.
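
To make that concrete: once the model file and the preprocessing constants are recovered, the "bag of floats with typed inputs and outputs" point looks like this (sketch - the model path and the /127.5 - 1 normalization are assumptions for illustration, and they're exactly the kind of detail Frida or Netron has to confirm):

    # Sketch: model path and normalization constants are assumptions, and this
    # assumes a float32 image-input model; confirming all of that is exactly
    # the Frida/Netron step discussed above.
    import numpy as np
    import tensorflow as tf

    interpreter = tf.lite.Interpreter(model_path="recovered_model.tflite")
    interpreter.allocate_tensors()

    inp = interpreter.get_input_details()[0]   # dtype + shape of the input tensor
    out = interpreter.get_output_details()[0]
    print("input:", inp["shape"], inp["dtype"], "output:", out["shape"], out["dtype"])

    # Assumed preprocessing: scale uint8 pixels to [-1, 1]. Get the real constants
    # wrong and the model still runs - the outputs are just silently garbage.
    image = np.random.randint(0, 256, size=inp["shape"], dtype=np.uint8)
    x = image.astype(np.float32) / 127.5 - 1.0

    interpreter.set_tensor(inp["index"], x)
    interpreter.invoke()
    print(interpreter.get_tensor(out["index"]))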
