Hacker News | refulgentis's comments

n^2 isn't a setting someone chose, it's a mathematical consequence of what attention is.

Here's what attention does: every token looks at every other token to decide what's relevant. If you have n tokens, and each one looks at n others, you get n * n = n^2 operations.
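A minimal sketch of where that shows up in code, using plain numpy with identity projections in place of learned Q/K weights (illustrative only, not any particular library's implementation):

    import numpy as np

    def attention_weights(x):
        # x: (n, d) token embeddings. The score matrix is n x n because
        # every token is scored against every other token -- that's the n^2.
        scores = x @ x.T / np.sqrt(x.shape[1])        # (n, n)
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        return w / w.sum(axis=-1, keepdims=True)      # softmax over each row

    tokens = np.random.randn(8, 16)                   # n = 8 tokens, d = 16 dims
    print(attention_weights(tokens).shape)            # (8, 8): n * n comparisons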

Put another way: n^2 is when every token gets to look at every other token. What would n^3 be? n^10?

(a sibling comment has the same interpretation as you, then handwaves that transformers can emulate more complex systems)


There are lots of operations more complicated than comparing every token to every other token & the complexity increases when you start comparing not just token pairs but token bigrams, trigrams, & so on. There is no obvious proof that all those comparisons would be equivalent to the standard attention mechanism of comparing every token to every other one.

While you are correct at a higher level, comparing bigrams/trigrams would be less compute, not more, because there are fewer of them in a given text.

I'm correct on the technical level as well: https://chatgpt.com/s/t_698293481e308191838b4131c1b605f1

That math is for comparing all n-grams for all n <= N simultaneously, which isn't what was being discussed.

For any fixed n-gram size, the complexity is still O(N^2), same as standard attention.
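A quick count makes the difference concrete, assuming n-grams are contiguous spans of a length-N sequence:

    \text{fixed } n:\quad (N - n + 1)^2 = O(N^2)

    \text{all } n \le N:\quad \sum_{n=1}^{N} (N - n + 1)^2 = \sum_{m=1}^{N} m^2 = \frac{N(N+1)(2N+1)}{6} = O(N^3)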


I was talking about all n-gram comparisons.

Thanks for clarifying. I was hoping to clear up the disconnect between you two; it looked like it hinged on "bigrams, trigrams, & so on." That reads idiomatically as enumerating fixed-n cases, and parsing "& so on" as "their simultaneous union" asks quite a bit of those two words. Either way, as the ChatGPT link you shared shows, all-n-gram comparison brings us to O(N^3), still several powers short of the N^10 that started this thread.

This is getting tiresome. I can make the operations as complicated as necessary by comparing all possible permutations of the input string w/ every other permutation & that will not be reducible to standard attention comparisons. The n-gram was a simple example anyone should be able to understand. You can ask your favorite chatbot to compute the complexity for the permutation version.

No worries! I enjoyed it, fwiw, appreciate your time :) (The permutation version would be factorial, not polynomial. Different beast entirely.)

That skips an important part: the "deep" in "deep learning".

Attention already composes across layers.

After layer 1, you're not comparing raw tokens anymore. You're comparing tokens-informed-by-their-context. By layer 20, you're effectively comparing rich representations that encode phrases, relationships, and abstract patterns. The "higher-order" stuff emerges from depth. This is the whole point of deep networks, and attention.

TL;DR for the rest of this comment: people have tried shallow-and-wide instead of deep, and it doesn't work in practice. (The rest fleshes out search terms / ChatGPT prompts to look into to understand more of the technical details.)

A shallow network can approximate any function (universal approximation theorem), but it may need exponentially more neurons. Deep networks represent the same functions with far fewer parameters. There's formal work on "depth separation": functions that deep nets compute efficiently but that shallow nets need exponential width to match.

Empirically, people have tried shallow-and-wide vs. deep-and-narrow many times, across many domains. Deep wins consistently for the same parameter budget. This is part of why "deep learning" took off: the depth is load-bearing.

For transformers specifically, stacking attention layers is crucial. A single attention layer, even with more heads or bigger dimensions, doesn't match what you get from depth. The representations genuinely get richer in ways that width alone can't replicate.
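A toy sketch of that composition, reusing the same bare-numpy attention as above (identity projections, no MLPs or residuals, purely to show that layer 2 compares context-mixed vectors rather than raw tokens):

    import numpy as np

    def self_attention(x):
        # One attention layer: each output row is a context-weighted
        # mix of every token's representation.
        scores = x @ x.T / np.sqrt(x.shape[1])
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)
        return w @ x

    tokens = np.random.randn(8, 16)   # layer 0: raw token embeddings
    h1 = self_attention(tokens)       # layer 1: tokens informed by their context
    h2 = self_attention(h1)           # layer 2: comparisons between already-
                                      # contextualized representations; the
                                      # "higher-order" structure comes from depth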


Moltbot is not de rigueur prompt injection, i.e. the "is it instructions or data?" built-in vulnerability.

This was "I'm going to release an open agent, with an open agents directory of executable code, and it'll operate your personal computer remotely!" I deeply understand the impulse, but there's a fine line between "cutting edge" and "irresponsible & making excuses."

I'm uncertain what side I would place it on.

I have a soft spot for the author, and a sinking feeling that without the soft spot, I'd certainly choose "irresponsible".


The feeling I get is 'RCE exploits as a Service'

Based on this tall tale being the top comment, and the replies to you, I hereby request dang remove the “HN is becoming Reddit” nono from the guidelines.


I agree. I don't understand the argument that just because people have incorrectly claimed HN is turning into reddit before, it automatically means it can never happen. I think HN has become a lot more like reddit and in some ways even worse.

> I don't understand the argument that just because people have incorrectly claimed HN is turning into reddit before, it automatically means it can never happen.

I don’t think the argument is it can never happen, but rather that at the time of writing of the guidelines it hadn’t. And that it is an argument that doesn’t advance the discussion. Complaining that HN is becoming like Reddit is one of the things which makes HN more like Reddit.



Can you read the article a little more closely?

> - MiniMax can't fit on an iPhone.

They asked MiniMax on their computer to make an iPhone app that didn't work.

It didn't work using the Apple Intelligence API. So then:

* They asked Minimax to use MLX instead. It didn't work.

* They Googled and found a thread where Apple Intelligence also didn't work for other people, but only sometimes.

* They HAND WROTE the MLX code. It didn't work. They isolated the step where the results diverged.

> Better to dig in a bit more.

The author already did 100% of the digging and then some.

Look, I am usually an AI rage-enthusiast. But in this case the author did every single bit of homework I would expect and more, and still found a bug. They rewrote the test harness code without an LLM. I don't find the results surprising insofar as I wouldn't expect MAC results to converge across platforms, but the fact that Apple's own LLM doesn't work on their own hardware, and their own framework is an order of magnitude off, is a reasonable bug report, in my book.


Emptied out post, thanks for the insight!

Fascinating that the claim is Apple Intelligence doesn't work altogether. Quite a scandal.

EDIT: If you wouldn't mind, could you edit out the "AI rage enthusiast" bit you edited in? I understand it was in good humor, as you describe yourself that way as well. However, I don't want to eat downvotes on an empty comment that I immediately edited when you explained it wasn't MiniMax! People will assume I said something naughty :) I'm not sure it was possible to read rage into my comment.


> Fascinating the claim is Apple Intelligence doesn't work altogether. Quite a scandal.

No, the claim is their particular device has a hardware defect that causes MLX not to work (which includes Apple Intelligence).

> EDIT: If you wouldn't mind, could you edit out "AI rage enthusiast" you edited in? I understand it was in good humor, as you describe yourself that way as well. However, I don't want to eat downvotes on an empty comment that I immediately edited when you explained! People will assume I said something naughty :) I'm not sure it was possible to read rage into my comment.

Your comment originally read:

> This is blinkered.

> - MiniMax can't fit on an iPhone.

> - There's no reason to expect models to share OOMs for output.

> - It is likely this is a graceful failure mode for the model being far too large.

> No fan of Apple's NIH syndrome, or it manifested as MLX.

> I'm also no fan of "I told the robot [vibecoded] to hammer a banana into an apple. [do something impossible]. The result is inedible. Let me post to HN with the title 'My thousand dollars of fruits can't be food' [the result I have has ~nothing to do with the fruits]"

> Better to dig in a bit more.

Rather than erase it, and invite exactly the kind of misreading you don't want, you can leave it... honestly, transparently... with your admission in the replies below. And it won't be downvoted as much as when you're trying to manipulate / make requests of others to try to minimize your downvotes. Weird... voting... manipulating... stuff, like that, tends to be frowned upon on HN.

You have more HN karma than I do, even, so why care so much about downvotes...

If you really want to disown something you consider a terrible mistake, you can email the HN mods to ask for the comment to be dissociated from your account. Then future downvotes won't affect your karma. I did this once.


Oh no, all my meaningless internet points, gone!

> Then future downvotes won't affect your karma.

Who cares? The max amount of karma loss is 4 points, we can afford to eat our downvotes like adults.


Huh. I thought the minimum comment score was -4 (which would make the maximum amount of karma loss 5, since each comment starts at 1 point), but I didn't know if that was a cap on karma loss or just a cap on comment score.

45 years of writing code, and I still commit fence post errors. :eyeroll:

I'm an AI rage enthusiast too. Feel free to downvote me for free.

Really good question... TL;DR: I'd put it around when Mark Z decided Instagram also had to be Snapchat (i.e. copied Stories). It normalized a behavior of copying.

I had gotten completely out of these apps, then ended up in a situation where I needed to use Snapchat daily if not hourly for messaging, and needed to use TikTok to be culturally literate. (i.e. I got into something romantic with someone younger).

It was a stunning experience. Seeing _everyone_ had normalized this "copy our competitor" strat, hill-climbing on duration of engagement.

YouTube Shorts is a crappy copy of TikTok with mostly TikTok reposts and no sense of community.

Snapchat has a poor clone of TikTok that I doubt anyone knows exists.

TikTok is the ur-engagement king. Pure dopamine: just keep swiping until something catches your attention, and swipe as soon as it stops. No meaningful 1:1 communication aspect (there are messages, but AFAICT from light quizzing of Gen Zers, they're not used for actual communication).

Instagram specifically is hard for me to speak to, because Gen Zers seem to think it's roughly as cool as Facebook, but my understanding is millennials my age or younger (I'm 37) use it more regularly, whereas Gen Z uses it more like we'd think of Facebook: a generic safe place where grandma can see your graduation photos, as opposed to spontaneous thirst traps.


> and needed to use TikTok to be culturally literate

I wish we had something like Lurkmore for the modern Internets™. KYM could be it, but it seems to focus on random celebrity gossip instead? Idk.


This is a really good point.

It made me realize there's something weird about TikTok: it's kind of like Twitter, except with an audience much more compliant with / sanguine about The Algorithm. I.e. no one's fighting for a "people I follow only" feed (there is one, but it's not worth fighting for, in a cultural sense).

They will overpromote one thing and some story you're not part of will be hyperviral for 6 hours, so there's almost no time for someone else to digest it. (example that comes to mind is Solidcore Guy, https://people.com/man-finds-empty-6-a-m-solidcore-class-fil..., took People 10 days to catch up to something you'd need to know for small talk in a 48 hour window)


Putting it all on the table: do you agree with the claim that binary analysis is just as good as source code analysis?

Binary analysis is vastly better than source code analysis; reliably detecting bugdoors via source code analysis tends to require an unrealistically deep knowledge of compiler behavior.

Empirically it doesn't look like there's a meaningful difference, does it?

Not having the source code hasn't stopped people from finding exploits in Windows (or even hardware attacks like Spectre or Meltdown). Having source code didn't protect against Heartbleed or log4j.

I'd conclude it comes down to security culture (look how things changed after the Trustworthy Computing initiative, or OpenSSL vs LibreSSL) and "how many people are looking" -- in that sense, maybe "many eyes [do] make bugs shallow" but it doesn't seem like "source code availability" is the deciding factor. Rather, "what are the incentives" -- both on the internal development side and the external attacker side


I don't agree with "vastly better", but it's arguable in both direction and magnitude. I don't think you could plausibly argue that binary analysis is "vastly harder".

Nono, analyzing binaries is harder.

But it's still possible. And analyzing source code is still hard.


Agents are sort of irrelevant to this discussion, no?

Like, it's assuredly harder for an agent than having access to the code, if only because there's a theoretical opportunity to misunderstand the decompile.

Alternatively, it's assuredly easier for an agent because, as execution time approaches infinity, they can try all possible interpretations.


Agents meaning an AI iteratively trying different things to try to decompile the code. Presumably in some kind of guess and check loop. I don’t expect a typical LLM to be good at this on its first attempt. But I bet Cursor could make a good stab at it with the right prompt.

Cursor is a bit old at this point; the state of the art is Claude Code and its imitators (ChatGPT Codex, OpenCube).

Devin is also going very strong, but it's a bit quieter and growing in enterprises (and I'm pretty sure it uses Claude Opus 4.5, and possibly Claude Code itself). In fact, Clawdbot/Moltbot/OpenClaw was itself created with Devin.

The big difference is the autonomy these models have (Devin more than Claude Code). Cursor was meant to work in an IDE, and that was a huge strength during the 12 months when the models still weren't strong enough to work autonomously, but they are getting to the point where that's becoming a weakness. Models like Devin get a slower-acceleration-but-higher-top-speed advantage. My chips are on Devin.


This comment comes across as unnecessarily aggressive and out of nowhere (Stallman?); it's really hard to parse.

Does this rewording reflect its meaning?

"You don't actually need code to evaluate security, you can analyze a binary just as well."

Because that doesn't sound correct?

But that's just my first pass, at a high level. Don't wanna overinterpret until I'm on surer ground about what the dispute is. (i.e. don't want to mind read :) )

The best steelman my current understanding allows is "you can check if it writes files / accesses network, and if it doesn't, then by definition the chats are private and it's secure", which sounds facile. (Presumably something is being written somewhere for the whole chat thing to work; you can't do pure P2P because someone's app might not be open when you send.)


https://www.gnu.org/philosophy/free-sw.html

Whether the original comment knows it or not, Stallman greatly influenced the very definition of Source Code, and the claim being made here is very close to Stallman's freedom to study.

>"You don't actually need code to evaluate security, you can analyze a binary"

Correct

>"just as well"

No, of course analyzing source code is easier and analyzing binaries is harder. But it's still possible ("feasible" is the word used by the original comment).

>Steelman for my current understanding is limited to "you can check if it writes files/accesses network, and if it doesn't, then by definition the chats are private and its secure",

I didn't say anything about that? I mean, those are valid tactics as part of a wider toolset, but I specifically said binaries, because they map one to one with the source code. If you can find something in the source code, you can find it in the binary and vice versa. Analyzing file accesses and networks, or runtime analysis of any kind, is going to be mostly orthogonal to source code / binary static analysis, the only difference being whether you have a debug map to source code or to the machine code.

This is a very central conflict of Free Software. What I want to make clear is that Free Software refuses to study closed-source software not because it is impossible, but because it is unjustly hard. Free Software never claims it is impossible to study closed-source software; it claims that source code access is a right, and its adherents prefer to reject closed-source software entirely, and thus never need to perform binary analysis.


Binaries absolutely don't map one-to-one with source code. Compilers optimize out dead code, elide entire subroutines to single instructions, perform loop unrolling and auto-vectorization, and many many more optimizations and transformations that break exact mapping.

That is true, but I don't think I ever said that binaries map one-to-one with source code.

I was referring to source-code-to-binary maps: these are files that map binary locations to source code locations. In C (gcc/gdb) these are debug objects; they are also used by gdb-style debuggers like Python's pdb and Java's jdb. They also exist in JS/TS when using minifiers or React, so that you are able to debug in production.


>> So they don’t effectively block communication to and from the device?

> That is impossible to know without knowing the characteristics of the signal

Do you dispute that de facto these products work?


Yes I have spent thousands of dollars and months testing them.

You can cut off GPS and high-frequency cell spectrum pretty easily. Most cell spectrum is effectively attenuated by good-quality professional RF enclosures designed for those frequencies. 2.4 GHz signals like wifi (with good-quality radios) are hard to attenuate to the point where they can't connect to other radios nearby, even with very expensive RF test enclosures.

If you're trying to block against an unknown threat, you are fucked. If someone wanted to backdoor a baseband, they'd probably make it transmit at low speeds and low frequencies to be resistant to attenuation.


What kind of RF can travel through a few mm of metal?


If you blindly “put metal around” an antenna, most of the time, you have just created another antenna!

Go put a phone in a filing cabinet and call it. It will ring.


Jesus. That sucks. Thanks for sharing, fascinating stuff.


I will, I've got two and neither one can shut off a longwave FM radio.


AI-written content with zero sources, links, anything shouldn't be acceptable, especially to HNers. Makes me sad. :(

Apologies you both have to deal with this.

